Deep Dive Into Ballerina Runtime - Types I
Background⌗
Every programming language has some kind of built-in type system. This applies even to languages with dynamic type systems. For a compiler to function, these built-in types are essential.
Since Ballerina is a statically typed language, the Ballerina Native compiler defines a fairly descriptive built-in type system.
In this small series of articles, we’ll explore its ins and outs and also try to fix some open issues, possibly #142, since it focuses on another important part of Ballerina: semantic subtyping.
Semantic Subtyping⌗
The type system of a programming language is often defined using a set of rules, and subtyping plays a major role at the core of it.
Subtyping is especially prominent in OOP languages, where it is closely tied to the OOP concept of polymorphism. According to Wikipedia, a subtype is a datatype that is related to another datatype (its supertype), either semantically or syntactically.
Say there is a class called Cat, and that Cat inherits from a class called Animal. In this scenario, class Cat is a subtype of class Animal.
Now, how can this be useful in the context of a compiler? Imagine the following function:
float addFloats(float a, float b) {
return a + b;
}
The addFloats function above expects two floats as arguments and returns a float. But what happens if we call the addFloats function like this?
addFloats(1, 2)
Here both arguments are integers. A compiler can approach this problem in multiple ways. One of them is to allow integers wherever floats are expected. Here, float is the supertype and int is the subtype, so the subtype relation can be written as int <: float.
The other approach is to use a generic interface, or a type that both float and int are derived from.
Number addFloats(Number a, Number b) {
    // Number has no + operator, so unwrap both values to double first
    return a.doubleValue() + b.doubleValue();
}
Here the supertype is Number, and values of both subtypes, int and float, can be passed as arguments to the function.
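Ballerina itself takes neither route for int and float: they are disjoint basic types, so an int variable cannot be passed where a float is expected. A union type can play the role of Number instead. The sketch below is only an illustration; Number, toFloat and addNumbers are hypothetical names. The byte lines at the end show a subtype relation that follows purely from set inclusion, since byte is the set of ints from 0 to 255.

```ballerina
import ballerina/io;

// Hypothetical union type standing in for the Number supertype above.
type Number int|float;

// Ballerina never converts int to float implicitly, so the cast is explicit.
function toFloat(Number n) returns float {
    if n is int {
        return <float>n;
    }
    // n has been narrowed to float here.
    return n;
}

function addNumbers(Number a, Number b) returns float {
    return toFloat(a) + toFloat(b);
}

public function main() {
    int i = 1;
    float f = 2.5;
    io:println(addNumbers(i, f)); // 3.5

    // byte is the set of ints from 0 to 255, a subset of int, so byte <: int.
    byte small = 200;
    int widened = small;
    io:println(widened); // 200
}
```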
Semantic Subtyping implementation in Ballerina⌗
The core.bal file contains some of the most important parts of the semantic subtyping implementation of the Ballerina language.
The Env class defined in that file keeps track of the atoms from which semantic types are built; we’ll come back to it later in this post.
However, what really caught my attention are the definitions below.
public type BasicTypeCode
BT_NIL|BT_BOOLEAN|BT_INT|BT_FLOAT|BT_DECIMAL
|BT_STRING|BT_ERROR|BT_TYPEDESC|BT_HANDLE|BT_FUNCTION
|BT_FUTURE|BT_STREAM
|BT_LIST|BT_MAPPING|BT_TABLE|BT_XML|BT_OBJECT
|BT_CELL|BT_UNDEF;
Here it defines a new union type called BasicTypeCode, which can be any one of the basic type codes.
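Since each BasicTypeCode is a small integer, a set of basic types fits naturally into the bits of a single int: a union of basic types becomes a bitwise OR, and checking whether one set is contained in another becomes a bitwise test. The sketch below assumes this encoding; the constant values and the names BasicTypeBitSet, bitOf, unionOf and isSubtype are my own illustrative choices, not the definitions in core.bal.

```ballerina
import ballerina/io;

// Hypothetical code values, chosen for illustration only.
const BT_NIL = 0;
const BT_BOOLEAN = 1;
const BT_INT = 2;
const BT_FLOAT = 3;
const BT_STRING = 4;

type BasicTypeCode BT_NIL|BT_BOOLEAN|BT_INT|BT_FLOAT|BT_STRING;

// A set of basic types packed into an int: bit n is set when code n is a member.
type BasicTypeBitSet int;

function bitOf(BasicTypeCode code) returns BasicTypeBitSet {
    return 1 << code;
}

// The union of several basic types is the bitwise OR of their bits.
function unionOf(BasicTypeCode... codes) returns BasicTypeBitSet {
    BasicTypeBitSet bits = 0;
    foreach BasicTypeCode c in codes {
        bits = bits | bitOf(c);
    }
    return bits;
}

// t1 <: t2 holds when no basic type in t1 is missing from t2.
function isSubtype(BasicTypeBitSet t1, BasicTypeBitSet t2) returns boolean {
    return (t1 & ~t2) == 0;
}

public function main() {
    BasicTypeBitSet intOrString = unionOf(BT_INT, BT_STRING);
    io:println(isSubtype(bitOf(BT_INT), intOrString));           // true
    io:println(isSubtype(unionOf(BT_INT, BT_NIL), intOrString)); // false
}
```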
type Atom RecAtom|TypeAtom;
type RecAtom int;
type TypeAtom readonly & record {|
int index;
AtomicType atomicType;
|};
type AtomicType ListAtomicType|MappingAtomicType|CellAtomicType;
The union type Atom defined above can be either a RecAtom or a TypeAtom.
A RecAtom is essentially another name for int.
TypeAtom, on the other hand, is a record which describes, as the name implies, an atomic type.
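The word "atomic" here is easy to misread. In systems programming, an atomic operation is one that completes as a single, indivisible step, for instance a single aligned store instruction such as,
mov [rbp+4], 4
In semantic subtyping, an atom is instead a type that cannot be divided any further: a single list shape, mapping shape or cell, rather than a union, intersection or negation of such shapes. These atoms are the building blocks from which Ballerina’s list, mapping and cell types are composed.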
Here in Ballerina, AtomicType is implemented as a union of ListAtomicType, MappingAtomicType and CellAtomicType.
From what I can see, AtomicType is essentially something like std::variant in C++: a type-safe union of the three atomic types. TypeAtom then pairs one such value with an index. Let us get to the exact details later.
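To make the std::variant comparison concrete, here is a sketch that tells the members of Atom apart with Ballerina’s is type test. ListAtomicType and MappingAtomicType are replaced by hypothetical one-field stand-ins, CellAtomicType is left out, and describe is an illustrative helper rather than anything in core.bal.

```ballerina
import ballerina/io;

// Hypothetical one-field stand-ins for the real atomic types.
type ListAtomicType readonly & record {|
    string[] members;
|};
type MappingAtomicType readonly & record {|
    string[] names;
|};
type AtomicType ListAtomicType|MappingAtomicType;

// Same shape as the definitions in core.bal, minus CellAtomicType.
type RecAtom int;
type TypeAtom readonly & record {|
    int index;
    AtomicType atomicType;
|};
type Atom RecAtom|TypeAtom;

// The is type test plays the role that std::holds_alternative or
// std::visit play for std::variant in C++.
function describe(Atom atom) returns string {
    if atom is RecAtom {
        return "rec atom " + atom.toString();
    }
    // atom has been narrowed to TypeAtom here.
    AtomicType atomicType = atom.atomicType;
    if atomicType is ListAtomicType {
        return "list atom " + atom.index.toString();
    }
    return "mapping atom " + atom.index.toString();
}

public function main() {
    ListAtomicType listShape = {members: ["int", "string"]};
    TypeAtom listAtom = {index: 3, atomicType: listShape};
    io:println(describe(7));        // rec atom 7
    io:println(describe(listAtom)); // list atom 3
}
```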
Each of the types above is defined in its own file, so let us go over them one by one.
ListAtomicType⌗
ListAtomicType is implemented as a record of a FixedLengthArray and a CellSemType. The definitions can be found in list.bal.
public type ListAtomicType readonly & record {|
readonly & FixedLengthArray members;
CellSemType rest;
|};
FixedLengthArray is also a record; it represents an array of a fixed length. According to the comments, the list members can be of any semantic type, each wrapped in a CellSemType.
public type FixedLengthArray record {|
CellSemType[] initial;
int fixedLength;
|};
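If initial contains three member types (say the singleton types 1, 2 and 3), then fixedLength is 3. Another important fact about FixedLengthArray is that initial can be shorter than fixedLength: the last member of initial is then repeated to fill the remaining positions. For example, with initial = [string, int] and fixedLength = 100, the array has a string at index 0 and an int at each of the remaining 99 positions, so int is repeated 99 times.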
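The repetition rule can be sketched as a lookup function. This is a simplified, hypothetical version: member types are plain strings instead of CellSemType values, memberTypeAt and countOf are illustrative helpers, and indexes at or beyond fixedLength return () here, whereas a real ListAtomicType presumably hands those positions over to its rest member.

```ballerina
import ballerina/io;

// Simplified FixedLengthArray: member types are written as strings here
// instead of CellSemType values.
type FixedLengthArray record {|
    string[] initial;
    int fixedLength;
|};

// Returns the member type at index, or () outside the fixed-length part.
// Assumes initial is non-empty whenever fixedLength > 0.
function memberTypeAt(FixedLengthArray arr, int index) returns string? {
    if index < 0 || index >= arr.fixedLength {
        return ();
    }
    if index < arr.initial.length() {
        return arr.initial[index];
    }
    // Past the end of initial, the last member is repeated.
    return arr.initial[arr.initial.length() - 1];
}

// Counts how many positions in the fixed-length part have the given type.
function countOf(FixedLengthArray arr, string ty) returns int {
    int total = 0;
    foreach int i in 0 ..< arr.fixedLength {
        if memberTypeAt(arr, i) == ty {
            total += 1;
        }
    }
    return total;
}

public function main() {
    FixedLengthArray arr = {initial: ["string", "int"], fixedLength: 100};
    io:println(memberTypeAt(arr, 0));  // string
    io:println(memberTypeAt(arr, 57)); // int
    io:println(countOf(arr, "int"));   // 99
}
```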
MappingAtomicType⌗
This atomic type describes a mapping (a map or record shape) rather than a list.
public type MappingAtomicType readonly & record {|
// sorted
string[] names;
CellSemType[] types;
CellSemType rest;
|};
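The mapping.bal file does not provide much documentation on this specific type. Judging by its fields, names holds the field names (kept sorted, as the comment says), types holds the type of the field at the same position in names, and rest is the type of any field not listed in names.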
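Under that reading of the fields, looking up the type of a field might look like the sketch below. Types are plain strings instead of CellSemType values, fieldType is a hypothetical helper, and a linear search is used for simplicity even though names is kept sorted.

```ballerina
import ballerina/io;

// Hypothetical simplification: member types are strings instead of CellSemType.
type MappingAtomicType readonly & record {|
    string[] names;   // field names, kept sorted
    string[] types;   // types[i] is the type of the field names[i]
    string rest;      // type of every field not listed in names
|};

// Looks up the type of a field, falling back to rest for unlisted names.
function fieldType(MappingAtomicType m, string name) returns string {
    int? i = m.names.indexOf(name);
    if i is int {
        return m.types[i];
    }
    return m.rest;
}

public function main() {
    // Roughly the shape of record {| int age; string name; |}: no other fields allowed.
    MappingAtomicType person = {names: ["age", "name"], types: ["int", "string"], rest: "never"};
    io:println(fieldType(person, "name"));  // string
    io:println(fieldType(person, "email")); // never
}
```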
CellAtomicType⌗
CellAtomicType is a record with two members, of types SemType and CellMutability.
public type CellMutability CELL_MUT_NONE|CELL_MUT_LIMITED|CELL_MUT_UNLIMITED;
public type CellAtomicType readonly & record {|
SemType ty;
CellMutability mut;
|};
The mutability of a CellAtomicType is defined by its member mut, which can be CELL_MUT_NONE, CELL_MUT_LIMITED or CELL_MUT_UNLIMITED.
Later in this post we’ll go over what exactly a cell type is.
class Env⌗
Class Env, which we met earlier in the post, stores the atoms from which Ballerina’s semantic types are built.
public isolated class Env {
private final table<TypeAtom> key(atomicType) atomTable = table [];
// Set up index 0 to be used by VAL_READONLY
private final ListAtomicType?[] recListAtoms = [ LIST_ATOMIC_RO ];
private final MappingAtomicType?[] recMappingAtoms = [ MAPPING_ATOMIC_RO ];
private final FunctionAtomicType?[] recFunctionAtoms = [];
// Count of the total number of non-nil members
// of recListAtoms, recMappingAtoms and recFunctionAtoms
private int recAtomCount = 2;
The first member of the class is atomTable, which is basically a Ballerina table of the TypeAtom records we already discussed, with their atomicType member as the key.
Let’s set the other members of the class aside for a later discussion and focus on the init function.
public isolated function init() {
// Reserving the first two indexes of atomTable to represent cell VAL and cell NEVER typeAtoms.
// This is to avoid passing down env argument when doing cell type operations.
// Please refer to the cellSubtypeDataEnsureProper() in cell.bal
_ = self.cellAtom(CELL_ATOMIC_VAL);
_ = self.cellAtom(CELL_ATOMIC_NEVER);
// Reserving the next index of atomTable to represent the typeAtom required to construct
// equivalent subtypes of map<any|error> and (any|error)[].
_ = self.cellAtom(CELL_ATOMIC_INNER);
// Reserving the next two indexes of atomTable to represent typeAtoms related to (map<any|error>)[].
// This is to avoid passing down env argument when doing tableSubtypeComplement operation.
_ = self.cellAtom(CELL_ATOMIC_INNER_MAPPING);
_ = self.listAtom(LIST_ATOMIC_MAPPING);
// Reserving the next three indexes of atomTable to represent typeAtoms related to readonly type.
// This is to avoid requiring context when referring to readonly type.
// CELL_ATOMIC_INNER_MAPPING_RO & LIST_ATOMIC_MAPPING_RO are typeAtoms reuquired to construct readonly & (map<readonly>)[]
// which is then used for readonly table type when constructing VAL_READONLY.
_ = self.cellAtom(CELL_ATOMIC_INNER_MAPPING_RO);
_ = self.listAtom(LIST_ATOMIC_MAPPING_RO);
_ = self.cellAtom(CELL_ATOMIC_INNER_RO);
}
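The comments say that the first two indexes of atomTable are reserved for the cell type atoms VAL and NEVER, so that cell type operations do not need an env argument. The remaining calls reserve the next indexes for the atoms needed to build map<any|error>, (any|error)[], (map<any|error>)[] and the readonly type.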
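Judging by atomTable being keyed on atomicType, atoms appear to be interned: asking for the same atomic type twice yields the same TypeAtom, and each new atomic type gets the next free index, which is what makes reserving fixed indexes in init possible. The sketch below is a hypothetical, cut-down Env that interns cell atoms this way; it is not the actual cellAtom implementation, and CellAtomicType is simplified to two strings.

```ballerina
import ballerina/io;

// Hypothetical simplification: the real CellAtomicType holds a SemType and a CellMutability.
type CellAtomicType readonly & record {|
    string ty;
    string mut;
|};

type TypeAtom readonly & record {|
    int index;
    CellAtomicType atomicType;
|};

// A cut-down Env that only interns cell atoms. The real Env is an
// isolated class and also keeps the rec*Atoms arrays shown above.
class Env {
    private final table<TypeAtom> key(atomicType) atomTable = table [];

    // Returns the existing atom for this atomic type, or registers a new
    // one whose index is the next free position in atomTable.
    function cellAtom(CellAtomicType atomicType) returns TypeAtom {
        TypeAtom? existing = self.atomTable[atomicType];
        if existing is TypeAtom {
            return existing;
        }
        TypeAtom atom = {index: self.atomTable.length(), atomicType};
        self.atomTable.add(atom);
        return atom;
    }
}

public function main() {
    Env env = new;
    TypeAtom valAtom = env.cellAtom({ty: "any|error", mut: "limited"});
    TypeAtom neverAtom = env.cellAtom({ty: "never", mut: "none"});
    TypeAtom again = env.cellAtom({ty: "any|error", mut: "limited"});
    io:println(valAtom.index, " ", neverAtom.index, " ", again.index); // 0 1 0
}
```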
Cell Types⌗
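Well, I had the same question: what is a cell type? I asked on the Ballerina Discord server and got a very descriptive answer.
From what I understand, a cell type is essentially an abstraction layer built around semantic subtyping. Looking at CellAtomicType above, a cell pairs the type of a value (ty) with a mutability (mut), and ListAtomicType and MappingAtomicType describe their members as cells, so a cell type appears to describe a storage location, such as a list member or a mapping field, as opposed to the value stored in it.
For the sake of argument, let us think of semantic subtyping simply as the ability to use types interchangeably based on their semantic (meaning) characteristics: a type is treated as the set of values it contains, and S is a subtype of T when every value of S is also a value of T.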
Consider the following code
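function example(int|string arg1) returns int|error {
    if arg1 is string {
        // arg1 is narrowed to string here; parsing may fail with an error
        return int:fromString(arg1);
    } else {
        // arg1 is narrowed to int here
        return arg1;
    }
}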
The function example expects an argument of either type int or type string. The if statement checks whether arg1 is a string; if it is, arg1 is narrowed to string inside that branch, and the function returns the int parsed from it (int:fromString returns an error if parsing fails). In the else branch, arg1 can only be an int, so the function simply returns it.
This exact feature lowkey reminds me of template metaprogramming techniques used by C++ developers for type deduction and to implement type restrictions.
Now consider the following code.
type NewRecord record {|
    int|string x;
|};
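Here the field x of NewRecord has the same type as the parameter of example: int|string. A NewRecord value itself cannot be passed to example, because it is a mapping rather than an int or a string, but the value stored in its field x can: given a NewRecord value rec, the call example(rec.x) type-checks.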
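A small sketch to check which calls type-check here, reusing example and NewRecord from above with the semicolons Ballerina requires. The variable names are illustrative.

```ballerina
import ballerina/io;

type NewRecord record {|
    int|string x;
|};

function example(int|string arg1) returns int|error {
    if arg1 is string {
        return int:fromString(arg1);
    }
    // arg1 has been narrowed to int here.
    return arg1;
}

public function main() {
    NewRecord rec = {x: "10"};
    // The field x holds an int|string, so the value read from it fits example's parameter.
    io:println(example(rec.x)); // 10

    // The field is mutable: it can later hold a value of the other member type.
    rec.x = 32;
    io:println(example(rec.x)); // 32

    // A NewRecord value is a mapping, not a member of int|string,
    // so calling example(rec) would be a compile-time error.
    anydata value = rec;
    io:println(value is int|string); // false
}
```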
…