This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Millfork Improvement Roadmap 001:
SOA Object System
Classes, pools and handles
Every class contains fields. For now, the limitation is to byte-sized and word-sized fields.
class Point {
word x
byte y
}
(Syntax and naming conventions may be subject to change. Unless specified otherwise, all declarations in code examples are assumed to be at the global scope, and all statements that cannot happen at the global scope are assumed to be located within a body of some unspecified function.)
Objects are referred to by 8-bit handles. A handle of value 0 is considered a null handle and doesn't point to any object:
Point p // declaring a variable holding a handle to a point
p = null // p is physically now 0
p = Point(0) // the same
p = 0 // won't compile, a byte is not a point
All handles are implicitly convertible to a type handle
, but not the other way around. null
is of its own, inexpressible type that is implicitly convertible to all handle types.
handle h
h = p // ok
h = null // ok
p = h // won't compile
p = Point(h) // explicit casting makes it compile
If a variable holds the null handle, then casting it will yield a valid null handle (since it's zeroes all around).
Actual objects live in pools. All pools are fixed-size. All pools join together into one common pool (called a universe), of maximum size 255. A single pool can be considered a slice of the universe. The order of pools in a universe is undefined. A universe is implemented as a collection of arrays, called field arrays, each for every field of the object. Word-sized fields are split between high and low bytes. The layout those arrays in memory is undefined. Examples below use some concrete pool layout for illustration purposes.
pool Point triangle [3] // handles 1-3 refer to here
pool Point quad [4] // handles 4-7 refer to here
// effectively, it's the same as:
// const byte triangle$offset = 1
// const byte quad$offset = 4
// const byte triangle.size = 3
// const byte quad.size = 4
// array Point$x$hi [7]
// array Point$x$lo [7]
// array Point$y [7]
Indexing into those arrays will use the handle minus 1, so using a null handle might read from before the array, giving undefined results.
You can refer to pool elements by the indexing operator:
p = triangle[0] // equivalent to p = Point(1)
You can create a one-object pool that will give you simpler syntax later
object Point tmp_point
p = tmp_point
// instead of:
// pool Point tmp_point [1]
// p = tmp_point[0]
You can access the members of the object via the handle using the ->
operator (binding very tightly and to the left)
p->x + p->y
// compiles to:
// (Point$x$hi[byte(p)-1]:Point$x$lo[byte(p)-1]) + Point$y[byte(p)-1]
Class hierarchies
Classes can inherit from each other. This is an is-a
relationship. A class can inherit from multiple other classes. A class inherits all the fields of its parents. If a class inherits a field via multiple parents, it's inherited only once. A class can be abstract. A non-abstract class is called a concrete class.
abstract class Object { }
abstract class HasPosition(Object) {
Point position
}
abstract class HasColour {
byte colour
}
class Background (HasColour) { }
class Sprite (HasColour, HasPosition) { }
abstract class Recolourable (HasColour) { }
class RecolourableSprite (Sprite, Recolourable) { }
// RecolourableSprite has only one field called `colour`
A graph where all the vertices are classes and all the edges are parent-child relationships is called the class graph. The class graph can be split into one or more connected components. Each of those components is called a class hierarchy. Each hierarchy has to have exactly one base class (one class with no superclasses), called a root class. In the example above, that class is Object
. The name of the hierarchy is the same as the name of the root class, although in case of automatically generated hierarchy-wide functions, a name of any class in the hierarchy will work (so Object.delete
, Background.delete
and Sprite.delete
are all aliases for the same function)
All fields in the hierarchy have to have unique names. If you want to use a field with the same name and the same type in two different unrelated classes, consider creating an abstract class with just that field and inheriting from it. If you want to use a field with the same name but a different type, it's not allowed at all.
A class is called a leaf class if it's not abstract and it has no descendants. In the example above there are such classes: RecolourableSprite
and Background
Before further processing, the hierarchy is trimmed:
-
all unused leaf classes are removed, all runtime type checks for those classes are stubbed to always fail
-
all abstract classes with exactly one child are merged with it, all runtime type checks for those classes are changed to check for the child
A hierarchy is called a singular hierarchy if it contains only one class. The class in a singular hierarchy is never abstract, even if declared otherwise. The Point
hierarchy described at the beginning is a singular hierarchy.
Every hierarchy has its own universe. Handles to different universes can have the same numerical value, but they will point to different objects.
Object o
Sprite s
Point p
o == p // won't compile, different hierarchies
byte(o) == byte(p) // will compile, but it's quite meaningless
o == s // will compile
o = s // will compile, Sprite is a subclass of Object
s = o // won't compile, Object is not a subclass of Sprite
s = Sprite(o) // will compile, but may cause problems later if o doesn't point to a sprite
If a pool contains objects of a non-leaf class, then it needs to account for fields of all possible objects:
pool Object objects [5]
// creates arrays called Object$position and Object$colour
pool Background backgrounds [5]
// extends the array Object$colour by 5 elements, but not the Object$position array
The compiler tries to order the pools in the universe so that the field arrays are as short as possible. Note that in the example above, the compiler may decide that backgrounds
contains objects with handles 1–5 and objects
with 6–10, meaning that x->colour
will compile to Object$colour[byte(x)-6]
.
(An optimal pool ordering algorithm is yet to be determined.)
Type identifiers
Each object may have a type identifier. A type identifier is an 8-bit number that identifies the actual runtime class of the object. The value 0 is reserved for uninitialized objects and means that they can be recycled. Abstract classes don't have type identifiers.
From the implementation perspective, it is implemented as a field called typeid
in all objects of the hierarchy. However, if the program doesn't need type identifiers, then the hierarchy will not have the typeid
field.
(The type of the typeid
field is going to be either byte
, or a hierarchy-specific opaque 8-bit type)
There are two types of type identifiers: compact identifiers and fast identifiers.
Compact identifiers are required if:
-
the program is using dynamic dispatch for this hierarchy
-
a class in this hierarchy has its type identifier overridden by the programmer:
class Thing @1 { }
Fast identifiers are required if:
- the program is using runtime type checks for this hierarchy
Any identifiers are required if:
-
the program is using the
typeid
field explicitly for this hierarchy -
the program is using object allocation mechanism for this hierarchy(described later)
Note that a non-singular hierarchy doesn't necessarily need identifiers.
If the hierarchy needs any kind of identifiers, but doesn't need particularly either compact or fast identifiers, then compact identifiers are chosen. If the hierarchy needs both compact and fast identifiers, then the typeid
field contains compact identifiers and a translation table is created that translates them to fast identifiers during type checks.
A hierarchy that uses identifiers can contain maximum of 255 concrete classes. Fast identifiers are more limiting, so they may fail to compile at even lower class counts. This is due to the fact that compact identifiers are consecutive (unless overridden) byte values, starting from 1, while fast identifiers are bitmasks (see: Near Optimal Hierarchical Encoding of Types by A. Krall, J. Vitek & R.N. Horspool, 1997)
A runtime typecheck if x
points to an object of class T
(abstract or concrete) in hierarchy H
may be implemented as follows:
-
if
typeid
is a fast identifier:
H$typeid[byte(x)-1] & T$fasttypeid == T$fasttypeid
-
if
typeid
is a compact identifier:
H$typeid$xlat[H$typeid[byte(x)-1]] & T$fasttypeid == T$fasttypeid
The syntax for such check is yet to be determined. For now, let's assume the syntax is <type>.is(<handle>)
, where <handle>
is a handle within the <type>
's hierarchy.
A runtime typecheck for whether a handle is null
or points to an object of a given class is also available. If possible, it should be achieved using the same code with an extra check. An example on how it might be implemented on 6502 for a class T
in hierarchy H
:
T.isNullOr: // asm clear_zero T.isNullOr(byte a)
TAY
BEQ __exit
T.is: // asm clear_zero T.is(byte y)
LDA H$typeid, Y
// if the typeid is compact instead of fast:
// TAY
// LDA H$typeid$xlat, Y
AND #T$fasttypeid
EOR #T$fasttypeid
__exit:
RTS
Manual object allocation
To mark all objects in a pool as uninitialized, you can call a function called <poolname>.clear
:
pool Object objects [7]
objects.clear()
// this calls Object$clear(objects$offset, objects.size)
To mark all objects in a hierarchy as uninitalized, you can call a function called <hierarchyname>.clear
:
Object.clear()
// this calls Object$clear(1, Object$count)
To allocate a new object in a pool, call <poolname>.new(<typename>)
. The given type must be a subtype of the pool element type. One uninitialized object in the pool will be given the correct type identifier and its handle will be returned. If the pool is full, then null
is returned.
objects.new(Sprite)
// this calls Sprite(Object$new(objects$offset, objects.size, Sprite.typeid))
To delete a new object in a pool, call <hierarchyname>.delete(<handle>)
. The handle cannot be null.
Sprite s
s = objects.new(Sprite)
if s != null {
// do things with s
Object.delete(s)
}
To check if a handle refers to a valid object (as opposed to an uninitialised one), check the value of the typeid
field.
TODO: decide whether ...$clear
and ...$new
should take start & size or start & end
Virtual method dispatch
Classes can have methods:
abstract class Recolourable(HasColour) {
void change_colour(self, byte new_colour) {
self->colour = new_colour
}
}
This defines a function void Recolourable.change_colour(Recolourable Recolourable.change_colour$self, byte Recolourable.change_colour$p1)
which can be called statically and directly and will by treated as a normal function. Unlike a normal function however, methods have a different calling convention:
-
all parameters except for the
self
parameter are guaranteed to be passed via global variables -
self
may be passed in a different way than a parameter to a normal function even if there are no other parameters to the method
The self
parameter is obligatory.
Within an abstract class, a method may be declared abstract:
void method(self) abstract
A subclass is allowed to override a method. Within a hierarchy, all methods with a given name have to have the same return type and use the same parameter types (except for self
, which is always of type of the class the method was declared in).
If you call the method via h->change_colour(c)
where h
is of static type H
, then the compiler will rewrite it to an appropriate call:
-
if
H
contains a non-abstract definition ofchange_colour
and none of its descendants does, the call will be rewritten asH.change_colour(h, c)
-
if neither the static type of
h
nor its ancestors contains a definition (abstract or not) ofchange_colour
, a compile-time error occurs ("Method change_colour not defined in H" or similar) -
if neither
H
doesn't contain a non-abstract definition ofchange_colour
nor the narrowest common subclass of ancestors ofH
which contain a non-abstract definition ofchange_colour
do, then a compile-time error occurs ("Ambiguous method change_colour" or similar). This may happen even without calling the method, at the time of class hierarchy analysis. -
otherwise, a call to
HH.change_colour$dynamic(h, c)
is made, whereHH
is the hierarchy ofH
The ...$dynamic
function does a return dispatch on the compact type identifier of the object pointed to by the handle. It has an identical calling convention as methods and it can therefore ignore the parameters, including the handle itself, and just pass them unchanged to the implementations.
The actual dynamic choice of the implementation of a called method is similar to the static choice, just precomputed into a jumptable.
Object copying
To check, if a pool can contain a copy of given object, call <poolname>.can_contain(<handle>)
. The handle has to be in the same hierarchy.
objects.can_contain(h)
// calls Sprite.isOrNull(h)
Calling <poolname>.can_contain(<typename>)
also works and yields a compile-time constant.
Copy is implemented as automatically generated method:
h->copy_into(objects[0])
// alternative potential syntax:
objects[0] := h
This method should use a different calling convention than other methods to ensure that it is reentrant. For example on 6502, it can use the X and Y registers for each handle.
If the source handle points to an uninitialized object, then unlike all the other methods, copy_into
doesn't cause undefined behaviour, but simply does nothing. In contrast, copying involving a null handle causes undefined behaviour.
If not compiling for speed, the compiler may generate only one copy method implementation per field combination used by pools. If the pool of the source handle can be determined statically, then such method is used:
h := objects[2]
// compiles to
H$copy0(objects$offset + 2, h)
If not, then another method is called that checks statically for the field combination of a given pool and uses a jumptable to pick the implementation.
If compiling for speed, the compiler may generate normal virtual methods for every class with a new field, just with a different calling convention.
Garbage collection
If the program is using garbage collection for a given hierarchy, then all the objects in that hierarchy will be given another field $marked_for_gc
(for space saving reasons, this field may be a bitmask if compiling without -Of
) and a method void $mark_for_gc(self)
, with custom implementation for every class that introduces a new field with a handle to an object in the same hierarchy. But if no class does that, then $mark_for_gc
is not generated.
The garbage collector is a simple mark-and-sweep.
Let's assume your hierarchy is called H
. First, you need to unmark all roots:
H.unmark_all()
Then you need to mark the roots, i.e. the objects that should not be collected:
H.mark(x)
This sets the $marked_for_gc
for the given object and if it wasn't marked before, calls x->$mark_for_gc()
on it. The $mark_for_gc
method calles H.mark
on all non-null fields of x
that refer to objects within this hierarchy. If the handle points to an uninitialized object, then unlike all the other methods, $mark_for_gc
doesn't cause undefined behaviour, but simply does nothing.
You can mark all roots within a pool:
pool H important_things [10]
important_things.mark_all()
After you mark all your roots, you can just call H.sweep()
to do the actual collection. All unmarked objects have their typeid
changed to 0 and become therefore uninitialized. The marked status of objects after this process is undefined. Calling H.sweep
without unmarking and marking roots again leads to undefined behaviour.