mirror of
				https://github.com/irmen/prog8.git
				synced 2025-11-04 10:16:13 +00:00 
			
		
		
		
	it was still there for backward compatibility reasons with really old prog8 code. If you need a word array to be not split, just use @nosplit on the array.
		
			
				
	
	
		
			264 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			264 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
.. _pointers:
 | 
						|
 | 
						|
********************
 | 
						|
Structs and Pointers
 | 
						|
********************
 | 
						|
 | 
						|
.. attention::
 | 
						|
    The 6502 cpu lacks some features (addressing modes, registers) to make pointers work efficiently.
 | 
						|
    Also it requires that pointer variables have to be in zero page, or copied to a temporary zero page variable,
 | 
						|
    before they can even be used as a pointer. This means that pointer operations in prog8 compile
 | 
						|
    to rather large and inefficient assembly code most of the time, when compared to direct array access or regular variables.
 | 
						|
    At least try to place heavily used pointer variables in zero page using ``@requirezp`` on their declaration,
 | 
						|
    if zero page space allows.
 | 
						|
    Pointer variables that don't have a zeropage tag specified will be treated as having ``@zp`` so they get
 | 
						|
    priority over other variables to be placed into zeropage.
 | 
						|
 | 
						|
.. note::
 | 
						|
    Due to a few limitations in the language parser, some pointer related syntax is currently unsupported.
 | 
						|
    The compiler tries its best to give a descriptive error message but sometimes there is still a
 | 
						|
    parser limitation that has to be worked around at the moment. For example, this assigment syntax doesn't parse correctly::
 | 
						|
 | 
						|
        ^^Node  np
 | 
						|
        np[2].field = 9999          ; cannot use this syntax as assignment target right now
 | 
						|
        ubyte value = np[2].field   ; note that using it as expression value works fine
 | 
						|
 | 
						|
    To work around this you'll have to explicitly write the pointer dereferencing operator,
 | 
						|
    or break up the expression in multiple steps (which can be beneficial too when you are assigning multiple fields
 | 
						|
    because it will save a pointer calculation for every assignment)::
 | 
						|
 | 
						|
        ^^Node  np
 | 
						|
        np[2]^^.field = 9999
 | 
						|
 | 
						|
        ; alternatively, split up:
 | 
						|
        ^^Node thenode = &&np[2]
 | 
						|
        thenode.field = 9999
 | 
						|
 | 
						|
 | 
						|
Legacy untyped pointers (uword)
 | 
						|
-------------------------------
 | 
						|
 | 
						|
Prior to version 12 of the language, the only pointer type available was a plain ``uword`` value (the memory address)
 | 
						|
which could be used as a pointer to an ``ubyte`` (the byte value at that memory address).
 | 
						|
Array indexing on an ``uword`` simply means to point to the ``ubyte`` at the location of the address + index value.
 | 
						|
 | 
						|
When the address of a value (explicitly) or a value of a reference type (string, array) was passed as an argument to a subroutine call,
 | 
						|
it became one of these plain ``uword`` 'pointers'. The subroutine receiving it always had to interpret the 'pointer'
 | 
						|
explicitly for what it actually pointed to, if that wasn't a simple byte.
 | 
						|
 | 
						|
Some implicit conversions were allowed too (such as putting ``str`` as the type of a subroutine parameter,
 | 
						|
which would be changed to ``uword`` by the compiler).
 | 
						|
 | 
						|
Since Prog8 version 12 there now are *typed pointers* that better express the intent and tell the compiler how to use the pointer;
 | 
						|
these are explained below.
 | 
						|
 | 
						|
*For backward compatibility reasons, this untyped uword pointer still exists in the language.*
 | 
						|
You can assign any other pointer type to an untyped pointer variable (uword) without the need for an explicit cast.
 | 
						|
You can assign an untyped pointer (uword) to a typed pointer variable without the need for an explicit cast.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Typed pointer to simple datatype
 | 
						|
--------------------------------
 | 
						|
 | 
						|
Prog8 syntax has the 'double hat' token ``^^`` that appears either in front of a type ("pointer to this type") or
 | 
						|
after a pointer variable ("get the value it points to" - a pointer dereference).
 | 
						|
 | 
						|
So the syntax for declaring typed pointers looks like this:
 | 
						|
 | 
						|
``^^type``: pointer to a type
 | 
						|
    You can declare a pointer to any numeric datatype (bytes, words, longs, floats), and booleans, and struct types.
 | 
						|
    Structs are explained in the next section.
 | 
						|
    So for example; ``^^float fptr`` declares fptr as a pointer to a float value.
 | 
						|
 | 
						|
``^^type[size]``: array with size size containing pointers to a type.
 | 
						|
    So for example; ``^^word[100] values`` declares values to be an array of 100 pointers to words.
 | 
						|
    Note that an array of pointers (regardless of the type they point to) is always a split word array.
 | 
						|
    (this is the most efficient way to access the pointers, and they need to be copied to zeropage first to
 | 
						|
    be able to use them anyway. It also allows for arrays of up to 256 pointers instead of 128.)
 | 
						|
 | 
						|
It is not possible to define pointers to *arrays*; ``^^(type[])`` is invalid syntax.
 | 
						|
 | 
						|
Pointers of different types cannot be assigned to one another, unless you use an explicit cast.
 | 
						|
This rule is not enforced for untyped pointers/regular uword, as described earlier.
 | 
						|
 | 
						|
The ``str`` type in subroutine parameters and return values has always been a bit weird in the sense that in these cases,
 | 
						|
the string is actually passed by reference (it's address pointer is passed) instead of a ``str`` variable that is accessed by value.
 | 
						|
In previous Prog8 versions these were untyped uword pointers, but since version 12, these are now translated as ``^^ubyte``.
 | 
						|
Resulting assembly code should be equivalent still.
 | 
						|
 | 
						|
.. note::
 | 
						|
    **Pointers to subroutines:**
 | 
						|
    While Prog8 allows you to take the address of a subroutine, it has no support yet for typed function pointers.
 | 
						|
    Calling a routine through a pointer with ``goto``, ``call()`` and such, only works with the raw uword address for now.
 | 
						|
 | 
						|
 | 
						|
Dereferencing a pointer, pointer arithmetic
 | 
						|
-------------------------------------------
 | 
						|
 | 
						|
To get the value the pointer points at, you *dereference* the pointer. The syntax for that is: ``pointer^^``.
 | 
						|
Say the pointer variable is of type ``^^float``, then ``pointer^^`` will return the float value it points at.
 | 
						|
 | 
						|
You can also use array indexing syntax to get the n-th value. For example: ``floatpointer[3]`` will return the
 | 
						|
fourth floating point value in the sequence that the floatpointer points at. Because the pointer is a typed pointer,
 | 
						|
the compiler knows what the size of the value is that it points at and correctly skips forward the required number of bytes in memory.
 | 
						|
In this case, say a float takes 5 bytes, then ``floatpointer[3]`` will return the float value stored at memory address floatpointer+15.
 | 
						|
Notice that ``floatpointer[0]`` is equivalent to ``floatpointer^^``.
 | 
						|
 | 
						|
You can add and subtract values from a pointer, this is called **pointer arithmetic**.
 | 
						|
For example, to advance a pointer to the next value, you can use ``pointer++``.
 | 
						|
To make it point to the preceding value, you can use ``pointer--``.
 | 
						|
**Adding or subtracting X to a pointer will change the pointer by X times the size of the value it points at (the same as the C language does it),
 | 
						|
instead of simply adding or subtracting the value from the pointer address value.**
 | 
						|
(that is what Prog8 still does for untyped uword pointers, or pointers to a type that just takes up a single byte of memory).
 | 
						|
 | 
						|
That special pointer arithmetic is also performed for pointers to struct types:
 | 
						|
the compiler knows the memory storage size of the whole struct type and advances or rewinds
 | 
						|
the pointer value (memory address) by the appropriate number of bytes (X times the size of the struct). More info about structs can be found below.
 | 
						|
 | 
						|
 | 
						|
Structs
 | 
						|
-------
 | 
						|
 | 
						|
A struct is a grouping of multiple variables. Say your game is going to track several enemy sprites on the screen,
 | 
						|
in which case it may be useful to describe the various properties of an enemy together in a struct type, rather than
 | 
						|
dealing with all of them separately.  You first define the struct type like so::
 | 
						|
 | 
						|
    struct Enemy {
 | 
						|
        ubyte xpos, ypos
 | 
						|
        uword health
 | 
						|
        bool elite
 | 
						|
    }
 | 
						|
 | 
						|
You can use boolean fields, numeric fields (byte, word, long, float), and pointer fields (including str, which is translated into ^^ubyte).
 | 
						|
You cannot nest struct types nor put arrays in them as a field.
 | 
						|
Fields in a struct are 'packed' (meaning the values are placed back-to-back in memory), and placed in memory in order of declaration. This guarantees exact size and place of the fields.
 | 
						|
``sizeof()`` knows how to calculate the combined size of a struct, and ``offsetof()`` can be used to get the byte offset of a given field in the struct.
 | 
						|
The size of a struct cannot exceed 1 memory page (256 bytes).
 | 
						|
 | 
						|
You can copy the whole contents of a struct to another one by assigning the dereferenced pointers::
 | 
						|
 | 
						|
    ^^Enemy e1,e2
 | 
						|
    e1^^ = e2^^     ; copies all fields of e2 into e1
 | 
						|
 | 
						|
 | 
						|
The struct type creates a new name scape, so accessing the fields of a struct is done as usual with the dotted notation.
 | 
						|
Because it implies pointer dereferencing you can usually omit the explicit `^^`, prog8 will know what it means::
 | 
						|
 | 
						|
    if e1.ypos > 300
 | 
						|
        e1.health -= 10
 | 
						|
 | 
						|
    ; explicit dereferencing notation:
 | 
						|
 | 
						|
    if e1^^.ypos > 300
 | 
						|
        e1^^.health -= 10
 | 
						|
 | 
						|
 | 
						|
.. note::
 | 
						|
    Structs are currently only supported as a *reference type* (they always have to be accessed through a pointer).
 | 
						|
    It is not yet possible to use them as a value type, or as memory-mapped types.
 | 
						|
    This means you cannot create an array of structs either - only arrays of pointers to structs.
 | 
						|
    There are a couple of simple case where the compiler does allow assignment of struct instances though, and it will
 | 
						|
    automatically copy all the fields for you. You are allowed to write::
 | 
						|
 | 
						|
        ptr2^^ = ptr1^^
 | 
						|
        ptr2^^ = ptr1[2]
 | 
						|
        ptr2[2] = ptr1^^
 | 
						|
 | 
						|
    The compiler replaces this with a memory copy if these are pointers to a struct.
 | 
						|
    In the future more cases may be supported.
 | 
						|
 | 
						|
.. note::
 | 
						|
    Using structs instead of plain arrays usually results in more and less efficent code being generated.
 | 
						|
    This is because the 6502 CPU is not particularly well equipped to dealing with pointers and accessing struct fields via offsets,
 | 
						|
    as compared to direct variable access or array indexing. The prog8 program code may be easier to work with though!
 | 
						|
 | 
						|
.. note::
 | 
						|
    Accessing the first field in a struct is more efficient than subsequent fields, because it
 | 
						|
    is at offset 0 so no additional addition has to be computed on a pointer to reach the first field.
 | 
						|
    Try to put the most often accessed field as the first field to potentially gain a rather substantial boost in code efficiency.
 | 
						|
 | 
						|
 | 
						|
Static initialization of structs
 | 
						|
================================
 | 
						|
 | 
						|
You can 'allocate' and statically initialize a struct. This behave much like initializing arrays does,
 | 
						|
and it won't reset to the original value when the program is restarted, so beware.
 | 
						|
*Remember that the struct is statically allocated, and appears just once in the memory:*
 | 
						|
This means that, for instance, if you do this in a subroutine that gets
 | 
						|
called multiple times, or inside a loop, the struct *will be the same instance every time*.
 | 
						|
Read below if you need *dynamic* struct allocation!
 | 
						|
You write a static struct initialization expression like this:
 | 
						|
 | 
						|
``^^Node : [1,"one", 1000, true, 1.111]``
 | 
						|
    statically places an instance of struct 'Node' in memory, with its fields set to 1, "one", 1000 etcetera and returns the address of this struct.
 | 
						|
    The values in the initialization array must correspond exactly with the first to last declared fields in the struct type.
 | 
						|
``^^Node : []``
 | 
						|
    (without values) Places a 'Node' instance in BSS variable space instead, which gets zeroed out at program startup.
 | 
						|
    Returns the address of this empty struct.
 | 
						|
 | 
						|
It is also possible to put struct initializer inside arrays to make them all statically initialized and accessible via the array::
 | 
						|
 | 
						|
    ^^Node[] allnodes = [
 | 
						|
        ^^Node: [1,"one", 1000, true, 1.111],
 | 
						|
        ^^Node: [2,"two", 2000, false, 2.222],
 | 
						|
        ^^Node: [],
 | 
						|
        ^^Node: [],
 | 
						|
    ]
 | 
						|
 | 
						|
Short form initializers
 | 
						|
^^^^^^^^^^^^^^^^^^^^^^^
 | 
						|
 | 
						|
If the required type can be inferred from the context you can also omit the struct pointer type prefix altogether.
 | 
						|
The initializer value then is syntactically the same as an array, but Prog8 internally turns it back into a proper
 | 
						|
struct initializer value based on the the type of the array element or pointer variable it is assigned to.
 | 
						|
So you can write the above in short form as::
 | 
						|
 | 
						|
    ^^Node nodepointer = [1,2,3,4]
 | 
						|
 | 
						|
    ^^Node[] allnodes = [
 | 
						|
        [1,"one", 1000, true, 1.111],
 | 
						|
        [2,"two", 2000, false, 2.222],
 | 
						|
        [],
 | 
						|
        []
 | 
						|
    ]
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Dynamic allocation of structs
 | 
						|
=============================
 | 
						|
 | 
						|
There is no real 'dynamic' memory allocation in Prog8. Everything is statically allocated. This doesn't change with struct types.
 | 
						|
However, it is possible to write a dynamic memory handling library yourself (it has to track memory blocks manually).
 | 
						|
If you ask such a library to give you a pointer to a piece of memory with size ``sizeof(Enemy)`` you can use that as
 | 
						|
a dynamic pointer to an Enemy struct.
 | 
						|
 | 
						|
An example of how a super simple dynamic allocator could look like::
 | 
						|
 | 
						|
    ^^Node newnode = allocator.alloc(sizeof(Node))
 | 
						|
    ...
 | 
						|
 | 
						|
    allocator {
 | 
						|
        ; extremely trivial arena allocator
 | 
						|
        uword buffer = memory("arena", 2000, 0)
 | 
						|
        uword next = buffer
 | 
						|
 | 
						|
        sub alloc(ubyte size) -> uword {
 | 
						|
            defer next += size
 | 
						|
            return next
 | 
						|
        }
 | 
						|
 | 
						|
        sub freeall() {
 | 
						|
            ; cannot free individual allocations only the whole arena at once
 | 
						|
            next = buffer
 | 
						|
        }
 | 
						|
    }
 | 
						|
 | 
						|
 | 
						|
Address-Of: untyped vs typed
 | 
						|
----------------------------
 | 
						|
 | 
						|
``&`` still returns an untyped (uword) pointer, as it did in older Prog8 versions. This is for backward compatibility reasons so existing programs don't break.
 | 
						|
The new *double ampersand* operator ``&&`` returns a *typed* pointer to the value. The semantics are slightly different from the old untyped address-of operator, because adding or subtracting
 | 
						|
a number from a typed pointer uses *pointer arithmetic* that takes the size of the value that it points to into account.
 |