mirror of
				https://github.com/irmen/prog8.git
				synced 2025-10-31 15:16:13 +00:00 
			
		
		
		
		
			
				
	
	
		
| .. _pointers:
 | |
| 
 | |
| ********************
 | |
| Structs and Pointers
 | |
| ********************
 | |
| 
 | |
| .. attention::
 | |
|     The 6502 cpu lacks some features (addressing modes, registers) to make pointers work efficiently.
 | |
|     It also requires pointer variables to be in zero page, or to be copied to a temporary zero page variable,
 | |
|     before they can be used as a pointer. This means that pointer operations in prog8 usually compile
 | |
|     to rather large and inefficient assembly code compared to direct array access or regular variables.
 | |
|     At least try to place heavily used pointer variables in zero page using ``@requirezp`` on their declaration,
 | |
|     if zero page space allows.
 | |
|     Pointer variables that don't have a zeropage tag specified will be treated as having ``@zp`` so they get
 | |
|     priority over other variables to be placed into zeropage.
 | |
| 
 | |
| .. note::
 | |
|     Due to a few limitations in the language parser, some pointer related syntax is currently unsupported.
 | |
|     The compiler tries its best to give a descriptive error message but sometimes there is still a
 | |
|     parser limitation that has to be worked around at the moment. For example, this assignment syntax doesn't parse correctly::
 | |
| 
 | |
|         ^^Node  np
 | |
|         np[2].field = 9999          ; cannot use this syntax as assignment target right now
 | |
|         ubyte value = np[2].field   ; note that using it as expression value works fine
 | |
| 
 | |
|     To work around this you'll have to explicitly write the pointer dereferencing operator,
 | |
|     or break up the expression in multiple steps (which can be beneficial too when you are assigning multiple fields
 | |
|     because it will save a pointer calculation for every assignment)::
 | |
| 
 | |
|         ^^Node  np
 | |
|         np[2]^^.field = 9999
 | |
| 
 | |
|         ; alternatively, split up:
 | |
|         ^^Node thenode = &&np[2]
 | |
|         thenode.field = 9999
 | |
| 
 | |
| 
 | |
| Legacy untyped pointers (uword)
 | |
| -------------------------------
 | |
| 
 | |
| Prior to version 12 of the language, the only pointer type available was a plain ``uword`` value (the memory address)
 | |
| which could be used as a pointer to a ``ubyte`` (the byte value at that memory address).
 | |
| Array indexing on a ``uword`` simply means to point to the ``ubyte`` at the location of the address + index value.
 | |
| 
 | |
| When the address of a value (explicitly) or a value of a reference type (string, array) was passed as an argument to a subroutine call,
 | |
| it became one of these plain ``uword`` 'pointers'. The subroutine receiving it always had to interpret the 'pointer'
 | |
| explicitly for what it actually pointed to, if that wasn't a simple byte.
 | |
| 
 | |
| Some implicit conversions were allowed too (such as putting ``str`` as the type of a subroutine parameter,
 | |
| which would be changed to ``uword`` by the compiler).
 | |
| 
 | |
| Since Prog8 version 12 there are *typed pointers* that better express the intent and tell the compiler how to use the pointer;
 | |
| these are explained below.
 | |
| 
 | |
| *For backward compatibility reasons, this untyped uword pointer still exists in the language.*
 | |
| You can assign any other pointer type to an untyped pointer variable (uword) without the need for an explicit cast.
 | |
| You can assign an untyped pointer (uword) to a typed pointer variable without the need for an explicit cast.
 | |
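| 
 | |
| As a minimal sketch of these two rules (the variable names and the address ``$4000`` are made up for illustration)::
 | |
| 
 | |
|     uword rawptr = $4000        ; legacy untyped pointer: just a memory address
 | |
|     ^^word wptr = rawptr        ; untyped to typed: no cast needed
 | |
|     rawptr = wptr               ; typed to untyped: no cast needed either
 | |
|     ubyte value = rawptr[5]     ; indexing a uword reads the byte at address $4005
 | |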
| 
 | |
| 
 | |
| 
 | |
| Typed pointer to simple datatype
 | |
| --------------------------------
 | |
| 
 | |
| Prog8 syntax has the 'double hat' token ``^^`` that appears either in front of a type ("pointer to this type") or
 | |
| after a pointer variable ("get the value it points to" - a pointer dereference).
 | |
| 
 | |
| So the syntax for declaring typed pointers looks like this:
 | |
| 
 | |
| ``^^type``: pointer to a type
 | |
|     You can declare a pointer to any numeric datatype (bytes, words, longs, floats), and booleans, and struct types.
 | |
|     Structs are explained in the next section.
 | |
|     For example, ``^^float fptr`` declares fptr as a pointer to a float value.
 | |
| 
 | |
| ``^^type[size]``: array of ``size`` pointers to a type
 | |
|     For example, ``^^word[100] values`` declares values to be an array of 100 pointers to words.
 | |
|     Note that an array of pointers (regardless of the type they point to) is always a split word array.
 | |
|     This is the most efficient way to access the pointers, and they need to be copied to zeropage first to
 | |
|     be able to use them anyway. It also allows for arrays of up to 256 pointers instead of 128.
 | |
| 
 | |
| It is not possible to define pointers to *arrays*; ``^^(type[])`` is invalid syntax.
 | |
| 
 | |
| Pointers of different types cannot be assigned to one another, unless you use an explicit cast.
 | |
| This rule is not enforced for untyped pointers/regular uword, as described earlier.
 | |
| 
 | |
| The ``str`` type in subroutine parameters and return values has always been a bit weird in the sense that in these cases,
 | |
| the string is actually passed by reference (its address is passed) instead of as a ``str`` variable that is accessed by value.
 | |
| In previous Prog8 versions these were untyped uword pointers, but since version 12 they are translated to ``^^ubyte``.
 | |
| The resulting assembly code should still be equivalent.
 | |
| 
 | |
| .. note::
 | |
|     **Pointers to subroutines:**
 | |
|     While Prog8 allows you to take the address of a subroutine, it has no support yet for typed function pointers.
 | |
|     Calling a routine through a pointer with ``goto``, ``call()`` and such, only works with the raw uword address for now.
 | |
| 
 | |
| 
 | |
| Dereferencing a pointer, pointer arithmetic
 | |
| -------------------------------------------
 | |
| 
 | |
| To get the value the pointer points at, you *dereference* the pointer. The syntax for that is: ``pointer^^``.
 | |
| Say the pointer variable is of type ``^^float``, then ``pointer^^`` will return the float value it points at.
 | |
| 
 | |
| You can also use array indexing syntax to get the n-th value. For example: ``floatpointer[3]`` will return the
 | |
| fourth floating point value in the sequence that the floatpointer points at. Because the pointer is a typed pointer,
 | |
| the compiler knows what the size of the value is that it points at and correctly skips forward the required number of bytes in memory.
 | |
| In this case, say a float takes 5 bytes, then ``floatpointer[3]`` will return the float value stored at memory address floatpointer+15.
 | |
| Notice that ``floatpointer[0]`` is equivalent to ``floatpointer^^``.
 | |
| 
 | |
| You can add values to and subtract values from a pointer; this is called **pointer arithmetic**.
 | |
| For example, to advance a pointer to the next value, you can use ``pointer++``.
 | |
| To make it point to the preceding value, you can use ``pointer--``.
 | |
| **Adding X to or subtracting X from a pointer changes the pointer by X times the size of the value it points at (the same as the C language does it),
 | |
| instead of simply adding or subtracting X from the pointer address value.**
 | |
| (Plain address arithmetic is what Prog8 still does for untyped uword pointers. For pointers to a type that takes up a single byte of memory, both give the same result.)
 | |
| 
 | |
| That special pointer arithmetic is also performed for pointers to struct types:
 | |
| the compiler knows the memory storage size of the whole struct type and advances or rewinds
 | |
| the pointer value (memory address) by the appropriate number of bytes (X times the size of the struct). More info about structs can be found below.
 | |
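| 
 | |
| The following sketch puts dereferencing, indexing and pointer arithmetic together. It assumes a float takes 5 bytes,
 | |
| as in the example above; the names ``values`` and ``fp`` are made up for illustration::
 | |
| 
 | |
|     float[] values = [1.1, 2.2, 3.3, 4.4]
 | |
|     ^^float fp = &values        ; & gives an untyped uword address, which can be assigned to a typed pointer
 | |
| 
 | |
|     float a = fp^^              ; 1.1, the value fp points at
 | |
|     float b = fp[0]             ; 1.1, equivalent to fp^^
 | |
|     float c = fp[3]             ; 4.4, read from address fp+15
 | |
| 
 | |
|     fp++                        ; address increases by 5: fp now points at 2.2
 | |
|     fp += 2                     ; address increases by 10: fp now points at 4.4
 | |
|     fp--                        ; address decreases by 5: fp now points at 3.3
 | |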
| 
 | |
| 
 | |
| Structs
 | |
| -------
 | |
| 
 | |
| A struct is a grouping of multiple variables. Say your game is going to track several enemy sprites on the screen,
 | |
| in which case it may be useful to describe the various properties of an enemy together in a struct type, rather than
 | |
| dealing with all of them separately.  You first define the struct type like so::
 | |
| 
 | |
|     struct Enemy {
 | |
|         ubyte xpos, ypos
 | |
|         uword health
 | |
|         bool elite
 | |
|     }
 | |
| 
 | |
| You can use boolean fields, numeric fields (byte, word, long, float), and pointer fields (including str, which is translated into ^^ubyte).
 | |
| You cannot nest struct types nor put arrays in them as a field.
 | |
| Fields in a struct are 'packed' (meaning the values are placed back-to-back in memory), and placed in memory in order of declaration. This guarantees exact size and place of the fields.
 | |
| ``sizeof()`` knows how to calculate the combined size of a struct, and ``offsetof()`` can be used to get the byte offset of a given field in the struct.
 | |
| The size of a struct cannot exceed 1 memory page (256 bytes).
 | |
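| 
 | |
| For the ``Enemy`` struct above this works out as follows. This is a sketch that assumes ``offsetof()`` takes the
 | |
| field written as ``Enemy.health``; the variable names are made up for illustration::
 | |
| 
 | |
|     ; packed layout of Enemy, in order of declaration:
 | |
|     ;   xpos    ubyte   offset 0
 | |
|     ;   ypos    ubyte   offset 1
 | |
|     ;   health  uword   offset 2 (2 bytes)
 | |
|     ;   elite   bool    offset 4
 | |
| 
 | |
|     ubyte enemysize = sizeof(Enemy)                 ; 5
 | |
|     ubyte healthoffset = offsetof(Enemy.health)     ; 2
 | |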
| 
 | |
| You can copy the whole contents of a struct to another one by assigning the dereferenced pointers::
 | |
| 
 | |
|     ^^Enemy e1,e2
 | |
|     e1^^ = e2^^     ; copies all fields of e2 into e1
 | |
| 
 | |
| 
 | |
| The struct type creates a new namespace, so accessing the fields of a struct is done as usual with the dotted notation.
 | |
| Because it implies pointer dereferencing you can usually omit the explicit ``^^``; prog8 will know what it means::
 | |
| 
 | |
|     if e1.ypos > 200
 | |
|         e1.health -= 10
 | |
| 
 | |
|     ; explicit dereferencing notation:
 | |
| 
 | |
|     if e1^^.ypos > 200
 | |
|         e1^^.health -= 10
 | |
| 
 | |
| 
 | |
| .. note::
 | |
|     Structs are currently only supported as a *reference type* (they always have to be accessed through a pointer).
 | |
|     It is not yet possible to use them as a value type, or as memory-mapped types.
 | |
|     This means you cannot create an array of structs either - only arrays of pointers to structs.
 | |
|     There are a couple of simple cases where the compiler does allow assignment of struct instances though, and it will
 | |
|     automatically copy all the fields for you. You are allowed to write::
 | |
| 
 | |
|         ptr2^^ = ptr1^^
 | |
|         ptr2^^ = ptr1[2]
 | |
|         ptr2[2] = ptr1^^
 | |
| 
 | |
|     The compiler replaces this with a memory copy if these are pointers to a struct.
 | |
|     In the future more cases may be supported.
 | |
| 
 | |
| .. note::
 | |
|     Using structs instead of plain arrays usually results in larger and less efficient code being generated.
 | |
|     This is because the 6502 CPU is not particularly well equipped to deal with pointers and accessing struct fields via offsets,
 | |
|     as compared to direct variable access or array indexing. The prog8 program code may be easier to work with though!
 | |
| 
 | |
| .. note::
 | |
|     Accessing the first field in a struct is more efficient than subsequent fields, because it
 | |
|     is at offset 0 so no additional addition has to be computed on a pointer to reach the first field.
 | |
|     Try to put the most often accessed field as the first field to potentially gain a rather substantial boost in code efficiency.
 | |
| 
 | |
| 
 | |
| Static initialization of structs
 | |
| ================================
 | |
| 
 | |
| You can 'allocate' and statically initialize a struct. This behaves much like initializing arrays does,
 | |
| and it won't reset to the original value when the program is restarted, so beware.
 | |
| *Remember that the struct is statically allocated, and appears just once in the memory:*
 | |
| This means that, for instance, if you do this in a subroutine that gets
 | |
| called multiple times, or inside a loop, the struct *will be the same instance every time*.
 | |
| Read below if you need *dynamic* struct allocation!
 | |
| You write a static struct initialization expression like this:
 | |
| 
 | |
| ``^^Node : [1,"one", 1000, true, 1.111]``
 | |
|     statically places an instance of struct 'Node' in memory, with its fields set to 1, "one", 1000 etcetera and returns the address of this struct.
 | |
|     The values in the initialization array must correspond exactly with the first to last declared fields in the struct type.
 | |
| ``^^Node : []``
 | |
|     (without values) Places a 'Node' instance in BSS variable space instead, which gets zeroed out at program startup.
 | |
|     Returns the address of this empty struct.
 | |
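| 
 | |
| A sketch of how such an initializer could be used. The ``Node`` struct is not defined here, so the field
 | |
| names and types below are assumptions chosen to match the example values::
 | |
| 
 | |
|     struct Node {
 | |
|         ubyte id
 | |
|         str name            ; translated into ^^ubyte
 | |
|         uword value
 | |
|         bool flag
 | |
|         float weight
 | |
|     }
 | |
| 
 | |
|     ^^Node first = ^^Node : [1, "one", 1000, true, 1.111]      ; initialized instance
 | |
|     ^^Node second = ^^Node : []                                 ; zeroed instance in BSS
 | |
| 
 | |
|     second.id = 2           ; fields are accessed through the returned pointer
 | |
|     second.value = first.value * 2
 | |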
| 
 | |
| It is also possible to put struct initializers inside arrays to make them all statically initialized and accessible via the array::
 | |
| 
 | |
|     ^^Node[] allnodes = [
 | |
|         ^^Node: [1,"one", 1000, true, 1.111],
 | |
|         ^^Node: [2,"two", 2000, false, 2.222],
 | |
|         ^^Node: [],
 | |
|         ^^Node: [],
 | |
|     ]
 | |
| 
 | |
| Short form initializers
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| If the required type can be inferred from the context you can also omit the struct pointer type prefix altogether.
 | |
| The initializer value then is syntactically the same as an array, but Prog8 internally turns it back into a proper
 | |
| struct initializer value based on the type of the array element or pointer variable it is assigned to.
 | |
| So you can write the above in short form as::
 | |
| 
 | |
|     ^^Node nodepointer = [1,"one", 1000, true, 1.111]
 | |
| 
 | |
|     ^^Node[] allnodes = [
 | |
|         [1,"one", 1000, true, 1.111],
 | |
|         [2,"two", 2000, false, 2.222],
 | |
|         [],
 | |
|         []
 | |
|     ]
 | |
| 
 | |
| 
 | |
| 
 | |
| Dynamic allocation of structs
 | |
| =============================
 | |
| 
 | |
| There is no real 'dynamic' memory allocation in Prog8. Everything is statically allocated. This doesn't change with struct types.
 | |
| However, it is possible to write a dynamic memory handling library yourself (it has to track memory blocks manually).
 | |
| If you ask such a library to give you a pointer to a piece of memory with size ``sizeof(Enemy)`` you can use that as
 | |
| a dynamic pointer to an Enemy struct.
 | |
| 
 | |
| An example of what a super simple dynamic allocator could look like::
 | |
| 
 | |
|     ^^Node newnode = allocator.alloc(sizeof(Node))
 | |
|     ...
 | |
| 
 | |
|     allocator {
 | |
|         ; extremely trivial arena allocator
 | |
|         uword buffer = memory("arena", 2000, 0)
 | |
|         uword next = buffer
 | |
| 
 | |
|         sub alloc(ubyte size) -> uword {
 | |
|             defer next += size      ; deferred: runs when the subroutine exits, after the return value is evaluated
 | |
|             return next
 | |
|         }
 | |
| 
 | |
|         sub freeall() {
 | |
|             ; cannot free individual allocations, only the whole arena at once
 | |
|             next = buffer
 | |
|         }
 | |
|     }
 | |
| 
 | |
| 
 | |
| Address-Of: untyped vs typed
 | |
| ----------------------------
 | |
| 
 | |
| ``&`` still returns an untyped (uword) pointer, as it did in older Prog8 versions. This is for backward compatibility reasons so existing programs don't break.
 | |
| The new *double ampersand* operator ``&&`` returns a *typed* pointer to the value. The semantics are slightly different from the old untyped address-of operator, because adding or subtracting
 | |
| a number from a typed pointer uses *pointer arithmetic* that takes the size of the value that it points to into account.
 |
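| 
 | |
| A small sketch of the difference (the variable names are made up for illustration)::
 | |
| 
 | |
|     word number = 1000
 | |
|     uword raw = &number         ; untyped address
 | |
|     ^^word typed = &&number     ; typed pointer to a word
 | |
| 
 | |
|     raw += 1                    ; address + 1
 | |
|     typed += 1                  ; address + 2, because a word takes 2 bytes
 | |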