For boolean and numeric variables, you can actually declare them in one go by listing the names in a comma separated list.
Type tags, and the optional initialization value, are applied equally to all variables in such a list.
Various examples::
word thing = 0
byte counter = len([1, 2, 3]) * 20
byte age = 2018 - 1974
float wallet = 55.25
ubyte x,y,z ; declare three ubyte variables x y and z
str name = "my name is Alice"
uword address = &counter
bool flag = true
byte[] values = [11, 22, 33, 44, 55]
byte[5] values ; array of 5 bytes, initially set to zero
byte[5] values = [255]*5 ; initialize with five 255 bytes
word @zp zpword = 9999 ; prioritize this when selecting vars for zeropage storage
uword @requirezp zpaddr = $3000 ; we require this variable in zeropage
word @shared asmvar ; variable is used in assembly code but not elsewhere
byte @nozp memvar ; variable that is never in zeropage
Here are the tags you can add to a variable:
========== ======
Tag Effect
========== ======
@zp prioritize the variable for putting it into Zero page. No guarantees; if ZP is full the variable will be placed in another memory location.
@requirezp force the variable into Zero page. If ZP is full, compilation will fail.
@nozp force the variable to normal system ram, never place it into zeropage.
@shared means the variable is shared with some assembly code and that it cannot be optimized away if not used elsewhere.
@split (only valid on (u)word arrays) Makes the array to be placed in memory as 2 separate byte arrays; one with the LSBs one with the MSBs of the word values. Usually improves performance and code size.
@alignword aligns string or array variable on an even memory address
@align64 aligns string or array variable on a 64 byte address interval (example: for C64 sprite data)
@alignpage aligns string or array variable on a 256 byte address interval (example: to avoid page boundaries)
@dirty the variable won't be initialized by Prog8 which means that its value is undefined. You'll have to set it yourself before using the variable. Used to reduce overhead in certain scenarios. 🦶🔫 Footgun warning.
========== ======
Variables can be defined inside any scope (blocks, subroutines etc.) See :ref:`blocks`.
When declaring a numeric variable it is possible to specify the initial value, if you don't want it to be zero.
For other data types it is required to specify that initial value it should get.
Values will usually be part of an expression or assignment statement::
12345 ; integer number
$aa43 ; hex integer number
%100101 ; binary integer number (% is also remainder operator so be careful)
false ; boolean false
-33.456e52 ; floating point number
"Hi, I am a string" ; text string, encoded with default encoding
'a' ; byte value (ubyte) for the letter a
sc:"Alternate" ; text string, encoded with c64 screencode encoding
sc:'a' ; byte value of the letter a in c64 screencode encoding
byte counter = 42 ; variable of size 8 bits, with initial value 42
**putting a variable in zeropage:**
If you add the ``@zp`` tag to the variable declaration, the compiler will prioritize this variable
when selecting variables to put into zeropage (but no guarantees). If there are enough free locations in the zeropage,
it will try to fill it with as much other variables as possible (before they will be put in regular memory pages).
Use ``@requirezp`` tag to *force* the variable into zeropage, but if there is no more free space the compilation will fail.
It's possible to put strings, arrays and floats into zeropage too, however because Zp space is really scarce
this is not advised as they will eat up the available space very quickly. It's best to only put byte or word
variables in zeropage. By the way, there is also ``@nozp`` to keep a variable *out of the zeropage* at all times.
Example::
byte @zp smallcounter = 42
uword @requirezp zppointer = $4000
**shared variables:**
If you add the ``@shared`` tag to the variable declaration, the compiler will know that this variable
is a prog8 variable shared with some assembly code elsewhere. This means that the assembly code can
refer to the variable even if it's otherwise not used in prog8 code itself.
(usually, these kinds of 'unused' variables are optimized away by the compiler, resulting in an error
when assembling the rest of the code). Example::
byte @shared assemblyVariable = 42
**uninitialized variables:**
All variables will be initialized by prog8 at startup, they'll get their assigned initialization value, or be cleared to zero.
This (re)initialization is also done on each subroutine entry for the variables declared in the subroutine.
There may be certain scenarios where this initialization is redundant and/or where you want to avoid the overhead of it.
In some cases, Prog8 itself can detect that a variable doesn't need a separate automatic initialization to zero, if
it's trivial that it is not being read between the variable's declaration and the first assignment. For instance, when
you declare a variable immediately before a for loop where it is the loop variable. However Prog8 is not yet very smart
at detecting these redundant initializations. If you want to be sure, check the generated assembly output.
In any case, you can use the ``@dirty`` tag on the variable declaration to make the variable *not* being (re)initialized by Prog8.
This means its value will be undefined (it can be anything) until you assign a value yourself! Don't use such
a variable before you have done so. 🦶🔫 Footgun warning.
**memory alignment:**
A string or array variable can be aligned to a couple of possible interval sizes in memory.
The use for this is very situational, but two examples are: sprite data for the C64 that needs
to be on a 64 byte aligned memory address, or an array aligned on a full page boundary to avoid
any possible extra page boundary clock cycles on certain instructions when accessing the array.
You can align on word, 64 bytes, and page boundaries::
``byte[x]`` signed byte array x bytes ``byte[4] myvar``
``ubyte[x]`` unsigned byte array x bytes ``ubyte[4] myvar``
``word[x]`` signed word array 2*x bytes ``word[4] myvar``
``uword[x]`` unsigned word array 2*x bytes ``uword[4] myvar``
``float[x]`` floating-point array 5*x bytes ``float[4] myvar``. The 5 bytes per float is on CBM targets.
``bool[x]`` boolean array x bytes ``bool[4] myvar`` note: consider using bit flags in a byte or word instead to save space
``byte[]`` signed byte array depends on value ``byte[] myvar = [1, 2, 3, 4]``
``ubyte[]`` unsigned byte array depends on value ``ubyte[] myvar = [1, 2, 3, 4]``
``word[]`` signed word array depends on value ``word[] myvar = [1, 2, 3, 4]``
``uword[]`` unsigned word array depends on value ``uword[] myvar = [1, 2, 3, 4]``
``float[]`` floating-point array depends on value ``float[] myvar = [1.1, 2.2, 3.3, 4.4]``
``bool[]`` boolean array depends on value ``bool[] myvar = [true, false, true]`` note: consider using bit flags in a byte or word instead to save space
Arrays can be created from a list of booleans, bytes, words, floats, or addresses of other variables
(such as explicit address-of expressions, strings, or other array variables) - values in an array literal
always have to be constants. Here are some examples of arrays::
byte[10] array ; array of 10 bytes, initially set to 0
byte[] array = [1, 2, 3, 4] ; initialize the array, size taken from value
ubyte[99] array = [255]*99 ; initialize array with 99 times 255 [255, 255, 255, 255, ...]
byte[] array = 100 to 199 ; initialize array with [100, 101, ..., 198, 199]
str[] names = ["ally", "pete"] ; array of string pointers/addresses (equivalent to array of uwords)
uword[] others = [names, array] ; array of pointers/addresses to other arrays
bool[2] flags = [true, false] ; array of two boolean values (take up 1 byte each, like a byte array)
value = array[3] ; the fourth value in the array (index is 0-based)
char = string[4] ; the fifth character (=byte) in the string
char = string[-2] ; the second-to-last character in the string (Python-style indexing from the end)
..note::
Right now, the array should be small enough to be indexable by a single byte index.
This means byte arrays should be <= 256 elements, word arrays <= 128 elements (256 if
it's a split array - see below), and float arrays <= 51 elements.
Arrays can be initialized with a range expression or an array literal value.
You can write out such an initializer value over several lines if you want to improve readability.
When an initialization value is given, you are allowed to omit the array size in the declaration,
because it can be inferred from the initialization value.
You can use '*' to repeat array fragments to build up a larger array.
You can assign a new value to an element in the array, but you can't assign a whole
new array to another array at once. This is usually a costly operation. If you really
need this you have to write it out depending on the use case: you can copy the memory using
``sys.memcopy(sourcearray, targetarray, sizeof(targetarray))``. Or perhaps use ``sys.memset`` instead to
set it all to the same value, or maybe even simply assign the individual elements.
Note that the various keywords for the data type and variable type (``byte``, ``word``, ``const``, etc.)
can't be used as *identifiers* elsewhere. You can't make a variable, block or subroutine with the name ``byte``
for instance.
Using the ``in`` operator you can easily check if a value is present in an array,
example: ``if choice in [1,2,3,4] {....}``
**Arrays at a specific memory location:**
Using the memory-mapped syntax it is possible to define an array to be located at a specific memory location.
For instance to reference the first 5 rows of the Commodore 64's screen matrix as an array, you can define::
&ubyte[5*40] top5screenrows = $0400
This way you can set the second character on the second row from the top like this::
top5screenrows[41] = '!'
**Array indexing on a pointer variable:**
An uword variable can be used in limited scenarios as a 'pointer' to a byte in memory at a specific,
dynamic, location. You can use array indexing on a pointer variable to use it as a byte array at
a dynamic location in memory: currently this is equivalent to directly referencing the bytes in
memory at the given index. In contrast to a real array variable, the index value can be the size of a word.
Unlike array variables, negative indexing for pointer variables does *not* mean it will be counting from the end, because the size of the buffer is unknown.
Instead, it simply addresses memory that lies *before* the pointer variable.
See also :ref:`pointervars`
**LSB/MSB split word arrays:**
For (u)word arrays, you can make the compiler layout the array in memory as two separate arrays,
one with the LSBs and one with the MSBs of the word values. This makes it more efficient to access
values from the array (smaller and faster code). It also doubles the maximum size of the array from 128 words to 256 words!
The ``@split`` tag should be added to the variable declaration to do this.
In the assembly code, the array will then be generated as two byte arrays namely ``name_lsb`` and ``name_msb``.
..caution::
Not all array operations are supported yet on "split word arrays".
If you get an error message, simply revert to a regular word array and please report the issue,
so that more support can be added in the future where it is needed.
.._encodings:
Strings
^^^^^^^
Strings are a sequence of characters enclosed in double quotes. The length is limited to 255 characters.
They're stored and treated much the same as a byte array,
but they have some special properties because they are considered to be *text*.
Strings (without encoding prefix) will be encoded (translated from ASCII/UTF-8) into bytes via the
*default encoding* for the target platform. On the CBM machines, this is CBM PETSCII.
Strings in the default encoding are stored in the machine's default character encoding (which is PETSCII on the CBM machines).
You can choose to store it in another encodings such as ``sc`` (screencodes) or ``iso`` (iso-8859-15).
Here are examples of the possible encodings:
-``"hello"`` a string translated into the default character encoding (PETSCII on the CBM machines)
-``petscii:"hello"`` string in CBM PETSCII encoding
-``sc:"my name is Alice"`` string in CBM screencode encoding
-``iso:"Ich heiße François"`` string in iso-8859-15 encoding (Latin)
-``iso5:"Хозяин и Работник"`` string in iso-8859-5 encoding (Cyrillic)
-``iso16:"zażółć gęślą jaźń"`` string in iso-8859-16 encoding (Eastern Europe)
-``atascii:"I am Atari!"`` string in "atascii" encoding (Atari 8-bit)
-``cp437:"≈ IBM Pc ≈ ♂♀♪☺¶"`` string in "cp437" encoding (IBM PC codepage 437)
So the following is a string literal that will be encoded into memory bytes using the iso encoding.
It can be correctly displayed on the screen only if a iso-8859-15 charset has been activated first
(the Commander X16 has this feature built in)::
iso:"Käse, Straße"
You can concatenate two string literals using '+', which can be useful to
split long strings over separate lines. But remember that the length
of the total string still cannot exceed 255 characters.
A string literal can also be repeated a given number of times using '*', where the repeat number must be a constant value.
And a new string value can be assigned to another string, but no bounds check is done!
So be sure the destination string is large enough to contain the new value (it is overwritten in memory)::
str string1 = "first part" + "second part"
str string2 = "hello!" * 10
string1 = string2
string1 = "new value"
There are several escape sequences available to put special characters into your string value:
-``\\`` - the backslash itself, has to be escaped because it is the escape symbol by itself
-``\n`` - newline character (move cursor down and to beginning of next line)
-``\r`` - carriage return character (more or less the same as newline if printing to the screen)
-``\"`` - quote character (otherwise it would terminate the string)
-``\'`` - apostrophe character (has to be escaped in character literals, is okay inside a string)
-``\uHHHH`` - a unicode codepoint \u0000 - \uffff (16-bit hexadecimal)
-``\xHH`` - 8-bit hex value that will be copied verbatim *without encoding*
- String literals can contain many symbols directly if they have a PETSCII equivalent, such as "♠♥♣♦π▚●○╳".
Characters like ^, _, \\, {, } and | (that have no direct PETSCII counterpart) are still accepted and converted to the closest PETSCII equivalents. (Make sure you save the source file in UTF-8 encoding if you use this.)
Using the ``in`` operator you can easily check if a character is present in a string,
example: ``if '@' in email_address {....}`` (however this gives no clue about the location
in the string where the character is present, if you need that, use the ``strings.find()``
library function instead)
**Caution:**
This checks *all* elements in the string with the length as it was initially declared.
Even when a string was changed and is terminated early with a 0-byte early,
the containment check with ``in`` will still look at all character positions in the initial string.
Consider using ``strings.find`` followed by ``if_cs`` (for instance) to do a "safer" search
for a character in such strings (one that stops at the first 0 byte)
..hint::
Strings/arrays and uwords (=memory address) can often be interchanged.
An array of strings is actually an array of uwords where every element is the memory
address of the string. You can pass a memory address to assembly functions
that require a string as an argument.
For regular assignments you still need to use an explicit ``&`` (address-of) to take
the address of the string or array.
..hint::
You can declare parameters and return values of subroutines as ``str``,
but in this case that is equivalent to declaring them as ``uword`` (because
in this case, the address of the string is passed as argument or returned as value).
..note:: Strings and their (im)mutability
*String literals outside of a string variable's initialization value*,
are considered to be "constant", i.e. the string isn't going to change
during the execution of the program. The compiler takes advantage of this in certain
ways. For instance, multiple identical occurrences of a string literal are folded into
just one string allocation in memory. Examples of such strings are the string literals
passed to a subroutine as arguments.
*Strings that aren't such string literals are considered to be unique*, even if they
are the same as a string defined elsewhere. This includes the strings assigned to
a string variable in its declaration! These kind of strings are not deduplicated and
are just copied into the program in their own unique part of memory. This means that
it is okay to treat those strings as mutable; you can safely change the contents
of such a string without destroying other occurrences (as long as you stay within
the size of the allocated string!)
.._range-expression:
Ranges
^^^^^^
A special value is the *range expression* which represents a range of integer numbers or characters,
from the starting value to (and including) the ending value::
<start> to <end> [ step <step> ]
<start> downto <end> [ step <step> ]
You an provide a step value if you need something else than the default increment which is one (or,
in case of downto, a decrement of one). Unlike the start and end values, the step value must be a constant.
Because a step of minus one is so common you can just use
the downto variant to avoid having to specify the step as well::
0 to 7 ; range of values 0, 1, 2, 3, 4, 5, 6, 7
20 downto 10 step -3 ; range of values 20, 17, 14, 11
aa = 5
xx = 10
aa to xx ; range of 5, 6, 7, 8, 9, 10
for i in 0 to 127 {
; i loops 0, 1, 2, ... 127
}
Range expressions are most often used in for loops, but can be also be used to create array initialization values::
byte[] array = 100 to 199 ; initialize array with [100, 101, ..., 198, 199]
Special types: const and memory-mapped
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When using ``const``, the value of the 'variable' cannot be changed; it has become a compile-time constant value instead.
You'll have to specify the initial value expression. This value is then used
by the compiler everywhere you refer to the constant (and no memory is allocated
for the constant itself). Onlythe simple numeric types (byte, word, float) can be defined as a constant.
If something is defined as a constant, very efficient code can usually be generated from it.
Variables on the other hand can't be optimized as much, need memory, and more code to manipulate them.
Note that a subset of the library routines in the ``math``, ``strings`` and ``floats`` modules are recognised in
compile time expressions. For example, the compiler knows what ``math.sin8u(12)`` is and replaces it with the computed result.
When using ``&`` (the address-of operator but now applied to a datatype), the variable will point to specific location in memory,
rather than being newly allocated. The initial value (mandatory) must be a valid
memory address. Reading the variable will read the given data type from the
address you specified, and setting the variable will directly modify that memory location(s)::
const byte max_age = 2000 - 1974 ; max_age will be the constant value 26
&word SCREENCOLORS = $d020 ; a 16-bit word at the address $d020-$d021
.._pointervars:
Direct access to memory locations ('peek' and 'poke')