.. _programstructure:

===================
Programming in IL65
===================

This chapter gives a high-level overview of the elements that make up a program.
Details about the syntax can be found in the :ref:`syntaxreference` chapter.

Elements of a program
---------------------

Program
    Consists of one or more *modules*.

Module
    A file on disk with the ``.ill`` suffix. It contains *directives* and *code blocks*.
    Whitespace and indentation in the source code are arbitrary and can be tabs or spaces or both.
    You can also add *comments* to the source code.
    One module file can *import* others, and also import *library modules*.

Comments
    Everything after a semicolon ``;`` is a comment and is ignored by the compiler.
    If the whole line is just a comment, it will be copied into the resulting assembly source code.
    This makes it easier to understand the generated code and relate it to your source. Examples::

        A = 42    ; set the initial value to 42
        ; next is the code that...

Directive
    These are special instructions for the compiler, to change how it processes the code
    and what kind of program it creates. A directive is on its own line in the file, and
    starts with ``%``, optionally followed by some arguments.

Code block
    A block of actual program code. It defines a *scope* (also known as 'namespace') and
    can contain IL65 *code*, *variable declarations* and *subroutines*.
    More details about this below: :ref:`blocks`.

Variable declarations
    The data that the code works on is stored in variables ('named values that can change').
    The compiler allocates the required memory for them.
    There is *no dynamic memory allocation*. The storage size of all variables
    is fixed and is determined at compile time.
    Variable declarations tend to appear at the top of the code block that uses them.
    They define the name and type of the variable, and its initial value.
    IL65 supports a small list of data types, including special 'memory mapped' types
    that don't allocate storage but instead point to a fixed location in the address space.

Code
    These are the instructions that make up the program's logic. There are different kinds of instructions
    ('statements' is a better name):

    - value assignment
    - looping (for, while, repeat, unconditional jumps)
    - conditional execution (if - then - else, and conditional jumps)
    - subroutine calls
    - label definition

Subroutine
    Defines a piece of code that can be called by its name from different locations in your code.
    It accepts parameters and can return result values.
    It can define its own variables but it's not possible to define subroutines nested in other subroutines.
    To keep things simple, you can only define subroutines inside code blocks from a module.

Label
    This is a named position in your code where you can jump to from another place.
    You can jump to it with a jump statement elsewhere. It is also possible to use a
    subroutine call to a label (but without parameters and return value).

Scope
    Also known as 'namespace', this is a named box around the symbols defined in it.
    This prevents name collisions (or 'namespace pollution'), because the name of the scope
    is needed as prefix to be able to access the symbols in it.
    Anything *inside* the scope can refer to symbols in the same scope without using a prefix.
    There are three scopes in IL65:

    - global (no prefix)
    - code block
    - subroutine

    Modules are *not* a scope! Everything defined in a module is merged into the global scope.

.. _blocks:

Blocks, Scopes, and accessing Symbols
-------------------------------------

Blocks are the separate pieces of code and data of your program. They are combined
into a single output program. No code or data can occur outside a block. Here's an example::

    ~ main $c000 {
        ; this is code inside the block...
    }

The name of a block must be unique in your entire program.
Also be careful when importing other modules; blocks in your own code cannot have
the same name as a block defined in an imported module or library.

It's possible to omit this name, but then you can only refer to the contents of the block via its absolute address,
which is required in this case. If you omit *both* name and address, the block is *ignored* by the compiler (and a warning is displayed).
This is a way to quickly "comment out" a piece of code that is unfinished or may contain errors that you
want to work on later, because the contents of the ignored block are not fully parsed either.

The address can be used to place a block at a specific location in memory.
Usually it is omitted, and the compiler will automatically choose the location (usually immediately after
the previous block in memory).
The address must be >= ``$0200`` (because ``$00``--``$ff`` is the ZP and ``$0100``--``$01ff`` is the cpu stack).

A block is also a *scope* in your program so the symbols in the block don't clash with
symbols of the same name defined elsewhere in the same file or in another file.
You can refer to the symbols in a particular block by using a *dotted name*: ``blockname.symbolname``.
Labels inside a subroutine are appended again to that; ``blockname.subroutinename.label``.

Every symbol is 'public' and can be accessed from elsewhere given its dotted name.

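The dotted-name rule can be pictured as a lookup in a flat table of full names. The following is only a rough sketch in Python, not IL65 and not the compiler's actual code: the ``SYMBOLS`` table, the ``resolve`` helper and the inner-to-outer lookup order are all assumptions made for illustration.

```python
# Hypothetical symbol table: every symbol is stored under its full dotted name.
SYMBOLS = {
    "main.counter": "byte",
    "main.start": "subroutine",
    "main.start.loop": "label",
    "screen.color": "byte",
}


def resolve(name, current_scope):
    """Return the full dotted name for `name` as seen from `current_scope`.

    Assumed lookup order: innermost scope first, then each enclosing scope,
    and finally the name exactly as written (a full dotted name).
    """
    parts = current_scope.split(".") if current_scope else []
    while parts:
        candidate = ".".join(parts + [name])
        if candidate in SYMBOLS:
            return candidate
        parts.pop()
    if name in SYMBOLS:
        return name
    raise KeyError(f"undefined symbol: {name}")


print(resolve("loop", "main.start"))          # main.start.loop
print(resolve("counter", "main.start"))       # main.counter
print(resolve("screen.color", "main.start"))  # screen.color
```

Code inside ``main.start`` reaches its own label and the block's variable without a prefix, while a symbol in another block needs its full dotted name.
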
**The special "ZP" ZeroPage block**

Blocks named "ZP" are treated a bit differently: they refer to the ZeroPage.
The contents of every block with that name (this one may occur multiple times) are merged into one.
Its start address is always set to ``$04``, because ``$00 - $01`` are used by the hardware
and ``$02 - $03`` are reserved as general purpose scratch registers.

Program Start and Entry Point
-----------------------------

Your program must have a single entry point where code execution begins.
The compiler expects a ``start`` subroutine in the ``main`` block for this,
taking no parameters and having no return value.
Like any subroutine, it has to end with a ``return`` statement (or a ``goto`` call)::

    ~ main {
        sub start () -> () {
            ; program entrypoint code here
            return
        }
    }

The ``main`` block is always relocated to the start of your program's
address space, and the ``start`` subroutine (the entrypoint) will be on the
first address. This will also be the address that the BASIC loader program (if generated)
calls with the SYS statement.

Variables and values
--------------------

Variables are named values that can change during the execution of the program.
When declaring a variable it is required to specify the initial value it should get.
Values will usually be part of an expression or assignment statement::

    12345                 ; integer number
    $aa43                 ; hex integer number
    %100101               ; binary integer number
    "Hi, I am a string"   ; text string
    -33.456e52            ; floating point number

    byte counter = 42     ; variable of size 8 bits, with initial value 42

    byte[4] array = [1, 2, 3, 4]    ; initialize the array
    byte[99] array = 255            ; initialize array with all 255's [255, 255, 255, 255, ...]
    byte[100] array = 100 to 199    ; initialize array with [100, 101, ..., 198, 199]

Note that the various keywords for the data type and variable type (``byte``, ``word``, ``const``, etc.)
cannot be used as *identifiers* elsewhere. You can't make a variable, block or subroutine with the name ``byte``
for instance.

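To make the literal notations and array initializers concrete, here is a small sketch in Python (not IL65) of the values they stand for. The helper names ``parse_int`` and ``init_array`` are invented for this illustration, and the ``(first, last)`` tuple is only a stand-in for the ``first to last`` range notation; the real compiler may handle these differently.

```python
def parse_int(text):
    """Parse a decimal, $hex or %binary integer literal."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)


def init_array(size, initial):
    """Expand an array initializer into a list of byte values.

    `initial` is a list (used as-is), a single int (repeated), or a
    (first, last) tuple standing in for the inclusive 'first to last' range.
    """
    if isinstance(initial, tuple):
        first, last = initial
        values = list(range(first, last + 1))
    elif isinstance(initial, int):
        values = [initial] * size
    else:
        values = list(initial)
    if len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("byte values must be in the range 0-255")
    return values


print(parse_int("$aa43"))     # 43587
print(parse_int("%100101"))   # 37
print(len(init_array(100, (100, 199))))  # 100
```

Note that ``100 to 199`` is inclusive at both ends, which is why it fills exactly 100 elements.
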
Special types: const and memory-mapped
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using ``const``, the value of the 'variable' can no longer be changed.
You'll have to specify the initial value expression. This value is then used
by the compiler everywhere you refer to the constant (and no storage is allocated
for the constant itself).

When using ``memory``, the variable will point to a specific location in memory,
rather than being newly allocated. The initial value (mandatory) must be a valid
memory address. Reading the variable will read the given data type from the
address you specified, and setting the variable will directly modify the memory location(s) at that address::

    const byte max_age = 2000 - 1974      ; max_age will be the constant value 26
    memory word SCREENCOLORS = $d020      ; a 16-bit word at the address $d020-$d021

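A memory-mapped word covers two consecutive bytes, and the 6502 stores the low byte first. The sketch below models this in Python (not IL65) with a plain ``bytearray`` as a stand-in for the address space; ``read_word`` and ``write_word`` are hypothetical helpers, not part of any IL65 library.

```python
memory = bytearray(0x10000)  # stand-in for the 64 KiB address space


def read_word(address):
    """Read a 16-bit word stored low byte first (6502 byte order)."""
    return memory[address] | (memory[address + 1] << 8)


def write_word(address, value):
    """Write a 16-bit word as two bytes, low byte first."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


SCREENCOLORS = 0xD020  # the address used in the 'memory word' example

write_word(SCREENCOLORS, 0x0602)
print(hex(memory[0xD020]), hex(memory[0xD021]))  # 0x2 0x6
print(hex(read_word(SCREENCOLORS)))              # 0x602
```
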
Integers
^^^^^^^^

Integers are 8 or 16 bit numbers and can be written in normal decimal notation,
in hexadecimal and in binary notation.

@todo right now only unsigned integers are supported (0-255 for byte types, 0-65535 for word types)

Strings
^^^^^^^

Strings are a sequence of characters enclosed in ``"`` quotes.
They're stored and treated much the same as a byte array,
but they have some special properties because they are considered to be *text*.
Strings in your source code files will be encoded (translated from ASCII/UTF-8) into either CBM PETSCII or C-64 screencodes.
PETSCII is the default choice. If you need screencodes (also called 'poke' codes) instead,
you have to use the ``str_s`` variants of the string type identifier.
If you assign a string literal of length 1 to a non-string variable, it is treated as a *byte* value instead,
which has the PETSCII value of that single character.

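The difference between the two encodings is easy to see for a few characters. This Python sketch (not IL65) only handles space, digits, punctuation and upper-case letters, where the mapping is simple; ``to_petscii`` and ``to_screencodes`` are names made up here, and the compiler's real translation table covers far more characters.

```python
def to_petscii(text):
    """Encode space, digits, punctuation and upper-case letters to PETSCII.

    Only ASCII codes $20-$5A are handled here; in that range the PETSCII
    code equals the ASCII code.
    """
    codes = []
    for ch in text:
        code = ord(ch)
        if not 0x20 <= code <= 0x5A:
            raise ValueError(f"character {ch!r} is outside this sketch's range")
        codes.append(code)
    return bytes(codes)


def to_screencodes(text):
    """Encode the same character range to C-64 screen ('poke') codes."""
    # PETSCII $40-$5A ('@', 'A'-'Z') maps to screen codes $00-$1A;
    # PETSCII $20-$3F (space, digits, punctuation) is unchanged.
    return bytes(c - 0x40 if c >= 0x40 else c for c in to_petscii(text))


print(list(to_petscii("HI 1")))      # [72, 73, 32, 49]
print(list(to_screencodes("HI 1")))  # [8, 9, 32, 49]
```
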
Floating point numbers
^^^^^^^^^^^^^^^^^^^^^^

Floats are stored in the 5-byte 'MFLPT' format that is used on CBM machines,
and most float operations are specific to the Commodore-64.
This is because routines in the C-64 BASIC and KERNAL ROMs are used for them.
So floating point operations will only work if the C-64 BASIC ROM (and KERNAL ROM)
are banked in (and your code imports the ``c64lib.ill``).

The largest 5-byte MFLPT float that can be stored is: **1.7014118345e+38** (negative: **-1.7014118345e+38**)

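For readers curious about the byte layout, the sketch below packs a number into five bytes in Python (not IL65): one exponent byte with a bias of 128, followed by a 32-bit mantissa whose top bit is replaced by the sign. The function names are invented for this example and the rounding may differ in the last bit from what the ROM routines produce.

```python
import math


def to_mflpt(value):
    """Encode a Python float as 5 bytes in the CBM 'MFLPT' layout."""
    if value == 0:
        return bytes(5)  # an exponent byte of 0 means the value is zero
    mantissa, exponent = math.frexp(abs(value))  # abs(value) = mantissa * 2**exponent
    bits = round(mantissa * (1 << 32))           # 32-bit mantissa, top bit always set
    if bits == 1 << 32:                          # rounding carried into a 33rd bit
        bits >>= 1
        exponent += 1
    exponent += 128                              # exponent is stored with a bias of 128
    if exponent > 255:
        raise OverflowError("value too large for MFLPT")
    if exponent < 1:
        return bytes(5)                          # underflow: store zero
    bits &= 0x7FFFFFFF                           # drop the implied leading 1 bit...
    if value < 0:
        bits |= 0x80000000                       # ...and reuse that bit as the sign
    return bytes([exponent]) + bits.to_bytes(4, "big")


def from_mflpt(data):
    """Decode 5 MFLPT bytes back into a Python float."""
    if data[0] == 0:
        return 0.0
    bits = int.from_bytes(data[1:5], "big")
    sign = -1.0 if bits & 0x80000000 else 1.0
    mantissa = (bits | 0x80000000) / (1 << 32)   # restore the implied leading 1 bit
    return sign * math.ldexp(mantissa, data[0] - 128)


print(to_mflpt(1.0).hex())    # 8100000000
print(to_mflpt(10.0).hex())   # 8420000000
print(from_mflpt(to_mflpt(-2.5)))  # -2.5
```
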
Initial values across multiple runs of the program
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The initial values of your variables will be restored automatically when the program is (re)started,
*except for string variables*. It is assumed these are left unchanged by the program.
If you do modify them in-place, you should take care yourself that they work as
expected when the program is restarted.

Indirect addressing and address-of
----------------------------------

The ``#`` operator is used to take the address of the symbol following it.
It can be used for example to work with the *address* of a memory mapped variable rather than
the value it holds. You could take the address of a string as well, but that is redundant:
the compiler already treats those as a value that you manipulate via its address.
For most other types this prefix is not supported and will result in a compilation error.
The resulting value is simply a 16 bit word. Example::

    AX = #somevar

**Indirect addressing:**
@todo ???

**Indirect addressing in jumps:**
@todo ???
For an indirect ``goto`` statement, the compiler will issue the 6502 CPU's special instruction
(``jmp`` indirect). A subroutine call (``jsr`` indirect) is emitted
using a couple of instructions.

Loops
-----

The *for*-loop is used to iterate over a range of values. Iteration is done in steps of 1, but you can change this.
The *while*-loop is used to repeat a piece of code while a certain condition is still true.
The *repeat--until* loop is used to repeat a piece of code until a certain condition is true.

You can also create loops by using the ``goto`` statement, but this should usually be avoided.

Conditional Execution
---------------------

.. todo::
    not sure how to handle direct translation into
    [cc, cs, vc, vs, eq, ne, true, not, zero, pos, neg, lt, gt, le, ge]
    It defaults to 'true' (=='ne', not-zero) if omitted. ('pos' will translate into 'pl', 'neg' into 'mi')
    @todo signed: lts==neg?, gts==eq+pos?, les==neg+eq?, ges==pos?

.. todo::
    eventually allow local variable definitions inside the sub blocks but for now,
    they have to use the same variables as the block the ``if`` statement itself is in.

Conditional execution means that the flow of execution changes based on certain conditions,
rather than having fixed gotos or subroutine calls::

    if A > 4 goto overflow

    if X == 3 then Y = 4
    if X == 3 then Y = 4 else A = 2

    if X == 5 {
        Y = 99
    } else {
        A = 3
    }

A condition is an arithmetic expression, a logical expression, a comparison expression, or a status register flag (``SR.cs``, ``SR.cc``, ``SR.pl`` etc... @todo).

Conditional jumps are compiled into the 6502's branching instructions (such as ``bne`` and ``bcc``), so
the rather strict limit on how *far* they can jump applies. Unfortunately the compiler itself can't detect
this, so it is entirely possible to create code that cannot be assembled successfully.
If you run into this type of assembler error, you'll have to restructure the gotos in your code
(place target labels closer to the branch).

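The limit comes from the branch instruction's encoding: the target is stored as a signed 8-bit offset from the address of the next instruction. A rough Python sketch of that check follows (not IL65; ``branch_offset`` is a hypothetical helper, and the assembler performs its own version of this test).

```python
def branch_offset(branch_address, target_address):
    """Return the signed offset byte for a 6502 relative branch.

    The offset is relative to the address of the instruction that follows
    the 2-byte branch, and must fit in a signed 8-bit value.
    """
    offset = target_address - (branch_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch target out of range: offset {offset}")
    return offset


print(branch_offset(0xC000, 0xC010))  # 14
print(branch_offset(0xC000, 0xBFF0))  # -18
```

A target more than 127 bytes ahead of, or 128 bytes behind, the following instruction raises the error, which corresponds to the assembler failure described above.
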
Assignments
-----------

Assignment statements assign a single value to a target variable or memory location.
Augmented assignments (such as ``A += X``) are also available, but these are just shorthands
for normal assignments (``A = A + X``).

Expressions
-----------

In most places where a number or other value is expected, you can use just the number, or a full constant expression.
The expression is parsed and evaluated by the compiler itself at compile time, and the (constant) resulting value is used in its place.
Expressions can contain function calls to the math library (sin, cos, etc) and you can also use
all builtin functions (max, avg, min, sum etc). They can also reference identifiers defined elsewhere in your code,
if this makes sense.

Arithmetic and Logical expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Arithmetic expressions are expressions that calculate a numeric result (integer or floating point).
Many common arithmetic operators can be used and follow the regular precedence rules.

Logical expressions are expressions that calculate a boolean result, true or false
(which in IL65 will effectively be a 1 or 0 integer value).

You can use parentheses to group parts of an expression to change the precedence.
Usually the normal precedence rules apply (``*`` goes before ``+`` etc.) but subexpressions
within parentheses will be evaluated first. So ``(4 + 8) * 2`` is 24 and not 20,
and ``(true or false) and false`` is false instead of true.

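Compile-time evaluation of a constant expression can be sketched in a few lines of Python. This is an illustration only: it borrows Python's own parser in place of the IL65 grammar, the ``FUNCTIONS`` table covers just a hypothetical subset of the built-ins, and ``fold`` is a name invented here.

```python
import ast
import math
import operator

# Hypothetical subset of the built-in functions, mapped to Python equivalents.
FUNCTIONS = {
    "sin": math.sin, "cos": math.cos, "sqrt": math.sqrt, "abs": abs,
    "max": max, "min": min, "round": round,
    "rad": math.radians, "deg": math.degrees,
}
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fold(expression, constants=None):
    """Evaluate a constant expression to a single value."""
    constants = constants or {}

    def visit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in constants:
            return constants[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS):
            return FUNCTIONS[node.func.id](*(visit(arg) for arg in node.args))
        raise ValueError("not a constant expression")

    return visit(ast.parse(expression, mode="eval").body)


print(fold("2000 - 1974"))                   # 26
print(fold("(4 + 8) * 2"))                   # 24
print(fold("4 + 8 * 2"))                     # 20
print(fold("max(3, limit)", {"limit": 10}))  # 10
```

Anything that is not a number, a known constant, a supported operator or a listed function is rejected, which mirrors the idea that only values known at compile time can be folded.
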
Subroutines
-----------

Defining a subroutine
^^^^^^^^^^^^^^^^^^^^^

Subroutines are parts of the code that can be repeatedly invoked using a subroutine call from elsewhere.
Their definition, using the ``sub`` statement, includes the specification of the required input and output parameters.
For now, only register based parameters are supported (A, X, Y and paired registers,
the carry status bit SC and the interrupt disable bit SI as specials).
For subroutine return values, the special SZ register is also available, which represents the zero status bit.

Calling a subroutine
^^^^^^^^^^^^^^^^^^^^

The output variables must occur in the correct sequence of return registers as specified
in the subroutine's definition. It is possible to omit all of them, but the compiler
will then issue a warning that the result values of the subroutine call are discarded.
If you don't have a variable to store the output register in, you have
to list the register itself as the output variable instead.

Arguments should match the subroutine definition. You are allowed to omit the parameter names.
If no definition is available (because you're directly calling memory or a label or something else),
you can freely add arguments (but in this case they all have to be named).

To jump to a subroutine (without returning), prefix the subroutine call with the word 'goto'.
Unlike gotos in other languages, here it takes arguments as well, because it
essentially is the same as calling a subroutine and only doing something different when it's finished.

**Register preserving calls:** use the ``!`` followed by a combination of A, X and Y (or followed
by nothing, which is the same as AXY) to tell the compiler you want to preserve the original
value of the given registers after the subroutine call. Otherwise, the subroutine may just
as well clobber all three registers. Preserving the original values does cause some
stack manipulation code to be inserted for every call like this, which can be quite slow.

Built-in Functions
------------------

The compiler has the following built-in functions that you can use in expressions:

sin(value)
    Sine.

cos(value)
    Cosine.

abs(value)
    Absolute value.

acos(value)
    Arccosine.

asin(value)
    Arcsine.

tan(value)
    Tangent.

atan(value)
    Arctangent.

log(value)
    Natural logarithm.

log10(value)
    Base-10 logarithm.

sqrt(value)
    Square root.

max(value [, value, ...])
    Maximum of the values.

min(value [, value, ...])
    Minimum of the values.

round(value)
    Rounds the floating point to an integer.

rad(value)
    Degrees to radians.

deg(value)
    Radians to degrees.