.. _programstructure:

===================
Programming in IL65
===================

This chapter gives a high-level overview of the elements that make up a program.
Details about the syntax can be found in the :ref:`syntaxreference` chapter.


Elements of a program
---------------------

Program
	Consists of one or more *modules*.

Module
	A file on disk with the ``.ill`` suffix. It contains *directives* and *code blocks*.
	Whitespace and indentation in the source code are arbitrary and can be tabs or spaces or both.
	You can also add *comments* to the source code.
	One module file can *import* others, and also import *library modules*.

Comments
	Everything after a semicolon ``;`` is a comment and is ignored by the compiler.
	If the whole line is just a comment, it will be copied into the resulting assembly source code.
	This makes it easier to relate the generated code to your source. Examples::

		A = 42    ; set the initial value to 42
		; next is the code that...

Directive
	These are special instructions for the compiler, to change how it processes the code
	and what kind of program it creates. A directive is on its own line in the file, and
	starts with ``%``, optionally followed by some arguments.

Code block
	A block of actual program code. It defines a *scope* (also known as 'namespace') and
	can contain IL65 *code*, *variable declarations* and *subroutines*.
	More details about this below: :ref:`blocks`.

Variable declarations
	The data that the code works on is stored in variables ('named values that can change').
	The compiler allocates the required memory for them.
	There is *no dynamic memory allocation*. The storage size of all variables
	is fixed and is determined at compile time.
	Variable declarations tend to appear at the top of the code block that uses them.
	They define the name and type of the variable, and its initial value.
	IL65 supports a small list of data types, including special 'memory mapped' types
	that don't allocate storage but instead point to a fixed location in the address space.

Code
	These are the instructions that make up the program's logic. There are different kinds of instructions
	('statements' is a better name):

	- value assignment
	- looping  (for, while, repeat, unconditional jumps)
	- conditional execution (if - then - else, and conditional jumps)
	- subroutine calls
	- label definition

Subroutine
	Defines a piece of code that can be called by its name from different locations in your code.
	It accepts parameters and can return result values.
	It can define its own variables, but subroutines cannot be nested inside other subroutines.
	To keep things simple, you can only define subroutines inside the code blocks of a module.

Label
	This is a named position in your code that you can jump to from another place
	with a jump statement. It is also possible to use a
	subroutine call to a label (but without parameters and return value).


Scope
	Also known as 'namespace', this is a named box around the symbols defined in it.
	This prevents name collisions (or 'namespace pollution'), because the name of the scope
	is needed as prefix to be able to access the symbols in it.
	Anything *inside* the scope can refer to symbols in the same scope without using a prefix.
	There are three scopes in IL65:

	- global (no prefix)
	- code block
	- subroutine

	Modules are *not* a scope! Everything defined in a module is merged into the global scope.


.. _blocks:

Blocks, Scopes, and accessing Symbols
-------------------------------------

Blocks are the separate pieces of code and data of your program. They are combined
into a single output program.  No code or data can occur outside a block. Here's an example::

	~ main $c000 {
		; this is code inside the block...
	}


The name of a block must be unique in your entire program.
Also be careful when importing other modules; blocks in your own code cannot have
the same name as a block defined in an imported module or library.

It's possible to omit this name, but then you can only refer to the contents of the block via its absolute address,
which is required in this case. If you omit *both* name and address, the block is *ignored* by the compiler (and a warning is displayed).
This is a way to quickly "comment out" a piece of code that is unfinished or may contain errors that you
want to work on later, because the contents of the ignored block are not fully parsed either.

The address can be used to place a block at a specific location in memory.
Usually it is omitted, and the compiler will automatically choose the location (usually immediately after
the previous block in memory).
The address must be >= ``$0200`` (because ``$00``--``$ff`` is the ZP and ``$0100``--``$01ff`` is the CPU stack).

A block is also a *scope* in your program so the symbols in the block don't clash with
symbols of the same name defined elsewhere in the same file or in another file.
You can refer to the symbols in a particular block by using a *dotted name*: ``blockname.symbolname``.
Labels inside a subroutine are appended again to that; ``blockname.subroutinename.label``.

Every symbol is 'public' and can be accessed from elsewhere given its dotted name.


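The dotted-name lookup described above can be pictured as a walk through nested scopes.
The Python sketch below is illustrative only: the block, subroutine and symbol names are invented,
and it makes no claim about how the IL65 compiler actually stores its symbols.

```python
# Hypothetical program: two blocks that each define a symbol called "counter".
# Nested dicts stand in for scopes (block -> subroutine -> label).
program = {
    "main": {
        "counter": "byte variable",
        "start": {"loop": "label"},
    },
    "screen": {
        "counter": "byte variable",  # same name as main.counter, no clash
    },
}


def resolve(dotted_name, scopes=program):
    """Follow a dotted name such as 'main.start.loop' through the nested scopes."""
    node = scopes
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown symbol: {dotted_name}")
        node = node[part]
    return node
```

With this model, ``resolve("main.start.loop")`` finds the label inside the subroutine, while
``main.counter`` and ``screen.counter`` remain two distinct symbols.
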
**The special "ZP" ZeroPage block**

Blocks named "ZP" are treated a bit differently: they refer to the ZeroPage.
The contents of every block with that name (the name may occur multiple times) are merged into one.
Its start address is always set to ``$04``, because ``$00 - $01`` are used by the hardware
and ``$02 - $03`` are reserved as general purpose scratch registers.


Program Start and Entry Point
-----------------------------

Your program must have a single entry point where code execution begins.
The compiler expects a ``start`` subroutine in the ``main`` block for this,
taking no parameters and having no return value.
Like any subroutine, it has to end with a ``return`` statement (or a ``goto`` call)::

	~ main {
	    sub start () -> ()  {
	        ; program entrypoint code here
	        return
	    }
	}

The ``main`` block is always relocated to the start of your program's
address space, and the ``start`` subroutine (the entrypoint) will be on the
first address. This will also be the address that the BASIC loader program (if generated)
calls with the SYS statement.


Variables and values
--------------------

Variables are named values that can change during the execution of the program.
When declaring a variable it is required to specify the initial value it should get.
Values will usually be part of an expression or assignment statement::

	12345			; integer number
	$aa43			; hex integer number
	%100101			; binary integer number
	"Hi, I am a string"	; text string
	-33.456e52		; floating point number

	byte  counter  = 42	; variable of size 8 bits, with initial value 42

	byte[4]  array = [1, 2, 3, 4]    ; initialize the array
	byte[99] array = 255             ; initialize array with all 255's [255, 255, 255, 255, ...]
	byte[100] array = 100 to 199     ; initialize array with [100, 101, ..., 198, 199]


Note that the various keywords for the data type and variable type (``byte``, ``word``, ``const``, etc.)
cannot be used as *identifiers* elsewhere. You can't make a variable, block or subroutine with the name ``byte``
for instance.


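The three array initializer forms in the example above (explicit list, single fill value, and ``first to last`` range)
can be modelled as follows. This is a sketch under assumptions: the function name and the way the forms are
represented in Python are invented, and the size check is an assumption about what a compiler would reasonably enforce.

```python
def expand_initializer(size, init):
    """Expand an array initializer into a list of `size` byte values.

    `init` is modelled as one of:
      - a list of values (taken as-is),
      - a single int (repeated for every element),
      - a (first, last) tuple standing in for ``first to last``, inclusive.
    """
    if isinstance(init, list):
        values = list(init)
    elif isinstance(init, tuple):
        first, last = init
        values = list(range(first, last + 1))
    else:
        values = [init] * size
    # Assumption: the number of values must match the declared size.
    if len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    # Byte arrays hold unsigned 8-bit values only.
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("byte value out of range 0-255")
    return values
```

For instance, ``expand_initializer(100, (100, 199))`` yields the hundred values 100 up to and including 199.
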
Special types: const and memory-mapped
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using ``const``, the value of the 'variable' can no longer be changed.
You'll have to specify the initial value expression. This value is then used
by the compiler everywhere you refer to the constant (and no storage is allocated
for the constant itself).

When using ``memory``, the variable will point to a specific location in memory,
rather than being newly allocated. The initial value (mandatory) must be a valid
memory address.  Reading the variable will read the given data type from the
address you specified, and setting the variable will directly modify the memory location(s) at that address::

	const  byte  max_age = 2000 - 1974      ; max_age will be the constant value 26
	memory word  SCREENCOLORS = $d020       ; a 16-bit word at the address $d020-$d021


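To make the difference concrete, here is a small Python model of the two declarations above.
It is only a sketch: the ``memory`` bytearray and the helper names are invented for illustration.
The one hardware fact it relies on is that the 6502 stores 16-bit words low byte first.

```python
memory = bytearray(65536)  # stand-in for the 64 KiB address space


def read_word(address):
    """Read a 16-bit word stored low byte first (little-endian)."""
    return memory[address] | (memory[address + 1] << 8)


def write_word(address, value):
    """Write a 16-bit word: low byte at `address`, high byte at `address + 1`."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


# A constant is just a value known up front; it occupies no memory at all.
MAX_AGE = 2000 - 1974

# A memory-mapped word is only an address; no new storage is allocated for it.
SCREENCOLORS = 0xD020
write_word(SCREENCOLORS, 0x0106)  # touches $d020 (low byte) and $d021 (high byte)
```
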
Integers
^^^^^^^^

Integers are 8 or 16 bit numbers and can be written in normal decimal notation,
in hexadecimal and in binary notation.

.. todo::
	right now only unsigned integers are supported (0-255 for byte types, 0-65535 for word types)


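The three integer notations and the unsigned ranges can be illustrated with a short Python sketch.
The function names are invented; only the ``$`` (hexadecimal) and ``%`` (binary) prefixes and the value ranges
come from the text above.

```python
def parse_integer(text):
    """Parse a decimal, ``$`` hexadecimal or ``%`` binary integer literal."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)


def fits(value, datatype):
    """Check a value against the unsigned range of 'byte' or 'word'."""
    limits = {"byte": 0xFF, "word": 0xFFFF}
    return 0 <= value <= limits[datatype]
```

For example, ``parse_integer("$aa43")`` does not fit in a ``byte`` but does fit in a ``word``.
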
Strings
^^^^^^^

Strings are a sequence of characters enclosed in ``"`` quotes.
They're stored and treated much the same as a byte array,
but they have some special properties because they are considered to be *text*.
Strings in your source code files will be encoded (translated from ASCII/UTF-8) into either CBM PETSCII or C-64 screencodes.
PETSCII is the default choice. If you need screencodes (also called 'poke' codes) instead,
you have to use the ``str_s`` variants of the string type identifier.
If you assign a string literal of length 1 to a non-string variable, it is treated as a *byte* value instead,
which has the PETSCII value of that single character.


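The difference between the two encodings can be seen in a deliberately restricted Python sketch.
It handles only uppercase letters, digits and the space character, where the PETSCII code equals the ASCII code;
the function names are invented and this is not the compiler's actual translation table.

```python
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")


def to_petscii(text):
    """Encode text as PETSCII (restricted subset whose codes match ASCII)."""
    unsupported = [ch for ch in text if ch not in ALLOWED]
    if unsupported:
        raise ValueError(f"characters outside the sketch's subset: {unsupported}")
    return bytes(ord(ch) for ch in text)


def to_screencodes(text):
    """Encode text as C-64 screen codes (same restricted subset)."""
    codes = []
    for code in to_petscii(text):
        if 0x40 <= code <= 0x5F:
            codes.append(code - 0x40)  # letters: 'A' ($41) becomes screen code 1
        else:
            codes.append(code)         # $20-$3F (space, digits) stay the same
    return bytes(codes)
```

So the letter ``A`` is the byte ``$41`` in a PETSCII string but ``$01`` in a screencode string, which is why the two
string types are not interchangeable when poking text directly into screen memory.
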
Floating point numbers
^^^^^^^^^^^^^^^^^^^^^^

Floats are stored in the 5-byte 'MFLPT' format that is used on CBM machines,
and most float operations are also specific to the Commodore-64.
This is because routines in the C-64 BASIC and KERNAL ROMs are used for that.
So floating point operations will only work if the C-64 BASIC ROM (and KERNAL ROM)
are banked in (and your code imports the ``c64lib.ill``).

The largest 5-byte MFLPT float that can be stored is: **1.7014118345e+38**   (negative: **-1.7014118345e+38**)


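As an illustration of the 5-byte layout, the Python sketch below packs a number into MFLPT bytes:
one exponent byte (biased by 128) followed by four mantissa bytes in which the always-set top bit is replaced by the sign.
The function name is invented, the mantissa is truncated rather than rounded, and out-of-range values simply raise an error,
so treat it as a model of the format and not as the ROM's conversion routine.

```python
import math


def to_mflpt5(value):
    """Pack a Python float into the 5-byte MFLPT layout (truncating the mantissa)."""
    if value == 0:
        return bytes(5)  # zero is represented by a zero exponent byte
    # abs(value) == mantissa * 2**exponent, with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(value))
    exp_byte = exponent + 128
    if not 1 <= exp_byte <= 255:
        raise OverflowError("value out of range for MFLPT")
    bits = int(mantissa * (1 << 32))  # 32-bit mantissa, top bit always set
    bits &= 0x7FFFFFFF                # drop the implied leading 1 ...
    if value < 0:
        bits |= 0x80000000            # ... and reuse that bit as the sign
    return bytes([exp_byte]) + bits.to_bytes(4, "big")
```

For example, ``to_mflpt5(10.0)`` gives the bytes ``84 20 00 00 00`` and ``to_mflpt5(1.0)`` gives ``81 00 00 00 00``.
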
Initial values across multiple runs of the program
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The initial values of your variables will be restored automatically when the program is (re)started,
*except for string variables*. It is assumed these are left unchanged by the program.
If you do modify them in-place, you have to make sure yourself that they still work as
expected when the program is restarted.



Indirect addressing and address-of
----------------------------------

The ``#`` operator is used to take the address of the symbol following it.
It can be used for example to work with the *address* of a memory mapped variable rather than
the value it holds.  You could take the address of a string as well, but that is redundant:
the compiler already treats those as a value that you manipulate via its address.
For most other types this prefix is not supported and will result in a compilation error.
The resulting value is simply a 16 bit word. Example::

	AX = #somevar


**Indirect addressing:**
@todo ???

**Indirect addressing in jumps:**
@todo ???
For an indirect ``goto`` statement, the compiler will issue the 6502 CPU's special instruction
(``jmp`` indirect).  A subroutine call (``jsr`` indirect) is emitted
using a couple of instructions.


Loops
-----

The *for*-loop is used to iterate over a range of values. By default the iteration uses steps of 1, but you can change the step size.
The *while*-loop is used to repeat a piece of code while a certain condition is still true.
The *repeat--until* loop is used to repeat a piece of code until a certain condition is true.

You can also create loops by using the ``goto`` statement, but this should usually be avoided.


Conditional Execution
---------------------

.. todo::
	not sure how to handle direct translation into
	[cc, cs, vc, vs, eq, ne, true, not, zero, pos, neg, lt, gt, le, ge]
	It defaults to 'true' (=='ne', not-zero) if omitted. ('pos' will translate into 'pl', 'neg' into 'mi')
	@todo signed: lts==neg?, gts==eq+pos?, les==neg+eq?, ges==pos?

.. todo::
	eventually allow local variable definitions inside the sub blocks but for now,
	they have to use the same variables as the block the ``if`` statement itself is in.


Conditional execution means that the flow of execution changes based on certain conditions,
rather than having fixed gotos or subroutine calls::

	if A > 4 goto overflow

	if X == 3 then Y = 4
	if X == 3 then Y = 4 else A = 2

	if X == 5 {
		Y = 99
	} else {
		A = 3
	}

condition = arithmetic expression or logical expression or comparison expression or status_register_flags ( ``SR.cs`` , ``SR.cc``, ``SR.pl`` etc... @todo )

Conditional jumps are compiled into the 6502's branching instructions (such as ``bne`` and ``bcc``), so
the rather strict limit on how *far* they can jump applies. Unfortunately the compiler itself can't figure this
out, so it is entirely possible to create code that cannot be assembled successfully.
You'll have to restructure your gotos in the code (place target labels closer to the branch)
if you run into this type of assembler error.


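The branch limit mentioned above comes from the 6502 instruction format: a branch is two bytes long and carries a signed
8-bit offset that is counted from the address of the *next* instruction. The Python sketch below (with invented function names)
checks whether a target is within reach.

```python
BRANCH_SIZE = 2  # opcode byte + one signed offset byte


def branch_offset(branch_address, target_address):
    """Offset as the CPU sees it: relative to the instruction after the branch."""
    return target_address - (branch_address + BRANCH_SIZE)


def can_branch(branch_address, target_address):
    """True if the target is reachable with a signed 8-bit offset."""
    return -128 <= branch_offset(branch_address, target_address) <= 127
```

A branch located at ``$c000`` can therefore reach targets from ``$bf82`` up to and including ``$c081``;
anything further away needs the label moved closer or an unconditional jump.
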
Assignments
-----------

Assignment statements assign a single value to a target variable or memory location.
Augmented assignments (such as ``A += X``) are also available, but these are just shorthands
for normal assignments (``A = A + X``).


Expressions
-----------

In most places where a number or other value is expected, you can use just the number, or a full constant expression.
The expression is parsed and evaluated by the compiler itself at compile time, and the (constant) resulting value is used in its place.
Expressions can contain function calls to the math library (sin, cos, etc) and you can also use
all builtin functions (max, avg, min, sum etc). They can also reference identifiers defined elsewhere in your code,
if this makes sense.


Arithmetic and Logical expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Arithmetic expressions are expressions that calculate a numeric result (integer or floating point).
Many common arithmetic operators can be used and follow the regular precedence rules.

Logical expressions are expressions that calculate a boolean result, true or false
(which in IL65 will effectively be a 1 or 0 integer value).

You can use parentheses to group parts of an expression to change the precedence.
Usually the normal precedence rules apply (``*`` goes before ``+`` etc.) but subexpressions
within parentheses will be evaluated first. So ``(4 + 8) * 2`` is 24 and not 20,
and ``(true or false) and false`` is false instead of true.


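The precedence examples above can be checked with a tiny constant folder. This Python sketch borrows Python's own parser
(the ``ast`` module) as a stand-in for the compiler's expression parser, which is an assumption made purely for illustration;
it supports only ``+ - * /``, ``and``/``or``, numbers and the names ``true`` and ``false``.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(source):
    """Fold a constant expression into a single value; booleans become 1 or 0."""
    def fold(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in ("true", "false"):
            return 1 if node.id == "true" else 0
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](fold(node.left), fold(node.right))
        if isinstance(node, ast.BoolOp):
            values = [bool(fold(v)) for v in node.values]
            combined = all(values) if isinstance(node.op, ast.And) else any(values)
            return int(combined)
        raise ValueError(f"unsupported expression: {source!r}")

    return fold(ast.parse(source, mode="eval").body)
```

Here ``evaluate("(4 + 8) * 2")`` gives 24 while ``evaluate("4 + 8 * 2")`` gives 20, and
``evaluate("(true or false) and false")`` gives 0 while the unparenthesized form gives 1.
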
Subroutines
-----------

Defining a subroutine
^^^^^^^^^^^^^^^^^^^^^

Subroutines are parts of the code that can be repeatedly invoked using a subroutine call from elsewhere.
Their definition, using the ``sub`` statement, includes the specification of the required input and output parameters.
For now, only register based parameters are supported (A, X, Y and paired registers,
the carry status bit SC and the interrupt disable bit SI as specials).
For subroutine return values, the special SZ register is also available; it stands for the zero status bit.


Calling a subroutine
^^^^^^^^^^^^^^^^^^^^

The output variables must occur in the correct sequence of return registers as specified
in the subroutine's definition. It is possible to not specify any of them, but the compiler
will then issue a warning that the result values of the subroutine call are discarded.
If you don't have a variable to store the output register in, it's then required
to list the register itself instead as output variable.

Arguments should match the subroutine definition. You are allowed to omit the parameter names.
If no definition is available (because you're directly calling memory or a label or something else),
you can freely add arguments (but in this case they all have to be named).

To jump to a subroutine (without returning), prefix the subroutine call with the word 'goto'.
Unlike gotos in other languages, here it takes arguments as well, because it
essentially is the same as calling a subroutine and only doing something different when it's finished.

**Register preserving calls:** use the ``!`` followed by a combination of A, X and Y (or followed
by nothing, which is the same as AXY) to tell the compiler you want to preserve the original
value of the given registers after the subroutine call.  Otherwise, the subroutine may just
as well clobber all three registers. Preserving the original values does result in some
stack manipulation code being inserted for every call like this, which can be quite slow.


Built-in Functions
------------------

The compiler has the following built-in functions that you can use in expressions:

sin(value)
	Sine.

cos(value)
	Cosine.

abs(value)
	Absolute value.

acos(value)
	Arccosine.

asin(value)
	Arcsine.

tan(value)
	Tangent.

atan(value)
	Arctangent.

log(value)
	Natural logarithm.

log10(value)
	Base-10 logarithm.

sqrt(value)
	Square root.

max(value [, value, ...])
	Maximum of the values.

min(value [, value, ...])
	Minimum of the values.

round(value)
	Rounds the floating point value to an integer.

rad(value)
	Degrees to radians.

deg(value)
	Radians to degrees.
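
For readers who want to sanity-check constant expressions by hand, the built-ins listed above have close counterparts in
Python's standard library. The table below is a sketch, not a specification: the mapping is an assumption based on the
function descriptions, and details such as how ``round`` treats values exactly halfway between two integers may differ in IL65.

```python
import math

# Hypothetical mapping from the built-in names above to Python callables.
BUILTINS = {
    "sin": math.sin,
    "cos": math.cos,
    "abs": abs,
    "acos": math.acos,
    "asin": math.asin,
    "tan": math.tan,
    "atan": math.atan,
    "log": math.log,        # natural logarithm
    "log10": math.log10,
    "sqrt": math.sqrt,
    "max": lambda *values: max(values),  # accepts one or more values
    "min": lambda *values: min(values),
    "round": round,
    "rad": math.radians,    # degrees to radians
    "deg": math.degrees,    # radians to degrees
}
```

For example, ``BUILTINS["rad"](180)`` is pi (within floating point precision) and ``BUILTINS["max"](3, 7, 5)`` is 7.
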