diff --git a/README.md b/README.md index 034e7c715..715b91fab 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,37 @@ -# IL65 - codename 'Sick' +IL65 / 'Sick' - Experimental Programming Language for 8-bit 6502/6510 microprocessors +===================================================================================== -Intermediate Language for the 8-bit 6502/6510 microprocessors. -Mainly targeted at the Commodore-64, but should be system independent. +*Written by Irmen de Jong (irmen@razorvine.net)* -Work in progress. +*Software license: GNU GPL 3.0, see LICENSE* + + +This is an experimental programming language for the 8-bit 6502/6510 microprocessor from the late 1970's and 1980's +as used in many home computers from that era. IL65 is a medium to low level programming language, +which aims to provide many conveniences over raw assembly code (even when using a macro assembler): + +- reduction of source code length +- easier program understanding (because it's higher level, and more terse) +- modularity, symbol scoping, subroutines +- subroutines have enforced input- and output parameter definitions +- automatic variable allocations +- various data types other than just bytes +- automatic type conversions +- floating point operations +- automatically preserving and restoring CPU registers state, when calling routines that otherwise would clobber these +- abstracting away low level aspects such as zero page handling, program startup, explicit memory addresses +- @todo: conditionals and loops +- @todo: memory block operations + +It still allows for low level programming however and inline assembly blocks +to write performance critical pieces of code, but otherwise compiles fairly straightforwardly +into 6502 assembly code. This resulting code is assembled into a binary program by using +an external macro assembler, [64tass](https://sourceforge.net/projects/tass64/). 
+It can be compiled pretty easily for various platforms (Linux, Mac OS, Windows) or just ask me +to provide a small precompiled executable if you need that. +You need [Python 3.5](https://www.python.org/downloads/) or newer to run IL65 itself. + +IL65 is mainly targeted at the Commodore-64 machine, but should be mostly system independent. + + +See [the reference document](reference.md) for detailed information. diff --git a/il65/__init__.py b/il65/__init__.py index f688bb4c8..04f92b363 100644 --- a/il65/__init__.py +++ b/il65/__init__.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors Written by Irmen de Jong (irmen@razorvine.net) License: GNU GPL 3.0, see LICENSE diff --git a/il65/__main__.py b/il65/__main__.py index 521cc308c..427977c50 100644 --- a/il65/__main__.py +++ b/il65/__main__.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors Written by Irmen de Jong (irmen@razorvine.net) License: GNU GPL 3.0, see LICENSE diff --git a/il65/astparse.py b/il65/astparse.py index 45734165d..ce2b44194 100644 --- a/il65/astparse.py +++ b/il65/astparse.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors This is the expression parser/evaluator. Written by Irmen de Jong (irmen@razorvine.net) diff --git a/il65/codegen.py b/il65/codegen.py index 3a903cd85..a4ce113ac 100644 --- a/il65/codegen.py +++ b/il65/codegen.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors, codename 'Sick' +Programming Language for 6502/6510 microprocessors, codename 'Sick' This is the assembly code generator (from the parse tree) Written by Irmen de Jong (irmen@razorvine.net) diff --git a/il65/main.py b/il65/main.py index 81db760fc..acd2ab9a7 100644 --- a/il65/main.py +++ b/il65/main.py @@ -1,7 +1,7 @@ #! 
/usr/bin/env python3 """ -Intermediate Language for 6502/6510 microprocessors, codename 'Sick' +Programming Language for 6502/6510 microprocessors, codename 'Sick' This is the main program that drives the rest. Written by Irmen de Jong (irmen@razorvine.net) diff --git a/il65/parse.py b/il65/parse.py index 905cba5bd..e15b9ab16 100644 --- a/il65/parse.py +++ b/il65/parse.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors This is the parser of the IL65 code, that generates a parse tree. Written by Irmen de Jong (irmen@razorvine.net) diff --git a/il65/preprocess.py b/il65/preprocess.py index a8d301073..fad782ee6 100644 --- a/il65/preprocess.py +++ b/il65/preprocess.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors This is the preprocessing parser of the IL65 code, that only generates a symbol table. Written by Irmen de Jong (irmen@razorvine.net) diff --git a/il65/symbols.py b/il65/symbols.py index f2fb2c354..116d3bac5 100644 --- a/il65/symbols.py +++ b/il65/symbols.py @@ -1,5 +1,5 @@ """ -Intermediate Language for 6502/6510 microprocessors +Programming Language for 6502/6510 microprocessors Here are the symbol (name) operations such as lookups, datatype definitions. 
Written by Irmen de Jong (irmen@razorvine.net) diff --git a/reference.txt b/reference.md similarity index 51% rename from reference.txt rename to reference.md index 6c931a88d..bf9ade286 100644 --- a/reference.txt +++ b/reference.md @@ -1,102 +1,182 @@ ------------------------------------------------------------- -il65 - "Intermediate Language for 6502/6510 microprocessors" ------------------------------------------------------------- - Written by Irmen de Jong (irmen@razorvine.net) - License: GNU GPL 3.0, see LICENSE ------------------------------------------------------------- +IL65 / 'Sick' - Experimental Programming Language for 8-bit 6502/6510 microprocessors +===================================================================================== + +*Written by Irmen de Jong (irmen@razorvine.net)* + +*Software license: GNU GPL 3.0, see LICENSE* -The python program parses it and generates 6502 assembler code. -It uses the 64tass macro cross assembler to assemble it into binary files. +This is an experimental programming language for the 8-bit 6502/6510 microprocessor from the late 1970's and 1980's +as used in many home computers from that era. 
IL65 is a medium to low level programming language, +which aims to provide many conveniences over raw assembly code (even when using a macro assembler): +- reduction of source code length +- easier program understanding (because it's higher level, and more terse) +- modularity, symbol scoping, subroutines +- subroutines have enforced input- and output parameter definitions +- automatic variable allocations +- various data types other than just bytes +- automatic type conversions +- floating point operations +- automatically preserving and restoring CPU registers state, when calling routines that otherwise would clobber these +- abstracting away low level aspects such as zero page handling, program startup, explicit memory addresses +- @todo: conditionals and loops +- @todo: memory block operations + +It still allows for low level programming however and inline assembly blocks +to write performance critical pieces of code, but otherwise compiles fairly straightforwardly +into 6502 assembly code. This resulting code is assembled into a binary program by using +an external macro assembler, [64tass](https://sourceforge.net/projects/tass64/). +It can be compiled pretty easily for various platforms (Linux, Mac OS, Windows) or just ask me +to provide a small precompiled executable if you need that. +You need [Python 3.5](https://www.python.org/downloads/) or newer to run IL65 itself. + +IL65 is mainly targeted at the Commodore-64 machine, but should be mostly system independent. MEMORY MODEL ------------ -Zero page: $00 - $ff -Hardware stack: $100 - $1ff -Free RAM/ROM: $0200 - $ffff +Most of the 64 kilobyte address space can be accessed by your program. 
-Reserved: - -data direction $00 -bank select $01 -NMI VECTOR $fffa -RESET VECTOR $fffc -IRQ VECTOR $fffe - -A particular 6502/6510 machine such as the Commodore-64 will have many other -special addresses due to: - - ROMs installed in the machine (basic, kernel and character generator roms) - - memory-mapped I/O registers (for the video and sound chip for example) - - RAM areas used for screen graphics and sprite data. +| type | memory area | note | +|-----------------|-------------------------|-----------------------------------------------------------------| +| Zero page | ``$00`` - ``$ff`` | contains many sensitive system variables | +| Hardware stack | ``$100`` - ``$1ff`` | is used by the CPU and should normally not be accessed directly | +| Free RAM or ROM | ``$0200`` - ``$ffff`` | free to use memory area, often a mix of RAM and ROM | -Usable Hardware registers: - A, X, Y, - AX, AY, XY (16-bit combined register pairs) - SC (status register Carry flag) - These cannot occur as variable names - they will always refer to the hardware registers. +A few memory addresses are reserved and cannot be used freely by your own code, +because they have a special hardware function, or are reserved for internal use by the compiler: + +| reserved | address | +|----------------|------------| +| data direction | ``$00`` | +| bank select | ``$01`` | +| scratch var #1 | ``$02`` | +| scratch var #2 | ``$03`` | +| NMI vector | ``$fffa`` | +| RESET vector | ``$fffc`` | +| IRQ vector | ``$fffe`` | + +A particular 6502/6510 machine such as the Commodore-64 will have many other special addresses due to: + +- ROMs installed in the machine (BASIC, kernel and character generator roms) +- memory-mapped I/O registers (for the video and sound chip for example) +- RAM areas used for screen graphics and sprite data. -The zero page locations $02-$ff can be regarded as 254 other registers. 
-Free zero page addresses on the C-64: - $02,$03 # reserved as scratch addresses - $04,$05 - $06 - $0a - $2a - $52 - $93 - $f7,$f8 - $f9,$fa - $fb,$fc - $fd,$fe +### Usable Hardware Registers +The following 6502 hardware registers are directly accessible in your code (and are reserved symbols): + +- ``A``, ``X``, ``Y`` +- ``AX``, ``AY``, ``XY`` (surrogate registers: 16-bit combined register pairs in LSB byte order lo/hi) +- ``SC`` (status register's Carry flag) + + +### Zero Page ("ZP") + +The zero page locations ``$02`` - ``$ff`` can be regarded as 254 other registers because +they take fewer clock cycles to access and need fewer instruction bytes than access to other memory locations. +Theoretically you can use all of them in your program but there are a few limitations: +- ``$02`` and ``$03`` are reserved for internal use as scratch registers by IL65 +- most other addresses often are in use by the machine's operating system or kernal, + and overwriting them can crash the machine. Your program must take over the entire + system to be able to safely use all zero page locations. +- it's often more convenient to let IL65 allocate the particular locations for you and just + use symbolic names in your code. + +For the Commodore-64 here is a list of free-to-use zero page locations even when its BASIC and KERNAL are active: + +``$02`` - ``$03`` (but see remark above); ``$04`` - ``$05``; ``$06``; +``$0a``; ``$2a``; ``$52``; ``$93``; +``$f7`` - ``$f8``; ``$f9`` - ``$fa``; ``$fb`` - ``$fc``; ``$fd`` - ``$fe`` + +IL65 knows about all this: it will use the above zero page locations to place its ZP variables in, +until they're all used up. You can instruct it to treat your program as taking over the entire +machine, in which case all of the zero page locations are suddenly available for variables. 
+IL65 can generate a special routine that saves and restores the zero page to let your program run
+and return safely back to the system afterwards - you don't have to take care of that yourself.
+
+
+DATA TYPES
+----------
+
+IL65 supports the following data types:
+
+| type | size | type identifier | example |
+|--------------------|------------|-----------------|---------------------------------------------------|
+| (unsigned) byte | 8 bits | ``.byte`` | ``$8f`` |
+| (unsigned) integer | 16 bits | ``.word`` | ``$8fee`` |
+| boolean | 1 byte | | ``true``, ``false`` (aliases for the numeric values 1 and 0) |
+| character | 1 byte | | ``'@'`` (converted to a numeric byte) |
+| floating-point | 40 bits | ``.float`` | ``1.2345`` (stored in 5-byte cbm MFLPT format) |
+| string | variable | ``.text``, ``.stext`` | ``"hello."`` (implicitly terminated by a 0-byte) |
+| pascal-string | variable | ``.ptext``, ``.pstext`` | ``"hello."`` (implicit first byte = length, no 0-byte) |
+| address-of | 16 bits | | ``#variable`` |
+| indirect | variable | | ``[ address ]`` |
+
+Strings can be written in your code as CBM PETSCII or as C-64 screencode variants;
+these will be translated by the compiler. PETSCII is the default; if you need screencodes you
+have to use the ``s`` variants of the type identifier.
+
+For many floating point operations, the compiler has to use routines in the C-64 BASIC and KERNAL ROMs.
+So they will only work if the BASIC ROM (and KERNAL ROM) are banked in, and your code imports the ``c64lib.ill``.
+
+The largest 5-byte MFLPT float that can be stored is: 1.7014118345e+38 (negative: -1.7014118345e+38)
+
+The ``#`` prefix is used to take the address of something. This is sometimes useful,
+for instance when you want to manipulate the *address* of a memory mapped variable rather than
+the value it represents. 
You can take the address of a string as well, but the compiler already +treats those as a value that you manipulate via its address, so the ``#`` is ignored here. +For most other types this prefix is not supported. + +**Indirect addressing:** The ``[address]`` syntax means: the contents of the memory at address, or "indirect addressing". +By default, if not otherwise known, a single byte is assumed. You can add the ``.byte`` or ``.word`` or ``.float`` +type identifier suffix to make it clear what data type the address points to. +This addressing mode is only supported for constant (integer) addresses and not for variable types, +unless it is part of a subroutine call statement. For an indirect goto call, the 6502 CPU has a special opcode +(``jmp`` indirect) and an indirect subroutine call (``jsr`` indirect) is synthesized using a couple of instructions. PROGRAM STRUCTURE ----------------- +In IL65 every line in the source file can only contain *one* statement or declaration. +Compilation is done on *one* main source code file, but other files can be imported. + +### Comments -OUTPUT MODES: -------------- -output raw ; no load address bytes -output prg ; include the first two load address bytes, (default is $0801), no basic program -output prg,sys ; include the first two load address bytes, basic start program with sys call to code, default code start - ; immediately after the basic program at $081d, or beyond. - -address $0801 ; override program start address (default is set to $c000 for raw mode and $0801 for c-64 prg mode) - ; cannot be used if output mode is prg,sys because basic programs always have to start at $0801 +Everything after a semicolon '``;``' is a comment and is ignored. +If the comment is the only thing on the line, it is copied into the resulting assembly source code. +This makes it easier to understand and relate the generated code. 
-data types: - byte 8 bits $8f (unsigned, @todo signed bytes) - int 16 bits $8fee (unsigned, @todo signed ints) - bool true/false (aliases for the integer values 1 and 0, not a true datatype by itself) - char '@' (converted to a byte) - float 40 bits 1.2345 (stored in 5-byte cbm MFLPT format) - @todo 24 and 32 bits integers, unsigned and signed? - string 0-terminated sequence of bytes "hello." (implicit 0-termination byte) - pstring sequence of bytes where first byte is the length. (no 0-termination byte) - For strings, both petscii and screencode variants can be written in source, they will be translated at compile/assembler time. +### Output Modes + +The default format of the generated program is a "raw" binary where code starts at ``$c000``. +You can generate other types of programs as well, by telling IL65 via an output mode statement +at the beginning of your program: + +| mode declaration | meaning | +|--------------------|------------------------------------------------------------------------------------| +| ``output raw`` | no load address bytes | +| ``output prg`` | include the first two load address bytes, (default is ``$0801``), no BASIC program | +| ``output prg,sys`` | include the first two load address bytes, BASIC start program with sys call to code, default code start is immediately after the BASIC program at ``$081d``, or beyond | +| | | +| ``address $0801`` | override program start address (default is set to ``$c000`` for raw mode and ``$0801`` for C-64 prg mode). Cannot be used if output mode is ``prg,sys`` because BASIC programs always have to start at ``$0801``. | - Note: for many floating point operations, the compiler uses routines in the C64 BASIC and KERNAL ROMs. - So they will only work if the BASIC ROM (and KERNAL ROM) are banked in. - largest 5-byte MFLPT float: 1.7014118345e+38 (negative: -1.7014118345e+38) +### Program Entry Point +Every program has to have one entry point where code execution begins. 
+The compiler looks for the ``start`` label in the ``main`` block for this. +For proper program termination, this block has to end with a ``return`` statement (or a ``goto`` call). +Blocks and other details are described below. + - Note: with the # prefix you can take the address of something. This is sometimes useful, - for instance when you want to manipulate the ADDRESS of a memory mapped variable rather than - the value it represents. You can take the address of a string as well, but the compiler already - treats those as a value that you manipulate via its address, so the # is ignored here. - - - -BLOCKS ------- +### Blocks ~ blockname [address] { statements @@ -112,8 +192,7 @@ You can omit the blockname but then you can only refer to the contents of the bl which is required in this case. If you omit both, the block is ignored altogether (and a warning is displayed). -IMPORTING, INCLUDING and BINARY-INCLUDING files ------------------------------------------------ +### Importing, Including and Binary-Including Files import "filename[.ill]" Can only be used outside of a block (usually at the top of your file). @@ -132,8 +211,8 @@ asmbinary "filename.bin" [, [, ]] -ASSIGNMENTS ------------ +### Assignments + Assignment statements assign a single value to one or more variables or memory locations. If you know that you have to assign the same value to more than one thing at once, it is more efficient to write it as a multi-assign instead of several separate assignments. The compiler @@ -145,8 +224,7 @@ tries to detect this situation however and optimize it itself if it finds the ca -EXPRESSIONS ------------ +### Expressions In most places where a number or other value is expected, you can use just the number, or a full constant expression. The expression is parsed and evaluated by Python itself at compile time, and the (constant) resulting value is used in its place. 
@@ -156,20 +234,9 @@ Expressions can contain function calls to the math library (sin, cos, etc) and y all builtin functions (max, avg, min, sum etc). They can also reference identifiers defined elsewhere in your code, if this makes sense. -The syntax "[address]" means: the contents of the memory at address, or "indirect addressing". -By default, if not otherwise known, a single byte is assumed. You can add the ".byte" or ".word" or ".float" suffix -to make it clear what data type the address points to. -This addressing mode is only supported for constant (integer) addresses and not for variable types, -unless it is part of a subroutine call statement. For an indirect goto call, the 6502 CPU has a special opcode -(JMP indirect) and an indirect subroutine call (JSR indirect) is synthesized using a couple of instructions. - -Everything after a semicolon ';' is a comment and is ignored, however the comment (if it is the only thing -on the line) is copied into the resulting assembly source code. - -SUBROUTINES DEFINITIONS ------------------------ +### Subroutine Definition Subroutines are parts of the code that can be repeatedly invoked using a subroutine call from elsewhere. Their definition, using the sub statement, includes the specification of the required input- and output parameters. @@ -197,8 +264,7 @@ but instead assign a memory address to it: example: "sub CLOSE (logical: A) -> (A?, X?, Y?) = $FFC3" -SUBROUTINE CALLS ----------------- +### Subroutine Calling You call a subroutine like this: subroutinename_or_address [!] ( [arguments...] ) @@ -219,9 +285,10 @@ essentially is the same as calling a subroutine and only doing something differe @todo support assigning call return values (so that you can assign these to other variables, and allows the subroutine call be an actual expression) +TODOS +----- -FLOW CONTROL ------------- +### Flow Control Required building blocks: additional forms of 'go' statement: including an if clause, comparison statement. 
@@ -245,8 +312,8 @@ Required building blocks: additional forms of 'go' statement: including an if cl -IF_XX: ------- +### IF_XX: + if[_XX] [] { ... } @@ -272,8 +339,7 @@ il65_if_999 ... (true part) il65_if_999_end ; code continues after this -IF X Y: ------------------------ +### IF X Y: ==> DESUGARING ==> compare X, Y @@ -283,9 +349,9 @@ IF X Y: +### While + -WHILE: ------- while[_XX] { ... continue @@ -304,9 +370,7 @@ il65_while_999_check il65_while_999_end ; code continues after this - -REPEAT: ------- +### Repeat repeat { ... @@ -324,9 +388,7 @@ il65_repeat_999 il65_repeat_999_end ; code continues after this - -FOR: ----- +### For for = to [step ] { ... @@ -357,8 +419,7 @@ il65_for_999_end ; code continues after this -MACROS ------- +### Macros @todo macros are meta-code (written in Python syntax) that actually runs in a preprocessing step during the compilation, and produces output value that is then replaced on that point in the input source. @@ -367,8 +428,7 @@ Allows us to create pre calculated sine tables and such. Something like: var .array sinetable ``[sin(x) * 10 for x in range(100)]`` -MEMORY BLOCK OPERATIONS ------------------------ +### Memory Block Operations @todo matrix,list,string memory block operations: - matrix type operations (whole matrix, per row, per column, individual row/column) @@ -389,16 +449,20 @@ these should call (or emit inline) optimized pieces of assembly code, so they ru -REGISTER PRESERVATION BLOCK: @todo (no)preserve ----------------------------- +### Register Preservation Block - preserve [regs] { .... } adds register preservation around the containing code default = all 3 regs, or specify which. - nopreserve [regs] { .... } removes register preservation on all statements in the block that would otherwise have it. +preserve [regs] { .... } adds register preservation around the containing code default = all 3 regs, or specify which. +nopreserve [regs] { .... 
} removes register preservation on all statements in the block that would otherwise have it. +### Bitmap Definition (for Sprites and Characters) -@todo BITMAP DEFINITIONS: to define CHARACTERS (8x8 monochrome or 4x8 multicolor = 8 bytes) --> PLACE in memory on correct address (???k aligned) and SPRITES (24x21 monochrome or 12x21 multicolor = 63 bytes) --> PLACE in memory on correct address (base+sprite pointer, 64-byte aligned) + + +### More Datatypes + +@todo 24 and 32 bits integers, unsigned and signed?
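
Note on the 5-byte MFLPT format that the DATA TYPES section of `reference.md` refers to: the diff does not describe the byte layout. The sketch below assumes the commonly documented Commodore layout, which is one exponent byte biased by 128 followed by a 32-bit big-endian mantissa whose implied top bit is replaced by the sign bit. The names `to_mflpt5` and `from_mflpt5` are hypothetical and chosen for illustration. This is not presented as IL65's own conversion code.

```python
import math


def to_mflpt5(number: float) -> bytes:
    """Pack a float into 5 bytes (assumed layout: exponent byte + 4 mantissa bytes)."""
    number = float(number)
    if not math.isfinite(number):
        raise ValueError("cannot encode NaN or infinity")
    if number == 0.0:
        return bytes(5)                      # zero is stored as five 0-bytes
    sign = 0x80000000 if number < 0.0 else 0
    # abs(number) == mant * 2**exp with 0.5 <= mant < 1
    mant, exp = math.frexp(abs(number))
    exp += 128                               # exponent bias (assumption, see lead-in)
    if exp < 1:
        return bytes(5)                      # underflow: flush to zero
    if exp > 255:
        raise OverflowError("number out of 5-byte MFLPT range")
    # The mantissa's top bit is always 1, so it is dropped and reused as the sign bit.
    mantissa = sign | (int(mant * 0x100000000) & 0x7FFFFFFF)
    return bytes([exp]) + mantissa.to_bytes(4, "big")


def from_mflpt5(data: bytes) -> float:
    """Unpack 5 bytes produced by to_mflpt5 back into a Python float."""
    if len(data) != 5:
        raise ValueError("expected exactly 5 bytes")
    if data[0] == 0:
        return 0.0
    raw = int.from_bytes(data[1:5], "big")
    sign = -1.0 if raw & 0x80000000 else 1.0
    mant = (raw | 0x80000000) / 0x100000000  # restore the implied top bit
    return sign * math.ldexp(mant, data[0] - 128)


if __name__ == "__main__":
    for value in (1.0, -1.0, 10.0):
        print(value, to_mflpt5(value).hex(" "))
```

Under these assumptions the largest encodable pattern, `ff 7f ff ff ff`, decodes to roughly 1.70141e+38, which is in line with the maximum quoted in the reference. Values are truncated rather than rounded to 32 mantissa bits, so a round trip is exact only for numbers that fit in that precision.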