diff --git a/.gitignore b/.gitignore index 95617e58a..e1e3951d5 100644 --- a/.gitignore +++ b/.gitignore @@ -7,13 +7,12 @@ /build/ /dist/ /output/ -.cache/ +.*cache/ .eggs/ *.directory *.prg *.asm *.labels.txt -.mypy_cache/ __pycache__/ parser.out parsetab.py diff --git a/LICENSE b/LICENSE index 94a9ed024..cda76ebf4 100644 --- a/LICENSE +++ b/LICENSE @@ -127,7 +127,7 @@ Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component -(kernel, window system, and so on) of the specific operating system +(kernal, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. diff --git a/README.md b/README.md index a9760f60b..9fdc6a1df 100644 --- a/README.md +++ b/README.md @@ -21,13 +21,12 @@ which aims to provide many conveniences over raw assembly code (even when using - automatic type conversions - floating point operations - optional automatic preserving and restoring CPU registers state, when calling routines that otherwise would clobber these -- abstracting away low level aspects such as zero page handling, program startup, explicit memory addresses +- abstracting away low level aspects such as ZeroPage handling, program startup, explicit memory addresses - breakpoints, that let the Vice emulator drop into the monitor if execution hits them - source code labels automatically loaded in Vice emulator so it can show them in disassembly - conditional gotos - various code optimizations (code structure, logical and numerical expressions, ...) 
-- @todo: loops -- @todo: memory block operations + It still allows for low level programming however and inline assembly blocks to write performance critical pieces of code, but otherwise compiles fairly straightforwardly diff --git a/reference.md b/docs/programming.md similarity index 54% rename from reference.md rename to docs/programming.md index 59f7070ab..b786df6d2 100644 --- a/reference.md +++ b/docs/programming.md @@ -1,133 +1,171 @@ -IL65 / 'Sick' - Experimental Programming Language for 8-bit 6502/6510 microprocessors -===================================================================================== +What is a Program? +------------------ -*Written by Irmen de Jong (irmen@razorvine.net)* +A "complete program" is a compiled, assembled, and linked together single unit. +It contains all of the program's code and data and has a certain file format that +allows it to be loaded directly on the target system. -*Software license: GNU GPL 3.0, see file LICENSE* +Most programs will need a tiny BASIC launcher that does a SYS into the generated machine code, +but it is also possible to output just binary programs that can be loaded into memory elsewhere. + +Compiling a program +------------------- + +Compilation of a program is done by compiling just the main source code module file. +Other modules that the code needs can be imported from within the file. +The compiler will eventually link them together into one output program. + +Program code structure +---------------------- + +A program is created by compiling and linking *one or more module source code files*. + +### Module file + +This is a file with the ``.ill`` suffix, without spaces in its name, containing: +- source code comments +- global program options +- imports of other modules +- one or more *code blocks* + +The filename doesn't really matter as long as it doesn't contain spaces. +The full name of the symbols defined in the file is not impacted by the filename. 
+ +#### Source code comments + + A=5 ; set the initial value to 5 + ; next is the code that... + +In any file, everything after a semicolon '``;``' is considered a comment and is ignored by the compiler. +If all of the line is just a comment, it will be copied into the resulting assembly source code. +This makes it easier to understand and relate the generated code. -This is an experimental programming language for the 8-bit 6502/6510 microprocessor from the late 1970's and 1980's -as used in many home computers from that era. IL65 is a medium to low level programming language, -which aims to provide many conveniences over raw assembly code (even when using a macro assembler): +#### Things global to the program -- reduction of source code length -- easier program understanding (because it's higher level, and more terse) -- option to automatically run the compiled program in the Vice emulator -- modularity, symbol scoping, subroutines -- subroutines have enforced input- and output parameter definitions -- various data types other than just bytes (16-bit words, floats, strings, 16-bit register pairs) -- automatic variable allocations, automatic string variables and string sharing -- constant folding in expressions (compile-time evaluation) -- automatic type conversions -- floating point operations -- optional automatic preserving and restoring CPU registers state, when calling routines that otherwise would clobber these -- abstracting away low level aspects such as zero page handling, program startup, explicit memory addresses -- breakpoints, that let the Vice emulator drop into the monitor if execution hits them -- source code labels automatically loaded in Vice emulator so it can show them in disassembly -- conditional gotos -- various code optimizations (code structure, logical and numerical expressions, ...) 
-- @todo: loops -- @todo: memory block operations +The global program options that can be put at the top of a module file, +determine the settings for the entire output program. +They're all optional (defaults will be chosen as mentioned below). +If specified though, they can only occur once in the entire program: + + %output prg + %address $0801 + %launcher none + %zp compatible -It still allows for low level programming however and inline assembly blocks -to write performance critical pieces of code, but otherwise compiles fairly straightforwardly -into 6502 assembly code. This resulting code is assembled into a binary program by using -an external macro assembler, [64tass](https://sourceforge.net/projects/tass64/). -It can be compiled pretty easily for various platforms (Linux, Mac OS, Windows) or just ask me -to provide a small precompiled executable if you need that. -You need [Python 3.5](https://www.python.org/downloads/) or newer to run IL65 itself. +##### ``%output`` : select output format of the program +- ``raw`` : no header at all, just the raw machine code data +- ``prg`` : C64 program (with load address header) -IL65 is mainly targeted at the Commodore-64 machine, but should be mostly system independent. +The default is ``prg``. -Memory Model ------------- +##### ``%address`` : specify start address of the code -Most of the 64 kilobyte address space can be accessed by your program. 
- -| type | memory area | note | -|-----------------|-------------------------|-----------------------------------------------------------------| -| Zero page | ``$00`` - ``$ff`` | contains many sensitive system variables | -| Hardware stack | ``$100`` - ``$1ff`` | is used by the CPU and should normally not be accessed directly | -| Free RAM or ROM | ``$0200`` - ``$ffff`` | free to use memory area, often a mix of RAM and ROM | +- default for ``raw`` output is $c000 +- default for ``prg`` output is $0801 +- cannot be changed if you select ``prg`` with a ``basic`` launcher; + then it is always $081d (immediately after the BASIC program), and the BASIC program itself is always at $0801. + This is because the C64 expects BASIC programs to start at this address. -A few memory addresses are reserved and cannot be used freely by your own code, -because they have a special hardware function, or are reserved for internal use by the compiler: +##### ``%launcher`` : specify launcher type -| reserved | address | -|----------------|------------| -| data direction | ``$00`` | -| bank select | ``$01`` | -| scratch var #1 | ``$02`` | -| scratch var #2 | ``$03`` | -| NMI vector | ``$fffa`` | -| RESET vector | ``$fffc`` | -| IRQ vector | ``$fffe`` | - -A particular 6502/6510 machine such as the Commodore-64 will have many other special addresses due to: - -- ROMs installed in the machine (BASIC, kernel and character generator roms) -- memory-mapped I/O registers (for the video and sound chip for example) -- RAM areas used for screen graphics and sprite data. +Only relevant when using the ``prg`` output type. Defaults to ``basic``. 
+- ``basic`` : add a tiny C64 BASIC program, with a SYS statement calling into the machine code +- ``none`` : no launcher logic is added at all -### Usable Hardware Registers +##### ``%zp`` : select ZeroPage behavior -The following 6502 hardware registers are directly accessible in your code (and are reserved symbols): - -- ``A``, ``X``, ``Y`` -- ``AX``, ``AY``, ``XY`` (surrogate registers: 16-bit combined register pairs in LSB byte order lo/hi) -- ``SC`` (status register's Carry flag) -- ``SI`` (status register's Interrupt Disable flag) +- ``compatible`` : only use a few free locations in the ZP +- ``full`` : use the whole ZP for variables, makes the program faster but can't use BASIC or KERNAL routines anymore, and cannot exit cleanly +- ``full-restore`` : like ``full``, but makes a backup copy of the original values at program start. + These are restored when exiting the program back to the BASIC prompt + +Defaults to ``compatible``. +The exact meaning of these options can be found in the paragraph +about the ZeroPage in the system documentation. -### Zero Page ("ZP") +##### Program Start and Entry Point -The zero page locations ``$02`` - ``$ff`` can be regarded as 254 other registers because -they take less clock cycles to access and need fewer instruction bytes than access to other memory locations. -Theoretically you can use all of them in your program but there are a few limitations: -- several locations (``$02``, ``$03``, ``$fb - $fc``, ``$fd - $fe``) are reserved for internal use as scratch registers by IL65 -- most other addresses often are in use by the machine's operating system or kernal, - and overwriting them can crash the machine. Your program must take over the entire - system to be able to safely use all zero page locations. -- it's often more convenient to let IL65 allocate the particular locations for you and just - use symbolic names in your code. +Your program must have a single entry point where code execution begins. 
+The compiler expects a ``start`` subroutine in the ``main`` block for this, +taking no parameters and having no return value. +As any subroutine, it has to end with a ``return`` statement (or a ``goto`` call). -For the Commodore-64 here is a list of free-to-use zero page locations even when its BASIC and KERNAL are active: + ~ main { + sub start () -> () { + ; program entrypoint code here + return + } + } -``$02``; ``$03``; ``$04``; ``$05``; ``$06``; ``$2a``; ``$52``; -``$f7`` - ``$f8``; ``$f9`` - ``$fa``; ``$fb`` - ``$fc``; ``$fd`` - ``$fe`` +The ``main`` module is always relocated to the start of your programs +address space, and the ``start`` subroutine (the entrypoint) will be on the +first address. This will also be the address that the BASIC loader program (if generated) +calls with the SYS statement. -The four reserved locations mentioned above are subtracted from this set, leaving you with -five 1-byte and two 2-byte usable zero page registers. -IL65 knows about all this: it will use the above zero page locations to place its ZP variables in, -until they're all used up. You can instruct it to treat your program as taking over the entire -machine, in which case (almost) all of the zero page locations are suddenly available for variables. -IL65 can generate a special routine that saves and restores the zero page to let your program run -and return safely back to the system afterwards - you don't have to take care of that yourself. - -**IRQ and the Zero page:** - -The normal IRQ routine in the C-64's kernal will read and write several locations in the zero page: - -``$a0 - $a2``; ``$91``; ``$c0``; ``$c5``; ``$cb``; ``$f5 - $f6`` - -These locations will not be used by the compiler for zero page variables, so your variables will -not interfere with the IRQ routine and vice versa. This is true for the normal zp mode but also -for the mode where the whole zp has been taken over. 
So the normal IRQ vector is still -running when your program is entered, even when you use ``%zp clobber``. +Blocks and subroutines are explained below. -@todo: some global way (in ZP block) to promote certian other blocks/variables from that block or even -subroutine to the zeropage. Don't do this in the block itself because it's a global optimization -and if blocks require it themselves you can't combine various modules anymore once ZP runs out. - +#### Using other modules via import + +Immediately following the global program options at the top of the module file, +the imports of other modules are placed: + +``%import filename`` + +This reads and compiles the named module source file as part of your current program. +Symbols from the imported module become available in your code, +without a module or filename prefix. +You can import modules one at a time, and importing a module more than once has no effect. -Data Types ----------- +#### Blocks, Scopes, and accessing Symbols + +Blocks are the separate pieces of code and data of your program. They are combined +into a single output program. No code or data can occur outside a block. + + ~ blockname [address] { + [directives...] + [variables...] + [subroutines...] + } + +Block names must be unique in your entire program. +It's possible to omit the blockname, but then you can only refer to the contents of the block via its absolute address, +which is required in this case. If you omit *both* name and address, the block is *ignored* by the compiler (and a warning is displayed). +This is a way to quickly "comment out" a piece of code that is unfinished or may contain errors that you +want to work on later, because the contents of the ignored block are not fully parsed either. + +The address can be used to place a block at a specific location in memory. +Otherwise the compiler will automatically choose the location (usually immediately after +the previous block in memory). 
+The address must be >= $0200 (because $00-$ff is the ZP and $100-$1ff is the cpu stack). + +A block is also a *scope* in your program so the symbols in the block don't clash with +symbols of the same name defined elsewhere in the same file or in another file. +You can refer to the symbols in a particular block by using a *dotted name*: ``blockname.symbolname``. +Labels inside a subroutine are appended again to that; ``blockname.subroutinename.label``. + +Every symbol is 'public' and can be accessed from elsewhere given its dotted name. + + +**The special "ZP" ZeroPage block** + +Blocks named "ZP" are treated a bit differently: they refer to the ZeroPage. +The contents of every block with that name (this one may occur multiple times) are merged into one. +Its start address is always set to $04, because $00/$01 are used by the hardware +and $02/$03 are reserved as general purpose scratch registers. + + +Code elements +------------- + +### Data types for Variables and Values IL65 supports the following data types: @@ -162,9 +200,17 @@ routines in the C-64 BASIC and KERNAL ROMs are used. So floating point operations will only work if the C-64 BASIC ROM (and KERNAL ROM) are banked in, and your code imports the ``c654lib.ill``. The largest 5-byte MFLPT float that can be stored is: 1.7014118345e+38 (negative: -1.7014118345e+38) +The initial values of your variables will be restored automatically when the program is (re)started, +*except for string variables*. It is assumed these are left unchanged by the program. +If you do modify them in-place, you should take care yourself that they work as +expected when the program is restarted. -Indirect addressing and address-of ----------------------------------- + +@todo pointers/addresses? (as opposed to normal WORDs) +@todo signed integers (byte and word)? + + +### Indirect addressing and address-of **Address-of:** The ``#`` prefix is used to take the address of something. 
This is sometimes useful, @@ -183,68 +229,33 @@ For an indirect goto call, the 6502 CPU has a special instruction using a couple of instructions. -Program Structure ------------------ +### Conditional Execution -In IL65 every line in the source file can only contain *one* statement or definitons. -Compilation is done on *one* main source code file, but other files can be imported. -A source file can start with global *directives* (starting with ``%``) and continues -with imports and block definitions. +Conditional execution means that the flow of execution changes based on certain conditions, +rather than having fixed gotos or subroutine calls. IL65 has a *conditional goto* statement for this, +that is translated into a comparison (if needed) and then a conditional branch instruction: + + if[_XX] [] goto