EightBall is an interpreter and bytecode compiler for a novel structured programming language. It runs on a number of 6502-based vintage systems and may also be compiled as a 32 bit Linux executable. The system also includes a simple line editor and the EightBall Virtual Machine, which runs the bytecode generated by the compiler.
EightBall tries to strike a balance between the following qualities, in 20K or so of 6502 code:
* Statically typed
* Provides facilities which encourage structured programming ...
* ... Yet makes it easy to fiddle with hardware (PEEK and POKE and bit twiddling)
* Keep the language as simple and small as possible ...
* ... While providing powerful language primitives and encapsulation in subroutines, allowing the language to be extended by writing EightBall routines
With some small modifications, the code could also be built for any 6502-based system supported by the `cc65` compiler. For the interpreter/compiler program, upper and lower case text support is required (so Apple II/II+ would need an 80 column card.) The virtual machine program does not necessarily require lower case (if you do not use it in your EightBall code.)
There are executables and disk images available to download for Apple II, Commodore 64 and VIC-20. These may be run on real hardware or one of the many emulators that are available.
To run the main EightBall executable, which includes the line editor, interpreter and bytecode compiler, choose to start `EB.SYSTEM` from within the ProDOS launcher.
If you then invoke the EightBall Virtual Machine `EBVM.SYSTEM`, it will prompt you for the name of the bytecode file to load. Enter `test` at the prompt to run the code you just compiled. The VM is much faster than the interpreter.
The compiled code is written to the file `test` on the floppy diskette containing the EightBall system. (Note that if this file already exists an error will occur. This is a known deficiency which I will address in due course.)
If you then invoke the EightBall Virtual Machine `8BALLVM64.PRG`, it will prompt you for the name of the bytecode file to load. Enter `test` at the prompt to run the code you just compiled. The VM is much faster than the interpreter.
For the Commodore VIC20 (plus 32K expansion RAM), the file `eightball.d64` can be written to a real C1541 floppy, or to a solid state drive such as SD2IEC.
The compiled code is written to the file `bytecode` on the floppy diskette containing the EightBall system. (Note that if this file already exists an error will occur. This is a known deficiency which I will address in due course.)
Here is a simple test program you can enter to play with EightBall when getting started:
```
:i0
byte b=0
for b=1:10
pr.msg "Hello world ..."; pr.dec b; pr.nl
endfor
end
.
```
I have included the line editor commands to begin inserting text `:i0` and to leave the editor and return to the interpreter (a single period on its own.)
You can list the program using the `:l` (the letter Ell, not the number 1!) command and run it using the EightBall interpreter using the `run` command.
In order to build Apple diskette images I use the open source Apple Commander tool. ADTPro is an awesome tool for transferring disk images to a real Apple II via a serial (RS-232) cable.
Then, edit the `Makefile` to adjust the paths to point to your local installation of the cc65 compiler. If you wish to build disk images for Apple and Commodore machines, you will need to adjust the paths to point to your local installation of Apple Commander or VICE (for the `c1541` tool).
```
$ cd EightBall
$ vi Makefile
```
Once you are satisfied with the `Makefile`, building the software is simple:
```
$ make
```
This will build executables for Linux using `gcc` and for 6502 targets using `cc65`. The build targets are as follows:
Note that EightBall scripts on Commodore platforms must be encoded in PETSCII rather than ASCII. `unittest.8bp` is a PETSCII version of `unittest.8b` (created automatically using the Linux `tr` tool - see `Makefile` for details of how this is done.)
Look [here](#commodore-64) for further instructions.
There is a unit test script `unittest.8b` written in EightBall.
It is quite large so it does not load on all 8-bit platforms. Deleting the comments would help! However I usually test using the Linux EightBall environment, so large scripts are less of a problem. Currently the script loads and runs on C64, but not Apple II or VIC20 (due to lack of memory for the source code.)
The first four letters of the variable name are significant (this may be increased by changing `VARNUMCHARS` in `eightball.c`). Any letters after that are simply ignored by the parser.
If the initializer list is shorter than the number of elements in the array then the remaining elements are set to zero. The empty list initializes all elements to zero:
It is also possible to use string literals as array initializers. This is usually used with arrays of `byte` to initialize strings, for example:
    byte msg[100] = "Please try again!"
The array `msg` will be initialized to the character values of the string literal, and a null terminator will be appended. Because strings are null-terminated, the string initializer can be no longer than the array size *minus one*:
Array dimensions must be known at compile time, but expressions made up of constants (both [defined constants](#defined-constants) and [literal constants](#literal-constants)) are allowed for array dimensions and for the members of the initializer list (if any). This is allowed:
EightBall supports most of C's arithmetic, logical and bitwise operators. They have the same precedence as in C as well. Since the Commodore machines do not have all the ASCII characters, some substitutions have been made (shown in parentheses below.)
EightBall also implements 'star operators' for pointer dereferencing which will also be familiar to C programmers.
The `&` prefix operator returns a pointer to a variable which may be used to read and write the variable's contents. The operator may be applied to scalar variables, whole arrays and individual elements of arrays.
Note also that for arrays, evaluating just the array name with no index gives the address of the start of the array. (This trick enables the array pass-by-reference feature to work.)
The following code will print "ALL THE SAME" on the console:
EightBall provides two 'star operators' which dereference pointers in a manner similar to the C star operator. One of these (`*`) operates on word values, the other (`^`) operates on byte values. Each of the operators may be used both for reading and writing through pointers.
Here is an example of a pointer to a word value:
    word val = 0;      ' Real value stored here
    word addr = &val;  ' Now addr points to val
    *addr = 123;       ' Now val is 123
    pr.dec *addr;      ' Recover the value via the pointer
    pr.nl
Here is an example using a pointer to byte. This is similar to `PEEK` and `POKE` in BASIC.
    word addr = $c000; ' addr points to hex $c000
    byte val = ^addr;  ' Read value from $c000 (PEEK)
    ^addr = 0;         ' Set value at $c000 to zero (POKE)
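The difference between the word and byte star operators can be modelled outside EightBall. The following Python sketch is only an illustration: the `memory` array and the four helper functions are hypothetical names, and it assumes words are stored low byte first, as is usual on the 6502.

```python
# Hypothetical model of a 64K address space. This is not EightBall's own code.
memory = bytearray(0x10000)


def read_byte(addr):
    """Model of reading through the byte operator `^` (like PEEK)."""
    return memory[addr & 0xFFFF]


def write_byte(addr, value):
    """Model of writing through `^` (like POKE); only the low 8 bits are kept."""
    memory[addr & 0xFFFF] = value & 0xFF


def read_word(addr):
    """Model of reading through the word operator `*` (assumes low byte first)."""
    return read_byte(addr) | (read_byte(addr + 1) << 8)


def write_word(addr, value):
    """Model of writing through `*` (assumes low byte first)."""
    write_byte(addr, value)
    write_byte(addr + 1, value >> 8)


write_word(0x2000, 123)     # word write through a pointer
write_byte(0xC000, 0)       # POKE-style byte write
value = read_word(0x2000)   # word read through the same pointer
```

A word access touches two consecutive bytes, while a byte access touches exactly one, which is why the byte operator is the natural fit for hardware registers.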
### Parenthesis
Parentheses may be used to control the order of evaluation, for example:
    pr.dec (10+2)*3; ' Prints 36
    pr.dec 10+2*3;   ' Prints 16
### Operator Precedence
| Precedence Level | Operators | Example | Example CBM |
Each subroutine has its own local variable scope. If a local variable is declared with the same name as a global variable, the global will not be available within the scope of the subroutine. When the subroutine returns, the local variables are destroyed.
Subroutines may take `byte` or `word` arguments, using the following syntax:
    sub withArgs(byte flag, word val1, word val2)
      ' Do stuff
      return 0
    endsub
This could be called as follows:
    word ww = 0; byte b = 0;
    call withArgs(b, ww, ww+10)
When `withArgs` runs, the expression passed as the first argument (`b`) will be evaluated and the value assigned to the first formal argument `flag`, which will be created in the subroutine's local scope. Similarly, the second argument (`ww`) will be evaluated and the result assigned to `val1`. Finally, `ww+10` will be evaluated and assigned to `val2`.
Argument passing is by value, which means that `withArgs` can modify `flag`, `val1` or `val2` freely without the changes being visible to the caller.
Subroutines may be invoked within an expression. In this case, the subroutine is executed and the value returned is evaluated within the expression in which it appears.
Passing by reference allows a subroutine to modify a value passed to it. EightBall does this using pointers, in a manner that will be familiar to C programmers. Here is `adder` implemented using this pattern:
    sub adder(word a, word b, word resptr)
      *resptr = a+b
    endsub
Then to call it:
    word result
    call adder(10, 20, &result)
This code takes the address of variable `result` using the ampersand operator and passes it to subroutine `adder` as `resptr`. The subroutine then uses the star operator to write the result of the addition of the first two arguments (10 + 20 in this example) to the word pointed to by `resptr`.
Unlike C, there are no special pointer types. Pointers must be stored in a `word` variable, since they do not fit in a `byte`. Pointers are dereferenced using the `*` operator to reference words or the `^` operator to reference bytes.
It is frequently useful to pass an array into a subroutine. It is not very useful to use pass by value for arrays, since this may mean copying a large object onto the stack. For these reasons, EightBall implements a special pass by reference mode for array variables, which operates in a manner similar to C.
Here is an example of a function which takes a regular variable and an array:
    sub clearArray(byte arr[], word sz)
      word i = 0
      for i = 0 : sz-1
        arr[i] = 0
      endfor
    endsub
This may be invoked like this:
    word n = 10
    byte A[n] = 99
    call clearArray(A, n)
Note that the size of the array is not specified in the subroutine definition - any size array may be passed. Note also that the corresponding argument in the `call` is simply the array name (no [] or other annotation is permitted.)
This mechanism effectively passes a pointer to the array contents 'behind the scenes'.
### End Statement
The `end` statement marks the normal end of execution. This is often used to stop the flow of execution running off the end of the main program and into the subroutines (which causes an error):
Returns to ProDOS on Apple II, or to CBM BASIC on C64/VIC20.
### Clear Stored Program
new
### Clear All Variables
clear
### Show All Variables
vars
Variables are shown in tabular form. The letter 'b' indicates byte type, while 'w' indicates word type. For scalar variables, the value is shown. For arrays, the dimension(s) are shown.
### Show Free Space
free
The free space available for variables and for program text is shown on the console.
Allows a single character to be read from the keyboard. Be careful - this function assumes the argument passed to it is a pointer to a byte value into which the character may be stored.
Allows a line of input to be read from the keyboard and to be stored to an array of byte values. This statement takes two arguments - the first is an array of byte values into which to write the string, the second is the maximum number of bytes to write.
Start inserting text before the specified line. The editor switches to insert mode, indicated by the '>' character (in inverse green on CBM). The following command will start inserting text at the beginning of an empty buffer:
:i0
>
One or more lines of code may then be entered. When you are done, enter a period '.' on a line on its own to return to the EightBall immediate mode prompt.
Append is identical to the insert command described above, except that it starts inserting *after* the specified line. This is often useful for adding lines following the end of an existing program.
This command allows an individual line to be replaced (like inserting a new line then deleting the old line). It is different to the insert and append commands in that the text is entered immediately following the command (not on a new line). For example:
:c21:word var1=12
will replace line 21 with `word var1=12`. Note the colon terminator following the line number.
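The shape of the change command can be illustrated with a short Python sketch. This is not the editor's actual parser: `parse_change` and `apply_change` are hypothetical helpers, and the sketch assumes lines are numbered from 1.

```python
def parse_change(command):
    """Split ':c<line>:<text>' into (line_number, replacement_text)."""
    if not command.startswith(":c"):
        raise ValueError("not a change command")
    # Only the first colon after the line number is the terminator,
    # so the replacement text may itself contain colons.
    number, sep, text = command[2:].partition(":")
    if not sep or not (number.isascii() and number.isdigit()):
        raise ValueError("expected ':c<line>:<text>'")
    return int(number), text


def apply_change(lines, command):
    """Replace one line of a program held as a list of strings.

    Assumes 1-based line numbers (an assumption of this sketch).
    """
    line_no, text = parse_change(command)
    if not 1 <= line_no <= len(lines):
        raise IndexError("no such line")
    lines[line_no - 1] = text


program = ["byte b=0", "word var1=0", "end"]
apply_change(program, ":c2:word var1=12")
```

Because everything after the terminating colon is taken as the new line, a listed line can be edited in place on screen and re-entered as a single command.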
Note that the syntax of this command is contrived to allow the CBM screen editor to work on listed output in a similar way to CBM BASIC. Code may be listed using the `:l` command and the screen may then be interactively edited using the cursor keys and return, just as in BASIC.
The EightBall Virtual Machine is a simple runtime VM for executing the bytecode produced by the EightBall compiler. The EightBall VM can run on 6502 systems (Apple II, Commodore VIC20, C64) or as a Linux process.
## How to use it?
The EightBall system is split into two separate executables:
- EightBall editor, interpreter and compiler
- EightBall VM, which runs the code built by the compiler
On Linux, the editor/interpreter/compiler is `eightball` and the Virtual Machine is `eightballvm`.
On Apple II ProDOS, the editor/interpreter/compiler is `eightball.system` and the VM is `8bvm.system`.
On Commodore VIC20, the editor/interpreter/compiler is `8ball20.prg` and the VM is `8ballvm20.prg`.
On Commodore C64, the editor/interpreter/compiler is `8ball64.prg` and the VM is `8ballvm64.prg`.
Here is how to use the compiler:
- Start the main EightBall editor/interpreter/compiler program.
- Write your program in the editor.
- Debug using the interpreter (`run` command).
- When it seems to work okay, you can compile with the `comp` command.
The compiler will dump an assembly-style listing to the console and also write the VM bytecode to a binary file called `bytecode`. If all goes well, no inscrutable error messages will be displayed.
Then you can run the VM program for your platform. It will load the bytecode from the file `bytecode` and execute it. Running compiled code under the Virtual Machine is much faster than the interpreter (and also more memory efficient.)
The evaluation stack is used for all computations. The VM offers a variety of instructions for manipulating the evaluation stack. All calculations, regardless of the type of the variables involved, are performed using 16 bit arithmetic.
For shorthand, we define the names `X`, `Y`, `Z`, `T` for the top four slots in the evaluation stack. This notation is stolen from the world of HP RPN calculators.
Note that all the instructions with names ending in 'I' are so-called 'immediate mode' instructions. This means that the operand is the 16 bit word following the opcode, rather than the topmost element of the evaluation stack. The 'immediate mode' operand may be a data value or an address.
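To make the distinction concrete, here is a minimal Python sketch of a stack machine with both kinds of instruction. The opcode names and numbers (`PUSH_I`, `ADD`, `LOAD_W`, `LOAD_W_I`, `HALT`) are invented for this illustration and are not the real EightBall VM opcodes; the sketch also assumes operands are stored low byte first.

```python
MASK16 = 0xFFFF

# Hypothetical opcodes, chosen only for this sketch.
PUSH_I, ADD, LOAD_W, LOAD_W_I, HALT = range(5)


def word(value):
    """Encode a 16 bit operand as two bytes, low byte first (an assumption)."""
    return [value & 0xFF, (value >> 8) & 0xFF]


def run(code, memory):
    """Execute `code` and return the evaluation stack (stack[-1] is X)."""
    stack = []
    pc = 0

    def fetch_word():
        # Read the 16 bit operand that follows an immediate mode opcode.
        nonlocal pc
        value = code[pc] | (code[pc + 1] << 8)
        pc += 2
        return value

    def read_word(addr):
        return memory[addr] | (memory[addr + 1] << 8)

    while True:
        op = code[pc]
        pc += 1
        if op == PUSH_I:            # immediate: value follows the opcode
            stack.append(fetch_word())
        elif op == ADD:             # X = Y + X, truncated to 16 bits
            x = stack.pop()
            y = stack.pop()
            stack.append((y + x) & MASK16)
        elif op == LOAD_W:          # address is taken from X
            stack.append(read_word(stack.pop()))
        elif op == LOAD_W_I:        # immediate: address follows the opcode
            stack.append(read_word(fetch_word()))
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {op}")


memory = bytearray(0x400)
memory[0x300:0x302] = bytes(word(0x0102))

program = [PUSH_I, *word(0xFFFF), PUSH_I, *word(2), ADD,
           LOAD_W_I, *word(0x0300),
           PUSH_I, *word(0x0300), LOAD_W,
           HALT]
result = run(program, memory)
```

The two loads fetch the same word: the immediate form saves a separate push instruction when the address is known at compile time.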
Relative mode instructions allow addressing relative to the frame pointer. This is helpful for easy access to local variables.
cc65 places the VM executable code and static evaluation stack (32 bytes) in low memory. In an optimized virtual machine implementation, this would be placed in zero page.
These addresses are chosen to allow space for the EightBall VM executable, which loads below these addresses. These values can be tuned by inspecting the map files generated by cc65.
EightBall was first implemented as an interpreted language (although the language design was always intended to permit compilation.) The bytecode compiler and virtual machine were added with v0.5 in April 2018.
In order to use the least code possible, the compiler uses the same data structures as the interpreter, but in a different way.
### Interpreter Memory Organization
cc65 places the executable code of the EightBall line editor / interpreter / compiler in low memory.
There are two storage areas (or 'arenas') which are denoted as `HEAP1` and `HEAP2` in the `eightball.c` code. The historical origin of this organization is the fact that EightBall first originated as a language targeting the VIC20 with 32K expansion. In this configuration, there is an 8K memory block (starting at address $A000, referred to as BLK5 in the VIC20 design) which is not contiguous with the rest of RAM. For the VIC20, BLK5 was designated as `HEAP1` and the remainder of RAM (above the executable code) was designated `HEAP2`. For other 6502 architectures (Apple II, Commodore 64), the `HEAP1` / `HEAP2` arenas are maintained, but since there is no 'gap' in the memory map, the boundary between them may be adjusted to any arbitrary address.
The division of interpreter memory into two distinct blocks turns out to be quite useful, as we shall see below.
The source code of the program is stored in plain ASCII (or PETSCII on Commodore systems) text at the bottom of `HEAP2` immediately above the EightBall executable code (using routine `alloc2bttm()`). As more lines of source code are added, they are appended to the heap, growing upwards to higher addresses.
Note that the lower bounds of arena `HEAP2` have to be adjusted by hand in `eightball.c` when the code changes size. The size of the code segments generated by cc65 can be determined by inspecting the map file created by the compiler.
Global and local variables are allocated at the top of `HEAP1`, from the highest available memory address down. For each variable a small `var_t` header is stored, consisting of the first four characters of the name, a byte which records whether it is a `byte` or `word` variable and also the number of dimensions. If the number of dimensions is zero then this indicates a scalar variable, otherwise it is an array of the specified number of elements. The `var_t` header also includes a two byte pointer to next, allowing them to be assembled into a linked list.
- Two byte pointer to a block of `sz` bytes for a `byte[sz]` array
- Two byte pointer to a block of `2*sz` bytes for a `word[sz]` array
Normally when a global or local array is allocated, the data block immediately follows. However the pointer to the data block is exploited to allow the 'array pass by reference' feature to be implemented. In this case, the `var_t` header and the two byte data block pointer are copied into the local frame (the pointer still refers to the original data block of the array passed by reference.)
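The effect of copying the header and pointer, but not the data block, can be sketched in Python. The `Var` class below is only loosely modelled on the `var_t` header described above; its field names, and the helpers `declare` and `pass_array_by_reference`, are assumptions made for this illustration. A Python list stands in for the two byte pointer to the data block.

```python
from dataclasses import dataclass
from typing import Optional

VARNUMCHARS = 4  # significant characters in a variable name


@dataclass
class Var:
    """Loose model of a var_t header (field names are hypothetical)."""
    name: str                      # first VARNUMCHARS characters only
    kind: str                      # 'b' for byte, 'w' for word
    dims: int                      # 0 means scalar
    data: list                     # stands in for the data block pointer
    next: Optional["Var"] = None   # link to the next header in the list


def declare(head, name, kind, dims, data):
    """Create a new header and link it in front of `head`."""
    return Var(name[:VARNUMCHARS], kind, dims, data, head)


def pass_array_by_reference(head, param_name, array):
    """Copy the header fields and the pointer; the data block is shared."""
    return Var(param_name[:VARNUMCHARS], array.kind, array.dims,
               array.data, head)


table = declare(None, "table", "b", 3, [99, 99, 99])
arr = pass_array_by_reference(table, "arr", table)
arr.data[0] = 0   # a write through the parameter is seen by the caller
```

Only a few bytes are copied per call regardless of the size of the array, which is the point of the mechanism.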
#### Subroutines: Entry and Return, Local Variables and Parameters
The interpreter maintains a pointer to the beginning of the local stack frame (`varslocal`) as well as to the beginning of the list (`varsbegin`) which allows the global variables to be located. When operating at the global scope (ie: not within a subroutine) `varslocal` points to `varsbegin`.
When entering a subroutine a special `var_t` entry is made for a `word` variable using the otherwise illegal name `"----"` to mark the stack frame and this is pushed to the call stack. The value of this variable is used to store the current value of `varslocal` (ie: the previous stack frame). This is used to unwind the stack when a subroutine exits.
Local variables are allocated on `HEAP1` in exactly the same way as globals. The variable search routine `getintvar()` knows to search the local variables and then (if within a subroutine) the globals also. The stack frame marks allow `getintvar()` to know where the globals end and the stack frame of the first subroutine begins.
The interpreter creates a local variable for each parameter, copying the value provided by the caller. Parameters behave exactly like local variables, because they *are* local variables like any other.
When leaving a subroutine with `return` or `endsub`, the interpreter uses the innermost stack frame (which, remember, records the stack frame of its calling subroutine) to unwind the stack. The local variables and the innermost stack frame are released and `varslocal` is set to point to the caller stack frame. Finally, the flow of control returns to the statement following the `call` (or the evaluation of the expression including the function continues, in the case of function invocation.)
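The frame marking and unwinding described in this section can be sketched in Python. This is a simplified model, not the code in `eightball.c`: the `Interp` class and its method names are hypothetical, a Python list replaces the linked list of `var_t` headers, and `None` is used to mean 'global scope' instead of pointing `varslocal` at `varsbegin`.

```python
FRAME_MARK = "----"   # the otherwise illegal name used to mark a frame


class Interp:
    """Hypothetical model of the interpreter's variable stack."""

    def __init__(self):
        self.vars = []          # entries are [name, value], newest last
        self.varslocal = None   # index of the current frame marker, if any

    def declare(self, name, value):
        self.vars.append([name[:4], value])

    def enter_sub(self):
        # The marker's value records the previous frame.
        self.vars.append([FRAME_MARK, self.varslocal])
        self.varslocal = len(self.vars) - 1

    def leave_sub(self):
        if self.varslocal is None:
            raise RuntimeError("not inside a subroutine")
        previous = self.vars[self.varslocal][1]
        del self.vars[self.varslocal:]   # release locals and the marker
        self.varslocal = previous

    def lookup(self, name):
        key = name[:4]
        start = 0 if self.varslocal is None else self.varslocal + 1
        for entry in reversed(self.vars[start:]):   # current scope first
            if entry[0] == key:
                return entry
        if self.varslocal is not None:
            for entry in self.vars:                 # then the globals
                if entry[0] == FRAME_MARK:
                    break                           # globals end here
                if entry[0] == key:
                    return entry
        raise KeyError(name)


interp = Interp()
interp.declare("count", 1)    # global
interp.enter_sub()
interp.declare("count", 2)    # local hides the global
inner = interp.lookup("count")[1]
interp.leave_sub()
outer = interp.lookup("count")[1]
```

Note that a caller's locals are never searched from inside a callee: the search covers only the innermost frame and the globals.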
The compiler shares most of the infrastructure with the interpreter. The source code of the program is still stored at the bottom of `HEAP2`.
The compiled bytecode is written to the beginning of `HEAP1`, starting from the lowest address and working up. Since no actual data is stored in `HEAP1` when compiling (only `var_t` headers and addresses), it is hoped that there will be enough space for the compiled code without having it collide with the symbol tables (which are stored from the top of `HEAP1` going down).
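An arena that is filled from both ends can be sketched in Python as follows. The `Arena` class and its method names are hypothetical, and the address range in the example is just the 8K block starting at $A000 mentioned earlier. The sketch shows how a collision between the upward growing bytecode and the downward growing symbol table could be detected.

```python
class Arena:
    """Hypothetical two-ended arena: one user grows up, the other down."""

    def __init__(self, bottom, top):
        self.bottom = bottom   # next free address at the low end
        self.top = top         # one past the last free address at the high end

    def alloc_bottom(self, size):
        """Allocate `size` bytes at the low end (e.g. compiled bytecode)."""
        if self.bottom + size > self.top:
            raise MemoryError("arena exhausted")
        addr = self.bottom
        self.bottom += size
        return addr

    def alloc_top(self, size):
        """Allocate `size` bytes at the high end (e.g. symbol table entries)."""
        if self.top - size < self.bottom:
            raise MemoryError("arena exhausted")
        self.top -= size
        return self.top

    def free_space(self):
        return self.top - self.bottom


heap1 = Arena(0xA000, 0xC000)        # 8K block starting at $A000
code_addr = heap1.alloc_bottom(3)    # three bytes of bytecode
symbol_addr = heap1.alloc_top(8)     # one eight byte symbol entry
remaining = heap1.free_space()
```

Neither user needs a fixed share of the arena; the only failure case is when the two ends actually meet.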
The main difference is that instead of storing global and local variables in `HEAP1`, the compiler uses the `var_t` data structures to keep track of the variable during compilation only - they serve as temporary symbol tables so the compiler can keep track of the address of all the variables in scope. Instead of the payload described above, the entries created by the compiler contain a pointer to the address of the variable in the virtual machine's address space.
Within the VM there is no 'management overhead' for storing variables - a `word` is always two bytes, a `byte` always one byte. All of the housekeeping takes place within the compiler (which has to keep track of the address of every variable in scope.)
The compiler has a simple allocator (managed by `rt_push_callstack()` and `rt_pop_callstack()`) that mimics the behaviour of the virtual machine, keeping track of the value of the stack pointer (SP). In the same way that the interpreter allocates all variables (global and local) on the call stack, the compiler uses the same strategy of allocating all variables on the call stack of the virtual machine ("VM call stack" from now on.) Since the compiler target memory allocator functions keep track of the VM SP register, the compiler is able to push values to the call stack and still know the addresses to be able to access them later. This can make the compiler output hard to read for humans however!
#### Subroutines: Entry and Return, Local Variables and Parameters
The EightBall Virtual Machine has a number of features which are intended to make it easier to implement subroutine call and return, argument passing etc. In particular, there is a special frame pointer (FP) register which is useful for easily accessing parameters and locals.
Before generating code to enter a subroutine, the compiler ensures code has been generated to evaluate any parameters and push the result to the call stack. Then the compiler emits a `JSR` instruction to call the subroutine entry point. The virtual machine will automatically store the return address on the VM call stack and the VM program counter will be set to the entry point.
On entry to the subroutine, the compiler will emit VM instruction `SPFP` which pushes the current value of the frame pointer (FP) to the VM call stack and copies the stack pointer (SP) to the frame pointer (FP). This sets up the call frame allowing us to easily refer to the parameters and the local variables.
The virtual machine makes this simple by providing special instructions `LDRW`, `LDRB`, `STRW` and `STRB` which load and store `word` and `byte` values to memory using addressing *relative* to the frame pointer FP. In this relative addressing mode, the parameters which were pushed to the call stack before entry have small *positive* valued addresses (FP + offset). Local variables are pushed to the call stack, which grows down as usual. As a result, the local variables will have small *negative* addresses relative to the frame pointer (FP - offset).
At the same time, absolute addressing via instructions `LDAW`, `LDAB`, `STAW` and `STAB` can be used to access the global variables.
On exit from the subroutine, the compiler emits code to evaluate the return value and leave it on the evaluation stack in the topmost slot (X). It then emits a `FPSP` instruction which copies the frame pointer (FP) to the stack pointer (SP) and restores the value of the frame pointer by popping a word from the call stack. Copying FP to SP has the effect of immediately releasing all of the space (local variables) allocated in the topmost stack frame. After the copy, the saved frame pointer is the topmost word on the call stack, so it can be popped and restored to FP. The overall effect is to unwind the stack back to the calling stack frame.
The compiler also maintains a linked list of subroutine calls and a linked list of subroutine entry points which are used for the final step of compilation - internal linkage. Subroutine calls and entry points are both represented using records of type `sub_t`, each of which contains the first eight characters of the subroutine name, a two byte address pointer and a two byte pointer to the next record.
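The entry and exit sequence can be traced with a small Python model of the VM call stack. This is a sketch under stated assumptions rather than the VM's real implementation: the `CallStack` class is hypothetical, SP is assumed to point at the most recently pushed word, words are stored low byte first, and the exact offsets (+4, +6, -2) follow from those assumptions.

```python
class CallStack:
    """Hypothetical model of the VM call stack, which grows downwards."""

    def __init__(self, size=256):
        self.mem = bytearray(size)
        self.sp = size   # empty stack
        self.fp = size

    def read_word(self, addr):
        return self.mem[addr] | (self.mem[addr + 1] << 8)

    def write_word(self, addr, value):
        self.mem[addr] = value & 0xFF
        self.mem[addr + 1] = (value >> 8) & 0xFF

    def push_word(self, value):
        self.sp -= 2
        self.write_word(self.sp, value)

    def pop_word(self):
        value = self.read_word(self.sp)
        self.sp += 2
        return value

    def spfp(self):
        """Model of SPFP: save FP on the call stack, then FP = SP."""
        self.push_word(self.fp)
        self.fp = self.sp

    def fpsp(self):
        """Model of FPSP: SP = FP (drops locals), then restore FP."""
        self.sp = self.fp
        self.fp = self.pop_word()

    def load_rel_word(self, offset):
        """Model of LDRW: load a word at FP + offset."""
        return self.read_word(self.fp + offset)

    def store_rel_word(self, offset, value):
        """Model of STRW: store a word at FP + offset."""
        self.write_word(self.fp + offset, value)


cs = CallStack()
caller_fp = cs.fp
cs.push_word(10)        # first parameter
cs.push_word(20)        # second parameter
cs.push_word(0x1234)    # return address, as JSR would push it
cs.spfp()               # subroutine entry
cs.push_word(0)         # one local word, at FP - 2
cs.store_rel_word(-2, cs.load_rel_word(6) + cs.load_rel_word(4))
local_sum = cs.load_rel_word(-2)
cs.fpsp()               # subroutine exit: locals released, FP restored
return_address = cs.pop_word()
```

Parameters sit at positive offsets from FP and locals at negative offsets, so the generated code never needs to know the absolute position of the frame.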
The compiler allocates these linked lists (anchored by `callsbegin` and `subsbegin`) at the end of `HEAP2`, growing down towards the source code, which grows up from the bottom of this same arena. The linked list of subroutine calls is freed as soon as compilation is completed.
When compiling EightBall code, there are instances where the generated code needs to jump or branch ahead, to some location within code that has yet to be generated. In this case, the compiler will emit the dummy address `$ffff` and will come back later to insert the correct address, once it is known. This is referred to as an "address fixup."
#### Conditionals / While loops
When compiling `if` / `endif` or `if` / `else` / `endif` conditionals, the compiler needs to generate code to branch forward to jump over the `if` or `else` code blocks. Similarly, for `while` / `endwhile` loops, the compiler needs to branch forward to jump over the loop body if the condition is false. In all these cases, the address fixup is computed when the destination code is generated.
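The forward branch fixup can be illustrated with a short Python sketch of a code emitter. The `Emitter` class, the opcode values `BRANCH_IF_FALSE` and `NOP`, and the use of offsets from the start of the code as addresses are all assumptions made for this example; only the `$ffff` placeholder comes from the description above.

```python
PLACEHOLDER = 0xFFFF     # dummy address emitted until the target is known

# Hypothetical opcode values for this sketch.
BRANCH_IF_FALSE = 0x10
NOP = 0x00


class Emitter:
    """Hypothetical bytecode emitter with forward branch fixups."""

    def __init__(self):
        self.code = bytearray()

    def emit_byte(self, value):
        self.code.append(value & 0xFF)

    def emit_word(self, value):
        self.emit_byte(value)        # low byte first (an assumption)
        self.emit_byte(value >> 8)

    def emit_forward_branch(self, opcode):
        """Emit a branch with a dummy target; return the fixup location."""
        self.emit_byte(opcode)
        fixup_at = len(self.code)
        self.emit_word(PLACEHOLDER)
        return fixup_at

    def fix_up(self, fixup_at):
        """Patch the dummy target with the current end of the code."""
        target = len(self.code)
        self.code[fixup_at] = target & 0xFF
        self.code[fixup_at + 1] = (target >> 8) & 0xFF


# Compiling something shaped like: if <condition> ... endif
emitter = Emitter()
fixup = emitter.emit_forward_branch(BRANCH_IF_FALSE)   # at 'if'
before = bytes(emitter.code[fixup:fixup + 2])
for _ in range(3):
    emitter.emit_byte(NOP)                             # the 'if' body
emitter.fix_up(fixup)                                  # at 'endif'
after = bytes(emitter.code[fixup:fixup + 2])
```

Nested conditionals work the same way if the pending fixup locations are kept on a stack, one entry per open block.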
#### Subroutine Calls
Another situation where address fixups are required is subroutine calls. When a subroutine is called, a new entry is recorded in the `callsbegin` linked list, containing the beginning of the subroutine name and a pointer to the VM address of the call address to be fixed up. When a subroutine definition is encountered, a new entry is recorded in the `subsbegin` linked list, again containing the subroutine name but this time with the address of the entry point.
The final step of compilation involves iterating through the `callsbegin` list, looking up each subroutine name in the `subsbegin` list. If the name is found, then the dummy `$ffff` at the fixup address is replaced with the entry point of the subroutine. Otherwise a linkage error is (cryptically) reported.
A `byte` variable is one byte everywhere. A `word` variable is two bytes everywhere, except in the Linux interpreter (where it is a 32 bit word, 4 bytes.)
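The linkage step can be sketched in Python. This is an illustration only: the `link` function is hypothetical, plain Python lists of `(name, address)` pairs stand in for the `callsbegin` and `subsbegin` linked lists, the `JSR` opcode value is invented, and addresses are assumed to be stored low byte first. Names are compared on their first eight characters, as in the `sub_t` records.

```python
SUBNUMCHARS = 8   # characters of the subroutine name kept in each record
JSR = 0x20        # hypothetical opcode value for this sketch


def link(code, calls, subs):
    """Patch each call site with the entry point of the named subroutine.

    calls: list of (name, fixup_address) pairs
    subs:  list of (name, entry_point) pairs
    """
    entry_points = {name[:SUBNUMCHARS]: addr for name, addr in subs}
    for name, fixup_at in calls:
        key = name[:SUBNUMCHARS]
        if key not in entry_points:
            raise LookupError(f"unresolved subroutine: {name}")
        target = entry_points[key]
        code[fixup_at] = target & 0xFF           # low byte first (assumed)
        code[fixup_at + 1] = (target >> 8) & 0xFF


code = bytearray([JSR, 0xFF, 0xFF])   # call emitted with the dummy $ffff
link(code, calls=[("adder", 1)], subs=[("adder", 0x0200)])
```

Deferring this step to the end of compilation is what allows a subroutine to be called before its definition appears in the source.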
This example shows how EightBall can support recursion. I should point out that it is much better to do this kind of thing using iteration, but this is a fun simple example:
    pr.dec fact(3); pr.nl
    end
    sub fact(word val)
      pr.msg "fact("; pr.dec val; pr.msg ")"; pr.nl
      if val == 0
        return 1
      else
        return val * fact(val-1)
      endif
    endsub
`fact(3)` calls `fact(2)`, which calls `fact(1)`, then finally `fact(0)`.