Additional info on internals

Bobbi Webber-Manners 2018-05-01 20:21:39 -04:00 committed by GitHub
parent f457b18edc
commit c7081c357a
GPG Key ID: 4AEE18F83AFDEB23

@ -926,24 +926,55 @@ cc65 places the executable code of the EightBall line editor / interpreter / com
The source code of the program is stored as plain ASCII (or PETSCII on Commodore systems) text in a buffer immediately above the EightBall executable code. (This is HEAP2 in `eightball.c`.) Note that the lower bound of this buffer has to be adjusted by hand in `eightball.c` when the code changes size. The size of the code segments generated by cc65 can be determined by inspecting the map file created by the compiler.
Global and local variables are allocated from the highest available memory address down. (This is HEAP1 in `eightball.c`). For each variable a small `var_t` header is stored, consisting of the first four characters of the name, a byte which records whether it is a `byte` or `word` variable and also the number of dimensions. If the number of dimensions is zero then this indicates a scalar variable, otherwise it is an array of the specified number of elements. The `var_t` header also includes a two byte pointer to next, allowing them to be assembled into a linked list. Following the `var_t` header the actual variable data is stored:
#### Variables
Global and local variables are allocated from the highest available memory address down. (This is HEAP1 in `eightball.c`.) For each variable a small `var_t` header is stored, consisting of the first four characters of the name and a byte which records both whether it is a `byte` or `word` variable and the number of dimensions. If the number of dimensions is zero, the variable is a scalar; otherwise it is an array of the specified number of elements. The `var_t` header also includes a two-byte pointer to the next header, allowing the headers to be assembled into a linked list.
Following the `var_t` header the actual variable data is stored:
- One byte for a `byte` scalar
- Two bytes for a `word` scalar
- `sz` bytes for a `byte[sz]` array
- `2*sz` bytes for a `word[sz]` array
- Two byte pointer to a block of `sz` bytes for a `byte[sz]` array
- Two byte pointer to a block of `2*sz` bytes for a `word[sz]` array
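The storage cost of the pointer-based layout can be sketched in C as follows. This is a minimal illustration, not the code in `eightball.c`: the type codes, the `is_array` flag and all function names are hypothetical, and the header size is computed from the field widths described above (four name bytes, one type byte, a two-byte `next` pointer) rather than from the real `var_t` definition.

```c
#include <assert.h>
#include <stddef.h>

#define NAME_LEN    4                          /* first four characters of the name */
#define PTR_SIZE    2                          /* two-byte pointers */
#define HEADER_SIZE (NAME_LEN + 1 + PTR_SIZE)  /* name + type byte + next */

/* Hypothetical type codes; the real encoding in eightball.c may differ. */
enum { TYPE_BYTE = 1, TYPE_WORD = 2 };

/* Size of one element: a byte is one byte, a word is two. */
static size_t elem_size(int type)
{
    assert(type == TYPE_BYTE || type == TYPE_WORD);
    return type == TYPE_WORD ? 2 : 1;
}

/* Bytes stored directly after the header: the value itself for a
 * scalar, or a two-byte pointer to the data block for an array. */
static size_t payload_size(int type, int is_array)
{
    return is_array ? PTR_SIZE : elem_size(type);
}

/* Total HEAP1 bytes consumed when an array's data block
 * immediately follows its header and pointer. */
static size_t total_size(int type, int is_array, size_t sz)
{
    size_t block = is_array ? sz * elem_size(type) : 0;
    return HEADER_SIZE + payload_size(type, is_array) + block;
}
```

Under these assumptions a `word[10]` array costs 7 header bytes, 2 pointer bytes and 20 data bytes.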
When entering a subroutine a special `var_t` entry is made for a `word` variable using the otherwise illegal name `"----"` to mark the stack frame. The value of this variable is actually a pointer into the call stack which is used to unwind the stack when a subroutine exits.
Normally, when a global or local array is allocated, the data block immediately follows. However, the pointer to the data block is exploited to implement the 'array pass by reference' feature. In this case, the `var_t` header and the two-byte data block pointer are copied into the local frame; the pointer still refers to the original data block of the array passed by reference.
#### Subroutines: Entry and Return, Local Variables and Parameters
The interpreter maintains a pointer to the beginning of the local stack frame (`varslocal`) as well as a pointer to the beginning of the list (`varsbegin`), which allows the global variables to be located. When operating at the global scope (i.e. not within a subroutine), `varslocal` points to `varsbegin`.
When entering a subroutine, a special `var_t` entry is made for a `word` variable using the otherwise illegal name `"----"` to mark the stack frame, and this is pushed to the call stack. The value of this variable stores the current value of `varslocal` (i.e. the previous stack frame), which is used to unwind the stack when the subroutine exits.
Local variables are allocated on HEAP1 in exactly the same way as globals. The variable search routine `getintvar()` knows to search the local variables and then (if within a subroutine) the globals also. The stack frame marks allow `getintvar()` to know where the globals end and the stack frame of the first subroutine begins.
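The two-stage search described above might look roughly like the following C sketch. It is not the real `getintvar()`: the `var_sketch` record, `find_var()` and the helpers are hypothetical, and it assumes the list runs from `varsbegin` through the globals, then through each frame mark and its locals in order of creation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 4

/* Hypothetical stand-in for var_t; only the fields the search needs. */
struct var_sketch {
    char name[NAME_LEN];      /* not necessarily NUL-terminated */
    int value;
    struct var_sketch *next;
};

/* Store at most the first four characters of src, zero padded. */
static void set_name(char dst[NAME_LEN], const char *src)
{
    size_t n = strlen(src);
    if (n > NAME_LEN)
        n = NAME_LEN;
    memset(dst, 0, NAME_LEN);
    memcpy(dst, src, n);
}

static int is_frame_mark(const struct var_sketch *v)
{
    return memcmp(v->name, "----", NAME_LEN) == 0;
}

static int name_matches(const struct var_sketch *v, const char *name)
{
    return strncmp(v->name, name, NAME_LEN) == 0;
}

/* Search the innermost frame first, then (inside a subroutine) the globals. */
static struct var_sketch *find_var(struct var_sketch *varsbegin,
                                   struct var_sketch *varslocal,
                                   const char *name)
{
    struct var_sketch *v;

    assert(name != NULL);
    for (v = varslocal; v != NULL; v = v->next)
        if (name_matches(v, name))
            return v;
    if (varslocal != varsbegin)
        /* Globals end where the first frame mark begins. */
        for (v = varsbegin; v != NULL && !is_frame_mark(v); v = v->next)
            if (name_matches(v, name))
                return v;
    return NULL;
}
```

With this ordering a local `x` shadows a global `x`, while a global `y` remains visible from inside the subroutine.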
The interpreter creates a local variable for each parameter, copying the value provided by the caller. Parameters behave exactly like local variables, because they *are* local variables like any other.
When leaving a subroutine with `return` or `endsub`, the interpreter uses the innermost stack frame (which, remember, records the stack frame of its calling subroutine) to unwind the stack. The local variables and the innermost stack frame are released and `varslocal` is set to point to the caller stack frame. Finally, the flow of control returns to the statement following the `call` (or the evaluation of the expression including the function continues, in the case of function invocation.)
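Frame entry and unwinding can be modelled with a small array-based table. This is only a sketch of the bookkeeping: the real interpreter uses a linked list of `var_t` headers in HEAP1, and `struct vartab`, `add_var()`, `enter_sub()` and `leave_sub()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 32
#define NAME_LEN 4

struct entry {
    char name[NAME_LEN];
    int value;
};

struct vartab {
    struct entry e[MAX_VARS];
    int count;      /* number of live entries */
    int varslocal;  /* index of the innermost frame mark; 0 at global scope */
};

/* Append a variable and return its index. */
static int add_var(struct vartab *t, const char *name, int value)
{
    size_t n = strlen(name);

    assert(t->count < MAX_VARS);
    if (n > NAME_LEN)
        n = NAME_LEN;
    memset(t->e[t->count].name, 0, NAME_LEN);
    memcpy(t->e[t->count].name, name, n);
    t->e[t->count].value = value;
    return t->count++;
}

/* Push a "----" frame mark whose value remembers the previous frame. */
static void enter_sub(struct vartab *t)
{
    t->varslocal = add_var(t, "----", t->varslocal);
}

/* Release the locals and the frame mark, then restore the caller's frame. */
static void leave_sub(struct vartab *t)
{
    int prev;

    assert(t->varslocal < t->count);
    assert(memcmp(t->e[t->varslocal].name, "----", NAME_LEN) == 0);
    prev = t->e[t->varslocal].value;
    t->count = t->varslocal;
    t->varslocal = prev;
}
```

Because each mark records the frame that was current when it was pushed, nested calls unwind one level at a time.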
### Compiler Memory Organization
The compiler shares most of the infrastructure with the interpreter. The source code of the program is obviously still stored in HEAP2.
The compiled bytecode is written to the beginning of HEAP1, starting from the lowest address and working up. Since no actual data is stored in HEAP1 when compiling (only `var_t` headers and addresses), it is hoped that there will be enough space for the compiled code without having it collide with the symbol tables (which are stored from the top of HEAP1 going down).
#### Variables
The main difference is that instead of storing global and local variables in HEAP1, the compiler uses the `var_t` data structures to keep track of the variable during compilation only - they serve as temporary symbol tables so the compiler can keep track of the address of all the variables in scope. Instead of the payload described above, the entries created by the compiler contain a pointer to the address of the variable in the virtual machine's address space.
The compiled bytecode is written to the beginning of HEAP1, starting from the lowest address and working up. Since no actual data is stored in HEAP1 when compiling (only `var_t` headers and addresses), it is hoped that there will be enough space for the compiled code without having it collide with the symbol tables!
Within the VM there is no 'management overhead' for storing variables - a `word` is always two bytes, a `byte` always one byte. All of the housekeeping takes place within the compiler (which has to keep track of the address of every variable in scope.)
The compiler has a simple allocator (managed by `rt_push_callstack()` and `rt_pop_callstack()`) that mimics the behaviour of the virtual machine, keeping track of the value of the stack pointer (SP). In the same way that the interpreter allocates all variables (global and local) on the call stack, the compiler uses the same strategy of allocating all variables on the call stack of the virtual machine ("VM call stack" from now on.) Since the compiler target memory allocator functions keep track of the VM SP register, the compiler is able to push values to the call stack and still know the addresses to be able to access them later. This can make the compiler output hard to read for humans however!
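The idea of shadowing the VM stack pointer at compile time can be sketched as below. The functions `ct_push()` and `ct_pop()` are hypothetical and are not the signatures of `rt_push_callstack()` or `rt_pop_callstack()`. The initial stack pointer value, the downward growth and the convention that SP points at the next free byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed initial VM stack pointer; purely illustrative. */
#define VM_STACK_TOP 0xBFFFu

/* Reserve n bytes on the (assumed downward-growing) VM call stack and
 * return the address of the lowest reserved byte, so the compiler can
 * record it in its symbol table. */
static uint16_t ct_push(uint16_t *sp, uint16_t n)
{
    assert(n <= *sp);
    *sp = (uint16_t)(*sp - n);
    return (uint16_t)(*sp + 1);
}

/* Release n bytes, mirroring what the VM will do at run time. */
static void ct_pop(uint16_t *sp, uint16_t n)
{
    assert(n <= VM_STACK_TOP - *sp);
    *sp = (uint16_t)(*sp + n);
}
```

Since every push and pop the generated code will perform is mirrored here, the address of each variable is known at compile time and no lookup is needed when the bytecode runs.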
#### Subroutines: Entry and Return, Local Variables and Parameters
The EightBall Virtual Machine has a number of features which are intended to make it easier to implement subroutine call and return, argument passing etc. In particular, there is a special frame pointer (FP) register which is useful for easily accessing parameters and locals.
Before generating code to enter a subroutine, the compiler ensures code has been generated to evaluate any parameters. The first parameter value should be at the top of the eval stack (X), the second parameter in the second level (Y) etc. Then the compiler emits a `JSR` instruction to call the subroutine entry point. The virtual machine will automatically store the return address on the VM call stack and the VM program counter will be set to the entry point.
On entry to the subroutine, the compiler will emit VM instruction `SPFP` which pushes the current value of the frame pointer (FP) to the VM call stack and copies the stack pointer (SP) to the frame pointer (FP). This sets up the call frame allowing us to easily refer to the parameters and the local variables.
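The effect of `JSR` followed by `SPFP` on the VM registers can be simulated with a toy model. This sketch is not the EightBall VM source. It assumes a downward-growing call stack with SP pointing at the next free byte, words stored low byte first, and a three-byte `JSR` instruction (opcode plus two-byte operand). `struct vm` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct vm {
    uint8_t mem[65536];
    uint16_t pc, sp, fp;
};

/* Push a 16-bit word; the low byte ends up at the lower address. */
static void push_word(struct vm *m, uint16_t w)
{
    assert(m->sp >= 2);
    m->mem[m->sp--] = (uint8_t)(w >> 8);
    m->mem[m->sp--] = (uint8_t)(w & 0xff);
}

/* Read the word most recently pushed, without popping it. */
static uint16_t peek_word(const struct vm *m)
{
    return (uint16_t)(m->mem[m->sp + 1] | (m->mem[m->sp + 2] << 8));
}

/* JSR: save the return address on the call stack, then jump. */
static void op_jsr(struct vm *m, uint16_t target)
{
    push_word(m, (uint16_t)(m->pc + 3));  /* assumed instruction length */
    m->pc = target;
}

/* SPFP: save the caller's FP on the call stack, then copy SP to FP. */
static void op_spfp(struct vm *m)
{
    push_word(m, m->fp);
    m->fp = m->sp;
}
```

After both instructions, FP marks a fixed point in the frame: in this model the saved FP and the return address sit just above it, and anything pushed afterwards sits below it.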
...
#### Subroutine Call Linkage
The compiler also maintains a linked list of subroutine calls and a linked list of subroutine entry points, which are used for the final step of compilation: internal linkage. Subroutine calls and entry points are both represented using records of type `sub_t`, each of which contains the first eight characters of the subroutine name, a two-byte address pointer and a two-byte pointer to the next record. Currently, the compiler allocates these linked lists (anchored by `callsbegin` and `subsbegin`) in HEAP2, after the source code. This space is not freed until HEAP2 is purged using the `new` command, so some space is lost with each use of the `comp` command.
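The linkage step can be pictured as matching each call record against the entry-point list by name. The sketch below is an assumption-laden illustration, not the compiler's code. `sub_sketch` stands in for `sub_t`. It assumes a call record's address is the offset of the two-byte operand to patch, and that addresses are stored low byte first. `find_entry()` and `link_calls()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUB_NAME_LEN 8

/* Hypothetical stand-in for sub_t. */
struct sub_sketch {
    char name[SUB_NAME_LEN];   /* first eight characters of the name */
    uint16_t addr;             /* entry point, or operand offset for a call */
    struct sub_sketch *next;
};

static const struct sub_sketch *find_entry(const struct sub_sketch *subs,
                                           const char *name)
{
    for (; subs != NULL; subs = subs->next)
        if (strncmp(subs->name, name, SUB_NAME_LEN) == 0)
            return subs;
    return NULL;
}

/* Patch every call whose target is known; return the number left unresolved. */
static int link_calls(uint8_t *code, size_t code_len,
                      const struct sub_sketch *calls,
                      const struct sub_sketch *subs)
{
    int unresolved = 0;

    for (; calls != NULL; calls = calls->next) {
        const struct sub_sketch *entry = find_entry(subs, calls->name);
        if (entry == NULL) {
            ++unresolved;
            continue;
        }
        assert((size_t)calls->addr + 1 < code_len);
        code[calls->addr]     = (uint8_t)(entry->addr & 0xff);
        code[calls->addr + 1] = (uint8_t)(entry->addr >> 8);
    }
    return unresolved;
}
```

A non-zero return value corresponds to a call to a subroutine that was never defined.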
### Compiler Address Fixups
When compiling EightBall code, there are instances where the generated code needs to jump or branch ahead, to some location within code that has yet to be generated. In this case, the compiler will emit the dummy address `$ffff` and will come back later to insert the correct address, once it is known. This is referred to as an "address fixup."
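A minimal sketch of this emit-then-patch technique is shown below. The code buffer, the `OP_JMP` opcode value and the function names are hypothetical, and the low-byte-first operand order is an assumption. Only the `$ffff` placeholder comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CODE_MAX 256
#define OP_JMP   0x01   /* hypothetical opcode value */

struct codebuf {
    uint8_t bytes[CODE_MAX];
    size_t len;
};

static void emit_byte(struct codebuf *c, uint8_t b)
{
    assert(c->len < CODE_MAX);
    c->bytes[c->len++] = b;
}

/* Emit a jump whose target is not yet known. The operand is the dummy
 * address $ffff; the returned offset says where to patch it later. */
static size_t emit_jump_placeholder(struct codebuf *c)
{
    size_t operand;

    emit_byte(c, OP_JMP);
    operand = c->len;
    emit_byte(c, 0xff);
    emit_byte(c, 0xff);
    return operand;
}

/* Overwrite a previously emitted $ffff placeholder with the real target. */
static void fixup_address(struct codebuf *c, size_t operand, uint16_t target)
{
    assert(operand + 1 < c->len);
    assert(c->bytes[operand] == 0xff && c->bytes[operand + 1] == 0xff);
    c->bytes[operand]     = (uint8_t)(target & 0xff);
    c->bytes[operand + 1] = (uint8_t)(target >> 8);
}
```

A typical use is a conditional block: the compiler emits the placeholder jump, generates the body, and then calls the fixup routine with the current end of the code as the target.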