Updated information on internals (memory mgmt)

Bobbi Webber-Manners 2018-05-02 18:55:14 -04:00 committed by GitHub
parent 8417bfe448
commit ac0279cf91

@ -821,9 +821,9 @@ Then you can run the VM program for your platform. It will load the bytecode fr
The EightBall Virtual machine has the following features:
- 16 level evaluation stack. Each cell on the evaluation stack is 16 bits.
- Call stack. This stack is byte-orientated (rather than word-orientated like the evaluation stack). It occupies most of system memory.
- Program counter - 16 bits
- Stack pointer - 16 bits - used to address the call stack
- Frame pointer - 16 bits - makes addressing locals and parameters easier for subroutine code
- Program counter (PC) - 16 bits
- Stack pointer (SP) - 16 bits - used to address the call stack
- Frame pointer (FP) - 16 bits - makes addressing locals and parameters easier for subroutine code
The evaluation stack is used for all computations. The VM offers a variety of instructions for manipulating the evaluation stack. All calculations, regardless of the type of the variables involved, are performed using 16 bit arithmetic.
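As a rough illustration of the evaluation stack described above, here is a minimal sketch in C. The names `evalstack_t`, `es_push()`, `es_pop()` and `es_add()` are hypothetical and are not taken from the VM source; the sketch only assumes the 16 level depth and 16 bit cell width stated in the feature list.

```c
#include <assert.h>
#include <stdint.h>

#define EVALSTACK_DEPTH 16 /* 16 levels, per the feature list */

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    uint16_t cell[EVALSTACK_DEPTH]; /* each cell is 16 bits */
    uint8_t  depth;                 /* number of cells in use */
} evalstack_t;

static void es_push(evalstack_t *s, uint16_t v) {
    assert(s->depth < EVALSTACK_DEPTH); /* overflow check */
    s->cell[s->depth++] = v;
}

static uint16_t es_pop(evalstack_t *s) {
    assert(s->depth > 0); /* underflow check */
    return s->cell[--s->depth];
}

/* Pop two operands and push their sum, truncated to 16 bits. */
static void es_add(evalstack_t *s) {
    uint16_t b = es_pop(s);
    uint16_t a = es_pop(s);
    es_push(s, (uint16_t)(a + b));
}
```

Because every cell is 16 bits wide, a `byte` operand is widened when it is pushed and the result wraps modulo 65536.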
@ -927,10 +927,16 @@ In order to use the least code possible, the compiler uses the same data structu
cc65 places the executable code of the EightBall line editor / interpreter / compiler in low memory.
The source code of the program is stored in plain ASCII (or PETSCII on Commodore systems) text in a buffer immediately above the EightBall executable code. Note that the lower bounds of this buffer have to be adjusted by hand in `eightball.c` when the code changes size. The size of the code segments generated by cc65 can be determined by inspecting the map file created by the compiler. (This is HEAP2 in `eightball.c`).
There are two storage areas (or 'arenas') which are denoted as `HEAP1` and `HEAP2` in the `eightball.c` code. The historical origin of this organization is the fact that EightBall originated as a language targeting the VIC20 with 32K expansion. In this configuration, there is an 8K memory block (starting at address $A000, referred to as BLK5 in the VIC20 design) which is not contiguous with the rest of RAM. For the VIC20, BLK5 was designated as `HEAP1` and the remainder of RAM (above the executable code) was designated `HEAP2`. For other 6502 architectures (Apple II, Commodore 64), the `HEAP1` / `HEAP2` arenas are maintained, but since there is no 'gap' in the memory map, the boundary between them may be adjusted to any arbitrary address.
The division of interpreter memory into two distinct blocks turns out to be quite useful, as we shall see below.
The source code of the program is stored in plain ASCII (or PETSCII on Commodore systems) text at the bottom of `HEAP2` immediately above the EightBall executable code (using routine `alloc2bttm()`). As more lines of source code are added, they are appended to the heap, growing upwards to higher addresses.
Note that the lower bounds of arena `HEAP2` have to be adjusted by hand in `eightball.c` when the code changes size. The size of the code segments generated by cc65 can be determined by inspecting the map file created by the compiler.
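The idea of an arena that can be allocated from either end can be sketched as follows. This is not the code in `eightball.c`: the names `arena_t`, `arena_alloc_bottom()` and `arena_alloc_top()` are hypothetical stand-ins, and the real allocator routines may have different signatures and behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-ended arena, for illustration only. */
typedef struct {
    uint8_t *bottom; /* first free byte at the low end */
    uint8_t *top;    /* one past the last free byte at the high end */
} arena_t;

static void arena_init(arena_t *a, uint8_t *mem, size_t size) {
    assert(mem != NULL);
    a->bottom = mem;
    a->top = mem + size;
}

/* Allocate n bytes from the low end, growing up. NULL when full. */
static void *arena_alloc_bottom(arena_t *a, size_t n) {
    uint8_t *p;
    if ((size_t)(a->top - a->bottom) < n) return NULL;
    p = a->bottom;
    a->bottom += n;
    return p;
}

/* Allocate n bytes from the high end, growing down. NULL when full. */
static void *arena_alloc_top(arena_t *a, size_t n) {
    if ((size_t)(a->top - a->bottom) < n) return NULL;
    a->top -= n;
    return a->top;
}
```

In this sketch the arena is full when the two ends meet, so either end may use whatever space the other has not claimed.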
#### Variables
Global and local variables are allocated from the highest available memory address down. (This is HEAP1 in `eightball.c`). For each variable a small `var_t` header is stored, consisting of the first four characters of the name, a byte which records whether it is a `byte` or `word` variable and also the number of dimensions. If the number of dimensions is zero then this indicates a scalar variable, otherwise it is an array of the specified number of elements. The `var_t` header also includes a two byte pointer to next, allowing them to be assembled into a linked list.
Global and local variables are allocated at the top of `HEAP1`, from the highest available memory address down. For each variable a small `var_t` header is stored, consisting of the first four characters of the name, a byte which records whether it is a `byte` or `word` variable and also the number of dimensions. If the number of dimensions is zero then this indicates a scalar variable, otherwise it is an array of the specified number of elements. The `var_t` header also includes a two byte pointer to next, allowing them to be assembled into a linked list.
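The header and the linked-list search can be pictured with the sketch below. The struct `var_sketch_t`, its field names and the helpers `pack_name()` and `find_var()` are assumptions made for illustration; the real `var_t` layout, and in particular how the type and dimension count share one byte, should be taken from `eightball.c`. Padding short names with NUL bytes is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VARNAME_LEN 4 /* only the first four characters are kept */

/* Hypothetical mirror of the header described above. */
typedef struct var_sketch {
    char               name[VARNAME_LEN]; /* not NUL-terminated when full */
    uint8_t            type_dims;         /* type and dimension count; packing not shown */
    struct var_sketch *next;              /* a two byte pointer on a 6502 target */
} var_sketch_t;

/* Copy at most four characters of name, padding with NUL (assumed). */
static void pack_name(char out[VARNAME_LEN], const char *name) {
    size_t i = 0;
    assert(name != NULL);
    for (; i < VARNAME_LEN && name[i] != '\0'; ++i) out[i] = name[i];
    for (; i < VARNAME_LEN; ++i) out[i] = '\0';
}

/* Walk the list and return the first header whose name matches. */
static var_sketch_t *find_var(var_sketch_t *head, const char *name) {
    char key[VARNAME_LEN];
    var_sketch_t *v;
    pack_name(key, name);
    for (v = head; v != NULL; v = v->next)
        if (memcmp(v->name, key, VARNAME_LEN) == 0) return v;
    return NULL;
}
```

One consequence of keeping only four characters is that, in this sketch, `counter` and `count` resolve to the same entry.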
Following the `var_t` header the actual variable data is stored:
- One byte for a `byte` scalar
@ -945,20 +951,25 @@ The interpreter maintains a pointer to the beginning of the local stack frame (`
When entering a subroutine a special `var_t` entry is made for a `word` variable using the otherwise illegal name `"----"` to mark the stack frame and this is pushed to the call stack. The value of this variable is used to store the current value of `varslocal` (i.e. the previous stack frame). This is used to unwind the stack when a subroutine exits.
Local variables are allocated on HEAP1 in exactly the same way as globals. The variable search routine `getintvar()` knows to search the local variables and then (if within a subroutine) the globals also. The stack frame marks allow `getintvar()` to know where the globals end and the stack frame of the first subroutine begins.
Local variables are allocated on `HEAP1` in exactly the same way as globals. The variable search routine `getintvar()` knows to search the local variables and then (if within a subroutine) the globals also. The stack frame marks allow `getintvar()` to know where the globals end and the stack frame of the first subroutine begins.
The interpreter creates a local variable for each parameter, copying the value provided by the caller. Parameters behave exactly like local variables, because they *are* local variables like any other.
When leaving a subroutine with `return` or `endsub`, the interpreter uses the innermost stack frame (which, remember, records the stack frame of its calling subroutine) to unwind the stack. The local variables and the innermost stack frame are released and `varslocal` is set to point to the caller stack frame. Finally, the flow of control returns to the statement following the `call` (or the evaluation of the expression including the function continues, in the case of function invocation.)
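The frame marking and unwinding described above can be modelled in a few lines of C. This is a deliberately simplified sketch: entries are fixed-size slots in an array rather than variable-length `var_t` records, `varslocal` is an index rather than a pointer, and the names `vars_t`, `push_var()`, `enter_sub()` and `leave_sub()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 32

typedef struct {
    char     name[4];
    uint16_t value;
} entry_t;

typedef struct {
    entry_t  pool[POOL_SIZE];
    uint16_t top;       /* index of newest entry; POOL_SIZE when empty */
    uint16_t varslocal; /* index of current frame marker; POOL_SIZE at global level */
} vars_t;

static void vars_init(vars_t *v) {
    v->top = POOL_SIZE;
    v->varslocal = POOL_SIZE;
}

/* Allocate one entry, growing down. name4 must supply at least 4 bytes. */
static void push_var(vars_t *v, const char *name4, uint16_t value) {
    assert(v->top > 0);
    --v->top;
    memcpy(v->pool[v->top].name, name4, 4);
    v->pool[v->top].value = value;
}

/* Push a "----" marker holding the previous frame, then make it current. */
static void enter_sub(vars_t *v) {
    push_var(v, "----", v->varslocal);
    v->varslocal = v->top;
}

/* Release the locals and the marker, then restore the caller's frame. */
static void leave_sub(vars_t *v) {
    uint16_t prev;
    assert(v->varslocal < POOL_SIZE); /* must be inside a subroutine */
    prev = v->pool[v->varslocal].value;
    v->top = (uint16_t)(v->varslocal + 1);
    v->varslocal = prev;
}
```

Each marker stores the index of the frame that was current when the subroutine was entered, which is why nested calls unwind in the right order.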
#### Summary of Interpreter Memory Allocations
- `HEAP1` - Global and local variables, growing down from the top of the arena.
- `HEAP2` - Source code, growing up from the bottom of the arena.
### Compiler Memory Organization
The compiler shares most of the infrastructure with the interpreter. The source code of the program is obviously still stored in HEAP2.
The compiler shares most of the infrastructure with the interpreter. The source code of the program is obviously still stored at the bottom of `HEAP2`.
The compiled bytecode is written to the beginning of HEAP1, starting from the lowest address and working up. Since no actual data is stored in HEAP1 when compiling (only `var_t` headers and addresses), it is hoped that there will be enough space for the compiled code without having it collide with the symbol tables (which are stored from the top of HEAP1 going down).
The compiled bytecode is written to the beginning of `HEAP1`, starting from the lowest address and working up. Since no actual data is stored in `HEAP1` when compiling (only `var_t` headers and addresses), it is hoped that there will be enough space for the compiled code without having it collide with the symbol tables (which are stored from the top of `HEAP1` going down).
#### Variables
The main difference is that instead of storing global and local variables in HEAP1, the compiler uses the `var_t` data structures to keep track of the variable during compilation only - they serve as temporary symbol tables so the compiler can keep track of the address of all the variables in scope. Instead of the payload described above, the entries created by the compiler contain a pointer to the address of the variable in the virtual machine's address space.
The main difference is that instead of storing global and local variables in `HEAP1`, the compiler uses the `var_t` data structures to keep track of the variable during compilation only - they serve as temporary symbol tables so the compiler can keep track of the address of all the variables in scope. Instead of the payload described above, the entries created by the compiler contain a pointer to the address of the variable in the virtual machine's address space.
Within the VM there is no 'management overhead' for storing variables - a `word` is always two bytes, a `byte` always one byte. All of the housekeeping takes place within the compiler (which has to keep track of the address of every variable in scope.)
@ -980,7 +991,18 @@ On exit from the subroutine, the compiler emits code to evaluate the return valu
The return value is left on the evaluation stack. If the calling code does not use it, the compiler must issue a `DROP` instruction to discard it.
#### Subroutine Call Linkage
The compiler also maintains a linked list of subroutine calls and a linked list of subroutine entry points which are used for the final step of compilation - internal linkage. Subroutine calls and entry points are both represented using records of type `sub_t`, each of which contain the first eight characters of the subroutine name, a two byte address pointer and a two byte pointer to the next record. Currently, the compiler allocates these linked lists (anchored by `callsbegin` and `subsbegin`) in HEAP2, after the source code. This space is not freed until HEAP2 is purged using the `new` command, so some space is lost with each use of the `comp` command.
The compiler also maintains a linked list of subroutine calls and a linked list of subroutine entry points which are used for the final step of compilation - internal linkage. Subroutine calls and entry points are both represented using records of type `sub_t`, each of which contains the first eight characters of the subroutine name, a two byte address pointer and a two byte pointer to the next record.
The compiler allocates these linked lists (anchored by `callsbegin` and `subsbegin`) at the end of `HEAP2`, growing down towards the source code, which grows up from the bottom of this same arena. The linked list of subroutine calls is freed as soon as compilation is completed.
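A linkage pass over two such lists might look like the sketch below. It is an illustration under stated assumptions, not the compiler's implementation: the struct `sub_sketch_t`, its field names and `link_calls()` are hypothetical, and the sketch assumes that a call record's address is the location of a two byte operand to be patched, stored low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBNAME_LEN 8 /* only the first eight characters are kept */

/* Hypothetical mirror of the record described above. */
typedef struct sub_sketch {
    char               name[SUBNAME_LEN];
    uint16_t           addr; /* entry point, or (assumed) operand location to patch */
    struct sub_sketch *next; /* a two byte pointer on a 6502 target */
} sub_sketch_t;

/* Patch every call with the address of the matching entry point.
   Returns the number of calls for which no entry point was found. */
static int link_calls(const sub_sketch_t *calls, const sub_sketch_t *subs,
                      uint8_t *code, size_t code_len) {
    int unresolved = 0;
    const sub_sketch_t *c, *s;
    for (c = calls; c != NULL; c = c->next) {
        for (s = subs; s != NULL; s = s->next)
            if (memcmp(c->name, s->name, SUBNAME_LEN) == 0) break;
        if (s == NULL) { ++unresolved; continue; }
        assert((size_t)c->addr + 1 < code_len);
        code[c->addr]     = (uint8_t)(s->addr & 0xFF); /* low byte (assumed order) */
        code[c->addr + 1] = (uint8_t)(s->addr >> 8);   /* high byte */
    }
    return unresolved;
}
```

Once this pass has finished the call list is no longer needed, which is consistent with it being freed as soon as compilation completes.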
#### Summary of Compiler Memory Allocations
- `HEAP1`
  - Generated bytecode, growing up from the bottom of the arena. Discarded after compilation.
  - Global and local variables, growing down from the top of the arena.
- `HEAP2`
  - Source code, growing up from the bottom of the arena.
  - Compiler linkage tables, growing down from the top of the arena. Discarded after compilation.
### Compiler Address Fixups