Call Stacks All our previous work has been assuming FORTRAN-style calling conventions. In this, all procedure-local variables are actually secretly globals. This means that a function that calls itself will end up stomping on its previous values, and everything will be hideously scrambled. Various workarounds for this are covered in . Here, we solve the problem fully.
Recursion A procedure in C or other similar languages declares a chunk of storage that's unique to that invocation. This chunk is just large enough to hold the return address and all the local variables, and is called the stack frame. Stack frames are arranged on a call stack; when a function is called, the stack grows with the new frame, and when that function returns, its frame is destroyed. Once the main function returns, the stack is empty. Most modern architectures are designed to let you implement variable access like this directly, without touching the registers at all. The x86 architecture even dedicates a register to function explicitly as the stack pointer, and then one could read, say, the fifth 16-bit variable into the register AX with the command MOV AX, [SP+10]. As we saw in , the 6502 isn't nearly as convenient. We'd need to keep the stack pointer somewhere on the zero page, then load the Y register with 10, then load the accumulator with an indexed-indirect call. This is verbose, keeps trashing our registers, and it's very, very slow. So, in the spirit of programmers everywhere, we'll cheat.
Our Goals The system we develop should have all of the following characteristics. It should be intuitive to program for. The procedure bodies should be easily readable and writable by humans, even in assembler form. It should be efficient. Variable accesses are very common, so procedures shouldn't cost much to run. It should allow multiple arity in both arguments and return values. We won't require that an unlimited amount of information be passable, but it should allow more than the three bytes the registers give us. It should permit tail call elimination, an optimization that will allow certain forms of recursion to actually not grow the stack. Here is a system that meets all these properties. Reserve two bytes of the zero page for a stack pointer. At the beginning of the program, set it to the top of memory. Divide the remainder of Zero Page into two parts: The scratch space, which is where arguments and return values go, and which may be scrambled by any function call, and The local area, which all functions must restore to their initial state once finished. Assign to each procedure a frame size S, which is a maximum size on the amount of the local area the procedure can use. The procedure's variables will sit in the first S bytes of the local area. Upon entering the procedure, push the first S bytes of the local area onto the stack; upon exit, pop hose S bytes back on top of the local area. While the procedure is running, only touch the local area and the scratch space. This meets our design criteria neatly: It's as intuitive as such a system will get. You have to call init'stack at the beginning, and you need to ensure that save'stack and restore'stack are called right. The procedure's program text can pretend that it's just referring to its own variables, just like with the old style. If a procedure doesn't call anyone, then it can just do all its work in the scratch space. It's efficient; the inside of the procedure is likely to be faster and smaller than its FORTRAN-style counterpart, because all variable references are on the Zero Page. Both arguments and return values can be as large as the scratch space. It's not infinite, but it's probably good enough. Tail call elimination is possible; just restore the stack before making the JMP to the tail call target. The necessary support code is pretty straightforward. The stack modification routines take the size of the frame in the accumulator, and while saving the local area, it copies over the corresponding values from the scratch space. (This is because most functions will be wanting to keep their arguments around across calls.) .scope ; Stack routines .data zp .space _sp $02 .space _counter $01 .space fun'args $10 .space fun'vars $40 .text init'stack: lda #$00 sta _sp lda #$A0 sta _sp+1 rts save'stack: sta _counter sec lda _sp sbc _counter sta _sp lda _sp+1 sbc #$00 sta _sp+1 ldy #$00 * lda fun'vars, y sta (_sp), y lda fun'args, y sta fun'vars, y iny dec _counter bne - rts restore'stack: pha sta _counter ldy #$00 * lda (_sp), y sta fun'vars, y iny dec _counter bne - pla clc adc _sp sta _sp lda _sp+1 adc #$00 sta _sp+1 rts .scend
Example: Fibonnacci Numbers About the simplest interesting recursive function is the Fibonacci numbers. The function fib(x) is defined as being 1 if x is 0 or 1, and being fib(x-2)+fib(x-1) otherwise. Actually expressing it like that directly produces a very inefficient implementation, but it's a simple demonstration of the system. Here's code for expressing the fib function: .scope ; Uint16 fib (Uint8 x): compute Xth fibonnaci number. ; fib(0) = fib(1) = 1. ; Stack usage: 3. fib: lda #$03 jsr save'stack lda fun'vars cmp #$02 bcc _base dec fun'args jsr fib lda fun'args sta fun'vars+1 lda fun'args+1 sta fun'vars+2 lda fun'vars sec sbc #$02 sta fun'args jsr fib clc lda fun'args adc fun'vars+1 sta fun'args lda fun'args+1 adc fun'vars+2 sta fun'args+1 jmp _done _base: ldy #$01 sty fun'args dey sty fun'args+1 _done: lda #$03 jsr restore'stack rts .scend The full application, which deals with interfacing with CBM BASIC and handles console I/O and such, is in .