Ophis/doc/hll5.sgm

<chapter>
<title>Call Stacks</title>

<para>
  All our previous work has been assuming FORTRAN-style calling
  conventions.  In this, all procedure-local variables are actually
  secretly globals.  This means that a function that calls itself will
  end up stomping on its previous values, and everything will be
  hideously scrambled.  Various workarounds for this are covered
  in <xref linkend="hll2">.  Here, we solve the problem fully.
</para>

<section>
  <title>Recursion</title>

  <para>
    A procedure in C or other similar languages declares a chunk of
    storage that's unique to that invocation.  This chunk is just
    large enough to hold the return address and all the local
    variables, and is called the <emphasis>stack frame</emphasis>.
    Stack frames are arranged on a <emphasis>call stack</emphasis>;
    when a function is called, the stack grows with the new frame, and
    when that function returns, its frame is destroyed.  Once the main
    function returns, the stack is empty.
  </para>

  <para>
    Most modern architectures are designed to let you implement
    variable access like this directly, without touching the registers
    at all.  The x86 architecture even dedicates a register to
    function explicitly as the <emphasis>stack pointer</emphasis>, and
    then one could read, say, the fifth 16-bit variable into the
    register AX with the command <literal>MOV AX, [SP+10]</literal>.
  </para>

  <para>
    As we saw in <xref linkend="hll3">, the 6502 isn't nearly as
    convenient.  We'd need to keep the stack pointer somewhere on the
    zero page, then load the Y register with 10, then load the
    accumulator with an indexed-indirect call.  This is verbose, keeps
    trashing our registers, and it's very, very slow.
  </para>

  <para>
    So, in the spirit of programmers everywhere, we'll cheat.
  </para>
</section>
<section>
  <title>Our Goals</title>

  <para>
    The system we develop should have all of the following
    characteristics.
  </para>

  <itemizedlist>
    <listitem><para>It should be <emphasis>intuitive to program for</emphasis>.  The procedure bodies should be easily readable and writable by humans, even in assembler form.</para></listitem>
    <listitem><para>It should be <emphasis>efficient</emphasis>.  Variable accesses are very common, so procedures shouldn't cost much to run.</para></listitem>
    <listitem><para>It should allow <emphasis>multiple arity</emphasis> in both arguments and return values.  We won't require that an unlimited amount of information be passable, but it should allow more than the three bytes the registers give us.</para></listitem>
    <listitem><para>It should permit <emphasis>tail call elimination</emphasis>, an optimization that will allow certain forms of recursion to actually not grow the stack.</para></listitem>
  </itemizedlist>

  <para>
    Here is a system that meets all these properties.
  </para>

  <itemizedlist>
    <listitem><para>Reserve two bytes of the zero page for a stack pointer.  At the beginning of the program, set it to the top of memory.</para></listitem>
    <listitem><para>Divide the remainder of Zero Page into two parts:
        <itemizedlist>
          <listitem><para>The <emphasis>scratch space</emphasis>, which is where arguments and return values go, and which may be scrambled by any function call, and</para></listitem>
          <listitem><para>The <emphasis>local area</emphasis>, which all functions must restore to their initial state once finished.</para></listitem>
        </itemizedlist>
    </para></listitem>
    <listitem><para>Assign to each procedure a <emphasis>frame size</emphasis> S, which is a maximum size on the amount of the local area the procedure can use.  The procedure's variables will sit in the first S bytes of the local area.</para></listitem>
    <listitem><para>Upon entering the procedure, push the first S bytes of the local area onto the stack; upon exit, pop hose S bytes back on top of the local area.</para></listitem>
    <listitem><para>While the procedure is running, only touch the local area and the scratch space.</para></listitem>
  </itemizedlist>

  <para>This meets our design criteria neatly:</para>

  <itemizedlist>
    <listitem><para>It's as intuitive as such a system will get.  You have to call <literal>init'stack</literal> at the beginning, and you need to ensure that <literal>save'stack</literal> and <literal>restore'stack</literal> are called right.  The procedure's program text can pretend that it's just referring to its own variables, just like with the old style.  If a procedure doesn't call <emphasis>anyone</emphasis>, then it can just do all its work in the scratch space.</para></listitem>
    <listitem><para>It's efficient; the inside of the procedure is likely to be faster and smaller than its FORTRAN-style counterpart, because all variable references are on the Zero Page.</para></listitem>
    <listitem><para>Both arguments and return values can be as large as the scratch space.  It's not infinite, but it's probably good enough.</para></listitem>
    <listitem><para>Tail call elimination is possible; just restore the stack before making the JMP to the tail call target.</para></listitem>
  </itemizedlist>

  <para>
    The necessary support code is pretty straightforward.  The stack
    modification routines take the size of the frame in the
    accumulator, and while saving the local area, it copies over the
    corresponding values from the scratch space.  (This is because
    most functions will be wanting to keep their arguments around
    across calls.)
  </para>

<programlisting>
.scope
; Stack routines
.data zp
.space _sp      $02
.space _counter $01
.space fun'args $10
.space fun'vars $40

.text
init'stack:
        lda     #$00
        sta     _sp
        lda     #$A0
        sta     _sp+1
        rts

save'stack:
        sta     _counter
        sec
        lda     _sp
        sbc     _counter
        sta     _sp
        lda     _sp+1
        sbc     #$00
        sta     _sp+1
        ldy     #$00
*       lda     fun'vars, y
        sta     (_sp), y
        lda     fun'args, y
        sta     fun'vars, y
        iny
        dec     _counter
        bne -
        rts

restore'stack:
        pha
        sta     _counter
        ldy     #$00
*       lda     (_sp), y
        sta     fun'vars, y
        iny
        dec     _counter
        bne -
        pla
        clc
        adc     _sp
        sta     _sp
        lda     _sp+1
        adc     #$00
        sta     _sp+1
        rts
.scend
</programlisting>
</section>
<section>
  <title>Example: Fibonnacci Numbers</title>

  <para>
    About the simplest <quote>interesting</quote> recursive function
    is the Fibonacci numbers.  The function fib(x) is defined as being
    1 if x is 0 or 1, and being fib(x-2)+fib(x-1) otherwise.
  </para>

  <para>
    Actually expressing it like that directly produces a very
    inefficient implementation, but it's a simple demonstration of the
    system.  Here's code for expressing the fib function:
  </para>

<programlisting>
.scope
; Uint16 fib (Uint8 x): compute Xth fibonnaci number.
; fib(0) = fib(1) = 1.
; Stack usage: 3.

fib:    lda     #$03
        jsr     save'stack
        lda     fun'vars
        cmp     #$02
        bcc     _base

        dec     fun'args
        jsr     fib
        lda     fun'args
        sta     fun'vars+1
        lda     fun'args+1
        sta     fun'vars+2
        lda     fun'vars
        sec
        sbc     #$02
        sta     fun'args
        jsr     fib
        clc
        lda     fun'args
        adc     fun'vars+1
        sta     fun'args
        lda     fun'args+1
        adc     fun'vars+2
        sta     fun'args+1
        jmp     _done

_base:  ldy     #$01
        sty     fun'args
        dey
        sty     fun'args+1

_done:  lda     #$03
        jsr     restore'stack
        rts
.scend
</programlisting>

  <para>
    The full application, which deals with interfacing with CBM BASIC
    and handles console I/O and such, is in <xref linkend="fib-src"
    endterm="fib-fname">.
  </para>
</section>
</chapter>