Structured Programming
This essay discusses the machine language equivalents of the
basic structured programming
concepts that are part
of the imperative
family of programming languages:
if/then/else, for/next, while loops, and procedures. It also
discusses basic use of variables, as well as arrays, multi-byte data
types (records), and sub-byte data types (bitfields). It closes by
hand-compiling pseudo-code for an insertion sort on linked lists
into assembler. A complete Commodore 64 application is included as
a sample with this essay.
Control constructs
Branches: if x then y else z
This is almost the most basic control construct.
The most basic is if x then
y, which is a simple branch instruction
(bcc/bcs/beq/bmi/bne/bpl/bvc/bvs) past the then
clause if the conditional is false:
iny
bne no'overflow
inx
no'overflow:
;; rest of code
This increments the value of the y register, and if it just
wrapped back around to zero, it increments the x register too.
It is basically equivalent to the C statement if
((++y)==0) ++x;. We need a few more labels to handle
else clauses as well.
;; Computation of the conditional expression.
;; We assume for the sake of the example that
;; we want to execute the THEN clause if the
;; zero bit is set, otherwise the ELSE
;; clause. This will happen after a CMP,
;; which is the most common kind of 'if'
;; statement anyway.
BNE else'clause
;; THEN clause code goes here.
JMP end'of'if'stmt
else'clause:
;; ELSE clause code goes here.
end'of'if'stmt:
;; ... rest of code.
Free loops: while x do y
A free loop is one that might execute any
number of times. These are basically just a combination
of if and goto. For
a while x do y
loop, that executes zero or more
times, you'd have code like this...
loop'begin:
;; ... computation of condition, setting zero
;; bit if loop is finished...
beq loop'done
;; ... loop body goes here
jmp loop'begin
loop'done:
;; ... rest of program.
If you want to ensure that the loop body executes at least once
(do y while x), just move the test to the end.
loop'begin:
;; ... loop body goes here
;; ... computation of condition, setting zero
;; bit if loop is finished...
bne loop'begin
;; ... rest of program.
The choice of zero bit is kind of arbitrary here. If the
condition involves the carry bit, or overflow, or negative, then
replace the beq with bcs/bvs/bmi appropriately.
Bounded loops: for i = x to y do z
A special case of loops is one where you know exactly how many
times you're going through it—this is called
a bounded loop. Suppose you're copying 16
bytes from $C000 to $D000. The C code for that would look
something like this:
int *a = 0xC000;
int *b = 0xD000;
int i;
for (i = 0; i < 16; i++) { a[i] = b[i]; }
C doesn't directly support bounded loops;
its for statement is just syntactic
sugar
for a while statement. However, we can take
advantage of special purpose machine instructions to get very
straightforward code:
ldx #$00
loop:
lda $c000, x
sta $d000, x
inx
cpx #$10
bmi loop
However, remember that every arithmetic operation,
including inx and dex,
sets the various flags, including the Zero bit. That means that
if we can make our computation end when the
counter hits zero, we can shave off some bytes:
ldx #$10
loop:
lda #$bfff, x
sta #$cfff, x
dex
bne loop
Notice that we had to change the addresses we're indexing from,
because x takes a slightly different range of values. The space
savings is small here, and it's become slightly more unclear.
(It also hasn't actually saved any time, because the lda and sta
instructions are crossing a page boundary where they weren't
before—but if the start or end arrays began at $b020 or
something this wouldn't be an issue.) This tends to work better
when the precise value of the counter isn't used in the
computation—so let us consider the NES, which uses memory
location $2007 as a port to its video memory. Suppose we wish
to jam 4,096 copies of the hex value $20 into the video memory.
We can write this very cleanly, using the X
and Y registers as indices in a nested loop.
ldx #$10
ldy #$00
lda #$20
loop:
sta $2007
iny
bne loop
dex
bne loop
Work through this code. Convince yourself that
the sta is executed exactly 16*256 = 4096
times.
This is an example of a nested loop: a loop
inside a loop. Since our internal loop didn't need the X or Y
registers, we got to use both of them, which is nice, because
they have special incrementing and decrementing instructions.
The accumulator lacks these instructions, so it is a poor choice
to use for index variables. If you have a bounded loop and
don't have access to registers, use memory locations
instead:
lda #$10
sta counter ; loop 16 times
loop:
;; Do stuff that trashes all the registers
dec counter
bne loop
That's it! These are the basic control constructs for using
inside of procedures. Before talking about how to organize
procedures, I'll briefly cover the way the 6502 handles its
stack, because stacks and procedures are very tightly
intertwined.
The stack
The 6502 has an onboard stack in page 1. You can modify the stack
pointer by storing values in X register and
using txs; an empty
stack is
value $FF. Going into a procedure pushes the address of the next
instruction onto the stack, and RTS pops that value off and jumps
there. (Well, not precisely. JSR actually pushes a value that's
one instruction short, and RTS loads the value, increases it by
one, and THEN jumps there. But that's only an issue if you're
using RTS to implement jump tables.) On an interrupt, the next
instruction's address is pushed on the stack, then the process
flags, and it jumps to the handler. The return from interrupt
restores the flags and the PC, just as if nothing had
happened.
The stack only has 256 possible entries; since addresses take two
bytes to store, that means that if you call something that calls
something that calls something that (etc., etc., 129 times), your
computation will fail. This can happen faster if you save
registers or memory values on the stack (see below).
Procedures and register saving
All programming languages are designed around the concept of
procedures.Yes, all of them. Functional languages
just let you do more things with them, logic programming has
implicit calls to query procedures, and
object-oriented methods
are just normal procedures
that take one extra argument in secret.
Procedures let you break a computation up into different parts,
then use them independently. However, compilers do a lot of work
for you behind the scenes to let you think this. Consider the
following assembler code. How many times does the loop
execute?
loop: ldx #$10 jsr do'stuff dex bne loop
The correct answer is I don't know, but
it should be 16.
The reason we don't
know is because we're assuming here that
the do'stuff routine doesn't change the value
of the X register. If it does, than all sorts of chaos could
result. For major routines that aren't called often but are
called in places where the register state is important, you should
store the old registers on the stack with code like this:
do'stuff:
pha
txa
pha
tya
pha
;; Rest of do'stuff goes here
pla
tay
pla
tax
pla
rts
(Remember, the last item pushed onto the stack is the first one
pulled off, so you have to restore them in reverse order.) That's
three more bytes on the stack, so you don't want to do this if you
don't absolutely have to. If do'stuff
actually doesn't touch X, there's no need to
save and restore the value. This technique is
called callee-save.
The reverse technique is called caller-save
and pushes important registers onto the stack before the routine
is called, then restores them afterwards. Each technique has its
advantages and disadvantages. The best way to handle it in your
own code is to mark at the top of each routine which registers
need to be saved by the caller. (It's also useful to note things
like how it takes arguments and how it returns values.)
Variables
Variables come in several flavors.
Global variables
Global variables are variables that can be reached from any
point in the program. Since the 6502 has no memory protection,
these are easy to declare. Take some random chunk of unused
memory and declare it to be the global variables area. All
reasonable assemblers have commands that let you give a symbolic
name to a memory location—you can use this to give your
globals names.
Local variables
All modern languages have some concept of local
variables
, which are data values unique to that
invocation of that procedure. In modern architecures, this data
is stored into and read directly off of the stack. The 6502
doesn't really let you do this cleanly; I'll discuss ways of
handling it in a later essay. If you're implementing a system
from scratch, you can design your memory model to not require
such extreme measures. There are three basic techniques.
Treat local variables like registers
This means that any memory location you use, you save on the
stack and restore afterwards. This
can really eat up stack space, and it's
really slow, it's often pointless, and it has a tendency to
overflow the stack. I can't recommend it. But it does let
you do recursion right, if you don't need to save much memory
and you aren't recursing very deep.
Procedure-based memory allocation
With this technique, you give each procedure its own little
chunk of memory for use with its data. All the variables are
still, technically, globals; a
routine could interfere with another's,
but the discipline of only mess with real globals, and
your own locals
is very, very easy to maintain.
This has many advantages. It's very
fast, both to write and to run, because loading a variable is
an Absolute or Zero Page instruction. Also, any procedure may
call any other procedure, as long as it doesn't wind up
calling itself at some point.
It has two major disadvantages. First, if many routines need
a lot of space, it can consume more memory than it should.
Also, this technique can require significant assembler
support—you must ensure that no procedure's local
variables are defined in the same place as any other
procedure, and it essentially requires a full symbolic linker
to do right. Ophis includes commands for memory
segmentation simulation that automate most of this
task, and make writing general libraries feasible.
Partition-based memory allocation
It's not really necessary that no
procedure overwrite memory used by any other procedure. It's
only required that procedures don't write on the memory that
their callers use. Suppose that your
program is organized into a bunch of procedures, and each fall
into one of three sets:
Procedures in set A don't call anyone.
Procedures in set B only call procedures in set A.
Procedures in set C only call procedures in sets A or B.
Now, each set can be given its own chunk
of memory, and we can be absolutely sure that no procedures
overwrite each other. Even if every procedure in set C uses
the same memory location, they'll never
step on each other, because there's no way to get to any other
routine in set C from any routine in set
C.
This has the same time efficiencies as procedure-based memory
allocation, and, given a thoughtful design aimed at using this
technique, also can use significantly less memory at run time.
It's also requires much less assembler support, as addresses
for variables may be assigned by hand without having to worry
about those addresses already being used. However, it does
impose a very tight discipline on the design of the overall
system, so you'll have to do a lot more work before you start
actually writing code.
Constants
Constants are variables
that don't change. If
you know that the value you're using is not going to change, you
should fold it into the code, either as an Immediate operand
wherever it's used, or (if it's more complicated than that)
as .byte commands in between the procedures.
This is especially important for ROM-based systems such as the
NES; the NES has very little RAM available, so constants should
be kept in the more plentiful ROM wherever possible.
Data structures
So far, we've been treating data as a bunch of one-byte values.
There really isn't a lot you can do just with bytes. This section
talks about how to deal with larger and smaller elements.
Arrays
An array is a bunch of data elements in a
row. An array of bytes is very easy to handle with the 6502
chip, because the various indexed addressing modes handle it for
you. Just load the index into the X or Y register and do an
absolute indexed load. In general, these are going to be
zero-indexed (that is, a 32-byte array is indexed from 0 to 31.)
This code would initialize a byte array with 32 entries to
0:
lda #$00
tax
loop:
sta array,x
inx
cpx #$20
bne loop
(If you count down to save instructions, remember to adjust the
base address so that it's still writing the same memory
location.)
This approach to arrays has some limits. Primary among them is
that we can't have arrays of size larger than 256; we can't fit
our index into the index register. In order to address larger
arrays, we need to use the indirect indexed addressing mode. We
use 16-bit addition to add the offset to the base pointer, then
set the Y register to 0 and then load the value
with lda (ptr),y.
Well, actually, we can do better than that. Suppose we want to
clear out 8K of ram, from $2000 to $4000. We can use the Y
register to hold the low byte of our offset, and only update the
high bit when necessary. That produces the following
loop:
lda #$00 ; Set pointer value to base ($2000)
sta ptr
lda #$20
sta ptr+1
lda #$00 ; Storing a zero
ldx #$20 ; 8,192 ($2000) iterations: high byte
ldy #$00 ; low byte.
loop:
sta (ptr),y
iny
bne loop ; If we haven't wrapped around, go back
inc ptr+1 ; Otherwise update high byte
dex ; bump counter
bne loop ; and continue if we aren't done
This code could be optimized further; the loop prelude in
particular loads a lot of redundant values that could be
compressed down further:
lda #$00
tay
ldx #$20
sta ptr
stx ptr+1
That's not directly relevant to arrays, but these sorts of
things are good things to keep in mind when writing your code.
Done well, they can make it much smaller and faster; done
carelessly, they can force a lot of bizarre dependencies on your
code and make it impossible to modify later.
Records
A record is a collection of values all
referred to as one variable. This has no immediate
representation in assembler. If you have a global variable
that's two bytes and a code pointer, this is exactly equivalent
to three seperate variables. You can just put one label in
front of it, and refer to the first byte
as label, the second
as label+1, and the code pointer
a label+2.
This really applies to all data structures that take up more
than one byte. When dealing with the pointer, a 16-bit value,
we refer to the low byte as ptr
(or label+2, in the example above), and the
high byte as ptr+1
(or label+3).
Arrays of records are more interesting. There are two
possibilities for these. The way most high level languages
treat it is by keeping the records contiguous. If you have an
array of two sixteen bit integers, then the records are stored
in order, one at a time. The first is in location $1000, the
next in $1004, the next in $1008, and so on. You can do this
with the 6502, but you'll probably have to use the indirect
indexed mode if you want to be able to iterate
conveniently.
Another, more unusual, but more efficient approach is to keep
each byte as a seperate array, just like in the arrays example
above. To illustrate, here's a little bit of code to go through
a contiguous array of 16 bit integers, adding their values to
some total variable:
ldx #$10 ; Number of elements in the array
ldy #$00 ; Byte index from array start
loop:
clc
lda array, y ; Low byte
adc total
sta total
lda array+1, y ; High byte
adc total+1
sta total+1
iny ; Jump ahead to next entry
iny
dex ; Check for loop termination
bne loop
And here's the same loop, keeping the high and low bytes in
seperate arrays:
ldx #$00
loop:
clc
lda lowbyte,x
adc total
sta total
lda highbyte,x
adc total+1
sta total+1
inx
cpx #$10
bne loop
Which approach is the right one depends on what you're doing.
For large arrays, the first approach is better, as you only need
to maintain one base pointer. For smaller arrays, the easier
indexing makes the second approach more convenient.
Bitfields
To store values that are smaller than a byte, you can save space
by putting multiple values in a byte. To extract a sub-byte
value, use the bitmasking commands:
To set bits, use the ORA command. ORA #$0F sets the lower four bits to 1 and leaves the rest unchanged.
To clear bits, use the AND command. AND #$F0 sets the lower four bits to 0 and leaves the rest unchanged.
To reverse bits, use the EOR command. EOR #$0F reverses the lower four bits and leaves the rest unchanged.
To test if a bit is 0, AND away everything but that bit, then see if the Zero bit was set. If the bit is in the top two bits of a memory location, you can use the BIT command instead (which stores bit 7 in the Negative bit, and bit 6 in the Overflow bit).
A modest example: Insertion sort on linked lists
To demonstrate these techniques, we will now produce code to
perform insertion sort on a linked list. We'll start by defining
our data structure, then defining the routines we want to write,
then producing actual code for those routines. A downloadable
version that will run unmodified on a Commodore 64 closes the
chapter.
The data structure
We don't really want to have to deal with pointers if we can
possibly avoid it, but it's hard to do a linked list without
them. Instead of pointers, we will
use cursors: small integers that represent
the index into the array of values. This lets us use the
many-small-byte-arrays technique for our data. Furthermore, our
random data that we're sorting never has to move, so we may
declare it as a constant and only bother with changing the
values of head and
the next arrays. The data record definition
looks like this:
head : byte;
data : const int[16] = [838, 618, 205, 984, 724, 301, 249, 946,
925, 43, 114, 697, 985, 633, 312, 86];
next : byte[16];
Exactly how this gets represented will vary from assembler to
assembler. Ophis does it like this:
.data
.space head 1
.space next 16
.text
lb: .byte <$838,<$618,<$205,<$984,<$724,<$301,<$249,<$946
.byte <$925,<$043,<$114,<$697,<$985,<$633,<$312,<$086
hb: .byte >$838,>$618,>$205,>$984,>$724,>$301,>$249,>$946
.byte >$925,>$043,>$114,>$697,>$985,>$633,>$312,>$086
Doing an insertion sort
To do an insertion sort, we clear the list by setting the 'head'
value to -1, and then insert each element into the list one at a
time, placing each element in its proper order in the list. We
can consider the lb/hb structure alone as an array of 16
integers, and just insert each one into the list one at a
time.
procedure insertion_sort
head := -1;
for i := 0 to 15 do
insert_elt i
end
end
This translates pretty directly. We'll have insert_elt take its
argument in the X register, and loop with that. However, given
that insert_elt is going to be a complex procedure, we'll save
the value first. The assembler code becomes:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; insertion'sort: Sorts the list defined by head, next, hb, lb.
; Arguments: None.
; Modifies: All registers destroyed, head and next array sorted.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
insertion'sort:
lda #$FF ; Clear list by storing the terminator in 'head'
sta head
ldx #$0 ; Loop through the lb/hb array, adding each
insertion'sort'loop: ; element one at a time
txa
pha
jsr insert_elt
pla
tax
inx
cpx #$10
bne insertion'sort'loop
rts
Inserting an element
The pseudocode for inserting an element is a bit more
complicated. If the list is empty, or the value we're inserting
goes at the front, then we have to update the value
of head. Otherwise, we can iterate through
the list until we find the element that our value fits in after
(so, the first element whose successor is larger than our
value). Then we update the next pointers directly and exit.
procedure insert_elt i
begin
if head = -1 then begin
head := i;
next[i] := -1;
return;
end;
val := data[i];
if val < data[i] then begin
next[i] := head;
head := i;
return;
end;
current := head;
while (next[current] <> -1 and val < data[next[current]]) do
current := next[current];
end;
next[i] := next[current];
next[current] := i;
end;
This produces the following rather hefty chunk of code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; insert_elt: Insert an element into the linked list. Maintains the
; list in sorted, ascending order. Used by
; insertion'sort.
; Arguments: X register holds the index of the element to add.
; Modifies: All registers destroyed; head and next arrays updated
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
.data
.space lbtoinsert 1
.space hbtoinsert 1
.space indextoinsert 1
.text
insert_elt:
ldy head ; If the list is empty, make
cpy #$FF ; head point at it, and return.
bne insert_elt'list'not'empty
stx head
tya
sta next,x
rts
insert_elt'list'not'empty:
lda lb,x ; Cache the data we're inserting
sta lbtoinsert
lda hb,x
sta hbtoinsert
stx indextoinsert
ldy head ; Compare the first value with
sec ; the data. If the data must
lda lb,y ; be inserted at the front...
sbc lbtoinsert
lda hb,y
sbc hbtoinsert
bmi insert_elt'not'smallest
tya ; Set its next pointer to the
sta next,x ; old head, update the head
stx head ; pointer, and return.
rts
insert_elt'not'smallest:
ldx head
insert_elt'loop: ; At this point, we know that
lda next,x ; argument > data[X].
tay
cpy #$FF ; if next[X] = #$FF, insert arg at end.
beq insert_elt'insert'after'current
lda lb,y ; Otherwise, compare arg to
sec ; data[next[X]]. If we insert
sbc lbtoinsert ; before that...
lda hb,y
sbc hbtoinsert
bmi insert_elt'goto'next
insert_elt'insert'after'current: ; Fix up all the next links
tya
ldy indextoinsert
sta next,y
tya
sta next,x
rts ; and return.
insert_elt'goto'next: ; Otherwise, let X = next[X]
tya ; and go looping again.
tax
jmp insert_elt'loop
The complete application
The full application, which deals with interfacing with CBM
BASIC and handles console I/O and such, is
in .