Structured Programming

This essay discusses the machine language equivalents of the basic structured programming concepts that are part of the imperative family of programming languages: if/then/else, for/next, while loops, and procedures. It also discusses basic use of variables, as well as arrays, multi-byte data types (records), and sub-byte data types (bitfields). It closes by hand-compiling pseudo-code for an insertion sort on linked lists into assembler. A complete Commodore 64 application is included as a sample with this essay.
Control constructs
Branches: if x then y else z

This is almost the most basic control construct. The most basic is if x then y, which is a simple branch instruction (bcc/bcs/beq/bmi/bne/bpl/bvc/bvs) past the then clause if the conditional is false:

        iny
        bne no'overflow
        inx
no'overflow:
        ;; rest of code

This increments the value of the Y register, and if it just wrapped back around to zero, it increments the X register too. It is basically equivalent to the C statement if ((++y)==0) ++x;. We need a few more labels to handle else clauses as well:

        ;; Computation of the conditional expression.
        ;; We assume for the sake of the example that
        ;; we want to execute the THEN clause if the
        ;; zero bit is set, otherwise the ELSE
        ;; clause.  This will happen after a CMP,
        ;; which is the most common kind of 'if'
        ;; statement anyway.
        BNE else'clause
        ;; THEN clause code goes here.
        JMP end'of'if'stmt
else'clause:
        ;; ELSE clause code goes here.
end'of'if'stmt:
        ;; ... rest of code.
Free loops: while x do y

A free loop is one that might execute any number of times. These are basically just a combination of if and goto. For a while x do y loop, which executes zero or more times, you'd have code like this:

loop'begin:
        ;; ... computation of condition, setting zero
        ;; bit if loop is finished...
        beq loop'done
        ;; ... loop body goes here
        jmp loop'begin
loop'done:
        ;; ... rest of program.

If you want to ensure that the loop body executes at least once (do y while x), just move the test to the end:

loop'begin:
        ;; ... loop body goes here
        ;; ... computation of condition, setting zero
        ;; bit if loop is finished...
        bne loop'begin
        ;; ... rest of program.

The choice of the zero bit is somewhat arbitrary here. If the condition involves the carry, overflow, or negative bit instead, replace the beq with bcs/bvs/bmi as appropriate.
Bounded loops: for i = x to y do z

A special case of loops is one where you know exactly how many times you're going through it—this is called a bounded loop. Suppose you're copying 16 bytes from $C000 to $D000. The C code for that would look something like this:

unsigned char *a = (unsigned char *)0xC000;
unsigned char *b = (unsigned char *)0xD000;
int i;

for (i = 0; i < 16; i++) { b[i] = a[i]; }

C doesn't directly support bounded loops; its for statement is just syntactic sugar for a while statement. However, we can take advantage of special purpose machine instructions to get very straightforward code:

        ldx #$00
loop:   lda $c000, x
        sta $d000, x
        inx
        cpx #$10
        bmi loop

However, remember that every arithmetic operation, including inx and dex, sets the various flags, including the Zero bit. That means that if we can make our computation end when the counter hits zero, we can shave off some bytes:

        ldx #$10
loop:   lda $bfff, x
        sta $cfff, x
        dex
        bne loop

Notice that we had to change the addresses we're indexing from, because X takes a slightly different range of values. The space savings is small here, and the code has become slightly less clear. (It also hasn't actually saved any time, because the lda and sta instructions are now crossing a page boundary where they weren't before—but if the source or destination array began at $b020 or something, this wouldn't be an issue.)

This tends to work better when the precise value of the counter isn't used in the computation—so let us consider the NES, which uses memory location $2007 as a port to its video memory. Suppose we wish to jam 4,096 copies of the hex value $20 into the video memory. We can write this very cleanly, using the X and Y registers as indices in a nested loop:

        ldx #$10
        ldy #$00
        lda #$20
loop:   sta $2007
        iny
        bne loop
        dex
        bne loop

Work through this code. Convince yourself that the sta is executed exactly 16*256 = 4096 times. This is an example of a nested loop: a loop inside a loop. Since our internal loop didn't need the X or Y registers, we got to use both of them, which is nice, because they have special incrementing and decrementing instructions. The accumulator lacks these instructions, so it is a poor choice for index variables. If you have a bounded loop and don't have access to registers, use memory locations instead:

        lda #$10
        sta counter     ; loop 16 times
loop:   ;; Do stuff that trashes all the registers
        dec counter
        bne loop

That's it! These are the basic control constructs for using inside of procedures. Before talking about how to organize procedures, I'll briefly cover the way the 6502 handles its stack, because stacks and procedures are very tightly intertwined.
The stack

The 6502 has an onboard stack in page 1. You can modify the stack pointer by storing values in the X register and using txs; an empty stack is value $FF. Going into a procedure pushes the address of the next instruction onto the stack, and RTS pops that value off and jumps there. (Well, not precisely. JSR actually pushes a value that's one instruction short, and RTS loads the value, increases it by one, and THEN jumps there. But that's only an issue if you're using RTS to implement jump tables.)

On an interrupt, the next instruction's address is pushed on the stack, then the processor flags, and execution jumps to the handler. The return from interrupt restores the flags and the PC, just as if nothing had happened.

The stack only has 256 possible entries; since addresses take two bytes to store, that means that if you call something that calls something that calls something that (etc., etc., 129 times), your computation will fail. This can happen faster if you save registers or memory values on the stack (see below).
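As an aside, the one-short behavior of JSR and RTS described above is what makes RTS-based jump tables work. Here is a minimal sketch of the idea, assuming a hypothetical pair of routines and a table that holds each target address minus one; none of these labels appear in the essay's sample program, and the bracketed expressions assume an assembler (like Ophis) that can take the low and high bytes of an expression.

        ;; Dispatch to routine number X via RTS (hypothetical example).
        lda table'hi,x          ; push high byte of (target - 1) first
        pha
        lda table'lo,x          ; then the low byte, so RTS pulls low, then high
        pha
        rts                     ; RTS adds one and "returns" into the target

table'lo: .byte <[routine'0-1], <[routine'1-1]
table'hi: .byte >[routine'0-1], >[routine'1-1]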
Procedures and register saving

All programming languages are designed around the concept of procedures. (Yes, all of them. Functional languages just let you do more things with them, logic programming has implicit calls to query procedures, and object-oriented methods are just normal procedures that take one extra argument in secret.) Procedures let you break a computation up into different parts, then use them independently. However, compilers do a lot of work for you behind the scenes to let you think this. Consider the following assembler code. How many times does the loop execute?

        ldx #$10
loop:   jsr do'stuff
        dex
        bne loop

The correct answer is "I don't know, but it should be 16." The reason we don't know is that we're assuming here that the do'stuff routine doesn't change the value of the X register. If it does, then all sorts of chaos could result. For major routines that aren't called often but are called in places where the register state is important, you should store the old registers on the stack with code like this:

do'stuff:
        pha
        txa
        pha
        tya
        pha

        ;; Rest of do'stuff goes here

        pla
        tay
        pla
        tax
        pla
        rts

(Remember, the last item pushed onto the stack is the first one pulled off, so you have to restore them in reverse order.) That's three more bytes on the stack, so you don't want to do this if you don't absolutely have to. If do'stuff actually doesn't touch X, there's no need to save and restore the value.

This technique is called callee-save. The reverse technique is called caller-save and pushes important registers onto the stack before the routine is called, then restores them afterwards. Each technique has its advantages and disadvantages. The best way to handle it in your own code is to mark at the top of each routine which registers need to be saved by the caller. (It's also useful to note things like how it takes arguments and how it returns values.)
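For comparison, here is a sketch of how the loop above could look with caller-save, preserving only the X register because that is the only one the caller needs afterwards; this version is not part of the essay's sample program.

        ldx #$10
loop:   txa                     ; caller-save: preserve X before the call
        pha
        jsr do'stuff            ; do'stuff may now trash the registers freely
        pla
        tax                     ; restore the loop counter afterwards
        dex
        bne loop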
Variables

Variables come in several flavors.
Global variables

Global variables are variables that can be reached from any point in the program. Since the 6502 has no memory protection, these are easy to declare. Take some random chunk of unused memory and declare it to be the global variables area. All reasonable assemblers have commands that let you give a symbolic name to a memory location—you can use this to give your globals names.
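For example, in Ophis (the assembler used later in this essay) a global can be named either by aliasing a fixed address or by reserving space in a data segment; the names and addresses in this sketch are made up for illustration.

.alias  score   $c100           ; symbolic name for a fixed memory location
.data
.space  lives   1               ; or let the assembler assign one byte for it
.text
        lda #$03                ; globals are then referred to by name
        sta lives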
Local variables

All modern languages have some concept of local variables, which are data values unique to that invocation of that procedure. In modern architectures, this data is stored into and read directly off of the stack. The 6502 doesn't really let you do this cleanly; I'll discuss ways of handling it in a later essay. If you're implementing a system from scratch, you can design your memory model to not require such extreme measures. There are three basic techniques.
Treat local variables like registers

This means that any memory location you use, you save on the stack and restore afterwards. This can really eat up stack space; it's really slow, it's often pointless, and it has a tendency to overflow the stack. I can't recommend it. But it does let you do recursion right, if you don't need to save much memory and you aren't recursing very deep.
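As a sketch of what that looks like in practice, with a hypothetical one-byte local called temp, a recursive routine saves its local on the stack before the recursive call and restores it afterwards, exactly as it would a register:

recurse:
        lda temp                ; preserve this invocation's local...
        pha
        ;; ... compute the argument for the recursive call here ...
        jsr recurse
        pla
        sta temp                ; ...and restore it on the way back out
        rts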
Procedure-based memory allocation

With this technique, you give each procedure its own little chunk of memory for use with its data. All the variables are still, technically, globals; a routine could interfere with another's, but the discipline of "only mess with real globals and your own locals" is very easy to maintain.

This has many advantages. It's very fast, both to write and to run, because loading a variable is an Absolute or Zero Page instruction. Also, any procedure may call any other procedure, as long as it doesn't wind up calling itself at some point.

It has two major disadvantages. First, if many routines need a lot of space, it can consume more memory than it should. Also, this technique can require significant assembler support—you must ensure that no procedure's local variables are defined in the same place as any other procedure's, and doing that right essentially requires a full symbolic linker. Ophis includes commands for memory segmentation simulation that automate most of this task and make writing general libraries feasible.
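Using Ophis's .data and .space commands (the same ones used for the insertion sort example later in this essay), a routine's locals can be declared right alongside its code and the assembler keeps the addresses from colliding. The routine and variable names in this sketch are only illustrative.

.data
.space  blit'src        2       ; locals belonging to 'blit' alone
.space  blit'count      1

.text
blit:   ;; body of the routine; free to use blit'src, blit'count,
        ;; real globals, and nothing else
        rts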
Partition-based memory allocation

It's not really necessary that no procedure overwrite memory used by any other procedure. It's only required that procedures don't write on the memory that their callers use. Suppose that your program is organized into a bunch of procedures, and each falls into one of three sets: procedures in set A don't call anyone; procedures in set B only call procedures in set A; and procedures in set C only call procedures in sets A or B.

Now, each set can be given its own chunk of memory, and we can be absolutely sure that no procedures overwrite each other. Even if every procedure in set C uses the same memory location, they'll never step on each other, because there's no way to get to any other routine in set C from any routine in set C.

This has the same time efficiencies as procedure-based memory allocation and, given a thoughtful design aimed at using this technique, can also use significantly less memory at run time. It also requires much less assembler support, as addresses for variables may be assigned by hand without having to worry about those addresses already being used. However, it does impose a very tight discipline on the design of the overall system, so you'll have to do a lot more work before you start actually writing code.
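A sketch of what hand-assigned partitions might look like under this scheme; the addresses and set memberships here are invented for illustration.

;; Set A (leaf routines that call no one) shares $0200-$020F.
.alias  seta'temp       $0200
;; Set B (calls only set A) shares $0210-$021F.
.alias  setb'temp       $0210
;; Set C (calls only sets A and B) shares $0220-$022F.
.alias  setc'temp       $0220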
Constants

Constants are variables that don't change. If you know that the value you're using is not going to change, you should fold it into the code, either as an Immediate operand wherever it's used, or (if it's more complicated than that) as .byte commands in between the procedures. This is especially important for ROM-based systems such as the NES; the NES has very little RAM available, so constants should be kept in the more plentiful ROM wherever possible.
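Both forms look something like the sketch below; the label, message, and screen address are placeholders, and the string form assumes an assembler that accepts string literals in .byte, as Ophis does.

        lda #$20                ; a simple constant folded in as an Immediate operand
        sta $0400

greeting:                       ; a larger constant stored between procedures
        .byte "HELLO, WORLD!", 0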
Data structures

So far, we've been treating data as a bunch of one-byte values. There really isn't a lot you can do just with bytes. This section talks about how to deal with larger and smaller elements.
Arrays

An array is a bunch of data elements in a row. An array of bytes is very easy to handle with the 6502 chip, because the various indexed addressing modes handle it for you. Just load the index into the X or Y register and do an absolute indexed load. In general, these are going to be zero-indexed (that is, a 32-byte array is indexed from 0 to 31). This code would initialize a byte array with 32 entries to 0:

        lda #$00
        tax
loop:   sta array,x
        inx
        cpx #$20
        bne loop

(If you count down to save instructions, remember to adjust the base address so that it's still writing the same memory locations.)

This approach to arrays has some limits. Primary among them is that we can't have arrays larger than 256 entries, because we can't fit a larger index into the index register. In order to address larger arrays, we need to use the indirect indexed addressing mode: use 16-bit addition to add the offset to the base pointer, then set the Y register to 0 and load the value with lda (ptr),y.

Well, actually, we can do better than that. Suppose we want to clear out 8K of RAM, from $2000 to $4000. We can use the Y register to hold the low byte of our offset, and only update the high byte when necessary. That produces the following loop:

        lda #$00        ; Set pointer value to base ($2000)
        sta ptr
        lda #$20
        sta ptr+1
        lda #$00        ; Storing a zero
        ldx #$20        ; 8,192 ($2000) iterations: high byte
        ldy #$00        ; low byte
loop:   sta (ptr),y
        iny
        bne loop        ; If we haven't wrapped around, go back
        inc ptr+1       ; Otherwise update high byte
        dex             ; bump counter
        bne loop        ; and continue if we aren't done

This code could be optimized further; the loop prelude in particular loads a lot of redundant values that could be compressed down further:

        lda #$00
        tay
        ldx #$20
        sta ptr
        stx ptr+1

That's not directly relevant to arrays, but these sorts of things are good to keep in mind when writing your code. Done well, they can make it much smaller and faster; done carelessly, they can force a lot of bizarre dependencies on your code and make it impossible to modify later.
Records

A record is a collection of values all referred to as one variable. This has no immediate representation in assembler. If you have a global variable that's two bytes and a code pointer, this is exactly equivalent to three separate variables. You can just put one label in front of it, and refer to the first byte as label, the second as label+1, and the code pointer as label+2. This really applies to all data structures that take up more than one byte. When dealing with a pointer variable ptr, a 16-bit value, we refer to its low byte as ptr (or label+2, in the example above) and its high byte as ptr+1 (or label+3).

Arrays of records are more interesting. There are two possibilities for these. The way most high level languages treat them is by keeping the records contiguous. If you have an array of records that each hold two sixteen-bit integers, then the records are stored in order, one at a time: the first is in location $1000, the next in $1004, the next in $1008, and so on. You can do this with the 6502, but you'll probably have to use the indirect indexed mode if you want to be able to iterate conveniently.

Another, more unusual, but more efficient approach is to keep each byte as a separate array, just like in the arrays example above. To illustrate, here's a little bit of code to go through a contiguous array of 16-bit integers, adding their values to some total variable:

        ldx #$10        ; Number of elements in the array
        ldy #$00        ; Byte index from array start
loop:   clc
        lda array, y    ; Low byte
        adc total
        sta total
        lda array+1, y  ; High byte
        adc total+1
        sta total+1
        iny             ; Jump ahead to next entry
        iny
        dex             ; Check for loop termination
        bne loop

And here's the same loop, keeping the high and low bytes in separate arrays:

        ldx #$00
loop:   clc
        lda lowbyte,x
        adc total
        sta total
        lda highbyte,x
        adc total+1
        sta total+1
        inx
        cpx #$10
        bne loop

Which approach is the right one depends on what you're doing. For large arrays, the first approach is better, as you only need to maintain one base pointer. For smaller arrays, the easier indexing makes the second approach more convenient.
Bitfields

To store values that are smaller than a byte, you can save space by putting multiple values in a byte. To extract a sub-byte value, use the bitmasking commands:

  - To set bits, use the ORA command. ORA #$0F sets the lower four bits to 1 and leaves the rest unchanged.
  - To clear bits, use the AND command. AND #$F0 sets the lower four bits to 0 and leaves the rest unchanged.
  - To reverse bits, use the EOR command. EOR #$0F reverses the lower four bits and leaves the rest unchanged.

To test if a bit is 0, AND away everything but that bit, then see if the Zero bit was set. If the bit is in the top two bits of a memory location, you can use the BIT command instead (which stores bit 7 in the Negative bit and bit 6 in the Overflow bit).
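As a concrete sketch, suppose a byte at flags keeps one value in its low four bits and another in its high four bits; the names flags, low'value, and new'value are hypothetical.

        lda flags               ; extract the low four bits
        and #$0F
        sta low'value

        lda flags               ; replace the low four bits with new'value ($00-$0F)
        and #$F0                ; clear the old field
        ora new'value           ; merge in the new value
        sta flags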
A modest example: Insertion sort on linked lists

To demonstrate these techniques, we will now produce code to perform insertion sort on a linked list. We'll start by defining our data structure, then defining the routines we want to write, then producing actual code for those routines. A downloadable version that will run unmodified on a Commodore 64 closes the chapter.
The data structure

We don't really want to have to deal with pointers if we can possibly avoid it, but it's hard to do a linked list without them. Instead of pointers, we will use cursors: small integers that represent the index into the array of values. This lets us use the many-small-byte-arrays technique for our data. Furthermore, our random data that we're sorting never has to move, so we may declare it as a constant and only bother with changing the values of head and the next arrays. The data record definition looks like this:

head : byte;
data : const int[16] = [838, 618, 205, 984, 724, 301, 249, 946,
                        925, 43, 114, 697, 985, 633, 312, 86];
next : byte[16];

Exactly how this gets represented will vary from assembler to assembler. Ophis does it like this:

.data
.space head 1
.space next 16

.text
lb:     .byte <$838,<$618,<$205,<$984,<$724,<$301,<$249,<$946
        .byte <$925,<$043,<$114,<$697,<$985,<$633,<$312,<$086
hb:     .byte >$838,>$618,>$205,>$984,>$724,>$301,>$249,>$946
        .byte >$925,>$043,>$114,>$697,>$985,>$633,>$312,>$086
Doing an insertion sort

To do an insertion sort, we clear the list by setting the 'head' value to -1, and then insert each element into the list one at a time, placing each element in its proper order in the list. We can consider the lb/hb structure alone as an array of 16 integers, and just insert each one into the list one at a time.

procedure insertion_sort
   head := -1;
   for i := 0 to 15 do
      insert_elt i
   end
end

This translates pretty directly. We'll have insert_elt take its argument in the X register, and loop with that. However, given that insert_elt is going to be a complex procedure, we'll save the value first. The assembler code becomes:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; insertion'sort:  Sorts the list defined by head, next, hb, lb.
; Arguments:  None.
; Modifies:   All registers destroyed, head and next array sorted.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

insertion'sort:
        lda #$FF                ; Clear list by storing the terminator in 'head'
        sta head
        ldx #$0                 ; Loop through the lb/hb array, adding each
insertion'sort'loop:            ; element one at a time
        txa
        pha
        jsr insert_elt
        pla
        tax
        inx
        cpx #$10
        bne insertion'sort'loop
        rts
Inserting an element

The pseudocode for inserting an element is a bit more complicated. If the list is empty, or the value we're inserting goes at the front, then we have to update the value of head. Otherwise, we can iterate through the list until we find the element that our value fits in after (that is, the first element whose successor is larger than our value). Then we update the next pointers directly and exit.

procedure insert_elt i
begin
   if head = -1 then begin
      head := i;
      next[i] := -1;
      return;
   end;
   val := data[i];
   if val < data[head] then begin
      next[i] := head;
      head := i;
      return;
   end;
   current := head;
   while (next[current] <> -1 and data[next[current]] < val) do
      current := next[current];
   end;
   next[i] := next[current];
   next[current] := i;
end;

This produces the following rather hefty chunk of code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; insert_elt:  Insert an element into the linked list.  Maintains the
;              list in sorted, ascending order.  Used by
;              insertion'sort.
; Arguments:   X register holds the index of the element to add.
; Modifies:    All registers destroyed; head and next arrays updated.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

.data
.space lbtoinsert 1
.space hbtoinsert 1
.space indextoinsert 1

.text
insert_elt:
        ldy head                ; If the list is empty, make
        cpy #$FF                ; head point at it, and return.
        bne insert_elt'list'not'empty
        stx head
        tya
        sta next,x
        rts
insert_elt'list'not'empty:
        lda lb,x                ; Cache the data we're inserting
        sta lbtoinsert
        lda hb,x
        sta hbtoinsert
        stx indextoinsert
        ldy head                ; Compare the first value with
        sec                     ; the data.  If the data must
        lda lb,y                ; be inserted at the front...
        sbc lbtoinsert
        lda hb,y
        sbc hbtoinsert
        bmi insert_elt'not'smallest
        tya                     ; Set its next pointer to the
        sta next,x              ; old head, update the head
        stx head                ; pointer, and return.
        rts
insert_elt'not'smallest:
        ldx head
insert_elt'loop:                ; At this point, we know that
        lda next,x              ; argument > data[X].
        tay
        cpy #$FF                ; if next[X] = #$FF, insert arg at end.
        beq insert_elt'insert'after'current
        lda lb,y                ; Otherwise, compare arg to
        sec                     ; data[next[X]].  If we insert
        sbc lbtoinsert          ; before that...
        lda hb,y
        sbc hbtoinsert
        bmi insert_elt'goto'next
insert_elt'insert'after'current:        ; Fix up all the next links
        tya
        ldy indextoinsert
        sta next,y
        tya
        sta next,x
        rts                     ; and return.
insert_elt'goto'next:           ; Otherwise, let X = next[X]
        tya                     ; and go looping again.
        tax
        jmp insert_elt'loop
The complete application

The full application, which deals with interfacing with CBM BASIC and handles console I/O and such, is included as a sample with this essay.