Functionals

This essay deals with indirect calls. These are the core of an enormous number of high-level language features: LISP's closures, C's function pointers, C++ and Java's virtual method calls, and some implementations of the switch statement. These techniques vary in complexity, and most will not be appropriate for large-scale assembler projects. Of them, however, the Data-Directed approach is the most likely to lead to organized and maintainable code.
Function Pointers

Because assembly language is totally untyped, function pointers are the same as any other sixteen-bit integer. This makes representing them quite easy; most assemblers will let you take a routine's address simply by naming the routine in a .word directive. To actually invoke such a routine, copy its address to some sixteen-bit location (say, target); invoking it is then a simple matter of using an indirect jump: the JMP (target) instruction.

There's really only one subtlety here: the indirect jump is an indirect jump, not an indirect function call. Thus, if some routine A makes an indirect jump to some other routine, then when that routine returns, it returns to whoever called A, not to A itself. There are several ways of dealing with this, but only one correct way, which is to structure your procedures so that any JMP (xxxx) occurs at the very end.
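As a minimal sketch (the routine and variable names here are illustrative, the #< / #> low- and high-byte syntax varies from assembler to assembler, and target is assumed to be a spare two-byte location, ideally in zero page), storing and invoking such a pointer might look like this:

do'thing:
        ; ... some useful work ...
        rts

caller:
        lda #<do'thing          ; copy the routine's address into target...
        sta target
        lda #>do'thing
        sta target+1
        ; ... caller's other work goes here ...
        jmp (target)            ; ...and jump through it as the very last thing;
                                ; do'thing's RTS returns to whoever called caller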
A quick digression on how subroutines work

Ordinarily, subroutines are called with JSR and finished with RTS. The JSR instruction takes its own address, adds 2 to it, and pushes this 16-bit value onto the stack, high byte first, then low byte (so that the low byte will be popped off first). But wait, you may object: all JSR instructions are three bytes long, so this "return address" still points inside the JSR instruction itself, at its final byte. And you would be quite right; the RTS instruction pops the 16-bit address, adds one to it, and then sets the program counter to that value.

So it is possible to fake an indirect JSR by pushing the address of the indirect jump plus two onto the stack and then making the jump; however, you wouldn't want to do this. It takes six bytes and trashes your accumulator, and you can get the same functionality in half the space, with no register corruption, by simply defining the indirect jump as a one-instruction routine and JSR-ing to it directly. As an added bonus, if you have multiple indirect jumps through the same pointer, you don't need to duplicate the jump instruction. Does this mean that abusing JSR and RTS is a dead end, though? Not at all...
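That one-instruction routine is just a trampoline around the indirect jump (call'via'target and target are illustrative names):

call'via'target:
        jmp (target)            ; three bytes, shared by every call site

some'caller:
        jsr call'via'target     ; behaves like an indirect JSR: whatever routine
                                ; target points at will RTS straight back here
        ; ... execution continues here ...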
Dispatch-on-type and Data-Directed Assembler

Most of the time, you care about function pointers because you've arranged them in some kind of table. You hand in an index representing the type of your argument, or which method you're calling, or some other discriminator, and then you index into an array of routines and execute the right one.

Writing a generic routine to do this is kind of a pain. First you have to pass a 16-bit table pointer in, then you have to dereference it to figure out where your table is, then you have to do an indexed dereference on that to get the routine you want to run, then you have to copy that routine's address out to somewhere fixed so that you can make your indirect jump. Making the routine non-generic doesn't help a whole lot, either: it only saves you the first two steps, and now you have to write the rest out at every single indexed jump site. If only there were some way to easily and quickly pass in a local pointer directly... something, say, like the JSR instruction, only not for program code. Or we could just use the JSR instruction itself, as long as we only call this routine at the very ends of other routines, much as we were already organizing our code for indirect jumps. This lets us set up routines that look like this:

jump'table'alpha:
        jsr do'jump'table
        .word alpha'0, alpha'1, alpha'2

where the alpha'x routines are the ones to be called when the index has that value. This leaves the implementation of do'jump'table, which in this case uses the Y register to hold the index:

do'jump'table:
        sta _scratch            ; preserve the accumulator
        pla                     ; pull the low byte of the return address...
        sta _jmpptr
        pla                     ; ...and the high byte; the result points one
        sta _jmpptr+1           ; byte before the .word table
        tya                     ; double the index in Y and add one, turning
        asl                     ; 0, 1, 2, ... into the offset of each entry's
        tay                     ; low byte past the return address: 1, 3, 5, ...
        iny
        lda (_jmpptr), y        ; copy the selected .word entry...
        sta _target
        iny
        lda (_jmpptr), y
        sta _target+1           ; ...into _target
        lda _scratch            ; restore the accumulator
        jmp (_target)           ; and transfer control to the chosen routine

The TYA:ASL:TAY:INY sequence can be omitted if you don't mind having your Y indices be 1, 3, 5, 7, 9, and so on instead of 0, 1, 2, 3, 4. Likewise, the instructions dealing with _scratch can be omitted if you don't mind trashing the accumulator. Keeping the accumulator and X register pristine for the target call is worth the trouble, though, because it means we can pass a pointer argument purely in registers. This will come in handy soon...
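A hypothetical call site then just loads the index into Y and calls the table like any other subroutine (the choice of alpha'1 here is only an example):

        ldy #1                  ; select the second entry, alpha'1
        jsr jump'table'alpha    ; do'jump'table transfers control to alpha'1
        ; ... alpha'1's RTS returns control to this point ...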
VTables and Object-Oriented Assembler

The usual technique for getting something that looks object-oriented in non-object-oriented languages is to fill a structure with function pointers, and to have those functions take the structure itself as an argument. This works just fine in assembler, of course (and doesn't really require anything more than the traditional indirect jumps), but it's also possible to borrow many of the standard optimizations that languages such as C++ provide. The most important of these is the vtable. Each object type has its own vtable: a list of function pointers for all the methods that type provides. This saves space over the structs-with-function-pointers approach because, when you have many objects of the same class, the vtable only has to be represented once. So that all objects may be treated identically, the vtable location is traditionally fixed as the first entry in the corresponding structure.

Virtual method invocation takes an object pointer (traditionally called self or this) and a method index, and invokes the appropriate method on that object. Gee, where have we seen that before?

sprite'vtable:
        jsr do'jump'table
        .word sprite'init, sprite'update, sprite'render

We mentioned before that vtables are generally the first entries in objects. We can play another nasty trick here, paying one additional byte per object to make the vtable field not merely a pointer to its vtable routine but an actual jump instruction to it. (That is, if an object is at location X, then location X holds the byte value $4C, the opcode for an absolute JMP; location X+1 holds the low byte of the vtable routine's address; and location X+2 holds the high byte.) Given that, our invokevirtual routine becomes very simple indeed:

invokevirtual:
        sta this                ; object pointer arrives in A (low byte) and X (high byte)
        stx this+1
        jmp (this)              ; jump to the object itself, whose first instruction
                                ; is a JMP to its vtable routine

This, combined with all our previous work here, takes the this pointer in .AX and a method identifier in .Y and invokes that method on that object. Arguments besides this need to be set up before the call to invokevirtual, probably in some global argument array, as discussed back in an earlier chapter.
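As a sketch of how this all fits together (a'sprite and its instance data are hypothetical, and the #< / #>, .byte, and .word syntax is assembler-dependent), a statically defined sprite and a virtual call to its update method might look like this:

a'sprite:
        jmp sprite'vtable       ; $4C plus the vtable routine's address: the "vtable pointer"
        .byte $40, $80          ; remaining instance data, e.g. an x and y position

        ; elsewhere, invoke method 1 (sprite'update) on a'sprite:
        lda #<a'sprite          ; this pointer, low byte in A...
        ldx #>a'sprite          ; ...high byte in X
        ldy #1                  ; method index: 0 = init, 1 = update, 2 = render
        jsr invokevirtual       ; ends up in sprite'update with this = a'sprite

If the instance data needs to change at runtime, the object would of course have to live in RAM rather than being assembled into ROM like this, but the three-byte JMP header works the same either way.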
A final reminder

We've been talking about all these routines as if they could be copy-pasted or hand-compiled from C++ or Java code. This isn't really the case, primarily because local variables in your average assembler routine aren't really local, so multiple simultaneous calls to the same method will tend to trash the program state. And since a lot of the machinery described here shares memory (in particular, every single method invocation everywhere shares the same this), attempting to port standard OO code directly into this format is likely to fail miserably. You can get an awful lot of flexibility out of even one layer of method calls, though, given a thoughtful design. The do'jump'table routine, or one very like it, was extremely common in NES games from the mid-1980s onward, usually as the beginning of the frame-update loop.

If you find you really need multiple layers of method calls, though, then you are going to need a full-on program stack, and that's going to be several kinds of mess. That's the topic for the final chapter.