Pointers and Indirection The basics of pointers versus cursors (or, at the 6502 assembler level, the indirect indexed addressing mode versus the absolute indexed ones) were covered in This essay seeks to explain the uses of the indirect modes, and how to implement pointer operations with them. It does not seek to explain why you'd want to use pointers for something to begin with; for a tutorial on proper pointer usage, consult any decent C textbook.
The absolute basics A pointer is a variable holding the address of a memory location. Memory locations take 16 bits to represent on the 6502: thus, we need two bytes to hold it. Any decent assembler will have ways of taking the high and low bytes of an address; use these to acquire the raw values you need. The 6502 chip does not have any simple pure indirect modes (except for JMP, which is a matter for a later essay); all are indexed, and they're indexed different ways depending on which index register you use.
The simplest example When doing a simple, direct dereference (that is, something equivalent to the C code c=*b;) the code looks like this: ldy #0 lda (b), y sta c Even with this simple example, there are several important things to notice. The variable b must be on the zero page, and furthermore, it cannot be $FF. All your pointer values need to be either stored on the zero page to begin with or copied there before use. The y in the lda statement must be y. It cannot be x (that's a different form of indirection), and it cannot be a constant. If you're doing a lot of indirection, be sure to keep your Y register free to handle the indexing on the pointers. The b variable is used alone. Statements like lda (b+2), y are syntactically valid and sometimes even correct: it dereferences the value next to b after adding y to the value therein. However, it is almost guaranteed that what you *really* wanted to do was compute *(b+2) (that is, take the address of b, add 2 to that, and dereference that value); see the next section for how to do this properly. In nearly all cases, it is the Y-register's version (Indirect Indexed) that you want to use when you're dealing with pointers. Even though either version could be used for this example, we use the Y register to establish this habit.
Pointer arithmetic Pointer arithmetic is an obscenely powerful and dangerous technique. However, it's the most straightforward way to deal with enormous arrays, structs, indexable stacks, and nearly everything you do in C. (C has no native array or string types primarily because it allows arbitrary pointer arithmetic, which is strong enough to handle all of those without complaint and at blazing speed. It also allows for all kinds of buffer overrun security holes, but let's face it, who's going to be cracking root on your Apple II?) There are a number of ways to implement this on the 6502. We'll deal with them in increasing order of design complexity.
The straightforward, slow way When computing a pointer value, you simply treat the pointer as if it were a 16-bit integer. Do all the math you need, then when the time comes to dereference it, simply do a direct dereference as above. This is definitely doable, and it's not difficult. However, it is costly in both space and time. When dealing with arbitrary indices large enough that they won't fit in the Y register, or when creating values that you don't intend to dereference (such as subtracting two pointers to find the length of a string), this is also the only truly usable technique.
The clever fast way But wait, you say. Often when we compute a value, at least one of the operations is going to be an addition, and we're almost certain to have that value be less than 256! Surely we may save ourselves an operation by loading that value into the Y register and having the load operation itself perform the final addition! Very good. This is the fastest technique, and sometimes it's even the most readable. These cases usually involve repeated reading of various fields from a structure or record. The base pointer always points to the base of the structure (or the top of the local variable list, or what have you) and the Y register takes values that index into that structure. This lets you keep the pointer variable in memory largely static and requires no explicit arithmetic instructions at all. However, this technique is highly opaque and should always be well documented, indicating exactly what you think you're pointing at. Then, when you get garbage results, you can compare your comments and the resulting Y values with the actual definition of the structure to see who's screwing up. For a case where we still need to do arithmetic, consider the classic case of needing to clear out a large chunk of memory. The following code fills the 4KB of memory between $C000 and $D000 with zeroes: lda #$C0 ; Store #$C000 in mem (low byte first) sta mem+1 lda #$00 sta mem ldx #$04 ; x holds number of times to execute outer loop tay ; accumulator and y are both 0 loop: sta (mem), y iny bne loop ; Inner loop ends when y wraps around to 0 inc mem+1 ; "Carry" from the iny to the core pointer dex ; Decrement outer loop count, quit if done bne loop Used carefully, proper use of the Y register can make your code smaller, faster, and more readable. Used carelessly it can make your code an unreadable, unmaintainable mess. Use it wisely, and with care, and it will be your greatest ally in writing flexible code.
What about Indexed Indirect? This essay has concerned itself almost exclusively with the Indirect Indexed—or (Indirect), Y—mode. What about Indexed Indirect—(Indirect, X)? This is a much less useful mode than the Y register's version. While the Y register indirection lets you implement pointers and arrays in full generality, the X register is useful for pretty much only one application: lookup tables for single byte values. Even coming up with a motivating example for this is difficult, but here goes. Suppose you have multiple, widely disparate sections of memory that you're watching for signals. The following routine takes a resource index in the accumulator and returns the status byte for the corresponding resource. ; This data is sitting on the zero page somewhere resource_status_table: .word resource0_status, resource1_status, .word resource2_status, resource3_status, ; etc. etc. etc. ; This is the actual program code .text getstatus: clc ; Multiply argument by 2 before putting it in X, so that it asl ; produces a value that's properly word-indexed tax lda (resource_status_table, x) rts Why having a routine such as this is better than just having the calling routine access resourceN_status itself as an absolute memory load is left as an exercise for the reader. That aside, this code fragment does serve as a reminder that when indexing an array of anything other than bytes, you must multiply your index by the size of the objects you want to index. C does this automatically—assembler does not. Stay sharp.
Comparison with the other indexed forms Pointers are slow. It sounds odd saying this, when C is the fastest language around on modern machines precisely because of its powerful and extensive use of pointers. However, modern architectures are designed to be optimized for C-style code (as an example, the x86 architecture allows statements like mov eax, [bs+bx+4*di] as a single instruction), while the 6502 is not. An (Indirect, Y) operation can take up to 6 cycles to complete just on its own, while the preparation of that command costs additional time and scribbles over a bunch of registers, meaning memory operations to save the values and yet more time spent. The simple code given at the beginning of this essay—loading *b into the accumulator—takes 7 cycles, not counting the 6 it takes to load b with the appropriate value to begin with. If b is known to contain a specific value, we can write a single Absolute mode instruction to load its value, which takes only 4 cycles and also preserves the value in the Y register. Clearly, Absolute mode should be used whenever possible. One might be tempted to use self-modifying code to solve this problem. This actually doesn't pay off near enough for the hassle it generates; for self-modifying code, the address must be generated, then stored in the instruction, and then the data must be loaded. Cost: 16 cycles for 2 immediate loads, 2 absolute stores, and 1 absolute load. For the straight pointer dereference, we generate the address, store it in the pointer, clear the index, then dereference that. Cost: 17 cycles for 3 immediate loads, 2 zero page stores, and 1 indexed indirect load. Furthermore, unlike in the self-modifying case, loops where simple arithmetic is being continuously performed only require repeating the final load instruction, which allows for much greater time savings over an equivalent self-modifying loop. (This point is also completely moot for NES programmers or anyone else whose programs are sitting in ROM, because programs stored on a ROM cannot modify themselves.)
Conclusion That's pretty much it for pointers. Though they tend to make programs hairy, and learning how to properly deal with pointers is what separates real C programmers from the novices, the basic mechanics of them are not complex. With pointers you can do efficient passing of large structures, pass-by-reference, complicated return values, and dynamic memory management—and now these wondrous toys may be added to your assembler programs, too (assuming you have that kind of space to play with).