mirror of
https://github.com/michaelcmartin/Ophis.git
synced 2024-12-22 03:29:55 +00:00
298 lines
12 KiB
Plaintext
298 lines
12 KiB
Plaintext
|
<chapter id="hll3">
|
||
|
<title>Pointers and Indirection</title>
|
||
|
|
||
|
<para>
|
||
|
The basics of pointers versus cursors (or, at the 6502 assembler
|
||
|
level, the indirect indexed addressing mode versus the absolute
|
||
|
indexed ones) were covered in <xref linkend="hll2"> This essay seeks
|
||
|
to explain the uses of the indirect modes, and how to implement
|
||
|
pointer operations with them. It does <emphasis>not</emphasis> seek to explain
|
||
|
why you'd want to use pointers for something to begin with; for a
|
||
|
tutorial on proper pointer usage, consult any decent C textbook.
|
||
|
</para>
|
||
|
|
||
|
<section>
|
||
|
<title>The absolute basics</title>
|
||
|
|
||
|
<para>
|
||
|
A pointer is a variable holding the address of a memory location.
|
||
|
Memory locations take 16 bits to represent on the 6502: thus, we
|
||
|
need two bytes to hold it. Any decent assembler will have ways of
|
||
|
taking the high and low bytes of an address; use these to acquire
|
||
|
the raw values you need. The 6502 chip does not have any
|
||
|
simple <quote>pure</quote> indirect modes (except
|
||
|
for <literal>JMP</literal>, which is a matter for a later essay);
|
||
|
all are indexed, and they're indexed different ways depending on
|
||
|
which index register you use.
|
||
|
</para>
|
||
|
|
||
|
<section>
|
||
|
<title>The simplest example</title>
|
||
|
|
||
|
<para>
|
||
|
When doing a simple, direct dereference (that is, something
|
||
|
equivalent to the C code <literal>c=*b;</literal>) the code
|
||
|
looks like this:
|
||
|
</para>
|
||
|
|
||
|
<programlisting>
|
||
|
ldy #0
|
||
|
lda (b), y
|
||
|
sta c
|
||
|
</programlisting>
|
||
|
|
||
|
<para>
|
||
|
Even with this simple example, there are several important
|
||
|
things to notice.
|
||
|
</para>
|
||
|
|
||
|
<itemizedlist>
|
||
|
<listitem>
|
||
|
<para>
|
||
|
The variable <literal>b</literal> <emphasis>must be on the
|
||
|
zero page</emphasis>, and furthermore, it <emphasis>cannot
|
||
|
be $FF.</emphasis> All your pointer values need to be
|
||
|
either stored on the zero page to begin with or copied
|
||
|
there before use.
|
||
|
</para>
|
||
|
</listitem>
|
||
|
<listitem>
|
||
|
<para>
|
||
|
The <literal>y</literal> in the <literal>lda</literal>
|
||
|
statement must be y. It cannot be x (that's a different
|
||
|
form of indirection), and it cannot be a constant. If
|
||
|
you're doing a lot of indirection, be sure to keep your Y
|
||
|
register free to handle the indexing on the
|
||
|
pointers.
|
||
|
</para>
|
||
|
</listitem>
|
||
|
<listitem>
|
||
|
<para>
|
||
|
The <literal>b</literal> variable is used alone. Statements
|
||
|
like <literal>lda (b+2), y</literal> are syntactically valid
|
||
|
and sometimes even correct: it dereferences the value next
|
||
|
to <literal>b</literal> after adding y to the value therein.
|
||
|
However, it is almost guaranteed that what you *really*
|
||
|
wanted to do was compute <literal>*(b+2)</literal> (that is,
|
||
|
take the address of b, add 2 to <emphasis>that</emphasis>,
|
||
|
and dereference that value); see the next section for how to
|
||
|
do this properly.
|
||
|
</para>
|
||
|
</listitem>
|
||
|
</itemizedlist>
|
||
|
|
||
|
<para>
|
||
|
In nearly all cases, it is the Y-register's version (Indirect
|
||
|
Indexed) that you want to use when you're dealing with pointers.
|
||
|
Even though either version could be used for this example, we
|
||
|
use the Y register to establish this habit.
|
||
|
</para>
|
||
|
</section>
|
||
|
</section>
|
||
|
<section>
|
||
|
<title>Pointer arithmetic</title>
|
||
|
|
||
|
<para>
|
||
|
Pointer arithmetic is an obscenely powerful and dangerous
|
||
|
technique. However, it's the most straightforward way to deal
|
||
|
with enormous arrays, structs, indexable stacks, and nearly
|
||
|
everything you do in C. (C has no native array or string types
|
||
|
primarily because it allows arbitrary pointer arithmetic, which is
|
||
|
strong enough to handle all of those without complaint and at
|
||
|
blazing speed. It also allows for all kinds of buffer overrun
|
||
|
security holes, but let's face it, who's going to be cracking root
|
||
|
on your Apple II?) There are a number of ways to implement this
|
||
|
on the 6502. We'll deal with them in increasing order of design
|
||
|
complexity.
|
||
|
</para>
|
||
|
|
||
|
<section>
|
||
|
<title>The straightforward, slow way</title>
|
||
|
|
||
|
<para>
|
||
|
When computing a pointer value, you simply treat the pointer as
|
||
|
if it were a 16-bit integer. Do all the math you need, then
|
||
|
when the time comes to dereference it, simply do a direct
|
||
|
dereference as above. This is definitely doable, and it's not
|
||
|
difficult. However, it is costly in both space and time.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
When dealing with arbitrary indices large enough that they won't
|
||
|
fit in the Y register, or when creating values that you don't
|
||
|
intend to dereference (such as subtracting two pointers to find
|
||
|
the length of a string), this is also the only truly usable
|
||
|
technique.
|
||
|
</para>
|
||
|
</section>
|
||
|
<section>
|
||
|
<title>The clever fast way</title>
|
||
|
|
||
|
<para>
|
||
|
But wait, you say. Often when we compute a value, at least one
|
||
|
of the operations is going to be an addition, and we're almost
|
||
|
certain to have that value be less than 256! Surely we may save
|
||
|
ourselves an operation by loading that value into the Y register
|
||
|
and having the load operation itself perform the final
|
||
|
addition!
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
Very good. This is the fastest technique, and sometimes it's
|
||
|
even the most readable. These cases usually involve repeated
|
||
|
reading of various fields from a structure or record. The base
|
||
|
pointer always points to the base of the structure (or the top
|
||
|
of the local variable list, or what have you) and the Y register
|
||
|
takes values that index into that structure. This lets you keep
|
||
|
the pointer variable in memory largely static and requires no
|
||
|
explicit arithmetic instructions at all.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
However, this technique is highly opaque and should always be
|
||
|
well documented, indicating exactly what you think you're
|
||
|
pointing at. Then, when you get garbage results, you can
|
||
|
compare your comments and the resulting Y values with the actual
|
||
|
definition of the structure to see who's screwing up.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
For a case where we still need to do arithmetic, consider the
|
||
|
classic case of needing to clear out a large chunk of memory.
|
||
|
The following code fills the 4KB of memory between $C000 and
|
||
|
$D000 with zeroes:
|
||
|
</para>
|
||
|
|
||
|
<programlisting>
|
||
|
lda #$C0 ; Store #$C000 in mem (low byte first)
|
||
|
sta mem+1
|
||
|
lda #$00
|
||
|
sta mem
|
||
|
ldx #$04 ; x holds number of times to execute outer loop
|
||
|
tay ; accumulator and y are both 0
|
||
|
loop: sta (mem), y
|
||
|
iny
|
||
|
bne loop ; Inner loop ends when y wraps around to 0
|
||
|
inc mem+1 ; "Carry" from the iny to the core pointer
|
||
|
dex ; Decrement outer loop count, quit if done
|
||
|
bne loop
|
||
|
</programlisting>
|
||
|
|
||
|
<para>
|
||
|
Used carefully, proper use of the Y register can make your code
|
||
|
smaller, faster, <emphasis>and</emphasis> more readable. Used
|
||
|
carelessly it can make your code an unreadable, unmaintainable
|
||
|
mess. Use it wisely, and with care, and it will be your
|
||
|
greatest ally in writing flexible code.
|
||
|
</para>
|
||
|
</section>
|
||
|
</section>
|
||
|
<section>
|
||
|
<title>What about Indexed Indirect?</title>
|
||
|
|
||
|
<para>
|
||
|
This essay has concerned itself almost exclusively with the
|
||
|
Indirect Indexed—or (Indirect), Y—mode. What about Indexed
|
||
|
Indirect—(Indirect, X)? This is a <emphasis>much</emphasis>
|
||
|
less useful mode than the Y register's version. While the Y
|
||
|
register indirection lets you implement pointers and arrays in
|
||
|
full generality, the X register is useful for pretty much only one
|
||
|
application: lookup tables for single byte values.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
Even coming up with a motivating example for this is difficult,
|
||
|
but here goes. Suppose you have multiple, widely disparate
|
||
|
sections of memory that you're watching for signals. The
|
||
|
following routine takes a resource index in the accumulator and
|
||
|
returns the status byte for the corresponding resource.
|
||
|
</para>
|
||
|
|
||
|
<programlisting>
|
||
|
; This data is sitting on the zero page somewhere
|
||
|
resource_status_table: .word resource0_status, resource1_status,
|
||
|
.word resource2_status, resource3_status,
|
||
|
; etc. etc. etc.
|
||
|
|
||
|
; This is the actual program code
|
||
|
.text
|
||
|
getstatus:
|
||
|
clc ; Multiply argument by 2 before putting it in X, so that it
|
||
|
asl ; produces a value that's properly word-indexed
|
||
|
tax
|
||
|
lda (resource_status_table, x)
|
||
|
rts
|
||
|
</programlisting>
|
||
|
|
||
|
<para>
|
||
|
Why having a routine such as this is better than just having the
|
||
|
calling routine access resourceN_status itself as an absolute
|
||
|
memory load is left as an exercise for the reader. That aside,
|
||
|
this code fragment does serve as a reminder that when indexing an
|
||
|
array of anything other than bytes, you must multiply your index
|
||
|
by the size of the objects you want to index. C does this
|
||
|
automatically—assembler does not. Stay sharp.
|
||
|
</para>
|
||
|
</section>
|
||
|
<section>
|
||
|
<title>Comparison with the other indexed forms</title>
|
||
|
|
||
|
<para>
|
||
|
Pointers are slow. It sounds odd saying this, when C is the
|
||
|
fastest language around on modern machines precisely because of
|
||
|
its powerful and extensive use of pointers. However, modern
|
||
|
architectures are designed to be optimized for C-style code (as an
|
||
|
example, the x86 architecture allows statements like <literal>mov
|
||
|
eax, [bs+bx+4*di]</literal> as a single instruction), while the
|
||
|
6502 is not. An (Indirect, Y) operation can take up to 6 cycles
|
||
|
to complete just on its own, while the preparation of that command
|
||
|
costs additional time <emphasis>and</emphasis> scribbles over a
|
||
|
bunch of registers, meaning memory operations to save the values
|
||
|
and yet more time spent. The simple code given at the beginning
|
||
|
of this essay—loading <literal>*b</literal> into the
|
||
|
accumulator—takes 7 cycles, not counting the 6 it takes to
|
||
|
load b with the appropriate value to begin with. If b is known to
|
||
|
contain a specific value, we can write a single Absolute mode
|
||
|
instruction to load its value, which takes only 4 cycles and also
|
||
|
preserves the value in the Y register. Clearly, Absolute mode
|
||
|
should be used whenever possible.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
One might be tempted to use self-modifying code to solve this
|
||
|
problem. This actually doesn't pay off near enough for the hassle
|
||
|
it generates; for self-modifying code, the address must be
|
||
|
generated, then stored in the instruction, and then the data must
|
||
|
be loaded. Cost: 16 cycles for 2 immediate loads, 2 absolute
|
||
|
stores, and 1 absolute load. For the straight pointer
|
||
|
dereference, we generate the address, store it in the pointer,
|
||
|
clear the index, then dereference that. Cost: 17 cycles for 3
|
||
|
immediate loads, 2 zero page stores, and 1 indexed indirect load.
|
||
|
Furthermore, unlike in the self-modifying case, loops where simple
|
||
|
arithmetic is being continuously performed only require repeating
|
||
|
the final load instruction, which allows for much greater time
|
||
|
savings over an equivalent self-modifying loop.
|
||
|
</para>
|
||
|
|
||
|
<para>
|
||
|
(This point is also completely moot for NES programmers or anyone
|
||
|
else whose programs are sitting in ROM, because programs stored on
|
||
|
a ROM cannot modify themselves.)
|
||
|
</para>
|
||
|
</section>
|
||
|
<section>
|
||
|
<title>Conclusion</title>
|
||
|
|
||
|
<para>
|
||
|
That's pretty much it for pointers. Though they tend to make
|
||
|
programs hairy, and learning how to properly deal with pointers is
|
||
|
what separates real C programmers from the novices, the basic
|
||
|
mechanics of them are not complex. With pointers you can do
|
||
|
efficient passing of large structures, pass-by-reference,
|
||
|
complicated return values, and dynamic memory management—and
|
||
|
now these wondrous toys may be added to your assembler programs,
|
||
|
too (assuming you have that kind of space to play with).
|
||
|
</para>
|
||
|
</section>
|
||
|
</chapter>
|