mirror of
https://github.com/michaelcmartin/Ophis.git
synced 2024-09-14 10:54:26 +00:00
Update documentation.
parent
07f807d680
commit
ffd96a8c2f
291
doc/cmdref.sgm
@ -302,10 +302,10 @@
|
||||
</para>
|
||||
<programlisting>
|
||||
.macro store16 ; `store16 dest, src
|
||||
lda #<_2
|
||||
sta _1
|
||||
lda #>_2
|
||||
sta _1+1
|
||||
lda #<_2
|
||||
sta _1
|
||||
lda #>_2
|
||||
sta _1+1
|
||||
.macend
|
||||
</programlisting>
|
||||
<para>
|
||||
@ -361,91 +361,202 @@
|
||||
follow.
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem><para><literal>.advance</literal> <emphasis>address</emphasis>:
|
||||
Forces the program counter to
|
||||
be <emphasis>address</emphasis>. Unlike
|
||||
the <literal>.org</literal>
|
||||
directive, <literal>.advance</literal> outputs zeroes until the
|
||||
program counter reaches a specified address. Attempting
|
||||
to <literal>.advance</literal> to a point behind the current
|
||||
program counter is an assemble-time error.</para></listitem>
|
||||
<listitem><para><literal>.alias</literal> <emphasis>label</emphasis> <emphasis>value</emphasis>: The
|
||||
.alias directive assigns an arbitrary value to a label. This
|
||||
value may be an arbitrary argument, but cannot reference any
|
||||
label that has not already been defined (this prevents
|
||||
recursive label dependencies).</para></listitem>
|
||||
<listitem><para><literal>.byte</literal> <emphasis>arg</emphasis> [ , <emphasis>arg</emphasis>, ... ]:
|
||||
Specifies a series of arguments, which are evaluated, and
|
||||
strings, which are included as raw ASCII data. The final
|
||||
results of these arguments must be one byte in size. Separate
constants are separated by commas.</para></listitem>
|
||||
<listitem><para><literal>.checkpc</literal> <emphasis>address</emphasis>: Ensures that the
|
||||
program counter is less than or equal to the address
|
||||
specified, and emits an assemble-time error if it is not.
|
||||
<emphasis>This produces no code in the final binary - it is there to
|
||||
ensure that linking a large amount of data together does not
|
||||
overstep memory boundaries.</emphasis></para></listitem>
|
||||
<listitem><para><literal>.data</literal> <emphasis>[label]</emphasis>: Sets the segment to
|
||||
the segment name specified and disallows output. If no label
|
||||
is given, switches to the default data segment.</para></listitem>
|
||||
<listitem><para><literal>.incbin</literal> <emphasis>filename</emphasis>: Inserts the
|
||||
contents of the file specified as binary data. Use it to
|
||||
include graphics information, precompiled code, or other
|
||||
non-assembler data.</para></listitem>
|
||||
<listitem><para><literal>.include</literal> <emphasis>filename</emphasis>: Includes the
|
||||
entirety of the file specified at that point in the program.
|
||||
Use this to order your final sources.</para></listitem>
|
||||
<listitem><para><literal>.org</literal> <emphasis>address</emphasis>: Sets the program
|
||||
counter to the address specified. <emphasis>This does not emit any
|
||||
code in and of itself, nor does it overwrite anything that
|
||||
previously existed.</emphasis> If you wish to jump ahead in memory,
|
||||
use <literal>.advance</literal>.</para></listitem>
|
||||
<listitem><para><literal>.require</literal> <emphasis>filename</emphasis>: Includes the entirety
|
||||
of the file specified at that point in the program. Unlike <literal>.include</literal>,
|
||||
however, code included with <literal>.require</literal> will only be inserted once.
|
||||
The <literal>.require</literal> directive is useful for ensuring that certain code libraries
|
||||
are somewhere in the final binary. They are also very useful for guaranteeing that
|
||||
macro libraries are available.</para></listitem>
|
||||
<listitem><para><literal>.space</literal> <emphasis>label</emphasis> <emphasis>size</emphasis>: This
|
||||
directive is used to organize global variables. It defines the
|
||||
label specified to be at the current location of the program
|
||||
counter, and then advances the program counter <emphasis>size</emphasis>
|
||||
steps ahead. No actual code is produced. This is equivalent
|
||||
to <literal>label: .org ^+size</literal>.</para></listitem>
|
||||
<listitem><para><literal>.text</literal> <emphasis>[label]</emphasis>: Sets the segment to
|
||||
the segment name specified and allows output. If no label is
|
||||
given, switches to the default text segment.</para></listitem>
|
||||
<listitem><para><literal>.word</literal> <emphasis>arg</emphasis> [ , <emphasis>arg</emphasis>, ... ]:
|
||||
Like <literal>.byte</literal>, but values are all treated as two-byte
|
||||
values and stored low-end first (as is the 6502's wont). Use
|
||||
this to create jump tables (an unadorned label will evaluate
|
||||
to that label's location) or otherwise store 16-bit
|
||||
data.</para></listitem>
|
||||
<listitem><para><literal>.dword</literal> <emphasis>arg</emphasis> [ , <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.word</literal>, but for 32-bit values.</para></listitem>
|
||||
<listitem><para><literal>.wordbe</literal> <emphasis>arg</emphasis> [ , <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.word</literal>, but stores the value in a big-endian format (high byte first).</para></listitem>
|
||||
<listitem><para><literal>.dwordbe</literal> <emphasis>arg</emphasis> [ , <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.dword</literal>, but stores the value high byte first.</para></listitem>
|
||||
<listitem><para><literal>.scope</literal>: Starts a new scope block. Labels
|
||||
that begin with an underscore are only reachable from within
|
||||
their innermost enclosing <literal>.scope</literal> statement.</para></listitem>
|
||||
<listitem><para><literal>.scend</literal>: Ends a scope block. Makes the
|
||||
temporary labels defined since the last <literal>.scope</literal>
|
||||
statement unreachable, and permits them to be redefined in a
|
||||
new scope.</para></listitem>
|
||||
<listitem><para><literal>.macro</literal> <emphasis>name</emphasis>: Begins a macro
|
||||
definition block. This is a scope block that can be inlined
|
||||
at arbitrary points with <literal>.invoke</literal>. Arguments to the
|
||||
macro will be bound to temporary labels with names like
|
||||
<literal>_1</literal>, <literal>_2</literal>, etc.</para></listitem>
|
||||
<listitem><para><literal>.macend</literal>: Ends a macro definition
|
||||
block.</para></listitem>
|
||||
<listitem><para><literal>.invoke</literal> <emphasis>label</emphasis> [<emphasis>argument</emphasis> [,
|
||||
<emphasis>argument</emphasis> ...]]: invokes (inlines) the specified
|
||||
macro, binding the values of the arguments to the ones the
|
||||
macro definition intends to read. A shorthand for <literal>.invoke</literal>
|
||||
is the name of the macro to invoke, backquoted.</para></listitem>
|
||||
</itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.outfile</literal> <emphasis>filename</emphasis>:
|
||||
Sets the filename for the output binary if one has not
|
||||
already been set. If no name is ever set, the output will
|
||||
be written to <literal>ophis.bin</literal>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.advance</literal> <emphasis>address</emphasis>:
|
||||
Forces the program counter to
|
||||
be <emphasis>address</emphasis>. Unlike
|
||||
the <literal>.org</literal>
|
||||
directive, <literal>.advance</literal> outputs zeroes
|
||||
until the program counter reaches a specified
|
||||
address. Attempting to <literal>.advance</literal> to a
|
||||
point behind the current program counter is an
|
||||
assemble-time error.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.alias</literal> <emphasis>label</emphasis> <emphasis>value</emphasis>:
|
||||
The .alias directive assigns an arbitrary value to a
|
||||
label. This value may be an arbitrary argument, but
|
||||
cannot reference any label that has not already been
|
||||
defined (this prevents recursive label
|
||||
dependencies).
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.byte</literal> <emphasis>arg</emphasis> [
|
||||
, <emphasis>arg</emphasis>, ... ]: Specifies a series of
|
||||
arguments, which are evaluated, and strings, which are
|
||||
included as raw ASCII data. The final results of these
|
||||
arguments must be one byte in size. Separate constants
are separated by commas.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.checkpc</literal> <emphasis>address</emphasis>:
|
||||
Ensures that the program counter is less than or equal to
|
||||
the address specified, and emits an assemble-time error
|
||||
if it is not. <emphasis>This produces no code in the
|
||||
final binary - it is there to ensure that linking a large
|
||||
amount of data together does not overstep memory
|
||||
boundaries.</emphasis>
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.data</literal> <emphasis>[label]</emphasis>:
|
||||
Sets the segment to the segment name specified and
|
||||
disallows output. If no label is given, switches to the
|
||||
default data segment.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.incbin</literal> <emphasis>filename</emphasis>:
|
||||
Inserts the contents of the file specified as binary
|
||||
data. Use it to include graphics information, precompiled
|
||||
code, or other non-assembler data.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.include</literal> <emphasis>filename</emphasis>:
|
||||
Includes the entirety of the file specified at that point
|
||||
in the program. Use this to order your final sources, if
|
||||
you aren't doing it via the command line.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.org</literal> <emphasis>address</emphasis>:
|
||||
Sets the program counter to the address
|
||||
specified. <emphasis>This does not emit any code in and
|
||||
of itself, nor does it overwrite anything that previously
|
||||
existed.</emphasis> If you wish to jump ahead in memory,
|
||||
use <literal>.advance</literal>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.require</literal> <emphasis>filename</emphasis>:
|
||||
Includes the entirety of the file specified at that point
|
||||
in the program. Unlike <literal>.include</literal>,
|
||||
however, code included with <literal>.require</literal>
|
||||
will only be inserted once.
|
||||
The <literal>.require</literal> directive is useful for
|
||||
ensuring that certain code libraries are somewhere in the
|
||||
final binary. It is also very useful for guaranteeing
|
||||
that macro libraries are available.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.space</literal> <emphasis>label</emphasis> <emphasis>size</emphasis>:
|
||||
This directive is used to organize global variables. It
|
||||
defines the label specified to be at the current location
|
||||
of the program counter, and then advances the program
|
||||
counter <emphasis>size</emphasis> steps ahead. No actual
|
||||
code is produced. This is equivalent to <literal>label:
|
||||
.org ^+size</literal>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.text</literal> <emphasis>[label]</emphasis>:
|
||||
Sets the segment to the segment name specified and allows
|
||||
output. If no label is given, switches to the default
|
||||
text segment.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.word</literal> <emphasis>arg</emphasis> [
|
||||
, <emphasis>arg</emphasis>, ... ]:
|
||||
Like <literal>.byte</literal>, but values are all treated
|
||||
as two-byte values and stored low-end first (as is the
|
||||
6502's wont). Use this to create jump tables (an
|
||||
unadorned label will evaluate to that label's location)
|
||||
or otherwise store 16-bit data.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.dword</literal> <emphasis>arg</emphasis> [
|
||||
, <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.word</literal>, but for 32-bit
|
||||
values.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.wordbe</literal> <emphasis>arg</emphasis> [
|
||||
, <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.word</literal>, but stores the value in a
|
||||
big-endian format (high byte first).
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.dwordbe</literal> <emphasis>arg</emphasis> [
|
||||
, <emphasis>arg</emphasis>, ...]:
|
||||
Like <literal>.dword</literal>, but stores the value high
|
||||
byte first.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.scope</literal>: Starts a new scope
|
||||
block. Labels that begin with an underscore are only
|
||||
reachable from within their innermost
|
||||
enclosing <literal>.scope</literal>
|
||||
statement.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.scend</literal>: Ends a scope block. Makes the
|
||||
temporary labels defined since the
|
||||
last <literal>.scope</literal> statement unreachable, and
|
||||
permits them to be redefined in a new
|
||||
scope.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.macro</literal> <emphasis>name</emphasis>:
|
||||
Begins a macro definition block. This is a scope block
|
||||
that can be inlined at arbitrary points
|
||||
with <literal>.invoke</literal>. Arguments to the macro
|
||||
will be bound to temporary labels with names like
|
||||
<literal>_1</literal>, <literal>_2</literal>, etc.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.macend</literal>: Ends a macro definition block.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>.invoke</literal> <emphasis>label</emphasis> [<emphasis>argument</emphasis> [,
|
||||
<emphasis>argument</emphasis> ...]]: invokes (inlines) the
|
||||
specified macro, binding the values of the arguments to the
|
||||
ones the macro definition intends to read. A shorthand
|
||||
for <literal>.invoke</literal> is the name of the macro to
|
||||
invoke, backquoted.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
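<para>
  As a rough illustration of how several of these directives fit
  together, here is a minimal sketch. The addresses and the names
  <literal>chrout</literal>, <literal>counter</literal>,
  <literal>pointer</literal>, <literal>table</literal>,
  and <literal>msg</literal> are assumptions made up for the
  example, not anything Ophis itself defines.
</para>
<programlisting>
.alias  chrout  $ffd2       ; illustrative alias for a fixed address

.data                       ; no output is allowed in this segment
.org    $c000
.space  counter 1           ; one byte reserved at $c000
.space  pointer 2           ; two bytes reserved at $c001

.text                       ; back to the default text segment
.org    $0801
        lda #$00
        sta counter
table:  .word table, chrout ; two 16-bit values, low byte first
msg:    .byte "HI", 0       ; string data followed by a zero byte
.checkpc $0900              ; error if we have run past $0900
.advance $0900              ; pad with zeroes up to $0900
</programlisting>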
|
||||
</section>
|
||||
</appendix>
|
||||
|
185
doc/hll1.sgm
Normal file
@ -0,0 +1,185 @@
|
||||
<chapter id="hll-1">
|
||||
<title>The Second Step</title>
|
||||
|
||||
<para>
|
||||
This essay discusses how to do 16-or-more bit addition and
|
||||
subtraction on the 6502, and how to do unsigned comparisons
|
||||
properly, thus making 16-bit arithmetic less necessary.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>The problem</title>
|
||||
<para>
|
||||
The <literal>ADC</literal>, <literal>SBC</literal>, <literal>INX</literal>,
|
||||
and <literal>INY</literal> instructions are the only real
|
||||
arithmetic instructions the 6502 chip has. In and of themselves,
|
||||
they aren't too useful for general applications: the accumulator
|
||||
can only hold 8 bits, and thus can't store any value over 255.
|
||||
Matters get even worse when we're branching based on
|
||||
values; <literal>BMI</literal> and <literal>BPL</literal> hinge on
|
||||
the seventh (sign) bit of the result, so we can't represent any
|
||||
value above 127.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>The solution</title>
|
||||
|
||||
<para>
|
||||
We have two solutions available to us. First, we can use
|
||||
the <quote>unsigned</quote> discipline, which involves checking
|
||||
different flags, but lets us deal with values between 0 and 255
|
||||
instead of -128 to 127. Second, we can trade speed and register
|
||||
persistence for multiple precision arithmetic, using 16-bit
|
||||
integers (-32768 to 32767, or 0-65535), 24-bit, or more.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Multiplication, division, and floating point arithmetic are beyond
|
||||
the scope of this essay. The best way to deal with those is to
|
||||
find a math library on the web (I
|
||||
recommend <ulink url="http://www.6502.org/"></ulink>) and use the
|
||||
routines there.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Unsigned arithmetic</title>
|
||||
<para>
|
||||
When writing control code that hinges on numbers, we should always
|
||||
strive to have our comparison be with zero; that way, no explicit
|
||||
compare is necessary, and we can branch simply
|
||||
with <literal>BEQ/BNE</literal>, which test the zero flag.
|
||||
Otherwise, we use <literal>CMP</literal>.
|
||||
The <literal>CMP</literal> command subtracts its argument from the
|
||||
accumulator (without borrow), updates the flags, but throws away
|
||||
the result. If the value is equal, the result is zero.
|
||||
(<literal>CMP</literal> followed by <literal>BEQ</literal>
|
||||
branches if the argument is equal to the accumulator; this is
|
||||
probably why it's called <literal>BEQ</literal> and not something
|
||||
like <literal>BZS</literal>.)
|
||||
</para>
|
||||
<para>
|
||||
Intuitively, then, to check if the accumulator is <emphasis>less
|
||||
than</emphasis> some value, we <literal>CMP</literal> against that
|
||||
value and <literal>BMI</literal>. The <literal>BMI</literal>
|
||||
command branches based on the Negative Flag, which is equal to the
|
||||
seventh bit of <literal>CMP</literal>'s subtract. That's exactly
|
||||
what we need, for signed arithmetic. However, this produces
|
||||
problems if you're writing a boundary detector on your screen or
|
||||
something and find that 192 < 4. 192 is outside of a signed
|
||||
byte's range, and is interpreted as if it were -64. This will not
|
||||
do for most graphics applications, where your values will be
|
||||
ranging from 0-319 or 0-199 or 0-255.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Instead, we take advantage of the implied subtraction
|
||||
that <literal>CMP</literal> does. When subtracting, the result's
|
||||
carry bit starts at 1, and gets borrowed from if necessary. Let
|
||||
us consider some four-bit subtractions.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
 C|3210           C|3210
 ------           ------
 1|1001    9      1|1001    9
  |0100  - 4       |1100  -12
 ------  ---      ------  ---
 1|0101    5      0|1101   -3
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The <literal>CMP</literal> command properly modifies the carry bit
|
||||
to reflect this. When computing A-B, the carry bit is set if A
|
||||
>= B, and it's clear if A < B. Consider the following two
|
||||
code sequences.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
    (1)              (2)
  CMP #$C0         CMP #$C0
  BMI label        BCC label
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The code in the first column treats the value in the accumulator
|
||||
as a signed value, and branches if the value is less than -64.
|
||||
(Because of overflow issues, it will actually branch for
|
||||
accumulator values between $40 and $BF, even though it *should*
|
||||
only be doing it for values between $80 and $BF. To see why,
|
||||
compare $40 to $C0 and look at the result.) The second column
|
||||
code treats the accumulator as holding an unsigned value, and
|
||||
branches if the value is less than 192. It will branch for
|
||||
accumulator values $00-$BF.
|
||||
</para>
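<para>
  The same idea extends to a range check. The following is a
  minimal sketch, assuming a hypothetical one-byte
  variable <literal>value</literal> and illustrative
  labels <literal>outside</literal> and <literal>done</literal>;
  it treats the byte as unsigned and tests whether it lies between
  $20 and $BF inclusive.
</para>
<programlisting>
        LDA value       ; hypothetical one-byte variable
        CMP #$20
        BCC outside     ; carry clear: value is below $20
        CMP #$C0
        BCS outside     ; carry set: value is $C0 or above
        ;; In range: value is between $20 and $BF.
        JMP done
outside:
        ;; Out of range.
done:
</programlisting>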
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>16-bit addition and subtraction</title>
|
||||
|
||||
<para>
|
||||
Time to use the carry bit for what it was meant to do. Adding two
|
||||
8-bit numbers can produce a 9-bit result. That 9th bit is stored
|
||||
in the carry flag. The <literal>ADC</literal> command adds the
|
||||
carry value to its result, as well. Thus, carries work just as
|
||||
we'd expect them to. Suppose we're storing two 16-bit values, low
|
||||
byte first, in $C100-1 and $C102-3. Adding them together and
storing the result in $C104-5 is very easy:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
CLC
|
||||
LDA $C100
|
||||
ADC $C102
|
||||
STA $C104
|
||||
LDA $C101
|
||||
ADC $C103
|
||||
STA $C105
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Subtraction is identical, but you set the carry bit first
|
||||
with <literal>SEC</literal> (because borrow is the complement of
|
||||
carry—think about how the unsigned compare works if this
|
||||
puzzles you) and, of course, use the <literal>SBC</literal>
|
||||
instruction instead of <literal>ADC</literal>.
|
||||
</para>
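<para>
  As a sketch under the same assumptions as the addition example
  (values stored low byte first), this subtracts the 16-bit value
  in $C102-3 from the one in $C100-1 and stores the difference in
  $C104-5:
</para>
<programlisting>
        SEC             ; no borrow going in
        LDA $C100
        SBC $C102       ; low bytes first
        STA $C104
        LDA $C101
        SBC $C103       ; high bytes, minus any borrow
        STA $C105
</programlisting>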
|
||||
|
||||
<para>
|
||||
The carry/borrow bit is set appropriately to let you continue,
|
||||
too. As long as you just keep working your way up to bytes of
|
||||
ever-higher significance, this generalizes to 24-bit integers (do
it three times instead of two), 32-bit integers (four times), and
so on.
|
||||
</para>
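<para>
  For instance, a 24-bit addition might look like the sketch
  below. The memory layout is an assumption chosen for the
  example: the operands live in $C100-2 and $C103-5, low byte
  first, and the sum goes to $C106-8.
</para>
<programlisting>
        CLC
        LDA $C100
        ADC $C103       ; lowest bytes
        STA $C106
        LDA $C101
        ADC $C104       ; middle bytes, plus carry
        STA $C107
        LDA $C102
        ADC $C105       ; highest bytes, plus carry
        STA $C108
</programlisting>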
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>16-bit comparisons</title>
|
||||
|
||||
<para>
|
||||
Doing comparisons on extended precision values is about the same
|
||||
as doing them on 8-bit values, but you have to have the value you
|
||||
test in memory, since it won't fit in the accumulator all at once.
|
||||
You don't have to store the values back anywhere, either, since
|
||||
all you care about is the final state of the flags. For example,
|
||||
here's a signed comparison, branching to <literal>label</literal>
|
||||
if the value in $C100-1 is less than 1000 ($03E8):
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
SEC
|
||||
LDA $C100
|
||||
SBC #$E8
|
||||
LDA $C101 ; We only need the carry bit from that subtract
|
||||
SBC #$03
|
||||
BMI label
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
All the commentary on signed and unsigned compares holds for
|
||||
16-bit (or higher) integers just as it does for the 8-bit
|
||||
ones.
|
||||
</para>
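<para>
  As a sketch of the unsigned counterpart, only the final branch
  changes. After the last <literal>SBC</literal>, the carry is
  clear exactly when a borrow was needed, that is, when the
  unsigned value in $C100-1 is below 1000:
</para>
<programlisting>
        SEC
        LDA $C100
        SBC #$E8
        LDA $C101       ; Again, only the carry matters
        SBC #$03
        BCC label       ; carry clear: unsigned value is below 1000
</programlisting>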
|
||||
</section>
|
||||
</chapter>
|
880
doc/hll2.sgm
Normal file
@ -0,0 +1,880 @@
|
||||
<chapter id="hll2">
|
||||
<title>Structured Programming</title>
|
||||
|
||||
<para>
|
||||
This essay discusses the machine language equivalents of the
|
||||
basic <quote>structured programming</quote> concepts that are part
|
||||
of the <quote>imperative</quote> family of programming languages:
|
||||
if/then/else, for/next, while loops, and procedures. It also
|
||||
discusses basic use of variables, as well as arrays, multi-byte data
|
||||
types (records), and sub-byte data types (bitfields). It closes by
|
||||
hand-compiling pseudo-code for an insertion sort on linked lists
|
||||
into assembler. A complete Commodore 64 application is included as
|
||||
a sample with this essay.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>Control constructs</title>
|
||||
<section>
|
||||
<title>Branches: <literal>if x then y else z</literal></title>
|
||||
|
||||
<para>
|
||||
This is almost the most basic control construct.
|
||||
The <emphasis>most</emphasis> basic is <literal>if x then
|
||||
y</literal>, which is a simple branch instruction
|
||||
(bcc/bcs/beq/bmi/bne/bpl/bvc/bvs) past the <quote>then</quote>
|
||||
clause if the conditional is false:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
iny
|
||||
bne no'overflow
|
||||
inx
|
||||
no'overflow:
|
||||
;; rest of code
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This increments the value of the y register, and if it just
|
||||
wrapped back around to zero, it increments the x register too.
|
||||
It is basically equivalent to the C statement <literal>if
|
||||
((++y)==0) ++x;</literal>. We need a few more labels to handle
|
||||
else clauses as well.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
;; Computation of the conditional expression.
|
||||
;; We assume for the sake of the example that
|
||||
;; we want to execute the THEN clause if the
|
||||
;; zero bit is set, otherwise the ELSE
|
||||
;; clause. This will happen after a CMP,
|
||||
;; which is the most common kind of 'if'
|
||||
;; statement anyway.
|
||||
|
||||
BNE else'clause
|
||||
|
||||
;; THEN clause code goes here.
|
||||
|
||||
JMP end'of'if'stmt
|
||||
else'clause:
|
||||
|
||||
;; ELSE clause code goes here.
|
||||
|
||||
end'of'if'stmt:
|
||||
;; ... rest of code.
|
||||
</programlisting>
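<para>
  To make the template concrete, here is a minimal sketch. It
  assumes two hypothetical one-byte
  variables, <literal>value</literal> and <literal>flag</literal>,
  and sets <literal>flag</literal> to 1
  if <literal>value</literal> equals 10, or to 0 otherwise.
</para>
<programlisting>
        lda value
        cmp #$0a
        bne not'ten
        lda #$01        ; THEN clause: value was 10
        jmp store'flag
not'ten:
        lda #$00        ; ELSE clause
store'flag:
        sta flag
</programlisting>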
|
||||
</section>
|
||||
<section>
|
||||
<title>Free loops: <literal>while x do y</literal></title>
|
||||
|
||||
<para>
|
||||
A <emphasis>free loop</emphasis> is one that might execute any
|
||||
number of times. These are basically just a combination
|
||||
of <literal>if</literal> and <literal>goto</literal>. For
|
||||
a <quote>while x do y</quote> loop, that executes zero or more
|
||||
times, you'd have code like this...
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
loop'begin:
|
||||
;; ... computation of condition, setting zero
|
||||
;; bit if loop is finished...
|
||||
beq loop'done
|
||||
;; ... loop body goes here
|
||||
jmp loop'begin
|
||||
loop'done:
|
||||
;; ... rest of program.
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
If you want to ensure that the loop body executes at least once
|
||||
(do y while x), just move the test to the end.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
loop'begin:
|
||||
;; ... loop body goes here
|
||||
;; ... computation of condition, setting zero
|
||||
;; bit if loop is finished...
|
||||
bne loop'begin
|
||||
;; ... rest of program.
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The choice of zero bit is kind of arbitrary here. If the
|
||||
condition involves the carry bit, or overflow, or negative, then
|
||||
replace the beq with bcs/bvs/bmi appropriately.
|
||||
</para>
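<para>
  As a sketch of a loop driven by the carry bit instead, the
  following runs its body while a hypothetical one-byte
  variable <literal>counter</literal> is below 192, treating it
  as unsigned. It assumes the loop body itself does not
  change <literal>counter</literal>.
</para>
<programlisting>
loop'begin:
        lda counter
        cmp #$c0
        bcs loop'done   ; carry set: counter is 192 or more
        ;; ... loop body goes here
        inc counter
        jmp loop'begin
loop'done:
        ;; ... rest of program.
</programlisting>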
|
||||
</section>
|
||||
<section>
|
||||
<title>Bounded loops: <literal>for i = x to y do z</literal></title>
|
||||
|
||||
<para>
|
||||
A special case of loops is one where you know exactly how many
|
||||
times you're going through it—this is called
|
||||
a <emphasis>bounded</emphasis> loop. Suppose you're copying 16
|
||||
bytes from $C000 to $D000. The C code for that would look
|
||||
something like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
char *a = (char *)0xC000;   /* source */
char *b = (char *)0xD000;   /* destination */
int i;
for (i = 0; i < 16; i++) { b[i] = a[i]; }
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
C doesn't directly support bounded loops;
|
||||
its <literal>for</literal> statement is just <quote>syntactic
|
||||
sugar</quote> for a while statement. However, we can take
|
||||
advantage of special purpose machine instructions to get very
|
||||
straightforward code:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$00
|
||||
loop:
|
||||
lda $c000, x
|
||||
sta $d000, x
|
||||
inx
|
||||
cpx #$10
|
||||
bmi loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
However, remember that every arithmetic operation,
|
||||
including <literal>inx</literal> and <literal>dex</literal>,
|
||||
sets the various flags, including the Zero bit. That means that
|
||||
if we can make our computation <emphasis>end</emphasis> when the
|
||||
counter hits zero, we can shave off some bytes:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$10
|
||||
loop:
|
||||
lda $bfff, x
sta $cfff, x
|
||||
dex
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Notice that we had to change the addresses we're indexing from,
|
||||
because x takes a slightly different range of values. The space
savings are small here, and the code has become slightly less clear.
(It has also saved less time than it might seem: the lda now
crosses a page boundary, which costs an extra cycle on every pass.
If the source array began at $b020 or something, this wouldn't be
an issue.) This tends to work better
|
||||
when the precise value of the counter isn't used in the
|
||||
computation—so let us consider the NES, which uses memory
|
||||
location $2007 as a port to its video memory. Suppose we wish
|
||||
to jam 4,096 copies of the hex value $20 into the video memory.
|
||||
We can write this <emphasis>very</emphasis> cleanly, using the X
|
||||
and Y registers as indices in a nested loop.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$10
|
||||
ldy #$00
|
||||
lda #$20
|
||||
loop:
|
||||
sta $2007
|
||||
iny
|
||||
bne loop
|
||||
dex
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Work through this code. Convince yourself that
|
||||
the <literal>sta</literal> is executed exactly 16*256 = 4096
|
||||
times.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This is an example of a <emphasis>nested</emphasis> loop: a loop
|
||||
inside a loop. Since our internal loop didn't need the X or Y
|
||||
registers, we got to use both of them, which is nice, because
|
||||
they have special incrementing and decrementing instructions.
|
||||
The accumulator lacks these instructions, so it is a poor choice
|
||||
to use for index variables. If you have a bounded loop and
|
||||
don't have access to registers, use memory locations
|
||||
instead:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
lda #$10
|
||||
sta counter ; loop 16 times
|
||||
loop:
|
||||
;; Do stuff that trashes all the registers
|
||||
dec counter
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
That's it! These are the basic control constructs for use
inside procedures. Before talking about how to organize
|
||||
procedures, I'll briefly cover the way the 6502 handles its
|
||||
stack, because stacks and procedures are very tightly
|
||||
intertwined.
|
||||
</para>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>The stack</title>
|
||||
|
||||
<para>
|
||||
The 6502 has an onboard stack in page 1. You can modify the stack
|
||||
pointer by storing a value in the X register and
|
||||
using <literal>txs</literal>; an <quote>empty</quote> stack is
|
||||
value $FF. Going into a procedure pushes the address of the next
|
||||
instruction onto the stack, and RTS pops that value off and jumps
|
||||
there. (Well, not precisely. JSR actually pushes a value that's
|
||||
one byte short, and RTS loads the value, increases it by
|
||||
one, and THEN jumps there. But that's only an issue if you're
|
||||
using RTS to implement jump tables.) On an interrupt, the next
|
||||
instruction's address is pushed on the stack, then the processor
flags, and it jumps to the handler. The return from interrupt
|
||||
restores the flags and the PC, just as if nothing had
|
||||
happened.
|
||||
</para>
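<para>
  A minimal sketch of resetting the stack to empty follows. This
  is normally only appropriate once, at program startup, since it
  discards any return addresses already on the stack.
</para>
<programlisting>
        ldx #$ff
        txs             ; stack pointer now $FF: the stack is empty
</programlisting>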
|
||||
|
||||
<para>
|
||||
The stack only has 256 possible entries; since addresses take two
|
||||
bytes to store, that means that if you call something that calls
|
||||
something that calls something that (etc., etc., 129 times), your
|
||||
computation will fail. This can happen faster if you save
|
||||
registers or memory values on the stack (see below).
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Procedures and register saving</title>
|
||||
|
||||
<para>
|
||||
All programming languages are designed around the concept of
|
||||
procedures.<footnote><para>Yes, all of them. Functional languages
|
||||
just let you do more things with them, logic programming has
|
||||
implicit calls to query procedures, and
|
||||
object-oriented <quote>methods</quote> are just normal procedures
|
||||
that take one extra argument in secret.</para></footnote>
|
||||
Procedures let you break a computation up into different parts,
|
||||
then use them independently. However, compilers do a lot of work
|
||||
for you behind the scenes to let you think this. Consider the
|
||||
following assembler code. How many times does the loop
|
||||
execute?
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$10
loop:
jsr do'stuff
dex
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The correct answer is <quote>I don't know, but
|
||||
it <emphasis>should</emphasis> be 16.</quote> The reason we don't
|
||||
know is because we're assuming here that
|
||||
the <literal>do'stuff</literal> routine doesn't change the value
|
||||
of the X register.  If it does, then all sorts of chaos could
|
||||
result. For major routines that aren't called often but are
|
||||
called in places where the register state is important, you should
|
||||
store the old registers on the stack with code like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
do'stuff:
|
||||
pha
|
||||
txa
|
||||
pha
|
||||
tya
|
||||
pha
|
||||
|
||||
;; Rest of do'stuff goes here
|
||||
|
||||
pla
|
||||
tay
|
||||
pla
|
||||
tax
|
||||
pla
|
||||
rts
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
(Remember, the last item pushed onto the stack is the first one
|
||||
pulled off, so you have to restore them in reverse order.) That's
|
||||
three more bytes on the stack, so you don't want to do this if you
|
||||
don't absolutely have to. If <literal>do'stuff</literal>
|
||||
actually <emphasis>doesn't</emphasis> touch X, there's no need to
|
||||
save and restore the value. This technique is
|
||||
called <emphasis>callee-save</emphasis>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The reverse technique is called <emphasis>caller-save</emphasis>
|
||||
and pushes important registers onto the stack before the routine
|
||||
is called, then restores them afterwards. Each technique has its
|
||||
advantages and disadvantages. The best way to handle it in your
|
||||
own code is to mark at the top of each routine which registers
|
||||
need to be saved by the caller. (It's also useful to note things
|
||||
like how it takes arguments and how it returns values.)
|
||||
</para>
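
<para>
Here is a sketch of what caller-save might look like for the
counting loop from earlier in this section.  It assumes
that <literal>do'stuff</literal> is free to clobber every register,
so the loop itself protects the one value it cares about:
</para>

<programlisting>
        ldx #$10
loop:   txa             ; caller-save: stash the loop counter on the stack
        pha
        jsr do'stuff    ; may destroy A, X, and Y
        pla             ; recover the counter...
        tax             ; ...and put it back in X
        dex
        bne loop        ; flags here come from the dex
</programlisting>

<para>
This costs one byte of stack instead of three, because the caller
knows that X is the only register it needs to keep.
</para>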
|
||||
</section>
|
||||
<section>
|
||||
<title>Variables</title>
|
||||
|
||||
<para>
|
||||
Variables come in several flavors.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>Global variables</title>
|
||||
|
||||
<para>
|
||||
Global variables are variables that can be reached from any
|
||||
point in the program. Since the 6502 has no memory protection,
|
||||
these are easy to declare. Take some random chunk of unused
|
||||
memory and declare it to be the global variables area. All
|
||||
reasonable assemblers have commands that let you give a symbolic
|
||||
name to a memory location—you can use this to give your
|
||||
globals names.
|
||||
</para>
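
<para>
In Ophis, the <literal>.alias</literal> directive is one way to do
this.  The names and addresses below are made up for illustration;
pick a region that really is unused on your target machine:
</para>

<programlisting>
.alias  score   $C000   ; hypothetical two-byte global
.alias  lives   $C002   ; hypothetical one-byte global

        lda #$03
        sta lives       ; any routine may read or write a global by name
</programlisting>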
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Local variables</title>
|
||||
|
||||
<para>
|
||||
All modern languages have some concept of <quote>local
|
||||
variables</quote>, which are data values unique to that
|
||||
invocation of that procedure.  In modern architectures, this data
|
||||
is stored into and read directly off of the stack. The 6502
|
||||
doesn't really let you do this cleanly; I'll discuss ways of
|
||||
handling it in a later essay. If you're implementing a system
|
||||
from scratch, you can design your memory model to not require
|
||||
such extreme measures. There are three basic techniques.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>Treat local variables like registers</title>
|
||||
<para>
|
||||
This means that any memory location you use, you save on the
|
||||
stack and restore afterwards. This
|
||||
can <emphasis>really</emphasis> eat up stack space, and it's
|
||||
really slow, it's often pointless, and it has a tendency to
|
||||
overflow the stack. I can't recommend it. But it does let
|
||||
you do recursion right, if you don't need to save much memory
|
||||
and you aren't recursing very deep.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Procedure-based memory allocation</title>
|
||||
<para>
|
||||
With this technique, you give each procedure its own little
|
||||
chunk of memory for use with its data. All the variables are
|
||||
still, technically, globals; a
|
||||
routine <emphasis>could</emphasis> interfere with another's,
|
||||
but the discipline of <quote>only mess with real globals, and
|
||||
your own locals</quote> is very, very easy to maintain.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This has many advantages. It's <emphasis>very</emphasis>
|
||||
fast, both to write and to run, because loading a variable is
|
||||
an Absolute or Zero Page instruction. Also, any procedure may
|
||||
call any other procedure, as long as it doesn't wind up
|
||||
calling itself at some point.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
It has two major disadvantages. First, if many routines need
|
||||
a lot of space, it can consume more memory than it should.
|
||||
Also, this technique can require significant assembler
|
||||
support—you must ensure that no procedure's local
|
||||
variables are defined in the same place as any other
|
||||
procedure, and it essentially requires a full symbolic linker
|
||||
to do right. Ophis includes commands for <emphasis>memory
|
||||
segmentation simulation</emphasis> that automate most of this
|
||||
task, and make writing general libraries feasible.
|
||||
</para>
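
<para>
A rough sketch of how this might look, using
the <literal>.data</literal>, <literal>.space</literal>,
and <literal>.text</literal> commands that appear in the example at
the end of this chapter.  The routine and variable names are
hypothetical, and the sketch assumes the data segment's starting
address has already been set up elsewhere:
</para>

<programlisting>
.data                   ; switch to the data segment
.space draw'x 1         ; locals that belong to draw'sprite
.space draw'y 1
.space beep'len 1       ; a local that belongs to some other routine

.text                   ; back to the code segment
draw'sprite:
        stx draw'x      ; the routine touches only its own locals
        sty draw'y
        rts
</programlisting>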
|
||||
</section>
|
||||
<section>
|
||||
<title>Partition-based memory allocation</title>
|
||||
|
||||
<para>
|
||||
It's not <emphasis>really</emphasis> necessary that no
|
||||
procedure overwrite memory used by any other procedure. It's
|
||||
only required that procedures don't write on the memory that
|
||||
their <emphasis>callers</emphasis> use. Suppose that your
|
||||
program is organized into a bunch of procedures, and each falls
|
||||
into one of three sets:
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>Procedures in set A don't call anyone.</para></listitem>
|
||||
<listitem><para>Procedures in set B only call procedures in set A.</para></listitem>
|
||||
<listitem><para>Procedures in set C only call procedures in sets A or B.</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
Now, each <emphasis>set</emphasis> can be given its own chunk
|
||||
of memory, and we can be absolutely sure that no procedures
|
||||
overwrite each other. Even if every procedure in set C uses
|
||||
the <emphasis>same</emphasis> memory location, they'll never
|
||||
step on each other, because there's no way to get to any other
|
||||
routine in set C <emphasis>from</emphasis> any routine in set
|
||||
C.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This has the same time efficiencies as procedure-based memory
|
||||
allocation, and, given a thoughtful design aimed at using this
|
||||
technique, also can use significantly less memory at run time.
|
||||
It also requires much less assembler support, as addresses
|
||||
for variables may be assigned by hand without having to worry
|
||||
about those addresses already being used. However, it does
|
||||
impose a very tight discipline on the design of the overall
|
||||
system, so you'll have to do a lot more work before you start
|
||||
actually writing code.
|
||||
</para>
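
<para>
As a sketch of assigning addresses by hand, each set might get one
scratch location that all of its routines share.  The names and
zero page addresses here are assumptions for illustration only:
</para>

<programlisting>
.alias  seta'tmp  $FB   ; scratch byte shared by every routine in set A
.alias  setb'tmp  $FC   ; scratch byte shared by every routine in set B
.alias  setc'tmp  $FD   ; scratch byte shared by every routine in set C
</programlisting>

<para>
A routine in set C may call into set B or set A without fear,
because nothing in those sets ever
touches <literal>setc'tmp</literal>.
</para>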
|
||||
</section>
|
||||
</section>
|
||||
<section>
|
||||
<title>Constants</title>
|
||||
|
||||
<para>
|
||||
Constants are <quote>variables</quote> that don't change. If
|
||||
you know that the value you're using is not going to change, you
|
||||
should fold it into the code, either as an Immediate operand
|
||||
wherever it's used, or (if it's more complicated than that)
|
||||
as <literal>.byte</literal> commands in between the procedures.
|
||||
This is especially important for ROM-based systems such as the
|
||||
NES; the NES has very little RAM available, so constants should
|
||||
be kept in the more plentiful ROM wherever possible.
|
||||
</para>
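
<para>
Both styles show up in the following sketch, where a hypothetical
table named <literal>powers</literal> sits right after the routine
that uses it:
</para>

<programlisting>
        ldx #$03        ; Immediate operand: the constant lives in the code
        lda powers,x    ; A = 8, fetched from the table below
        rts

powers: .byte 1, 2, 4, 8, 16, 32, 64, 128
</programlisting>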
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Data structures</title>
|
||||
|
||||
<para>
|
||||
So far, we've been treating data as a bunch of one-byte values.
|
||||
There really isn't a lot you can do just with bytes. This section
|
||||
talks about how to deal with larger and smaller elements.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>Arrays</title>
|
||||
|
||||
<para>
|
||||
An <emphasis>array</emphasis> is a bunch of data elements in a
|
||||
row. An array of bytes is very easy to handle with the 6502
|
||||
chip, because the various indexed addressing modes handle it for
|
||||
you. Just load the index into the X or Y register and do an
|
||||
absolute indexed load. In general, these are going to be
|
||||
zero-indexed (that is, a 32-byte array is indexed from 0 to 31.)
|
||||
This code would initialize a byte array with 32 entries to
|
||||
0:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
lda #$00
|
||||
tax
|
||||
loop:
|
||||
sta array,x
|
||||
inx
|
||||
cpx #$20
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
(If you count down to save instructions, remember to adjust the
|
||||
base address so that it's still writing the same memory
|
||||
location.)
|
||||
</para>
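
<para>
A count-down version of the same initialization might look like
this.  X runs from 32 down to 1, so the base address is moved back
by one to compensate:
</para>

<programlisting>
        lda #$00
        ldx #$20
loop:
        sta array-1,x   ; writes array+31 down through array+0
        dex
        bne loop        ; no cpx needed: dex sets the Zero flag itself
</programlisting>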
|
||||
|
||||
<para>
|
||||
This approach to arrays has some limits. Primary among them is
|
||||
that we can't have arrays of size larger than 256; we can't fit
|
||||
our index into the index register. In order to address larger
|
||||
arrays, we need to use the indirect indexed addressing mode. We
|
||||
use 16-bit addition to add the offset to the base pointer, then
|
||||
set the Y register to 0 and then load the value
|
||||
with <literal>lda (ptr),y</literal>.
|
||||
</para>
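
<para>
Spelled out, that might look like the following sketch.  It
assumes <literal>base</literal> and <literal>offset</literal> are
two-byte variables holding the array's address and the index, and
that <literal>ptr</literal> is a two-byte variable on the zero page:
</para>

<programlisting>
        clc
        lda base        ; low byte of base plus low byte of offset
        adc offset
        sta ptr
        lda base+1      ; high bytes, plus the carry from the low bytes
        adc offset+1
        sta ptr+1
        ldy #$00
        lda (ptr),y     ; A = the byte at base + offset
</programlisting>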
|
||||
|
||||
<para>
|
||||
Well, actually, we can do better than that. Suppose we want to
|
||||
clear out 8K of ram, from $2000 to $4000. We can use the Y
|
||||
register to hold the low byte of our offset, and only update the
|
||||
high byte when necessary.  That produces the following
|
||||
loop:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
lda #$00 ; Set pointer value to base ($2000)
|
||||
sta ptr
|
||||
lda #$20
|
||||
sta ptr+1
|
||||
lda #$00 ; Storing a zero
|
||||
ldx #$20 ; 8,192 ($2000) iterations: high byte
|
||||
ldy #$00 ; low byte.
|
||||
loop:
|
||||
sta (ptr),y
|
||||
iny
|
||||
bne loop ; If we haven't wrapped around, go back
|
||||
inc ptr+1 ; Otherwise update high byte
|
||||
dex ; bump counter
|
||||
bne loop ; and continue if we aren't done
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This code could be optimized further; the loop prelude in
|
||||
particular loads a lot of redundant values that could be
|
||||
compressed down further:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
lda #$00
|
||||
tay
|
||||
ldx #$20
|
||||
sta ptr
|
||||
stx ptr+1
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
That's not directly relevant to arrays, but these sorts of
|
||||
things are good things to keep in mind when writing your code.
|
||||
Done well, they can make it much smaller and faster; done
|
||||
carelessly, they can force a lot of bizarre dependencies on your
|
||||
code and make it impossible to modify later.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Records</title>
|
||||
<para>
|
||||
A <emphasis>record</emphasis> is a collection of values all
|
||||
referred to as one variable. This has no immediate
|
||||
representation in assembler. If you have a global variable
|
||||
that's two bytes and a code pointer, this is exactly equivalent
|
||||
to three separate variables.  You can just put one label in
|
||||
front of it, and refer to the first byte
|
||||
as <literal>label</literal>, the second
|
||||
as <literal>label+1</literal>, and the code pointer
|
||||
as <literal>label+2</literal>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This really applies to all data structures that take up more
|
||||
than one byte. When dealing with the pointer, a 16-bit value,
|
||||
we refer to the low byte as <literal>ptr</literal>
|
||||
(or <literal>label+2</literal>, in the example above), and the
|
||||
high byte as <literal>ptr+1</literal>
|
||||
(or <literal>label+3</literal>).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Arrays of records are more interesting. There are two
|
||||
possibilities for these. The way most high level languages
|
||||
treat it is by keeping the records contiguous. If you have an
|
||||
array of records, each holding two sixteen-bit integers, then the records are stored
|
||||
in order, one at a time. The first is in location $1000, the
|
||||
next in $1004, the next in $1008, and so on. You can do this
|
||||
with the 6502, but you'll probably have to use the indirect
|
||||
indexed mode if you want to be able to iterate
|
||||
conveniently.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Another, more unusual, but more efficient approach is to keep
|
||||
each byte as a separate array, just like in the arrays example
|
||||
above. To illustrate, here's a little bit of code to go through
|
||||
a contiguous array of 16 bit integers, adding their values to
|
||||
some <literal>total</literal> variable:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$10 ; Number of elements in the array
|
||||
ldy #$00 ; Byte index from array start
|
||||
loop:
|
||||
clc
|
||||
lda array, y ; Low byte
|
||||
adc total
|
||||
sta total
|
||||
lda array+1, y ; High byte
|
||||
adc total+1
|
||||
sta total+1
|
||||
iny ; Jump ahead to next entry
|
||||
iny
|
||||
dex ; Check for loop termination
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
And here's the same loop, keeping the high and low bytes in
|
||||
separate arrays:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldx #$00
|
||||
loop:
|
||||
clc
|
||||
lda lowbyte,x
|
||||
adc total
|
||||
sta total
|
||||
lda highbyte,x
|
||||
adc total+1
|
||||
sta total+1
|
||||
inx
|
||||
cpx #$10
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Which approach is the right one depends on what you're doing.
|
||||
For large arrays, the first approach is better, as you only need
|
||||
to maintain one base pointer. For smaller arrays, the easier
|
||||
indexing makes the second approach more convenient.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Bitfields</title>
|
||||
|
||||
<para>
|
||||
To store values that are smaller than a byte, you can save space
|
||||
by putting multiple values in a byte. To extract a sub-byte
|
||||
value, use the bitmasking commands:
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>To set bits, use the <literal>ORA</literal> command. <literal>ORA #$0F</literal> sets the lower four bits to 1 and leaves the rest unchanged.</para></listitem>
|
||||
<listitem><para>To clear bits, use the <literal>AND</literal> command. <literal>AND #$F0</literal> sets the lower four bits to 0 and leaves the rest unchanged.</para></listitem>
|
||||
<listitem><para>To reverse bits, use the <literal>EOR</literal> command. <literal>EOR #$0F</literal> reverses the lower four bits and leaves the rest unchanged.</para></listitem>
|
||||
<listitem><para>To test if a bit is 0, AND away everything but that bit, then see if the Zero bit was set. If the bit is in the top two bits of a memory location, you can use the BIT command instead (which stores bit 7 in the Negative bit, and bit 6 in the Overflow bit).</para></listitem>
|
||||
</itemizedlist>
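
<para>
The operations above can be sketched as follows, for a
hypothetical byte <literal>packed</literal> that holds two four-bit
values and a hypothetical byte <literal>flags</literal> whose top
two bits we want to test.  All the variable names and branch
targets are assumptions:
</para>

<programlisting>
        lda packed
        and #$0F        ; keep only the low four bits
        sta low'field

        lda packed
        lsr             ; shift the high four bits down;
        lsr             ; zeroes come in from the top
        lsr
        lsr
        sta high'field

        bit flags       ; Negative := bit 7 of flags, Overflow := bit 6
        bmi bit7'set    ; taken when bit 7 is 1
        bvs bit6'set    ; taken when bit 6 is 1
</programlisting>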
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>A modest example: Insertion sort on linked lists</title>
|
||||
|
||||
<para>
|
||||
To demonstrate these techniques, we will now produce code to
|
||||
perform insertion sort on a linked list. We'll start by defining
|
||||
our data structure, then defining the routines we want to write,
|
||||
then producing actual code for those routines. A downloadable
|
||||
version that will run unmodified on a Commodore 64 closes the
|
||||
chapter.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>The data structure</title>
|
||||
|
||||
<para>
|
||||
We don't really want to have to deal with pointers if we can
|
||||
possibly avoid it, but it's hard to do a linked list without
|
||||
them. Instead of pointers, we will
|
||||
use <emphasis>cursors</emphasis>: small integers that represent
|
||||
the index into the array of values. This lets us use the
|
||||
many-small-byte-arrays technique for our data. Furthermore, our
|
||||
random data that we're sorting never has to move, so we may
|
||||
declare it as a constant and only bother with changing the
|
||||
values of <literal>head</literal> and
|
||||
the <literal>next</literal> arrays. The data record definition
|
||||
looks like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
head : byte;
|
||||
data : const int[16] = [838, 618, 205, 984, 724, 301, 249, 946,
|
||||
925, 43, 114, 697, 985, 633, 312, 86];
|
||||
next : byte[16];
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Exactly how this gets represented will vary from assembler to
|
||||
assembler. Ophis does it like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
.data
|
||||
.space head 1
|
||||
.space next 16
|
||||
|
||||
.text
|
||||
lb: .byte <$838,<$618,<$205,<$984,<$724,<$301,<$249,<$946
|
||||
.byte <$925,<$043,<$114,<$697,<$985,<$633,<$312,<$086
|
||||
hb: .byte >$838,>$618,>$205,>$984,>$724,>$301,>$249,>$946
|
||||
.byte >$925,>$043,>$114,>$697,>$985,>$633,>$312,>$086
|
||||
</programlisting>
|
||||
</section>
|
||||
<section>
|
||||
<title>Doing an insertion sort</title>
|
||||
|
||||
<para>
|
||||
To do an insertion sort, we clear the list by setting the 'head'
|
||||
value to -1, and then insert each element into the list one at a
|
||||
time, placing each element in its proper order in the list. We
|
||||
can consider the lb/hb structure alone as an array of 16
|
||||
integers, and just insert each one into the list one at a
|
||||
time.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
procedure insertion_sort
|
||||
head := -1;
|
||||
for i := 0 to 15 do
|
||||
insert_elt i
|
||||
end
|
||||
end
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This translates pretty directly. We'll have insert_elt take its
|
||||
argument in the X register, and loop with that. However, given
|
||||
that insert_elt is going to be a complex procedure, we'll save
|
||||
the value first. The assembler code becomes:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
; insertion'sort: Sorts the list defined by head, next, hb, lb.
|
||||
; Arguments: None.
|
||||
; Modifies: All registers destroyed, head and next array sorted.
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
insertion'sort:
|
||||
lda #$FF ; Clear list by storing the terminator in 'head'
|
||||
sta head
|
||||
ldx #$0 ; Loop through the lb/hb array, adding each
|
||||
insertion'sort'loop: ; element one at a time
|
||||
txa
|
||||
pha
|
||||
jsr insert_elt
|
||||
pla
|
||||
tax
|
||||
inx
|
||||
cpx #$10
|
||||
bne insertion'sort'loop
|
||||
rts
|
||||
</programlisting>
|
||||
</section>
|
||||
<section>
|
||||
<title>Inserting an element</title>
|
||||
|
||||
<para>
|
||||
The pseudocode for inserting an element is a bit more
|
||||
complicated. If the list is empty, or the value we're inserting
|
||||
goes at the front, then we have to update the value
|
||||
of <literal>head</literal>. Otherwise, we can iterate through
|
||||
the list until we find the element that our value fits in after
|
||||
(so, the first element whose successor is larger than our
|
||||
value). Then we update the next pointers directly and exit.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
procedure insert_elt i
|
||||
begin
|
||||
if head = -1 then begin
|
||||
head := i;
|
||||
next[i] := -1;
|
||||
return;
|
||||
end;
|
||||
val := data[i];
|
||||
if val < data[head] then begin
|
||||
next[i] := head;
|
||||
head := i;
|
||||
return;
|
||||
end;
|
||||
current := head;
|
||||
while (next[current] <> -1 and val > data[next[current]]) do
|
||||
current := next[current];
|
||||
end;
|
||||
next[i] := next[current];
|
||||
next[current] := i;
|
||||
end;
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This produces the following rather hefty chunk of code:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
; insert_elt: Insert an element into the linked list. Maintains the
|
||||
; list in sorted, ascending order. Used by
|
||||
; insertion'sort.
|
||||
; Arguments: X register holds the index of the element to add.
|
||||
; Modifies: All registers destroyed; head and next arrays updated
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
.data
|
||||
.space lbtoinsert 1
|
||||
.space hbtoinsert 1
|
||||
.space indextoinsert 1
|
||||
|
||||
.text
|
||||
|
||||
insert_elt:
|
||||
ldy head ; If the list is empty, make
|
||||
cpy #$FF ; head point at it, and return.
|
||||
bne insert_elt'list'not'empty
|
||||
stx head
|
||||
tya
|
||||
sta next,x
|
||||
rts
|
||||
insert_elt'list'not'empty:
|
||||
lda lb,x ; Cache the data we're inserting
|
||||
sta lbtoinsert
|
||||
lda hb,x
|
||||
sta hbtoinsert
|
||||
stx indextoinsert
|
||||
ldy head ; Compare the first value with
|
||||
sec ; the data. If the data must
|
||||
lda lb,y ; be inserted at the front...
|
||||
sbc lbtoinsert
|
||||
lda hb,y
|
||||
sbc hbtoinsert
|
||||
bmi insert_elt'not'smallest
|
||||
tya ; Set its next pointer to the
|
||||
sta next,x ; old head, update the head
|
||||
stx head ; pointer, and return.
|
||||
rts
|
||||
insert_elt'not'smallest:
|
||||
ldx head
|
||||
insert_elt'loop: ; At this point, we know that
|
||||
lda next,x ; argument > data[X].
|
||||
tay
|
||||
cpy #$FF ; if next[X] = #$FF, insert arg at end.
|
||||
beq insert_elt'insert'after'current
|
||||
lda lb,y ; Otherwise, compare arg to
|
||||
sec ; data[next[X]]. If we insert
|
||||
sbc lbtoinsert ; before that...
|
||||
lda hb,y
|
||||
sbc hbtoinsert
|
||||
bmi insert_elt'goto'next
|
||||
insert_elt'insert'after'current: ; Fix up all the next links
|
||||
tya
|
||||
ldy indextoinsert
|
||||
sta next,y
|
||||
tya
|
||||
sta next,x
|
||||
rts ; and return.
|
||||
insert_elt'goto'next: ; Otherwise, let X = next[X]
|
||||
tya ; and go looping again.
|
||||
tax
|
||||
jmp insert_elt'loop
|
||||
</programlisting>
|
||||
</section>
|
||||
<section>
|
||||
<title>The complete application</title>
|
||||
|
||||
<para>
|
||||
The full application, which deals with interfacing with CBM
|
||||
BASIC and handles console I/O and such, is
|
||||
in <xref linkend="structure-src" endterm="structure-fname">.
|
||||
</para>
|
||||
</section>
|
||||
</section>
|
||||
</chapter>
|
297
doc/hll3.sgm
Normal file
@ -0,0 +1,297 @@
|
||||
<chapter id="hll3">
|
||||
<title>Pointers and Indirection</title>
|
||||
|
||||
<para>
|
||||
The basics of pointers versus cursors (or, at the 6502 assembler
|
||||
level, the indirect indexed addressing mode versus the absolute
|
||||
indexed ones) were covered in <xref linkend="hll2">.  This essay seeks
|
||||
to explain the uses of the indirect modes, and how to implement
|
||||
pointer operations with them. It does <emphasis>not</emphasis> seek to explain
|
||||
why you'd want to use pointers for something to begin with; for a
|
||||
tutorial on proper pointer usage, consult any decent C textbook.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>The absolute basics</title>
|
||||
|
||||
<para>
|
||||
A pointer is a variable holding the address of a memory location.
|
||||
Memory locations take 16 bits to represent on the 6502: thus, we
|
||||
need two bytes to hold it. Any decent assembler will have ways of
|
||||
taking the high and low bytes of an address; use these to acquire
|
||||
the raw values you need. The 6502 chip does not have any
|
||||
simple <quote>pure</quote> indirect modes (except
|
||||
for <literal>JMP</literal>, which is a matter for a later essay);
|
||||
all are indexed, and they're indexed different ways depending on
|
||||
which index register you use.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>The simplest example</title>
|
||||
|
||||
<para>
|
||||
When doing a simple, direct dereference (that is, something
|
||||
equivalent to the C code <literal>c=*b;</literal>) the code
|
||||
looks like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
ldy #0
|
||||
lda (b), y
|
||||
sta c
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Even with this simple example, there are several important
|
||||
things to notice.
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
The variable <literal>b</literal> <emphasis>must be on the
|
||||
zero page</emphasis>, and furthermore, it <emphasis>cannot
|
||||
be $FF.</emphasis> All your pointer values need to be
|
||||
either stored on the zero page to begin with or copied
|
||||
there before use.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
The <literal>y</literal> in the <literal>lda</literal>
|
||||
statement must be y. It cannot be x (that's a different
|
||||
form of indirection), and it cannot be a constant. If
|
||||
you're doing a lot of indirection, be sure to keep your Y
|
||||
register free to handle the indexing on the
|
||||
pointers.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
The <literal>b</literal> variable is used alone. Statements
|
||||
like <literal>lda (b+2), y</literal> are syntactically valid
|
||||
and sometimes even correct: it dereferences the value next
|
||||
to <literal>b</literal> after adding y to the value therein.
|
||||
However, it is almost guaranteed that what you <emphasis>really</emphasis>
wanted to do was compute <literal>*(b+2)</literal> (that is,
take the address stored in b, add 2 to <emphasis>that</emphasis>,
|
||||
and dereference that value); see the next section for how to
|
||||
do this properly.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
In nearly all cases, it is the Y-register's version (Indirect
|
||||
Indexed) that you want to use when you're dealing with pointers.
|
||||
Even though either version could be used for this example, we
|
||||
use the Y register to establish this habit.
|
||||
</para>
|
||||
</section>
|
||||
</section>
|
||||
<section>
|
||||
<title>Pointer arithmetic</title>
|
||||
|
||||
<para>
|
||||
Pointer arithmetic is an obscenely powerful and dangerous
|
||||
technique. However, it's the most straightforward way to deal
|
||||
with enormous arrays, structs, indexable stacks, and nearly
|
||||
everything you do in C. (C has no native array or string types
|
||||
primarily because it allows arbitrary pointer arithmetic, which is
|
||||
strong enough to handle all of those without complaint and at
|
||||
blazing speed. It also allows for all kinds of buffer overrun
|
||||
security holes, but let's face it, who's going to be cracking root
|
||||
on your Apple II?) There are a number of ways to implement this
|
||||
on the 6502. We'll deal with them in increasing order of design
|
||||
complexity.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>The straightforward, slow way</title>
|
||||
|
||||
<para>
|
||||
When computing a pointer value, you simply treat the pointer as
|
||||
if it were a 16-bit integer. Do all the math you need, then
|
||||
when the time comes to dereference it, simply do a direct
|
||||
dereference as above. This is definitely doable, and it's not
|
||||
difficult. However, it is costly in both space and time.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When dealing with arbitrary indices large enough that they won't
|
||||
fit in the Y register, or when creating values that you don't
|
||||
intend to dereference (such as subtracting two pointers to find
|
||||
the length of a string), this is also the only truly usable
|
||||
technique.
|
||||
</para>
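
<para>
As a sketch of the string-length case, here is a 16-bit
subtraction of two pointers.  The
names <literal>strstart</literal>, <literal>strend</literal>,
and <literal>len</literal> are hypothetical two-byte variables:
</para>

<programlisting>
        sec
        lda strend      ; low byte of strend - strstart
        sbc strstart
        sta len
        lda strend+1    ; high byte, minus any borrow from the low byte
        sbc strstart+1
        sta len+1
</programlisting>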
|
||||
</section>
|
||||
<section>
|
||||
<title>The clever fast way</title>
|
||||
|
||||
<para>
|
||||
But wait, you say. Often when we compute a value, at least one
|
||||
of the operations is going to be an addition, and we're almost
|
||||
certain to have that value be less than 256! Surely we may save
|
||||
ourselves an operation by loading that value into the Y register
|
||||
and having the load operation itself perform the final
|
||||
addition!
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Very good. This is the fastest technique, and sometimes it's
|
||||
even the most readable. These cases usually involve repeated
|
||||
reading of various fields from a structure or record. The base
|
||||
pointer always points to the base of the structure (or the top
|
||||
of the local variable list, or what have you) and the Y register
|
||||
takes values that index into that structure. This lets you keep
|
||||
the pointer variable in memory largely static and requires no
|
||||
explicit arithmetic instructions at all.
|
||||
</para>
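
<para>
A minimal sketch of reading fields this way, assuming a zero page
pointer <literal>rec</literal> that points at a record with a
one-byte field at offset 0 and a two-byte field at offset 2 (the
layout and the destination names are made up for illustration):
</para>

<programlisting>
        ldy #$00
        lda (rec),y     ; field at offset 0
        sta kind
        ldy #$02
        lda (rec),y     ; low byte of the field at offset 2
        sta size
        iny
        lda (rec),y     ; high byte, at offset 3
        sta size+1
</programlisting>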
|
||||
|
||||
<para>
|
||||
However, this technique is highly opaque and should always be
|
||||
well documented, indicating exactly what you think you're
|
||||
pointing at. Then, when you get garbage results, you can
|
||||
compare your comments and the resulting Y values with the actual
|
||||
definition of the structure to see who's screwing up.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For a case where we still need to do arithmetic, consider the
|
||||
classic case of needing to clear out a large chunk of memory.
|
||||
The following code fills the 4KB of memory between $C000 and
|
||||
$D000 with zeroes:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
lda #$C0 ; Store $C000 in mem (low byte at mem, high byte at mem+1)
|
||||
sta mem+1
|
||||
lda #$00
|
||||
sta mem
|
||||
ldx #$10 ; x holds number of 256-byte pages to clear (16 pages = 4KB)
|
||||
tay ; accumulator and y are both 0
|
||||
loop: sta (mem), y
|
||||
iny
|
||||
bne loop ; Inner loop ends when y wraps around to 0
|
||||
inc mem+1 ; "Carry" from the iny to the core pointer
|
||||
dex ; Decrement outer loop count, quit if done
|
||||
bne loop
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Used carefully, the Y register can make your code
|
||||
smaller, faster, <emphasis>and</emphasis> more readable. Used
|
||||
carelessly it can make your code an unreadable, unmaintainable
|
||||
mess. Use it wisely, and with care, and it will be your
|
||||
greatest ally in writing flexible code.
|
||||
</para>
|
||||
</section>
|
||||
</section>
|
||||
<section>
|
||||
<title>What about Indexed Indirect?</title>
|
||||
|
||||
<para>
|
||||
This essay has concerned itself almost exclusively with the
|
||||
Indirect Indexed—or (Indirect), Y—mode. What about Indexed
|
||||
Indirect—(Indirect, X)? This is a <emphasis>much</emphasis>
|
||||
less useful mode than the Y register's version. While the Y
|
||||
register indirection lets you implement pointers and arrays in
|
||||
full generality, the X register is useful for pretty much only one
|
||||
application: lookup tables for single byte values.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Even coming up with a motivating example for this is difficult,
|
||||
but here goes. Suppose you have multiple, widely disparate
|
||||
sections of memory that you're watching for signals. The
|
||||
following routine takes a resource index in the accumulator and
|
||||
returns the status byte for the corresponding resource.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
; This data is sitting on the zero page somewhere
|
||||
resource_status_table: .word resource0_status, resource1_status,
|
||||
.word resource2_status, resource3_status,
|
||||
; etc. etc. etc.
|
||||
|
||||
; This is the actual program code
|
||||
.text
|
||||
getstatus:
|
||||
clc ; Multiply argument by 2 before putting it in X, so that it
|
||||
asl ; produces a value that's properly word-indexed
|
||||
tax
|
||||
lda (resource_status_table, x)
|
||||
rts
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Why having a routine such as this is better than just having the
|
||||
calling routine access resourceN_status itself as an absolute
|
||||
memory load is left as an exercise for the reader. That aside,
|
||||
this code fragment does serve as a reminder that when indexing an
|
||||
array of anything other than bytes, you must multiply your index
|
||||
by the size of the objects you want to index. C does this
|
||||
automatically—assembler does not. Stay sharp.
|
||||
</para>
|
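<para>
As a minimal sketch of that scaling step with an ordinary absolute
table, the fragment below copies one 16-bit entry out of an array
of words. The names <literal>word_table</literal>
and <literal>result</literal> are hypothetical, and the table is
assumed to hold at most 128 entries so that the doubled index still
fits in a single byte.
</para>

<programlisting>
; Index in A on entry; word_table and result are illustrative names
fetchword:
        asl                     ; index * 2, since each entry is two bytes
        tax
        lda word_table, x       ; low byte of the selected entry
        sta result
        lda word_table+1, x     ; high byte of the selected entry
        sta result+1
        rts
</programlisting>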
||||
</section>
|
||||
<section>
|
||||
<title>Comparison with the other indexed forms</title>
|
||||
|
||||
<para>
|
||||
Pointers are slow. It sounds odd saying this, when C is the
|
||||
fastest language around on modern machines precisely because of
|
||||
its powerful and extensive use of pointers. However, modern
|
||||
architectures are designed to be optimized for C-style code (as an
|
||||
example, the x86 architecture allows statements like <literal>mov
|
||||
eax, [ebx+4*edi]</literal> as a single instruction), while the
|
||||
6502 is not. An (Indirect), Y operation can take up to 6 cycles
|
||||
to complete just on its own, while the preparation of that command
|
||||
costs additional time <emphasis>and</emphasis> scribbles over a
|
||||
bunch of registers, meaning memory operations to save the values
|
||||
and yet more time spent. The simple code given at the beginning
|
||||
of this essay (loading <literal>*b</literal> into the
|
||||
accumulator) takes 7 cycles, not counting the 10 it takes to
|
||||
load b with the appropriate value to begin with. If b is known to
|
||||
contain a specific value, we can write a single Absolute mode
|
||||
instruction to load its value, which takes only 4 cycles and also
|
||||
preserves the value in the Y register. Clearly, Absolute mode
|
||||
should be used whenever possible.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
One might be tempted to use self-modifying code to solve this
|
||||
problem. This actually doesn't pay off nearly enough for the hassle
|
||||
it generates; for self-modifying code, the address must be
|
||||
generated, then stored in the instruction, and then the data must
|
||||
be loaded. Cost: 16 cycles for 2 immediate loads, 2 absolute
|
||||
stores, and 1 absolute load. For the straight pointer
|
||||
dereference, we generate the address, store it in the pointer,
|
||||
clear the index, then dereference that. Cost: 17 cycles for 3
|
||||
immediate loads, 2 zero page stores, and 1 indirect indexed load.
|
||||
Furthermore, unlike in the self-modifying case, loops where simple
|
||||
arithmetic is being continuously performed only require repeating
|
||||
the final load instruction, which allows for much greater time
|
||||
savings over an equivalent self-modifying loop.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
(This point is also completely moot for NES programmers or anyone
|
||||
else whose programs are sitting in ROM, because programs stored on
|
||||
a ROM cannot modify themselves.)
|
||||
</para>
|
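<para>
The two cycle totals quoted above can be tallied instruction by
instruction. The sketch below assumes standard 6502 timings, a
hypothetical data label <literal>data</literal>, a zero page
pointer <literal>ptr</literal>, and code that itself lives outside
the zero page (so that the stores into the instruction are Absolute
mode).
</para>

<programlisting>
; Self-modifying version: 16 cycles
        lda #&lt;data           ; 2
        sta load+1              ; 4
        lda #&gt;data           ; 2
        sta load+2              ; 4
load:   lda $ffff               ; 4  (operand bytes are overwritten above)

; Pointer version: 17 cycles
        lda #&lt;data           ; 2
        sta ptr                 ; 3
        lda #&gt;data           ; 2
        sta ptr+1               ; 3
        ldy #$00                ; 2
        lda (ptr), y            ; 5  (6 if the access crosses a page boundary)
</programlisting>

<para>
In a loop, the pointer version repeats only its last instruction,
while the self-modifying version must rewrite the operand bytes
whenever the address changes.
</para>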
||||
</section>
|
||||
<section>
|
||||
<title>Conclusion</title>
|
||||
|
||||
<para>
|
||||
That's pretty much it for pointers. Though they tend to make
|
||||
programs hairy, and learning how to properly deal with pointers is
|
||||
what separates real C programmers from the novices, the basic
|
||||
mechanics of them are not complex. With pointers you can do
|
||||
efficient passing of large structures, pass-by-reference,
|
||||
complicated return values, and dynamic memory management—and
|
||||
now these wondrous toys may be added to your assembler programs,
|
||||
too (assuming you have that kind of space to play with).
|
||||
</para>
|
||||
</section>
|
||||
</chapter>
|
270
doc/hll4.sgm
Normal file
@ -0,0 +1,270 @@
|
||||
<chapter>
|
||||
<title>Functionals</title>
|
||||
|
||||
<para>
|
||||
This essay deals with indirect calls. These are the core of an
|
||||
enormous number of high-level language features: LISP's closures, C's
|
||||
function pointers, C++ and Java's virtual method calls, and some
|
||||
implementations of the <literal>switch</literal> statement.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
These techniques vary in complexity, and most will not be
|
||||
appropriate for large-scale assembler projects. Of them, however,
|
||||
the Data-Directed approach is the most likely to lead to organized
|
||||
and maintainable code.
|
||||
</para>
|
||||
|
||||
<section>
|
||||
<title>Function Pointers</title>
|
||||
|
||||
<para>
|
||||
Because assembly language is totally untyped, function pointers
|
||||
are the same as any other sixteen-bit integer. This makes
|
||||
representing them really quite easy; most assemblers should permit
|
||||
pointers to routines to be declared simply by naming the routine in
|
||||
a <literal>.word</literal> directive.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
To actually invoke one of these routines, copy its address to some
|
||||
sixteen-bit location (say, <literal>target</literal>); invoking the
|
||||
routine is then a simple matter of using an indirect jump:
|
||||
the <literal>JMP (target)</literal> instruction.
|
||||
</para>
|
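<para>
A minimal sketch of those two steps follows. The
routine <literal>beep</literal> and the
labels <literal>beep_ptr</literal>
and <literal>call_it</literal> are hypothetical,
and <literal>target</literal> is assumed to be a two-byte location
reserved elsewhere.
</para>

<programlisting>
beep_ptr: .word beep            ; a function pointer is just the routine's address

call_it:
        lda beep_ptr            ; copy the pointer, low byte first
        sta target
        lda beep_ptr+1
        sta target+1
        jmp (target)            ; jump through the pointer
</programlisting>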
||||
|
||||
<para>
|
||||
There's really only one subtlety here, and it's that the indirect
|
||||
jump is an indirect <emphasis>jump</emphasis>, not an
|
||||
indirect <emphasis>function call</emphasis>. Thus, if some
|
||||
function <literal>A</literal> makes an indirect jump to some
|
||||
routine, when that routine returns, it returns to whoever
|
||||
called <literal>A</literal>, not <literal>A</literal>
|
||||
itself.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
There are several ways of dealing with this, but only one correct
|
||||
way, which is to structure your procedures so that any call
|
||||
to <literal>JMP (xxxx)</literal> occurs at the very
|
||||
end.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>A quick digression on how subroutines work</title>
|
||||
<para>
|
||||
Ordinarily, subroutines are called with <literal>JSR</literal> and
|
||||
finished with <literal>RTS</literal>. The <literal>JSR</literal>
|
||||
instruction takes its own address, adds 2 to it, and pushes this
|
||||
16-bit value on the stack, high byte first, then low byte (so that
|
||||
the low byte will be popped off first).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
But wait, you may object. All <literal>JSR</literal> instructions
|
||||
are three bytes long. This <quote>return address</quote> is in
|
||||
the middle of the instruction. And you would be quite right;
|
||||
the <literal>RTS</literal> instruction pops off the 16-bit
|
||||
address, adds one to it, and <emphasis>then</emphasis> sets the
|
||||
program counter to that value.
|
||||
</para>
|
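<para>
A small worked example may make the arithmetic concrete. The
origin address here is chosen arbitrarily for illustration.
</para>

<programlisting>
.org $1000
main:   jsr sub                 ; occupies $1000-$1002; pushes $10, then $02
        nop                     ; at $1003: RTS pops $1002, adds one, resumes here
        rts
sub:    rts
</programlisting>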
||||
|
||||
<para>
|
||||
So it <emphasis>is</emphasis> possible to set up
|
||||
a <quote><literal>JSR</literal> indirect</quote> kind of operation
|
||||
by adding two to the indirect jump's address and then pushing that
|
||||
value onto the stack before making the jump; however, you wouldn't
|
||||
want to do this. It takes six bytes and trashes your accumulator,
|
||||
and you can get the same functionality with half the space and
|
||||
with no register corruption by simply defining the indirect jump
|
||||
to be a one-instruction routine and <literal>JSR</literal>-ing to
|
||||
it directly. As an added bonus, that way if you have multiple
|
||||
indirect jumps through the same pointer, you don't need to
|
||||
duplicate the jump instruction.
|
||||
</para>
|
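<para>
The two alternatives might look something like the sketches below.
The labels <literal>back</literal>
and <literal>call_target</literal> are hypothetical, and the square
brackets assume Ophis's convention of grouping expressions with
brackets rather than parentheses. First, the hand-rolled version:
</para>

<programlisting>
; Hand-rolled "JSR indirect": six bytes of setup, and A is lost
        lda #&gt;[back-1]       ; high byte first, as JSR would push it
        pha
        lda #&lt;[back-1]       ; back-1 is the jump's own address plus two
        pha
        jmp (target)
back:                           ; the called routine's RTS resumes here
</programlisting>

<para>
And the one-instruction routine that replaces it:
</para>

<programlisting>
call_target:
        jmp (target)            ; shared by every call through this pointer

; At each call site:
        jsr call_target         ; three bytes, no registers disturbed
</programlisting>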
||||
|
||||
<para>
|
||||
Does this mean that abusing <literal>JSR</literal>
|
||||
and <literal>RTS</literal> is a dead-end, though? Not at all...
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Dispatch-on-type and Data-Directed Assembler</title>
|
||||
|
||||
<para>
|
||||
Most of the time, you care about function pointers because you've
|
||||
arranged them in some kind of table. You hand it an index
|
||||
representing the type of your argument, or which method it is
|
||||
you're calling, or some other determinator, and then you index
|
||||
into an array of routines and execute the right one.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Writing a generic routine to do this is kind of a pain. First you
|
||||
have to pass a 16-bit pointer in, then you have to dereference it
|
||||
to figure out where your table is, then you have to do an indexed
|
||||
dereference on <emphasis>that</emphasis> to get the routine you
|
||||
want to run, then you need to copy it out to somewhere fixed so
|
||||
that you can write your jump instruction. And making this
|
||||
non-generic doesn't help a whole lot, since that only saves you
|
||||
the first two steps, but now you have to write them out in every
|
||||
single indexed jump instruction. If only there were some way to
|
||||
easily and quickly pass in a local pointer directly...
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Something, say, like the <literal>JSR</literal> instruction, only not for
|
||||
program code.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Or we could just use the <literal>JSR</literal> statement itself,
|
||||
but only call this routine at the ends of other routines, much
|
||||
like we were organizing for indirect jumps to begin with. This
|
||||
lets us set up routines that look like this:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
jump'table'alpha:
|
||||
jsr do'jump'table
|
||||
.word alpha'0, alpha'1, alpha'2
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Where the <literal>alpha'x</literal> routines are the ones to be
|
||||
called when the index has that value. This leaves the
|
||||
implementation of do'jump'table, which in this case uses the Y
|
||||
register to hold the index:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
do'jump'table:
|
||||