<!DOCTYPE html>
<html lang="en">

<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1" />

<link rel="stylesheet" href="main.css"/>
<title>More Details - 6502bench SourceGen</title>
</head>

<body>
<div id="content">
<h1>SourceGen: More Details</h1>
<p><a href="index.html">Back to index</a></p>


<h2 id="more-details">Intro, Continued</h2>

<p>This section of the manual digs a little deeper into how SourceGen works.</p>


<h2 id="about-symbols">All About Symbols</h2>

<p>A symbol has two essential parts, a label and a value. The label is a short
ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
constant. Symbols can be defined in different ways, and applied in
different ways.</p>

<p>The label syntax is restricted to a format that should be compatible
with most assemblers:</p>
<ul>
  <li>2-32 characters long.</li>
  <li>Starts with a letter or underscore.</li>
  <li>Composed of ASCII letters, numbers, and the underscore.</li>
</ul>
<p>Label comparisons are case-sensitive, as is customary for programming
languages.</p>
<p>Sometimes the purpose of a subroutine or variable isn't immediately
clear, but you can take a reasonable guess. You can document your
uncertainty by adding a question mark ('?') to the end of the label.
This isn't really part of the label, so it won't appear in the assembled
output, and you don't have to include it when searching for a symbol.</p>
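<p>As an illustration, the syntax rules above can be checked with a regular
expression. The following Python sketch is not SourceGen's implementation, and
the function names are hypothetical. It models only the basic rules and the
trailing '?' marker, not assembler-specific restrictions.</p>

```python
import re

# 2-32 characters: one ASCII letter or underscore, then 1-31 ASCII
# letters, digits, or underscores. (\w is avoided because it matches
# non-ASCII letters in Python.)
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")


def split_uncertain(label: str) -> tuple:
    """Strip a trailing '?' uncertainty marker, returning (label, uncertain)."""
    if label.endswith("?"):
        return label[:-1], True
    return label, False


def is_valid_label(label: str) -> bool:
    """Return True if the label, minus any '?' marker, follows the rules."""
    base, _ = split_uncertain(label)
    return LABEL_RE.fullmatch(base) is not None


print(is_valid_label("MYSTRING"))     # True
print(is_valid_label("nifty_func?"))  # True: the '?' isn't part of the label
print(is_valid_label("X"))            # False: too short
print(is_valid_label("9lives"))       # False: starts with a digit
```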
<p>Some assemblers restrict the set of valid labels further. For example,
64tass uses a leading underscore to indicate a local label, and reserves
a double leading underscore (e.g. <code>__label</code>) for its own
purposes. In such cases, the label will be modified to comply with the
target assembler syntax.</p>

<p>Operands may use parts of symbols. For example, if you have a label
<code>MYSTRING</code>, you can write:</p>
<pre>
MYSTRING  .STR    "hello"
          LDA     #&lt;MYSTRING
          STA     $00
          LDA     #&gt;MYSTRING
          STA     $01
</pre>
<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>

<p>Symbols that represent a memory address within a project are treated
differently from those outside a project. We refer to these as internal
and external addresses, respectively.</p>


<h3 id="connecting-operands">Connecting Operands with Labels</h3>

<p>Suppose you have the following code:</p>
<pre>
          LDA     $1234
          JSR     $2345
</pre>
<p>If we put that in a source file, it will assemble correctly.
However, if those addresses are part of the file, the code may break if
changes are made and things assemble to different addresses. It would
be better to generate code that references labels, e.g.:</p>
<pre>
          LDA     my_data
          JSR     nifty_func
</pre>
<p>SourceGen tries to establish labels for address operands automatically.
How this works depends on whether the operand's address is inside the file or
external, and whether there are existing labels at or near the target
address. The details are explored in the next few sections.</p>
<p>On the 65816 this process is trickier, because addresses are 24 bits
instead of 16. For a control-transfer instruction like <code>JSR</code>,
the high 8 bits come from the Program Bank Register (K, or PBR). For a
data-access instruction like <code>LDA</code>, the high 8 bits come from
the Data Bank Register (B, or DBR). The PBR value follows from the address
at which the code is executing, so it's easy to determine. The DBR value
can be set arbitrarily by the program. Sometimes it's easy to figure out;
sometimes it has to be specified manually.</p>
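<p>The bank selection can be sketched in Python. This is a simplified
illustration that assumes a 16-bit absolute operand. Long (24-bit) operands,
direct page, and stack-relative modes are not modeled, and the mnemonic set is
only a small illustrative subset.</p>

```python
# Illustrative subset of control-transfer mnemonics (not exhaustive).
CONTROL_TRANSFER = {"JMP", "JSR"}


def operand_address(mnemonic: str, operand16: int, k: int, b: int) -> int:
    """Return the 24-bit address referenced by a 16-bit absolute operand.

    k is the Program Bank Register (PBR) and b is the Data Bank Register
    (DBR). Control transfers take their bank from K, data accesses from B.
    """
    bank = k if mnemonic.upper() in CONTROL_TRANSFER else b
    return ((bank & 0xFF) << 16) | (operand16 & 0xFFFF)


# Code executing in bank $02, with the DBR set to $E1:
print(f"${operand_address('JSR', 0x2345, k=0x02, b=0xE1):06x}")  # $022345
print(f"${operand_address('LDA', 0x1234, k=0x02, b=0xE1):06x}")  # $e11234
```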

<h3 id="internal-address-symbols">Internal Address Symbols</h3>

<p>Symbols that represent an address inside the file being disassembled
are referred to as <i>internal</i>. They come in two varieties.</p>

<p><b>User labels</b> are labels added to instructions or data by the user.
The editor will try to prevent you from creating a label that has the same
name as another symbol, but if you manage to do so, the user label takes
precedence over symbols from other sources. User labels may be tagged
as non-unique local, unique local, global, or global and exported. Local
vs. global is important for the label localizer, while exported symbols
can be pulled directly into other projects.</p>

<p><b>Auto labels</b> are automatically generated labels placed on
instructions or data offsets that are the target of operands. They're
formed by appending the hexadecimal address to the letter "L", with
additional characters added if some other symbol has already defined
that label. Options can be set that change the "L" to a character or
characters based on how the label is referenced, e.g. "B" for branch targets.
Auto labels are only added where they are needed, and are removed when
no longer necessary. Because auto labels may be renamed or vanish, the
editor will try to prevent you from referring to them explicitly when
editing operands.</p>
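<p>The naming scheme for auto labels might be sketched like this. The "L"
prefix and hex address come from the description above. The
<code>_1</code>, <code>_2</code> collision suffix and the 6-digit form for
addresses above $FFFF are hypothetical stand-ins, because the exact
characters SourceGen uses aren't specified here.</p>

```python
def make_auto_label(address: int, existing: set, prefix: str = "L") -> str:
    """Build an auto label like "L1003" for an address (sketch).

    Assumed: 4 hex digits for 16-bit addresses, 6 for larger ones, and a
    hypothetical "_N" suffix when the name is already taken.
    """
    digits = 4 if address <= 0xFFFF else 6
    base = f"{prefix}{address:0{digits}x}"
    label, n = base, 0
    while label in existing:
        n += 1
        label = f"{base}_{n}"
    return label


print(make_auto_label(0x10EF, set()))              # L10ef
print(make_auto_label(0x1003, {"L1003"}))          # L1003_1
print(make_auto_label(0x1020, set(), prefix="B"))  # B1020
```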

<h3 id="external-address-symbols">External Address Symbols</h3>

<p>Symbols that represent an address outside the file being disassembled
are referred to as <i>external</i>. These may be ROM entry points,
data buffers, zero-page variables, or a number of other things. Because
the memory addresses they appear at aren't within the bounds of the file,
we can't simply put an address label on them. Several different mechanisms
exist for defining them. If an instruction or data operand refers to
an address outside the file bounds, SourceGen looks for a symbol with
a matching address value.</p>

<p><b>Platform symbols</b> are defined in platform symbol files. These
are named with a ".sym65" extension, and have a fairly straightforward
name/value syntax. Several files for popular platforms come with SourceGen
and live in the <code>RuntimeData</code> directory. You can also create your
own, but they have to live in the same directory as the project file.</p>

<p>Platform symbols can be addresses or constants. Addresses are
limited to 24-bit values, and are matched automatically. Constants may
be 32-bit values, but must be specified manually.</p>

<p>If two platform symbols have the same label, only the most recently read
one is kept. If two platform symbols have different labels but the
same value, both symbols will be kept, but the one in the file loaded
last will take priority when doing a lookup by address. If symbols with
the same value are defined in the same file, the one whose label comes
first alphabetically takes priority.</p>

<p>Platform address symbols have an optional width. This can be used
to define multi-byte items, such as two-byte pointers or 256-byte stacks.
If no width is specified, a default value of 1 is used. Widths are ignored
for constants.
Overlapping symbols are resolved as described earlier, with symbols loaded
later taking priority over previously-loaded symbols. In addition,
symbols defined closer to the target address take priority, so if you put
a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
be visible because its start point is closer to the addresses it covers
than the start of the 256-byte range.</p>
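<p>The lookup rules can be modeled roughly as follows. This Python sketch is an
interpretation, not SourceGen's code. In particular, it assumes that distance
from a symbol's start is compared first, then load order, then alphabetical
order, which the text above doesn't pin down precisely. The symbol names are
made up.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlatformSymbol:
    label: str
    address: int
    width: int = 1
    file_index: int = 0  # position of the .sym65 file in the load order


def load(files):
    """files: one list of (label, address, width) tuples per file, in load order.

    A later definition of the same label replaces an earlier one.
    """
    by_label = {}
    for index, entries in enumerate(files):
        for label, address, width in entries:
            by_label[label] = PlatformSymbol(label, address, width, index)
    return list(by_label.values())


def lookup(symbols, address) -> Optional[PlatformSymbol]:
    """Find the symbol that best covers an address (assumed priority order)."""
    covering = [s for s in symbols if s.address <= address < s.address + s.width]
    if not covering:
        return None
    # Closest start first, then the most recently loaded file, then A-Z label.
    return min(covering, key=lambda s: (address - s.address, -s.file_index, s.label))


symbols = load([
    [("STACK", 0x0100, 256), ("COUT", 0xFDED, 1)],
    [("SAVE_AREA", 0x0180, 4), ("PRINT_CHAR", 0xFDED, 1)],
])
print(lookup(symbols, 0x0181).label)  # SAVE_AREA: its start is closer than STACK's
print(lookup(symbols, 0x01F0).label)  # STACK
print(lookup(symbols, 0xFDED).label)  # PRINT_CHAR: its file was loaded later
```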

<p>Platform symbols can be designated for reading, writing, or both.
Normally you'd want both, but if an address is a memory-mapped I/O
location that has different behavior for reads and writes, you'd want
to define two different symbols, and have the correct one applied
based on the access type.</p>

<p><b>Project symbols</b> behave like platform symbols, but they are
defined in the project file itself, through the Project Properties editor.
The editor will try to prevent you from creating two symbols with the same
name. If two symbols have the same value, the one whose label comes
first alphabetically is used.</p>

<p>Project symbols always have precedence over platform symbols, allowing
you to redefine symbols within a project. (You can "hide" a platform
symbol by creating a project symbol constant with the same name. Use a
value like $ffffffff or $deadbeef so you'll know why it's there.)</p>

<p><b>Address region pre-labels</b> are an oddity: they're external
address symbols that also act like user labels. These are explained
in more detail <a href="#pre-labels">later</a>.</p>

<p><b>Local variables</b> are redefinable symbols that are organized
into tables. They're used to specify labels for zero-page addresses
and 65816 stack-relative instructions. These are explained in more
detail in the next section.</p>


<h4 id="local-vars">How Local Variables Work</h4>

<p>Local variables are applied to instructions that have zero-page
operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
65816 stack-relative operands
(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>). While they must be
unique relative to other kinds of labels, they don't have to be unique
with respect to earlier variable definitions. So you can define
<code>TMP .EQ $10</code>, and a few lines later define
<code>TMP .EQ $20</code>. This is handy because zero-page addresses are
often used in different ways by different parts of the program. For
example:</p>
<pre>
          LDA     ($00),Y
          INC     $02
          ... elsewhere ...
          DEC     $00
          STA     ($01),Y
</pre>
<p>If we had given <code>$00</code> the label <code>PTR</code> and
<code>$02</code> the label <code>COUNT</code> globally,
the second pair of instructions would look all wrong. With local
variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>

<p>Local variables have a value and a width. If we create a pair of
variable definitions like this:</p>
<pre>
PTR       .EQ     $00     ;2 bytes
COUNT     .EQ     $02     ;1 byte
</pre>
<p>Then this:</p>
<pre>
          STA     $00
          STX     $01
          LDY     $02
</pre>
<p>Would become:</p>
<pre>
          STA     PTR
          STX     PTR+1
          LDY     COUNT
</pre>
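<p>One way to picture the substitution is to find, for each zero-page operand,
the variable whose byte range covers the address, then add an adjustment if the
address isn't the variable's first byte. This is a minimal Python sketch. It
assumes a variable table held as a dictionary of label to (address, width),
which is an illustration rather than SourceGen's internal representation.</p>

```python
def format_zp_operand(address: int, variables: dict) -> str:
    """Format a zero-page operand using a table of label -> (start, width)."""
    for label, (start, width) in variables.items():
        if start <= address < start + width:
            offset = address - start
            return label if offset == 0 else f"{label}+{offset}"
    return f"${address:02x}"  # no variable covers it: leave it as hex


table = {"PTR": (0x00, 2), "COUNT": (0x02, 1)}
for mnemonic, operand in [("STA", 0x00), ("STX", 0x01), ("LDY", 0x02)]:
    print(mnemonic, format_zp_operand(operand, table))
# STA PTR
# STX PTR+1
# LDY COUNT
```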

<p>The scope of a variable definition starts at the point where it is
defined, and stops when its definition is erased. There are three
ways for a table to erase an earlier definition:</p>
<ol>
  <li>Create a new definition with the same name.</li>
  <li>Create a new definition that has an overlapping value. For
    example, if you have a two-byte variable <code>PTR = $00</code>,
    and define a one-byte variable <code>COUNT = $01</code>, the
    definition for <code>PTR</code> will be cleared because its second
    byte overlaps.</li>
  <li>Tables have a "clear previous" flag that erases all previous
    definitions. This doesn't usually cause anything to be generated in the
    assembly sources; instead, it just causes SourceGen to stop using
    those labels.</li>
</ol>
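<p>The three erasure rules can be sketched as a function that applies a new
table to the set of active definitions. This is a hypothetical Python model,
with definitions stored as label to (start, width). It assumes a single table
never contains duplicate labels or overlapping values.</p>

```python
def apply_table(active: dict, new_defs: dict, clear_previous: bool = False) -> dict:
    """Return the definitions in effect after a new local variable table.

    Both arguments map label -> (start address, width in bytes).
    """
    # Rule 3: the "clear previous" flag drops every earlier definition.
    result = {} if clear_previous else dict(active)
    for label, (start, width) in new_defs.items():
        end = start + width
        for old_label, (old_start, old_width) in list(result.items()):
            same_name = old_label == label                                # rule 1
            overlaps = old_start < end and start < old_start + old_width  # rule 2
            if same_name or overlaps:
                del result[old_label]
    result.update(new_defs)
    return result


active = apply_table({"PTR": (0x00, 2)}, {"COUNT": (0x01, 1)})
print(active)  # {'COUNT': (1, 1)}: PTR's second byte overlapped COUNT
```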
<p>As you might expect, you're not allowed to have duplicate labels or
overlapping values in an individual table.</p>
<p>If a platform/project symbol has the same value as a local variable,
the local variable is used. If the local variable definition is cleared,
use of the platform/project symbol will resume.</p>
<p>Not all assemblers support redefinable variables. In those cases,
the symbol names will be modified to be unique (e.g. the second definition
of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
global scope.</p>


<h3 id="unique-local-global">Unique vs. Non-Unique and Local vs. Global</h3>

<p>Most assemblers have a notion of "local" labels, which have a scope
that is book-ended by global labels. These are handy for generic branch
target names like "loop" or "notzero" that you might want to use in
multiple places. The exact definition of local label scope varies
between assemblers, so labels that you want to be local might have to
be promoted to global (and probably renamed).</p>
<p>SourceGen has a similar concept with a slight twist: they're called
non-unique labels, because the goal is to be able to use the same
label in more than one place. Whether or not they actually turn out
to be local is a decision deferred to assembly source generation time.
(You can also declare a label to be a unique local if you like; the
auto-generated labels like "L1234" do this.)</p>
<p>When you're writing code for an assembler, it has to be unambiguous,
because the assembler can't guess at what the output should be. For a
disassembler, the output is known, so a greater degree of ambiguity is
tolerable. Instead of throwing errors and refusing to continue, the
source generator can modify the output until it works. For example:</p>
<pre>
@LOOP     LDX     #$02
@LOOP     DEX
          BNE     @LOOP
          DEY
          BNE     @LOOP
</pre>
<p>This would confuse an assembler. SourceGen already knows which
<code>@LOOP</code> is being branched to, so it can just rename one of
them to <code>@LOOP1</code>.</p>
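<p>The renaming step could look something like the following Python sketch. It
assumes the generator has the labels in source order and tracks branch targets
by position rather than by name, so renaming a label doesn't change which line a
branch refers to. The numeric suffix simply mirrors the <code>@LOOP1</code>
example.</p>

```python
def uniquify(labels: list) -> list:
    """Rename repeated labels by appending 1, 2, ... (sketch)."""
    used = set()
    result = []
    for label in labels:
        candidate, n = label, 0
        while candidate in used:
            n += 1
            candidate = f"{label}{n}"
        used.add(candidate)
        result.append(candidate)
    return result


# Labels in source order; branch targets are tracked by index, not by name.
labels = ["@LOOP", "@LOOP"]
branch_targets = {"BNE after DEX": 1, "BNE after DEY": 0}
renamed = uniquify(labels)
for branch, index in branch_targets.items():
    print(branch, "->", renamed[index])
# BNE after DEX -> @LOOP1
# BNE after DEY -> @LOOP
```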
<p>One situation where non-unique labels cause difficulty is with
weak symbolic references (see next section). For example, suppose
the above code then did this:</p>
<pre>
          LDA     #&lt;@LOOP
</pre>
<p>While it's possible to make an educated guess at which <code>@LOOP</code>
was meant, it's easy to get wrong. In situations like this, it's best to
give the labels different names.</p>


<h3 id="weak-refs">Weak Symbolic References</h3>

<p>Symbolic references in operands are "weak references". If the named
symbol exists, the reference is used. If the symbol can't be found, the
operand is formatted in hex instead. They're called "weak" because
failing to resolve the reference isn't considered an error.</p>

<p>It's important to know this when editing a project. Consider the
following trivial chunk of code:</p>

<pre>
1000: 4c0310      JMP     $1003
1003: ea          NOP
</pre>

<p>When you load it into SourceGen, it will be formatted like this:</p>
<pre>
          .ADDRS  $1000
          JMP     L1003
L1003     NOP
</pre>

<p>The analyzer found the <code>JMP</code> operand, and created an auto
label for address $1003. It then created a weak reference to
"<code>L1003</code>" in the <code>JMP</code> operand.</p>

<p>If you edit the <code>JMP</code> instruction's operand to use the
symbol "<code>FOO</code>", the results are probably not what you want:</p>
<pre>
          .ADDRS  $1000
          JMP     $1003
          NOP
</pre>

<p>This happened because you added a weak reference to "<code>FOO</code>"
in the operand, but the label isn't defined anywhere. With no matching
label found, the operand was formatted as hex. Further, because there's
no longer a numeric reference to the code at $1003, SourceGen removed
the <code>L1003</code> auto label.</p>

<p>If you set the label "<code>FOO</code>" on the <code>NOP</code>
instruction, you'll see what you probably wanted:</p>
<pre>
          .ADDRS  $1000
          JMP     FOO
FOO       NOP
</pre>

<p>Of course, you don't actually need the explicit reference in the
<code>JMP</code> instruction. If you edit the <code>JMP</code> operand
and set the format back to <samp>Default</samp>, removing the weak
symbolic reference, the code will still look the same.
This is because SourceGen identified the numeric reference, and
used that to find the label on the <code>NOP</code> instruction.</p>

<p>However, suppose you didn't actually want <code>FOO</code> as the
operand label. You can create a project symbol called "<code>BAR</code>"
with the value $1003, and then edit the operand to reference <code>BAR</code>
instead. Your code would then look like:</p>
<pre>
BAR       .EQ     $1003
          .ADDRS  $1000
          JMP     BAR
FOO       NOP
</pre>

<p>If you change the value of <code>BAR</code> in the project properties,
the operand will continue to refer to it, but with an adjustment. For
example, if you changed <code>BAR</code> from $1003 to $1007,
the code would become:</p>
<pre>
BAR       .EQ     $1007
          .ADDRS  $1000
          JMP     BAR-4
FOO       NOP
</pre>
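<p>The behavior of a weak reference can be summarized in a few lines of Python.
This is an illustrative model that assumes the symbol table is a simple
dictionary of label to value. If the named symbol exists, the operand is shown
as the symbol plus whatever adjustment reproduces the original value.
Otherwise it falls back to hex.</p>

```python
def format_weak_ref(value: int, symbol: str, symbols: dict) -> str:
    """Format an operand that carries a weak reference to the named symbol."""
    if symbol not in symbols:
        return f"${value:04x}"  # unresolved: not an error, just shown as hex
    adjustment = value - symbols[symbol]
    return symbol if adjustment == 0 else f"{symbol}{adjustment:+d}"


print(format_weak_ref(0x1003, "FOO", {}))               # $1003
print(format_weak_ref(0x1003, "BAR", {"BAR": 0x1003}))  # BAR
print(format_weak_ref(0x1003, "BAR", {"BAR": 0x1007}))  # BAR-4
```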

<p>If you rename a label, all references to that label are updated. For
numeric references that happens implicitly. For explicit operand
references, the weak references are updated individually. (Modern IDEs
call this "refactoring".)</p>
<p>If you remove a label, all of the numeric references to it will
reference something else, probably a new auto label. Weak references
to the symbol will break and be formatted as hex, but will not be
removed. Similarly, removing symbols from a platform or project file
will break the reference but won't modify the operands.</p>


<h3 id="symbol-parts">Parts and Adjustments</h3>

<p>Sometimes you want to use part of a label, or adjust the value slightly.
(I use "adjustment" rather than "offset" to avoid confusing it with file
offsets.) Consider the following example:</p>
<pre>
1000: a910        LDA     #$10
1002: 48          PHA
1003: a906        LDA     #$06
1005: 48          PHA
1006: 60          RTS
1007: 4c3aff      JMP     $ff3a
</pre>

<p>This transfers control to the <code>JMP</code> instruction at $1007 by
pushing an address onto the stack and executing <code>RTS</code>. However,
<code>RTS</code> requires the address of the byte <i>before</i> the target
instruction, so the value actually pushed is $1006.</p>

<p>The disassembler won't know that address $1007 is code because nothing
appears to reference it. After tagging $1007 as a code start point, the
project looks like this:</p>
<pre>
          LDA     #$10
          PHA
          LDA     #$06
          PHA
          RTS

          JMP     $ff3a
</pre>

<p>We set a label called "<code>NEXT</code>" on the <code>JMP</code>
instruction, and then edit the two <code>LDA</code> instructions to
reference the high and low parts, yielding:</p>
<pre>
          .ADDRS  $1000
          LDA     #&gt;NEXT
          PHA
          LDA     #&lt;NEXT-1
          PHA
          RTS

NEXT      JMP     $ff3a
</pre>

<p>SourceGen will adjust label values by whatever amount is required to
generate the original value. If the adjustment seems wrong, make sure
you're selecting the right part of the symbol.</p>
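<p>The arithmetic behind the example can be checked with a short Python
sketch. It assumes the common convention that one operator selects the low
byte, another the high byte, and a third the bank byte (the operator spellings
vary by assembler). It computes the adjustment as the difference between the
operand byte and the selected part of the symbol. The helper names are
hypothetical.</p>

```python
PART_SHIFTS = {"low": 0, "high": 8, "bank": 16}


def part(value: int, which: str) -> int:
    """Extract the low, high, or bank byte of a symbol's value."""
    return (value >> PART_SHIFTS[which]) & 0xFF


def adjustment(operand_byte: int, symbol_value: int, which: str) -> int:
    """Signed amount to add to the selected part to get the operand byte."""
    return operand_byte - part(symbol_value, which)


NEXT = 0x1007
print(adjustment(0x10, NEXT, "high"))  # 0: high part, no adjustment needed
print(adjustment(0x06, NEXT, "low"))   # -1: low part, adjusted by -1
print(adjustment(0x06, NEXT, "high"))  # -10: the wrong part was selected

# RTS pulls $1006 from the stack and resumes at the following byte.
pushed = (0x10 << 8) | 0x06
print(f"${pushed + 1:04x}")  # $1007
```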

<p>Different assemblers use different syntaxes to form expressions. This
is particularly noticeable in 65816 code. You can choose which syntax
to use on-screen from the application settings.</p>


<h3 id="nearby-targets">Automatic Use of Nearby Targets</h3>

<p>Sometimes you want to use a symbol that doesn't match up with the
operand. SourceGen tries to anticipate situations where that might be
the case, and apply adjustments for you.</p>

<p>Suppose you have the following:</p>
<pre>
          .ADDRS  $1000
          LDA     #$00
          STA     L1010
          LDA     #$20
          STA     L1011
          LDA     #$e1
          STA     L1012
          RTS

L1010     .DD1    $00
L1011     .DD1    $00
L1012     .DD1    $00
</pre>

<p>Showing stores to three different labeled addresses is fine, but
the code is actually setting up a single 24-bit address. For clarity,
you'd like the output to reflect the fact that it's a single, multi-byte
variable. So, if you set a label (say, <code>DATA</code>) at $1010,
SourceGen removes the nearby auto labels, and sets the numeric references
to use your label:</p>

<pre>
          .ADDRS  $1000
          LDA     #$00
          STA     DATA
          LDA     #$20
          STA     DATA+1
          LDA     #$e1
          STA     DATA+2
          RTS

DATA      .DD1    $00
          .DD1    $00
          .DD1    $00
</pre>

<p>If you decide that you really wanted each store to have its own
label, you can set labels on the other two addresses. SourceGen won't
search for alternate labels if the numeric reference target has a
user-defined label.</p>
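<p>Here is a rough Python model of the nearby-target search. The search distance
(<code>max_back</code>) is a hypothetical parameter chosen for illustration,
because the actual limits SourceGen applies aren't described here. The sketch
keeps the key behavior from the text: a target that has its own user label is
used directly, and only otherwise does the search look at earlier
addresses.</p>

```python
from typing import Optional


def nearby_reference(target: int, user_labels: dict, max_back: int = 3) -> Optional[str]:
    """Pick a label for a numeric reference, preferring an exact match.

    user_labels maps address -> label. Returns None when nothing is close
    enough, in which case an auto label would be generated instead.
    """
    if target in user_labels:
        return user_labels[target]
    for distance in range(1, max_back + 1):
        label = user_labels.get(target - distance)
        if label is not None:
            return f"{label}+{distance}"
    return None


labels = {0x1010: "DATA"}
print([nearby_reference(a, labels) for a in (0x1010, 0x1011, 0x1012)])
# ['DATA', 'DATA+1', 'DATA+2']
labels[0x1011] = "DATA_HI"  # a user label on the target stops the search
print(nearby_reference(0x1011, labels))  # DATA_HI
```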

<p>This is also used for self-modifying code. For example:</p>
<pre>
1000: a9ff        LDA     #$ff
1002: 8d0610      STA     $1006
1005: 4900        EOR     #$00
</pre>

<p>The above changes the <code>EOR #$00</code> instruction to
<code>EOR #$ff</code>. The operand target is $1006, but we can't
put a label there because it's in the middle of the instruction. So
SourceGen puts a label at $1005 and adjusts it:</p>
<pre>
          LDA     #$ff
          STA     L1005+1
L1005     EOR     #$00
</pre>
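<p>For self-modifying code, the adjustment comes from locating the instruction
that contains the target address. The following is a hypothetical Python
sketch. It assumes the instructions are available as (start address, length)
pairs and uses the auto-label naming shown above.</p>

```python
from typing import Optional


def label_for_target(target: int, instructions: list) -> Optional[str]:
    """Return an auto label, adjusted if the target is mid-instruction."""
    for start, length in instructions:
        if start <= target < start + length:
            name = f"L{start:04x}"
            offset = target - start
            return name if offset == 0 else f"{name}+{offset}"
    return None


# LDA #$ff / STA $1006 / EOR #$00, as in the listing above
code = [(0x1000, 2), (0x1002, 3), (0x1005, 2)]
print(label_for_target(0x1006, code))  # L1005+1
```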

<p>If you really don't like the way this works, you can disable the
search for nearby targets entirely from the
<a href="settings.html#project-properties">project properties</a>.
Self-modifying code will always be adjusted because of the limitation
on mid-instruction labels.</p>


<h2 id="width-disambiguation">Width Disambiguation</h2>

<p>It's possible to interpret certain instructions in multiple ways.
For example, "<code>LDA $0000</code>" might be an absolute load from a 16-bit
address, or it might be a zero-page load from an 8-bit address.
Humans can infer from the fact that it was written with a 4-digit address
that it's meant to be absolute, but assemblers often treat operands
purely as numbers, and would just see "LDA 0". Common practice is for the
assembler to use the shortest instruction possible, which here would be the
zero-page form.</p>
<p>Every assembler seems to address the problem in a slightly different
way. Some use opcode suffixes, others use operand prefixes, some
allow both. You can configure how they appear in the
<a href="settings.html#app-settings">application settings</a>.</p>
<p>SourceGen will only add width disambiguators to opcodes or operands when
they are needed, with one exception: the opcode suffix for long
(24-bit address) operations is always applied. This is done because some
assemblers require it, insisting on "<code>LDAL</code>" rather than
"<code>LDA</code>" for an absolute long load, and because it can
make 65816 code easier to read.</p>
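<p>The "only when needed" decision for the zero-page vs. absolute case can be
expressed as a comparison between the width actually encoded in the
instruction and the shortest width an assembler would pick for the same value.
This Python sketch is a simplification. It ignores 65816 direct-page and bank
subtleties, and it treats long (3-byte) operands as always marked, following
the exception described above.</p>

```python
def needs_width_hint(operand_value: int, encoded_bytes: int) -> bool:
    """Decide whether an operand needs a width disambiguator (sketch).

    encoded_bytes is the size of the address in the original instruction:
    1 for zero page, 2 for absolute, 3 for absolute long.
    """
    if encoded_bytes == 3:
        return True  # long operations always get the opcode suffix
    shortest = 1 if operand_value <= 0xFF else 2
    # An assembler would pick the shortest form, so a wider original
    # encoding must be requested explicitly.
    return encoded_bytes > shortest


print(needs_width_hint(0x0000, 2))  # True: LDA $0000 would become zero page
print(needs_width_hint(0x1234, 2))  # False
print(needs_width_hint(0x12, 1))    # False
```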


<h2 id="address-regions">Address Regions</h2>

<p>Simple programs are loaded at a particular address and executed there.
The source code starts with a directive that tells the assembler what the
initial address is, and the code and data statements that follow are
placed appropriately. More complicated programs might relocate parts
of themselves to other parts of memory, or consist of multiple
"overlay" segments that, through disk I/O or bank-switching, all execute
at the same address.</p>

<p>Consider the code in the first tutorial. It loads at $1000, copies
part of itself to $2000, and transfers execution there:</p>

<pre>
                              .ADDRS  $1000
1000: a0 71                   LDY     #$71
1002: b9 17 10      L1002     LDA     SRC,y
1005: 99 00 20                STA     MAIN,y
1008: 88                      DEY
1009: 30 09                   BMI     L1014
100b: 10 f5                   BPL     L1002

100d: 00                      .DD1    $00
100e: 68 65 6c 6c+            .STR    "hello!"

1014: 4c 00 20      L1014     JMP     MAIN

1017:               SRC
                              .ADDRS  $2000
2000: ad 00 30      MAIN      LDA     $3000
                              [...]
</pre>

<p>The arrangement of this code can be viewed in a couple of ways. One
way is to see it linearly: the code starts at $1000, continues through $1016,
then restarts at $2000:</p>
<pre>
+000000 +- start
        | $1000 - $1016  length=23 ($0017)
+000016 +- end (floating)

+000017 +- start 'MAIN'
        | $2000 - $2070  length=113 ($0071)
+000087 +- end (floating)
</pre>

<p>The other way to picture it is hierarchical: the file loads
fully at $1000, and has a "child" region at offset +000017 in which the
address changes to $2000:</p>
<pre>
+000000 +- start
        | $1000 - $1087  length=136 ($0088)
+000017 | +- start 'MAIN' pre='SRC'
        | | $2000 - $2070  length=113 ($0071)
+000087 | +- end
+000087 +- end
</pre>

<p>The latter is closer to what many assemblers expect, with a "physical"
PC that starts where the file is loaded, and a "logical" or "pseudo" PC
that determines how the code is generated. SourceGen supports both
approaches. The only thing that would change in this example is that
the nested approach allows the "SRC" label to exist. (More on this
later, in the section on <a href="#pre-labels">pre-labels</a>.)</p>

<p>The real value of a hierarchical arrangement becomes apparent when
the area copied out of the file is only a small part of it. For
example, suppose you have something like:</p>

<pre>
          .ADDRS  $1000
          LDA     SUB_SRC,Y
          STA     SUB_DST,Y
          JMP     CONT

SUB_SRC
          .ADDRS  $2000
SUB_DST   LDY     #$00
          [...]
          RTS
          .ADREND

CONT      LDA     #$12
          JSR     SUB_DST
</pre>
<p>In this case, a small routine is copied out of the middle of the
code that lives at $1000. We want the code at <code>CONT</code>
to pick up where things left off. If <code>SUB_SRC</code> is at $1009,
and the subroutine is 23 bytes long, then <code>CONT</code> should be $1020.
We could output <code>.ADDRS $1020</code> directly before <code>CONT</code>,
but that makes the generated code inconvenient to work with: if we modify
the subroutine (changing its length) and re-assemble it, the hard-coded
address is no longer correct.</p>
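<p>The hierarchical view makes this arithmetic mechanical, because the address
of any file offset comes from the innermost region that contains it. This is a
minimal Python sketch. It assumes regions are stored as a tree with fixed
lengths and uses the figures from the example above: <code>SUB_SRC</code> at
offset +000009 and a 23-byte subroutine. The <code>Region</code> class and the
file's total length are hypothetical, not SourceGen's data model.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    offset: int   # file offset of the first byte
    length: int   # number of bytes covered
    address: int  # address assigned to the first byte
    children: list = field(default_factory=list)

    def contains(self, offset: int) -> bool:
        return self.offset <= offset < self.offset + self.length


def address_of(region: Region, offset: int) -> int:
    """Return the address of a file offset, using the innermost region."""
    for child in region.children:
        if child.contains(offset):
            return address_of(child, offset)
    return region.address + (offset - region.offset)


def pre_label_address(parent: Region, child: Region) -> int:
    """Address the parent would assign where the child region starts."""
    return parent.address + (child.offset - parent.offset)


sub = Region(offset=0x09, length=23, address=0x2000)
root = Region(offset=0x00, length=0x40, address=0x1000, children=[sub])
print(f"${address_of(root, 0x09):04x}")        # $2000  SUB_DST
print(f"${pre_label_address(root, sub):04x}")  # $1009  SUB_SRC
print(f"${address_of(root, 0x09 + 23):04x}")   # $1020  CONT
```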

<h3 id="fixed-float">Fixed vs. Floating</h3>

<p>Sometimes when disassembling code you know exactly where an address
region starts and ends. Other times you know where it starts, but won't
know where it stops until you've had a chance to look at the updated
disassembly. In the former case you create a region with a "fixed" end
point; in the latter you create one with a "floating" end point.</p>
<p>Address regions with fixed end points always stop in the same place.
Regions with floating end points stop at the next address region boundary,
which means they can change size as regions are added or removed.
The end will be placed at either the start of a new region (a "sibling"),
or the end of an encapsulating region (the "parent").</p>
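<p>For a floating region, the end point can be thought of as computed rather
than stored. The Python sketch below illustrates that idea. It assumes the
sibling start offsets and the parent's last offset are known. The figures
reproduce the linear Tutorial1 layout shown earlier, where the second region
starts at +000017 and the file ends at +000087. The function name is
hypothetical.</p>

```python
def floating_end(start: int, sibling_starts: list, parent_last: int) -> int:
    """Return the last offset covered by a floating region (sketch).

    The region runs up to the byte before the next sibling starts, or to
    the end of its parent if no sibling follows.
    """
    later = [s for s in sibling_starts if s > start]
    return min(later) - 1 if later else parent_last


starts = [0x00, 0x17]
print(f"+{floating_end(0x00, starts, 0x87):06x}")  # +000016
print(f"+{floating_end(0x17, starts, 0x87):06x}")  # +000087
```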

<p>Regions that overlap must have a parent/child relationship. Whichever
one starts last or ends first is the child. A strict ordering is necessary
because a given file offset can only have one address, and if we don't
know which region is the child we can't know which address to assign.
Regions cannot straddle the start or end of another region, and cannot
exactly overlap another region (have the same start and length).
One consequence of these rules is that "floating" regions cannot share
a start offset with another region, because their end point would be
adjusted to match the end of the other region.</p>
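<p>These rules can be captured in a small Python check for any pair of regions
with fixed end points. This is an illustrative sketch, with regions given as
hypothetical (start offset, length) tuples. It reports which region is the
child, or raises an error for the two forbidden cases.</p>

```python
def relationship(a: tuple, b: tuple) -> str:
    """Classify two regions given as (start offset, length) tuples."""
    a_start, a_len = a
    b_start, b_len = b
    a_end, b_end = a_start + a_len, b_start + b_len  # exclusive ends
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    if a == b:
        raise ValueError("regions exactly overlap")
    if a_start <= b_start and b_end <= a_end:
        return "b is a child of a"
    if b_start <= a_start and a_end <= b_end:
        return "a is a child of b"
    raise ValueError("regions straddle each other")


print(relationship((0x00, 0x88), (0x17, 0x71)))  # b is a child of a
print(relationship((0x00, 0x17), (0x17, 0x71)))  # disjoint
print(relationship((0x00, 0x20), (0x00, 0x10)))  # b is a child of a (ends first)
```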

<p>The arrangement of regions is particularly important when attempting
to resolve an address operand (such as a <code>JSR</code>) to a location
within the file. The process is straightforward if the address only
appears once, but when overlays cause multiple parts of the file to have
the same address, the operand target may be in different places depending
on where the call is being made from.
The algorithm for resolving addresses is described
in the <a href="advanced.html#overlap">advanced topics</a> section.</p>


<h3 id="non-addr">Non-Addressable Areas</h3>

<p>Some files have contents that aren't actually loaded into memory
addressable by the 6502. One example is a file header, such as a load
address extracted by the system when reading the program into memory, or
something intended to be read by an emulator. Another example is the
CHR graphic data on the NES, which is loaded into an area inaccessible
to the CPU.</p>

<p>The generated source file must recreate the original binary exactly,
but we don't really want to assign an address to non-addressable data,
because it should never be resolved as the target of a <code>JSR</code>
or <code>LDA</code>. To handle this case, you can set a region's address
to "<kbd>NA</kbd>". The assembler needs to have <i>some</i> notion of
address, so the start address will be treated as zero.</p>

<p>Non-addressable regions cannot include executable code. You may put
labels on data items, but attempting to reference them will cause a
warning and will likely generate code that doesn't assemble.</p>

<p>It's possible to delete all address regions from a project, or edit
them so that there are "holes" not covered by a region.
To handle this, all projects are effectively covered by a non-addressable
region that spans the entire file. Any part of the file that isn't
explicitly covered by a user-specified region will be provided an
auto-generated non-addressable region. Such regions don't actually exist,
so attempting to edit one will actually cause a new region to be created.</p>


<h3 id="pre-labels">Pre-Labels</h3>

<p>The need for pre-labels was illustrated in the earlier example, where
code in Tutorial1 was copied from $1017 to $2000. The fundamental issue
is that offset +000017 has <i>two</i> addresses: $1017 and $2000. The
assembler can only generate code for one. Pre-labels allow you to do
the same thing you'd do in the source code, which is to add a label
immediately before the address is changed.</p>

<p>Pre-labels are "external" symbols, similar to project symbols,
because they refer to an address that is outside the file bounds.
They're always treated as having global scope.
However, they also behave like user labels, because they're generated
as part of the instruction stream and interfere with local label
references that cross them.</p>

<p>The address of a pre-label is determined by the parent region.
Suppose you have a file with an arrangement like:</p>
<pre>
region1 start
  ...
  region2 start
  ...
  region2 end
region1 end
</pre>

<p>You can put a pre-label on <code>region2</code>. It gets the address
that <code>region1</code> assigns at that point in the file, just before the
address changes. You can't put a pre-label on <code>region1</code>, because
before <code>region1</code> there was no address. Similarly:</p>
<pre>
region1 start
  ...
region1 end
region2 start
  ...
region2 end
</pre>

<p>You can't put a pre-label on <code>region2</code> because its parent
is non-addressable. <code>region1</code>'s address doesn't apply,
because <code>region1</code> ended before the label would be issued.</p>


<h3 id="relative-addr">Relative Addressing</h3>

<p>It is occasionally useful to output an address region start directive
that uses relative addressing instead of absolute addressing. For
example, given:</p>
<pre>
          .ADDRS  $1000
          [...]
          .ADDRS  $2000
</pre>
<p>We could instead generate:</p>
<pre>
          .ADDRS  $1000
          [...]
          .ADDRS  *+$0fe9
</pre>

<p>This has no effect on the definition of the region. It only affects
how the start directive is generated in the assembly source file.</p>

<p>The value is an offset from the current assembler program counter.
In this example the program counter is $1017 when the second directive is
reached, so the offset is $2000 - $1017 = $0fe9.
If the new region is the child of a non-addressable region, a relative
offset cannot be used.</p>
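<p>The relative form is just the difference between the region's start address
and the program counter at that point. The Python sketch below shows the
calculation. The helper name is hypothetical, and <code>None</code> stands in
for "the parent is non-addressable".</p>

```python
from typing import Optional


def relative_start(new_address: int, current_pc: Optional[int]) -> str:
    """Build a relative region start directive such as ".ADDRS *+$0fe9"."""
    if current_pc is None:
        raise ValueError("no program counter: parent region is non-addressable")
    delta = new_address - current_pc
    sign = "+" if delta >= 0 else "-"
    return f".ADDRS *{sign}${abs(delta):04x}"


print(relative_start(0x2000, 0x1017))  # .ADDRS *+$0fe9
print(relative_start(0x0800, 0x1017))  # .ADDRS *-$0817
```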



<h2 id="atags">Directing the Code Analyzer</h2>

<p>Sometimes SourceGen can't automatically find the start or end of an
instruction stream, or gets confused by inline data. These situations
can be resolved by adding analyzer tags.</p>

<p><b>Code start point</b> tags tell the analyzer to add the offset
to the list of instruction start points. Suppose you've got a code
library that begins with jump vectors, like this:</p>
<pre>
1000: 4c0910    JMP $1009
1003: 4cef10    JMP $10ef
1006: 4c3012    JMP $1230
1009: 18        CLC
</pre>

<p>When opened with SourceGen, it will look like this:</p>
<pre>
        .ADDRS $1000
        JMP L1009

        .DD1 $4c
        .DD1 $ef
        .DD1 $10
        .DD1 $4c
        .DD1 $30
        .DD1 $12
L1009   CLC
</pre>

<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
assumes those are data. Further, the functions those vectors point to,
at $10ef and $1230, may also be considered data unless some bit of code
reachable from <code>L1009</code> calls into them. If you tag $1003 and
$1006 as code start points, you'll get better results:</p>
<pre>
        .ADDRS $1000
        JMP L1009
        JMP L10ef
        JMP L1230
L1009   CLC
</pre>

<p>Be careful to tag only the instruction opcode byte. If
you tagged each and every byte from $1003 to $1008, you would
end up with a mess:</p>
<pre>
        .ADDRS $1000
        JMP L1009
        JMP ▼ L10ef
        BPL ▼ L1053
        JMP ▼ L1230
        BMI L101b
L1009   CLC
</pre>

<p>The exact set of instructions shown depends on your CPU configuration.
The problem is that the bytes in the middle of each instruction have
been tagged as start points, so SourceGen is treating them as
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
they're being ignored, but $10 is <code>BPL</code> and $30 is
<code>BMI</code>.
Because tagging multiple consecutive bytes is rarely useful, SourceGen
only applies code start tags to the first byte in a selected line.</p>
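<p>The effect of start points can be reproduced with a toy recursive-descent
tracer. This Python sketch is a hypothetical simplification, not SourceGen's
analyzer: it knows only the handful of opcodes the example needs, treats
<code>JMP</code> as never falling through, and assumes an <code>RTS</code>
follows the <code>CLC</code> so the path ends there.</p>

```python
# Toy subset of the 6502 instruction set: opcode -> (mnemonic, length, flow).
# "jump" never falls through, "branch" may go either way, "stop" ends a path.
OPCODES = {
    0x4C: ("JMP", 3, "jump"),
    0x10: ("BPL", 2, "branch"),
    0x30: ("BMI", 2, "branch"),
    0x18: ("CLC", 1, "next"),
    0x60: ("RTS", 1, "stop"),
}


def trace(data, base, start_offsets):
    """Return the set of file offsets that hold instruction opcodes."""
    code = set()
    pending = list(start_offsets)
    while pending:
        off = pending.pop()
        while off in range(len(data)) and off not in code:
            if data[off] not in OPCODES:
                break  # not an opcode in this toy table; abandon the path
            _mnemonic, length, flow = OPCODES[data[off]]
            if off + length > len(data):
                break  # instruction would run past the end of the file
            code.add(off)
            if flow == "jump":
                # Absolute operand, stored little-endian.
                target = data[off + 1] + data[off + 2] * 256
                pending.append(target - base)
                break
            if flow == "stop":
                break
            if flow == "branch":
                disp = data[off + 1]
                if disp >= 0x80:
                    disp -= 0x100  # signed 8-bit displacement
                pending.append(off + 2 + disp)
            off += length
    return code


# The jump-vector library from the example, plus an RTS after the CLC.
library = bytes([0x4C, 0x09, 0x10, 0x4C, 0xEF, 0x10, 0x4C, 0x30, 0x12,
                 0x18, 0x60])

print(sorted(trace(library, 0x1000, [0])))        # [0, 9, 10]
print(sorted(trace(library, 0x1000, [0, 3, 6])))  # [0, 3, 6, 9, 10]
print(sorted(trace(library, 0x1000, [0] + list(range(3, 9)))))
# [0, 3, 5, 6, 7, 9, 10]
```

<p>Offsets 5 and 7 in the last result are the bogus <code>BPL</code> and
<code>BMI</code> from the messy listing; their branch targets, $1053 and
$101b, fall outside this 11-byte file.</p>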

<p><b>Code stop point</b> tags tell the analyzer when it should stop. For
example, suppose address $ff00 is known to always be nonzero, and the code
uses that fact to get a branch-always on the 6502:</p>
<pre>
        .ADDRS $1000
        LDA $ff00
        BNE L1010
        BRK $11
</pre>

<p>By tagging the <code>BRK</code> as a code stop point, you're telling the
analyzer that it should stop trying to execute code when it reaches
that point.
(Note that this example would actually be better solved by setting a
status flag override on the <code>BNE</code> that sets Z=0, so the code
tracer will know it's a branch-always and just do the right thing.)
Like code start points, code stop points should only be placed on the
opcode byte. Placing a code stop point in the middle of what SourceGen
believes to be an instruction will have no effect.</p>
<p>As with code start points, only the first byte in each selected line will
be tagged.</p>
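<p>Both fixes can be compared with another toy tracer. This Python sketch is
hypothetical, not SourceGen's analyzer. It assumes a stop-tagged byte ends the
path before being decoded, models the status flag override as a set of
branches that never fall through, and fills the gap before <code>L1010</code>
with made-up bytes that happen to decode as <code>NOP</code>.</p>

```python
# Toy opcode table for this example only: opcode -> (mnemonic, length, flow).
OPCODES = {
    0xAD: ("LDA abs", 3, "next"),
    0xD0: ("BNE", 2, "branch"),
    0x00: ("BRK", 2, "next"),   # BRK plus its signature byte, as in "BRK $11"
    0xEA: ("NOP", 1, "next"),
    0x60: ("RTS", 1, "stop"),
}


def trace(data, start, stops=(), always_taken=()):
    """Return the set of offsets traced as instruction opcodes.

    stops: offsets tagged as code stop points; in this sketch a path ends
        on reaching one, without decoding it
    always_taken: offsets of branches known to always be taken (the effect
        of a status flag override), so they never fall through
    """
    code = set()
    pending = [start]
    while pending:
        off = pending.pop()
        while off in range(len(data)) and off not in code and off not in stops:
            if data[off] not in OPCODES:
                break
            _mnemonic, length, flow = OPCODES[data[off]]
            if off + length > len(data):
                break
            code.add(off)
            if flow == "stop":
                break
            if flow == "branch":
                disp = data[off + 1]
                if disp >= 0x80:
                    disp -= 0x100
                pending.append(off + 2 + disp)
                if off in always_taken:
                    break
            off += length
    return code


# $1000: LDA $ff00 / BNE L1010 / BRK $11 / 9 filler bytes / L1010: RTS
prog = bytes([0xAD, 0x00, 0xFF, 0xD0, 0x0B, 0x00, 0x11] + [0xEA] * 9 + [0x60])

print(sorted(trace(prog, 0)))  # [0, 3, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
print(sorted(trace(prog, 0, stops={5})))          # [0, 3, 16]
print(sorted(trace(prog, 0, always_taken={3})))   # [0, 3, 16]
```

<p>Without either hint, the tracer walks through the <code>BRK</code> and
into the filler. Both the stop point and the branch-always override keep it
out.</p>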

<p><b>Inline data</b> tags identify bytes as being part of the
instruction stream, but not instructions. A simple example of this
is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
<pre>
        JSR $bf00
        .DD1 $function
        .DD2 $address
        BCS BAD
</pre>

<p>The three bytes following the <code>JSR $bf00</code> should be tagged
as inline data, so that the code analyzer skips over them and continues the
analysis at the <code>BCS</code> instruction. You can think of these as
"code skip" tags, but they're different from stop/start points, because
every byte of inline data must be tagged. When
applying the tag, all bytes in a selected line will be modified.</p>
<p>If code branches into a region that is tagged as inline data, the
branch will be ignored.</p>
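<p>Here is a toy Python tracer that honors inline data tags. It is a
hypothetical sketch, not SourceGen's analyzer. The function code $C8, the
parameter-list address $0300, and the placement of <code>BAD</code> right
after the final <code>RTS</code> are all made up for the example. Without
tags, the inline bytes $C8 and $00 decode as <code>INY</code> and
<code>BRK</code>.</p>

```python
# Toy opcode table: opcode -> (mnemonic, length, flow).
OPCODES = {
    0x20: ("JSR", 3, "call"),
    0xB0: ("BCS", 2, "branch"),
    0xC8: ("INY", 1, "next"),
    0x00: ("BRK", 2, "next"),
    0x60: ("RTS", 1, "stop"),
}


def trace(data, start, inline=frozenset()):
    """Return offsets of instruction opcodes, skipping inline-data bytes."""
    code = set()
    pending = [start]
    while pending:
        off = pending.pop()
        if off in inline:
            continue  # a branch into inline data is ignored
        while off in range(len(data)) and off not in code:
            if off in inline:
                off += 1  # fall-through skips over tagged inline data
                continue
            if data[off] not in OPCODES:
                break
            _mnemonic, length, flow = OPCODES[data[off]]
            if off + length > len(data):
                break
            code.add(off)
            if flow == "stop":
                break
            if flow == "branch":
                disp = data[off + 1]
                if disp >= 0x80:
                    disp -= 0x100
                pending.append(off + 2 + disp)
            # This toy doesn't follow JSR targets ($bf00 is outside the file
            # anyway); it assumes the call returns to the following byte.
            off += length
    return code


# JSR $bf00 / $C8 / $0300 / BCS BAD / RTS / BAD: RTS
mli = bytes([0x20, 0x00, 0xBF, 0xC8, 0x00, 0x03, 0xB0, 0x01, 0x60, 0x60])

print(sorted(trace(mli, 0)))                    # [0, 3, 4, 6, 8, 9]
print(sorted(trace(mli, 0, inline={3, 4, 5})))  # [0, 6, 8, 9]
```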


<h3 id="scripts">Extension Scripts</h3>

<p>Extension scripts are C# source files that are compiled and
executed by SourceGen. They can be added to a project from SourceGen's
runtime data directory, or can live in the directory next to the project
file. They're used to generate visualizations of graphical data, and
to format inline data automatically.</p>
<p>The inline data formatting feature can significantly reduce the tedium
in certain projects. For example, suppose the code uses a string print
routine that embeds a null-terminated string right after a JSR. Ordinarily
you'd have to walk through the code, marking every instance by hand so
the disassembler would know where the string ends and execution resumes.
With an extension script, you can just pass in the print routine's label,
and let the script do the formatting automatically.</p>
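<p>Real extension scripts are C# and use SourceGen's own plugin interfaces,
which this section doesn't describe. The following is only a language-neutral
illustration of the idea, in Python. It assumes a print routine at the
hypothetical address $2000 and a toy walker that understands only
<code>JSR</code> and <code>RTS</code>. A hook reports how many bytes of
null-terminated string follow each call to the print routine, so the walk can
resume after the terminator.</p>

```python
PRINT_ADDR = 0x2000  # hypothetical address of the string print routine


def inline_string_length(data, jsr_offset, target):
    """Hypothetical hook consulted whenever the walk reaches a JSR.

    Returns the number of inline bytes (string plus its terminating zero)
    that follow the 3-byte JSR, 0 if the JSR isn't a call to the print
    routine, or None if no terminator exists before the end of the data.
    """
    if target != PRINT_ADDR:
        return 0
    start = jsr_offset + 3
    end = data.find(0, start)   # bytes.find() accepts a single byte value
    if end == -1:
        return None
    return end - start + 1


def find_inline_strings(data):
    """Walk straight-line JSR/RTS code and return (offset, length) pairs."""
    found = []
    off = 0
    while off in range(len(data)):
        op = data[off]
        if op == 0x60:       # RTS ends this toy walk
            break
        if op != 0x20:       # the toy walker only understands JSR and RTS
            raise ValueError("unexpected opcode ${:02x}".format(op))
        if off + 3 > len(data):
            break            # truncated JSR at the end of the data
        target = data[off + 1] + data[off + 2] * 256
        length = inline_string_length(data, off, target)
        if length is None:
            break            # leave unterminated data alone
        if length:
            found.append((off + 3, length))
        off += 3 + length    # resume after the JSR and any inline string
    return found


code = (bytes([0x20, 0x00, 0x20]) + b"HI\x00" +
        bytes([0x20, 0x00, 0x20]) + b"OK\x00" + bytes([0x60]))
print(find_inline_strings(code))   # [(3, 3), (9, 3)]
```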

<p>To reduce the chances of a script causing problems, all scripts are
executed in a sandbox with severely restricted access. Notably, nothing
in the sandbox can access files, except to read files from the PluginDllCache
directory.</p>
<p>The PluginDllCache directory lives next to the SourceGen executable, and
contains all of the compiled script DLLs, as well as two pre-built
application DLLs that plugins are allowed access to. The contents
are persistent, to avoid recompiling the scripts every time SourceGen
is launched, but may be manually deleted without harm.</p>
<p>More details can be found in the
<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>



<h2 id="pseudo-ops">Data and Directive Pseudo-Opcodes</h2>

<p>The on-screen code list shows assembler directives that are similar
to what the various cross-assemblers provide. The actual directives
generated for a given assembler may match exactly or be totally different.
The idea is to represent the concept behind the directive, then let the
code generator figure out the implementation details.</p>

<p>There are eight assembler directives that appear in the code list:</p>
<ul>
  <li><code>.EQ</code> - defines a symbol's value. These are generated
    automatically when an operand that matches a platform or project
    symbol is found.</li>
  <li><code>.VAR</code> - defines a local variable. These are generated for
    local variable tables.</li>
  <li><code>.ADDRS</code>/<code>.ADREND</code> - specifies the start or
    end of an address region, respectively.</li>
  <li><code>.RWID</code> - specifies the width of the accumulator and
    index registers (65816 only). Note that this doesn't change the actual
    width; it just tells the assembler that the width has changed.</li>
  <li><code>.DBANK</code> - specifies what value the Data Bank Register holds
    (65816 only). Used when matching operands to labels.</li>
  <li><code>.DS</code> - identifies space set aside for variable storage.
    The storage is initialized by the program before first use, so the values
    in the binary don't actually matter.</li>
  <li><code>.JUNK</code> - indicates that the data in a range of bytes is
    irrelevant. (When generating sources, this will become
    <code>.FILL</code> or <code>.BULK</code>
    depending on the contents of the memory region and the assembler's
    capabilities.)</li>
  <li><code>.ALIGN</code> - a special case of <code>.JUNK</code> that
    indicates the irrelevant bytes exist to force alignment to a
    memory boundary (usually a 256-byte page). Depending on the
    memory contents, it may be possible to output this as an
    assembler-specific alignment directive.</li>
</ul>
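<p>The choice between <code>.FILL</code> and <code>.BULK</code> for a
<code>.JUNK</code> range, and one possible check for whether an
<code>.ALIGN</code> range could become an alignment directive, can be sketched
as below. The rules, function names, and operand formats are hypothetical
simplifications, not SourceGen's code generator; a real generator also has to
consider each assembler's capabilities.</p>

```python
def junk_directive(data):
    """Suggest how a .JUNK range might be output (hypothetical rule).

    Identical bytes become a .FILL with a count and value; anything else
    falls back to .BULK, shown here as a plain hex string.
    """
    if not data:
        raise ValueError("empty range")
    if len(set(data)) == 1:
        return ".FILL {},${:02x}".format(len(data), data[0])
    return ".BULK " + data.hex()


def ends_on_page_boundary(start_addr, length):
    """Check whether a range ends exactly on a 256-byte page boundary.

    Assumed here to be one precondition for turning an .ALIGN range into
    an assembler's own alignment directive.
    """
    return (start_addr + length) % 256 == 0


print(junk_directive(bytes(12)))           # .FILL 12,$00
print(junk_directive(bytes([1, 2, 3])))    # .BULK 010203
print(ends_on_page_boundary(0x10f4, 12))   # True
```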

<p>Every data item is represented by a pseudo-op. Some of them may
represent hundreds of bytes and span multiple lines.</p>
<ul>
  <li><code>.DD1</code>, <code>.DD2</code>, <code>.DD3</code>,
    <code>.DD4</code> - basic "define data" op. A 1-4 byte
    little-endian value.</li>
  <li><code>.DBD2</code>, <code>.DBD3</code>, <code>.DBD4</code> - "define
    big-endian data". 2-4 bytes of big-endian data.
    (The 3- and 4-byte versions are not currently available in the UI,
    since they're very unusual and few assemblers support them.)</li>
  <li><code>.BULK</code> - data packed in as compact a form as the
    assembler allows. Useful for things like chunks of graphics data.</li>
  <li><code>.FILL</code> - a series of identical bytes. The operand
    has two parts, the byte count followed by the byte value.</li>
</ul>
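<p>The only difference between the <code>.DD</code> and <code>.DBD</code>
forms is byte order. A minimal Python sketch using
<code>int.from_bytes</code>; the function names <code>dd()</code> and
<code>dbd()</code> are hypothetical.</p>

```python
def dd(data):
    """Value of a .DD1-.DD4 item: 1-4 bytes, little-endian."""
    if len(data) not in (1, 2, 3, 4):
        raise ValueError("expected 1-4 bytes")
    return int.from_bytes(data, "little")


def dbd(data):
    """Value of a .DBD2-.DBD4 item: 2-4 bytes, big-endian."""
    if len(data) not in (2, 3, 4):
        raise ValueError("expected 2-4 bytes")
    return int.from_bytes(data, "big")


print(hex(dd(bytes([0x34, 0x12]))))         # 0x1234
print(hex(dbd(bytes([0x12, 0x34]))))        # 0x1234
print(hex(dd(bytes([0x56, 0x34, 0x12]))))   # 0x123456
```

<p>The bytes <code>$34 $12</code> read as <code>.DD2</code> and the bytes
<code>$12 $34</code> read as <code>.DBD2</code> both produce $1234.</p>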

<p>In addition, several pseudo-ops are defined for string constants:</p>
<ul>
  <li><code>.STR</code> - basic character string.</li>
  <li><code>.RSTR</code> - string in reverse order.</li>
  <li><code>.ZSTR</code> - null-terminated string.</li>
  <li><code>.DSTR</code> - Dextral Character Inverted string. The
    high bit of the last byte is flipped.</li>
  <li><code>.L1STR</code> - string prefixed with a length byte.</li>
  <li><code>.L2STR</code> - string prefixed with a length word.</li>
</ul>
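<p>The byte layout of each string form can be sketched in Python. This is a
hypothetical illustration, not SourceGen code. It handles plain ASCII only
(no high-bit or PETSCII encodings), uses the pseudo-op names as string tags,
and assumes the <code>.L2STR</code> length word is little-endian like other
6502 words.</p>

```python
def encode(text, kind):
    """Encode ASCII `text` in the layout of the given string pseudo-op."""
    raw = text.encode("ascii")
    if kind == "STR":
        return raw
    if kind == "RSTR":
        return raw[::-1]                      # characters in reverse order
    if kind == "ZSTR":
        return raw + b"\x00"                  # null terminator
    if kind == "DSTR":
        if not raw:
            raise ValueError("DCI string can't be empty")
        return raw[:-1] + bytes([raw[-1] ^ 0x80])  # flip last byte's high bit
    if kind == "L1STR":
        if len(raw) > 255:
            raise ValueError("too long for a length byte")
        return bytes([len(raw)]) + raw
    if kind == "L2STR":
        if len(raw) > 65535:
            raise ValueError("too long for a length word")
        return len(raw).to_bytes(2, "little") + raw  # assumed little-endian
    raise ValueError("unknown kind " + kind)


for kind in ("STR", "RSTR", "ZSTR", "DSTR", "L1STR", "L2STR"):
    print(kind, encode("HI", kind).hex())
# STR 4849, RSTR 4948, ZSTR 484900, DSTR 48c9, L1STR 024849, L2STR 02004849
```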

<p>You can configure the pseudo-ops to look more like what your
favorite assembler uses in the
<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab of the
application settings.</p>

<p>String constants start and end with delimiter characters, typically
single or double quotes. You can configure the delimiters differently
for each character encoding, so that it's obvious whether the text is
in ASCII or PETSCII. See the
<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
the application settings.</p>

</div>

<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>

</body>
<!-- Copyright 2018 faddenSoft -->
</html>