mirror of
https://github.com/fadden/6502bench.git
synced 2025-02-08 05:30:35 +00:00
ORG rework, part 10 (of 10)
Update documentation. Made lots of address region changes, and split "intro" into two parts. Removed all content from "tutorials.html". This does not update the tutorial, because that goes live as soon as it's checked in.
This commit is contained in:
parent
0ac0686c7a
commit
0ca9911d0d
@ -275,24 +275,29 @@ file, and operate on that.</p>
|
||||
<h2><a name="overlap">Overlapping Address Spaces</a></h2>
|
||||
<p>Some programs use memory overlays, where multiple parts of the
|
||||
code run in the same address in RAM. Others use bank switching to access
|
||||
parts of the program that reside in separate physical RAM, but appear at
|
||||
the same address.</p>
|
||||
<p>SourceGen allows you to set the same address on multiple parts of
|
||||
a file. Branches to a given address are resolved against the current
|
||||
segment first. For example, consider this:</p>
|
||||
parts of the program that reside in separate physical RAM or ROM,
|
||||
but appear at the same address. Nested address regions allow for a
|
||||
variety of configurations, which can make address resolution complicated.</p>
|
||||
|
||||
<p>The general goal is to have references to an address resolve to
|
||||
the "nearest" match. For example, consider a simple overlay:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
.ADDRS $1000
|
||||
JMP L1100
|
||||
|
||||
.ORG $1100
|
||||
.ADDRS $1100
|
||||
L1100 BIT L1100
|
||||
L1103 LDA #$11
|
||||
BRA L1103
|
||||
.ADREND
|
||||
|
||||
.ORG $1100
|
||||
.ADDRS $1100
|
||||
L1100_0 BIT L1100_0
|
||||
L1103_0 LDA #$22
|
||||
JMP L1103_0
|
||||
.ADREND
|
||||
|
||||
.ADREND
|
||||
</pre>
|
||||
|
||||
<p>Both sections start at $1100, and have branches to $1103. The branch
|
||||
@ -302,6 +307,32 @@ the label in the second chunk. When branches originate outside the current
|
||||
address chunk, the first chunk that includes that address is used, as
|
||||
it is with the <code>JMP L1100</code> at the start of the file.</p>
|
||||
|
||||
<p>The full address-to-offset algorithm is as follows.
|
||||
There are two inputs: the file offset of the instruction or data item
|
||||
that has the reference (e.g. the JMP or LDA), and the address
|
||||
it is referring to.</p>
|
||||
<ul>
|
||||
<li>Create a tree with all address regions. Each "node" in the tree
|
||||
has an offset, length, and start address.</li>
|
||||
<li>Search the tree for a node that includes the offset of the
|
||||
reference source.
|
||||
When there are multiple overlapping regions, descend until the
|
||||
deepest child that spans the offset is found. This node will be
|
||||
the starting point of the search.</li>
|
||||
<li>Loop until we hit the top of the tree:
|
||||
<ul>
|
||||
<li>Perform a recursive depth-first search of all children of the
|
||||
current node. They're searched in order of ascending file offset.</li>
|
||||
<li>If the address wasn't found in the children, check the current
|
||||
node. If we find it here, return this node as the result.</li>
|
||||
<li>Move up to the parent node.</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<p>This searches all children and all siblings before checking the parent.
|
||||
If we hit the top of the tree without finding a match, we conclude
|
||||
that the reference is to an external address.</p>
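<p>The search described above can be sketched in a few lines of Python.
This is an illustration only, not SourceGen's implementation: the
<code>Region</code> class, its field names, and the rule that bytes covered
by a child region are not matched against the parent are all assumptions
made for the example.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Region:
    """Hypothetical address-region node: file offset, length, start address."""
    offset: int
    length: int
    address: int
    children: List["Region"] = field(default_factory=list)
    parent: Optional["Region"] = None

    def add(self, child: "Region") -> "Region":
        child.parent = self
        self.children.append(child)
        self.children.sort(key=lambda r: r.offset)  # keep ascending file offset
        return child

    def spans_offset(self, offset: int) -> bool:
        return self.offset <= offset < self.offset + self.length

    def own_offset_for(self, address: int) -> Optional[int]:
        """File offset for address if this node itself holds it, else None."""
        delta = address - self.address
        if not 0 <= delta < self.length:
            return None
        offset = self.offset + delta
        # Assumption: bytes covered by a child belong to the child, not to us.
        if any(c.spans_offset(offset) for c in self.children):
            return None
        return offset


def deepest_region(node: Region, offset: int) -> Region:
    """Descend to the deepest child that spans the reference's file offset."""
    for child in node.children:
        if child.spans_offset(offset):
            return deepest_region(child, offset)
    return node


def search_subtree(node: Region, address: int) -> Optional[int]:
    """Depth-first: children in ascending offset order, then the node itself."""
    for child in node.children:
        found = search_subtree(child, address)
        if found is not None:
            return found
    return node.own_offset_for(address)


def resolve(root: Region, src_offset: int, address: int) -> Optional[int]:
    """Return the target file offset, or None for an external address."""
    node: Optional[Region] = deepest_region(root, src_offset)
    while node is not None:
        found = search_subtree(node, address)
        if found is not None:
            return found
        node = node.parent  # move up and widen the search
    return None


# The overlay example: a 3-byte JMP at +0, then two chunks that both start at $1100.
root = Region(offset=0, length=18, address=0x1000)
first = root.add(Region(offset=3, length=7, address=0x1100))
second = root.add(Region(offset=10, length=8, address=0x1100))
```

<p>With this layout, a reference to $1103 from inside the first chunk
resolves to the first chunk, the same reference from inside the second chunk
resolves to the second, and the <code>JMP</code> at the start of the file
picks the first chunk because children are searched in ascending offset
order.</p>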
|
||||
|
||||
|
||||
<h2><a name="reloc-data">OMF Relocation Dictionaries</a></h2>
|
||||
|
||||
|
@ -114,7 +114,7 @@ file. After a change is made, a full or partial re-analysis is done to
|
||||
fill out the Anattribs.</p>
|
||||
<p>Consider a simple example:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
.ADDRS $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
@ -323,9 +323,9 @@ if they're associated with the first (opcode) byte of an instruction.</p>
|
||||
<p>The uncategorized data analyzer tries to find character strings and
|
||||
opportunities to use the ".FILL" operation. It breaks the file into
|
||||
pieces, where contiguous regions hold nothing but data, are not split
|
||||
across a ".ORG" directive, are not interrupted by data, and do not
|
||||
contain anything that the user has chosen to format. Each region is
|
||||
scanned for matching patterns. If a match is found, a format entry
|
||||
across address region start/end directives, are not interrupted by data,
|
||||
and do not contain anything that the user has chosen to format. Each
|
||||
region is scanned for matching patterns. If a match is found, a format entry
|
||||
is added to the Anattrib array. Otherwise, data is added as single-byte
|
||||
values.</p>
|
||||
|
||||
|
@ -87,7 +87,7 @@ an additional command-line option) to the assembler.</p>
|
||||
<li>the format at offset +000000 is a 16-bit numeric data item
|
||||
(not executable code, not two 8-bit values, not the first part
|
||||
of a 24-bit value, etc.)</li>
|
||||
<li>there is an ORG directive at +000002
|
||||
<li>there is an address region start directive at +000002</li>
|
||||
<li>the 16-bit value at +000000 is equal to the address of the
|
||||
byte at +000002</li>
|
||||
<li>there is no label at offset +000000 (explicit or auto-generated)</li>
|
||||
|
@ -14,42 +14,9 @@
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
|
||||
<h2><a name="address">Edit Address</a></h2>
|
||||
<p>This adds a target address directive (".ORG") to the current offset.
|
||||
If you leave the text field blank, the directive will be removed.</p>
|
||||
<p>The text entry field is initialized to the address of the
|
||||
first selected line. The "load address", i.e. the place where the
|
||||
code or data will live when the file is first loaded into memory,
|
||||
is shown for reference.</p>
|
||||
<p>If multiple lines were selected, some additional information will be
|
||||
shown, and an address directive will be added after the last selected
|
||||
line. This directive will set the address to the "load address".
|
||||
This is useful for "relocating" a block of code or data in the middle of
|
||||
the file. You're not allowed to do this when the selected range of
|
||||
lines spans another address directive.</p>
|
||||
<p>Addresses are always interpreted as hexadecimal. You can prefix
|
||||
it with a '$', but that's not required.
|
||||
24-bit addresses may be written with a bank separator, e.g. "12/3456"
|
||||
would resolve to address $123456.</p>
|
||||
|
||||
<p>There will always be an address directive at the start of the file.
|
||||
Attempts to remove it will be ignored.</p>
|
||||
|
||||
|
||||
<h2><a name="flags">Edit Status Flag Override</a></h2>
|
||||
<p>The state of the processor status flags is tracked for every
|
||||
instruction. Each individual flag is recorded as zero, one, or
|
||||
"indeterminate", meaning it could hold either value at the start of
|
||||
that instruction. You can override the value of individual flags.</p>
|
||||
<p>The 65816 emulation bit, which is not part of the processor status
|
||||
register, may also be set in the editor.</p>
|
||||
<p>The M, X, and E flags will not be editable unless your CPU configuration
|
||||
is set to 65816.</p>
|
||||
|
||||
|
||||
<h2><a name="label">Edit Label</a></h2>
|
||||
<p>Sets or clears a label at the selected offset. The label must have the
|
||||
<a href="intro.html#about-symbols">proper form</a>, and not have the same
|
||||
<a href="intro-details.html#about-symbols">proper form</a>, and not have the same
|
||||
name as another symbol, unless it's specified to be non-unique. If you
|
||||
edit an auto-generated label you will be required to change the name.</p>
|
||||
<p>The label may be marked as non-unique local, unique local, global,
|
||||
@ -120,7 +87,7 @@ set, or editing a local variable table.</p>
|
||||
|
||||
<p>For operands that are 8-bit, 16-bit, or 24-bit addresses, you can
|
||||
define a symbol for the address as a label or
|
||||
<a href="intro.html#symbol-types">project symbol</a>.</p>
|
||||
<a href="intro-details.html#symbol-types">project symbol</a>.</p>
|
||||
<p>If the operand is an address inside the project, you can set a
|
||||
label at that address. If the address falls in the middle of an
|
||||
instruction or multi-byte data item, its position will be adjusted to
|
||||
@ -309,7 +276,92 @@ not associated with a file offset. If you delete it, you can get it
|
||||
back by using Edit > Edit Header Comment.</p>
|
||||
|
||||
|
||||
<h2><a name="address">Define Address Region</a></h2>
|
||||
|
||||
<p>Address regions may be created, edited, resized, or removed. Which
|
||||
operation is performed depends on the current selection. You can
|
||||
specify the start and end points of a region by selecting the entire
|
||||
region, or by selecting just the first and last lines.</p>
|
||||
<p>In all cases, you can specify the range's initial address
|
||||
as a hexadecimal value. You can prefix it with '$', but that's not
|
||||
required.
|
||||
24-bit addresses may be written with a bank separator, e.g. "12/3456"
|
||||
would resolve to address $123456.
|
||||
If you want to set the region to be non-addressable, enter
|
||||
"<code>NA</code>".</p>
|
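<p>As a rough illustration of the address field syntax described above,
the following Python sketch parses the accepted forms. The function name is
hypothetical, and the case-insensitive handling of "NA" and the range checks
are assumptions, not confirmed editor behavior.</p>

```python
from typing import Optional


def parse_region_address(text: str) -> Optional[int]:
    """Parse a hex address; return None for a non-addressable region ("NA")."""
    s = text.strip()
    if s.upper() == "NA":  # assumption: matched without regard to case
        return None
    if s.startswith("$"):  # the '$' prefix is optional
        s = s[1:]
    if "/" in s:
        # Bank separator form, e.g. "12/3456" means bank $12, address $3456.
        bank_text, _, low_text = s.partition("/")
        bank, low = int(bank_text, 16), int(low_text, 16)
        if not (0 <= bank <= 0xFF and 0 <= low <= 0xFFFF):
            raise ValueError(f"address out of range: {text!r}")
        return (bank << 16) | low
    value = int(s, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"address out of range: {text!r}")
    return value
```

<p>For example, <code>parse_region_address("12/3456")</code> and
<code>parse_region_address("$123456")</code> both yield the same 24-bit
value.</p>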
||||
|
||||
<p>You can also enter a <a href="intro-details.html#pre-labels">pre-label</a>
|
||||
or specify that the operand should be formatted as a
|
||||
<a href="intro-details.html#relative-addr">relative address</a>.</p>
|
||||
|
||||
<p>To delete a region, click the "Delete Region" button.</p>
|
||||
|
||||
<h4>Create</h4>
|
||||
|
||||
<p>If your selection starts with a code or data line, the editor
|
||||
will allow you to create a new address region. If a single line was
|
||||
selected, the default behavior will be to create a region with a
|
||||
floating end point. If multiple lines were selected, the default
|
||||
behavior will be to create a region with a fixed end point.</p>
|
||||
|
||||
<p>The address field will be initialized to the address of the
|
||||
first selected line.</p>
|
||||
|
||||
<p>You can create a child region that shares the same start offset
|
||||
as an existing region by selecting the first code or data line
|
||||
within that region. Note that regions with floating end points cannot
|
||||
have the same start offset as another region.</p>
|
||||
|
||||
<h4>Edit</h4>
|
||||
|
||||
<p>If you select only the address region start line, perhaps by
|
||||
double-clicking the operand there, you will be able to edit the
|
||||
current region's properties.</p>
|
||||
|
||||
<p>If the region has a floating end point, you can choose to convert
|
||||
it to a fixed end. The end doesn't move; it just gets fixed in place.
|
||||
This is a quick way to "lock down" regions once you've established
|
||||
their end points.</p>
|
||||
|
||||
<h4>Resize</h4>
|
||||
|
||||
<p>If you select multiple lines, and the first line is an address
|
||||
region start directive, you will be able to resize that region to
|
||||
the selection. By definition, the updated region will have a fixed
|
||||
end point.</p>
|
||||
|
||||
<h4>Other notes</h4>
|
||||
|
||||
<p>There is no affordance for moving the start offset of a region. You
|
||||
must create a new region and then delete the old one.</p>
|
||||
|
||||
<p>Regions may not "straddle" the start or end points of other regions.</p>
|
||||
|
||||
<p>Double-clicking on the pseudo-opcode of a region start or end
|
||||
declaration will move the selection to the other end, rather than
|
||||
opening the editor.</p>
|
||||
|
||||
<p>To see detailed information about an address region in the "Info"
|
||||
window, select the region start or end directive. You can see the
|
||||
current arrangement of address regions across your entire
|
||||
project with Navigate > View Address Map.</p>
|
||||
|
||||
|
||||
|
||||
<h2><a name="flags">Override Status Flags</a></h2>
|
||||
|
||||
<p>The state of the processor status flags is tracked for every
|
||||
instruction. Each individual flag is recorded as zero, one, or
|
||||
"indeterminate", meaning it could hold either value at the start of
|
||||
that instruction. You can override the value of individual flags.</p>
|
||||
<p>The 65816 emulation bit, which is not part of the processor status
|
||||
register, may also be set in the editor.</p>
|
||||
<p>The M, X, and E flags will not be editable unless your CPU configuration
|
||||
is set to 65816.</p>
|
||||
|
||||
|
||||
<h2><a name="data-bank">Edit Data Bank (65816 only)</a></h2>
|
||||
|
||||
<p>Sets the Data Bank Register (DBR) value for 65816 code. This is used
|
||||
when matching 16-bit address operands with labels. The new value is
|
||||
in effect from the line where it's declared to the end of the file, even
|
||||
@ -360,7 +412,7 @@ will not be applied to addresses inside the data file. Symbols
|
||||
marked as "constant" are not applied automatically, and must be
|
||||
explicitly specified as an operand.</p>
|
||||
<p>The label must meet the criteria for symbols (see
|
||||
<a href="intro.html#about-symbols">All About Symbols</a>), and must
|
||||
<a href="intro-details.html#about-symbols">All About Symbols</a>), and must
|
||||
not have the same name as another project symbol. It can overlap
|
||||
with platform symbols and user labels.</p>
|
||||
<p>The value may be entered in decimal, hexadecimal, or binary. The numeric
|
||||
@ -380,7 +432,7 @@ the Read/Write checkboxes to specify the desired behavior.</p>
|
||||
|
||||
|
||||
<h2><a name="lvtable">Create/Edit Local Variable Table</a></h2>
|
||||
<p><a href="intro.html#local-vars">Local variables</a> are arranged in
|
||||
<p><a href="intro-details.html#local-vars">Local variables</a> are arranged in
|
||||
tables, which are created at a specific file offset. They must be
|
||||
associated with a line of code, and are usually placed at the start of
|
||||
a subroutine.
|
||||
|
@ -10,7 +10,7 @@
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen</h1>
|
||||
<h1>6502bench SourceGen Reference Manual</h1>
|
||||
<p>SourceGen is an interactive disassembler for 6502, 65C02,
|
||||
and 65816 code. The official web site is
|
||||
<a href="https://6502bench.com/">https://6502bench.com/</a>.</p>
|
||||
@ -22,29 +22,40 @@ and 65816 code. The official web site is
|
||||
<ul>
|
||||
<li><a href="intro.html">Overview</a>
|
||||
<ul>
|
||||
<li><a href="intro.html#fundamental-concepts">Fundamental Concepts</a></li>
|
||||
<li><a href="intro.html#begin">About 6502 Code</a>
|
||||
<li><a href="intro.html#fundamental-concepts">Fundamentals</a></li>
|
||||
<ul>
|
||||
<li><a href="intro.html#begin">About 6502 Code</a>
|
||||
<li><a href="intro.html#charenc">Character Encoding</a></li>
|
||||
<li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro.html#sgintro">How SourceGen Works</a>
|
||||
<li><a href="intro.html#sgintro">How SourceGen Works</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro-details.html">Digging Deeper</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#about-symbols">All About Symbols</a>
|
||||
<ul>
|
||||
<li><a href="intro.html#scripts">Extension Scripts</a></li>
|
||||
<li><a href="intro.html#atags">Code Analyzer Start, Stop, and Skip</a></li>
|
||||
<li><a href="intro-details.html#connecting-operands">Connecting Operands With Labels</a></li>
|
||||
<li><a href="intro-details.html#internal-address-symbols">Internal Address Symbols</a></li>
|
||||
<li><a href="intro-details.html#external-address-symbols">External Address Symbols</a></li>
|
||||
<li><a href="intro-details.html#unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></li>
|
||||
<li><a href="intro-details.html#weak-refs">Weak Symbolic References</a></li>
|
||||
<li><a href="intro-details.html#symbol-parts">Parts and Adjustments</a></li>
|
||||
<li><a href="intro-details.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
|
||||
<li><a href="intro.html#about-symbols">All About Symbols</a>
|
||||
<li><a href="intro-details.html#width-disambiguation">Width Disambiguation</a></li>
|
||||
<li><a href="intro-details.html#address-regions">Address Regions</a>
|
||||
<ul>
|
||||
<li><a href="intro.html#connecting-operands">Connecting Operands With Labels</a></li>
|
||||
<li><a href="intro.html#internal-address-symbols">Internal Address Symbols</a></li>
|
||||
<li><a href="intro.html#external-address-symbols">External Address Symbols</a></li>
|
||||
<li><a href="intro.html#unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></li>
|
||||
<li><a href="intro.html#weak-refs">Weak References</a></li>
|
||||
<li><a href="intro.html#symbol-parts">Parts and Adjustments</a></li>
|
||||
<li><a href="intro.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
|
||||
<li><a href="intro-details.html#fixed-float">Fixed vs. Floating</a></li>
|
||||
<li><a href="intro-details.html#non-addr">Non-Addressable Areas</a></li>
|
||||
<li><a href="intro-details.html#pre-labels">Pre-Labels</a></li>
|
||||
<li><a href="intro-details.html#relative-addr">Relative Addressing</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro.html#width-disambiguation">Width Disambiguation</a></li>
|
||||
<li><a href="intro.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
|
||||
<li><a href="intro-details.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
|
||||
<li><a href="intro-details.html#atags">Directing the Code Analyzer</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#scripts">Extension Scripts</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro-details.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="mainwin.html">Using SourceGen</a>
|
||||
@ -72,8 +83,6 @@ and 65816 code. The official web site is
|
||||
|
||||
<li><a href="editors.html">Editors</a>
|
||||
<ul>
|
||||
<li><a href="editors.html#address">Edit Address</a></li>
|
||||
<li><a href="editors.html#flags">Edit Status Flags</a></li>
|
||||
<li><a href="editors.html#label">Edit Label</a></li>
|
||||
<li><a href="editors.html#instruction-operand">Edit Instruction Operand</a>
|
||||
<ul>
|
||||
@ -84,6 +93,8 @@ and 65816 code. The official web site is
|
||||
<li><a href="editors.html#data-operand">Edit Data Operand</a></li>
|
||||
<li><a href="editors.html#comment">Edit Comment</a></li>
|
||||
<li><a href="editors.html#long-comment">Edit Long Comment</a></li>
|
||||
<li><a href="editors.html#address">Define Address Region</a></li>
|
||||
<li><a href="editors.html#flags">Override Status Flags</a></li>
|
||||
<li><a href="editors.html#data-bank">Edit Data Bank (65816 only)</a></li>
|
||||
<li><a href="editors.html#note">Edit Note</a></li>
|
||||
<li><a href="editors.html#project-symbol">Edit Project Symbol</a></li>
|
||||
|
958
SourceGen/RuntimeData/Help/intro-details.html
Normal file
958
SourceGen/RuntimeData/Help/intro-details.html
Normal file
@ -0,0 +1,958 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>More Details - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Intro Details</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="more-details">More Details</a></h2>
|
||||
|
||||
<p>This section digs a little deeper into how SourceGen works.</p>
|
||||
|
||||
|
||||
|
||||
<h2><a name="about-symbols">All About Symbols</a></h2>
|
||||
|
||||
<p>A symbol has two essential parts, a label and a value. The label is a short
|
||||
ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
|
||||
constant. Symbols can be defined in different ways, and applied in
|
||||
different ways.</p>
|
||||
|
||||
<p>The label syntax is restricted to a format that should be compatible
|
||||
with most assemblers:</p>
|
||||
<ul>
|
||||
<li>2-32 characters long.</li>
|
||||
<li>Starts with a letter or underscore.</li>
|
||||
<li>Composed of ASCII letters, numbers, and the underscore.</li>
|
||||
</ul>
|
||||
<p>Label comparisons are case-sensitive, as is customary for programming
|
||||
languages.</p>
|
||||
<p>Sometimes the purpose of a subroutine or variable isn't immediately
|
||||
clear, but you can take a reasonable guess. You can document your
|
||||
uncertainty by adding a question mark ('?') to the end of the label.
|
||||
This isn't really part of the label, so it won't appear in the assembled
|
||||
output, and you don't have to include it when searching for a symbol.</p>
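<p>The label rules and the trailing question mark can be checked with a
short Python sketch. The function name and the returned tuple are
hypothetical; only the length, first-character, and character-set rules come
from the text above.</p>

```python
import re
from typing import Tuple

# First character: letter or underscore; then 1 to 31 more letters, digits,
# or underscores, for a total length of 2 to 32.
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")


def split_label(text: str) -> Tuple[str, bool]:
    """Return (label, uncertain); raise ValueError if the label is malformed."""
    uncertain = text.endswith("?")
    label = text[:-1] if uncertain else text  # '?' is an annotation, not part of the label
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid label: {text!r}")
    return label, uncertain
```

<p>Because the question mark is stripped before validation, a search for
<code>draw_sprite</code> would match a label entered as
<code>draw_sprite?</code>.</p>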
|
||||
<p>Some assemblers restrict the set of valid labels further. For example,
|
||||
64tass uses a leading underscore to indicate a local label, and reserves
|
||||
a double leading underscore (e.g. <code>__label</code>) for its own
|
||||
purposes. In such cases, the label will be modified to comply with the
|
||||
target assembler syntax.</p>
|
||||
|
||||
<p>Operands may use parts of symbols. For example, if you have a label
|
||||
<code>MYSTRING</code>, you can write:</p>
|
||||
<pre>
|
||||
MYSTRING .STR "hello"
|
||||
LDA #&lt;MYSTRING
|
||||
STA $00
|
||||
LDA #>MYSTRING
|
||||
STA $01
|
||||
</pre>
|
||||
<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>
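<p>To make the arithmetic concrete, the Python sketch below shows what the
two immediate operands in the example evaluate to, assuming the common 6502
assembler convention that <code>&lt;</code> selects the low byte and
<code>&gt;</code> selects the high byte. The address $1234 for
<code>MYSTRING</code> is made up for the illustration.</p>

```python
def low_byte(value: int) -> int:
    """Bits 0-7 of the symbol value (the low-byte operand part)."""
    return value & 0xFF


def high_byte(value: int) -> int:
    """Bits 8-15 of the symbol value (the high-byte operand part)."""
    return (value >> 8) & 0xFF


MYSTRING = 0x1234  # hypothetical address of the string

# The example code stores these two bytes in $00 and $01, which forms a
# little-endian zero-page pointer to the string.
pointer = bytes([low_byte(MYSTRING), high_byte(MYSTRING)])
```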
|
||||
|
||||
<p>Symbols that represent a memory address within a project are treated
|
||||
differently from those outside a project. We refer to these as internal
|
||||
and external addresses, respectively.</p>
|
||||
|
||||
|
||||
<h3><a name="connecting-operands">Connecting Operands with Labels</a></h3>
|
||||
|
||||
<p>Suppose you have the following code:</p>
|
||||
<pre>
|
||||
LDA $1234
|
||||
JSR $2345
|
||||
</pre>
|
||||
<p>If we put that in a source file, it will assemble correctly.
|
||||
However, if those addresses are part of the file, the code may break if
|
||||
changes are made and things assemble to different addresses. It would
|
||||
be better to generate code that references labels, e.g.:</p>
|
||||
<pre>
|
||||
LDA my_data
|
||||
JSR nifty_func
|
||||
</pre>
|
||||
<p>SourceGen tries to establish labels for address operands automatically.
|
||||
How this works depends on whether the operand's address is inside the file or
|
||||
external, and whether there are existing labels at or near the target
|
||||
address. The details are explored in the next few sections.</p>
|
||||
<p>On the 65816 this process is trickier, because addresses are 24 bits
|
||||
instead of 16. For a control-transfer instruction like <code>JSR</code>,
|
||||
the high 8 bits come from the Program Bank Register (K). For a data-access
|
||||
instruction like <code>LDA</code>, the high 8 bits come from the Data
|
||||
Bank Register (B). The PBR value is determined by the address at which
|
||||
the code is executing, so it's easy to determine. The DBR value can be
|
||||
set arbitrarily. Sometimes it's easy to figure out, sometimes it has
|
||||
to be specified manually.</p>
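<p>A minimal Python sketch of how a 16-bit operand becomes a 24-bit address
on the 65816. The function names and the sample addresses are hypothetical;
the bank-selection rules are the ones described above.</p>

```python
def program_bank(pc: int) -> int:
    """The PBR is simply the bank in which the instruction is executing."""
    return (pc >> 16) & 0xFF


def full_address(operand16: int, bank: int) -> int:
    """Combine an 8-bit bank with a 16-bit operand."""
    return ((bank & 0xFF) << 16) | (operand16 & 0xFFFF)


# JSR $2345 executing at $02/1000: the target bank comes from the PBR.
jsr_target = full_address(0x2345, program_bank(0x021000))

# LDA $1234 with the Data Bank Register set to $7E: the bank comes from the DBR,
# which the disassembler may need to be told about.
lda_target = full_address(0x1234, 0x7E)
```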
|
||||
|
||||
|
||||
<h3><a name="internal-address-symbols">Internal Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address inside the file being disassembled
|
||||
are referred to as <i>internal</i>. They come in two varieties.</p>
|
||||
|
||||
<p><b>User labels</b> are labels added to instructions or data by the user.
|
||||
The editor will try to prevent you from creating a label that has the same
|
||||
name as another symbol, but if you manage to do so, the user label takes
|
||||
precedence over symbols from other sources. User labels may be tagged
|
||||
as non-unique local, unique local, global, or global and exported. Local
|
||||
vs. global is important for the label localizer, while exported symbols
|
||||
can be pulled directly into other projects.</p>
|
||||
|
||||
<p><b>Auto labels</b> are automatically generated labels placed on
|
||||
instructions or data offsets that are the target of operands. They're
|
||||
formed by appending the hexadecimal address to the letter "L", with
|
||||
additional characters added if some other symbol has already defined
|
||||
that label. Options can be set that change the "L" to a character or
|
||||
characters based on how the label is referenced, e.g. "B" for branch targets.
|
||||
Auto labels are only added where they are needed, and are removed when
|
||||
no longer necessary. Because auto labels may be renamed or vanish, the
|
||||
editor will try to prevent you from referring to them explicitly when
|
||||
editing operands.</p>
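<p>The naming scheme for auto labels can be sketched as follows. This is a
guess at the general shape, not SourceGen's code: the function name, the
four-digit minimum width, and the <code>_0</code>, <code>_1</code> suffix
sequence (modeled on the <code>L1100_0</code> labels that appear in
SourceGen output) are assumptions.</p>

```python
from typing import Set


def auto_label(address: int, taken: Set[str], prefix: str = "L") -> str:
    """Build a label from the prefix and hex address, avoiding names in use."""
    base = f"{prefix}{address:04X}"
    if base not in taken:
        return base
    # Some other symbol already uses the name, so add extra characters.
    n = 0
    while f"{base}_{n}" in taken:
        n += 1
    return f"{base}_{n}"
```

<p>Passing a different <code>prefix</code>, such as "B" for a branch
target, models the option that changes the leading letter based on how the
label is referenced.</p>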
|
||||
|
||||
|
||||
<h3><a name="external-address-symbols">External Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address outside the file being disassembled
|
||||
are referred to as <i>external</i>. These may be ROM entry points,
|
||||
data buffers, zero-page variables, or a number of other things. Because
|
||||
the memory addresses they appear at aren't within the bounds of the file,
|
||||
we can't simply put an address label on them. Three different mechanisms
|
||||
exist for defining them. If an instruction or data operand refers to
|
||||
an address outside the file bounds, SourceGen looks for a symbol with
|
||||
a matching address value.</p>
|
||||
|
||||
<p><b>Platform symbols</b> are defined in platform symbol files. These
|
||||
are named with a ".sym65" extension, and have a fairly straightforward
|
||||
name/value syntax. Several files for popular platforms come with SourceGen
|
||||
and live in the <code>RuntimeData</code> directory. You can also create your
|
||||
own, but they have to live in the same directory as the project file.</p>
|
||||
|
||||
<p>Platform symbols can be addresses or constants. Addresses are
|
||||
limited to 24-bit values, and are matched automatically. Constants may
|
||||
be 32-bit values, but must be specified manually.</p>
|
||||
|
||||
<p>If two platform symbols have the same label, only the most recently read
|
||||
one is kept. If two platform symbols have different labels but the
|
||||
same value, both symbols will be kept, but the one in the file loaded
|
||||
last will take priority when doing a lookup by address. If symbols with
|
||||
the same value are defined in the same file, the one whose symbol appears
|
||||
first alphabetically takes priority.</p>
|
||||
|
||||
<p>Platform address symbols have an optional width. This can be used
|
||||
to define multi-byte items, such as two-byte pointers or 256-byte stacks.
|
||||
If no width is specified, a default value of 1 is used. Widths are ignored
|
||||
for constants.
|
||||
Overlapping symbols are resolved as described earlier, with symbols loaded
|
||||
later taking priority over previously-loaded symbols. In addition,
|
||||
symbols defined closer to the target address take priority, so if you put
|
||||
a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
|
||||
be visible because the start point is closer to the addresses it covers
|
||||
than the start of the 256-byte range.</p>
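<p>One way to picture the lookup is the Python sketch below. The
<code>PlatSym</code> class and its <code>load_order</code> field are
hypothetical, and the relative ranking of the tie-breakers (nearest start
first, then most recently loaded file, then alphabetical label) is an
assumption; the text above states the individual rules but not how they
combine.</p>

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlatSym:
    label: str
    address: int
    width: int = 1       # default width when none is specified
    load_order: int = 0  # hypothetical: larger means the file was loaded later


def lookup(symbols: Iterable[PlatSym], address: int) -> Optional[PlatSym]:
    """Return the symbol that best covers the address, or None."""
    covering = [s for s in symbols if s.address <= address < s.address + s.width]
    if not covering:
        return None
    return min(
        covering,
        key=lambda s: (address - s.address, -s.load_order, s.label),
    )


# A 4-byte symbol placed in the middle of a 256-byte symbol.
STACK = PlatSym("STACK", 0x0100, width=256)
PTR4 = PlatSym("PTR4", 0x0180, width=4)
```

<p>Here an address inside the 4-byte range resolves to
<code>PTR4</code>, while addresses on either side of it still resolve to
<code>STACK</code>.</p>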
|
||||
|
||||
<p>Platform symbols can be designated for reading, writing, or both.
|
||||
Normally you'd want both, but if an address is a memory-mapped I/O
|
||||
location that has different behavior for reads and writes, you'd want
|
||||
to define two different symbols, and have the correct one applied
|
||||
based on the access type.</p>
|
||||
|
||||
<p><b>Project symbols</b> behave like platform symbols, but they are
|
||||
defined in the project file itself, through the Project Properties editor.
|
||||
The editor will try to prevent you from creating two symbols with the same
|
||||
name. If two symbols have the same value, the one whose label comes
|
||||
first alphabetically is used.</p>
|
||||
|
||||
<p>Project symbols always have precedence over platform symbols, allowing
|
||||
you to redefine symbols within a project. (You can "hide" a platform
|
||||
symbol by creating a project symbol constant with the same name. Use a
|
||||
value like $ffffffff or $deadbeef so you'll know why it's there.)</p>
|
||||
|
||||
<p><b>Address region pre-labels</b> are an oddity: they're external
|
||||
address symbols that also act like user labels. These are explained
|
||||
in more detail <a href="#pre-labels">later</a>.</p>
|
||||
|
||||
<p><b>Local variables</b> are redefinable symbols that are organized
|
||||
into tables. They're used to specify labels for zero-page addresses
|
||||
and 65816 stack-relative instructions. These are explained in more
|
||||
detail in the next section.</p>
|
||||
|
||||
|
||||
<h4><a name="local-vars">How Local Variables Work</a></h4>
|
||||
|
||||
<p>Local variables are applied to instructions that have zero
|
||||
page operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
|
||||
65816 stack relative operands
|
||||
(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>). While they must be
|
||||
unique relative to other kinds of labels, they don't have to be unique
|
||||
with respect to earlier variable definitions. So you can define
|
||||
<code>TMP .EQ $10</code>, and a few lines later define
|
||||
<code>TMP .EQ $20</code>. This is handy because zero-page addresses are
|
||||
often used in different ways by different parts of the program. For
|
||||
example:</p>
|
||||
<pre>
|
||||
LDA ($00),Y
|
||||
INC $02
|
||||
... elsewhere ...
|
||||
DEC $00
|
||||
STA ($01),Y
|
||||
</pre>
|
||||
<p>If we had given <code>$00</code> the label <code>PTR</code> and
|
||||
<code>$02</code> the label <code>COUNT</code> globally,
|
||||
the second pair of instructions would look all wrong. With local
|
||||
variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
|
||||
and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>
|
||||
|
||||
<p>Local variables have a value and a width. If we create a pair of
|
||||
variable definitions like this:</p>
|
||||
<pre>
|
||||
PTR .eq $00 ;2 bytes
|
||||
COUNT .eq $02 ;1 byte
|
||||
</pre>
|
||||
<p>Then this:</p>
|
||||
<pre>
|
||||
STA $00
|
||||
STX $01
|
||||
LDY $02
|
||||
</pre>
|
||||
<p>Would become:</p>
|
||||
<pre>
|
||||
STA PTR
|
||||
STX PTR+1
|
||||
LDY COUNT
|
||||
</pre>
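<p>The substitution shown above amounts to a range check on each variable.
A small Python sketch, with a hypothetical function name and a plain list of
(name, value, width) tuples standing in for the table:</p>

```python
from typing import List, Tuple

Variable = Tuple[str, int, int]  # (name, value, width in bytes)


def format_zp_operand(addr: int, variables: List[Variable]) -> str:
    """Replace a zero-page address with NAME or NAME+n when a variable covers it."""
    for name, value, width in variables:
        if value <= addr < value + width:
            delta = addr - value
            return name if delta == 0 else f"{name}+{delta}"
    return f"${addr:02X}"  # no variable covers this address


table = [("PTR", 0x00, 2), ("COUNT", 0x02, 1)]
```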
|
||||
|
||||
<p>The scope of a variable definition starts at the point where it is
|
||||
defined, and stops when its definition is erased. There are three
|
||||
ways for a table to erase an earlier definition:</p>
|
||||
<ol>
|
||||
<li>Create a new definition with the same name.</li>
|
||||
<li>Create a new definition that has an overlapping value. For
|
||||
example, if you have a two-byte variable <code>PTR = $00</code>,
|
||||
and define a one-byte variable <code>COUNT = $01</code>, the
|
||||
definition for <code>PTR</code> will be cleared because its second
|
||||
byte overlaps.</li>
|
||||
<li>Tables have a "clear previous" flag that erases all previous
|
||||
definitions. This doesn't usually cause anything to be generated in the
|
||||
assembly sources; instead, it just causes SourceGen to stop using
|
||||
that label.</li>
|
||||
</ol>
|
||||
<p>As you might expect, you're not allowed to have duplicate labels or
|
||||
overlapping values in an individual table.</p>
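<p>The three ways a table can erase earlier definitions can be modeled with
a dictionary of active variables. This Python sketch uses hypothetical names
and is only meant to show how the rules interact.</p>

```python
from typing import Dict, List, Tuple

Active = Dict[str, Tuple[int, int]]  # name -> (value, width)


def apply_table(active: Active,
                table: List[Tuple[str, int, int]],
                clear_previous: bool = False) -> Active:
    """Return the definitions in effect after a table is encountered."""
    # Rule 3: the "clear previous" flag drops everything defined earlier.
    result: Active = {} if clear_previous else dict(active)
    for name, value, width in table:
        result = {
            n: (v, w)
            for n, (v, w) in result.items()
            # Rule 1: same name.  Rule 2: overlapping byte range.
            if n != name and not (v < value + width and value < v + w)
        }
        result[name] = (value, width)
    return result
```

<p>For instance, with a two-byte <code>PTR</code> at $00 in effect, a table
that defines a one-byte <code>COUNT</code> at $01 leaves only
<code>COUNT</code> active.</p>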
|
||||
<p>If a platform/project symbol has the same value as a local variable,
|
||||
the local variable is used. If the local variable definition is cleared,
|
||||
use of the platform/project symbol will resume.</p>
|
||||
<p>Not all assemblers support redefinable variables. In those cases,
|
||||
the symbol names will be modified to be unique (e.g. the second definition
|
||||
of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
|
||||
global scope.</p>
|
||||
|
||||
|
||||
<h3><a name="unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></h3>
|
||||
|
||||
<p>Most assemblers have a notion of "local" labels, which have a scope
|
||||
that is book-ended by global labels. These are handy for generic branch
|
||||
target names like "loop" or "notzero" that you might want to use in
|
||||
multiple places.  The exact definition of local label scope varies
|
||||
between assemblers, so labels that you want to be local might have to
|
||||
be promoted to global (and probably renamed).</p>
|
||||
<p>SourceGen has a similar concept with a slight twist: they're called
|
||||
non-unique labels, because the goal is to be able to use the same
|
||||
label in more than one place. Whether or not they actually turn out
|
||||
to be local is a decision deferred to assembly source generation time.
|
||||
(You can also declare a label to be a unique local if you like; the
|
||||
auto-generated labels like "L1234" do this.)</p>
|
||||
<p>When you're writing code for an assembler, it has to be unambiguous,
|
||||
because the assembler can't guess at what the output should be. For a
|
||||
disassembler, the output is known, so a greater degree of ambiguity is
|
||||
tolerable. Instead of throwing errors and refusing to continue, the
|
||||
source generator can modify the output until it works.  For example:</p>
|
||||
<pre>
|
||||
@LOOP LDX #$02
|
||||
@LOOP DEX
|
||||
BNE @LOOP
|
||||
DEY
|
||||
BNE @LOOP
|
||||
</pre>
|
||||
<p>This would confuse an assembler. SourceGen already knows which @LOOP
|
||||
is being branched to, so it can just rename one of them to "@LOOP1".</p>
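<p>One way to picture the renaming step is shown below. This is a sketch
only: it assumes the later duplicate is the one renamed and that a numeric
suffix is appended, neither of which is guaranteed by the text above.
Branch operands are tracked by position rather than by name, so they simply
follow whichever name their target ends up with. <code>uniquify</code> is a
hypothetical name.</p>

```python
def uniquify(labels):
    """Rename repeated labels so that each name appears once, in file order."""
    seen = set()
    out = []
    for label in labels:
        name, n = label, 0
        # Keep bumping the suffix until the name is unused.
        while name in seen:
            n += 1
            name = f"{label}{n}"
        seen.add(name)
        out.append(name)
    return out


print(uniquify(["@LOOP", "@LOOP"]))
```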
|
||||
<p>One situation where non-unique labels cause difficulty is with
|
||||
weak symbolic references (see next section). For example, suppose
|
||||
the above code then did this:</p>
|
||||
<pre>
|
||||
LDA #<@LOOP
|
||||
</pre>
|
||||
<p>While it's possible to make an educated guess at which @LOOP was
|
||||
meant, it's easy to get wrong. In situations like this, it's best to
|
||||
give the labels different names.</p>
|
||||
|
||||
|
||||
<h3><a name="weak-refs">Weak Symbolic References</a></h3>
|
||||
|
||||
<p>Symbolic references in operands are "weak references". If the named
|
||||
symbol exists, the reference is used. If the symbol can't be found, the
|
||||
operand is formatted in hex instead. They're called "weak" because
|
||||
failing to resolve the reference isn't considered an error.</p>
|
||||
|
||||
<p>It's important to know this when editing a project. Consider the
|
||||
following trivial chunk of code:</p>
|
||||
|
||||
<pre>
|
||||
1000: 4c0310 JMP $1003
|
||||
1003: ea NOP
|
||||
</pre>
|
||||
|
||||
<p>When you load it into SourceGen, it will be formatted like this:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
|
||||
<p>The analyzer found the JMP operand, and created an auto label for
|
||||
address $1003. It then created a weak reference to "L1003" in the JMP
|
||||
operand.</p>
|
||||
|
||||
<p>If you edit the JMP instruction's operand to use the symbol "FOO", the
|
||||
results are probably not what you want:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP $1003
|
||||
NOP
|
||||
</pre>
|
||||
|
||||
<p>This happened because you added a weak reference to "FOO" in the operand,
|
||||
but the label doesn't exist. The operand is formatted as hex. Because
|
||||
there's no longer a reference to L1003, SourceGen removed the auto-label
|
||||
as well.</p>
|
||||
|
||||
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
|
||||
probably wanted:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP FOO
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>You don't actually need the explicit reference in the JMP instruction.
|
||||
If you edit the JMP operand and set it back to "Default", the code will
|
||||
still look the same. This is because SourceGen identified the numeric
|
||||
reference, and automatically added a symbolic reference to the label on
|
||||
the NOP instruction.</p>
|
||||
|
||||
<p>However, suppose you didn't actually want FOO as the operand label.
|
||||
You can create a project symbol, BAR, with the value $1003, and then edit
|
||||
the operand to reference BAR instead. Your code would then look like:</p>
|
||||
<pre>
|
||||
BAR .EQ $1003
|
||||
.ADDRS $1000
|
||||
JMP BAR
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>If you change the value of BAR in the project symbol file, the operand
|
||||
will continue to refer to it, but with an adjustment. For example, if
|
||||
you changed BAR from $1003 to $1007, the code would become:</p>
|
||||
<pre>
|
||||
BAR .EQ $1007
|
||||
.ADDRS $1000
|
||||
JMP BAR-4
|
||||
FOO NOP
|
||||
</pre>
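<p>The adjustment is just the difference between the operand's numeric value
and the symbol's value. A minimal sketch, with
<code>format_symbol_ref</code> as a hypothetical helper name:</p>

```python
def format_symbol_ref(symbol, symbol_value, operand_value):
    """Format an operand as a symbol plus whatever adjustment recreates the value."""
    adjust = operand_value - symbol_value
    if adjust == 0:
        return symbol
    return f"{symbol}{adjust:+d}"


# BAR moved from $1003 to $1007, but the JMP operand is still $1003.
print(format_symbol_ref("BAR", 0x1007, 0x1003))
```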
|
||||
|
||||
<p>If you rename a label, all references to that label are updated. For
|
||||
numeric references that happens implicitly. For explicit operand
|
||||
references, the weak references are updated individually. (Modern IDEs
|
||||
call this "refactoring".)</p>
|
||||
<p>If you remove a label, all of the numeric references to it will
|
||||
reference something else, probably a new auto label. Weak references
|
||||
to the symbol will break and be formatted as hex, but will not be
|
||||
removed. Similarly, removing symbols from a platform or project file
|
||||
will break the reference but won't modify the operands.</p>
|
||||
|
||||
<h3><a name="symbol-parts">Parts and Adjustments</a></h3>
|
||||
|
||||
<p>Sometimes you want to use part of a label, or adjust the value slightly.
|
||||
(I use "adjustment" rather than "offset" to avoid confusing it with file
|
||||
offsets.) Consider the following example:</p>
|
||||
<pre>
|
||||
1000: a910 LDA #$10
|
||||
1002: 48 PHA
|
||||
1003: a906 LDA #$06
|
||||
1005: 48 PHA
|
||||
1006: 60 RTS
|
||||
1007: 4c3aff JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>This pushes the address of the JMP instruction ($1007) onto the stack,
|
||||
and jumps to it with the RTS instruction. However, RTS requires the
|
||||
address of the byte before the target instruction, so we actually push
|
||||
$1006.</p>
|
||||
|
||||
<p>The disassembler won't know that address $1007 is code because nothing
|
||||
appears to reference it. After tagging $1007 as a code start point, the
|
||||
project looks like this:</p>
|
||||
<pre>
|
||||
LDA #$10
|
||||
PHA
|
||||
LDA #$06
|
||||
PHA
|
||||
RTS
|
||||
|
||||
JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>We set a label called "NEXT" on the JMP instruction, and then edit
|
||||
the two LDA instructions to reference the high and low parts, yielding:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA #>NEXT
|
||||
PHA
|
||||
LDA #<NEXT-1
|
||||
PHA
|
||||
RTS
|
||||
|
||||
NEXT JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>SourceGen will adjust label values by whatever amount is required to
|
||||
generate the original value. If the adjustment seems wrong, make sure
|
||||
you're selecting the right part of the symbol.</p>
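<p>To see why picking the wrong part produces an odd-looking adjustment,
here is a rough model of the calculation. It assumes only the low and high
parts of a 16-bit value and uses the <code>&lt;</code>/<code>&gt;</code>
syntax shown above; <code>format_part</code> is a hypothetical name.</p>

```python
def format_part(symbol, symbol_value, operand_byte, part):
    """Format an immediate operand as the low or high byte of a symbol.

    The adjustment is whatever makes the selected byte match the operand.
    """
    if part == "low":
        prefix, selected = "<", symbol_value & 0xFF
    elif part == "high":
        prefix, selected = ">", (symbol_value >> 8) & 0xFF
    else:
        raise ValueError(part)
    adjust = operand_byte - selected
    suffix = "" if adjust == 0 else f"{adjust:+d}"
    return f"#{prefix}{symbol}{suffix}"


NEXT = 0x1007
print(format_part("NEXT", NEXT, 0x10, "high"))
print(format_part("NEXT", NEXT, 0x06, "low"))
# Selecting the low part for the $10 operand still "works", but looks wrong.
print(format_part("NEXT", NEXT, 0x10, "low"))
```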
|
||||
|
||||
<p>Different assemblers use different syntaxes to form expressions. This
|
||||
is particularly noticeable in 65816 code. You can adjust how it appears
|
||||
on-screen from the app settings.</p>
|
||||
|
||||
<h3><a name="nearby-targets">Automatic Use of Nearby Targets</a></h3>
|
||||
|
||||
<p>Sometimes you want to use a symbol that doesn't match up with the
|
||||
operand. SourceGen tries to anticipate situations where that might be
|
||||
the case, and apply adjustments for you.</p>
|
||||
|
||||
<p>Suppose you have the following:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA #$00
|
||||
STA L1010
|
||||
LDA #$20
|
||||
STA L1011
|
||||
LDA #$e1
|
||||
STA L1012
|
||||
RTS
|
||||
|
||||
L1010 .DD1 $00
|
||||
L1011 .DD1 $00
|
||||
L1012 .DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>Showing stores to three different labeled addresses is fine, but
|
||||
the code is actually setting up a single 24-bit address. For clarity,
|
||||
you'd like the output to reflect the fact that it's a single, multi-byte
|
||||
variable. So, if you set a label at $1010, SourceGen removes the
|
||||
nearby auto labels, and sets the numeric references to use your label:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA #$00
|
||||
STA DATA
|
||||
LDA #$20
|
||||
STA DATA+1
|
||||
LDA #$e1
|
||||
STA DATA+2
|
||||
RTS
|
||||
|
||||
DATA .DD1 $00
|
||||
.DD1 $00
|
||||
.DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>If you decide that you really wanted each store to have its own
|
||||
label, you can set labels on the other two addresses. SourceGen won't
|
||||
search for alternate labels if the numeric reference target has a
|
||||
user-defined label.</p>
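<p>The search can be approximated as "look a few bytes backward for a
user label". The sketch below assumes a backward-only search and an arbitrary
window of three bytes; the actual search range is not stated here, and
<code>resolve_reference</code> is a hypothetical name.</p>

```python
def resolve_reference(target, user_labels, max_back=3):
    """Pick the operand text for a numeric reference to `target`.

    user_labels maps address -> label name. max_back is an assumed window.
    """
    # A user label exactly on the target always wins; no search is done.
    if target in user_labels:
        return user_labels[target]
    for back in range(1, max_back + 1):
        name = user_labels.get(target - back)
        if name is not None:
            return f"{name}+{back}"
    # Nothing nearby: fall back to an auto label.
    return f"L{target:04x}"


labels = {0x1010: "DATA"}
for addr in (0x1010, 0x1011, 0x1012):
    print(resolve_reference(addr, labels))
```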
|
||||
|
||||
<p>This is also used for self-modifying code. For example:</p>
|
||||
<pre>
|
||||
1000: a9ff LDA #$ff
|
||||
1002: 8d0610 STA $1006
|
||||
1005: 4900 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>The above changes the <code>EOR #$00</code> instruction to
|
||||
<code>EOR #$ff</code>. The operand target is $1006, but we can't
|
||||
put a label there because it's in the middle of the instruction. So
|
||||
SourceGen puts a label at $1005 and adjusts it:</p>
|
||||
<pre>
|
||||
LDA #$ff
|
||||
STA L1005+1
|
||||
L1005 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>If you really don't like the way this works, you can disable the
|
||||
search for nearby targets entirely from the
|
||||
<a href="settings.html#project-properties">project properties</a>.
|
||||
Self-modifying code will always be adjusted because of the limitation
|
||||
on mid-instruction labels.</p>
|
||||
|
||||
|
||||
<h2><a name="width-disambiguation">Width Disambiguation</a></h2>
|
||||
|
||||
<p>It's possible to interpret certain instructions in multiple ways.
|
||||
For example, "LDA $0000" might be an absolute load from a 16-bit
|
||||
address, or it might be a direct page load from an 8-bit address.
|
||||
Humans can infer from the fact that it was written with a 4-digit address
|
||||
that it's meant to be absolute, but assemblers often treat operands
|
||||
purely as numbers, and would just see "LDA 0". Common practice is to
|
||||
use the shortest instruction possible.</p>
|
||||
<p>Every assembler seems to address the problem in a slightly different
|
||||
way. Some use opcode suffixes, others use operand prefixes, some
|
||||
allow both. You can configure how they appear in the
|
||||
<a href="settings.html#app-settings">application settings</a>.</p>
|
||||
<p>SourceGen will only add width disambiguators to opcodes or operands when
|
||||
they are needed, with one exception: the opcode suffix for long
|
||||
(24-bit address) operations is always applied. This is done because some
|
||||
assemblers require it, insisting on "LDAL" rather than "LDA" for an
|
||||
absolute long load, and because it can make 65816 code easier to read.</p>
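<p>A simplified way to think about "when they are needed": a disambiguator
is required when the instruction encodes more operand bytes than the value
needs, since an assembler that prefers the shortest form would otherwise
shrink it. The sketch below assumes a shorter form of the instruction
exists, which is not true for every opcode; both function names are
hypothetical.</p>

```python
def min_operand_bytes(value):
    """Smallest operand size (in bytes) that can hold the value."""
    if value <= 0xFF:
        return 1
    if value <= 0xFFFF:
        return 2
    return 3


def needs_width_hint(value, encoded_bytes):
    """True if the output should carry a width disambiguator."""
    if encoded_bytes == 3:
        return True  # long (24-bit) operations are always marked
    return encoded_bytes > min_operand_bytes(value)


print(needs_width_hint(0x0000, 2))  # "LDA $0000" encoded as absolute
print(needs_width_hint(0x1234, 2))  # no shorter encoding is possible
```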
|
||||
|
||||
|
||||
|
||||
<h2 id="address-regions">Address Regions</h2>
|
||||
|
||||
<p>Simple programs are loaded at a particular address and executed there.
|
||||
The source code starts with a directive that tells the assembler what the
|
||||
initial address is, and the code and data statements that follow are
|
||||
placed appropriately. More complicated programs might relocate parts
|
||||
of themselves to other parts of memory, or be composed of multiple
|
||||
"overlay" segments that, through disk loading or bank-switching, all execute
|
||||
at the same address.</p>
|
||||
|
||||
<p>Consider the code in the first tutorial. It loads at $1000, copies
|
||||
part of itself to $2000, and transfers execution there:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
1000: a0 71 LDY #$71
|
||||
1002: b9 17 10 L1002 LDA SRC,y
|
||||
1005: 99 00 20 STA MAIN,y
|
||||
1008: 88 DEY
|
||||
1009: 30 09 BMI L1014
|
||||
100b: 10 f5 BPL L1002
|
||||
|
||||
100d: 00 .DD1 $00
|
||||
100e: 68 65 6c 6c+ .STR "hello!"
|
||||
|
||||
1014: 4c 00 20 L1014 JMP MAIN
|
||||
|
||||
1017: SRC
|
||||
.ADDRS $2000
|
||||
2000: ad 00 30 MAIN LDA $3000
|
||||
[...]
|
||||
</pre>
|
||||
|
||||
<p>The arrangement of this code can be viewed in a couple of ways. One
|
||||
way is to see it linearly: the code starts at $1000, continues to $1017,
|
||||
then restarts at $2000:</p>
|
||||
<pre>
|
||||
+000000 +- start
|
||||
| $1000 - $1016 length=23 ($0017)
|
||||
+000016 +- end (floating)
|
||||
|
||||
+000017 +- start 'MAIN'
|
||||
| $2000 - $2070 length=113 ($0071)
|
||||
+000087 +- end (floating)
|
||||
</pre>
|
||||
|
||||
<p>The other way to picture it is hierarchical: the file loads
|
||||
fully at $1000, and has a "child" region at offset +000017 in which the
|
||||
address changes to $2000:</p>
|
||||
<pre>
|
||||
+000000 +- start
|
||||
| $1000 - $1016 length=23 ($0017)
|
||||
+000017 | +- start 'MAIN' pre='SRC'
|
||||
| | $2000 - $2070 length=113 ($0071)
|
||||
+000087 | +- end
|
||||
+000087 +- end
|
||||
</pre>
|
||||
|
||||
<p>The latter is closer to what many assemblers expect, with a "physical"
|
||||
PC that starts where the file is loaded, and a "logical" or "pseudo" PC
|
||||
that determines how the code is generated. SourceGen supports both
|
||||
approaches. The only thing that would change in this example is that
|
||||
the nested approach allows the "SRC" label to exist. (More on this
|
||||
later, in the section on <a href="#pre-labels">pre-labels</a>.)</p>
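<p>In the hierarchical view, the address of a file offset comes from the
innermost region that contains it. The following sketch models that with the
tutorial's two regions; the <code>Region</code> class and
<code>address_of</code> function are hypothetical and not SourceGen's data
structures.</p>

```python
from dataclasses import dataclass


@dataclass
class Region:
    offset: int   # file offset where the region starts
    length: int   # region length in bytes
    address: int  # address assigned to the first byte


def address_of(offset, regions):
    """Return the address of a file offset, using the innermost enclosing region."""
    best = None
    for r in regions:
        if r.offset <= offset < r.offset + r.length:
            # A child is always shorter than its parent, so shortest == innermost.
            if best is None or r.length < best.length:
                best = r
    if best is None:
        return None
    return best.address + (offset - best.offset)


regions = [Region(0x00, 0x88, 0x1000), Region(0x17, 0x71, 0x2000)]
print(hex(address_of(0x16, regions)))
print(hex(address_of(0x17, regions)))
```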
|
||||
|
||||
<p>The real value of a hierarchical arrangement becomes apparent when
|
||||
the area copied out of the file is only a small part of it. For
|
||||
example, suppose something like:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA SUB_SRC,Y
|
||||
STA SUB_DST,Y
|
||||
JMP CONT
|
||||
|
||||
SUB_SRC
|
||||
.ADDRS $2000
|
||||
SUB_DST [small routine]
|
||||
.ADREND
|
||||
|
||||
CONT LDA #$12
|
||||
JSR SUB_DST
|
||||
</pre>
|
||||
<p>In this case, a small routine is copied out of the middle of the
|
||||
code that lives at $1000. We want the code at CONT to pick up where
|
||||
things left off. If SUB_SRC is at $1009, and is 23 bytes long, then
|
||||
CONT should be $1020. We could output <code>.ADDRS $1020</code>
|
||||
directly before CONT, but it's inconvenient to work with the generated
|
||||
code if we want to modify the subroutine (changing its length)
|
||||
and re-assemble it.</p>
|
||||
|
||||
|
||||
<h3 id="fixed-float">Fixed vs. Floating</h3>
|
||||
|
||||
<p>Sometimes when disassembling code you know exactly where an address
|
||||
region starts and ends. Other times you know where it starts, but won't
|
||||
know where it stops until you've had a chance to look at the updated
|
||||
disassembly. In the former case you create a region with a "fixed" end
|
||||
point, in the latter you create one with a "floating" end point.</p>
|
||||
<p>Address regions with fixed end points always stop in the same place.
|
||||
Regions with floating end points stop at the next address region boundary,
|
||||
which means they can change size as regions are added or removed.
|
||||
The end will be placed at either the start of a new region (a "sibling"),
|
||||
or the end of an encapsulating region (the "parent").</p>
|
||||
|
||||
<p>Regions that overlap must have a parent/child relationship. Whichever
|
||||
one starts last or ends first is the child. A strict ordering is necessary
|
||||
because a given file offset can only have one address, and if we don't
|
||||
know which region is the child we can't know which address to assign.
|
||||
Regions cannot straddle the start or end of another region, and cannot
|
||||
exactly overlap another region (that is, have the same start and length).
|
||||
One consequence of these rules is that "floating" regions cannot share
|
||||
a start offset with another region, because their end point would be
|
||||
adjusted to match the end of the other region.</p>
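<p>The overlap rules can be expressed as a small check on a pair of
regions. This is a sketch with regions reduced to hypothetical
<code>(start_offset, length)</code> tuples; <code>conflict</code> is not a
SourceGen function.</p>

```python
def conflict(a, b):
    """Return why two regions can't coexist, or None if the arrangement is legal.

    Each region is a (start_offset, length) tuple.
    """
    a_start, a_end = a[0], a[0] + a[1]
    b_start, b_end = b[0], b[0] + b[1]
    if a_end <= b_start or b_end <= a_start:
        return None  # disjoint: the regions are siblings
    if (a_start, a_end) == (b_start, b_end):
        return "exact overlap"
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    if a_in_b or b_in_a:
        return None  # nested: a parent/child relationship
    return "straddle"


print(conflict((0x00, 0x88), (0x17, 0x71)))  # nested, as in the tutorial
print(conflict((0x00, 0x20), (0x10, 0x20)))  # second region pokes out the end
```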
|
||||
|
||||
<p>The arrangement of regions is particularly important when attempting
|
||||
to resolve an address operand (such as a JSR) to a location within the
|
||||
file. The process is straightforward if the address only appears once,
|
||||
but when overlays cause multiple parts of the file to have the same
|
||||
address, the operand target may be in different places depending on where
|
||||
the call is being made from.
|
||||
The algorithm for resolving addresses is described
|
||||
in the <a href="advanced.html#overlap">advanced topics</a> section.</p>
|
||||
|
||||
|
||||
<h3 id="non-addr">Non-Addressable Areas</h3>
|
||||
|
||||
<p>Some files have contents that aren't actually loaded into memory
|
||||
addressable by the 6502. One example is a file header, such as a load
|
||||
address extracted by the system when reading the program into memory, or
|
||||
something intended to be read by an emulator. Another example is the
|
||||
CHR graphic data on the NES, which is loaded into an area inaccessible
|
||||
to the CPU.</p>
|
||||
|
||||
<p>The generated source file must recreate the original binary exactly,
|
||||
but we don't really want to assign an address to non-addressable data,
|
||||
because it should never be resolved as the target of a JSR or LDA. To
|
||||
handle this case, you can set a region's address to "NA". The assembler
|
||||
needs to have <i>some</i> notion of address, so the start address will
|
||||
be treated as zero.</p>
|
||||
|
||||
<p>Non-addressable regions cannot include executable code. You may put
|
||||
labels on data items, but attempting to reference them will cause a
|
||||
warning and will likely generate code that doesn't assemble.</p>
|
||||
|
||||
<p>It's possible to delete all address regions from a project, or edit
|
||||
them so that there are "holes" not covered by a region.
|
||||
To handle this, all projects are effectively covered by a non-addressable
|
||||
region that spans the entire file. Any part of the file that isn't
|
||||
explicitly covered by a user-specified region will be provided an
|
||||
auto-generated non-addressable region. Such regions don't actually exist,
|
||||
so attempting to edit one will actually cause a new region to be created.</p>
|
||||
|
||||
|
||||
<h3 id="pre-labels">Pre-Labels</h3>
|
||||
|
||||
<p>The need for pre-labels was illustrated in the earlier example, where
|
||||
code in Tutorial1 was copied from $1017 to $2000. The fundamental issue
|
||||
is that offset +000017 has <i>two</i> addresses: $1017 and $2000. The
|
||||
assembler can only generate code for one. Pre-labels allow you to do
|
||||
the same thing you'd do in the source code, which is to add a label
|
||||
immediately before the address is changed.</p>
|
||||
|
||||
<p>Pre-labels are "external" symbols, similar to project symbols,
|
||||
because they refer to an address that is outside the file bounds.
|
||||
They're always treated as having global scope.
|
||||
However, they also behave like user labels, because they're generated
|
||||
as part of the instruction stream and interfere with local label
|
||||
references that cross them.</p>
|
||||
|
||||
<p>The address of a pre-label is determined by the parent region.
|
||||
Suppose you have a file with an arrangement like:</p>
|
||||
<pre>
|
||||
region1 start
|
||||
...
|
||||
region2 start
|
||||
...
|
||||
region2 end
|
||||
region1 end
|
||||
</pre>
|
||||
|
||||
<p>You can put a pre-label on <code>region2</code>, which will be the
|
||||
address of the byte in <code>region1</code> right before the address
|
||||
changed. You can't put a pre-label on <code>region1</code>, because
|
||||
before <code>region1</code> there was no address. Similarly:</p>
|
||||
<pre>
|
||||
region1 start
|
||||
...
|
||||
region1 end
|
||||
region2 start
|
||||
...
|
||||
region2 end
|
||||
</pre>
|
||||
|
||||
<p>You can't put a pre-label on <code>region2</code> because its parent
|
||||
is non-addressable. <code>region1</code>'s address doesn't apply,
|
||||
because <code>region1</code> ended before the label would be issued.</p>
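<p>Put another way, a pre-label's address is computed in the parent's address
space, at the file offset where the child begins. A minimal sketch, assuming
the parent is described by a hypothetical <code>(offset, address)</code>
tuple and that <code>None</code> stands for "non-addressable":</p>

```python
def pre_label_address(child_offset, parent):
    """Return the address a pre-label would get, or None if it can't have one.

    parent is (start_offset, start_address), or None when the enclosing
    region is non-addressable.
    """
    if parent is None:
        return None
    parent_offset, parent_address = parent
    return parent_address + (child_offset - parent_offset)


# Tutorial layout: the file loads at $1000, child region starts at +000017.
print(hex(pre_label_address(0x17, (0x00, 0x1000))))
```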
|
||||
|
||||
|
||||
<h3 id="relative-addr">Relative Addressing</h3>
|
||||
|
||||
<p>It is occasionally useful to output an address region start directive
|
||||
that uses relative addressing instead of absolute addressing. For
|
||||
example, given:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
[...]
|
||||
.ADDRS $2000
|
||||
</pre>
|
||||
<p>We could instead generate:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
[...]
|
||||
.ADDRS *+$0fe9
|
||||
</pre>
|
||||
|
||||
<p>This has no effect on the definition of the region. It only affects
|
||||
how the start directive is generated in the assembly source file.</p>
|
||||
|
||||
<p>The value is an offset from the current assembler program counter.
|
||||
If the new region is the child of a non-addressable region, a relative
|
||||
offset cannot be used.</p>
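<p>The <code>$0fe9</code> in the example follows from the first region being
23 bytes long, as in the tutorial layout, which puts the program counter at
$1017 when the new region begins. A quick sketch of the arithmetic, with
<code>relative_start</code> as a hypothetical name and the formatting of
negative offsets as an assumption:</p>

```python
def relative_start(current_pc, new_address):
    """Format a region start directive relative to the current program counter."""
    delta = new_address - current_pc
    sign = "+" if delta >= 0 else "-"
    return f".ADDRS *{sign}${abs(delta):04x}"


# $1000 + 23 bytes = $1017; the new region starts at $2000.
print(relative_start(0x1000 + 23, 0x2000))
```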
|
||||
|
||||
|
||||
|
||||
<h2><a name="atags">Directing the Code Analyzer</a></h2>
|
||||
|
||||
<p>Sometimes SourceGen can't automatically find the start or end of an
|
||||
instruction stream, or gets confused by inline data. These situations
|
||||
can be resolved by adding analyzer tags.</p>
|
||||
|
||||
<p><b>Code start point</b> tags tell the analyzer to add the offset
|
||||
to the list of instruction start points. Suppose you've got a code
|
||||
library that begins with jump vectors, like this:</p>
|
||||
<pre>
|
||||
1000: 4c0910 JMP $1009
|
||||
1003: 4cef10 JMP $10ef
|
||||
1006: 4c3012 JMP $1230
|
||||
1009: 18 CLC
|
||||
</pre>
|
||||
|
||||
<p>When opened with SourceGen, it will look like this:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
|
||||
.DD1 $4c
|
||||
.DD1 $ef
|
||||
.DD1 $10
|
||||
.DD1 $4c
|
||||
.DD1 $30
|
||||
.DD1 $12
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
|
||||
assumes those are data. Further, the functions at those addresses may
|
||||
also be considered data unless some bit of code reachable from L1009
|
||||
calls into them. If you tag $1003 and $1006 as code start points,
|
||||
you'll get better results:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
JMP L10ef
|
||||
JMP L1230
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>Be careful that you only tag the instruction opcode byte. If
|
||||
you tagged each and every byte from $1003 to $1008, you would
|
||||
end up with a mess:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
JMP ▼ L10ef
|
||||
BPL ▼ L1053
|
||||
JMP ▼ L1230
|
||||
BMI L101b
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>The exact set of instructions shown depends on your CPU configuration.
|
||||
The problem is that the bytes in the middle of the instruction have
|
||||
been tagged as start points, so SourceGen is treating them as
|
||||
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
|
||||
they're being ignored, but $10 is BPL and $30 is BMI. Because tagging
|
||||
multiple consecutive bytes is rarely useful, SourceGen only applies code
|
||||
start tags to the first byte in a selected line.</p>
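<p>The stray branch targets in the listing above come from ordinary 6502
relative-branch arithmetic: the target is the address of the following
instruction plus a signed 8-bit offset. A short sketch, with
<code>branch_target</code> as a hypothetical helper:</p>

```python
def branch_target(addr, offset_byte):
    """Compute the target of a 2-byte relative branch located at addr."""
    # The offset is a signed 8-bit value, relative to the next instruction.
    rel = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (addr + 2 + rel) & 0xFFFF


print(hex(branch_target(0x1005, 0x4C)))  # bytes 10 4c at $1005: BPL
print(hex(branch_target(0x1007, 0x12)))  # bytes 30 12 at $1007: BMI
```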
|
||||
|
||||
<p><b>Code stop point</b> tags tell the analyzer when it should stop. For
|
||||
example, suppose address $ff00 is known to always be nonzero, and the code
|
||||
uses that fact to get a branch-always on the 6502:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA $ff00
|
||||
BNE L1010
|
||||
BRK $11
|
||||
</pre>
|
||||
|
||||
<p>By tagging the BRK as a code stop point, you're telling the analyzer that
|
||||
it should stop trying to execute code when it reaches that point. (Note
|
||||
that this example would actually be better solved by setting a status flag
|
||||
override on the BNE that sets Z=0, so the code tracer will know it's a
|
||||
branch-always and just do the right thing.) As with code start points,
|
||||
code stop points should only be placed on the opcode byte. Placing a
|
||||
code stop point in the middle of what SourceGen believes to be an instruction
|
||||
will have no effect.</p>
|
||||
<p>As with code start points, only the first byte in each selected line will
|
||||
be tagged.</p>
|
||||
|
||||
<p><b>Inline data</b> tags identify bytes as being part of the
|
||||
instruction stream, but not instructions. A simple example of this
|
||||
is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
|
||||
<pre>
|
||||
JSR $bf00
|
||||
.DD1 $function
|
||||
.DD2 $address
|
||||
BCS BAD
|
||||
</pre>
|
||||
|
||||
<p>The three bytes following the <code>JSR $bf00</code> should be tagged
|
||||
as inline data, so that the code analyzer skips over them and continues the
|
||||
analysis at the <code>BCS</code> instruction. You can think of these as
|
||||
"code skip" tags, but they're different from stop/start points, because
|
||||
every byte of inline data must be tagged. When
|
||||
applying the tag, all bytes in a selected line will be modified.</p>
|
||||
<p>If code branches into a region that is tagged as inline data, the
|
||||
branch will be ignored.</p>
|
||||
|
||||
|
||||
<h3><a name="scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts are C# source files that are compiled and
|
||||
executed by SourceGen. They can be added to a project from SourceGen's
|
||||
runtime data directory, or can live in the directory next to the project
|
||||
file. They're used to generate visualizations of graphical data, and
|
||||
to format inline data automatically.</p>
|
||||
<p>The inline data formatting feature can significantly reduce the tedium
|
||||
in certain projects. For example, suppose the code uses a string print
|
||||
routine that embeds a null-terminated string right after a JSR. Ordinarily
|
||||
you'd have to walk through the code, marking every instance by hand so
|
||||
the disassembler would know where the string ends and execution resumes.
|
||||
With an extension script, you can just pass in the print routine's label,
|
||||
and let the script do the formatting automatically.</p>
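<p>The idea behind such a script can be shown in a few lines. Real extension
scripts are written in C# and are driven by the analyzer; the Python below
only illustrates the scanning logic, uses a naive byte search instead of code
tracing, and does not reflect the plugin interface.
<code>find_inline_strings</code> is a hypothetical name.</p>

```python
def find_inline_strings(data, print_addr):
    """Find null-terminated strings that follow a JSR to print_addr.

    Returns a list of (offset, length) pairs, where length includes the
    terminating zero byte. data must be a bytes object.
    """
    lo, hi = print_addr & 0xFF, (print_addr >> 8) & 0xFF
    results = []
    i = 0
    while i + 2 < len(data):
        # $20 is the JSR opcode; the operand is stored low byte first.
        if data[i] == 0x20 and data[i + 1] == lo and data[i + 2] == hi:
            start = i + 3
            end = data.find(0, start)
            if end < 0:
                break  # unterminated string: give up
            results.append((start, end - start + 1))
            i = end + 1  # execution resumes after the terminator
        else:
            i += 1
    return results


code = bytes([0x20, 0x00, 0x30]) + b"hi\x00" + bytes([0x60])
print(find_inline_strings(code, 0x3000))
```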
|
||||
|
||||
<p>To reduce the chances of a script causing problems, all scripts are
|
||||
executed in a sandbox with severely restricted access. Notably, nothing
|
||||
in the sandbox can access files, except to read files from the PluginDll
|
||||
directory.</p>
|
||||
<p>The PluginDll directory lives next to the SourceGen executable, and
|
||||
contains all of the compiled script DLLs, as well as two pre-built
|
||||
application DLLs that plugins are allowed access to. The contents
|
||||
are persistent, to avoid recompiling the scripts every time SourceGen
|
||||
is launched, but may be manually deleted without harm.</p>
|
||||
<p>More details can be found in the
|
||||
<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>
|
||||
|
||||
|
||||
<h2><a name="pseudo-ops">Data and Directive Pseudo-Opcodes</a></h2>
|
||||
|
||||
<p>The on-screen code list shows assembler directives that are similar
|
||||
to what the various cross-assemblers provide. The actual directives
|
||||
generated for a given assembler may match exactly or be totally different.
|
||||
The idea is to represent the concept behind the directive, then let the
|
||||
code generator figure out the implementation details.</p>
|
||||
|
||||
<p>There are eight assembler directives that appear in the code list:</p>
|
||||
<ul>
|
||||
<li>.EQ - defines a symbol's value. These are generated automatically
|
||||
when an operand that matches a platform or project symbol is found.</li>
|
||||
<li>.VAR - defines a local variable. These are generated for
|
||||
local variable tables.</li>
|
||||
<li>.ADDRS/.ADREND - specifies the start or end of an
|
||||
address region.</li>
|
||||
<li>.RWID - specifies the width of the accumulator and index registers
|
||||
(65816 only). Note this doesn't change the actual width, just tells
|
||||
the assembler that the width has changed.</li>
|
||||
<li>.DBANK - specifies what value the Data Bank Register holds
|
||||
(65816 only). Used when matching operands to labels.</li>
|
||||
<li>.JUNK - indicates that the data in a range of bytes is irrelevant.
|
||||
(When generating sources, this will become .FILL or .BULK
|
||||
depending on the contents of the memory region and the assembler's
|
||||
capabilities.)</li>
|
||||
<li>.ALIGN - a special case of .JUNK that indicates the irrelevant
|
||||
bytes exist to force alignment to a memory boundary (usually a
|
||||
256-byte page). Depending on the memory contents, it may be possible
|
||||
to output this as an assembler-specific alignment directive.</li>
|
||||
</ul>
|
||||
|
||||
<p>Every data item is represented by a pseudo-op. Some of them may
|
||||
represent hundreds of bytes and span multiple lines.</p>
|
||||
<ul>
|
||||
<li>.DD1, .DD2, .DD3, .DD4 - basic "define data" op. A 1-4 byte
|
||||
little-endian value.</li>
|
||||
<li>.DBD2, .DBD3, .DBD4 - "define big-endian data". 2-4 bytes of
|
||||
big-endian data. (The 3- and 4-byte versions are not currently
|
||||
available in the UI, since they're very unusual and few assemblers
|
||||
support them.)</li>
|
||||
<li>.BULK - data packed in as compact a form as the assembler allows.
|
||||
Useful for chunks of graphics data.</li>
|
||||
<li>.FILL - a series of identical bytes. The operand
|
||||
has two parts, the byte count followed by the byte value.</li>
|
||||
</ul>
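<p>For reference, here is how the byte patterns for these ops differ. This
is a sketch using Python's built-in integer conversion; the function names
are hypothetical.</p>

```python
def dd(value, width):
    """.DD1 through .DD4: a little-endian value of the given width."""
    return value.to_bytes(width, "little")


def dbd(value, width):
    """.DBD2 through .DBD4: a big-endian value of the given width."""
    return value.to_bytes(width, "big")


def fill(count, value):
    """.FILL: count copies of a single byte value."""
    return bytes([value]) * count


print(dd(0x1234, 2).hex())   # low byte first
print(dbd(0x1234, 2).hex())  # high byte first
print(fill(3, 0xFF).hex())
```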
|
||||
|
||||
<p>In addition, several pseudo-ops are defined for string constants:</p>
|
||||
<ul>
|
||||
<li>.STR - basic character string.</li>
|
||||
<li>.RSTR - string in reverse order.</li>
|
||||
<li>.ZSTR - null-terminated string.</li>
|
||||
<li>.DSTR - Dextral Character Inverted string. The high bit of the
|
||||
last byte is flipped.</li>
|
||||
<li>.L1STR - string prefixed with a length byte.</li>
|
||||
<li>.L2STR - string prefixed with a length word.</li>
|
||||
</ul>
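<p>The string formats differ only in how the bytes are arranged around the
text. The sketch below encodes ASCII text in each form; it assumes a
non-empty string and a little-endian length word for .L2STR, and
<code>encode</code> is a hypothetical name.</p>

```python
def encode(kind, text):
    """Return the bytes for a string pseudo-op (ASCII only, non-empty text)."""
    raw = text.encode("ascii")
    if kind == "STR":
        return raw
    if kind == "RSTR":
        return raw[::-1]
    if kind == "ZSTR":
        return raw + b"\x00"
    if kind == "DSTR":
        # Flip the high bit of the final character.
        return raw[:-1] + bytes([raw[-1] ^ 0x80])
    if kind == "L1STR":
        return bytes([len(raw)]) + raw
    if kind == "L2STR":
        # Assumption: the length word is stored low byte first.
        return len(raw).to_bytes(2, "little") + raw
    raise ValueError(kind)


for kind in ("STR", "RSTR", "ZSTR", "DSTR", "L1STR", "L2STR"):
    print(kind, encode(kind, "hi").hex())
```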
|
||||
|
||||
<p>You can configure the pseudo-ops to look more like what your
|
||||
favorite assembler uses in the
|
||||
<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
|
||||
application settings.</p>
|
||||
|
||||
<p>String constants start and end with delimiter characters, typically
|
||||
single or double quotes. You can configure the delimiters differently
|
||||
for each character encoding, so that it's obvious whether the text is
|
||||
in ASCII or PETSCII. See the
|
||||
<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
|
||||
the application settings.</p>
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
@ -53,12 +53,12 @@ disassembler does, so it remains to be seen whether SourceGen succeeds at
|
||||
what it's trying to do, and also whether what it's trying to do is
|
||||
something that people actually want.</p>
|
||||
|
||||
<p>You can get started by watching the
|
||||
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and playing with the
|
||||
<a href="tutorials.html">tutorials</a>.</p>
|
||||
<p>You can get started by watching a
|
||||
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and working through
|
||||
the <a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
|
||||
|
||||
|
||||
<h2><a name="fundamental-concepts">Fundamental Concepts</a></h2>
|
||||
<h2><a name="fundamental-concepts">Fundamentals</a></h2>
|
||||
|
||||
<p>The next few sections present some general concepts and terminology. The
|
||||
rest of the documentation assumes you've read and understood this.</p>
|
||||
@ -68,7 +68,7 @@ other programs is actually a pretty good way to learn how to code in
|
||||
assembly. You will need to be familiar with hexadecimal numbers and
|
||||
general programming concepts to make sense of this, however.</p>
|
||||
|
||||
<h2><a name="begin">About 6502 Code</a></h2>
|
||||
<h3><a name="begin">About 6502 Code</a></h3>
|
||||
|
||||
<p>For brevity's sake, "6502 code" should be taken to mean "code for
|
||||
the 6502 CPU or any of its derivatives, including but not limited to
|
||||
@ -120,8 +120,9 @@ executed there. If you're disassembling an executing program you don't
|
||||
have to worry about this, but if you're disassembling the binary from the
|
||||
loadable file on disk then you need to track the address changes. The
|
||||
address is communicated to the assembler with a "pseudo-opcode", usually
|
||||
something like "ORG". Other pseudo-op directives are used to define external
|
||||
symbols and (for 65816 code) register widths.</p>
|
||||
something like "ORG" (short for "origin"). Other pseudo-op directives
|
||||
are used to define things like constants and (for 65816 code)
|
||||
register widths.</p>
|
||||
|
||||
<p>The 8-bit CPUs have a 16-bit (64KiB) address space, so addresses can
|
||||
range from $0000 to $ffff. (I'm going to write hex values with a
|
||||
@ -183,169 +184,7 @@ bell ($07), linefeed ($0a), and carriage return ($0d) are recognized as
|
||||
string data, and in C64 PETSCII a number of text color and formatting
|
||||
control codes are also allowed.</p>
|
||||
|
||||
|
||||
<h2><a name="sgintro">How SourceGen Works</a></h2>
|
||||
|
||||
<p>SourceGen employs a partial emulation technique that traces the flow
|
||||
of execution. Most of what a given instruction does isn't important;
|
||||
only its effect on the flow of execution matters.</p>
|
||||
|
||||
<p>The code tracing has to start somewhere, so SourceGen uses "code start
|
||||
points" to identify places where execution may begin. By default,
|
||||
the first byte of the file is tagged as a start point. From there, the
|
||||
tracing process walks through the code, pursuing all branches. In many
|
||||
cases, if you tag all external entry points, SourceGen will automatically
|
||||
find all executable code and separate it from variable storage and
|
||||
data areas.</p>
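<p>As a rough illustration of the tracing idea (not SourceGen's actual
implementation), the sketch below follows execution from a list of start
offsets.  The function name <code>trace</code> and the three-entry opcode
table are assumptions made for this example; a real tracer knows every
opcode and also tracks the processor status flags.</p>

```python
# Hypothetical, heavily simplified tracer.  Only three opcodes are known:
# JMP absolute ($4c), CLC ($18), and RTS ($60).
OPCODES = {0x4C: ("JMP", 3), 0x18: ("CLC", 1), 0x60: ("RTS", 1)}


def trace(data, org, start_offsets):
    """Return the set of file offsets that hold instruction opcode bytes."""
    code = set()
    pending = list(start_offsets)
    while pending:
        offset = pending.pop()
        # Walk forward until we leave the file or reach visited code.
        while 0 <= offset < len(data) and offset not in code:
            info = OPCODES.get(data[offset])
            if info is None:
                break  # unknown opcode: stop following this path
            name, length = info
            if offset + length > len(data):
                break  # instruction runs off the end of the file
            code.add(offset)
            if name == "JMP":
                target = data[offset + 1] | (data[offset + 2] << 8)
                pending.append(target - org)  # continue at the jump target
                break
            if name == "RTS":
                break  # execution does not continue past a return
            offset += length
    return code


# The jump-vector bytes used in the analyzer tag example: three JMPs, then CLC.
data = bytes.fromhex("4c0910 4cef10 4c3012 18")
print(sorted(trace(data, 0x1000, [0])))        # only the first JMP and the CLC
print(sorted(trace(data, 0x1000, [0, 3, 6])))  # all three JMPs and the CLC
```

<p>With only offset 0 as a start point, the bytes at offsets 3 through 8 are
never reached and would be treated as data.  Adding offsets 3 and 6 as start
points lets the tracer find the other two jumps.</p>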
|
||||
|
||||
<p>As noted earlier, tracking the processor status flags can make the
|
||||
analysis more accurate. Identifying situations where a branch instruction
|
||||
is always or never taken avoids mis-categorizing a data region as code.
|
||||
On the 65816, it's absolutely crucial to track the M/X flags, since those
|
||||
affect the width of instructions. SourceGen tracks the value of the
|
||||
processor flags at every instruction, blending sets of flags together when
|
||||
multiple paths of execution converge.</p>
|
||||
|
||||
<p>Once instructions and data have been separated, the instruction operands
|
||||
can be examined. Branches, loads, and stores that reference an address
|
||||
that falls inside the address space covered by the file can be replaced
|
||||
with a symbol. Operands that refer to addresses outside the file, such
|
||||
as ROM or operating system routines, can be replaced with a symbol defined
|
||||
by an equate directive.</p>
|
||||
|
||||
<p>(For more details on how this works, see the
<a href="analysis.html">analysis appendix</a>.)</p>
|
||||
|
||||
|
||||
<h3><a name="scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts are C# source files that are compiled and
|
||||
executed by SourceGen. They can be added to a project from SourceGen's
|
||||
runtime data directory, or can live in the directory next to the project
|
||||
file.</p>
|
||||
<p>In the current implementation, scripts are only called to examine
|
||||
JSR, JSL, and BRK instructions. They can format nearby bytes as inline
|
||||
data, or apply symbols to operands.</p>
|
||||
|
||||
<p>To reduce the chances of a script causing problems, all scripts are
|
||||
executed in a sandbox with severely restricted access. Notably, nothing
|
||||
in the sandbox can access files, except to read files from the PluginDll
|
||||
directory.</p>
|
||||
<p>The PluginDll directory lives next to the SourceGen executable, and
|
||||
contains all of the compiled script DLLs, as well as two pre-built
|
||||
application DLLs that plugins are allowed access to. The contents
|
||||
are persistent, to avoid recompiling the scripts every time SourceGen
|
||||
is launched, but may be manually deleted without harm.</p>
|
||||
<p>More details can be found in the
|
||||
<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>
|
||||
|
||||
|
||||
<h3><a name="atags">Code Analyzer Stop, Start, and Skip</a></h3>
|
||||
|
||||
<p>Sometimes SourceGen can't automatically find the start or end of an
|
||||
instruction stream, or gets confused by inline data. These situations
|
||||
can be resolved by adding analyzer tags.</p>
|
||||
|
||||
<p><b>Code start point</b> tags tell the analyzer to add the offset
|
||||
to the list of instruction start points. Suppose you've got a code
|
||||
library that begins with jump vectors, like this:</p>
|
||||
<pre>
|
||||
1000: 4c0910 JMP $1009
|
||||
1003: 4cef10 JMP $10ef
|
||||
1006: 4c3012 JMP $1230
|
||||
1009: 18 CLC
|
||||
</pre>
|
||||
|
||||
<p>When opened with SourceGen, it will look like this:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1009
|
||||
|
||||
.DD1 $4c
|
||||
.DD1 $ef
|
||||
.DD1 $10
|
||||
.DD1 $4c
|
||||
.DD1 $30
|
||||
.DD1 $12
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
|
||||
assumes those are data. Further, the functions at those addresses may
|
||||
also be considered data unless some bit of code reachable from L1009
|
||||
calls into them. If you tag $1003 and $1006 as code start points,
|
||||
you'll get better results:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1009
|
||||
JMP L10ef
|
||||
JMP L1230
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>Be careful that you only tag the instruction opcode byte. If
|
||||
you tagged each and every byte from $1003 to $1008, you would
|
||||
end up with a mess:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1009
|
||||
JMP ▼ L10ef
|
||||
BPL ▼ L1053
|
||||
JMP ▼ L1230
|
||||
BMI L101b
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>The exact set of instructions shown depends on your CPU configuration.
|
||||
The problem is that the bytes in the middle of the instruction have
|
||||
been tagged as start points, so SourceGen is treating them as
|
||||
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
|
||||
they're being ignored, but $10 is BPL and $30 is BMI. Because tagging
|
||||
multiple consecutive bytes is rarely useful, SourceGen only applies code
|
||||
start tags to the first byte in a selected line.</p>
|
||||
|
||||
<p><b>Code stop point</b> tags tell the analyzer when it should stop. For
|
||||
example, suppose address $ff00 is known to always be nonzero, and the code
|
||||
uses that fact to get a branch-always on the 6502:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
LDA $ff00
|
||||
BNE L1010
|
||||
BRK $11
|
||||
</pre>
|
||||
|
||||
<p>By tagging the BRK as a code stop point, you're telling the analyzer that
|
||||
it should stop trying to execute code when it reaches that point. (Note
|
||||
that this example would actually be better solved by setting a status flag
|
||||
override on the BNE that sets Z=0, so the code tracer will know it's a
|
||||
branch-always and just do the right thing.) As with code start points,
|
||||
code stop points should only be placed on the opcode byte. Placing a
|
||||
code stop point in the middle of what SourceGen believes to be an instruction
|
||||
will have no effect.</p>
|
||||
<p>As with code start points, only the first byte in each selected line will
|
||||
be tagged.</p>
|
||||
|
||||
<p><b>Inline data</b> tags identify bytes as being part of the
|
||||
instruction stream, but not instructions. A simple example of this
|
||||
is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
|
||||
<pre>
|
||||
JSR $bf00
|
||||
.DD1 $function
|
||||
.DD2 $address
|
||||
BCS BAD
|
||||
</pre>
|
||||
|
||||
<p>The three bytes following the <code>JSR $bf00</code> should be tagged
|
||||
as inline data, so that the code analyzer skips over them and continues the
|
||||
analysis at the <code>BCS</code> instruction. You can think of these as
|
||||
"code skip" tags, but they're different from stop/start points, because
|
||||
every byte of inline data must be tagged. When
|
||||
applying the tag, all bytes in a selected line will be modified.</p>
|
||||
<p>If code branches into a region that is tagged as inline data, the
|
||||
branch will be ignored.</p>
|
||||
|
||||
|
||||
<h2><a name="sgconcepts">SourceGen Concepts</a></h2>
|
||||
<h3><a name="sgconcepts">SourceGen Concepts</a></h3>
|
||||
|
||||
<p>As you work on a disassembled file, formatting operands and adding
|
||||
comments, everything you do is saved in the project file as "meta data".
|
||||
@ -408,570 +247,40 @@ can happen to labels. SourceGen will try to prevent this from happening
|
||||
by splitting formatted data into sub-regions at label boundaries.</p>
|
||||
|
||||
|
||||
<h2><a name="about-symbols">All About Symbols</a></h2>
|
||||
|
||||
<p>A symbol has two basic parts, a label and a value. The label is a short
|
||||
ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
|
||||
constant. Symbols can be defined in different ways, and applied in
|
||||
different ways.</p>
|
||||
|
||||
<p>The label syntax is restricted to a format that should be compatible
|
||||
with most assemblers:</p>
|
||||
<ul>
|
||||
<li>2-32 characters long.</li>
|
||||
<li>Starts with a letter or underscore.</li>
|
||||
  <li>Composed of ASCII letters, numbers, and the underscore.</li>
|
||||
</ul>
|
||||
<p>Label comparisons are case-sensitive, as is customary for programming
|
||||
languages.</p>
|
||||
<p>Sometimes the purpose of a subroutine or variable isn't immediately
|
||||
clear, but you can take a reasonable guess. You can document your
|
||||
uncertainty by adding a question mark ('?') to the end of the label.
|
||||
This isn't really part of the label, so it won't appear in the assembled
|
||||
output, and you don't have to include it when searching for a symbol.</p>
|
||||
<p>Some assemblers restrict the set of valid labels further. For example,
|
||||
64tass uses a leading underscore to indicate a local label, and reserves
|
||||
a double leading underscore (e.g. <code>__label</code>) for its own
|
||||
purposes. In such cases, the label will be modified to comply with the
|
||||
target assembler syntax.</p>
|
||||
|
||||
<p>Operands may use parts of symbols. For example, if you have a label
|
||||
<code>MYSTRING</code>, you can write:</p>
|
||||
<pre>
|
||||
MYSTRING .STR "hello"
|
||||
LDA #<MYSTRING
|
||||
STA $00
|
||||
LDA #>MYSTRING
|
||||
STA $01
|
||||
</pre>
|
||||
<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>
|
||||
|
||||
<p>Symbols that represent a memory address within a project are treated
|
||||
differently from those outside a project. We refer to these as internal
|
||||
and external addresses, respectively.</p>
|
||||
|
||||
|
||||
<h3><a name="connecting-operands">Connecting Operands with Labels</a></h3>
|
||||
|
||||
<p>Suppose you have the following code:</p>
|
||||
<pre>
|
||||
LDA $1234
|
||||
JSR $2345
|
||||
</pre>
|
||||
<p>If we put that in a source file, it will assemble correctly.
|
||||
However, if those addresses are part of the file, the code may break if
|
||||
changes are made and things assemble to different addresses. It would
|
||||
be better to generate code that references labels, e.g.:</p>
|
||||
<pre>
|
||||
LDA my_data
|
||||
JSR nifty_func
|
||||
</pre>
|
||||
<p>SourceGen tries to establish labels for address operands automatically.
|
||||
How this works depends on whether the operand's address is inside the file or
|
||||
external, and whether there are existing labels at or near the target
|
||||
address. The details are explored in the next few sections.</p>
|
||||
<p>On the 65816 this process is trickier, because addresses are 24 bits
|
||||
instead of 16. For a control-transfer instruction like <code>JSR</code>,
|
||||
the high 8 bits come from the Program Bank Register (K). For a data-access
|
||||
instruction like <code>LDA</code>, the high 8 bits come from the Data
|
||||
Bank Register (B). The PBR value is determined by the address in which
|
||||
the code is executing, so it's easy to determine. The DBR value can be
|
||||
set arbitrarily. Sometimes it's easy to figure out, sometimes it has
|
||||
to be specified manually.</p>
|
||||
|
||||
|
||||
<h3><a name="internal-address-symbols">Internal Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address inside the file being disassembled
|
||||
are referred to as <i>internal</i>. They come in two varieties.</p>
|
||||
|
||||
<p><b>User labels</b> are labels added to instructions or data by the user.
|
||||
The editor will try to prevent you from creating a label that has the same
|
||||
name as another symbol, but if you manage to do so, the user label takes
|
||||
precedence over symbols from other sources. User labels may be tagged
|
||||
as non-unique local, unique local, global, or global and exported. Local
|
||||
vs. global is important for the label localizer, while exported symbols
|
||||
can be pulled directly into other projects.</p>
|
||||
|
||||
<p><b>Auto labels</b> are automatically generated labels placed on
|
||||
instructions or data offsets that are the target of operands. They're
|
||||
formed by appending the hexadecimal address to the letter "L", with
|
||||
additional characters added if some other symbol has already defined
|
||||
that label. Options can be set that change the "L" to a character or
|
||||
characters based on how the label is referenced, e.g. "B" for branch targets.
|
||||
Auto labels are only added where they are needed, and are removed when
|
||||
no longer necessary. Because auto labels may be renamed or vanish, the
|
||||
editor will try to prevent you from referring to them explicitly when
|
||||
editing operands.</p>
|
||||
|
||||
|
||||
<h3><a name="external-address-symbols">External Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address outside the file being disassembled
|
||||
are referred to as <i>external</i>. These may be ROM entry points,
|
||||
data buffers, zero-page variables, or a number of other things. Because
|
||||
the memory addresses they appear at aren't within the bounds of the file,
|
||||
we can't simply put an address label on them. Three different mechanisms
|
||||
exist for defining them. If an instruction or data operand refers to
|
||||
an address outside the file bounds, SourceGen looks for a symbol with
|
||||
a matching address value.</p>
|
||||
|
||||
<p><b>Platform symbols</b> are defined in platform symbol files. These
|
||||
are named with a ".sym65" extension, and have a fairly straightforward
|
||||
name/value syntax. Several files for popular platforms come with SourceGen
|
||||
and live in the <code>RuntimeData</code> directory. You can also create your
|
||||
own, but they have to live in the same directory as the project file.</p>
|
||||
|
||||
<p>Platform symbols can be addresses or constants. Addresses are
|
||||
limited to 24-bit values, and are matched automatically. Constants may
|
||||
be 32-bit values, but must be specified manually.</p>
|
||||
|
||||
<p>If two platform symbols have the same label, only the most recently read
|
||||
one is kept. If two platform symbols have different labels but the
|
||||
same value, both symbols will be kept, but the one in the file loaded
|
||||
last will take priority when doing a lookup by address. If symbols with
|
||||
the same value are defined in the same file, the one whose symbol appears
|
||||
first alphabetically takes priority.</p>
|
||||
|
||||
<p>Platform address symbols have an optional width. This can be used
|
||||
to define multi-byte items, such as two-byte pointers or 256-byte stacks.
|
||||
If no width is specified, a default value of 1 is used. Widths are ignored
|
||||
for constants.
|
||||
Overlapping symbols are resolved as described earlier, with symbols loaded
|
||||
later taking priority over previously-loaded symbols. In addition,
|
||||
symbols defined closer to the target address take priority, so if you put
|
||||
a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
|
||||
be visible because the start point is closer to the addresses it covers
|
||||
than the start of the 256-byte range.</p>
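<p>One way to picture the priority rules is the sketch below.  It is an
illustration only: the function name <code>lookup</code>, the tuple layout,
and the choice to compare start addresses first and use load order as the
tie-breaker are assumptions, not a description of SourceGen's code.</p>

```python
def lookup(addr, symbols):
    """Find the label covering addr.

    symbols is a list of (label, start, width) tuples in load order.
    A symbol whose start is closer to addr wins; if two candidates
    start at the same address, the one loaded later wins.
    """
    best = None
    for label, start, width in symbols:
        if start <= addr < start + width:
            # ">=" lets a later-loaded symbol replace an equal match.
            if best is None or start >= best[1]:
                best = (label, start)
    return best[0] if best else None


# A 4-byte symbol placed in the middle of a 256-byte symbol.
symbols = [("STACK", 0x0100, 256), ("BUF", 0x0180, 4)]
print(lookup(0x0181, symbols))  # BUF: its start is closer than STACK's
print(lookup(0x0190, symbols))  # STACK: outside the 4-byte range
```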
|
||||
|
||||
<p>Platform symbols can be designated for reading, writing, or both.
|
||||
Normally you'd want both, but if an address is a memory-mapped I/O
|
||||
location that has different behavior for reads and writes, you'd want
|
||||
to define two different symbols, and have the correct one applied
|
||||
based on the access type.</p>
|
||||
|
||||
<p><b>Project symbols</b> behave like platform symbols, but they are
|
||||
defined in the project file itself, through the Project Properties editor.
|
||||
The editor will try to prevent you from creating two symbols with the same
|
||||
name. If two symbols have the same value, the one whose label comes
|
||||
first alphabetically is used.</p>
|
||||
|
||||
<p>Project symbols always have precedence over platform symbols, allowing
|
||||
you to redefine symbols within a project. (You can "hide" a platform
|
||||
symbol by creating a project symbol constant with the same name. Use a
|
||||
value like $ffffffff or $deadbeef so you'll know why it's there.)</p>
|
||||
|
||||
<p><b>Local variables</b> are redefinable symbols that are organized
|
||||
into tables. They're used to specify labels for zero-page addresses
|
||||
and 65816 stack-relative instructions. These are explained in more
|
||||
detail in the next section.</p>
|
||||
|
||||
|
||||
<h4><a name="local-vars">How Local Variables Work</a></h4>
|
||||
|
||||
<p>Local variables are applied to instructions that have zero
|
||||
page operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
|
||||
65816 stack relative operands
|
||||
(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>). While they must be
|
||||
unique relative to other kinds of labels, they don't have to be unique
|
||||
with respect to earlier variable definitions. So you can define
|
||||
<code>TMP .EQ $10</code>, and a few lines later define
|
||||
<code>TMP .EQ $20</code>. This is handy because zero-page addresses are
|
||||
often used in different ways by different parts of the program. For
|
||||
example:</p>
|
||||
<pre>
|
||||
LDA ($00),Y
|
||||
INC $02
|
||||
... elsewhere ...
|
||||
DEC $00
|
||||
STA ($01),Y
|
||||
</pre>
|
||||
<p>If we had given <code>$00</code> the label <code>PTR</code> and
|
||||
<code>$02</code> the label <code>COUNT</code> globally,
|
||||
the second pair of instructions would look all wrong. With local
|
||||
variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
|
||||
and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>
|
||||
|
||||
<p>Local variables have a value and a width. If we create a pair of
|
||||
variable definitions like this:</p>
|
||||
<pre>
|
||||
PTR .eq $00 ;2 bytes
|
||||
COUNT .eq $02 ;1 byte
|
||||
</pre>
|
||||
<p>Then this:</p>
|
||||
<pre>
|
||||
STA $00
|
||||
STX $01
|
||||
LDY $02
|
||||
</pre>
|
||||
<p>Would become:</p>
|
||||
<pre>
|
||||
STA PTR
|
||||
STX PTR+1
|
||||
LDY COUNT
|
||||
</pre>
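<p>The substitution above can be sketched in a few lines.  This is a
minimal illustration; <code>format_zp_operand</code> and the tuple layout
are hypothetical names chosen for the example.</p>

```python
def format_zp_operand(addr, variables):
    """Format a zero-page address using (label, value, width) definitions."""
    for label, value, width in variables:
        if value <= addr < value + width:
            adjust = addr - value
            # The first byte uses the bare label; later bytes get "+N".
            return label if adjust == 0 else f"{label}+{adjust}"
    return f"${addr:02x}"  # no variable covers this address


table = [("PTR", 0x00, 2), ("COUNT", 0x02, 1)]
for addr in (0x00, 0x01, 0x02, 0x03):
    print(format_zp_operand(addr, table))
```

<p>Address <code>$03</code> falls outside both definitions, so it stays in
hex.</p>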
|
||||
|
||||
<p>The scope of a variable definition starts at the point where it is
|
||||
defined, and stops when its definition is erased. There are three
|
||||
ways for a table to erase an earlier definition:</p>
|
||||
<ol>
|
||||
<li>Create a new definition with the same name.</li>
|
||||
<li>Create a new definition that has an overlapping value. For
|
||||
example, if you have a two-byte variable <code>PTR = $00</code>,
|
||||
and define a one-byte variable <code>COUNT = $01</code>, the
|
||||
definition for <code>PTR</code> will be cleared because its second
|
||||
byte overlaps.</li>
|
||||
<li>Tables have a "clear previous" flag that erases all previous
|
||||
definitions. This doesn't usually cause anything to be generated in the
|
||||
assembly sources; instead, it just causes SourceGen to stop using
|
||||
that label.</li>
|
||||
</ol>
|
||||
<p>As you might expect, you're not allowed to have duplicate labels or
|
||||
overlapping values in an individual table.</p>
|
||||
<p>If a platform/project symbol has the same value as a local variable,
|
||||
the local variable is used. If the local variable definition is cleared,
|
||||
use of the platform/project symbol will resume.</p>
|
||||
<p>Not all assemblers support redefinable variables. In those cases,
|
||||
the symbol names will be modified to be unique (e.g. the second definition
|
||||
of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
|
||||
global scope.</p>
|
||||
|
||||
|
||||
<h3><a name="unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></h3>
|
||||
|
||||
<p>Most assemblers have a notion of "local" labels, which have a scope
|
||||
that is book-ended by global labels. These are handy for generic branch
|
||||
target names like "loop" or "notzero" that you might want to use in
|
||||
multiple places.  The exact definition of local label scope varies
|
||||
between assemblers, so labels that you want to be local might have to
|
||||
be promoted to global (and probably renamed).</p>
|
||||
<p>SourceGen has a similar concept with a slight twist: they're called
|
||||
non-unique labels, because the goal is to be able to use the same
|
||||
label in more than one place. Whether or not they actually turn out
|
||||
to be local is a decision deferred to assembly source generation time.
|
||||
(You can also declare a label to be a unique local if you like; the
|
||||
auto-generated labels like "L1234" do this.)</p>
|
||||
<p>When you're writing code for an assembler, it has to be unambiguous,
|
||||
because the assembler can't guess at what the output should be. For a
|
||||
disassembler, the output is known, so a greater degree of ambiguity is
|
||||
tolerable. Instead of throwing errors and refusing to continue, the
|
||||
source generator can modify the output until it works. For example:<p>
|
||||
<pre>
|
||||
@LOOP LDX #$02
|
||||
@LOOP DEX
|
||||
BNE @LOOP
|
||||
DEY
|
||||
BNE @LOOP
|
||||
</pre>
|
||||
<p>This would confuse an assembler. SourceGen already knows which @LOOP
|
||||
is being branched to, so it can just rename one of them to "@LOOP1".</p>
|
||||
<p>One situation where non-unique labels cause difficulty is with
|
||||
weak symbolic references (see next section). For example, suppose
|
||||
the above code then did this:</p>
|
||||
<pre>
|
||||
LDA #<@LOOP
|
||||
</pre>
|
||||
<p>While it's possible to make an educated guess at which @LOOP was
|
||||
meant, it's easy to get wrong. In situations like this, it's best to
|
||||
give the labels different names.</p>
|
||||
|
||||
|
||||
<h3><a name="weak-refs">Weak Symbolic References</a></h3>
|
||||
|
||||
<p>Symbolic references in operands are "weak references". If the named
|
||||
symbol exists, the reference is used. If the symbol can't be found, the
|
||||
operand is formatted in hex instead. They're called "weak" because
|
||||
failing to resolve the reference isn't considered an error.</p>
|
||||
|
||||
<p>It's important to know this when editing a project. Consider the
|
||||
following trivial chunk of code:</p>
|
||||
|
||||
<pre>
|
||||
1000: 4c0310 JMP $1003
|
||||
1003: ea NOP
|
||||
</pre>
|
||||
|
||||
<p>When you load it into SourceGen, it will be formatted like this:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
|
||||
<p>The analyzer found the JMP operand, and created an auto label for
|
||||
address $1003. It then created a weak reference to "L1003" in the JMP
|
||||
operand.</p>
|
||||
|
||||
<p>If you edit the JMP instruction's operand to use the symbol "FOO", the
|
||||
results are probably not what you want:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP $1003
|
||||
NOP
|
||||
</pre>
|
||||
|
||||
<p>This happened because you added a weak reference to "FOO" in the operand,
|
||||
but the label doesn't exist. The operand is formatted as hex. Because
|
||||
there's no longer a reference to L1003, SourceGen removed the auto-label
|
||||
as well.</p>
|
||||
|
||||
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
|
||||
probably wanted:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP FOO
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>You don't actually need the explicit reference in the JMP instruction.
|
||||
If you edit the JMP operand and set it back to "Default", the code will
|
||||
still look the same. This is because SourceGen identified the numeric
|
||||
reference, and automatically added a symbolic reference to the label on
|
||||
the NOP instruction.</p>
|
||||
|
||||
<p>However, suppose you didn't actually want FOO as the operand label.
|
||||
You can create a project symbol, BAR, with the value $1003, and then edit
|
||||
the operand to reference BAR instead. Your code would then look like:</p>
|
||||
<pre>
|
||||
BAR .EQ $1003
|
||||
.ORG $1000
|
||||
JMP BAR
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>If you change the value of BAR in the project symbol file, the operand
|
||||
will continue to refer to it, but with an adjustment. For example, if
|
||||
you changed BAR from $1003 to $1007, the code would become:</p>
|
||||
<pre>
|
||||
BAR .EQ $1007
|
||||
.ORG $1000
|
||||
JMP BAR-4
|
||||
FOO NOP
|
||||
</pre>
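<p>The behavior in this walkthrough can be approximated with a small
sketch.  It is an assumption-laden model, not SourceGen's code: the function
<code>format_operand</code> is hypothetical, and it leaves out the automatic
reference to a label found at the numeric target.</p>

```python
def format_operand(address, ref_label, symbols):
    """Format a 16-bit operand that holds a weak reference to ref_label.

    symbols maps label -> value.  If the label can't be found, the
    reference simply fails to resolve and the operand is shown in hex.
    """
    if ref_label is not None and ref_label in symbols:
        adjust = address - symbols[ref_label]
        if adjust == 0:
            return ref_label
        return f"{ref_label}{adjust:+d}"  # e.g. BAR-4
    return f"${address:04x}"


print(format_operand(0x1003, "FOO", {}))               # unresolved: hex
print(format_operand(0x1003, "FOO", {"FOO": 0x1003}))  # resolved
print(format_operand(0x1003, "BAR", {"BAR": 0x1007}))  # resolved, adjusted
```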
|
||||
|
||||
<p>If you rename a label, all references to that label are updated. For
|
||||
numeric references that happens implicitly. For explicit operand
|
||||
references, the weak references are updated individually. (Modern IDEs
|
||||
call this "refactoring".)</p>
|
||||
<p>If you remove a label, all of the numeric references to it will
|
||||
reference something else, probably a new auto label. Weak references
|
||||
to the symbol will break and be formatted as hex, but will not be
|
||||
removed. Similarly, removing symbols from a platform or project file
|
||||
will break the reference but won't modify the operands.</p>
|
||||
|
||||
<h3><a name="symbol-parts">Parts and Adjustments</a></h3>
|
||||
|
||||
<p>Sometimes you want to use part of a label, or adjust the value slightly.
|
||||
(I use "adjustment" rather than "offset" to avoid confusing it with file
|
||||
offsets.) Consider the following example:</p>
|
||||
<pre>
|
||||
1000: a910 LDA #$10
|
||||
1002: 48 PHA
|
||||
1003: a906 LDA #$06
|
||||
1005: 48 PHA
|
||||
1006: 60 RTS
|
||||
1007: 4c3aff JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>This pushes the address of the JMP instruction ($1007) onto the stack,
|
||||
and jumps to it with the RTS instruction. However, RTS requires the
|
||||
address of the byte before the target instruction, so we actually push
|
||||
$1006.</p>
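<p>The arithmetic is small enough to check by hand, but a short sketch makes
the off-by-one explicit.  The helper name <code>rts_push_bytes</code> is made
up for this example.</p>

```python
def rts_push_bytes(target):
    """Return (high, low) bytes to push so that RTS lands on target.

    RTS pulls a 16-bit address from the stack and adds one to it, so
    the value pushed must be target - 1.  The high byte is pushed first.
    """
    ret = (target - 1) & 0xFFFF
    return ret >> 8, ret & 0xFF


high, low = rts_push_bytes(0x1007)
print(f"LDA #${high:02x} / PHA / LDA #${low:02x} / PHA / RTS")
```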
|
||||
|
||||
<p>The disassembler won't know that offset $1007 is code because nothing
|
||||
appears to reference it. After tagging $1007 as a code start point, the
|
||||
project looks like this:</p>
|
||||
<pre>
|
||||
LDA #$10
|
||||
PHA
|
||||
LDA #$06
|
||||
PHA
|
||||
RTS
|
||||
|
||||
JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>We set a label called "NEXT" on the JMP instruction, and then edit
|
||||
the two LDA instructions to reference the high and low parts, yielding:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
LDA #>NEXT
|
||||
PHA
|
||||
LDA #<NEXT-1
|
||||
PHA
|
||||
RTS
|
||||
|
||||
NEXT JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>SourceGen will adjust label values by whatever amount is required to
|
||||
generate the original value. If the adjustment seems wrong, make sure
|
||||
you're selecting the right part of the symbol.</p>
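<p>A simplified sketch of how a part and an adjustment combine is shown
below.  The function <code>format_part</code> is hypothetical, and it
computes the adjustment on the selected byte only, ignoring carries between
the low and high bytes, which a real implementation would have to
handle.</p>

```python
def format_part(operand, symbol, value, part):
    """Format an 8-bit immediate operand as the low or high part of a symbol.

    part is "low" or "high".  The adjustment is whatever must be added to
    the selected byte of the symbol's value to reproduce the operand.
    """
    shift = 0 if part == "low" else 8
    prefix = "<" if part == "low" else ">"
    adjust = operand - ((value >> shift) & 0xFF)
    if adjust == 0:
        return f"#{prefix}{symbol}"
    return f"#{prefix}{symbol}{adjust:+d}"


NEXT = 0x1007
print(format_part(0x10, "NEXT", NEXT, "high"))  # #>NEXT
print(format_part(0x06, "NEXT", NEXT, "low"))   # #<NEXT-1
```

<p>Selecting the wrong part shows up as a large adjustment: asking for the
low part of <code>NEXT</code> when the operand is <code>$10</code> would
produce <code>#&lt;NEXT+9</code>.</p>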
|
||||
|
||||
<p>Different assemblers use different syntaxes to form expressions. This
|
||||
is particularly noticeable in 65816 code. You can adjust how it appears
|
||||
on-screen from the app settings.</p>
|
||||
|
||||
<h3><a name="nearby-targets">Automatic Use of Nearby Targets</a></h3>
|
||||
|
||||
<p>Sometimes you want to use a symbol that doesn't match up with the
|
||||
operand. SourceGen tries to anticipate situations where that might be
|
||||
the case, and apply adjustments for you.</p>
|
||||
|
||||
<p>Suppose you have the following:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
LDA #$00
|
||||
STA L1010
|
||||
LDA #$20
|
||||
STA L1011
|
||||
LDA #$e1
|
||||
STA L1012
|
||||
RTS
|
||||
|
||||
L1010 .DD1 $00
|
||||
L1011 .DD1 $00
|
||||
L1012 .DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>Showing stores to three different labeled addresses is fine, but
|
||||
the code is actually setting up a single 24-bit address. For clarity,
|
||||
you'd like the output to reflect the fact that it's a single, multi-byte
|
||||
variable. So, if you set a label at $1010, SourceGen removes the
|
||||
nearby auto labels, and sets the numeric references to use your label:</p>
|
||||
|
||||
<pre>
|
||||
.ORG $1000
|
||||
LDA #$00
|
||||
STA DATA
|
||||
LDA #$20
|
||||
STA DATA+1
|
||||
LDA #$e1
|
||||
STA DATA+2
|
||||
RTS
|
||||
|
||||
DATA .DD1 $00
|
||||
.DD1 $00
|
||||
.DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>If you decide that you really wanted each store to have its own
|
||||
label, you can set labels on the other two addresses. SourceGen won't
|
||||
search for alternate labels if the numeric reference target has a
|
||||
user-defined label.</p>
|
||||
|
||||
<p>This is also used for self-modifying code. For example:</p>
|
||||
<pre>
|
||||
1000: a9ff LDA #$ff
|
||||
1002: 8d0610 STA $1006
|
||||
1005: 4900 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>The above changes the <code>EOR #$00</code> instruction to
|
||||
<code>EOR #$ff</code>. The operand target is $1006, but we can't
|
||||
put a label there because it's in the middle of the instruction. So
|
||||
SourceGen puts a label at $1005 and adjusts it:</p>
|
||||
<pre>
|
||||
LDA #$ff
|
||||
STA L1005+1
|
||||
L1005 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>If you really don't like the way this works, you can disable the
|
||||
search for nearby targets entirely from the
|
||||
<a href="settings.html#project-properties">project properties</a>.
|
||||
Self-modifying code will always be adjusted because of the limitation
|
||||
on mid-instruction labels.</p>
|
||||
|
||||
|
||||
<h2><a name="width-disambiguation">Width Disambiguation</a></h2>
|
||||
|
||||
<p>It's possible to interpret certain instructions in multiple ways.
|
||||
For example, "LDA $0000" might be an absolute load from a 16-bit
|
||||
address, or it might be a direct page load from an 8-bit address.
|
||||
Humans can infer from the fact that it was written with a 4-digit address
|
||||
that it's meant to be absolute, but assemblers often treat operands
|
||||
purely as numbers, and would just see "LDA 0". Common practice is to
|
||||
use the shortest instruction possible.</p>
|
||||
<p>Every assembler seems to address the problem in a slightly different
|
||||
way. Some use opcode suffixes, others use operand prefixes, some
|
||||
allow both. You can configure how they appear in the
|
||||
<a href="settings.html#app-settings">application settings</a>.</p>
|
||||
<p>SourceGen will only add width disambiguators to opcodes or operands when
|
||||
they are needed, with one exception: the opcode suffix for long
|
||||
(24-bit address) operations is always applied. This is done because some
|
||||
assemblers require it, insisting on "LDAL" rather than "LDA" for an
|
||||
absolute long load, and because it can make 65816 code easier to read.</p>
|
||||
|
||||
|
||||
<h2><a name="pseudo-ops">Data and Directive Pseudo-Opcodes</a></h2>
|
||||
|
||||
<p>The on-screen code list shows assembler directives that are similar
|
||||
to what the various cross-assemblers provide. The actual directives
|
||||
generated for a given assembler may match exactly or be totally different.
|
||||
The idea is to represent the concept behind the directive, then let the
|
||||
code generator figure out the implementation details.</p>
|
||||
|
||||
<p>There are seven assembler directives that appear in the code list:</p>
|
||||
<ul>
|
||||
<li>.EQ - defines a symbol's value. These are generated automatically
|
||||
when an operand that matches a platform or project symbol is found.</li>
|
||||
<li>.VAR - defines a local variable. These are generated for
|
||||
local variable tables.</li>
|
||||
<li>.ORG - changes the target address.</li>
|
||||
<li>.RWID - specifies the width of the accumulator and index registers
|
||||
(65816 only). Note this doesn't change the actual width, just tells
|
||||
the assembler that the width has changed.</li>
|
||||
<li>.DBANK - specifies what value the Data Bank Register holds
|
||||
(65816 only). Used when matching operands to labels.</li>
|
||||
<li>.JUNK - indicates that the data in a range of bytes is irrelevant.
|
||||
(When generating sources, this will become .FILL or .BULK
|
||||
depending on the contents of the memory region and the assembler's
|
||||
capabilities.)</li>
|
||||
<li>.ALIGN - a special case of .JUNK that indicates the irrelevant
|
||||
bytes exist to force alignment to a memory boundary (usually a
|
||||
256-byte page). Depending on the memory contents, it may be possible
|
||||
to output this as an assembler-specific alignment directive.</li>
|
||||
</ul>
|
||||
|
||||
<p>Every data item is represented by a pseudo-op. Some of them may
represent hundreds of bytes and span multiple lines.</p>
<ul>
  <li>.DD1, .DD2, .DD3, .DD4 - basic "define data" op. A 1-4 byte
    little-endian value.</li>
  <li>.DBD2, .DBD3, .DBD4 - "define big-endian data". 2-4 bytes of
    big-endian data. (The 3- and 4-byte versions are not currently
    available in the UI, since they're very unusual and few assemblers
    support them.)</li>
  <li>.BULK - data packed in as compact a form as the assembler allows.
    Useful for chunks of graphics data.</li>
  <li>.FILL - a series of identical bytes. The operand
    has two parts, the byte count followed by the byte value.</li>
</ul>
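<p>As a rough illustration of what the data pseudo-ops in the list above
stand for, the following Python sketch produces the bytes each one would
describe. The helper names (<code>dd</code>, <code>dbd</code>,
<code>fill</code>) are made up for this example and are not part of
SourceGen.</p>

```python
def dd(value: int, width: int) -> bytes:
    """Bytes described by .DD<width>: a 1-4 byte little-endian value."""
    if not 1 <= width <= 4:
        raise ValueError("width must be 1-4")
    return value.to_bytes(width, "little")


def dbd(value: int, width: int) -> bytes:
    """Bytes described by .DBD<width>: a 2-4 byte big-endian value."""
    if not 2 <= width <= 4:
        raise ValueError("width must be 2-4")
    return value.to_bytes(width, "big")


def fill(count: int, value: int) -> bytes:
    """Bytes described by .FILL: byte count first, then the byte value."""
    return bytes([value]) * count


print(dd(0x1234, 2).hex())    # low byte comes first
print(dbd(0x1234, 2).hex())   # high byte comes first
print(fill(3, 0xFF).hex())
```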

<p>In addition, several pseudo-ops are defined for string constants:</p>
<ul>
  <li>.STR - basic character string.</li>
  <li>.RSTR - string in reverse order.</li>
  <li>.ZSTR - null-terminated string.</li>
  <li>.DSTR - Dextral Character Inverted string. The high bit of the
    last byte is flipped.</li>
  <li>.L1STR - string prefixed with a length byte.</li>
  <li>.L2STR - string prefixed with a length word.</li>
</ul>
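<p>The string forms listed above differ only in how the character bytes
are arranged or framed. A minimal Python sketch, assuming plain low
ASCII text and assuming the .L2STR length word is stored little-endian
(the list does not say which byte order is used); the function name is
hypothetical:</p>

```python
def encode_string(text: str, kind: str) -> bytes:
    """Return the bytes that a string pseudo-op of the given kind describes."""
    raw = text.encode("ascii")
    if kind == "STR":
        return raw
    if kind == "RSTR":
        return raw[::-1]                      # characters in reverse order
    if kind == "ZSTR":
        return raw + b"\x00"                  # null terminator
    if kind == "DSTR":
        if not raw:
            raise ValueError("DCI strings need at least one character")
        return raw[:-1] + bytes([raw[-1] ^ 0x80])   # flip high bit of last byte
    if kind == "L1STR":
        return len(raw).to_bytes(1, "little") + raw  # 1-byte length prefix
    if kind == "L2STR":
        return len(raw).to_bytes(2, "little") + raw  # 2-byte length (assumed LE)
    raise ValueError(f"unknown string kind: {kind}")


for kind in ("STR", "RSTR", "ZSTR", "DSTR", "L1STR", "L2STR"):
    print(kind, encode_string("HI", kind).hex())
```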

<p>You can configure the pseudo-ops to look more like what your
favorite assembler uses in the
<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
application settings.</p>

<p>String constants start and end with delimiter characters, typically
single or double quotes. You can configure the delimiters differently
for each character encoding, so that it's obvious whether the text is
in ASCII or PETSCII. See the
<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
the application settings.</p>

<h2><a name="sgintro">How SourceGen Works</a></h2>

<p>SourceGen employs a partial emulation technique that traces the flow
of execution through the program. Most of what a given instruction does
isn't important; only its effect on the flow of execution matters. This
makes SourceGen different from most other disassemblers, because instead
of assuming everything is code and expecting the user to separate out the
data, it assumes everything is data and asks the user to identify where the
code starts executing.</p>

<p>SourceGen uses "code start points" to tag places where execution may
begin. By default, the first byte of the file is marked as a start point.
From there, the tracing process walks through the code, pursuing all
branches. In many cases, if you tag all external entry points, SourceGen
will automatically find all executable code and separate it from variable
storage and data areas.</p>
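<p>The tracing idea can be sketched with a toy Python model. This is not
SourceGen's analyzer: the opcode table covers only six 6502 instructions,
the file is assumed to load at a single base address, and the
<code>trace</code> function and its parameters are invented for
illustration.</p>

```python
from typing import List, Set

# opcode -> (instruction length, effect on control flow)
OPS = {
    0xA9: (2, "next"),    # LDA #imm
    0xEA: (1, "next"),    # NOP
    0xD0: (2, "branch"),  # BNE rel  (may or may not be taken)
    0x20: (3, "call"),    # JSR abs  (target runs, then execution continues)
    0x4C: (3, "jump"),    # JMP abs  (execution does not continue)
    0x60: (1, "stop"),    # RTS
}


def trace(data: bytes, base: int, starts: List[int]) -> Set[int]:
    """Return the file offsets reached as instruction bytes.

    Everything not returned is still treated as data.
    """
    code: Set[int] = set()
    work = list(starts)                    # code start points (file offsets)
    while work:
        off = work.pop()
        while 0 <= off < len(data) and off not in code:
            op = data[off]
            if op not in OPS or off + OPS[op][0] > len(data):
                break                      # unknown or truncated: give up on this path
            length, kind = OPS[op]
            code.update(range(off, off + length))
            if kind == "branch":
                rel = data[off + 1]
                rel = rel - 256 if rel >= 128 else rel   # signed 8-bit offset
                work.append(off + 2 + rel)
            elif kind in ("call", "jump"):
                target = data[off + 1] | (data[off + 2] << 8)
                work.append(target - base)  # out-of-file targets are skipped later
            if kind in ("jump", "stop"):
                break
            off += length
    return code


program = bytes([0xA9, 0x01,        # $1000 LDA #$01
                 0xD0, 0x03,        # $1002 BNE $1007
                 0x60,              # $1004 RTS
                 0x48, 0x49,        # $1005 two bytes never reached
                 0xEA,              # $1007 NOP
                 0x4C, 0x04, 0x10,  # $1008 JMP $1004
                 0xFF])             # $100B never reached
reached = trace(program, 0x1000, [0])
print(sorted(set(range(len(program))) - reached))   # offsets left as data
```

<p>Starting from a single code start point at offset 0, the sketch follows
the branch and the jump, and leaves the unreached bytes categorized as
data.</p>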

<p>As noted earlier, tracking the processor status flags can make the
analysis more accurate. Identifying situations where a branch instruction
is always or never taken avoids mis-categorizing a data region as code.
On the 65816, it's absolutely crucial to track the M/X flags, since those
affect the width of instructions. SourceGen tracks the value of the
processor flags at every instruction, blending sets of flags together when
multiple paths of execution converge.</p>
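<p>One simple way to picture the "blending" of flags is a three-state
model in which each flag is known to be 0, known to be 1, or unknown.
The Python sketch below is an assumption about the general approach, not
a description of SourceGen's internal representation; the names are
hypothetical.</p>

```python
from typing import Dict, Optional

# Each flag is 0, 1, or None (unknown / could be either).
FlagSet = Dict[str, Optional[int]]


def merge_flags(a: FlagSet, b: FlagSet) -> FlagSet:
    """Blend the flags from two paths that arrive at the same instruction.

    A flag stays known only if both paths agree on its value.
    Both sets are assumed to contain the same flag names.
    """
    return {name: a[name] if a[name] == b[name] else None for name in a}


def lda_imm_length(flags: FlagSet) -> Optional[int]:
    """Length of a 65816 'LDA #imm' instruction, or None if M is unknown."""
    m = flags["M"]
    if m is None:
        return None
    return 2 if m == 1 else 3   # M=1: 8-bit accumulator, M=0: 16-bit


path1 = {"M": 1, "X": 1, "C": 0}
path2 = {"M": 1, "X": 0, "C": None}
merged = merge_flags(path1, path2)
print(merged)
print(lda_imm_length(merged))
```

<p>When a flag that affects instruction width ends up unknown, the
analyzer needs some other source of information, which is why getting
the M/X flags right matters so much on the 65816.</p>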

<p>Once instructions and data have been separated, the instruction operands
can be examined. Branches, loads, and stores that reference an address
that falls inside the address space covered by the file can be replaced
with a symbol. Operands that refer to addresses outside the file, such
as ROM or operating system routines, can be replaced with a symbol defined
by an equate directive.</p>

(For more details on how this works, see the
<a href="analysis.html">analysis appendix</a>.)

</div>

@ -314,7 +314,7 @@ selection to the navigation stack. This makes notes useful as bookmarks.</p>
|
||||
|
||||
<h3><a name="symbols">Symbols Window</a></h3>
|
||||
|
||||
<p>All known <a href="intro.html#about-symbols">symbols</a> are shown
|
||||
<p>All known <a href="intro-details.html#about-symbols">symbols</a> are shown
|
||||
here. The filter buttons allow you to screen out symbols you're not
|
||||
interested in, such as platform symbols or constants.</p>
|
||||
|
||||
@ -327,9 +327,9 @@ way to move around the file, jumping from label to label.</p>
|
||||
|
||||
<p>The "type" column uses a two-letter code to identify the symbol's
|
||||
type and scope. The first letter is one of A (auto), U (user),
|
||||
P (platform), J (project), or V (variable). The second letter is one
|
||||
of N (non-unique local), L (local), G (global), X (exported),
|
||||
E (external), or C (constant).</p>
|
||||
P (platform), J (project), R (pre-label), or V (variable).
|
||||
The second letter is one of N (non-unique local), L (local), G (global),
|
||||
X (exported), E (external), or C (constant).</p>
|
||||
|
||||
|
||||
<h3><a name="info">Info Window</a></h3>
|
||||
@ -397,7 +397,8 @@ will jump to the nearest instance.</p>
|
||||
|
||||
<p>If an instruction or data line has an operand that references an address
|
||||
in the file, you can navigate to the operand's location with
|
||||
Navigate > Jump to Operand.</p>
|
||||
Navigate > Jump to Operand. You can also do this by double-clicking
|
||||
in the opcode column.</p>
|
||||
|
||||
<p>When you edit something, lines throughout the listing can change. This
|
||||
is different from a source code editor, where editing a line just changes
|
||||
|
@ -113,7 +113,7 @@ specified by the chosen assembler.</p>
|
||||
do not affect generated code.</p>
|
||||
|
||||
<p>The
|
||||
<a href="intro.html#width-disambiguation">operand width disambiguator</a>
|
||||
<a href="intro-details.html#width-disambiguation">operand width disambiguator</a>
|
||||
strings are used when the width of an instruction operand is unclear.
|
||||
You may specify values for all of them or none of them.</p>
|
||||
|
||||
@ -221,13 +221,13 @@ are required to process the file correctly.</p>
|
||||
They specify which CPU to use, which extension scripts to load, and a
|
||||
variety of other things that directly impact how SourceGen processes
|
||||
the project. Because of the potential impact, all changes to
|
||||
the project properties are made through the undo/redo buffer.</p>
|
||||
the project properties are made through the undo/redo buffer,
|
||||
which means you hit "undo" to revert a property change.</p>
|
||||
|
||||
<p>The properties editor is divided into four tabs. Changes aren't pushed
|
||||
out to the main application until you close the dialog. Clicking Apply
|
||||
will capture the current changes, ensuring that they're applied even if
|
||||
you later hit Cancel, but the changes are not applied immediately.</p>
|
||||
<p>All changes are subject to undo/redo.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-general">General</a></h3>
|
||||
@ -315,7 +315,7 @@ can be labeled with the letter 'B'.</p>
|
||||
|
||||
<h3><a name="projprop-projsym">Project Symbols</a></h3>
|
||||
<p>You can add, edit, and delete individual symbols and constants.
|
||||
See the <a href="intro.html#about-symbols">symbols</a> section for an
|
||||
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
|
||||
explanation of how project symbols work.</p>
|
||||
|
||||
<p>The Edit Symbol button opens the
|
||||
@ -337,7 +337,7 @@ Project Symbols tab selected by hitting F6 from the main code list.</p>
|
||||
<h3><a name="projprop-symfiles">Symbol Files</a></h3>
|
||||
<p>From here, you can add and remove platform symbol files, or change
|
||||
the order in which they are loaded.
|
||||
See the <a href="intro.html#about-symbols">symbols</a> section for an
|
||||
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
|
||||
explanation of how platform symbols work, and the
|
||||
<a href="advanced.html#platform-symbols">advanced topics</a> section
|
||||
for a description of the file syntax.</p>
|
||||
|
@ -11,874 +11,10 @@
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Tutorials</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<p><strong>NOTE:</strong> this tutorial has been superseded by
|
||||
<p><strong>NOTE:</strong> this tutorial has been replaced by
|
||||
content on the 6502bench web site. Visit
|
||||
<a href="https://6502bench.com/sgtutorial/">https://6502bench.com/sgtutorial/</a>.</p>
|
||||
<p> </p>
|
||||
|
||||
<p>The tutorials introduce SourceGen and cover some of the basic
|
||||
features. They skim lightly over some important concepts, like the
|
||||
difference between numeric and symbolic references, so reading the
|
||||
manual is recommended.</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="#basic-features">#1: Basic Features</a></li>
|
||||
<li><a href="#advanced-features">#2: Advanced Features</a></li>
|
||||
<li><a href="#address-tables">#3: Address Table Formatting</a></li>
|
||||
<li><a href="#extension-scripts">#4: Extension Scripts</a></li>
|
||||
<li><a href="#visualizations">#5: Visualizations</a></li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h2><a name="basic-features">Tutorial #1: Basic Features</a></h2>
|
||||
|
||||
<p>Start by launching SourceGen. The initial screen has a large
|
||||
center area with some buttons, and some mostly-empty windows on the sides.
|
||||
The buttons are shortcuts for items in the File menu.</p>
|
||||
|
||||
|
||||
<h3>Create the project</h3>
|
||||
|
||||
<p>Click the "Start new project" button.</p>
|
||||
|
||||
<p>The New Project window has three parts. The top-left window has a
|
||||
tree of known platforms, arranged by manufacturer. The top-right window
|
||||
provides some details on whichever platform is selected. The bottom
|
||||
window will have some information about the data file, once we choose one.</p>
|
||||
<p>Scroll down in the list, and select "Generic 6502". Then click
|
||||
"Select File...", navigate to the SourceGen installation directory,
|
||||
open the "Examples" folder, then open the "Tutorial" folder. Select the
|
||||
file named "Tutorial1", and click "Open".</p>
|
||||
<p>The filename now appears in the bottom window, along with an indication
|
||||
of the file's size.</p>
|
||||
<p>Click "OK" to create the project.</p>
|
||||
|
||||
|
||||
<h3>Getting Around</h3>
|
||||
|
||||
<p>The first thing we'll do is save the project. Some features create or
|
||||
load files from the directory where the project lives, so we want to
|
||||
establish that.</p>
|
||||
<p>Select File > Save, which will bring up a standard save-file dialog.
Make sure you're still in the Examples/Tutorial folder. The default
project file name is "Tutorial1.dis65", which is what we want, so just
click "Save".</p>
|
||||
|
||||
<p>The display is divided into rows, one per line of disassembled code
|
||||
or data. This is a standard Windows "list view", so you can select a row
|
||||
by left-clicking anywhere in it. Use Ctrl+Click to toggle the selection
|
||||
on individual lines, and Shift+Click to select a range of lines. You can
|
||||
move the selection around with the up/down arrow keys and PgUp/PgDn. Scroll
|
||||
the window with the mouse wheel or by dragging the scroll bar.</p>
|
||||
|
||||
<p>Each row is divided into nine columns. You can adjust the column
|
||||
widths by clicking and dragging the column dividers in the header. The
|
||||
columns on the right side of the screen are similar to what you'd find
|
||||
in assembly source code: label, opcode, operand, comment. The columns
|
||||
on the left are what you'd find in a disassembly (file offset, address,
|
||||
raw bytes), plus some information about processor status flags and line
|
||||
attributes that may or may not be useful to you. If you find any of
|
||||
these distracting, collapse the column.</p>
|
||||
|
||||
<p>Click on the fourth line down, which has address 1002. The line has
|
||||
a label, "L1002", and is performing an indexed load from L1017. Both
|
||||
of these labels were automatically generated, and are named for the
|
||||
address at which they appear. When you clicked on the line, a few
|
||||
things happened:</p>
|
||||
<ul>
|
||||
<li>The line was highlighted in the system selection color (usually
|
||||
blue).</li>
|
||||
<li>Address 1017 and label L1017 were highlighted. When you select
|
||||
a line with an operand that targets an in-file address, the target
|
||||
address is highlighted.</li>
|
||||
<li>An entry appeared in the References window. This tells you that the
|
||||
only reference to L1002 is a branch from address $100B.</li>
|
||||
<li>The Info window filled with a bunch of text that describes the
|
||||
line format and some details about the LDA instruction.</li>
|
||||
</ul>
|
||||
|
||||
<p>Click some other lines, such as address $100B and $1014. Note how the
|
||||
highlights and contents of other windows change.</p>
|
||||
<p>Click on L1002 again, then double-click on the opcode ("LDA"). The
|
||||
selection jumps to L1017. When an operand references an in-file address,
|
||||
double-clicking on the opcode will take you to it. (Double-clicking on
|
||||
the operand itself opens a format editor; more on that later.)</p>
|
||||
<p>With line L1017 selected, double-click on the line that appears in the
|
||||
References window. Note the selection jumps to L1002. You can immediately
|
||||
jump to any reference.</p>
|
||||
<p>At the top of the Symbols window on the right side of the screen is a
|
||||
row of buttons. Make sure "Auto" and "Addr" are selected. You should see
|
||||
three labels in the window (L1002, L1014, L1017). Double-click on "L1014"
|
||||
in the Symbols list. The selection jumps to the appropriate line.</p>
|
||||
|
||||
<p>Select Navigate > Find. Type "hello", and hit Enter. The selection will
|
||||
move to address $100E, which is a string that says "hello!". You can use
|
||||
Navigate > Find Next to try to find the next occurrence (there isn't one). You
|
||||
can search for any text that appears in the rightmost columns (label, opcode,
|
||||
operand, comment).</p>
|
||||
<p>Select Navigate > Go To. You can enter a label, address, or file offset.
|
||||
Enter "100b" to set the selection to the line at address $100B.</p>
|
||||
|
||||
<p>Near the top-left of the SourceGen window is a set of toolbar icons.
|
||||
Click the curly left-pointing arrow, and watch the selection move. Click
|
||||
it again. Then click the curly right-arrow a couple of times. Whenever
|
||||
you jump around in the file by using the Go To feature, or by double-clicking
|
||||
on opcodes or lines in the side windows, the locations are added to a
|
||||
navigation history. The arrows let you move forward and backward
|
||||
through it.</p>
|
||||
|
||||
|
||||
<h3>Editing</h3>
|
||||
|
||||
<p>Click the very first line of the file, which is a comment that says
|
||||
something like "6502bench SourceGen vX.Y.Z". There are three ways to
|
||||
open the comment editor:</p>
|
||||
<ol>
|
||||
<li>Select Actions > Edit Long Comment from the menu bar.</li>
|
||||
<li>Right click, and select Edit Long Comment from the
|
||||
pop-up menu. (This menu is exactly the same as the Actions menu.)</li>
|
||||
<li>Double-click the comment.</li>
|
||||
</ol>
|
||||
<p>Most things in the code list will respond to a double-click.
|
||||
Double-clicking on addresses, flags, labels, operands, and comments will
|
||||
open editors for those things. Double-clicking on a value in the "bytes"
|
||||
column will open a floating hex dump viewer. This is usually the most
|
||||
convenient way to edit something: point and click.</p>
|
||||
<p>Double-click the comment to open the editor. Type some words into the
|
||||
upper window, and note that a formatted version appears in the bottom
|
||||
window. Experiment with the maximum line width and "render in box"
|
||||
settings to see what they do. You can hit Enter to create line breaks,
|
||||
or let SourceGen wrap lines for you. When you're done, click "OK". (Or
|
||||
hit Ctrl+Enter.)</p>
|
||||
<p>When the dialog closes, you'll see your new comment in place at the
|
||||
top of the file. If you typed enough words, your comment will span
|
||||
multiple lines. You can select the comment by selecting any line in it.</p>
|
||||
|
||||
<p>Click on the comment, then shift-click on L1014. Right-click, and look
|
||||
at the menu. Nearly all of the menu items are disabled. Most edit features
|
||||
are only enabled when a single instance of a relevant item is selected, so
|
||||
for example Edit Long Comment won't be enabled if you have an instruction
|
||||
selected.</p>
|
||||
|
||||
<p>Let's add a note. Click on $100E (the line with "hello!"), then
|
||||
select Actions > Edit Note. Type a few words, pick a color, and click "OK"
|
||||
(or hit Ctrl+Enter). Your note appears in the code, and also in the
|
||||
window on the bottom left. Notes are like long comments, with three key
|
||||
differences:</p>
|
||||
<ol>
|
||||
<li>You can't pick their line width, but you can pick their color.</li>
|
||||
<li>They don't appear in generated assembly sources, making them
|
||||
useful for leaving notes to yourself as you work.</li>
|
||||
<li>They're listed in the Notes window. Double-clicking them jumps
|
||||
the selection to the note, making them useful as bookmarks.</li>
|
||||
</ol>
|
||||
|
||||
<p>It's time to do something with the code. If you look at what the code
does you'll see that it's copying several dozen bytes from $1017
to $2000, then jumping to $2000. It appears to be relocating the next
part of the code before
executing it. We want to let the disassembler know what's going on, so
select the line at address $1017 and then
Actions > Set Address. (Or double-click the "1017" in the Addr column.)
In the Set Address dialog, type "2000", and hit Enter.</p>
|
||||
|
||||
<p>Note the way the code list has changed. When you changed the address,
|
||||
the "JMP $2000" at L1014 found a home inside the bounds of the file, so
|
||||
the code tracer was able to find the instructions there.</p>
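<p>Why the "JMP $2000" suddenly resolves can be shown with a small Python
model of the offset-to-address mapping. It assumes the tutorial file
starts at address $1000, so that address $1017 sits at file offset $17;
the region table, the function names, and the file length used in the
example are all illustrative, not taken from SourceGen.</p>

```python
import bisect
from typing import List, Optional, Tuple

Region = Tuple[int, int]   # (file offset where the region starts, address)

BEFORE: List[Region] = [(0x0000, 0x1000)]
AFTER: List[Region] = [(0x0000, 0x1000), (0x0017, 0x2000)]   # after Set Address


def offset_to_address(offset: int, regions: List[Region]) -> int:
    """Address shown for a file offset. Offsets themselves never change."""
    starts = [start for start, _ in regions]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first region")
    start, addr = regions[i]
    return addr + (offset - start)


def address_to_offset(address: int, file_len: int,
                      regions: List[Region]) -> Optional[int]:
    """File offset holding the given address, or None if it is outside the file."""
    for i, (start, addr) in enumerate(regions):
        end = regions[i + 1][0] if i + 1 < len(regions) else file_len
        if addr <= address < addr + (end - start):
            return start + (address - addr)
    return None


FILE_LEN = 0x80   # hypothetical length, just for the example

print(address_to_offset(0x2000, FILE_LEN, BEFORE))       # JMP target not in file
print(hex(address_to_offset(0x2000, FILE_LEN, AFTER)))   # now it lands in the file
print(hex(offset_to_address(0x17, AFTER)))
```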
|
||||
<p>From the menu, select Edit > Undo. Notice how everything reverts to
|
||||
the way it was. Now, select Edit > Redo. You can undo any change you
|
||||
make to the project. (The undo history is <strong>not</strong> saved in
|
||||
the project file, though, so when you exit the program the history is
|
||||
lost.)</p>
|
||||
<p>Notice that, while the address column has changed, the offset column
|
||||
has not. File offsets never change, which is why they're shown here and
|
||||
in the References and Notes windows. (They can, however, be distracting,
|
||||
so you'll be forgiven if you reduce the offset column width to zero.)</p>
|
||||
<p>On the line at address $2000, select Actions > Edit Label, or
|
||||
double-click on the label "L2000". Change the label to "MAIN", and hit
|
||||
Enter. The label changes on that line, and on the two lines that refer
|
||||
to address $2000. (If you're not sure which lines refer to address $2000,
|
||||
select line $2000 and look at the References window.)</p>
|
||||
<p>On that same line, select Actions > Edit Comment. Type a short
|
||||
comment, and hit Enter. Your comment appears in the "comment" column.</p>
|
||||
|
||||
|
||||
<h3>Editing Instruction Operands</h3>
|
||||
|
||||
<p>Select the line with address $2003 ("CMP #$04"), then
|
||||
Actions > Edit Operand. This allows you to pick how you want the
|
||||
operand to look. It's currently set to "Default", which for an 8-bit
|
||||
immediate argument means it's shown as a hexadecimal value. Click
|
||||
"Binary", then "OK". It now appears as a binary value.</p>
|
||||
|
||||
<p>The operand in the LDA instruction at line $2000 refers to an address
|
||||
($3000) that isn't part of the file. We want to create an equate directive to
|
||||
give it a name. With the line at $2000 selected, use Actions > Edit Operand,
|
||||
or double-click on "$3000". Select the "Symbol" radio button, then type
|
||||
"INPUT" in the text box. Click "OK".</p>
|
||||
<p>Disappointed? The instruction is unchanged. The problem is that we
|
||||
updated the operand to reference a symbol that doesn't exist. This fact
|
||||
is noted in a message that appeared at the bottom of the screen. Open the
|
||||
operand editor again, but this time click on "Create Project Symbol" at
|
||||
the bottom left. Enter "INPUT" in the Label field, and click "OK", then
|
||||
click "OK" in the operand editor.</p>
|
||||
<p>That's better. The instruction looks the way we wanted it to, and the
|
||||
message at the bottom of the window disappeared. If you scroll up to the
|
||||
top of the project, you'll see that there's now a ".EQ" line for
|
||||
the symbol.</p>
|
||||
<p>Operands that refer to in-file locations behave similarly. Select the
|
||||
line two down, at address $2005, and Actions > Edit Operand. Enter the
|
||||
symbol "IS_OK". (Note you don't actually have to click Symbol first -- if
|
||||
you just start typing as soon as the dialog opens, it'll select Symbol
|
||||
for you automatically.) Click "OK".</p>
|
||||
<p>As before, nothing appears to have happened, but if you were watching
|
||||
carefully you would have noticed that the label at $2009 ("L2009") has
|
||||
disappeared. This happened because the code at $2005 used to have a
|
||||
<i>numeric</i> reference to $2009, and SourceGen automatically created a
|
||||
label. However, you changed the code at $2005 to have a <i>symbolic</i>
|
||||
reference to a symbol called "IS_OK", and there were no other numeric
|
||||
references to $2009, so the auto-label was no longer
|
||||
needed. Because IS_OK doesn't exist, the operand at $2005 is just formatted
|
||||
as a hexadecimal value. (There's also now a message at the bottom of the
|
||||
window telling us this.)</p>
|
||||
<p>Let's fix this. Select the line at address $2009, then
|
||||
Actions > Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are
|
||||
case-sensitive, so it needs to match the operand at $2005 exactly.) You'll
|
||||
see the new label appear, and the operand at line $2005 will use it.</p>
|
||||
|
||||
<!--<p>There's an easier way. Select Edit > Undo twice, to get back to the
|
||||
state where line $2005 says "BCC L2009", and line $2009 has the label
|
||||
L2009. Now double-click on the "BCC" opcode (not operand) at address
|
||||
$2005. This moves the selection to $2009. Double-click on the label field,
|
||||
and enter "IS_OK". Hit "OK".</p>
|
||||
<p>You should now see that both the operand at $2005 and the label at
|
||||
$2009 have changed to IS_OK, accomplishing what we wanted to do in a
|
||||
single step. The key difference is that we haven't explicitly set a
|
||||
format for the BCC operand -- we just defined a label, and SourceGen
|
||||
used it automatically.</p>-->
|
||||
|
||||
<p>There's another way to set a label that is simpler and more convenient.
|
||||
Select Edit > Undo twice, to get back to the state where line $2005
|
||||
says "BCC L2009", and line $2009 has the label L2009.
|
||||
Double-click on the operand on line $2005 ("L2009") to open the operand
|
||||
editor, then in the bottom left panel click "Create Label". Type "IS_OK",
|
||||
then click "OK". Make sure the operand format is still set to Default,
|
||||
then click "OK".</p>
|
||||
<p>This puts the label IS_OK at line $2009, and we can see the BCC
|
||||
instruction has it as well. We were able to leave the BCC instruction
|
||||
set to Default format because the numeric reference to $2009 was
|
||||
automatically resolved to the IS_OK label. You could do the same thing
|
||||
by editing the label on line $2009 directly as we did earlier, but
|
||||
in many cases -- particularly when the operand's target address is far
|
||||
off screen -- it's more convenient to work through the operand editor.</p>
|
||||
|
||||
<h3>Unique vs. Non-Unique Labels</h3>
|
||||
|
||||
<p>Most assemblers have a notion of "local" labels, which go out of
|
||||
scope when a non-local (global) label is encountered. The actual
|
||||
definition of "local" is assembler-specific, but SourceGen allows you
|
||||
to create labels that serve the same purpose.</p>
|
||||
<p>By default, newly-created labels have global scope and must be
|
||||
unique. You can change these attributes when you edit the label. Up near the
|
||||
top of the file, at address $1002, double-click on the label ("L1002").
|
||||
Change the label to "LOOP" and click the "non-unique local" button.
|
||||
Click "OK".</p>
|
||||
<p>The label at line $1002 (and the operand on line $100B) should now
|
||||
be "@LOOP". By default, '@' is used to indicate non-unique labels,
|
||||
though you can change it to a different character in the application
|
||||
settings.</p>
|
||||
<p>At address $2019, double-click to edit the label ("L2019"). If
|
||||
you type "MAIN" or "IS_OK" with Global selected you'll get an error,
|
||||
but if you type "@LOOP" it will be accepted. Note the "non-unique local"
|
||||
button is selected automatically if you start a label with '@' (or
|
||||
whatever character you have configured). Click "OK".</p>
|
||||
<p>You now have two lines with the same label. The assembly source
|
||||
generator may "promote" them to globals or rename them if your chosen
|
||||
assembler requires it.</p>
|
||||
|
||||
<h3>Editing Data Operands</h3>
|
||||
|
||||
<p>There's some string and numeric data down at the bottom of the file. The
|
||||
final string appears to be multiple strings stuck together. (You may need
|
||||
to increase the width of the Operand column to see the whole thing.) Notice
|
||||
that the opcode for the very last line is '+', which means it's a
|
||||
continuation of the previous line. Long data items can span multiple
|
||||
lines, split every 64 characters (including delimiters), but they are
|
||||
still single items: selecting any part selects the whole.</p>
|
||||
<p>Select the last line in the file, then Actions > Edit Operand. You'll
|
||||
notice that this dialog is much different from the one you got when editing
|
||||
the operand of an instruction. At the top it will say "65 bytes
|
||||
selected". You can format this as a single 65-byte string, as 65 individual
|
||||
items, or various things in between. For now, select "Single bytes", and
|
||||
then on the right, select "ASCII (low or high) character". Click "OK".</p>
|
||||
<p>Each character is now on its own line. The selection still spans the
|
||||
same set of addresses.</p>
|
||||
<p>Select address $203D on its own, then Actions > Edit Label. Set the
|
||||
label to "STR1". Move up a bit and select address $2030, then scroll to
|
||||
the bottom and shift-click address $2070. Select Actions > Edit Operand.
|
||||
At the top it should now say, "65 bytes selected in 2 groups".
|
||||
There are two groups because the presence of a label split the data into
|
||||
two separate regions. From the "Character encoding" pop-up down in the
|
||||
"String" section, make sure "Low or High ASCII" encoding is selected,
|
||||
then select the "mixed character and non-character" string type and
|
||||
click "OK".</p>
|
||||
<p>We now have two ".STR" lines, one for "string zero ", and one with the
|
||||
STR1 label and the rest of the string data. This is okay, but it's not
|
||||
really what we want. The code at $200B appears to be loading a 16-bit
|
||||
address from data at $2025, so we want to use that if we can.</p>
|
||||
<p>Select Edit > Undo three times. You should be back to the state where
|
||||
there's a single ".STR" line at the bottom of the file, split across two
|
||||
lines with a '+'.</p>
|
||||
<p>Select the line at $2026. This is currently formatted as a string,
|
||||
but that appears to be incorrect, so let's format it as individual bytes
|
||||
instead. There's an easy way to do that: use Actions > Toggle Single-Byte
|
||||
Format (or hit Ctrl+B).</p>
|
||||
<p>The data starting at $2025 appears to be 16-bit addresses that point
|
||||
into the table of strings, so let's format them appropriately.</p>
|
||||
<p>Double-click the operand column on line $2025 ("$30") to open
|
||||
the operand data format editor. Because you only have one byte selected,
|
||||
most of the options are disabled. This won't do what we want, so
|
||||
click "Cancel".</p>
|
||||
<p>Select the line at $2025, then shift-click the line at $202E. Right-click
|
||||
and select Edit Operand. If you selected the correct set of bytes,
|
||||
the top line in the dialog should now say, "10 bytes selected". Because
|
||||
10 is a multiple of two, the 16-bit formats are enabled. It's not a multiple
|
||||
of 3 or 4, so the 24-bit and 32-bit options are not enabled. Click the
|
||||
"16-bit words, little-endian" radio button, then over to the right, click
|
||||
the "Address" radio button. Click "OK".</p>
|
||||
<p>We just told SourceGen that those 10 bytes are actually five 16-bit numeric
|
||||
references. SourceGen determined that the addresses are contained in the
|
||||
file, and created labels for each of them. Labels only work if they're
|
||||
on their own line, so the long string was automatically split into five
|
||||
separate ".STR" statements.</p>
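<p>Reading a byte range as 16-bit little-endian addresses amounts to the
following Python sketch. The table bytes below are invented for the
example (only the first entry, $2030, is suggested by the tutorial text),
and the label format simply mimics the auto-label style seen earlier.</p>

```python
import struct
from typing import List


def words_le(data: bytes) -> List[int]:
    """Split a byte range into 16-bit little-endian values."""
    if len(data) % 2:
        raise ValueError("length must be a multiple of two")
    return [word for (word,) in struct.iter_unpack("<H", data)]


def auto_label(address: int) -> str:
    """Auto-generated label name in the style 'L2030'."""
    return f"L{address:04X}"


# Hypothetical 4-byte excerpt of an address table.
table = bytes([0x30, 0x20, 0x3D, 0x20])
for addr in words_le(table):
    print(f"${addr:04X} -> {auto_label(addr)}")
```

<p>A 10-byte selection yields five such values, which is why the 16-bit
formats are offered for it while the 24-bit and 32-bit formats are not.</p>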
|
||||
|
||||
<p>Use File > Save (or hit Ctrl+S) to save your work.</p>
|
||||
|
||||
|
||||
<h3>Generating Assembly Code</h3>
|
||||
|
||||
<p>You can generate assembly source code from the disassembled data.
|
||||
Select File > Assemble (or hit Ctrl+Shift+A) to open the source generation
|
||||
and assembly dialog.</p>
|
||||
<p>Pick your favorite assembler from the drop list at the top right,
then click "Generate". An assembly source file will be generated in the
directory where your project file lives, named after a combination of the
project name and the assembler name. A preview of the assembled code
appears in the top window. (It's a "preview" because it has line numbers
added and is cut off after a certain limit.)</p>
|
||||
|
||||
<p>If you have a cross-assembler installed and configured, you can run
|
||||
it by clicking "Run Assembler". The output from the assembler will appear
|
||||
in the lower window, along with an indication of whether the assembled
|
||||
file matches the original. (Barring bugs in SourceGen or the assembler,
|
||||
it should always match exactly.)</p>
|
||||
|
||||
<p>Click "Close" to close the window.</p>
|
||||
|
||||
|
||||
<h3>End of Part One</h3>
|
||||
|
||||
<p>At this point you know enough to work with a SourceGen project. Continue
|
||||
on to the next tutorial to learn more.</p>
|
||||
|
||||
<hr/>
|
||||
|
||||
|
||||
<h2><a name="advanced-features">Tutorial #2: Advanced Features</a></h2>
|
||||
|
||||
<p>This tutorial will walk you through some of the fancier things SourceGen
|
||||
can do. We assume you've already finished the Basic Features tutorial.</p>
|
||||
|
||||
|
||||
<p>Start a new project. Select "Generic 6502". For the data file, navigate
|
||||
to the Examples directory, then from the Tutorial directory
|
||||
select "Tutorial2".</p>
|
||||
<p>The first thing you'll notice is that we immediately ran into a BRK,
|
||||
which is a pretty reliable sign that we're not in a code section. The
|
||||
generic profile puts a code start point tag on the first byte, but that's
|
||||
wrong here. This particular file begins with <code>00 20</code>, which
|
||||
could be a load address (some C64 binaries look like this). So let's start
|
||||
with that assumption.</p>
|
||||
<p>Click on the first line of code at address $1000, and select
|
||||
Actions > Remove Analyzer Tags. This removes the tag that tells the
|
||||
code analyzer to start scanning for instructions at that point. (By
|
||||
default, a code start point is placed on the first byte of a new project.)
|
||||
Note the $20 is now part of a string directive. The
|
||||
string is making it hard to manipulate the next few bytes, so let's fix
|
||||
that by selecting Edit > Toggle Data Scan (Ctrl+D). This turns off
|
||||
the feature that automatically generates strings and .FILL directives,
|
||||
so now each uncategorized byte is on its own line.</p>
|
||||
<p>You could select the first two lines and use Actions > Edit Operand
|
||||
to format them as a 16-bit little-endian hex value, but there's a shortcut:
|
||||
select the first line, then Actions > Format As Word (Ctrl+W).
|
||||
It automatically grabbed the following byte and combined them. Since we
|
||||
believe $2000 is the load address for everything that follows, click on
|
||||
the line with address $1002, select Actions > Set Address, and
|
||||
enter "2000". With that line still selected, use
|
||||
Actions > Tag Address As Code Start Point (Ctrl+H then Ctrl+C) to
|
||||
tell the analyzer to start looking for code there.</p>
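<p>The load-address assumption above can be sketched in Python. This is only
an illustration of the arithmetic, assuming a two-byte little-endian header
followed by the program body; the helper names and the sample payload bytes
are hypothetical.</p>

```python
def split_load_address(data: bytes):
    """Return (load address, body) for a file with a 2-byte header."""
    if len(data) < 2:
        raise ValueError("file too short to hold a load address")
    load_addr = data[0] | (data[1] << 8)  # low byte first
    return load_addr, data[2:]


def address_of_offset(load_addr: int, file_offset: int, header_len: int = 2) -> int:
    """Map a file offset past the header to its run-time address."""
    if file_offset < header_len:
        raise ValueError("offset is inside the header")
    return (load_addr + file_offset - header_len) & 0xFFFF


# The file starts with 00 20; the bytes after the header are invented.
image = bytes([0x00, 0x20, 0xAD, 0x00, 0x30])
load_addr, body = split_load_address(image)
```

<p>With the header bytes <code>00 20</code>, the first byte after the header
lands at $2000, which is why the address is set on the third byte of the
file rather than the first.</p>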
|
||||
<p>That looks better, but the branch destination is off the bottom of the
|
||||
screen (unless you have a really tall screen or small fonts) because of
|
||||
all the intervening data. Use Edit > Toggle Data Scan to turn the
|
||||
string-finder back on. Now it's easier to read.</p>
|
||||
|
||||
<p>There are four strings starting at address $2004, each of which is
|
||||
followed by $00. These look like null-terminated strings, so let's make
|
||||
it official. But first, let's do it wrong. Click on the line with
|
||||
address $2004 to select it. Hold the shift key down, then double-click
|
||||
on the operand field of the line with address $2031 (i.e. double-click on
|
||||
the words "last string").</p>
|
||||
<p>The Edit Data Operand dialog opens, but the null-terminated strings
|
||||
option is not available. This is because we didn't include the null byte
|
||||
on the last string. To be recognized as one of the "special" string types,
|
||||
every selected string must match the expected pattern.</p>
|
||||
<p>Cancel out of the dialog. Hold the shift key down, and double-click
|
||||
on the operand on line $203c (<code>$00</code>).
|
||||
You should see "Null-terminated strings (4)" as an available
|
||||
option now (make sure the Character Encoding pop-up is set to
|
||||
"Low or High ASCII"). Click on that, then click "OK". The strings are now
|
||||
shown as .ZSTR operands.</p>
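<p>The rule that every selected string must match the pattern can be pictured
with a small Python check. This is a simplified sketch, not SourceGen's actual
logic; apart from "last string", the sample text is invented.</p>

```python
from typing import Optional


def count_null_terminated(data: bytes) -> Optional[int]:
    """Return the number of strings if the whole selection is a run of
    null-terminated strings, or None if the final terminator is missing."""
    if not data or data[-1] != 0:
        return None
    # Each string ends at a zero byte, so the zero bytes count the strings.
    return data.count(0)


good = b"one\x00two\x00three\x00last string\x00"
bad = good[:-1]  # same selection, but without the final null byte
```

<p>The first selection counts as four strings; the second is rejected, which
mirrors what happened when the trailing <code>$00</code> was left out.</p>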
|
||||
|
||||
<p>It's wise to save your work periodically. Use File > Save to create
|
||||
a project file for Tutorial2.</p>
|
||||
|
||||
<h4>Pointers and Parts</h4>
|
||||
|
||||
<p>Let's move on to the code at $203d. It starts by storing a couple of
|
||||
values into direct page address $02/03. This appears to be setting up a
|
||||
pointer to $2063, which is a data area inside the file. So let's make it
|
||||
official.</p>
|
||||
<p>Select the line at address $2063, and use Actions > Edit Label to
|
||||
give it the label "XDATA?". The question mark on the end is there to
|
||||
remind us that we're not entirely sure what this is. Now edit the
|
||||
operand on line $203d, and set it to the symbol "XDATA", with the part
|
||||
"low". The question mark isn't really part of the label, so you don't
|
||||
need to type it here. Edit the operand on line $2041,
|
||||
and set it to "XDATA" with the part "high". (The symbol text box
|
||||
gets focus immediately, so you can start typing the symbol name as soon
|
||||
as the dialog opens; you don't need to click around first.) If all
|
||||
went well, the operands should now read <code>LDA #<XDATA?</code>
|
||||
and <code>LDA #>XDATA?</code>.</p>
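<p>The "low" and "high" parts are simply the two bytes of the 16-bit address.
A minimal Python sketch of the arithmetic (the helper names are made up for
this example):</p>

```python
def low_byte(addr: int) -> int:
    """Low part of a 16-bit address, as selected by the '<' operator."""
    return addr & 0xFF


def high_byte(addr: int) -> int:
    """High part of a 16-bit address, as selected by the '>' operator."""
    return (addr >> 8) & 0xFF


XDATA = 0x2063
pointer = [low_byte(XDATA), high_byte(XDATA)]  # bytes stored at $02 and $03
```

<p>Storing $63 at $02 and $20 at $03 gives a little-endian pointer to $2063.</p>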
|
||||
<p>Let's give the pointer a name. Select line $203d, and use
|
||||
Actions > Create Local Variable Table to create an empty table.
|
||||
Click "New Symbol" on the right side. Leave the Address button selected.
|
||||
Set the Label field to "PTR1", the Value field to $02, and the width
|
||||
to 2 (it's a 2-byte pointer). Click "OK" to create the entry, and then
|
||||
"OK" to update the table.</p>
|
||||
<p>There's now a ".var" statement (similar to a .equ) above line $203d,
|
||||
and the stores to $02/$03 have changed to "PTR1" and "PTR1+1".</p>
|
||||
<p>Double-click on the JSR on line $2045 to jump to L20A7. This just
|
||||
loads a value from $3000 into the accumulator and returns, so not much
|
||||
to see here. Hit the back-arrow in the toolbar to jump back to the JSR.</p>
|
||||
<p>The next bit of code masks the accumulator so it holds a value between
|
||||
0 and 3, then doubles it and uses it as an index into PTR1. We know PTR1
|
||||
points to XDATA, which looks like it has some 16-bit addresses. The
|
||||
values loaded are stored in two more zero-page locations, $04-05.</p>
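<p>The mask-and-double indexing described above can be sketched in Python.
This is an illustration only: the function name is made up, and the two middle
table entries are hypothetical (the tutorial text only tells us the strings
start at $2004 and that the last one is at $2031).</p>

```python
def lookup_pointer(table: bytes, a: int) -> int:
    """Pick one of four 16-bit little-endian entries using the low two
    bits of the accumulator value."""
    index = (a & 0x03) * 2  # mask to 0..3, then double for 2-byte entries
    return table[index] | (table[index + 1] << 8)


# Four little-endian addresses; $2010 and $2020 are invented placeholders.
table = bytes([0x04, 0x20, 0x10, 0x20, 0x20, 0x20, 0x31, 0x20])
```

<p>Whatever value comes back from the subroutine, the mask guarantees that
the index stays within the eight bytes of the table.</p>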
|
||||
<p>Let's make these a pointer as well. Double-click the operand on
|
||||
line $204e ("$04"), and click "Create Local Variable". Set the Label
|
||||
to "PTR2" and the width to 2. Click "OK" to create the symbol, then
|
||||
"OK" to close the operand editor, which should still be set to Default --
|
||||
we didn't actually edit the operand, we just used the operand edit
|
||||
dialog as a convenient way to create a local variable table entry. All
|
||||
accesses to $04/$05 now use PTR2, and there's a new entry in the local
|
||||
variable table we created earlier.</p>
|
||||
|
||||
<p>The next bit of code copies bytes from PTR2 to $0400, stopping when it
|
||||
hits a zero byte. Looks like this is copying null-terminated strings.
|
||||
This confirms our idea that XDATA holds 16-bit addresses, so let's
|
||||
format it. Select lines $2063 to $2066, and Actions > Edit Operand.
|
||||
It should say "8 bytes selected" at the top. Select "16-bit words,
|
||||
little-endian", and then from the Display As box, select "Address".
|
||||
Click "OK". XDATA should now be four <code>.dd2</code> 16-bit addresses.
|
||||
If you scroll up, you'll see that the .ZSTR strings near the top now have
|
||||
labels that match the operands in XDATA.</p>
|
||||
<p>Now that we know what XDATA holds, let's rename it. Change the label
|
||||
to STRADDR. The symbol parts in the operands at $203d and $2041 update
|
||||
automatically.</p>
|
||||
|
||||
<p>Let's pause briefly to look at the cycle-count feature. Use
|
||||
Edit > Settings to open the app settings panel. In the "miscellaneous"
|
||||
group on the right side, click the "Show cycle counts in comments"
|
||||
checkbox, then click "OK". (There's also a toolbar button for this.)</p>
|
||||
<p>Every line with an instruction now has a cycle count on it. The cycle
counts are adjusted for everything SourceGen can figure out. For example,
the BEQ on line $205a shows "2+" cycles, meaning that it takes at least two
cycles but might take more. That's because conditional branches take an
extra cycle if the branch is taken. The BNE on line $2061 shows 3 cycles,
because we know that the branch is always taken and doesn't cross a page
boundary. (If you want to see why it's always taken,
look at the value of the 'Z' flag in the "flags" column, which indicates
the state of the flags before the instruction on that line is executed.
Lower-case 'z' means the zero-flag is clear, upper-case 'Z' means it's
set. The analyzer determined that the flag was clear for instructions
following the <code>BEQ</code> because the branch wasn't taken.)</p>
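<p>The branch timing rule can be written out as a short Python sketch. It
assumes the standard 6502 rule: two cycles, plus one if the branch is taken,
plus one more if the target is on a different page than the instruction that
follows the branch. The function name and the branch targets are hypothetical.</p>

```python
def branch_cycles(branch_addr: int, target: int, taken: bool) -> int:
    """Cycle count for a 6502 conditional branch instruction."""
    cycles = 2
    if taken:
        cycles += 1
        # The page comparison uses the address after the 2-byte branch.
        next_instr = (branch_addr + 2) & 0xFFFF
        if (next_instr & 0xFF00) != (target & 0xFF00):
            cycles += 1
    return cycles


always_taken = branch_cycles(0x2061, 0x2056, taken=True)   # target is invented
not_taken = branch_cycles(0x205A, 0x2063, taken=False)     # target is invented
page_cross = branch_cycles(0x20F0, 0x2105, taken=True)     # invented example
```

<p>When the analyzer can't tell whether a branch is taken, the best it can
report is the minimum with a "+" suffix.</p>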
|
||||
|
||||
<p>The cycle-count comments can be added to generated code as well. If
|
||||
you add an end-of-line comment, it appears after the cycle count.
|
||||
(Try it.)</p>
|
||||
<p>Hit Ctrl+S to save your project. Make that a habit.</p>
|
||||
|
||||
<h4>Inline Data</h4>
|
||||
|
||||
<p>Consider the code at address $206B. It's a JSR followed by some
|
||||
ASCII text, then a $00 byte, and then what might be code. Double-click
|
||||
on the JSR opcode to jump to $20AB to see the function. It pulls the
|
||||
call address off the stack, and uses it as a pointer. When it encounters
|
||||
a zero byte, it breaks out of the loop, pushes the adjusted pointer
|
||||
value back onto the stack, and returns.</p>
|
||||
<p>This is an example of "inline data", where a function uses the return
|
||||
address to get a pointer to data. The return address is adjusted to
|
||||
point past the inline data before returning (technically, it points at
|
||||
the very last byte of the inline data, because RTS jumps to address + 1).</p>
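<p>The return-address adjustment can be modeled in Python. This sketch
assumes the usual 6502 behavior, where JSR pushes the address of its own last
byte and RTS resumes at the pulled address plus one. The function name and
the sample string are invented; only the addresses come from the tutorial.</p>

```python
def inline_string_return(memory: bytearray, jsr_addr: int):
    """Model a print routine that reads a null-terminated string placed
    directly after the JSR that called it."""
    pushed = jsr_addr + 2      # JSR pushes the address of its last byte
    ptr = pushed + 1           # first byte of the inline data
    start = ptr
    while memory[ptr] != 0:
        ptr += 1
    text = bytes(memory[start:ptr])
    new_return = ptr           # adjusted value: address of the zero byte
    resume = new_return + 1    # RTS continues one byte past that
    return text, new_return, resume


memory = bytearray(0x10000)
memory[0x206B:0x206E] = bytes([0x20, 0xAB, 0x20])  # JSR $20AB
memory[0x206E:0x2078] = b"some text\x00"           # hypothetical inline string
text, new_return, resume = inline_string_return(memory, 0x206B)
```

<p>With ten bytes of inline data starting at $206E, the adjusted return
address is $2077 and execution resumes at $2078.</p>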
|
||||
<p>To format the data, we first need to tell SourceGen that there's data
|
||||
in line with the code. Select the line at address $206E, then
|
||||
shift-click the line at address $2077. Use
|
||||
Actions > Tag Bytes As Inline Data.</p>
|
||||
<p>The data turns to single-byte values, and we now see the code
|
||||
continuing at address $2078. We can format the data as a string by
|
||||
using Actions > Edit Operand, setting the Character Encoding to "Low or
|
||||
High ASCII", and choosing "null-terminated strings".</p>
|
||||
|
||||
<p>That's pretty straightforward, but this could quickly become tedious if
|
||||
there were a lot of these. SourceGen allows you to define scripts to
|
||||
automate common formatting tasks. This is covered in a later tutorial.</p>
|
||||
|
||||
<h4>Odds & Ends</h4>
|
||||
|
||||
<p>The rest of the code isn't really intended to do anything useful. It
|
||||
just exists to illustrate some odd situations.</p>
|
||||
<p>Look at the code starting at $2078. It ends with a BRK at $2081, which
|
||||
as noted earlier is a bad sign. If you look two lines above the BRK,
|
||||
you'll see that it's loading the accumulator with zero, then doing a BNE,
|
||||
which should never be taken (note the cycle count for the BNE is 2). The
|
||||
trick is in the two lines before that, which use self-modifying code to
|
||||
change the LDA immediate operand from $00 to $ff. The BNE is actually
|
||||
a branch-always.</p>
|
||||
<p>We can fix this by correcting the status flags. Select line $207F,
|
||||
and then Actions > Override Status Flags. This lets us specify what
|
||||
the flags should be before the instruction is executed. For each flag,
|
||||
we can override the default behavior and specify that the flag is
|
||||
clear (0), set (1), or indeterminate (could be 0 or 1). In this case,
|
||||
we know that the self-modified code will be loading a non-zero value, so
|
||||
in the "Z" column click on the button in the "Zero" row. Click "OK". The
|
||||
BNE is now an always-taken branch, and the code list rearranges itself
|
||||
appropriately (and the cycle count is now 3).</p>
|
||||
|
||||
<p>Continuing on, the code at $2086 touches a few consecutive locations. Edit
|
||||
the label on line $2081, setting it to "STUFF". Notice how the references
|
||||
to $2081 through $2084 have changed from auto-generated labels to
|
||||
references to STUFF. For some projects this may be undesirable. Use
|
||||
Edit > Project Properties, then in the Analysis Parameters box
|
||||
un-check "Seek nearby targets", and click "OK". You'll notice that the
|
||||
references to $2081 and later have switched back to auto labels. If
|
||||
you scroll up, you'll see that the references to PTR1+1 and PTR2+1 were
|
||||
not affected, because local variables use explicit widths rather
|
||||
than the "nearby" logic.</p>
|
||||
<p>The nearby-target behavior is generally desirable, because it lets you
|
||||
avoid explicitly labeling every part of a multi-byte data item. For now,
|
||||
use Edit > Undo to switch it back on.</p>
|
||||
|
||||
<p>The code at $2092 looks a bit strange. <code>LDX</code>, then a
|
||||
<code>BIT</code> with a weird symbol, then another <code>LDX</code>. If
|
||||
you look at the "bytes" column, you'll notice that the three-byte
|
||||
<code>BIT</code> instruction has only one byte on its line. The
|
||||
trick here is that the <code>LDX #$01</code> is embedded inside the
|
||||
<code>BIT</code> instruction. When the code runs through here, X is set
|
||||
to $00, then the <code>BIT</code> instruction sets some flags, then the
|
||||
<code>STA</code> runs. Several lines down there's a <code>BNE</code>
|
||||
to $2095, which is in the middle of the <code>BIT</code> instruction.
|
||||
It loads X with $01, then also continues to the <code>STA</code>.</p>
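<p>To see how one byte sequence can decode two ways, here is a tiny Python
disassembler that knows only three opcodes. It is a sketch for illustration:
the <code>STA</code> operand is hypothetical, and the byte values are what
the description above implies rather than a dump of the tutorial file.</p>

```python
# opcode -> (mnemonic, instruction length in bytes)
OPCODES = {0xA2: ("LDX #imm", 2), 0x2C: ("BIT abs", 3), 0x8D: ("STA abs", 3)}


def disassemble(code: bytes, base: int, start: int, count: int):
    """Decode `count` instructions beginning at address `start`."""
    out = []
    addr = start
    for _ in range(count):
        pos = addr - base
        name, length = OPCODES[code[pos]]
        out.append((addr, name, code[pos + 1:pos + length].hex()))
        addr += length
    return out


BASE = 0x2092
# LDX #$00 / BIT $01A2 / STA $3000 (the STA address is invented)
code = bytes([0xA2, 0x00, 0x2C, 0xA2, 0x01, 0x8D, 0x00, 0x30])
fall_through = disassemble(code, BASE, 0x2092, 3)
branch_entry = disassemble(code, BASE, 0x2095, 2)
```

<p>Entering at $2092 treats <code>A2 01</code> as the operand of the
<code>BIT</code>; entering at $2095 treats the same two bytes as
<code>LDX #$01</code>. Both paths reach the <code>STA</code> at $2097.</p>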
|
||||
<p>Embedded instructions are unusual but not unheard-of. (This trick is
|
||||
used extensively in Microsoft BASICs, such as Applesoft.) When you see the
|
||||
extra symbol in the opcode field, you need to look closely at what's going
|
||||
on.</p>
|
||||
|
||||
<hr/>
|
||||
|
||||
|
||||
<h2><a name="address-tables">Tutorial #3: Address Table Formatting</a></h2>
|
||||
|
||||
<p><i>This tutorial covers one specific feature.</i></p>
|
||||
|
||||
<p>Start a new project. Select the Apple //e platform, click Select File
|
||||
and navigate to the Examples directory. In A2-Amper-fdraw, select
|
||||
<code>AMPERFDRAW#061d60</code> (ignore the existing .dis65 file). Click
|
||||
"OK" to create the project.</p>
|
||||
<p>Not a lot to see here -- just half a dozen lines of loads and stores.
|
||||
This particular program interfaces with Applesoft BASIC, so we can make it
|
||||
a bit more meaningful by loading an additional platform
|
||||
symbol file. Select Edit > Project Properties, then the Symbol Files
|
||||
tab. Click Add Symbol Files from Runtime. The file browser starts in
|
||||
the RuntimeData directory. Open the "Apple" folder, then select
|
||||
<code>Applesoft.sym65</code>, and click "Open". Click "OK" to close
|
||||
the project properties window.</p>
|
||||
<p>The <code>STA</code> instructions now reference <code>BAS_AMPERV</code>,
|
||||
which is noted as a code vector. We can see the code setting up a jump
|
||||
(opcode $4c) to $1d70. As it happens, the start address of the code
|
||||
is $1d60 -- the last four digits of the filename -- so let's make that
|
||||
change. Double-click the initial .ORG statement, and change it from
|
||||
$2000 to $1d60. We can now see that $1d70 starts right after this
|
||||
initial chunk of code.</p>
|
||||
|
||||
<p>Select the line with address $1d70, then
|
||||
Actions > Tag Address As Code Start Point.
|
||||
More code appears, but not much -- if you scroll down you'll see that most
|
||||
of the file is still data. The code at $1d70 searches through a table at
|
||||
$1d88 for a match with the contents of the accumulator. If it finds a match,
|
||||
it loads bytes from tables at $1da6 and $1d97, pushes them on the stack,
|
||||
and then JMPs away. This code is pushing a return address onto the stack.
|
||||
When the code at <code>BAS_CHRGET</code> returns, it'll return to that
|
||||
address. Because of a quirk of the 6502 architecture, the address pushed
|
||||
must be the desired address minus one.</p>
|
||||
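<p>The first byte in the first address table at $1d97 (which has the auto-label
L1D97) is $b4; this table holds the low bytes. The first byte in the second
table, at $1da6, is $1d; this table holds the high bytes. Together they form
the pushed address $1db4, so the first address we want is $1db4 + 1 = $1db5.</p>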
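<p>The "minus one" adjustment works in both directions. A minimal Python
sketch, with made-up helper names:</p>

```python
def rts_target(low: int, high: int) -> int:
    """Address where execution resumes when RTS pulls these two bytes."""
    return (((high << 8) | low) + 1) & 0xFFFF


def bytes_to_push(target: int):
    """(low, high) bytes to push so that RTS lands on `target`."""
    adjusted = (target - 1) & 0xFFFF
    return adjusted & 0xFF, adjusted >> 8


first_entry = rts_target(0xB4, 0x1D)
```

<p>This is why a table entry for a routine at $1db5 holds $b4 and $1d.</p>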
|
||||
<p>Select the line at $1db5, and use
|
||||
Actions > Tag Address As Code Start Point.
|
||||
More code appears, but again it's only a few lines. Let's dress this one
|
||||
up a bit. Set a label on the code at $1db5 called "FUNC". At $1d97, edit
|
||||
the data item (double-click on "$b4"), click "Single bytes", then type "FUNC"
|
||||
(note the text field gets focus immediately, and the radio button
|
||||
automatically switches to "symbolic reference" when you start typing).
|
||||
Click "OK". The operand at $1d97 should now say <code><FUNC-1</code>.
|
||||
Repeat the process at $1da6, this time clicking the "High" part radio button
|
||||
below the symbol entry text box,
|
||||
to make the operand there say <code>>FUNC</code>. (If it says
|
||||
<code><FUNC-152</code>, you forgot to select the High part.)</p>
|
||||
|
||||
<p>We've now changed the first entry in the table to symbolic references.
|
||||
You could repeat these steps for the remaining items, but there's a faster
|
||||
way. Click on the line at address $1d97, then shift-click the line at
|
||||
address $1da9 (which should be <code>.FILL 12,$1e</code>). Select
|
||||
Actions > Format Address Table.</p>
|
||||
<p>Contrary to first impressions, this imposing dialog does not allow you
|
||||
to launch objects into orbit. There are a variety of common ways to
|
||||
structure an address table, all of which are handled here. You can
|
||||
configure the various parameters and see the effects as you make
|
||||
each change.</p>
|
||||
<p>The message at the top should indicate that there are 30 bytes
|
||||
selected. In Address Characteristics, click the "Parts are split across
|
||||
sub-tables" checkbox and the "adjusted for RTS/RTL"
|
||||
checkbox. As soon as you do, the first line of the Generated Addresses
|
||||
list should show the symbol "FUNC". The rest of the addresses will look like
|
||||
<code>(+) T1DD0</code>. The "(+)" means that a label was not found at
|
||||
that location, so a label will be generated automatically.</p>
|
||||
<p>Down near the bottom, check the "tag targets as code start points" checkbox.
|
||||
Because we saw the table contents being pushed onto the stack for RTS,
|
||||
we know that they're all code entry points.</p>
|
||||
<p>Click "OK". The table of address bytes at $1d97 should now all be
|
||||
references to symbols -- 15 low parts followed by 15 high parts. If you
|
||||
scroll down, you should see nothing but instructions until you get to the
|
||||
last dozen bytes at the end of the file. (If this isn't the case, use
|
||||
Edit > Undo, then work through the steps again.)</p>
|
||||
<p>The formatter did the same steps you went through earlier -- set a
|
||||
label, apply the label to the low and high bytes in the table, add a
|
||||
code start point tag -- but did several of them at once.</p>
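<p>The work done by the formatter for this table layout can be approximated
in Python. This is a sketch of the idea only, not SourceGen's implementation:
it assumes all low bytes come first, followed by all high bytes, with each
address adjusted by one for RTS. The function name is made up, and every
sample entry after the first is invented.</p>

```python
def split_table_targets(table: bytes, adjust_for_rts: bool = True):
    """Recover code addresses from a table stored as N low bytes
    followed by N high bytes."""
    if len(table) % 2 != 0:
        raise ValueError("split table must have an even length")
    half = len(table) // 2
    lows, highs = table[:half], table[half:]
    adjust = 1 if adjust_for_rts else 0
    return [(((hi << 8) | lo) + adjust) & 0xFFFF for lo, hi in zip(lows, highs)]


# Three-entry example: low bytes first, then high bytes.
table = bytes([0xB4, 0xCF, 0x0F, 0x1D, 0x1D, 0x1E])
targets = split_table_targets(table)
```

<p>Each recovered address is a place that needs a label and a code start
point tag, which is what the dialog applies in bulk.</p>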
|
||||
|
||||
<p>We don't want to save this project, so select File > Close. When
|
||||
SourceGen asks for confirmation, click Discard & Continue.</p>
|
||||
|
||||
<hr/>
|
||||
|
||||
|
||||
<h2><a name="extension-scripts">Tutorial #4: Extension Scripts</a></h2>
|
||||
|
||||
<p><i>This tutorial covers one specific feature.</i></p>
|
||||
|
||||
<p>Some repetitive formatting tasks can be handled with automatic scripts.
|
||||
This is especially useful for inline data, which can confuse the code
|
||||
analyzer.</p>
|
||||
<p>An earlier tutorial demonstrated how to manually mark bytes as
|
||||
inline data. We're going to do it a faster way. For this tutorial,
|
||||
start a new project with "Generic 6502", and in the SourceGen
|
||||
Examples/Tutorial directory select "Tutorial4".</p>
|
||||
<p>We'll need to load scripts from the project directory, so we have to
|
||||
save the project. File > Save, use the default name ("Tutorial4.dis65").</p>
|
||||
|
||||
<p>Take a look at the disassembly listing. The file starts with a JSR
|
||||
followed by a string that begins with a small number. This appears to be
|
||||
a string with a leading length byte. We want to load a script that
|
||||
can handle that, so use Edit > Project Properties, select the
|
||||
Extension Scripts tab, and click "Add Scripts from Project". The file
|
||||
browser opens in the project directory. Select the file
|
||||
"InlineL1String.cs", click "Open", then "OK".</p>
|
||||
<p>Nothing happened. If you look at the script with an editor (and you
|
||||
know some C#), you'll see that it's looking for a JSR to a function called
|
||||
"PrintInlineL1String". So let's give it one.</p>
|
||||
<p>Double-click the JSR operand ("L1026"), click "Create Label", and
enter "PrintInlineL1String". Remember that labels are case-sensitive;
you must enter it exactly as shown. Hit "OK" to accept the label, and "OK"
to close the operand editor. If all went well, address $1003 should now be
an L1 string "How long?", and address $100D should be another JSR.</p>
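<p>An "L1" string is a string preceded by a one-byte length. The following
Python sketch shows how such a string might be measured; it is not the
extension script itself (those are written in C#), the function name is made
up, and plain ASCII is assumed.</p>

```python
def parse_l1_string(data: bytes, offset: int):
    """Read a string with a 1-byte length prefix; return the text and the
    offset of the first byte after it."""
    length = data[offset]
    end = offset + 1 + length
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[offset + 1:end].decode("ascii"), end


# JSR $1026, then the length byte and the text shown in the listing.
data = bytes([0x20, 0x26, 0x10, 0x09]) + b"How long?"
text, next_offset = parse_l1_string(data, 3)
```

<p>The string occupies ten bytes starting at offset 3, so the next
instruction begins at offset $0D, which corresponds to address $100D.</p>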
|
||||
|
||||
<p>The next JSR appears to be followed by a null-terminated string, so
|
||||
we'll need something that handles that. Go back into Project Properties
|
||||
and add the script "InlineNullTermString.cs".</p>
|
||||
<p>This script is slightly different, in that it handles any JSR to a label
|
||||
that starts with "PrintInlineNullString". So let's give it a couple of
|
||||
those.</p>
|
||||
<p>Double-click the operand on line $100D ("L1027"), click Create Label,
|
||||
and set the label to "PrintInlineNullStringOne". Hit "OK" twice. That
|
||||
formatted the first one and got us to the next JSR. Repeat the process
|
||||
on line $1019 ("L1028"), setting the label to "PrintInlineNullStringTwo".</p>
|
||||
|
||||
<p>The entire project is now nicely formatted. In a real project the
|
||||
"Print Inline" locations would be actual print functions, not just RTS
|
||||
instructions. There would likely be multiple JSRs to the print function,
|
||||
so labeling a single function entry point could format dozens of inline
|
||||
strings and clean up the disassembly automatically. The reason for
|
||||
allowing wildcard names is that some functions may have multiple
|
||||
entry points or chain through different locations.</p>
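<p>The difference between the two matching rules can be shown in a few lines
of Python. This only mirrors the behavior described above; the real scripts
are C# and use SourceGen's plugin interfaces, which are not shown here. The
function names are hypothetical.</p>

```python
def is_l1_print_call(label: str) -> bool:
    """Exact, case-sensitive match on a single label."""
    return label == "PrintInlineL1String"


def is_null_print_call(label: str) -> bool:
    """Prefix match, so several entry points can share one handler."""
    return label.startswith("PrintInlineNullString")


labels = ["PrintInlineNullStringOne", "PrintInlineNullStringTwo", "printinlinenullstring"]
matched = [name for name in labels if is_null_print_call(name)]
```

<p>The lower-case label is not matched, which is why the labels must be
entered exactly as shown.</p>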
|
||||
|
||||
<p>Extension scripts can make your life much easier, but they do require
|
||||
some programming experience. See the
|
||||
<a href="advanced.html#extension-scripts">manual</a> for more details.</p>
|
||||
|
||||
<hr/>
|
||||
|
||||
|
||||
<h2><a name="visualizations">Tutorial #5: Visualizations</a></h2>
|
||||
|
||||
<p><i>This tutorial covers one specific feature.</i></p>
|
||||
|
||||
<p>Many programs contain a significant amount of graphical data. This is
|
||||
especially true for games, where the space used for bitmaps is often
|
||||
larger than the space required for the code. When disassembling a program
|
||||
it can be very helpful to be able to see the contents of the data
|
||||
regions in graphical form.</p>
|
||||
|
||||
<p>Start a new project with "Generic 6502", and in the SourceGen Tutorial
|
||||
directory select "Tutorial5". We'll need to load an extension script from
|
||||
the project directory, so immediately save the project, using the
|
||||
default name ("Tutorial5.dis65").</p>
|
||||
<p>Normally a project will give you some sort of hint as to the data
format, e.g. the graphics might be a platform-specific sprite. For
non-standard formats you can glean dimensions from the drawing code. For
the purposes of this tutorial we're just using a simple monochrome bitmap
format, with 8 pixels per byte, and we happen to know that our images are for
a Tic-Tac-Toe game. The 'X' and the 'O' are 8x8, the game board is 40x40.
The bitmaps are sprites with transparency, so pixels are either solid
or transparent.</p>
|
||||
<p>The first thing we need to do is load an extension script that can
|
||||
decode this format. The RuntimeData directory has a few, but for this
|
||||
tutorial we're using a custom one. Select Edit > Project Properties,
|
||||
select the Extension Scripts tab, and click "Add Scripts from Project".
|
||||
Double-click on "VisTutorial5.cs", then click "OK".</p>
|
||||
|
||||
<p>The addresses of the three bitmaps are helpfully identified by the
load instructions at the top of the file. Select the line at
address $100A, then Actions > Create/Edit Visualization Set. In
the window that opens, click "New Visualization".</p>
|
||||
<p>We're going to ignore most of what's going on and just focus on the
|
||||
list of parameters at the bottom. The file offset indicates where in
|
||||
the file the bitmap starts; note this is an offset, not an address
|
||||
(that way, if you change the address, your visualizations don't break).
|
||||
This is followed by the bitmap's width in bytes, and the bitmap's height.
|
||||
Because we have 8 pixels per byte, we're currently showing an 8x1 image.
|
||||
We'll come back to row stride.</p>
|
||||
<p>We happen to know (by playing the game and/or reading the fictitious
|
||||
drawing code) that the image is 8x8, so change the value in the height
|
||||
field to 8. As soon as you do, the preview window shows a big blue 'X'.
|
||||
(The 'X' is 7x7; the last row/column of pixels are transparent so adjacent
|
||||
images don't bump into each other.)</p>
|
||||
<p>Let's try doing it wrong. Add a '0' in the Height field to make the
|
||||
height 80. You can see
|
||||
some additional bitmap data. Add another 0 to make it 800. Now you get
|
||||
a big red X, and the "Height" parameter is shown in red. That's because
|
||||
the maximum value for the height is 512, as shown by "[1,512]" on the
|
||||
right.</p>
|
||||
<p>Change it back to 8, and hit "OK". Hit "OK" in the Edit Visualization
|
||||
Set window as well. You should now see the blue 'X' in the code listing
|
||||
above line $100A.</p>
|
||||
|
||||
<p>Repeat the process at line $1012: select the line, create a visualization
|
||||
set, create a new visualization. The height will default to 8 because
|
||||
that's what you used last time. Click "OK" in both dialogs to close them.</p>
|
||||
|
||||
<p>Repeat the process at line $101A, but this time the image is 40x40
|
||||
rather than 8x8. Set the width to 5, and the height to 40. This makes
|
||||
a mess.</p>
|
||||
<p>In this case, the bitmap data is 5 bytes wide, but the data is stored
|
||||
as 8 bytes per row. This is known as the "stride" or "pitch" of the row.
|
||||
To tell the visualizer to skip the last 3 bytes on each row, set the
|
||||
"Row stride (bytes)" field to 8. Now we have a proper Tic-Tac-Toe grid.
|
||||
Note that it fills the preview window just as the 'X' and 'O' did, even
|
||||
though it's 5x as large. The preview window scales everything up. Hit
|
||||
"OK" twice to create the visualization.</p>
|
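<p>The width, height, and row stride parameters described above can be
illustrated with a small Python decoder. This is a sketch, not the tutorial's
visualizer script: it assumes the leftmost pixel is the most significant bit,
and it treats an omitted stride as equal to the width. The function name and
sample bytes are made up.</p>

```python
def decode_bitmap(data: bytes, offset: int, width_bytes: int, height: int, stride=None):
    """Render a 1-bit-per-pixel bitmap as rows of '#' (solid) and
    '.' (transparent) characters."""
    if stride is None:
        stride = width_bytes
    rows = []
    for row in range(height):
        start = offset + row * stride
        row_bytes = data[start:start + width_bytes]
        if len(row_bytes) < width_bytes:
            raise ValueError("bitmap runs past the end of the data")
        bits = "".join(f"{b:08b}" for b in row_bytes)
        rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


# Two rows, one byte wide, stored two bytes apart; $FF is padding.
data = bytes([0x81, 0xFF, 0x42, 0xFF])
correct = decode_bitmap(data, 0, width_bytes=1, height=2, stride=2)
wrong = decode_bitmap(data, 0, width_bytes=1, height=2)
```

<p>With the right stride the padding bytes are skipped; without it, padding
is drawn as if it were the next row, which is the kind of mess seen before
the stride was set.</p>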
||||
<p>Let's format the bitmap data. Select line $101A, then shift-click the
|
||||
last line in the file ($1159). Actions > Edit Operand. Select
|
||||
"densely-packed bytes", and click "OK". This is perhaps a little too
|
||||
dense. Open the operand editor again, but this time select the
|
||||
densely-packed bytes sub-option "...with a limit", and set the limit
|
||||
to 8 bytes per line. Instead of one very dense statement spread across
|
||||
a few lines, you get one line of code per row of bitmap. If you prefer
|
||||
to see individual bytes, you can use Edit > Settings, select the
|
||||
Display Format tab, and check "use comma-separated format for bulk data".
|
||||
This can make it a bit easier to read.</p>
|
||||
|
||||
<h4>Bitmap Animations</h4>
|
||||
|
||||
<p>Some bitmaps represent individual frames in an animated sequence.
|
||||
You can convert those as well. Double-click on the blue 'X' to open
|
||||
the visualization set editor, then click "New Bitmap Animation". This
|
||||
opens the Bitmap Animation Editor.</p>
|
||||
<p>Let's try it with our Tic-Tac-Toe board pieces. From the list on the
|
||||
left, select the blue 'X' and click "Add", then click the 'O' and click
|
||||
"Add". Below the list, set the frame delay to 500 msec. Near the bottom,
|
||||
click "Start / Stop". This causes the animation to play in a loop. You
|
||||
can use the controls to add and remove items, change their order, and change
|
||||
the animation speed. You can add the grid to the animation set, but the
|
||||
preview scales the bitmaps up to full size, so it may not look the way
|
||||
you expect.</p>
|
||||
<p>Hit "OK" to save the animation, then "OK" to update the visualization set.
|
||||
The code list now shows two entries in the line: the first is the 'X'
|
||||
bitmap, the second is the animation, which is shown as the initial frame
|
||||
with a blue triangle superimposed. (If you go back into the editor and
|
||||
reverse the order of the frames, the list will show the 'O' instead.)
|
||||
You can have as many bitmaps and animations on a line as you want.</p>
|
||||
<p>If you have a lot of bitmaps it can be helpful to give them meaningful
|
||||
names, so that they're easy to identify and sort together in the list.
|
||||
The "tag" field at the top of the editor windows lets you give things
|
||||
names. Tags must be unique.</p>
|
||||
|
||||
<h4>Other Notes</h4>
|
||||
|
||||
<p>The visualization editor is intended to be very dynamic, showing the
|
||||
results of parameter changes immediately. This can be helpful if you're
|
||||
not exactly sure what the size or format of a bitmap is. Just keep
|
||||
tweaking values until it looks right.</p>
|
||||
|
||||
<p>Visualization generators are defined by extension scripts. If you're
|
||||
disassembling a program with a totally custom way of storing graphics,
|
||||
you can write a totally custom visualizer and distribute it with the
|
||||
project. Because the file offset is a parameter, you're not limited to
|
||||
placing visualizations at the start of the graphic data -- you can put
|
||||
them on any code or data line.</p>
|
||||
|
||||
<p>Visualizations have no effect on assembly source code generation,
|
||||
but they do appear in code exported to HTML. Bitmaps are converted to GIF
|
||||
images, and animations become animated GIFs.</p>
|
||||
|
||||
<p>You can also create animated visualizations of wireframe objects
|
||||
(vector graphics, 3D shapes), but that's not covered in this tutorial.</p>
|
||||
|
||||
<hr/>
|
||||
|
||||
|
||||
<h2>End of Tutorials</h2>
|
||||
|
||||
<p>That's it for the tutorials. Significantly more detail on
|
||||
all aspects of SourceGen can be found in the manual.</p>
|
||||
<p>While you can do some fancy things, nothing you do will alter the
|
||||
data file. The assembled output will always match the original. So
|
||||
don't be afraid to play around.</p>
|
||||
<p>If you want to work on something large over a long period, save your
|
||||
progress by putting the .dis65 project into a source code control system
|
||||
like git. Project files are stored in a text format that, while not meant
|
||||
to be human-readable, should yield reasonable diffs.</p>
|
||||
|
||||
<a href="https://6502bench.com/sgtutorial/">https://6502bench.com/sgtutorial/</a>.</p>
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
|