mirror of https://github.com/fadden/6502bench.git synced 2024-06-13 14:29:30 +00:00

Revise documentation

Andy McFadden 2018-10-03 18:03:04 -07:00
parent a8f26a048b
commit 8aba1c4fba
11 changed files with 392 additions and 203 deletions


@ -28,8 +28,8 @@ as project symbols into the other projects.</p>
symbol-import step in every interested project. This step must be
repeated whenever the labels are updated.</p>
<p>A different but related problem is typified by arcade ROM sets,
where files are split apart because each file must be flashed to a
separate chip. All files are expected to be present in memory at
where files are split apart because each file must be burned into a
separate PROM. All files are expected to be present in memory at
once, so there's no reason to treat them as separate projects. Currently,
the best way to deal with this is to concatenate the files into a single
file, and operate on that.</p>
@ -60,7 +60,7 @@ L1103_0 LDA #$22
<p>Both sections start at $1100, and have branches to $1103. The branch
in the first section resolves to the label in the first version of
that address chunk, while the branch in the second section resolves to
the label in the second chunk. When branches are outside the current
the label in the second chunk. When branches originate outside the current
address chunk, the first chunk that includes that address is used, as
it is with the <code>JMP $1000</code> at the start of the file.</p>
@ -96,7 +96,7 @@ not help you debug 6502 projects.</p>
multi-line comment (long comment, note). Useful for confirming that
the width limitation is being obeyed. These are added exactly
as shown, without comment delimiters, into generated assembly output,
which doesn't work out well.</li>
which doesn't work out well if you run the assembler.</li>
<li>Use Keep-Alive Hack. If set, a "ping" is sent to the extension
script sandbox every 60 seconds. This seems to be required to avoid
an infrequently-encountered Windows bug. (See code for notes and


@ -43,7 +43,7 @@ method in <code>DisasmProject.cs</code>):</p>
The Anattrib array tracks most of the state from here on. If we're
doing a partial re-analysis, this step will just clone a copy of the
Anattrib array that was made at this point in a previous run. (This
step is described in more detail <a href="code-analysis">below</a>.)</li>
step is described in more detail below.)</li>
<li>Apply user-specified labels to Anattribs.</li>
<li>Apply user-specified format descriptors. These are the instruction
and data operand formats.</li>
@ -51,14 +51,14 @@ method in <code>DisasmProject.cs</code>):</p>
data, and connects instruction and data operands to target offsets.
The "nearby label" stuff is handled here. All of the results are
stored in the Anattribs array. (This step is described in more
detail <a href="data-analysis">below</a>.)</li>
detail below.)</li>
<li>Remove hidden labels from the symbol table. These are user-specified
labels that have been placed on offsets that are in the middle of an
instruction or multi-byte data item. They can't be referenced, so we
want to pull them out of the symbol table. (Remember, symbolic
operands use "weak references", so a missing symbol just means the
operand is shown as a hex value.)</li>
<li>Resolve references to platform and project external symbols>
<li>Resolve references to platform and project external symbols.
This sets the operand symbol in Anattrib, and adds the symbol to
the list that is displayed in .EQ directives.</li>
<li>Generate cross-reference lists. This is done for file data and
@ -71,6 +71,103 @@ by walking through the annotated file data. Most of the actual strings
aren't rendered until they're needed.</p>
<h3><a name="auto-format">Automatic Formatting</a></h3>
<p>Every offset in the file is marked as an instruction byte, data byte, or
inline data byte. Some offsets are also marked as the start of an instruction
or data area. The start offsets may have a format descriptor associated
with them.</p>
<p>Format descriptors have a format (like "numeric" or "string"), a
sub-format (like "hexadecimal" or "null-terminated"), and a length. For
an instruction operand the length is redundant, but for a data operand it
determines the width of the numeric value or length of the string. For
this reason, instructions do not need a format descriptor, but all
data items do.</p>
<p>Symbolic references are format descriptors with a symbol attached.
The symbol reference also specifies low/high/bank.</p>
<p>Every offset marked as a start point gets its own line in the on-screen
display list. Embedded instructions are identified internally by
looking for instruction-start offsets inside instructions.</p>
<p>The Anattrib array holds the post-analysis state for every offset,
including comments and formatting, but any changes you make in the
editors are applied to the data structures that are saved in the project
file. After a change is made, a full or partial re-analysis is done to
fill out the Anattribs.</p>
<p>Consider a simple example:</p>
<pre>
.ORG $1000
JMP L1003
L1003 NOP
</pre>
<p>We haven't formatted anything yet. The data analyzer sees that the
JMP operand is inside the file, and has no label, so it creates an
auto-label at offset +000003 and a format descriptor with a symbolic
operand reference to "L1003" at +000000.</p>
<p>Now we edit the label, changing L1003 to "FOO". This goes into the
project's "user label" list. The analyzer is
run, and applies the new "user label" to the Anattrib array. The
data analyzer finds the numeric reference in the JMP operand, and finds
a label at the target address, so it creates a symbolic operand reference
to "FOO". When the display list is generated, the symbol "FOO" appears
in both places.</p>
<p>Even though the JMP operand changed from "L1003" to "FOO", the only
change actually written to the project file is the label edit. The
contents of the Anattrib array are disposable, so it can be used to
add labels and "fix up" numeric references. Generated labels and
format descriptors are never added to the project file.</p>
<p>If the JMP operand were edited, a format descriptor would be added
to the user-specified descriptor list. During the analysis pass it would
be added to the Anattrib array at offset +000000.</p>
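<p>The label example above can be sketched in a few lines of Python. This is
only an illustration, not SourceGen's code: the function names, the
dictionary standing in for the project's "user label" list, and the
"L" plus four hex digits naming scheme (taken from the example) are all
assumptions.</p>

```python
def auto_label(addr):
    """Generate a default label such as L1003 (scheme assumed from the example)."""
    return "L%04X" % addr


def resolve_operand_symbol(target_offset, target_addr, user_labels):
    """Return the symbol an operand should show for a target inside the file.

    user_labels maps file offset -> label and stands in for the list saved
    in the project file. Nothing computed here is written back to it.
    """
    return user_labels.get(target_offset, auto_label(target_addr))


user_labels = {}                      # nothing has been formatted yet
before = resolve_operand_symbol(3, 0x1003, user_labels)

user_labels[3] = "FOO"                # the only persistent edit
after = resolve_operand_symbol(3, 0x1003, user_labels)

print(before, after)
```

<p>Because the operand symbol is recomputed on every pass, renaming the
label only requires changing one entry in the saved data.</p>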
<h3><a name="undo-redo">Interaction With Undo/Redo</a></h3>
<p>The analysis pass always considers the current state of the user
data structures. Whether you're adding a label or removing one, the
code runs through the same set of steps. The advantage of this approach
is that the act of doing a thing, undoing a thing, and redoing a thing
are all handled the same way.</p>
<p>None of the editors modify the project data structures directly. All
changes are added to a change set, which is processed by a single function.
The change sets are kept in the undo/redo buffer indefinitely. After
the changes are made, the Anattrib array and other data structures are
regenerated.</p>
<p>Data format editing can create some tricky situations. For example,
suppose you have 8 bytes that have been formatted as two 32-bit words:
<pre>
1000: 68690074 .dd4 $74006968
1004: 65737400 .dd4 $00747365
</pre>
You realize these are null-terminated strings, select both words, and
reformat them:
<pre>
1000: 686900 .zstr "hi"
1003: 74657374+ .zstr "test"
</pre>
Seems simple enough. Under the hood, SourceGen created three changes:
<ol>
<li>At offset +000000, replace the current format descriptor (4-byte
numeric) with a 3-byte null-terminated string descriptor.</li>
<li>At offset +000003, add a new 5-byte null-terminated string
descriptor.</li>
<li>At offset +000004, remove the 4-byte numeric descriptor.</li>
</ol>
<p>Each entry in the change set has "before" and "after" states for the
format descriptor at a specific offset. Only the state for the affected
offsets is included -- the program doesn't take a complete state snapshot
(even with the RAM on a modern system that would add up quickly). When
undoing a change, before and after are simply reversed.</p>
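<p>A minimal sketch of that before/after scheme, using the three changes
from the string example. The <code>FormatChange</code> class, the
descriptor strings, and the helper names are hypothetical. They are not
taken from the SourceGen sources.</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FormatChange:
    offset: int
    before: Optional[str]   # descriptor before the edit, None if there was none
    after: Optional[str]    # descriptor after the edit, None if it was removed


def apply_changes(descriptors: Dict[int, str], change_set: List[FormatChange]) -> None:
    """Apply every change in order, mutating the descriptor table in place."""
    for change in change_set:
        if change.after is None:
            descriptors.pop(change.offset, None)
        else:
            descriptors[change.offset] = change.after


def invert(change_set: List[FormatChange]) -> List[FormatChange]:
    """Build the undo set: swap before/after and reverse the order."""
    return [FormatChange(c.offset, c.after, c.before) for c in reversed(change_set)]


descriptors = {0: "numeric/4", 4: "numeric/4"}
change_set = [
    FormatChange(0, "numeric/4", "zstr/3"),
    FormatChange(3, None, "zstr/5"),
    FormatChange(4, "numeric/4", None),
]

apply_changes(descriptors, change_set)           # do
formatted = dict(descriptors)
apply_changes(descriptors, invert(change_set))   # undo
restored = dict(descriptors)
```

<p>Redo is simply the original change set applied again, which is why
do, undo, and redo can share one code path.</p>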
<h2><a name="code-analysis">Code Analysis</a></h2>
<p>The code tracer walks through the instructions, examining them to
@ -81,8 +178,9 @@ for every instruction:</p>
Examples: <code>LDA</code>, <code>STA</code>, <code>AND</code>,
<code>NOP</code>.
<li>Don't continue. The next instruction to be executed can't be
determined from the file data, unless you're disassembling the
system ROM. Examples: <code>RTS</code>, <code>BRK</code>.
determined from the file data (unless you're disassembling the
system ROM around the BRK vector).
Examples: <code>RTS</code>, <code>BRK</code>.
<li>Branch always. The operand specifies the next instruction address.
Examples: <code>JMP</code>, <code>BRA</code>, <code>BRL</code>.
<li>Branch sometimes. Execution may continue at the operand address,
@ -96,8 +194,8 @@ for every instruction:</p>
</ol>
<p>Branch targets are added to a list. When the current run of instructions
is exhausted (i.e. a "don't continue" instruction is reached), the next
target is pulled off of the list.</p>
is exhausted (i.e. a "don't continue" or "branch always" instruction is
reached), the next target is pulled off of the list.</p>
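<p>The work-list approach can be sketched as follows. This is a toy model
under stated assumptions: the four-entry opcode table, the function name,
and the flow-kind strings are made up for illustration, and the real
tracer also tracks status flags and handles many more cases.</p>

```python
from collections import deque

# Hypothetical four-entry opcode table: byte -> (length, flow kind).
OPCODES = {
    0xEA: (1, "continue"),          # NOP
    0x60: (1, "stop"),              # RTS
    0x4C: (3, "branch_always"),     # JMP abs
    0x90: (2, "branch_sometimes"),  # BCC rel
}


def trace(data, org, entry_offsets):
    """Return the sorted file offsets of every instruction start reached."""
    pending = deque(entry_offsets)   # branch targets waiting to be traced
    starts = set()
    while pending:
        off = pending.popleft()
        # Follow one run of instructions until it ends or rejoins known code.
        while 0 <= off < len(data) and off not in starts:
            entry = OPCODES.get(data[off])
            if entry is None or off + entry[0] > len(data):
                break
            length, flow = entry
            starts.add(off)
            if flow == "branch_always":
                target = data[off + 1] | (data[off + 2] << 8)
                pending.append(target - org)
                break
            if flow == "stop":
                break
            if flow == "branch_sometimes":
                disp = data[off + 1]
                if disp >= 0x80:          # sign-extend the 8-bit displacement
                    disp -= 0x100
                pending.append(off + length + disp)
            off += length
    return sorted(starts)


# JMP $1004 / (data byte) / BCC +1 / RTS / NOP / RTS, loaded at $1000
image = bytes([0x4C, 0x04, 0x10, 0xFF, 0x90, 0x01, 0x60, 0xEA, 0x60])
reached = trace(image, 0x1000, [0])
```

<p>The byte at offset 3 is never visited, so it is left for the data
analyzer.</p>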
<p>The state of the processor status flags is recorded for every
instruction. When execution proceeds to the next instruction or branches
@ -116,18 +214,19 @@ of status flags, the analyzer stops pursuing that path.</p>
when examining 65816 code, but it's possible for the status flag values
to be indeterminate. In such a situation, short registers are assumed.
Similarly, if the carry flag is unknown when an <code>XCE</code> is
performed, we assume a transition to emulation mode.</p>
performed, we assume a transition to emulation mode (E=1).</p>
<p>There are three ways to set a definite value in a status flags:</p>
<p>There are three ways in which code can set a flag to a definite value:</p>
<ol>
<li>By specific instructions, like <code>SEC</code> or
<li>By explicit instructions, like <code>SEC</code> or
<code>CLD</code>.</li>
<li>By immediate instructions. <code>LDA #$00</code> sets Z=1 and N=0.
<code>ORA #$80</code> sets Z=0 and N=1.</li>
<li>By immediate-operand instructions. <code>LDA #$00</code> sets Z=1
and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
<li>By inference. For example, if we see a <code>BCC</code> instruction,
we know that the carry will be clear at the branch target address, and
set at the following instruction. The instruction doesn't affect the
value of the flag, but we know what the value is at either address.</li>
value of the flag, but we know what the value will be at both
addresses.</li>
</ol>
<p>Self-modifying code can spoil any of these, possibly requiring a
status flag override to get correct disassembly.</p>
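<p>The three sources of definite flag values listed above might be modeled
like this. It is a sketch only: flags are held in a dictionary with
<code>None</code> meaning "indeterminate", an 8-bit accumulator is assumed,
and the function names are invented for the example.</p>

```python
UNKNOWN = None   # flag value is indeterminate

EXPLICIT = {"SEC": ("C", 1), "CLC": ("C", 0), "SED": ("D", 1), "CLD": ("D", 0)}


def after_explicit(flags, mnemonic):
    """Explicit instructions set one flag to a fixed value."""
    name, value = EXPLICIT[mnemonic]
    return dict(flags, **{name: value})


def after_lda_immediate(flags, value):
    """LDA #value (8-bit accumulator assumed) fixes both Z and N."""
    return dict(flags, Z=1 if value == 0 else 0, N=1 if value & 0x80 else 0)


def after_bcc(flags):
    """BCC changes nothing, but each outgoing path implies a carry value.

    Returns (flags at the branch target, flags at the following instruction).
    """
    return dict(flags, C=0), dict(flags, C=1)


start = {"C": UNKNOWN, "Z": UNKNOWN, "N": UNKNOWN, "D": UNKNOWN}
carry_set = after_explicit(start, "SEC")
loaded_zero = after_lda_immediate(start, 0x00)
taken, not_taken = after_bcc(start)
```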
@ -145,7 +244,7 @@ code does <code>CLC</code>/<code>PHP</code>, followed a bit later by the
flag around. Flagging the carry bit as indeterminate with a status flag
override on the instruction following the PLP fixes things.)</p>
<p>Some other things that the code analyzer can't handle:</p>
<p>Some other things that the code analyzer can't recognize automatically:</p>
<ul>
<li>Jumping indirectly through an address outside the file, e.g.
storing an address in zero-page memory and jumping through it.
@ -163,6 +262,26 @@ that it's equal to the program bank register ("K"). Handling this
correctly will require improvements to the user interface.</p>
<h3><a name="extension-scripts">Extension Scripts</a></h3>
<p>Extension scripts can mark data that follows a JSR or JSL as inline
data, or change the format of nearby data or instructions. The first
time a JSR/JSL instruction is encountered, all loaded extension scripts
are offered a chance to act.</p>
<p>The first script that applies a format wins. Attempts to re-format
instructions or data will fail. This rule ensures that anything explicitly
formatted by the user will not be overridden by a script.</p>
<p>If code jumps into a region that is marked as inline data, the
branch will be ignored. If an extension script tries to flag bytes
as inline data that have already been executed, the script will be
ignored. This can lead to a race condition in the analyzer if
an extension script is doing the wrong thing. (The race doesn't exist
with inline data hints specified by the user, because those are applied
before code analysis starts.)</p>
<h2><a name="data-analysis">Data Analysis</a></h2>
<p>The data analyzer performs two tasks. It matches operands with
offsets, and it analyzes uncategorized data. Either or both of
@ -171,17 +290,17 @@ these can be disabled from the
<p>The data target analyzer examines every instruction and data operand
to see if it's referring to an offset within the data file. If the
target is within the file, and has a label, a weak symbolic reference
to that label is added to the Anattrib array. If the target doesn't
have a label, the analyzer will either use a nearby label, or generate
a unique label and use that.</p>
target is within the file, and has a label, a format descriptor with a
weak symbolic reference to that label is added to the Anattrib array. If
the target doesn't have a label, the analyzer will either use a nearby
label, or generate a unique label and use that.</p>
<p>While most of the "nearby label" logic can be disabled, targets that
land in the middle of an instruction are always adjusted backward to
the instruction start. This is necessary because labels are only visible
if they're associated with the first (opcode) byte of an instruction.</p>
<p>The uncategorized data analyzer tries to find ASCII strings and
opportunities to use the ".FILL" instruction. It breaks the file into
opportunities to use the ".FILL" operation. It breaks the file into
pieces, where contiguous regions hold nothing but data, are not split
across a ".ORG" directive, are not interrupted by data, and do not
contain anything that the user has chosen to format. Each region is


@ -15,8 +15,8 @@
<p>SourceGen can generate an assembly source file that, when fed into
the target assembler, will recreate the original data file exactly.
Every assembler is different, so code must be written especially for
each.<p>
Every assembler is different, so support must be added to SourceGen
for each.</p>
<p>The generation / assembly dialog can be opened with File &gt; Assemble.</p>
@ -37,7 +37,7 @@ assembler. This is most easily understood with an example.</p>
<code>54 02 01</code>, with the arguments reversed. cc65 v2.17 doesn't
do that; this is a bug that was fixed in a later version. So if you're
generating code for v2.17, you want to create source code with the
arguments the other way around.</p>
arguments the wrong way around.</p>
<p>Having version-dependent source code is a bad idea, so SourceGen
just outputs raw hex bytes for MVN/MVP instructions. This yields the
correct code for all versions of the assembler, but is ugly and
@ -56,7 +56,7 @@ intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
generators may produce multiple source files, perhaps a link script or
symbol definition header to go with the assembly source. To avoid
spreading files across the filesystem, SourceGen does all of its work
in the same directory where the project lives. So before you can generate
in the same directory where the project lives. Before you can generate
code, you have to have given your project a name by saving it.</p>
<p>The Generate and Assemble dialog has a drop-down list near the top
@ -98,12 +98,12 @@ command-line output will be displayed, with stdout and stderr separated.
provides.)</p>
<p>The output will show the assembler's exit code, which will be zero
on success (note: sometimes they lie.) If it did, SourceGen will then
compare the assembler's output to the original file, and report any
differences.</p>
on success (note: sometimes they lie.) If it appeared to succeed,
SourceGen will then compare the assembler's output to the original file,
and report any differences.</p>
<p>Failures here may be due to bugs in the cross-assembler or in
SourceGen. However, SourceGen can generally work around assembler bugs,
so any failure here is an opportunity for improvement.</p>
so any failure is an opportunity for improvement.</p>
</div>


@ -16,13 +16,13 @@
<h2><a name="address">Edit Address</a></h2>
<p>This adds a target address directive (".ORG") to the current offset.
If you leave the field blank, the directive will be removed.</p>
If you leave the text field blank, the directive will be removed.</p>
<p>Addresses are always interpreted as hexadecimal. You can prefix
it with a '$', but that's not necessary.</p>
<p>24-bit addresses may be written with a bank separator, e.g. "12/3456"
it with a '$', but that's not required.
24-bit addresses may be written with a bank separator, e.g. "12/3456"
would resolve to address $123456.</p>
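<p>As an illustration of the accepted forms, here is a small parser
sketch. It is not the dialog's actual validation code. The function name is
hypothetical, and accepting a '$' in front of the bank form is an
assumption.</p>

```python
import string


def _hex(part, max_digits):
    """Convert a run of hex digits, rejecting anything else."""
    if not 1 <= len(part) <= max_digits or any(c not in string.hexdigits for c in part):
        raise ValueError("bad hex value: %r" % part)
    return int(part, 16)


def parse_address(text):
    """Parse "1000", "$1000", or "12/3456" into an integer address."""
    s = text.strip()
    if s.startswith("$"):
        s = s[1:]                     # the '$' prefix is optional
    if "/" in s:
        bank, _, addr = s.partition("/")
        return (_hex(bank, 2) << 16) | _hex(addr, 4)
    return _hex(s, 6)                 # up to 24 bits without a separator


print(hex(parse_address("12/3456")))
```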
<p>There will always be an address directive at the start of the list.
<p>There will always be an address directive at the start of the file.
Attempts to remove it will be ignored.</p>
@ -34,14 +34,15 @@ that instruction. You can override the value of individual flags.</p>
<p>The 65816 emulation bit, which is not part of the processor status
register, may also be set in the editor.</p>
<p>The M, X, and E flags will not be editable unless your CPU configuration
is set to a 16-bit CPU.</p>
is set to 65816.</p>
<h2><a name="label">Edit Label</a></h2>
<p>Sets or clears a label at the selected offset. The label must have
the proper form, and not have the same name as another symbol.</p>
the proper form, and not have the same name as another symbol. If
you edit an auto-generated label you will be required to change the name.</p>
<p>The label may be marked as local, global, or global and exported.
Local labels may be generated in the assembler output in a
Local labels may be modified by the assembly code generator to have a more
convenient form, such as a local loop identifier. Global labels are
always output as-is. Exported labels are added to a table that may
be imported by other projects.</p>
@ -51,16 +52,17 @@ be imported by other projects.</p>
<p>Operands can be displayed in a variety of numeric formats, or as a
symbol. The ASCII character format is only available for operands
whose value falls into the range of low- or high-ASCII characters.</p>
<p>Symbols may be used in their entirety, or offset by a byte or two.
<p>Symbols may be used in their entirety, or shifted and masked.
The low / high / bank selector determines which byte is used as the
low byte. For 16-bit operands, this acts as a shift rather than a byte
select.</p>
select. If the symbol is wider than the operand field, a mask will be
applied automatically.</p>
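<p>One way to picture the low / high / bank selector is as a shift followed
by a mask sized to the operand. The sketch below rests on that reading.
The names are invented for the example and the real implementation may
differ.</p>

```python
SHIFTS = {"low": 0, "high": 8, "bank": 16}


def operand_value(symbol_value, part, operand_bytes):
    """Shift the symbol by the selected part, then mask to the operand width."""
    mask = (1 << (8 * operand_bytes)) - 1
    return (symbol_value >> SHIFTS[part]) & mask


addr = 0x123456
low8 = operand_value(addr, "low", 1)      # one-byte operand, low byte
high8 = operand_value(addr, "high", 1)    # one-byte operand, high byte
high16 = operand_value(addr, "high", 2)   # two-byte operand, shifted by 8
```

<p>For an 8-bit operand the selector behaves like a byte select. For a
16-bit operand the same shift leaves two bytes in place.</p>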
<p>A few shortcuts are provided when specifying a symbol. As noted in
the introductory sections, operand symbols are weak references. If the
symbol hasn't been defined as a label yet, the operand will be formatted
as hex, which is probably not what you want.</p>
<p>The default behavior is to just set the operand's symbol.</p>
<p>The default behavior is just to set the operand's symbol.</p>
<p>For operands that target an offset inside the file, if the target
address does not yet have a label, and the symbol doesn't exist, you may
set the symbol as the label on the target address as well. You can do
@ -84,24 +86,35 @@ future release.)</p>
<h2><a name="data">Edit Data Format</a></h2>
<p>This dialog offers a variety of choices, and can be used to apply a
format to a range of offsets. If the range crosses a visual boundary,
format to a range of offsets. You must select all of the bytes you want
to format. For example, to format two bytes as a 16-bit word, you must
select both bytes in the editor. (If you click on the first item, then
Shift+double-click on the operand field of the last item, you can do
this very quickly.) The selection does not need to be contiguous: you
can use Control+click to select scattered items.</p>
<p>If the range is discontiguous, or crosses a visual boundary
such as a change in address, a user-specified label, or a long comment
or note, the region will be split. The top of the dialog indicates how
many bytes have been selected, and how many regions they have been
divided into.</p>
or note, the selection will be split into smaller regions. A message at the
top of the dialog indicates how many bytes have been selected, and how
many regions they have been divided into.</p>
<p>(End-of-line comments do <i>not</i> split a region, and will
disappear if they end up inside a multi-byte data item.)</p>
<p>The "Simple Data" items behave the same as their equivalents in the
Edit Operand dialog. However, because the width is not determined by
an instruction opcode, you will need to specify how wide each item is,
and the byte order.</p>
<p>Suppose you find a table of 16-bit addresses in the code. Click on
an instruction opcode, and multiple items can be selected, you will need
to specify how wide each item is and what its byte order is. For data
you also have the option of setting the format to "Address", which marks
the selected bytes as a numeric reference.</p>
<p>Consider a simple example: suppose you find a table of 16-bit
addresses in the code. Click on
the first byte, shift-click the last byte, then select the Edit Data menu
item. The number of bytes selected should be even. Select
"16-bit words, little-endian", then to the right "Address". When you
click OK, the selected data will be formatted as a series of 16-bit
address values.</p>
"16-bit words, little-endian", then over to the right click on
"Address". When you click OK, the selected data will be formatted as a
series of 16-bit address values. If the addresses can be resolved inside
the data file, each address will be assigned a label.</p>
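<p>What the "16-bit words, little-endian" plus "Address" combination
amounts to can be sketched as below. The helper names and the generated
label format are assumptions made for illustration.</p>

```python
def decode_address_table(data):
    """Split raw bytes into 16-bit little-endian address values."""
    if len(data) % 2 != 0:
        raise ValueError("an address table needs an even number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


def label_targets(addresses, org, file_len):
    """Give a label to each address that falls inside the loaded file."""
    return {a: "L%04X" % a for a in addresses if org <= a < org + file_len}


table = bytes([0x03, 0x10, 0x00, 0x20])          # $1003, $2000
addresses = decode_address_table(table)
labels = label_targets(addresses, org=0x1000, file_len=0x100)
```

<p>Here only $1003 lies inside the file, so it is the only entry that
receives a label. $2000 stays a plain numeric value.</p>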
<p>The "Bulk Data" items can represent large chunks of data compactly.
The "fill" option is only available if all selected bytes have the
@ -161,8 +174,8 @@ want to limit the overall length if you're hoping to create 80-column
output. Some retro assemblers may have hard line length limitations,
which could result in the comment being truncated in generated sources.</p>
<p>A semicolon (';') is placed at the start of the line. If an assembler
has different conventions, a different character may be used. You don't
need to include a delimiter in the comment field.</p>
has different conventions, a different delimiter character may be used. You
don't need to include a semicolon in the comment field.</p>
<p>Comments on platform symbols are read from the platform symbol file, and
cannot be edited from within SourceGen. Comments on project symbols are
@ -176,11 +189,11 @@ will be word-wrapped at a line width of your choosing. They're always
drawn with a fixed-width font, so you can create ASCII-art diagrams.
Comment delimiters are added automatically at the start of each line.</p>
<p>For a true retro look you can "box" the comment with asterisks. You
can create a fill-width row of asterisks by putting a '*' on a line by
can create a full-width row of asterisks by putting a '*' on a line by
itself. (Assembly source generators are allowed to use a character
other than '*' for the output, e.g. they might use a full set of
box outline characters, though that's somewhat against the spirit of
the thing.)</p>
the thing. Regardless, a solo '*' results in a line.)</p>
<p>The bottom window will update automatically as you type, showing what
the output is expected to look like. The actual assembler source output
will depend on features of the target assembler, such as comment
@ -226,7 +239,7 @@ the same way when used in a .EQ directive.</p>
the .EQ directive.</p>
<p>Symbols marked as "address" will be applied automatically when an
operand references an address outside the scope of the data file. Symbols
marked as "constant" will not, though you can still specify it manually.</p>
marked as "constant" will not, though you can still specify them manually.</p>
</div>


@ -19,10 +19,10 @@ school in the late 1980s, I read Don Lancaster's
<i>Enhancing Your Apple II, Vol. 1</i> (available for download
<a href="https://www.tinaja.com/ebksamp1.shtml">here</a>). This
included a very detailed methodology for disassembling 6502 software.
I decided to give it a try, so I dumped a monitor listing of the
operating system from an SSI game ("RDOS") to paper with my Epson
RX-80 -- tractor feed paper was helpful for this sort of thing -- and
set to work.</p>
I wanted to give it a try, so I generated a monitor listing of an
operating system (called "RDOS") that SSI used on their games, and
printed it out on my Epson RX-80 -- tractor feed paper was helpful for
this sort of thing -- then set to work.</p>
<p>Lancaster's methodology involved highlighting different types of
instructions with different colors, making notes, and adding labels.
@ -44,14 +44,17 @@ like a modern IDE, because I didn't just want it to translate machine code
into readable form. I wanted it to help me with the process of
understanding the code, by providing cross-reference tables and symbol
lists and giving me a place to scribble notes to myself while I worked.
Especially the note-scribbling.</p>
I especially wanted the note-scribbling, because learning how something
works is usually an iterative process, where the function of a chunk of
code gradually reveals itself over time.</p>
<p>In 2002, while writing the 6502/65816 disassembler for CiderPress, I
ran into the same problems I had with the original Apple II monitor: it
blundered through data sections and got lost briefly when a new code
section started. This made it annoying to use for even small binaries. I
section started. You had to pick long or short registers for the entire
disassembly, which made 65816 code something of a disaster. I
jotted down some notes on what I thought the core features of a good
6502 diassembler should be, then went back to work on other features. It
6502 disassembler should be, then moved on to work on other features. It
was another 15 years before I picked up the idea again.</p>
<p>More recently, I disassembled some code by dumping it to a text


@ -54,6 +54,7 @@ and 65816 code. The official web site is
<li><a href="mainwin.html#info">Info Window</a></li>
<li><a href="mainwin.html#navigation">Navigation</a></li>
<li><a href="mainwin.html#hints">Adding and Removing Hints</a></li>
<li><a href="mainwin.html#toggle-format">Quick Format Toggle</a></li>
<li><a href="mainwin.html#clipboard">Copying to Clipboard</a></li>
</ul>
</ul>
@ -108,11 +109,6 @@ and 65816 code. The official web site is
<li><a href="tools.html#ascii-chart">ASCII Chart</a></li>
</ul>
<li><a href="tutorials.html">Tutorials</a></li>
<ul>
<li><a href="tutorials.html#basic-features">Basic Features</a></li>
</ul>
<li><a href="advanced.html">Advanced Topics</a></li>
<ul>
<li><a href="advanced.html#multi-bin">Working With Multiple Binaries</a></li>
@ -123,12 +119,26 @@ and 65816 code. The official web site is
<li><a href="analysis.html">Appendix: Instruction and Data Analysis</a></li>
<ul>
<li><a href="analysis.html#analysis-process">Analysis Process</a></li>
<ul>
<li><a href="analysis.html#auto-format">Automatic Formatting</a></li>
<li><a href="analysis.html#undo-redo">Interaction With Undo/Redo</a></li>
</ul>
<li><a href="analysis.html#code-analysis">Code Analysis</a></li>
<ul>
<li><a href="analysis.html#extension-scripts">Extension Scripts</a></li>
</ul>
<li><a href="analysis.html#data-analysis">Data Analysis</a></li>
</ul>
<li><a href="end-notes.html">End Notes</a></li>
<br/>
<li><a href="tutorials.html">Tutorials</a></li>
<ul>
<li><a href="tutorials.html#basic-features">Basic Features</a></li>
</ul>
</ul>


@ -28,7 +28,7 @@ navigate the code while trying to figure out what it does. A
disassembler should help you understand the code, not just dump the
instructions to a text file.</p>
<p>The computer I built in 2014 has a 4GHz CPU and 8GB of RAM.
We should put that to good use.</p>
I figured we should put that kind of power to good use.</p>
<p>The second purpose is to facilitate sharing and collaboration. Most
disassemblers generate output for a specific assembler, or in a way that's
@ -49,12 +49,13 @@ capabilities within SourceGen are sufficiently flexible. If you need to
generate assembly source and tweak it a bunch to express the intent of
the original code, then passing a SourceGen project around won't work.
This sort of thing is a bit outside the bounds of what a typical
disassembler does, so it remains to be seen whether this succeeds at
what it's trying to do, and also whether what it's trying to do is actually
something that people want.</p>
disassembler does, so it remains to be seen whether SourceGen succeeds at
what it's trying to do, and also whether what it's trying to do is
something that people actually want.</p>
<p>You can get started by watching the demo video and playing with the
tutorials.</p>
<p>You can get started by watching the
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and playing with the
<a href="tutorials.html">tutorials</a>.</p>
<h2><a name="fundamental-concepts">Fundamental Concepts</a></h2>
@ -63,7 +64,7 @@ tutorials.</p>
rest of the documentation assumes you've read and understood this. It will
be helpful if you already understand something about the 6502 instruction
set and assembly-language programming, but disassembling other programs is
actually a pretty good way to learn assembly.</p>
actually a pretty good way to learn how to code in assembly.</p>
<h2><a name="begin">About 6502 Code</a></h2>
@ -71,21 +72,24 @@ actually a pretty good way to learn assembly.</p>
the 6502 CPU or any of its derivatives, including but not limited to
the 65C02 and 65816". So let's talk about 6502 code.</p>
<p>Code usually arrives in a big binary blob. Some of it will be
instructions, some of it will be data, some will be empty space used
for variable storage. Part of the challenge of disassembly is
identifying which parts of the file contain which.</p>
<p>Much of the code you'll find for the 6502 was written by humans,
rather than generated by a compiler, which means it won't conform to a
specific set of conventions. However, most programmers will use
subroutines, and will often intersperse code with bits of data storage
for variables. The variable data storage is referred to as a "stash".
standard set of conventions. However, most programmers will use
subroutines, which can be identified and analyzed in isolation. Subroutines
are often interspersed with variable storage, referred to as a "stash".
Variables may be single-byte or multi-byte, the latter typically
in little-endian byte order.</p>
<p>Data that is principally read-only can take many forms. Among the
more common forms are graphics and ASCII string data. The former is
generally difficult to recognize automatically, but strings can often be
identified. Address tables, which are a collection of addresses to
other things, are also fairly common. When used as jump tables, they
might actually refer to the address before the actual instruction, because
of the way the RTS (Return to Subroutine) instruction works.</p>
<p>Much of the data in a typical program is read-only, often in the
form of graphics or ASCII string data. Graphics can be difficult
to recognize automatically, but strings can be identified with a
reasonable degree of confidence. Address tables, which are a collection
of addresses to other things, are also fairly common.</p>
<p>A simple disassembler would start at the top of the file and just
start converting bytes to instructions. Unfortunately there's no reliable
@ -127,14 +131,17 @@ by the program bank register and the data bank register, respectively.
The disassembler can't generally know the contents of the data bank
register, which makes life a bit more interesting.</p>
<p>The 6502 has an 8-bit processor status register with a bunch of flags
in it. One use of certain flags is to determine whether a
conditional branch is taken or not.
Two flags that are only present on the 65816 (M and X) are especially
interesting, because they determine whether the accumulator and index
registers are 8 or 16 bits wide. This determines the width of immediate-mode
instructions, so if you don't know what's in the processor status
register it's hard to correctly disassemble the instruction stream.</p>
<p>The 6502 has an 8-bit processor status register ("P") with a bunch of flags
in it. Some of the flags determine whether a conditional branch is taken
or not, which is important because some branches appear to be conditional
but actually are always or never taken in practice. The disassembler needs
to be able to figure this out so that it doesn't try to disassemble the
bytes that follow an always-taken branch.
A more significant concern is the M and X flags found on the 65802/65816,
which determine the width of the registers and of immediate load
instructions. If you don't know what state the flags are in, you can't
know whether <code>LDA #value</code> is two bytes or three, and the
disassembly of the instruction stream will come out wrong.</p>
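<p>The effect of the M flag on instruction length can be sketched as
follows. This is an illustrative Python fragment, not SourceGen code; the
function name <code>decode_lda_immediate</code> is an assumption made for
this example.</p>

```python
def decode_lda_immediate(data, offset, m_flag):
    """Decode LDA #value (opcode $A9) on a 65816.

    m_flag is the state of the M status flag: 1 selects an 8-bit
    accumulator, 0 selects a 16-bit accumulator.
    Returns (instruction_length, operand_value).
    """
    if data[offset] != 0xA9:
        raise ValueError("not an LDA immediate instruction")
    width = 1 if m_flag == 1 else 2
    operand = int.from_bytes(data[offset + 1:offset + 1 + width], "little")
    return 1 + width, operand


# The same three bytes decode differently depending on the M flag.
code = bytes([0xA9, 0x00, 0xEA])
short_form = decode_lda_immediate(code, 0, 1)   # LDA #$00, followed by a NOP
long_form = decode_lda_immediate(code, 0, 0)    # LDA #$EA00
```

<p>With M=1 the third byte is left over and is decoded as the next
instruction; with M=0 it is consumed as the high byte of the operand.</p>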
<h2><a name="sgintro">How SourceGen Works</a></h2>
@ -145,9 +152,9 @@ only its effect on the flow of execution matters.
<p>The code tracing has to start somewhere, so SourceGen uses "code entry
point hints" to identify places where execution may begin. By default,
one is placed at the start of the file. From there, the tracing process
a hint is placed at the start of the file. From there, the tracing process
walks through the code, pursuing all branches. In many cases, if you
mark all code entry points, SourceGen will automatically find all
mark all external entry points, SourceGen will automatically find all
executable code and separate it from variable storage and data areas.</p>
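<p>The general idea of tracing from entry points can be sketched with a
small worklist algorithm. This is a simplified illustration that assumes a
tiny subset of 6502 opcodes; the names <code>LENGTHS</code> and
<code>trace</code> are hypothetical, and SourceGen's actual analyzer is
considerably more involved.</p>

```python
# Instruction lengths for a tiny, illustrative subset of 6502 opcodes:
# NOP, RTS, LDA #imm, BNE rel, JMP abs, JSR abs.
LENGTHS = {0xEA: 1, 0x60: 1, 0xA9: 2, 0xD0: 2, 0x4C: 3, 0x20: 3}


def trace(data, org, entry_offsets):
    """Return the set of file offsets that begin an instruction.

    data is the file contents, org is the load address of offset 0, and
    entry_offsets plays the role of the code entry point hints.
    """
    starts = set()
    work = list(entry_offsets)
    while work:
        off = work.pop()
        while 0 <= off < len(data) and off not in starts:
            op = data[off]
            length = LENGTHS.get(op)
            if length is None or off + length > len(data):
                break                       # unknown or truncated: stop this path
            starts.add(off)
            nxt = off + length
            if op == 0x60:                  # RTS: execution does not continue
                break
            if op == 0x4C:                  # JMP: continue at the target only
                off = (data[off + 1] | (data[off + 2] << 8)) - org
                continue
            if op == 0x20:                  # JSR: trace the target later
                work.append((data[off + 1] | (data[off + 2] << 8)) - org)
            elif op == 0xD0:                # BNE: trace the branch target later
                rel = data[off + 1]
                if rel >= 0x80:
                    rel -= 0x100            # sign-extend the 8-bit displacement
                work.append(nxt + rel)
            off = nxt
    return starts


# JMP $1005 / two data bytes / LDA #$01 / BNE +1 / NOP / RTS / one data byte
program = bytes([0x4C, 0x05, 0x10, 0x68, 0x65,
                 0xA9, 0x01, 0xD0, 0x01, 0xEA, 0x60, 0xFF])
found = trace(program, 0x1000, [0])
```

<p>Offsets that are never reached (3, 4, and 11 in this example) are left
for the data analysis pass.</p>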
<p>As noted earlier, tracking the processor status flags can make the
@ -155,7 +162,7 @@ analysis more accurate. Identifying situations where a branch instruction
is always or never taken avoids mis-categorizing a data region as code.
On the 65816, it's absolutely crucial to track the M/X flags, since those
affect the width of instructions. SourceGen tracks the value of the
processor flags at every instruction, blending sets together when
processor flags at every instruction, blending sets of flags together when
multiple paths of execution converge.</p>
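<p>One way to picture the blending step: treat each flag as 0, 1, or
unknown, and keep a value only when every incoming path agrees on it. The
sketch below is an assumption about the general technique, not a
description of SourceGen's internal representation; <code>blend_flags</code>
is a hypothetical name.</p>

```python
def blend_flags(a, b):
    """Merge two flag sets where execution paths converge.

    Each set maps a flag name to 0, 1, or None (unknown). A flag keeps its
    value only if both paths agree; otherwise it becomes unknown.
    """
    return {name: (a[name] if a[name] == b[name] else None) for name in a}


path_one = {"M": 1, "X": 1, "C": 0}
path_two = {"M": 1, "X": 0, "C": 0}
merged = blend_flags(path_one, path_two)   # X becomes unknown
```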
<p>Once instructions and data have been separated, the instruction operands
@ -172,23 +179,16 @@ by an equate directive.</p>
<h3><a name="scripts">Extension Scripts</a></h3>
<p>Extension scripts are C# source files that are compiled and
executed by SourceGen. They can be added to a project from the RuntimeData
directory or the directory the project file lives in.</p>
<p>In v1.0, scripts are only called to examine JSR/JSL instructions.
They can format nearby bytes as inline data, or apply symbols to
operands.</p>
<p>If code jumps into a region that is marked as inline data, the
branch will be ignored. If an extension script tries to flag bytes
as inline data that have already been executed, the script will be
ignored. This can lead to a race condition in the analyzer if
an extension script is doing the wrong thing. (The race doesn't exist
with inline data hints specified by the user, because those are applied
before code analysis starts.)</p>
executed by SourceGen. They can be added to a project from SourceGen's
runtime data directory, or can live in the directory next to the project
file.</p>
<p>In the current implementation, scripts are only called to examine
JSR/JSL instructions. They can format nearby bytes as inline data, or
apply symbols to operands.</p>
<p>To reduce the chances of a script causing problems, all scripts are
executed in a sandbox with severely restricted access. Notably, nothing
in the script can access files, except to read those in the PluginDll
in the sandbox can access files, except to read files from the PluginDll
directory.</p>
<p>The PluginDll directory lives next to the SourceGen executable, and
contains all of the compiled script DLLs, as well as two pre-built
@ -199,10 +199,9 @@ is launched, but may be manually deleted without harm.</p>
<h3><a name="hints">Analyzer Hints</a></h3>
<p>Sometimes SourceGen can't automatically find the start or end of a
code area. Maybe there's inline data after a JSR that didn't get
recognized by an extension scripts. These situations can be resolved
by adding an appropriate hint.</p>
<p>Sometimes SourceGen can't automatically find the start or end of an
instruction stream, or gets confused by inline data. These situations
can be resolved by adding an appropriate hint.</p>
<p><b>Code entry point hints</b> tell the analyzer to add the offset
to the list of instruction start points. Suppose you've got a code
@ -247,9 +246,9 @@ end up with this:</p>
<pre>
.ORG $1000
JMP L1009
JMP &#9193; L10ef
BPL &#9193; L1053
JMP &#9193; L1230
JMP &#9193; L10ef
BPL &#9193; L1053
JMP &#9193; L1230
BMI L101b
L1009 CLC
</pre>
@ -276,7 +275,7 @@ would actually be better solved by setting a status flag override on
the BNE that sets Z=0, so the code tracer will know it's a branch-always
and do the right thing.) It's only necessary to place a hint on the
very first (opcode) byte. Placing a data hint in the middle of what
SourceGen believes is an instruction will have no effect.</p>
SourceGen believes to be an instruction will have no effect.</p>
<p><b>Inline data hints</b> identify bytes as being part of the
instruction stream, but not instructions. A simple example of this
@ -285,11 +284,13 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
JSR $bf00
.DD1 $function
.DD2 $address
BCS BAD
</pre>
<p>The three bytes following a JSR to $bf00 should be skipped over by
the code analyzer. In this case, all three bytes must be hinted.</p>
<p>If code jumps into a region that is marked as inline data, the
<p>The three bytes following the <code>JSR $bf00</code> should be hinted
as inline data, so that the code analyzer skips them and continues the
analysis at the <code>BCS</code>.</p>
<p>If code branches into a region that is marked as inline data, the
branch will be ignored.</p>
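<p>Finding the bytes to hint for the ProDOS 8 example could be sketched as
shown here. This is a naive byte search written for illustration; it assumes
every occurrence of the byte sequence <code>20 00 BF</code> really is a
<code>JSR $bf00</code>, which a real analyzer or extension script would
need to confirm. The function name is hypothetical.</p>

```python
def prodos_inline_ranges(data):
    """Return (offset, length) pairs for the bytes after each JSR $BF00.

    Each ProDOS 8 call is followed by a 1-byte function number and a
    2-byte parameter block address, so three bytes are inline data.
    """
    pattern = bytes([0x20, 0x00, 0xBF])     # JSR $BF00, little-endian operand
    ranges = []
    pos = data.find(pattern)
    while pos != -1:
        start = pos + 3
        if start + 3 <= len(data):
            ranges.append((start, 3))
        pos = data.find(pattern, start + 3)  # resume after the inline bytes
    return ranges


# JSR $BF00 / .DD1 $C8 / .DD2 $0300 / BCS +2
call = bytes([0x20, 0x00, 0xBF, 0xC8, 0x00, 0x03, 0xB0, 0x02])
hinted = prodos_inline_ranges(call)
```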
@ -303,9 +304,9 @@ of the work being disassembled. (This will vary by region. Also, note
that the mere act of disassembling a piece of software may be illegal in
some cases.)</p>
<p>To avoid mix-ups, the data file's length and CRC are stored in the
project file. SourceGen will refuse to open a project if the data file's
length and CRC don't match.</p>
<p>To avoid mix-ups where the wrong data file is used, the file's length
and CRC are stored in the project file. SourceGen will refuse to open a
project if the data file's length and CRC don't match.</p>
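<p>A check of this kind might look like the following sketch. The text
above only says "CRC"; the use of CRC-32 via Python's <code>zlib</code>
module is an assumption made for the example, and both function names are
hypothetical.</p>

```python
import zlib


def fingerprint(data):
    """Return the (length, crc) pair that would be stored with a project."""
    return len(data), zlib.crc32(data)


def data_file_matches(data, expected_length, expected_crc):
    """Return True only if both the length and the CRC match."""
    return len(data) == expected_length and zlib.crc32(data) == expected_crc
```

<p>Checking the length first is cheap, and rejects most wrong files before
the CRC is even considered.</p>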
<p>Most of the data in the project file is associated with a file offset.
When you create a comment, you aren't associating it with line 53, you're
@ -317,14 +318,20 @@ convention, file offsets are always shown as a six-digit hexadecimal value
with a leading '+', e.g. "+0012ab". This makes it easy to distinguish
between an address and an offset.</p>
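<p>The offset notation is easy to produce and parse. A minimal sketch,
with hypothetical function names:</p>

```python
def format_offset(offset):
    """Format a file offset as '+' followed by six hex digits."""
    return "+{:06x}".format(offset)


def parse_offset(text):
    """Parse a string such as '+0012ab' back into an integer offset."""
    if not text.startswith("+"):
        raise ValueError("offsets must start with '+'")
    return int(text[1:], 16)
```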
<p>Instruction and data operands can be formatted in various ways. The
formatting choice is associated with the first offset of the item. For
instructions the number of bytes in the operand is determined by the opcode
(and, on the 65816, the M/X status flags). For data items the length
can be a single byte or an entire file. Operand formats are not allowed
to overlap.</p>
<p>When an instruction or data operand references an address, we call
it a <b>numeric reference</b>. When the target address has a label, and
the operand uses that symbol, we call that a <b>symbolic reference</b>.
SourceGen tries to establish symbolic references whenever possible,
so that the generated assembly source doesn't refer to hard-coded
locations within the program.</p>
<p>Data operands can also be numeric references. From the Edit Data
dialog, select the "Address" format.</p>
locations within the program. Labels are generated automatically for
the targets of numeric references.</p>
<p>As your understanding of the disassembled code develops, you will want
to add comments explaining it. SourceGen projects have three kinds of
@ -339,32 +346,38 @@ comments:</p>
are a way for you to leave notes to yourself, perhaps "don't forget
to figure this out" or "this is the cool part".
</ol>
<p>Each offset can have one of each.</p>
<p>Every file offset can have one of each.</p>
<p>Labels and comments may disappear if you associate them with a file
offset that is in the middle of a multi-byte instruction or data item.
For example, suppose you put a long comment at offset +000010, and then
mark a 50-byte region starting at offset +000008 as an ASCII string. The
comment won't be deleted, but won't be displayed either. The same thing
happens to labels.</p>
can happen to labels. SourceGen will try to prevent this from happening
by splitting formatted data into sub-regions at label boundaries.</p>
<h2><a name="about-symbols">All About Symbols</a></h2>
<p>A symbol has two parts, a label and a value. The value may be an
address or a numeric constant. Symbols can be defined in different ways,
and applied in different ways.</p>
<p>A symbol has two parts, a label and a value. The label is a short
ASCII string; the value may be an 8-to-24-bit address or a numeric
constant. Symbols can be defined in different ways, and applied in
different ways.</p>
<p>The label format is restricted:</p>
<p>The label syntax is restricted to a format that should be compatible
with most assemblers:</p>
<ul>
<li>2-32 characters long.
<li>Starts with a letter or underscore.
<li>Comprised of ASCII letters, numbers, and the underscore.
</ul>
<p>Label comparisons are case-sensitive, as is customary for programming
languages.</p>
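<p>The label rules listed above can be expressed as a regular expression.
This is a sketch of the stated rules only; SourceGen may apply additional
checks, and the names used here are hypothetical.</p>

```python
import re

# One leading letter or underscore, then 1 to 31 more ASCII letters,
# digits, or underscores, for a total length of 2 to 32 characters.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")


def is_valid_label(label):
    """Return True if the label follows the rules listed above."""
    return LABEL_RE.fullmatch(label) is not None
```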
<p><b>Platform symbols</b> are defined in platform symbol files, which
have a ".sym65" filename extension. Several come with SourceGen and
live in the <code>RuntimeData</code> directory. You can also create your
<p><b>Platform symbols</b> are defined in platform symbol files. These
are named with a ".sym65" extension, and have a fairly straightforward
name/value syntax. Several files for popular platforms come with SourceGen
and live in the <code>RuntimeData</code> directory. You can also create your
own, but they have to live in the same directory as the project file.</p>
<p>Platform symbols can be addresses or constants. If an instruction
@ -384,7 +397,7 @@ creating two symbols with the same name. If two symbols have the same
value, the one whose label comes first alphabetically is used.</p>
<p>Project symbols always have precedence over platform symbols, allowing
you to redefine symbols within a project. (You can "block" a platform
you to redefine symbols within a project. (You can "hide" a platform
symbol by creating a project symbol with the same name and an unused
value, such as $ffffffff.)</p>
@ -400,8 +413,8 @@ instructions or data offsets that are the target of operands. They're
formed by appending the hexadecimal address to the letter "L", with
additional characters added if some other symbol has already defined
that label. Auto labels are only added where they are needed. Because
auto labels may be redefined at any time, the editor will try to prevent
you from using them in operands.</p>
auto labels may be redefined or disappear, the editor will try to prevent
you from referring to them when editing operands.</p>
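<p>Auto label generation could be sketched like this. The text above does
not say which "additional characters" are used on a collision or how the
hexadecimal digits are cased; the underscore-and-counter suffix and the
lower-case digits below are assumptions for illustration, and
<code>make_auto_label</code> is a hypothetical name.</p>

```python
def make_auto_label(address, existing):
    """Build an auto label for an address, avoiding labels already in use.

    existing is the set of labels defined by other symbols.
    """
    base = "L{:04x}".format(address)
    if base not in existing:
        return base
    n = 0
    while "{}_{}".format(base, n) in existing:
        n += 1                      # assumed suffix scheme: _0, _1, _2, ...
    return "{}_{}".format(base, n)
```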
<p>Operands may use parts of symbols. For example, if you have a label
<code>MYSTRING</code>, you can write:</p>
@ -414,7 +427,7 @@ MYSTRING .STR "hello"
</pre>
<p>The format editor allows you to choose which part of the symbol's
value to use. If the value doesn't match exactly, and adjustment will
value to use. If the value doesn't match exactly, an adjustment will
be applied.</p>
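<p>The arithmetic behind symbol parts and adjustments is simple. The sketch
below assumes 16-bit symbol values and a plain <code>LABEL+N</code> /
<code>LABEL-N</code> output syntax; real assemblers differ in how they
write these expressions, and the function names are hypothetical.</p>

```python
def symbol_part(value, part):
    """Extract the low or high byte of a 16-bit symbol value."""
    if part == "low":
        return value & 0xFF
    if part == "high":
        return (value >> 8) & 0xFF
    raise ValueError("part must be 'low' or 'high'")


def operand_with_adjustment(label, symbol_value, operand_value):
    """Express an operand in terms of a symbol, adding an offset if needed."""
    delta = operand_value - symbol_value
    if delta == 0:
        return label
    return "{}{:+d}".format(label, delta)   # e.g. MYSTRING+1 or MYSTRING-1
```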
<h3><a name="weak-refs">Weak References</a></h3>
@ -451,9 +464,9 @@ results are probably not what you want:</p>
</pre>
<p>This happened because you added a weak reference to "FOO" in the operand,
but the label doesn't exist. The operand is formatted as hex. This also
means that there's no longer a need for an auto label on the NOP instruction,
so SourceGen removed that as well.</p>
but the label doesn't exist. The operand is formatted as hex. Because
there's no longer a reference to L1003, SourceGen removed the auto-label
as well.</p>
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
probably wanted:</p>
@ -518,7 +531,9 @@ and jumps to it with the RTS instruction. However, RTS requires the
address of the byte before the target instruction, so we actually push
$1006.</p>
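<p>The "address minus one" rule can be checked with a few lines of
arithmetic. In this sketch, <code>rts_push_bytes</code> is a hypothetical
helper that returns the two bytes in the order they would be pushed.</p>

```python
def rts_push_bytes(target):
    """Return (high, low) bytes to push so that RTS lands on target.

    RTS pulls the low byte, then the high byte, and adds one to the
    result, so the value pushed is target - 1 with the high byte first.
    """
    ret = (target - 1) & 0xFFFF
    return (ret >> 8) & 0xFF, ret & 0xFF


high, low = rts_push_bytes(0x1007)   # pushes $10, then $06
```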
<p>After adding a code hint at $1007, the project looks like this:</p>
<p>The disassembler won't know that address $1007 is code because nothing
appears to reference it. After adding a code hint at $1007, the project
looks like this:</p>
<pre>
LDA #$10
PHA


@ -31,7 +31,7 @@ incomplete. The maximum size for a data file is currently 1 MiB.</p>
<p>The first time you save the project (with File &gt; Save), you will be
prompted for the project name. It's best to use the data file's name
with ".dis65" added. This will be configured automatically. The data
with ".dis65" added, so this will be set as the default. The data
file's name is not stored in the project file, so if you pick a different
name, or save the project in a different directory, you will have to
select the data file manually whenever you open the project.</p>
@ -58,7 +58,7 @@ to cancel the loading of the project.</p>
<p>The locations of the last few projects you've worked with are saved
in the application settings. You can access them from
File &gt; Recent Projects. If no project is open, links to the two
most-recently opened projects will be available.</p>
most-recently-opened projects will be available.</p>
<h2><a name="working">Working With a Project</a></h2>
@ -70,7 +70,7 @@ most-recently opened projects will be available.</p>
<li>Top left: cross-reference list.
<li>Bottom left: notes list.
<li>Top right: symbols list.
<li>Bottom right: line info.
<li>Bottom right: info on selected line.
</ol>
<p>Most of the action takes place in the center code list.</p>
@ -94,10 +94,12 @@ assembler directive.</p>
correspond to the instruction or data. To see the full dump of
a longer item, such as an ASCII string, double-click on the field
to open the
<a href="tools.html#hexdump">Hex Dump Viewer</a>. (Note this is
a floating window, so you can keep it open while you work.)</li>
<a href="tools.html#hexdump">Hex Dump Viewer</a>. This is
a floating window, so you can keep it open while you work.
Double-clicking in the bytes column in other rows will update
the window position and selection.</li>
<li><b>Flags</b>. This shows the state of the status flags as they
were before the instruction was executed. Double-click on this
are before the instruction is executed. Double-click on this
field to open the
<a href="editors.html#flags">Edit Status Flag Override</a> dialog.</li>
<li><b>Attributes</b>. Some instructions and data items have
@ -115,8 +117,8 @@ assembler directive.</p>
If an instruction is embedded inside this one, a &#9193; symbol
will appear.
If you double-click this field for an instruction or data item
whose operand refers to an address in the file, the view will jump to
that location.</li>
whose operand refers to an address in the file, the selection will
jump to that location.</li>
<li><b>Operand</b>. The instruction or data operand. Data operands
may span a large number of bytes. Double-click on this field to
open the
@ -177,7 +179,7 @@ enabled will depend on what you have selected in the main window.</p>
when a single equate directive, generated from a project symbol, is
selected.</li>
<li><a href="#hinting">Hinting</a> (Hint As Code Entry Point, Hint As
<li><a href="#hints">Hinting</a> (Hint As Code Entry Point, Hint As
Data Start, Hint As Inline Data, Remove Hints). Enabled when one or more
code and data lines are selected. Remove Hints is only enabled when
at least one line has hints.</li>
@ -187,7 +189,8 @@ enabled will depend on what you have selected in the main window.</p>
<li>Delete Note / Long Comment. Deletes the selected note or long
comment. Enabled when a single note or long comment is selected.</li>
<li><a href="tools.html#hexdump">Show Hex Dump</a>. Opens the hex dump
viewer with the current selection highlighted. Always enabled.</li>
viewer, with the current selection highlighted. Always enabled. If
nothing is selected, the viewer will open at the top of the file.</li>
</ul>
@ -199,8 +202,8 @@ change with Edit &gt; Redo, Ctrl+Y, or Ctrl+Shift+Z.</p>
are added to the undo/redo buffer. This has no fixed size limit, so no
matter how much you change, you can always undo back to the point where
the project was opened.</p>
<p>The undo buffer is not saved as part of the project, so closing and
reopening the project resets the buffer.</p>
<p>The undo history is not saved as part of the project. Closing a project
clears the buffer.</p>
<h3><a name="references">References Window</a></h3>
@ -264,7 +267,9 @@ Use Edit &gt; Find Next to find the next match.</p>
<p>Use Edit &gt; Go To to jump to an offset, address, or label. Remember
that offsets and addresses are always hexadecimal, and offsets start
with a '+'.</p>
with a '+'. If you have a label that is also a valid hexadecimal
address, like "FEED", the label takes precedence. To jump to the address
write "$FEED" instead.</p>
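<p>The precedence rules for Go To input can be summarized in a short
sketch. This illustrates the rules as described here and is not the
application's parser; <code>parse_goto</code> is a hypothetical name.</p>

```python
def parse_goto(text, labels):
    """Classify Go To input as an offset, an address, or a label.

    labels is the set of labels defined in the project. Returns a
    (kind, value) pair. Raises ValueError for input that is neither a
    known label nor valid hexadecimal.
    """
    text = text.strip()
    if text.startswith("+"):
        return ("offset", int(text[1:], 16))
    if text.startswith("$"):
        return ("address", int(text[1:], 16))
    if text in labels:                  # a label wins over a bare hex number
        return ("label", text)
    return ("address", int(text, 16))
```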
<p>When you jump around, by double-clicking on an opcode or an entry
in one of the side windows, the currently-selected line is added to
@ -291,6 +296,17 @@ entirely from the
<a href="settings.html#project-properties">project properties</a> editor.
<h3><a name="toggle-format">Quick Format Toggle</a></h3>
<p>The "Toggle Single-Byte Format" feature provides a quick way to
change a range of bytes to single bytes
or back to their default format. It's equivalent to opening the Edit
Data Format dialog and selecting "Single bytes" displayed as hex, or
selecting "Default".</p>
<p>This can be handy if the default format for a range of bytes is a
string, but you want to see it as bytes or set a label in the middle.</p>
<h3><a name="clipboard">Copying to Clipboard</a></h3>
<p>When you use Edit &gt; Copy, all lines selected in the code list are
@ -298,14 +314,16 @@ copied to the system clipboard. This can be a convenient way to post
code snippets into forum postings or documentation. The text is
copied from the data shown on screen, so your chosen capitalization
and pseudo-ops will appear in the copy.</p>
<p>A copy of all of the fields is also written to the clipboard, in
CSV format. If you open a program like Excel, you can use Paste Special
to put the data into individual cells.</p>
<p>Long comments are included, but notes are not.</p>
<p>By default, the label, opcode, operand, and comment fields are included.
From the
<a href="settings.html#app-settings">app settings</a> dialog you can select
a different format, "Disassembly", which also includes the address and byte
columns.</p>
<p>By default, the label, opcode, operand, and comment fields are included
in the text form. From the
<a href="settings.html#app-settings">app settings</a> you can select
a different format that also includes the address and byte columns.</p>
<p>A copy of all of the fields is also written to the clipboard in CSV
format. If you have a spreadsheet like Excel, you can use Paste Special
to put the data into individual cells.</p>
</div>


@ -21,15 +21,15 @@ project properties.</p>
<p>Application settings are stored in a file called "SourceGen-settings"
in the SourceGen installation directory. If the file is missing or
corrupted, some default settings will be used. These settings are local
to your system, and include everything from window sizes to whether you
prefer hexadecimal values to be shown in upper case. None of them
to your system, and include everything from window sizes to whether or not
you prefer hexadecimal values to be shown in upper case. None of them
affect the way the project analyzes code and data, though they may affect
the way generated assembly sources look.</p>
<p>Project properties are stored in each individual .dis65 project file.
They specify which CPU to use, which extension scripts to load, and a
variety of other things that directly impact how SourceGen processes
the project. Because of the way it impacts the project, all changes to
the project. Because of the potential impact, all changes to
the project properties are made through the undo/redo buffer.</p>
@ -50,7 +50,7 @@ hide columns from the code list. The buttons may be more convenient
though.</p>
<p>You can select a different font for the code list. Make it as large
or small as you want. Monospace fonts like Courier or Consolas are
or small as you want. Mono-space fonts like Courier or Consolas are
recommended.</p>
<p>You can choose to display different parts of the display in upper or
@ -147,8 +147,8 @@ you later hit Cancel, but the changes are not applied immediately.</p>
<p>The choice of CPU determines the set of available instructions, as
well as cycle costs and register widths. There are many variations
on the 6502, but from the perspective of a disassembler only three
matter:
on the 6502, but from the perspective of a disassembler most can be
treated as one of these three:
<ol>
<li>MOS 6502. The original 8-bit instruction set.</li>
<li>WDC W65C02S. Expanded the instruction set and smoothed
@ -156,9 +156,9 @@ matter:
<li>WDC W65C816S. Expanded instruction set, 24-bit address space,
and 16-bit registers.</li>
</ol>
<p>The Rockwell R65C02 features an expanded instruction set that is
compatible with the WDC 65C02 but incompatible with the 65816. It's
not currently supported by SourceGen.</p>
<p>The Rockwell R65C02, Hudson Soft HuC6280, and Commodore CSG 4510 / 65CE02
have instruction sets that expand on the 6502/65C02, but aren't compatible
with the 65816. These are not yet supported by SourceGen.</p>
<p>If "enable undocumented instructions" is checked, some additional
opcodes are recognized on the 6502 and 65C02. These instructions are
@ -198,14 +198,18 @@ create two symbols with the same label.</p>
<p>The Import button allows you to import symbols from another project.
Only labels that have been tagged as global and exported will be imported.
Existing symbols with identical labels will be replaced, so it's okay to
run the importer multiple times.</p>
run the importer multiple times. Labels that aren't found will not be
removed, so you can safely import from multiple projects, but will need
to manually delete any symbols that are no longer being exported.</p>
<h3><a name="projprop-symfiles">Symbol Files</a></h3>
<p>From here, you can add and remove platform symbol files, or change
the order in which they are loaded.
See the <a href="intro.html#about-symbols">symbols</a> section for an
explanation of how platform symbols work.</p>
explanation of how platform symbols work.
See "README.md" in the RuntimeData directory for a description of the
file syntax.</p>
<p>Platform symbol files must live in the RuntimeData directory that comes
with SourceGen, or in the directory where the project file lives. This
@ -222,7 +226,9 @@ you will receive a warning.</p>
<h3><a name="projprop-extscripts">Extension Scripts</a></h3>
<p>From here, you can add and remove extension script files.
See the <a href="intro.html#scripts">extension scripts</a> section for
an explanation of how extension scripts work.</p>
an overview of how extension scripts work.
There's a more detailed document in the RuntimeData directory
("ExtensionScripts.md").</p>
<p>Extension script files must live in the RuntimeData directory that comes


@ -46,7 +46,7 @@ pasting in some situations.</p>
<p>If "always on top" is checked, the window will stay above all other
windows that don't also declare that they should always be on top. By
default this box is checked for the project dump, and not checked for
default this box is checked when displaying project data, and not checked for
external files.</p>


@ -70,15 +70,18 @@ these distracting, collapse the column.</p>
<p>Click on the fourth line down, which has address 1002. The line has
a label, "L1002", and is performing an indexed load from L1017. Both
of these labels were automatically generated, and are named for the
address they appear. When you clicked on the line, a few things happened:</p>
address at which they appear. When you clicked on the line, a few
things happened:</p>
<ul>
<li>The line was highlighted in the system selection color.</li>
<li>The line was highlighted in the system selection color (usually
blue).</li>
<li>Address 1017 and label L1017 were highlighted. When a line
with an in-file operand is selected, the target address is higlighted.</li>
<li>An entry appeared in the References window. This notes that the only
reference to L1002 is a branch from address $100B.</li>
with an in-file operand is selected, the target address is
highlighted.</li>
<li>An entry appeared in the References window. This tells you that the
only reference to L1002 is a branch from address $100B.</li>
<li>The Info window filled with a bunch of text that describes the
line and the LDA instruction.</li>
line format and some details about the LDA instruction.</li>
</ul>
<p>Click some other lines, such as address $100B and $1014. Note how the
@ -91,17 +94,17 @@ the operand itself opens a format editor; more on that later.)</p>
References window. Note the selection jumps to L1002. You can immediately
jump to any reference.</p>
<p>At the top of the Symbols window on the right side of the screen is a
row of buttons. Make sure "Auto" is highlighted. You should see three
row of buttons. Make sure "Auto" is selected. You should see three
labels in the window (L1002, L1014, L1017). Double-click on L1014. The
selection jumps to the appropriate line.</p>
<p>Select Edit &gt; Find. Type "hello", and hit Enter. The selection will
move to address $100E, which is a string that says "hello!". You can use
Edit &gt; Find Next to try to find the next occurrence (there isn't one). You
can search for text that appears in the rightmost columns (label, opcode,
can search for any text that appears in the rightmost columns (label, opcode,
operand, comment).</p>
<p>Select Edit &gt; Go To. You can enter a label, address, or file offset.
Enter "100b" to jump the selection to $100B.</p>
Enter "100b" to set the selection to $100B.</p>
<p>Near the top-left of the SourceGen window is a set of toolbar icons.
Click the left-arrow, and watch the selection move. Click it again. Then
@ -118,21 +121,21 @@ something like "6502bench SourceGen vX.Y.Z". There are three ways to
open the comment editor:</p>
<ol>
<li>Select Actions &gt; Edit Long Comment from the menu bar.</li>
<li>Right click, and select Actions &gt; Edit Long Comment from the
pop-up menu. (The menus area exactly the same.)</li>
<li>Right click, and select Edit Long Comment from the
pop-up menu. (This menu is exactly the same as the Actions menu.)</li>
<li>Double-click the comment</li>
</ol>
<p>Most things in the code list will respond to a double-click.
Double-clicking on addresses, flags, labels, operands, and comments will
open editors for those things. Double-clicking on a value in the "bytes"
column will open a floating hex dump viewer. This is usually the most
convenient way to edit something.</p>
convenient way to edit something: point and click.</p>
<p>Double-click the comment to open the editor. Type some words into the
upper window, and note that a formatted version appears in the bottom
window. Experiment with the maximum line width and "render in box"
settings to see what they do. You can hit Enter to create line breaks,
or let SourceGen wrap lines for you. When you're done, click OK. (Or
hit Ctrl-Enter.</p>
hit Ctrl+Enter.)</p>
<p>When the dialog closes, you'll see your new comment in place at the
top of the file. If you typed enough words, your comment will span
multiple lines. You can select the comment by selecting any line in it.</p>
@ -151,15 +154,17 @@ differences:</p>
<ol>
<li>You can't pick their line width, but you can pick their color.</li>
<li>They don't appear in generated assembly sources, making them
useful for leaving notes to yourself.</li>
useful for leaving notes to yourself as you work.</li>
<li>They're listed in the Notes window. Double-clicking them jumps
the selection to the note, making them useful as bookmarks.</li>
</ol>
<p>It's time to do something with the code. It's copying the instructions
from $1017 to $2000, then jumping to $2000, so it looks like it's
relocating the code before executing it. We want to do the same thing
to our disassembled code, so select the line at address $1017 and then
<p>It's time to do something with the code. If you look at what the code
does you'll see that it's copying several dozen bytes from $1017
to $2000, then jumping to $2000. It appears to be relocating the next
part of the code before
executing it. We want to let the disassembler know what's going on, so
select the line at address $1017 and then
Edit &gt; Edit Address. (Or double-click the "1017" in the addr column.)
In the Edit Address dialog, type "2000", and hit Enter.</p>
@ -178,8 +183,8 @@ so you'll be forgiven if you reduce the offset column width to zero.)</p>
<p>On the line at address $2000, select Actions &gt; Edit Label, or
double-click on the label "L2000". Change the label to "MAIN", and hit
Enter. The label changes on that line, and on the two lines that refer
to address $2000. (If you're not sure what refers to line $2000, check
the References window.)</p>
to address $2000. (If you're not sure what refers to line $2000, select
it and check the References window.)</p>
<p>On that same line, select Actions &gt; Edit Comment. Type a short
comment, and hit Enter. Your comment appears in the "comment" column.</p>
@ -215,12 +220,12 @@ Actions &gt; Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are
case-sensitive, so it needs to match the operand at $2005 exactly.) You'll
see the new label appear, and the operand at line $2005 will use it.</p>
<p>There's an easier way. Use Edit &gt; Undo twice, to get back to the place
where line $2005 is using "L2009" as it's operand. Select that line and
where line $2005 is using "L2009" as its operand. Select that line and
Actions &gt; Edit Operand. Enter "IS_OK", then select "Create label at target
address instead". Hit "OK".</p>
<p>You should now see that both the operand at $2005 and the label at
$2009 have changed to IS_OK, accomplishing what we wanted to do in a
single step. (There's actually a sutble difference compared to the two-step
single step. (There's actually a subtle difference compared to the two-step
process: the operand at $2005 is still a numeric reference. It was
automatically changed to match IS_OK in the same way that the references
to MAIN were when we renamed "L2000" earlier. If you actually do want the
@ -248,7 +253,7 @@ label to "STR1". Move up a bit and select address $2030, then scroll to
the bottom and shift-click address $2070. Select Actions &gt; Edit Data
Format. At the top it should now say, "65 bytes selected in 2 groups".
There are two groups because the presence of a label split the data into
two separate regions. Selected "mixed ASCII and non-ASCII", then click
two separate regions. Select "mixed ASCII and non-ASCII", then click
"OK".</p>
<p>We now have two ".STR" lines, one for "string zero ", one with the
STR1 label and the rest of the string data. This is okay, but it's not
@ -260,8 +265,8 @@ a single ".STR" line at the bottom, split across two lines with a '+'.</p>
but that appears to be incorrect, so let's format it as individual bytes
instead. There's an easy way to do that: use Actions &gt; Toggle Single-Byte
Format (or hit Ctrl+B).</p>
<p>The data starting at $2025 appears to be 16-bit addresses into the
table of strings, so let's format them appropriately.</p>
<p>The data starting at $2025 appears to be 16-bit addresses that point
into the table of strings, so let's format them appropriately.</p>
<p>Select the line at $2025, then shift-click the line at $202E. Select
Actions &gt; Edit Data Format. If you selected the correct set of bytes,
the top should say, "10 bytes selected". Click the
@ -277,7 +282,7 @@ on their own line, so each string is now in a separate ".STR" statement.
<h3>Generating Assembly Code</h3>
<p>You can generate asssembly source code from the disassembled data.
<p>You can generate assembly source code from the disassembled data.
Select File &gt; Assembler (or hit Ctrl+Shift+A) to open the generation
and assembly dialog.</p>
<p>Pick your favorite assembler from the drop list at the top right,