mirror of
https://github.com/fadden/6502bench.git
synced 2024-12-31 21:30:59 +00:00
Revise documentation
parent
a8f26a048b
commit
8aba1c4fba
@ -28,8 +28,8 @@ as project symbols into the other projects.</p>
|
||||
symbol-import step in every interested project. This step must be
|
||||
repeated whenever the labels are updated.</p>
|
||||
<p>A different but related problem is typified by arcade ROM sets,
|
||||
where files are split apart because each file must be flashed to a
|
||||
separate chip. All files are expected to be present in memory at
|
||||
where files are split apart because each file must be burned into a
|
||||
separate PROM. All files are expected to be present in memory at
|
||||
once, so there's no reason to treat them as separate projects. Currently,
|
||||
the best way to deal with this is to concatenate the files into a single
|
||||
file, and operate on that.</p>
|
||||
@ -60,7 +60,7 @@ L1103_0 LDA #$22
|
||||
<p>Both sections start at $1100, and have branches to $1103. The branch
|
||||
in the first section resolves to the label in the first version of
|
||||
that address chunk, while the branch in the second section resolves to
|
||||
the label in the second chunk. When branches are outside the current
|
||||
the label in the second chunk. When branches originate outside the current
|
||||
address chunk, the first chunk that includes that address is used, as
|
||||
it is with the <code>JMP $1000</code> at the start of the file.</p>
|
||||
|
||||
@ -96,7 +96,7 @@ not help you debug 6502 projects.</p>
|
||||
multi-line comment (long comment, note). Useful for confirming that
|
||||
the width limitation is being obeyed. These are added exactly
|
||||
as shown, without comment delimiters, into generated assembly output,
|
||||
which doesn't work out well.</li>
|
||||
which doesn't work out well if you run the assembler.</li>
|
||||
<li>Use Keep-Alive Hack. If set, a "ping" is sent to the extension
|
||||
script sandbox every 60 seconds. This seems to be required to avoid
|
||||
an infrequently-encountered Windows bug. (See code for notes and
|
||||
|
@ -43,7 +43,7 @@ method in <code>DisasmProject.cs</code>):</p>
|
||||
The Anattrib array tracks most of the state from here on. If we're
|
||||
doing a partial re-analysis, this step will just clone a copy of the
|
||||
Anattrib array that was made at this point in a previous run. (This
|
||||
step is described in more detail <a href="code-analysis">below</a>.)</li>
|
||||
step is described in more detail below.)</li>
|
||||
<li>Apply user-specified labels to Anattribs.</li>
|
||||
<li>Apply user-specified format descriptors. These are the instruction
|
||||
and data operand formats.</li>
|
||||
@ -51,14 +51,14 @@ method in <code>DisasmProject.cs</code>):</p>
|
||||
data, and connects instruction and data operands to target offsets.
|
||||
The "nearby label" stuff is handled here. All of the results are
|
||||
stored in the Anattribs array. (This step is described in more
|
||||
detail <a href="data-analysis">below</a>.)</li>
|
||||
detail below.)</li>
|
||||
<li>Remove hidden labels from the symbol table. These are user-specified
|
||||
labels that have been placed on offsets that are in the middle of an
|
||||
instruction or multi-byte data item. They can't be referenced, so we
|
||||
want to pull them out of the symbol table. (Remember, symbolic
|
||||
operands use "weak references", so a missing symbol just means the
|
||||
operand is shown as a hex value.)</li>
|
||||
<li>Resolve references to platform and project external symbols>
|
||||
<li>Resolve references to platform and project external symbols.
|
||||
This sets the operand symbol in Anattrib, and adds the symbol to
|
||||
the list that is displayed in .EQ directives.</li>
|
||||
<li>Generate cross-reference lists. This is done for file data and
|
||||
@ -71,6 +71,103 @@ by walking through the annotated file data. Most of the actual strings
|
||||
aren't rendered until they're needed.</p>
|
||||
|
||||
|
||||
<h3><a name="auto-format">Automatic Formatting</a></h3>
|
||||
|
||||
<p>Every offset in the file is marked as an instruction byte, data byte, or
|
||||
inline data byte. Some offsets are also marked as the start of an instruction
|
||||
or data area. The start offsets may have a format descriptor associated
|
||||
with them.</p>
|
||||
<p>Format descriptors have a format (like "numeric" or "string"), a
|
||||
sub-format (like "hexadecimal" or "null-terminated"), and a length. For
|
||||
an instruction operand the length is redundant, but for a data operand it
|
||||
determines the width of the numeric value or length of the string. For
|
||||
this reason, instructions do not need a format descriptor, but all
|
||||
data items do.</p>
|
||||
<p>Symbolic references are format descriptors with a symbol attached.
|
||||
The symbol reference also specifies which part of the symbol's value is
used: low, high, or bank.</p>
|
||||
<p>Every offset marked as a start point gets its own line in the on-screen
|
||||
display list. Embedded instructions are identified internally by
|
||||
looking for instruction-start offsets inside instructions.</p>
|
||||
|
||||
<p>The Anattrib array holds the post-analysis state for every offset,
|
||||
including comments and formatting, but any changes you make in the
|
||||
editors are applied to the data structures that are saved in the project
|
||||
file. After a change is made, a full or partial re-analysis is done to
|
||||
fill out the Anattribs.</p>
|
||||
<p>Consider a simple example:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
|
||||
<p>We haven't formatted anything yet. The data analyzer sees that the
|
||||
JMP operand is inside the file, and has no label, so it creates an
|
||||
auto-label at offset +000003 and a format descriptor with a symbolic
|
||||
operand reference to "L1003" at +000000.</p>
|
||||
<p>Now we edit the label, changing L1003 to "FOO". This goes into the
|
||||
project's "user label" list. The analyzer is
|
||||
run, and applies the new "user label" to the Anattrib array. The
|
||||
data analyzer finds the numeric reference in the JMP operand, and finds
|
||||
a label at the target address, so it creates a symbolic operand reference
|
||||
to "FOO". When the display list is generated, the symbol "FOO" appears
|
||||
in both places.</p>
|
||||
<p>Even though the JMP operand changed from "L1003" to "FOO", the only
|
||||
change actually written to the project file is the label edit. The
|
||||
contents of the Anattrib array are disposable, so it can be used to
|
||||
add labels and "fix up" numeric references. Generated labels and
|
||||
format descriptors are never added to the project file.</p>
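<p>The label lookup in this example can be sketched roughly as follows. This
is an illustration only, not SourceGen's code: the function name
<code>resolve_operand</code> and its parameters are hypothetical, a single
address region starting at <code>org</code> is assumed, and the "nearby
label" logic is ignored.</p>

```python
from typing import Dict, Optional, Tuple


def resolve_operand(target_addr: int, org: int, file_len: int,
                    user_labels: Dict[int, str]) -> Optional[Tuple[int, str]]:
    """Return (offset, label) for an in-file target, or None if external.

    Hypothetical helper: assumes one address region that starts at `org`.
    """
    offset = target_addr - org
    if not 0 <= offset < file_len:
        return None  # target is outside the file, so nothing to label
    label = user_labels.get(offset)
    if label is None:
        # No user label: make a disposable auto-label from the address.
        label = f"L{target_addr:04X}"
    return offset, label


# JMP $1003 in a 4-byte file loaded at $1000.
print(resolve_operand(0x1003, 0x1000, 4, {}))          # (3, 'L1003')
print(resolve_operand(0x1003, 0x1000, 4, {3: "FOO"}))  # (3, 'FOO')
```

<p>Because the result is recomputed on every analysis pass, only the user
label dictionary would need to be saved; the generated "L1003" never has to
be stored.</p>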
|
||||
|
||||
<p>If the JMP operand were edited, a format descriptor would be added
|
||||
to the user-specified descriptor list. During the analysis pass it would
|
||||
be added to the Anattrib array at offset +000000.</p>
|
||||
|
||||
|
||||
<h3><a name="undo-redo">Interaction With Undo/Redo</a></h3>
|
||||
|
||||
<p>The analysis pass always considers the current state of the user
|
||||
data structures. Whether you're adding a label or removing one, the
|
||||
code runs through the same set of steps. The advantage of this approach
|
||||
is that the act of doing a thing, undoing a thing, and redoing a thing
|
||||
are all handled the same way.</p>
|
||||
<p>None of the editors modify the project data structures directly. All
|
||||
changes are added to a change set, which is processed by a single function.
|
||||
The change sets are kept in the undo/redo buffer indefinitely. After
|
||||
the changes are made, the Anattrib array and other data structures are
|
||||
regenerated.</p>
|
||||
|
||||
<p>Data format editing can create some tricky situations. For example,
|
||||
suppose you have 8 bytes that have been formatted as two 32-bit words:
|
||||
|
||||
<pre>
|
||||
1000: 68690074 .dd4 $74006968
|
||||
1004: 65737400 .dd4 $00747365
|
||||
</pre>
|
||||
|
||||
You realize these are null-terminated strings, select both words, and
|
||||
reformat them:
|
||||
|
||||
<pre>
|
||||
1000: 686900 .zstr "hi"
|
||||
1003: 74657374+ .zstr "test"
|
||||
</pre>
|
||||
|
||||
Seems simple enough. Under the hood, SourceGen created three changes:
|
||||
<ol>
|
||||
<li>At offset +000000, replace the current format descriptor (4-byte
|
||||
numeric) with a 3-byte null-terminated string descriptor.</li>
|
||||
<li>At offset +000003, add a new 5-byte null-terminated string
|
||||
descriptor.</li>
|
||||
<li>At offset +000004, remove the 4-byte numeric descriptor.</li>
|
||||
</ol>
|
||||
|
||||
<p>Each entry in the change set has "before" and "after" states for the
|
||||
format descriptor at a specific offset. Only the state for the affected
|
||||
offsets is included -- the program doesn't take a complete state snapshot
|
||||
(even with the RAM on a modern system that would add up quickly). When
|
||||
undoing a change, before and after are simply reversed.</p>
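<p>A minimal sketch of the before/after idea, using the three changes from
the example above. The class and function names here
(<code>FormatDescriptor</code>, <code>FormatChange</code>,
<code>apply_change_set</code>) are invented for illustration and are not
taken from the SourceGen sources.</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FormatDescriptor:
    kind: str       # e.g. "numeric" or "string"
    sub_kind: str   # e.g. "hex" or "null-terminated"
    length: int     # number of bytes covered


@dataclass(frozen=True)
class FormatChange:
    offset: int
    before: Optional[FormatDescriptor]  # None means "no descriptor here"
    after: Optional[FormatDescriptor]


def apply_change_set(formats: Dict[int, FormatDescriptor],
                     changes: List[FormatChange],
                     undo: bool = False) -> None:
    """Apply (or undo) a change set by swapping the before/after roles."""
    ordered = reversed(changes) if undo else changes
    for change in ordered:
        new_state = change.before if undo else change.after
        if new_state is None:
            formats.pop(change.offset, None)
        else:
            formats[change.offset] = new_state


DD4 = FormatDescriptor("numeric", "hex", 4)
ZSTR3 = FormatDescriptor("string", "null-terminated", 3)
ZSTR5 = FormatDescriptor("string", "null-terminated", 5)

formats = {0: DD4, 4: DD4}
change_set = [
    FormatChange(0, before=DD4, after=ZSTR3),
    FormatChange(3, before=None, after=ZSTR5),
    FormatChange(4, before=DD4, after=None),
]

apply_change_set(formats, change_set)             # do (or redo)
assert formats == {0: ZSTR3, 3: ZSTR5}
apply_change_set(formats, change_set, undo=True)  # undo
assert formats == {0: DD4, 4: DD4}
```

<p>Only the three affected offsets are recorded, so the cost of an undo
entry scales with the size of the edit rather than the size of the file.</p>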
|
||||
|
||||
|
||||
<h2><a name="code-analysis">Code Analysis</a></h2>
|
||||
|
||||
<p>The code tracer walks through the instructions, examining them to
|
||||
@ -81,8 +178,9 @@ for every instruction:</p>
|
||||
Examples: <code>LDA</code>, <code>STA</code>, <code>AND</code>,
|
||||
<code>NOP</code>.
|
||||
<li>Don't continue. The next instruction to be executed can't be
|
||||
determined from the file data, unless you're disassembling the
|
||||
system ROM. Examples: <code>RTS</code>, <code>BRK</code>.
|
||||
determined from the file data (unless you're disassembling the
|
||||
system ROM around the BRK vector).
|
||||
Examples: <code>RTS</code>, <code>BRK</code>.
|
||||
<li>Branch always. The operand specifies the next instruction address.
|
||||
Examples: <code>JMP</code>, <code>BRA</code>, <code>BRL</code>.
|
||||
<li>Branch sometimes. Execution may continue at the operand address,
|
||||
@ -96,8 +194,8 @@ for every instruction:</p>
|
||||
</ol>
|
||||
|
||||
<p>Branch targets are added to a list. When the current run of instructions
|
||||
is exhausted (i.e. a "don't continue" instruction is reached), the next
|
||||
target is pulled off of the list.</p>
|
||||
is exhausted (i.e. a "don't continue" or "branch always" instruction is
|
||||
reached), the next target is pulled off of the list.</p>
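<p>The target list can be pictured with a toy tracer like the one below.
It is a sketch under heavy assumptions: the "program" is a hand-built
dictionary rather than decoded 6502 bytes, only four of the flow categories
are modeled, and the names <code>Flow</code> and <code>trace</code> are
hypothetical.</p>

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Flow(Enum):
    CONTINUE = 1          # e.g. LDA, NOP
    NO_CONTINUE = 2       # e.g. RTS, BRK
    BRANCH_ALWAYS = 3     # e.g. JMP, BRA
    BRANCH_SOMETIMES = 4  # e.g. BEQ, BCC


# offset -> (instruction length, flow category, branch target or None)
Program = Dict[int, Tuple[int, Flow, Optional[int]]]


def trace(program: Program, entry: int) -> Set[int]:
    """Return the set of offsets identified as instruction starts."""
    pending = [entry]
    visited: Set[int] = set()
    while pending:
        offset = pending.pop()
        # Follow the current run of instructions until it is exhausted.
        while offset in program and offset not in visited:
            visited.add(offset)
            length, flow, target = program[offset]
            if target is not None and flow in (Flow.BRANCH_ALWAYS,
                                               Flow.BRANCH_SOMETIMES):
                pending.append(target)
            if flow in (Flow.NO_CONTINUE, Flow.BRANCH_ALWAYS):
                break  # run ends; pull the next target off the list
            offset += length
    return visited


toy = {
    0: (3, Flow.BRANCH_ALWAYS, 5),      # JMP to offset 5; offsets 3-4 are data
    5: (2, Flow.CONTINUE, None),        # LDA #$00
    7: (2, Flow.BRANCH_SOMETIMES, 11),  # BEQ to offset 11
    9: (1, Flow.CONTINUE, None),        # NOP
    10: (1, Flow.NO_CONTINUE, None),    # RTS
    11: (1, Flow.NO_CONTINUE, None),    # RTS
}
print(sorted(trace(toy, 0)))  # [0, 5, 7, 9, 10, 11]
```

<p>Offsets 3 and 4 are never reached, so they stay uncategorized and are
left for the data analyzer.</p>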
|
||||
|
||||
<p>The state of the processor status flags is recorded for every
|
||||
instruction. When execution proceeds to the next instruction or branches
|
||||
@ -116,18 +214,19 @@ of status flags, the analyzer stops pursuing that path.</p>
|
||||
when examining 65816 code, but it's possible for the status flag values
|
||||
to be indeterminate. In such a situation, short registers are assumed.
|
||||
Similarly, if the carry flag is unknown when an <code>XCE</code> is
|
||||
performed, we assume a transition to emulation mode.</p>
|
||||
performed, we assume a transition to emulation mode (E=1).</p>
|
||||
|
||||
<p>There are three ways to set a definite value in a status flags:</p>
|
||||
<p>There are three ways in which code can set a flag to a definite value:</p>
|
||||
<ol>
|
||||
<li>By specific instructions, like <code>SEC</code> or
|
||||
<li>By explicit instructions, like <code>SEC</code> or
|
||||
<code>CLD</code>.</li>
|
||||
<li>By immediate instructions. <code>LDA #$00</code> sets Z=1 and N=0.
|
||||
<code>ORA #$80</code> sets Z=0 and N=1.</li>
|
||||
<li>By immediate-operand instructions. <code>LDA #$00</code> sets Z=1
|
||||
and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
|
||||
<li>By inference. For example, if we see a <code>BCC</code> instruction,
|
||||
we know that the carry will be clear at the branch target address, and
|
||||
set at the following instruction. The instruction doesn't affect the
|
||||
value of the flag, but we know what the value is at either address.</li>
|
||||
value of the flag, but we know what the value will be at both
|
||||
addresses.</li>
|
||||
</ol>
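<p>As a rough sketch of the second and third cases, flags can be modeled as
values that are 0, 1, or indeterminate. The helper names below are
hypothetical, an 8-bit accumulator is assumed, and <code>None</code> stands
for "indeterminate".</p>

```python
from typing import Dict, Optional, Tuple

Flags = Dict[str, Optional[int]]  # flag name -> 0, 1, or None (unknown)


def lda_imm(flags: Flags, value: int) -> Flags:
    """LDA #value: Z and N are fully determined by the operand."""
    out = dict(flags)
    out["Z"] = 1 if (value & 0xFF) == 0 else 0
    out["N"] = 1 if value & 0x80 else 0
    return out


def ora_imm(flags: Flags, value: int) -> Flags:
    """ORA #value: the accumulator is unknown, so only some results are."""
    out = dict(flags)
    # Any set bit in the operand guarantees a nonzero result.
    out["Z"] = 0 if (value & 0xFF) != 0 else None
    # Bit 7 set in the operand guarantees a negative result.
    out["N"] = 1 if value & 0x80 else None
    return out


def bcc(flags: Flags) -> Tuple[Flags, Flags]:
    """BCC: returns (flags at branch target, flags at next instruction)."""
    taken = dict(flags, C=0)
    fall_through = dict(flags, C=1)
    return taken, fall_through


unknown: Flags = {"C": None, "Z": None, "N": None}
print(lda_imm(unknown, 0x00))  # {'C': None, 'Z': 1, 'N': 0}
print(ora_imm(unknown, 0x80))  # {'C': None, 'Z': 0, 'N': 1}
print(bcc(unknown)[0]["C"], bcc(unknown)[1]["C"])  # 0 1
```

<p>Note that <code>bcc</code> does not change the flag; it only records
what must be true along each path.</p>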
|
||||
<p>Self-modifying code can spoil any of these, possibly requiring a
|
||||
status flag override to get correct disassembly.</p>
|
||||
@ -145,7 +244,7 @@ code does <code>CLC</code>/<code>PHP</code>, followed a bit later by the
|
||||
flag around. Flagging the carry bit as indeterminate with a status flag
|
||||
override on the instruction following the PLP fixes things.)</p>
|
||||
|
||||
<p>Some other things that the code analyzer can't handle:</p>
|
||||
<p>Some other things that the code analyzer can't recognize automatically:</p>
|
||||
<ul>
|
||||
<li>Jumping indirectly through an address outside the file, e.g.
|
||||
storing an address in zero-page memory and jumping through it.
|
||||
@ -163,6 +262,26 @@ that it's equal to the program bank register ("K"). Handling this
|
||||
correctly will require improvements to the user interface.</p>
|
||||
|
||||
|
||||
<h3><a name="extension-scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts can mark data that follows a JSR or JSL as inline
|
||||
data, or change the format of nearby data or instructions. The first
|
||||
time a JSR/JSL instruction is encountered, all loaded extension scripts
|
||||
are offered a chance to act.</p>
|
||||
|
||||
<p>The first script that applies a format wins. Attempts to re-format
|
||||
instructions or data will fail. This rule ensures that anything explicitly
|
||||
formatted by the user will not be overridden by a script.</p>
|
||||
|
||||
<p>If code jumps into a region that is marked as inline data, the
|
||||
branch will be ignored. If an extension script tries to flag bytes
|
||||
as inline data that have already been executed, the script will be
|
||||
ignored. This can lead to a race condition in the analyzer if
|
||||
an extension script is doing the wrong thing. (The race doesn't exist
|
||||
with inline data hints specified by the user, because those are applied
|
||||
before code analysis starts.)</p>
|
||||
|
||||
|
||||
<h2><a name="data-analysis">Data Analysis</a></h2>
|
||||
<p>The data analyzer performs two tasks. It matches operands with
|
||||
offsets, and it analyzes uncategorized data. Either or both of
|
||||
@ -171,17 +290,17 @@ these can be disabled from the
|
||||
|
||||
<p>The data target analyzer examines every instruction and data operand
|
||||
to see if it's referring to an offset within the data file. If the
|
||||
target is within the file, and has a label, a weak symbolic reference
|
||||
to that label is added to the Anattrib array. If the target doesn't
|
||||
have a label, the analyzer will either use a nearby label, or generate
|
||||
a unique label and use that.</p>
|
||||
target is within the file, and has a label, a format descriptor with a
|
||||
weak symbolic reference to that label is added to the Anattrib array. If
|
||||
the target doesn't have a label, the analyzer will either use a nearby
|
||||
label, or generate a unique label and use that.</p>
|
||||
<p>While most of the "nearby label" logic can be disabled, targets that
|
||||
land in the middle of an instruction are always adjusted backward to
|
||||
the instruction start. This is necessary because labels are only visible
|
||||
if they're associated with the first (opcode) byte of an instruction.</p>
|
||||
|
||||
<p>The uncategorized data analyzer tries to find ASCII strings and
|
||||
opportunities to use the ".FILL" instruction. It breaks the file into
|
||||
opportunities to use the ".FILL" operation. It breaks the file into
|
||||
pieces, where contiguous regions hold nothing but data, are not split
|
||||
across a ".ORG" directive, are not interrupted by data, and do not
|
||||
contain anything that the user has chosen to format. Each region is
|
||||
|
@ -15,8 +15,8 @@
|
||||
|
||||
<p>SourceGen can generate an assembly source file that, when fed into
|
||||
the target assembler, will recreate the original data file exactly.
|
||||
Every assembler is different, so code must be written especially for
|
||||
each.<p>
|
||||
Every assembler is different, so support must be added to SourceGen
|
||||
for each.</p>
|
||||
<p>The generation / assembly dialog can be opened with File > Assemble.</p>
|
||||
|
||||
|
||||
@ -37,7 +37,7 @@ assembler. This is most easily understood with an example.</p>
|
||||
<code>54 02 01</code>, with the arguments reversed. cc65 v2.17 doesn't
|
||||
do that; this is a bug that was fixed in a later version. So if you're
|
||||
generating code for v2.17, you want to create source code with the
|
||||
arguments the other way around.</p>
|
||||
arguments the wrong way around.</p>
|
||||
<p>Having version-dependent source code is a bad idea, so SourceGen
|
||||
just outputs raw hex bytes for MVN/MVP instructions. This yields the
|
||||
correct code for all versions of the assembler, but is ugly and
|
||||
@ -56,7 +56,7 @@ intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
|
||||
generators may produce multiple source files, perhaps a link script or
|
||||
symbol definition header to go with the assembly source. To avoid
|
||||
spreading files across the filesystem, SourceGen does all of its work
|
||||
in the same directory where the project lives. So before you can generate
|
||||
in the same directory where the project lives. Before you can generate
|
||||
code, you have to have given your project a name by saving it.</p>
|
||||
|
||||
<p>The Generate and Assemble dialog has a drop-down list near the top
|
||||
@ -98,12 +98,12 @@ command-line output will be displayed, with stdout and stderr separated.
|
||||
provides.)</p>
|
||||
|
||||
<p>The output will show the assembler's exit code, which will be zero
|
||||
on success (note: sometimes they lie.) If it did, SourceGen will then
|
||||
compare the assembler's output to the original file, and report any
|
||||
differences.</p>
|
||||
on success (note: sometimes they lie). If it appeared to succeed,
|
||||
SourceGen will then compare the assembler's output to the original file,
|
||||
and report any differences.</p>
|
||||
<p>Failures here may be due to bugs in the cross-assembler or in
|
||||
SourceGen. However, SourceGen can generally work around assembler bugs,
|
||||
so any failure here is an opportunity for improvement.</p>
|
||||
so any failure is an opportunity for improvement.</p>
|
||||
|
||||
</div>
|
||||
|
||||
|
@ -16,13 +16,13 @@
|
||||
|
||||
<h2><a name="address">Edit Address</a></h2>
|
||||
<p>This adds a target address directive (".ORG") to the current offset.
|
||||
If you leave the field blank, the directive will be removed.</p>
|
||||
If you leave the text field blank, the directive will be removed.</p>
|
||||
<p>Addresses are always interpreted as hexadecimal. You can prefix
|
||||
it with a '$', but that's not necessary.</p>
|
||||
<p>24-bit addresses may be written with a bank separator, e.g. "12/3456"
|
||||
them with a '$', but that's not required.
|
||||
24-bit addresses may be written with a bank separator, e.g. "12/3456"
|
||||
would resolve to address $123456.</p>
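<p>The accepted input forms can be summarized with a small parser sketch.
This is an assumption about the rules described above, not the dialog's
actual code; <code>parse_address</code> is a hypothetical name, and the
digit-count limits (two for the bank, four after the separator, six
otherwise) are guesses.</p>

```python
import re

# Optional '$', then either "bank/addr" or a plain hex number.
_BANKED = re.compile(r"\$?([0-9A-Fa-f]{1,2})/([0-9A-Fa-f]{1,4})")
_PLAIN = re.compile(r"\$?([0-9A-Fa-f]{1,6})")


def parse_address(text: str) -> int:
    """Parse a hex address such as "1000", "$1000", or "12/3456"."""
    s = text.strip()
    m = _BANKED.fullmatch(s)
    if m:
        bank = int(m.group(1), 16)
        addr = int(m.group(2), 16)
        return (bank << 16) | addr
    m = _PLAIN.fullmatch(s)
    if m:
        return int(m.group(1), 16)
    raise ValueError(f"not a valid address: {text!r}")


print(hex(parse_address("12/3456")))  # 0x123456
print(hex(parse_address("$1000")))    # 0x1000
print(hex(parse_address("1000")))     # 0x1000
```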
|
||||
|
||||
<p>There will always be an address directive at the start of the list.
|
||||
<p>There will always be an address directive at the start of the file.
|
||||
Attempts to remove it will be ignored.</p>
|
||||
|
||||
|
||||
@ -34,14 +34,15 @@ that instruction. You can override the value of individual flags.</p>
|
||||
<p>The 65816 emulation bit, which is not part of the processor status
|
||||
register, may also be set in the editor.</p>
|
||||
<p>The M, X, and E flags will not be editable unless your CPU configuration
|
||||
is set to a 16-bit CPU.</p>
|
||||
is set to 65816.</p>
|
||||
|
||||
|
||||
<h2><a name="label">Edit Label</a></h2>
|
||||
<p>Sets or clears a label at the selected offset. The label must have
|
||||
the proper form, and not have the same name as another symbol.</p>
|
||||
the proper form, and not have the same name as another symbol. If
|
||||
you edit an auto-generated label, you will be required to change the name.</p>
|
||||
<p>The label may be marked as local, global, or global and exported.
|
||||
Local labels may be generated in the assembler output in a
|
||||
Local labels may be modified by the assembly code generator to have a more
|
||||
convenient form, such as a local loop identifier. Global labels are
|
||||
always output as-is. Exported labels are added to a table that may
|
||||
be imported by other projects.</p>
|
||||
@ -51,16 +52,17 @@ be imported by other projects.</p>
|
||||
<p>Operands can be displayed in a variety of numeric formats, or as a
|
||||
symbol. The ASCII character format is only available for operands
|
||||
whose value falls into the range of low- or high-ASCII characters.</p>
|
||||
<p>Symbols may be used in their entirety, or offset by a byte or two.
|
||||
<p>Symbols may be used in their entirety, or shifted and masked.
|
||||
The low / high / bank selector determines which byte is used as the
|
||||
low byte. For 16-bit operands, this acts as a shift rather than a byte
|
||||
select.</p>
|
||||
select. If the symbol is wider than the operand field, a mask will be
|
||||
applied automatically.</p>
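<p>The shift-and-mask behavior can be sketched in a few lines. The function
name and the <code>part</code> strings are made up for this illustration;
the only assumption taken from the text is that low/high/bank select a shift
of 0, 8, or 16 bits and that the result is masked to the operand width.</p>

```python
SHIFT_BITS = {"low": 0, "high": 8, "bank": 16}


def operand_value(symbol_value: int, part: str, operand_bytes: int) -> int:
    """Shift the symbol by the selected part, then mask to operand width."""
    mask = (1 << (8 * operand_bytes)) - 1
    return (symbol_value >> SHIFT_BITS[part]) & mask


addr = 0x123456
print(hex(operand_value(addr, "low", 1)))   # 0x56
print(hex(operand_value(addr, "high", 1)))  # 0x34
print(hex(operand_value(addr, "bank", 1)))  # 0x12
print(hex(operand_value(addr, "high", 2)))  # 0x1234
```

<p>The last line shows why, for a 16-bit operand, "high" behaves as a shift
rather than a byte select: both remaining bytes survive the mask.</p>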
|
||||
|
||||
<p>A few shortcuts are provided when specifying a symbol. As noted in
|
||||
the introductory sections, operand symbols are weak references. If the
|
||||
symbol hasn't been defined as a label yet, the operand will be formatted
|
||||
as hex, which is probably not what you want.</p>
|
||||
<p>The default behavior is to just set the operand's symbol.</p>
|
||||
<p>The default behavior is just to set the operand's symbol.</p>
|
||||
<p>For operands that target an offset inside the file, if the target
|
||||
address does not yet have a label, and the symbol doesn't exist, you may
|
||||
set the symbol as the label on the target address as well. You can do
|
||||
@ -84,24 +86,35 @@ future release.)</p>
|
||||
|
||||
<h2><a name="data">Edit Data Format</a></h2>
|
||||
<p>This dialog offers a variety of choices, and can be used to apply a
|
||||
format to a range of offsets. If the range crosses a visual boundary,
|
||||
format to a range of offsets. You must select all of the bytes you want
|
||||
to format. For example, to format two bytes as a 16-bit word, you must
|
||||
select both bytes in the editor. (If you click on the first item, then
|
||||
Shift+double-click on the operand field of the last item, you can do
|
||||
this very quickly.) The selection does not need to be contiguous: you
|
||||
can use Control+click to select scattered items.</p>
|
||||
<p>If the range is discontiguous, or crosses a visual boundary
|
||||
such as a change in address, a user-specified label, or a long comment
|
||||
or note, the region will be split. The top of the dialog indicates how
|
||||
many bytes have been selected, and how many regions they have been
|
||||
divided into.</p>
|
||||
or note, the selection will be split into smaller regions. A message at the
|
||||
top of the dialog indicates how many bytes have been selected, and how
|
||||
many regions they have been divided into.</p>
|
||||
<p>(End-of-line comments do <i>not</i> split a region, and will
|
||||
disappear if they end up inside a multi-byte data item.)</p>
|
||||
|
||||
<p>The "Simple Data" items behave the same as their equivalents in the
|
||||
Edit Operand dialog. However, because the width is not determined by
|
||||
an instruction opcode, you will need to specify how wide each item is,
|
||||
and the byte order.</p>
|
||||
<p>Suppose you find a table of 16-bit addresses in the code. Click on
|
||||
an instruction opcode, and multiple items can be selected, you will need
|
||||
to specify how wide each item is and what its byte order is. For data
|
||||
you also have the option of setting the format to "Address", which marks
|
||||
the selected bytes as a numeric reference.</p>
|
||||
|
||||
<p>Consider a simple example: suppose you find a table of 16-bit
|
||||
addresses in the code. Click on
|
||||
the first byte, shift-click the last byte, then select the Edit Data menu
|
||||
item. The number of bytes selected should be even. Select
|
||||
"16-bit words, little-endian", then to the right "Address". When you
|
||||
click OK, the selected data will be formatted as a series of 16-bit
|
||||
address values.</p>
|
||||
"16-bit words, little-endian", then over to the right click on
|
||||
"Address". When you click OK, the selected data will be formatted as a
|
||||
series of 16-bit address values. If the addresses can be resolved inside
|
||||
the data file, each address will be assigned a label.</p>
|
||||
|
||||
<p>The "Bulk Data" items can represent large chunks of data compactly.
|
||||
The "fill" option is only available if all selected bytes have the
|
||||
@ -161,8 +174,8 @@ want to limit the overall length if you're hoping to create 80-column
|
||||
output. Some retro assemblers may have hard line length limitations,
|
||||
which could result in the comment being truncated in generated sources.</p>
|
||||
<p>A semicolon (';') is placed at the start of the line. If an assembler
|
||||
has different conventions, a different character may be used. You don't
|
||||
need to include a delimiter in the comment field.</p>
|
||||
has different conventions, a different delimiter character may be used. You
|
||||
don't need to include a semicolon in the comment field.</p>
|
||||
|
||||
<p>Comments on platform symbols are read from the platform symbol file, and
|
||||
cannot be edited from within SourceGen. Comments on project symbols are
|
||||
@ -176,11 +189,11 @@ will be word-wrapped at a line width of your choosing. They're always
|
||||
drawn with a fixed-width font, so you can create ASCII-art diagrams.
|
||||
Comment delimiters are added automatically at the start of each line.</p>
|
||||
<p>For a true retro look you can "box" the comment with asterisks. You
|
||||
can create a fill-width row of asterisks by putting a '*' on a line by
|
||||
can create a full-width row of asterisks by putting a '*' on a line by
|
||||
itself. (Assembly source generators are allowed to use a character
|
||||
other than '*' for the output, e.g. they might use a full set of
|
||||
box outline characters, though that's somewhat against the spirit of
|
||||
the thing.)</p>
|
||||
the thing. Regardless, a solo '*' results in a line.)</p>
|
||||
<p>The bottom window will update automatically as you type, showing what
|
||||
the output is expected to look like. The actual assembler source output
|
||||
will depend on features of the target assembler, such as comment
|
||||
@ -226,7 +239,7 @@ the same way when used in a .EQ directive.</p>
|
||||
the .EQ directive.</p>
|
||||
<p>Symbols marked as "address" will be applied automatically when an
|
||||
operand references an address outside the scope of the data file. Symbols
|
||||
marked as "constant" will not, though you can still specify it manually.</p>
|
||||
marked as "constant" will not, though you can still specify them manually.</p>
|
||||
|
||||
</div>
|
||||
|
||||
|
@ -19,10 +19,10 @@ school in the late 1980s, I read Don Lancaster's
|
||||
<i>Enhancing Your Apple II, Vol. 1</i> (available for download
|
||||
<a href="https://www.tinaja.com/ebksamp1.shtml">here</a>). This
|
||||
included a very detailed methodology for disassembling 6502 software.
|
||||
I decided to give it a try, so I dumped a monitor listing of the
|
||||
operating system from an SSI game ("RDOS") to paper with my Epson
|
||||
RX-80 -- tractor feed paper was helpful for this sort of thing -- and
|
||||
set to work.</p>
|
||||
I wanted to give it a try, so I generated a monitor listing of an
|
||||
operating system (called "RDOS") that SSI used on their games, and
|
||||
printed it out on my Epson RX-80 -- tractor feed paper was helpful for
|
||||
this sort of thing -- then set to work.</p>
|
||||
|
||||
<p>Lancaster's methodology involved highlighting different types of
|
||||
instructions with different colors, making notes, and adding labels.
|
||||
@ -44,14 +44,17 @@ like a modern IDE, because I didn't just want it to translate machine code
|
||||
into readable form. I wanted it to help me with the process of
|
||||
understanding the code, by providing cross-reference tables and symbol
|
||||
lists and giving me a place to scribble notes to myself while I worked.
|
||||
Especially the note-scribbling.</p>
|
||||
I especially wanted the note-scribbling, because learning how something
|
||||
works is usually an iterative process, where the function of a chunk of
|
||||
code gradually reveals itself over time.</p>
|
||||
|
||||
<p>In 2002, while writing the 6502/65816 disassembler for CiderPress, I
|
||||
ran into the same problems I had with the original Apple II monitor: it
|
||||
blundered through data sections and got lost briefly when a new code
|
||||
section started. This made it annoying to use for even small binaries. I
|
||||
section started. You had to pick long or short registers for the entire
|
||||
disassembly, which made 65816 code something of a disaster. I
|
||||
jotted down some notes on what I thought the core features of a good
|
||||
6502 diassembler should be, then went back to work on other features. It
|
||||
6502 disassembler should be, then moved on to work on other features. It
|
||||
was another 15 years before I picked up the idea again.</p>
|
||||
|
||||
<p>More recently, I disassembled some code by dumping it to a text
|
||||
|
@ -54,6 +54,7 @@ and 65816 code. The official web site is
|
||||
<li><a href="mainwin.html#info">Info Window</a></li>
|
||||
<li><a href="mainwin.html#navigation">Navigation</a></li>
|
||||
<li><a href="mainwin.html#hints">Adding and Removing Hints</a></li>
|
||||
<li><a href="mainwin.html#toggle-format">Quick Format Toggle</a></li>
|
||||
<li><a href="mainwin.html#clipboard">Copying to Clipboard</a></li>
|
||||
</ul>
|
||||
</ul>
|
||||
@ -108,11 +109,6 @@ and 65816 code. The official web site is
|
||||
<li><a href="tools.html#ascii-chart">ASCII Chart</a></li>
|
||||
</ul>
|
||||
|
||||
<li><a href="tutorials.html">Tutorials</a></li>
|
||||
<ul>
|
||||
<li><a href="tutorials.html#basic-features">Basic Features</a></li>
|
||||
</ul>
|
||||
|
||||
<li><a href="advanced.html">Advanced Topics</a></li>
|
||||
<ul>
|
||||
<li><a href="advanced.html#multi-bin">Working With Multiple Binaries</a></li>
|
||||
@ -123,12 +119,26 @@ and 65816 code. The official web site is
|
||||
<li><a href="analysis.html">Appendix: Instruction and Data Analysis</a></li>
|
||||
<ul>
|
||||
<li><a href="analysis.html#analysis-process">Analysis Process</a></li>
|
||||
<ul>
|
||||
<li><a href="analysis.html#auto-format">Automatic Formatting</a></li>
|
||||
<li><a href="analysis.html#undo-redo">Interaction With Undo/Redo</a></li>
|
||||
</ul>
|
||||
<li><a href="analysis.html#code-analysis">Code Analysis</a></li>
|
||||
<ul>
|
||||
<li><a href="analysis.html#extension-scripts">Extension Scripts</a></li>
|
||||
</ul>
|
||||
<li><a href="analysis.html#data-analysis">Data Analysis</a></li>
|
||||
</ul>
|
||||
|
||||
<li><a href="end-notes.html">End Notes</a></li>
|
||||
|
||||
<br/>
|
||||
|
||||
<li><a href="tutorials.html">Tutorials</a></li>
|
||||
<ul>
|
||||
<li><a href="tutorials.html#basic-features">Basic Features</a></li>
|
||||
</ul>
|
||||
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -28,7 +28,7 @@ navigate the code while trying to figure out what it does. A
|
||||
disassembler should help you understand the code, not just dump the
|
||||
instructions to a text file.</p>
|
||||
<p>The computer I built in 2014 has a 4GHz CPU and 8GB of RAM.
|
||||
We should put that to good use.</p>
|
||||
I figured we should put that kind of power to good use.</p>
|
||||
|
||||
<p>The second purpose is to facilitate sharing and collaboration. Most
|
||||
disassemblers generate output for a specific assembler, or in a way that's
|
||||
@ -49,12 +49,13 @@ capabilities within SourceGen are sufficiently flexible. If you need to
|
||||
generate assembly source and tweak it a bunch to express the intent of
|
||||
the original code, then passing a SourceGen project around won't work.
|
||||
This sort of thing is a bit outside the bounds of what a typical
|
||||
disassembler does, so it remains to be seen whether this succeeds at
|
||||
what it's trying to do, and also whether what it's trying to do is actually
|
||||
something that people want.</p>
|
||||
disassembler does, so it remains to be seen whether SourceGen succeeds at
|
||||
what it's trying to do, and also whether what it's trying to do is
|
||||
something that people actually want.</p>
|
||||
|
||||
<p>You can get started by watching the demo video and playing with the
|
||||
tutorials.</p>
|
||||
<p>You can get started by watching the
|
||||
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and playing with the
|
||||
<a href="tutorials.html">tutorials</a>.</p>
|
||||
|
||||
|
||||
<h2><a name="fundamental-concepts">Fundamental Concepts</a></h2>
|
||||
@ -63,7 +64,7 @@ tutorials.</p>
|
||||
rest of the documentation assumes you've read and understood this. It will
|
||||
be helpful if you already understand something about the 6502 instruction
|
||||
set and assembly-language programming, but disassembling other programs is
|
||||
actually a pretty good way to learn assembly.</p>
|
||||
actually a pretty good way to learn how to code in assembly.</p>
|
||||
|
||||
<h2><a name="begin">About 6502 Code</a></h2>
|
||||
|
||||
@ -71,21 +72,24 @@ actually a pretty good way to learn assembly.</p>
|
||||
the 6502 CPU or any of its derivatives, including but not limited to
|
||||
the 65C02 and 65816". So let's talk about 6502 code.</p>
|
||||
|
||||
<p>Code usually arrives in a big binary blob. Some of it will be
|
||||
instructions, some of it will be data, some will be empty space used
|
||||
for variable storage. Part of the challenge of disassembly is
|
||||
identifying which parts of the file contain which.</p>
|
||||
|
||||
<p>Much of the code you'll find for the 6502 was written by humans,
|
||||
rather than generated by a compiler, which means it won't conform to a
|
||||
specific set of conventions. However, most programmers will use
|
||||
subroutines, and will often intersperse code with bits of data storage
|
||||
for variables. The variable data storage is referred to as a "stash".
|
||||
standard set of conventions. However, most programmers will use
|
||||
subroutines, which can be identified and analyzed in isolation. Subroutines
|
||||
are often interspersed with variable storage, referred to as a "stash".
|
||||
Variables may be single-byte or multi-byte, the latter typically
|
||||
in little-endian byte order.</p>
|
||||
|
||||
<p>Data that is principally read-only can take many forms. Among the
|
||||
more common forms are graphics and ASCII string data. The former is
|
||||
generally difficult to recognize automatically, but strings can often be
|
||||
identified. Address tables, which are a collection of addresses to
|
||||
other things, are also fairly common. When used as jump tables, they
|
||||
might actually refer to the address before the actual instruction, because
|
||||
of the way the RTS (Return to Subroutine) instruction works.</p>
|
||||
<p>Much of the data in a typical program is read-only, often in the
|
||||
form of graphics or ASCII string data. Graphics can be difficult
|
||||
to recognize automatically, but strings can be identified with a
|
||||
reasonable degree of confidence. Address tables, which are a collection
|
||||
of addresses to other things, are also fairly common.</p>
|
||||
|
||||
<p>A simple disassembler would start at the top of the file and just
|
||||
start converting bytes to instructions. Unfortunately there's no reliable
|
||||
@ -127,14 +131,17 @@ by the program bank register and the data bank register, respectively.
|
||||
The disassembler can't generally know the contents of the data bank
|
||||
register, which makes life a bit more interesting.</p>
|
||||
|
||||
<p>The 6502 has an 8-bit processor status register with a bunch of flags
|
||||
in it. One use of certain flags is to determine whether a
|
||||
conditional branch is taken or not.
|
||||
Two flags that are only present on the 65816 (M and X) are especially
|
||||
interesting, because they determine whether the accumulator and index
|
||||
registers are 8 or 16 bits wide. This determines the width of immediate-mode
|
||||
instructions, so if you don't know what's in the processor status
|
||||
register it's hard to correctly disassemble the instruction stream.</p>
|
||||
<p>The 6502 has an 8-bit processor status register ("P") with a bunch of flags
|
||||
in it. Some of the flags determine whether a conditional branch is taken
|
||||
or not, which is important because some branches appear to be conditional
|
||||
but actually are always or never taken in practice. The disassembler needs
|
||||
to be able to figure this out so that it doesn't try to disassemble the
|
||||
bytes that follow an always-taken branch.
|
||||
A more significant concern is the M and X flags found on the 65802/65816,
|
||||
which determine the width of the registers and of immediate load
|
||||
instructions. If you don't know what state the flags are in, you can't
|
||||
know whether <code>LDA #value</code> is two bytes or three, and the
|
||||
disassembly of the instruction stream will come out wrong.</p>
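<p>The effect of the M flag on instruction length can be sketched in a few
lines of Python. This is only an illustration; the function name and the
use of <code>None</code> for "unknown" are assumptions, not part of
SourceGen.</p>

```python
def lda_immediate_length(m_flag):
    """Return the length in bytes of LDA #value on a 65816.

    m_flag: 1 for an 8-bit accumulator, 0 for a 16-bit accumulator,
    or None when the state of the flag is not known.
    """
    if m_flag is None:
        # Without the flag state the operand width is ambiguous.
        raise ValueError("M flag unknown; cannot size the instruction")
    # One opcode byte plus a one-byte or two-byte immediate operand.
    return 2 if m_flag == 1 else 3
```

<p>A wrong guess shifts every following instruction by one byte, which is
why the disassembly "comes out wrong" rather than just mislabeling one
operand.</p>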
|
||||
|
||||
|
||||
<h2><a name="sgintro">How SourceGen Works</a></h2>
|
||||
@ -145,9 +152,9 @@ only its effect on the flow of execution matters.
|
||||
|
||||
<p>The code tracing has to start somewhere, so SourceGen uses "code entry
|
||||
point hints" to identify places where execution may begin. By default,
|
||||
one is placed at the start of the file. From there, the tracing process
|
||||
a hint is placed at the start of the file. From there, the tracing process
|
||||
walks through the code, pursuing all branches. In many cases, if you
|
||||
mark all code entry points, SourceGen will automatically find all
|
||||
mark all external entry points, SourceGen will automatically find all
|
||||
executable code and separate it from variable storage and data areas.</p>
|
||||
|
||||
<p>As noted earlier, tracking the processor status flags can make the
|
||||
@ -155,7 +162,7 @@ analysis more accurate. Identifying situations where a branch instruction
|
||||
is always or never taken avoids mis-categorizing a data region as code.
|
||||
On the 65816, it's absolutely crucial to track the M/X flags, since those
|
||||
affect the width of instructions. SourceGen tracks the value of the
|
||||
processor flags at every instruction, blending sets together when
|
||||
processor flags at every instruction, blending sets of flags together when
|
||||
multiple paths of execution converge.</p>
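<p>One plausible way to blend flag sets is to keep a flag's value only when
both incoming paths agree on it, and otherwise treat it as unknown. The
sketch below assumes that rule and a dictionary representation; it is not
taken from SourceGen's source code.</p>

```python
def blend_flags(a, b):
    """Merge two flag sets from converging execution paths.

    Each set maps a flag name to 0, 1, or None (unknown). A flag keeps
    its value only if both paths agree; otherwise it becomes unknown.
    """
    return {name: (a[name] if a[name] == b[name] else None) for name in a}
```

<p>For example, blending <code>{"M": 1, "X": 0}</code> with
<code>{"M": 1, "X": 1}</code> leaves M known and X unknown.</p>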
|
||||
|
||||
<p>Once instructions and data have been separated, the instruction operands
|
||||
@ -172,23 +179,16 @@ by an equate directive.</p>
|
||||
<h3><a name="scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts are C# source files that are compiled and
|
||||
executed by SourceGen. They can be added to a project from the RuntimeData
|
||||
directory or the directory the project file lives in.</p>
|
||||
<p>In v1.0, scripts are only called to examine JSR/JSL instructions.
|
||||
They can format nearby bytes as inline data, or apply symbols to
|
||||
operands.</p>
|
||||
|
||||
<p>If code jumps into a region that is marked as inline data, the
|
||||
branch will be ignored. If an extension script tries to flag bytes
|
||||
as inline data that have already been executed, the script will be
|
||||
ignored. This can lead to a race condition in the analyzer if
|
||||
an extension script is doing the wrong thing. (The race doesn't exist
|
||||
with inline data hints specified by the user, because those are applied
|
||||
before code analysis starts.)</p>
|
||||
executed by SourceGen. They can be added to a project from SourceGen's
|
||||
runtime data directory, or can live in the directory next to the project
|
||||
file.</p>
|
||||
<p>In the current implementation, scripts are only called to examine
|
||||
JSR/JSL instructions. They can format nearby bytes as inline data, or
|
||||
apply symbols to operands.</p>
|
||||
|
||||
<p>To reduce the chances of a script causing problems, all scripts are
|
||||
executed in a sandbox with severely restricted access. Notably, nothing
|
||||
in the script can access files, except to read those in the PluginDll
|
||||
in the sandbox can access files, except to read files from the PluginDll
|
||||
directory.</p>
|
||||
<p>The PluginDll directory lives next to the SourceGen executable, and
|
||||
contains all of the compiled script DLLs, as well as two pre-built
|
||||
@ -199,10 +199,9 @@ is launched, but may be manually deleted without harm.</p>
|
||||
|
||||
<h3><a name="hints">Analyzer Hints</a></h3>
|
||||
|
||||
<p>Sometimes SourceGen can't automatically find the start or end of a
|
||||
code area. Maybe there's inline data after a JSR that didn't get
|
||||
recognized by an extension scripts. These situations can be resolved
|
||||
by adding an appropriate hint.</p>
|
||||
<p>Sometimes SourceGen can't automatically find the start or end of an
|
||||
instruction stream, or gets confused by inline data. These situations
|
||||
can be resolved by adding an appropriate hint.</p>
|
||||
|
||||
<p><b>Code entry point hints</b> tell the analyzer to add the offset
|
||||
to the list of instruction start points. Suppose you've got a code
|
||||
@ -247,9 +246,9 @@ end up with this:</p>
|
||||
<pre>
|
||||
.ORG $1000
|
||||
JMP L1009
|
||||
JMP ⏩ L10ef
|
||||
BPL ⏩ L1053
|
||||
JMP ⏩ L1230
|
||||
JMP ⏩ L10ef
|
||||
BPL ⏩ L1053
|
||||
JMP ⏩ L1230
|
||||
BMI L101b
|
||||
L1009 CLC
|
||||
</pre>
|
||||
@ -276,7 +275,7 @@ would actually be better solved by setting a status flag override on
|
||||
the BNE that sets Z=0, so the code tracer will know it's a branch-always
|
||||
and do the right thing.) It's only necessary to place a hint on the
|
||||
very first (opcode) byte. Placing a data hint in the middle of what
|
||||
SourceGen believes is an instruction will have no effect.</p>
|
||||
SourceGen believes to be an instruction will have no effect.</p>
|
||||
|
||||
<p><b>Inline data hints</b> identify bytes as being part of the
|
||||
instruction stream, but not instructions. A simple example of this
|
||||
@ -285,11 +284,13 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
|
||||
JSR $bf00
|
||||
.DD1 $function
|
||||
.DD2 $address
|
||||
BCS BAD
|
||||
</pre>
|
||||
|
||||
<p>The three bytes following a JSR to $bf00 should be skipped over by
|
||||
the code analyzer. In this case, all three bytes must be hinted.</p>
|
||||
<p>If code jumps into a region that is marked as inline data, the
|
||||
<p>The three bytes following the <code>JSR $bf00</code> should be hinted
|
||||
as inline data, so that the code analyzer skips them and continues the
|
||||
analysis at the <code>BCS</code>.</p>
|
||||
<p>If code branches into a region that is marked as inline data, the
|
||||
branch will be ignored.</p>
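<p>The ProDOS 8 case can be sketched as follows. The function and constant
names are hypothetical; the only facts assumed are that <code>JSR</code>
absolute is opcode $20 with a little-endian operand, and that three inline
bytes follow a call to $bf00.</p>

```python
JSR_ABS = 0x20        # opcode for JSR with an absolute operand
PRODOS_MLI = 0xBF00   # ProDOS 8 call entry point

def next_instruction_offset(data, offset):
    """Return the offset where analysis should resume after a JSR."""
    if data[offset] != JSR_ABS:
        raise ValueError("not a JSR instruction")
    # The 16-bit operand is stored low byte first.
    target = data[offset + 1] | (data[offset + 2] << 8)
    length = 3
    if target == PRODOS_MLI:
        # Skip the function byte and the two-byte parameter address.
        length += 3
    return offset + length
```

<p>For the byte sequence <code>20 00 bf c8 34 12 b0 02</code>, analysis
resumes at offset 6, which is the <code>BCS</code>.</p>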
|
||||
|
||||
|
||||
@ -303,9 +304,9 @@ of the work being disassembled. (This will vary by region. Also, note
|
||||
that the mere act of disassembling a piece of software may be illegal in
|
||||
some cases.)</p>
|
||||
|
||||
<p>To avoid mix-ups, the data file's length and CRC are stored in the
|
||||
project file. SourceGen will refuse to open a project if the data file's
|
||||
length and CRC don't match.</p>
|
||||
<p>To avoid mix-ups where the wrong data file is used, the file's length
|
||||
and CRC are stored in the project file. SourceGen will refuse to open a
|
||||
project if the data file's length and CRC don't match.</p>
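<p>A check of this kind might look like the sketch below. The text does not
say which CRC variant is used, so CRC-32 as computed by Python's
<code>zlib</code> is an assumption here, as is the function name.</p>

```python
import zlib

def data_file_matches(data, expected_length, expected_crc):
    """Return True if the data has the recorded length and CRC-32."""
    if len(data) != expected_length:
        return False
    # Mask to 32 bits so the value is always an unsigned integer.
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc
```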
|
||||
|
||||
<p>Most of the data in the project file is associated with a file offset.
|
||||
When you create a comment, you aren't associating it with line 53, you're
|
||||
@ -317,14 +318,20 @@ convention, file offsets are always shown as a six-digit hexadecimal value
|
||||
with a leading '+', e.g. "+0012ab". This makes it easy to distinguish
|
||||
between an address and an offset.</p>
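<p>The display convention is easy to reproduce. A minimal sketch, with a
hypothetical function name:</p>

```python
def format_offset(offset):
    """Format a file offset as '+' followed by six lowercase hex digits."""
    return "+%06x" % offset
```

<p>With this, offset 0x12ab is shown as <code>+0012ab</code>.</p>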
|
||||
|
||||
<p>Instruction and data operands can be formatted in various ways. The
|
||||
formatting choice is associated with the first offset of the item. For
|
||||
instructions the number of bytes in the operand is determined by the opcode
|
||||
(and, on the 65816, the M/X status flags). For data items the length
|
||||
can be a single byte or an entire file. Operand formats are not allowed
|
||||
to overlap.</p>
|
||||
|
||||
<p>When an instruction or data operand references an address, we call
|
||||
it a <b>numeric reference</b>. When the target address has a label, and
|
||||
the operand uses that symbol, we call that a <b>symbolic reference</b>.
|
||||
SourceGen tries to establish symbolic references whenever possible,
|
||||
so that the generated assembly source doesn't refer to hard-coded
|
||||
locations within the program.</p>
|
||||
<p>Data operands can also be numeric references. From the Edit Data
|
||||
dialog, select the "Address" format.</p>
|
||||
locations within the program. Labels are generated automatically for
|
||||
the targets of numeric references.</p>
|
||||
|
||||
<p>As your understanding of the disassembled code develops, you will want
|
||||
to add comments explaining it. SourceGen projects have three kinds of
|
||||
@ -339,32 +346,38 @@ comments:</p>
|
||||
are a way for you to leave notes to yourself, perhaps "don't forget
|
||||
to figure this out" or "this is the cool part".
|
||||
</ol>
|
||||
<p>Each offset can have one of each.</p>
|
||||
<p>Every file offset can have one of each.</p>
|
||||
|
||||
<p>Labels and comments may disappear if you associate them with a file
|
||||
offset that is in the middle of a multi-byte instruction or data item.
|
||||
For example, suppose you put a long comment at offset +000010, and then
|
||||
mark a 50-byte region starting at offset +000008 as an ASCII string. The
|
||||
comment won't be deleted, but won't be displayed either. The same thing
|
||||
happens to labels.</p>
|
||||
can happen to labels. SourceGen will try to prevent this from happening
|
||||
by splitting formatted data into sub-regions at label boundaries.</p>
|
||||
|
||||
|
||||
<h2><a name="about-symbols">All About Symbols</a></h2>
|
||||
|
||||
<p>A symbol has two parts, a label and a value. The value may be an
|
||||
address or a numeric constant. Symbols can be defined in different ways,
|
||||
and applied in different ways.</p>
|
||||
<p>A symbol has two parts, a label and a value. The label is a short
|
||||
ASCII string; the value may be an 8-to-24-bit address or a numeric
|
||||
constant. Symbols can be defined in different ways, and applied in
|
||||
different ways.</p>
|
||||
|
||||
<p>The label format is restricted:</p>
|
||||
<p>The label syntax is restricted to a format that should be compatible
|
||||
with most assemblers:</p>
|
||||
<ul>
|
||||
<li>2-32 characters long.
|
||||
<li>Starts with a letter or underscore.
|
||||
<li>Comprised of ASCII letters, numbers, and the underscore.
|
||||
</ul>
|
||||
<p>Label comparisons are case-sensitive, as is customary for programming
|
||||
languages.</p>
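<p>The three rules above can be expressed as a single regular expression.
This is a sketch of the stated rules only; the function name is
hypothetical, and SourceGen may apply additional checks.</p>

```python
import re

# First character: letter or underscore. Then 1 to 31 more characters
# drawn from ASCII letters, digits, and underscore (2-32 in total).
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")

def is_valid_label(label):
    """Return True if the label satisfies the length and character rules."""
    return _LABEL_RE.fullmatch(label) is not None
```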
|
||||
|
||||
<p><b>Platform symbols</b> are defined in platform symbol files, which
|
||||
have a ".sym65" filename extension. Several come with SourceGen and
|
||||
live in the <code>RuntimeData</code> directory. You can also create your
|
||||
<p><b>Platform symbols</b> are defined in platform symbol files. These
|
||||
are named with a ".sym65" extension, and have a fairly straightforward
|
||||
name/value syntax. Several files for popular platforms come with SourceGen
|
||||
and live in the <code>RuntimeData</code> directory. You can also create your
|
||||
own, but they have to live in the same directory as the project file.</p>
|
||||
|
||||
<p>Platform symbols can be addresses or constants. If an instruction
|
||||
@ -384,7 +397,7 @@ creating two symbols with the same name. If two symbols have the same
|
||||
value, the one whose label comes first alphabetically is used.</p>
|
||||
|
||||
<p>Project symbols always have precedence over platform symbols, allowing
|
||||
you to redefine symbols within a project. (You can "block" a platform
|
||||
you to redefine symbols within a project. (You can "hide" a platform
|
||||
symbol by creating a project symbol with the same name and an unused
|
||||
value, such as $ffffffff.)</p>
|
||||
|
||||
@ -400,8 +413,8 @@ instructions or data offsets that are the target of operands. They're
|
||||
formed by appending the hexadecimal address to the letter "L", with
|
||||
additional characters added if some other symbol has already defined
|
||||
that label. Auto labels are only added where they are needed. Because
|
||||
auto labels may be redefined at any time, the editor will try to prevent
|
||||
you from using them in operands.</p>
|
||||
auto labels may be redefined or disappear, the editor will try to prevent
|
||||
you from referring to them when editing operands.</p>
|
||||
|
||||
<p>Operands may use parts of symbols. For example, if you have a label
|
||||
<code>MYSTRING</code>, you can write:</p>
|
||||
@ -414,7 +427,7 @@ MYSTRING .STR "hello"
|
||||
</pre>
|
||||
|
||||
<p>The format editor allows you to choose which part of the symbol's
|
||||
value to use. If the value doesn't match exactly, and adjustment will
|
||||
value to use. If the value doesn't match exactly, an adjustment will
|
||||
be applied.</p>
|
||||
|
||||
<h3><a name="weak-refs">Weak References</a></h3>
|
||||
@ -451,9 +464,9 @@ results are probably not what you want:</p>
|
||||
</pre>
|
||||
|
||||
<p>This happened because you added a weak reference to "FOO" in the operand,
|
||||
but the label doesn't exist. The operand is formatted as hex. This also
|
||||
means that there's no longer a need for an auto label on the NOP instruction,
|
||||
so SourceGen removed that as well.</p>
|
||||
but the label doesn't exist. The operand is formatted as hex. Because
|
||||
there's no longer a reference to L1003, SourceGen removed the auto-label
|
||||
as well.</p>
|
||||
|
||||
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
|
||||
probably wanted:</p>
|
||||
@ -518,7 +531,9 @@ and jumps to it with the RTS instruction. However, RTS requires the
|
||||
address of the byte before the target instruction, so we actually push
|
||||
$1006.</p>
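<p>The arithmetic behind the pushed value, as a small sketch (the function
name is hypothetical). RTS pulls the low byte first and then the high byte,
so the high byte is pushed first.</p>

```python
def rts_push_bytes(target):
    """Return (high, low) bytes to push so that RTS lands on target."""
    # RTS adds one to the address it pulls, so push target - 1.
    addr = (target - 1) & 0xFFFF
    return (addr >> 8, addr & 0xFF)
```

<p>For a target of $1007 this yields $10 and $06, matching the example.</p>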
|
||||
|
||||
<p>After adding a code hint at $1007, the project looks like this:</p>
|
||||
<p>The disassembler won't know that offset $1007 is code because nothing
|
||||
appears to reference it. After adding a code hint at $1007, the project
|
||||
looks like this:</p>
|
||||
<pre>
|
||||
LDA #$10
|
||||
PHA
|
||||
|
@ -31,7 +31,7 @@ incomplete. The maximum size for a data file is currently 1 MiB.</p>
|
||||
|
||||
<p>The first time you save the project (with File > Save), you will be
|
||||
prompted for the project name. It's best to use the data file's name
|
||||
with ".dis65" added. This will be configured automatically. The data
|
||||
with ".dis65" added, so this will be set as the default. The data
|
||||
file's name is not stored in the project file, so if you pick a different
|
||||
name, or save the project in a different directory, you will have to
|
||||
select the data file manually whenever you open the project.</p>
|
||||
@ -58,7 +58,7 @@ to cancel the loading of the project.</p>
|
||||
<p>The locations of the last few projects you've worked with are saved
|
||||
in the application settings. You can access them from
|
||||
File > Recent Projects. If no project is open, links to the two
|
||||
most-recently opened projects will be available.</p>
|
||||
most-recently-opened projects will be available.</p>
|
||||
|
||||
|
||||
<h2><a name="working">Working With a Project</a></h2>
|
||||
@ -70,7 +70,7 @@ most-recently opened projects will be available.</p>
|
||||
<li>Top left: cross-reference list.
|
||||
<li>Bottom left: notes list.
|
||||
<li>Top right: symbols list.
|
||||
<li>Bottom right: line info.
|
||||
<li>Bottom right: info on selected line.
|
||||
</ol>
|
||||
|
||||
<p>Most of the action takes place in the center code list.</p>
|
||||
@ -94,10 +94,12 @@ assembler directive.</p>
|
||||
correspond to the instruction or data. To see the full dump of
|
||||
a longer item, such as an ASCII string, double-click on the field
|
||||
to open the
|
||||
<a href="tools.html#hexdump">Hex Dump Viewer</a>. (Note this is
|
||||
a floating window, so you can keep it open while you work.)</li>
|
||||
<a href="tools.html#hexdump">Hex Dump Viewer</a>. This is
|
||||
a floating window, so you can keep it open while you work.
|
||||
Double-clicking in the bytes column in other rows will update
|
||||
the window position and selection.</li>
|
||||
<li><b>Flags</b>. This shows the state of the status flags as they
|
||||
were before the instruction was executed. Double-click on this
|
||||
are before the instruction is executed. Double-click on this
|
||||
field to open the
|
||||
<a href="editors.html#flags">Edit Status Flag Override</a> dialog.</li>
|
||||
<li><b>Attributes</b>. Some instructions and data items have
|
||||
@ -115,8 +117,8 @@ assembler directive.</p>
|
||||
If an instruction is embedded inside this one, a ⏩ symbol
|
||||
will appear.
|
||||
If you double-click this field for an instruction or data item
|
||||
whose operand refers to an address in the file, the view will jump to
|
||||
that location.</li>
|
||||
whose operand refers to an address in the file, the selection will
|
||||
jump to that location.</li>
|
||||
<li><b>Operand</b>. The instruction or data operand. Data operands
|
||||
may span a large number of bytes. Double-click on this field to
|
||||
open the
|
||||
@ -177,7 +179,7 @@ enabled will depend on what you have selected in the main window.</p>
|
||||
when a single equate directive, generated from a project symbol, is
|
||||
selected.</li>
|
||||
|
||||
<li><a href="#hinting">Hinting</a> (Hint As Code Entry Point, Hint As
|
||||
<li><a href="#hints">Hinting</a> (Hint As Code Entry Point, Hint As
|
||||
Data Start, Hint As Inline Data, Remove Hints). Enabled when one or more
|
||||
code and data lines are selected. Remove Hints is only enabled when
|
||||
at least one line has hints.</li>
|
||||
@ -187,7 +189,8 @@ enabled will depend on what you have selected in the main window.</p>
|
||||
<li>Delete Note / Long Comment. Deletes the selected note or long
|
||||
comment. Enabled when a single note or long comment is selected.</li>
|
||||
<li><a href="tools.html#hexdump">Show Hex Dump</a>. Opens the hex dump
|
||||
viewer with the current selection highlighted. Always enabled.</li>
|
||||
viewer, with the current selection highlighted. Always enabled. If
|
||||
nothing is selected, the viewer will open at the top of the file.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
@ -199,8 +202,8 @@ change with Edit > Redo, Ctrl+Y, or Ctrl+Shift+Z.</p>
|
||||
are added to the undo/redo buffer. This has no fixed size limit, so no
|
||||
matter how much you change, you can always undo back to the point where
|
||||
the project was opened.</p>
|
||||
<p>The undo buffer is not saved as part of the project, so closing and
|
||||
reopening the project resets the buffer.</p>
|
||||
<p>The undo history is not saved as part of the project. Closing a project
|
||||
clears the buffer.</p>
|
||||
|
||||
|
||||
<h3><a name="references">References Window</a></h3>
|
||||
@ -264,7 +267,9 @@ Use Edit > Find Next to find the next match.</p>
|
||||
|
||||
<p>Use Edit > Go To to jump to an offset, address, or label. Remember
|
||||
that offsets and addresses are always hexadecimal, and offsets start
|
||||
with a '+'.</p>
|
||||
with a '+'. If you have a label that is also a valid hexadecimal
|
||||
address, like "FEED", the label takes precedence. To jump to the address
|
||||
write "$FEED" instead.</p>
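<p>The precedence rules can be sketched as a small parser. The function
name, the tuple return values, and the set of known labels are assumptions
made for illustration.</p>

```python
def parse_goto(text, labels):
    """Classify Go To input as an offset, a label, or an address."""
    text = text.strip()
    if text.startswith("+"):
        # Offsets are hexadecimal with a leading '+'.
        return ("offset", int(text[1:], 16))
    if text in labels:
        # A label wins over a hex address with the same spelling.
        return ("label", text)
    if text.startswith("$"):
        # An explicit '$' forces interpretation as an address.
        text = text[1:]
    return ("address", int(text, 16))
```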
|
||||
|
||||
<p>When you jump around, by double-clicking on an opcode or an entry
|
||||
in one of the side windows, the currently-selected line is added to
|
||||
@ -291,6 +296,17 @@ entirely from the
|
||||
<a href="settings.html#project-properties">project properties</a> editor.
|
||||
|
||||
|
||||
<h3><a name="toggle-format">Quick Format Toggle</a></h3>
|
||||
|
||||
<p>The "Toggle Single-Byte Format" feature provides a quick way to
|
||||
change a range of bytes to single bytes
|
||||
or back to their default format. It's equivalent to opening the Edit
|
||||
Data Format dialog and selecting "Single bytes" displayed as hex, or
|
||||
selecting "Default".</p>
|
||||
<p>This can be handy if the default format for a range of bytes is a
|
||||
string, but you want to see it as bytes or set a label in the middle.</p>
|
||||
|
||||
|
||||
<h3><a name="clipboard">Copying to Clipboard</a></h3>
|
||||
|
||||
<p>When you use Edit > Copy, all lines selected in the code list are
|
||||
@ -298,14 +314,16 @@ copied to the system clipboard. This can be a convenient way to post
|
||||
code snippets into forum postings or documentation. The text is
|
||||
copied from the data shown on screen, so your chosen capitalization
|
||||
and pseudo-ops will appear in the copy.</p>
|
||||
<p>A copy of all of the fields is also written to the clipboard, in
|
||||
CSV format. If you open a program like Excel, you can use Paste Special
|
||||
to put the data into individual cells.</p>
|
||||
<p>Long comments are included, but notes are not.</p>
|
||||
<p>By default, the label, opcode, operand, and comment fields are included.
|
||||
From the
|
||||
<a href="settings.html#app-settings">app settings</a> dialog you can select
|
||||
a different format, "Disassembly", which also includes the address and byte
|
||||
columns.</p>
|
||||
|
||||
<p>By default, the label, opcode, operand, and comment fields are included
|
||||
in the text form. From the
|
||||
<a href="settings.html#app-settings">app settings</a> you can select
|
||||
a different format that also includes the address and byte columns.</p>
|
||||
<p>A copy of all of the fields is also written to the clipboard in CSV
|
||||
format. If you have a spreadsheet like Excel, you can use Paste Special
|
||||
to put the data into individual cells.</p>
|
||||
|
||||
</div>
|
||||
|
||||
|
@ -21,15 +21,15 @@ project properties.</p>
|
||||
<p>Application settings are stored in a file called "SourceGen-settings"
|
||||
in the SourceGen installation directory. If the file is missing or
|
||||
corrupted, some default settings will be used. These settings are local
|
||||
to your system, and include everything from window sizes to whether you
|
||||
prefer hexadecimal values to be shown in upper case. None of them
|
||||
to your system, and include everything from window sizes to whether or not
|
||||
you prefer hexadecimal values to be shown in upper case. None of them
|
||||
affect the way the project analyzes code and data, though they may affect
|
||||
the way generated assembly sources look.</p>
|
||||
|
||||
<p>Project properties are stored in each individual .dis65 project file.
|
||||
They specify which CPU to use, which extension scripts to load, and a
|
||||
variety of other things that directly impact how SourceGen processes
|
||||
the project. Because of the way it impacts the project, all changes to
|
||||
the project. Because of the potential impact, all changes to
|
||||
the project properties are made through the undo/redo buffer.</p>
|
||||
|
||||
|
||||
@ -50,7 +50,7 @@ hide columns from the code list. The buttons may be more convenient
|
||||
though.</p>
|
||||
|
||||
<p>You can select a different font for the code list. Make it as large
|
||||
or small as you want. Monospace fonts like Courier or Consolas are
|
||||
or small as you want. Mono-space fonts like Courier or Consolas are
|
||||
recommended.</p>
|
||||
|
||||
<p>You can choose to display different parts of the display in upper or
|
||||
@ -147,8 +147,8 @@ you later hit Cancel, but the changes are not applied immediately.</p>
|
||||
|
||||
<p>The choice of CPU determines the set of available instructions, as
|
||||
well as cycle costs and register widths. There are many variations
|
||||
on the 6502, but from the perspective of a disassembler only three
|
||||
matter:
|
||||
on the 6502, but from the perspective of a disassembler most can be
|
||||
treated as one of these three:
|
||||
<ol>
|
||||
<li>MOS 6502. The original 8-bit instruction set.</li>
|
||||
<li>WDC W65C02S. Expanded the instruction set and smoothed
|
||||
@ -156,9 +156,9 @@ matter:
|
||||
<li>WDC W65C816S. Expanded instruction set, 24-bit address space,
|
||||
and 16-bit registers.</li>
|
||||
</ol>
|
||||
<p>The Rockwell R65C02 features an expanded instruction set that is
|
||||
compatible with the WDC 65C02 but incompatible with the 65816. It's
|
||||
not currently supported by SourceGen.</p>
|
||||
<p>The Rockwell R65C02, Hudson Soft HuC6280, and Commodore CSG 4510 / 65CE02
|
||||
have instruction sets that expand on the 6502/65C02, but aren't compatible
|
||||
with the 65816. These are not yet supported by SourceGen.</p>
|
||||
|
||||
<p>If "enable undocumented instructions" is checked, some additional
|
||||
opcodes are recognized on the 6502 and 65C02. These instructions are
|
||||
@ -198,14 +198,18 @@ create two symbols with the same label.</p>
|
||||
<p>The Import button allows you to import symbols from another project.
|
||||
Only labels that have been tagged as global and exported will be imported.
|
||||
Existing symbols with identical labels will be replaced, so it's okay to
|
||||
run the importer multiple times.</p>
|
||||
run the importer multiple times. Labels that aren't found will not be
|
||||
removed, so you can safely import from multiple projects, but will need
|
||||
to manually delete any symbols that are no longer being exported.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-symfiles">Symbol Files</a></h3>
|
||||
<p>From here, you can add and remove platform symbol files, or change
|
||||
the order in which they are loaded.
|
||||
See the <a href="intro.html#about-symbols">symbols</a> section for an
|
||||
explanation of how platform symbols work.</p>
|
||||
explanation of how platform symbols work.
|
||||
See "README.md" in the RuntimeData directory for a description of the
|
||||
file syntax.</p>
|
||||
|
||||
<p>Platform symbol files must live in the RuntimeData directory that comes
|
||||
with SourceGen, or in the directory where the project file lives. This
|
||||
@ -222,7 +226,9 @@ you will receive a warning.</p>
|
||||
<h3><a name="projprop-extscripts">Extension Scripts</a></h3>
|
||||
<p>From here, you can add and remove extension script files.
|
||||
See the <a href="intro.html#scripts">extension scripts</a> section for
|
||||
an explanation of how extension scripts work.</p>
|
||||
an overview of how extension scripts work.
|
||||
There's a more detailed document in the RuntimeData directory
|
||||
("ExtensionScripts.md").</p>
|
||||
|
||||
|
||||
<p>Extension script files must live in the RuntimeData directory that comes
|
||||
|
@ -46,7 +46,7 @@ pasting in some situations.</p>
|
||||
|
||||
<p>If "always on top" is checked, the window will stay above all other
|
||||
windows that don't also declare that they should always be on top. By
|
||||
default this box is checked for the project dump, and not checked for
|
||||
default this box is checked when displaying project data, and not checked for
|
||||
external files.</p>
|
||||
|
||||
|
||||
|
@ -70,15 +70,18 @@ these distracting, collapse the column.</p>
|
||||
<p>Click on the fourth line down, which has address 1002. The line has
|
||||
a label, "L1002", and is performing an indexed load from L1017. Both
|
||||
of these labels were automatically generated, and are named for the
|
||||
address they appear. When you clicked on the line, a few things happened:</p>
|
||||
address at which they appear. When you clicked on the line, a few
|
||||
things happened:</p>
|
||||
<ul>
|
||||
<li>The line was highlighted in the system selection color.</li>
|
||||
<li>The line was highlighted in the system selection color (usually
|
||||
blue).</li>
|
||||
<li>Address 1017 and label L1017 were highlighted. When a line
|
||||
with an in-file operand is selected, the target address is higlighted.</li>
|
||||
<li>An entry appeared in the References window. This notes that the only
|
||||
reference to L1002 is a branch from address $100B.</li>
|
||||
with an in-file operand is selected, the target address is
|
||||
highlighted.</li>
|
||||
<li>An entry appeared in the References window. This tells you that the
|
||||
only reference to L1002 is a branch from address $100B.</li>
|
||||
<li>The Info window filled with a bunch of text that describes the
|
||||
line and the LDA instruction.</li>
|
||||
line format and some details about the LDA instruction.</li>
|
||||
</ul>
|
||||
|
||||
<p>Click some other lines, such as address $100B and $1014. Note how the
|
||||
@ -91,17 +94,17 @@ the operand itself opens a format editor; more on that later.)</p>
|
||||
References window. Note the selection jumps to L1002. You can immediately
|
||||
jump to any reference.</p>
|
||||
<p>At the top of the Symbols window on the right side of the screen is a
|
||||
row of buttons. Make sure "Auto" is highlighted. You should see three
|
||||
row of buttons. Make sure "Auto" is selected. You should see three
|
||||
labels in the window (L1002, L1014, L1017). Double-click on L1014. The
|
||||
selection jumps to the appropriate line.</p>
|
||||
|
||||
<p>Select Edit > Find. Type "hello", and hit Enter. The selection will
|
||||
move to address $100E, which is a string that says "hello!". You can use
|
||||
Edit > Find Next to try to find the next occurrence (there isn't one). You
|
||||
can search for text that appears in the rightmost columns (label, opcode,
|
||||
can search for any text that appears in the rightmost columns (label, opcode,
|
||||
operand, comment).</p>
|
||||
<p>Select Edit > Go To. You can enter a label, address, or file offset.
|
||||
Enter "100b" to jump the selection to $100B.</p>
|
||||
Enter "100b" to set the selection to $100B.</p>
|
||||
|
||||
<p>Near the top-left of the SourceGen window is a set of toolbar icons.
|
||||
Click the left-arrow, and watch the selection move. Click it again. Then
|
||||
@ -118,21 +121,21 @@ something like "6502bench SourceGen vX.Y.Z". There are three ways to
|
||||
open the comment editor:</p>
|
||||
<ol>
|
||||
<li>Select Actions > Edit Long Comment from the menu bar.</li>
|
||||
<li>Right click, and select Actions > Edit Long Comment from the
|
||||
pop-up menu. (The menus area exactly the same.)</li>
|
||||
<li>Right click, and select Edit Long Comment from the
|
||||
pop-up menu. (This menu is exactly the same as the Actions menu.)</li>
|
||||
<li>Double-click the comment.</li>
|
||||
</ol>
|
||||
<p>Most things in the code list will respond to a double-click.
|
||||
Double-clicking on addresses, flags, labels, operands, and comments will
|
||||
open editors for those things. Double-clicking on a value in the "bytes"
|
||||
column will open a floating hex dump viewer. This is usually the most
|
||||
convenient way to edit something.</p>
|
||||
convenient way to edit something: point and click.</p>
|
||||
<p>Double-click the comment to open the editor. Type some words into the
|
||||
upper window, and note that a formatted version appears in the bottom
|
||||
window. Experiment with the maximum line width and "render in box"
|
||||
settings to see what they do. You can hit Enter to create line breaks,
|
||||
or let SourceGen wrap lines for you. When you're done, click OK. (Or
|
||||
hit Ctrl-Enter.</p>
|
||||
hit Ctrl+Enter.)</p>
|
||||
<p>When the dialog closes, you'll see your new comment in place at the
|
||||
top of the file. If you typed enough words, your comment will span
|
||||
multiple lines. You can select the comment by selecting any line in it.</p>
|
||||
@ -151,15 +154,17 @@ differences:</p>
|
||||
<ol>
|
||||
<li>You can't pick their line width, but you can pick their color.</li>
|
||||
<li>They don't appear in generated assembly sources, making them
|
||||
useful for leaving notes to yourself.</li>
|
||||
useful for leaving notes to yourself as you work.</li>
|
||||
<li>They're listed in the Notes window. Double-clicking them jumps
|
||||
the selection to the note, making them useful as bookmarks.</li>
|
||||
</ol>
|
||||
|
||||
<p>It's time to do something with the code. It's copying the instructions
|
||||
from $1017 to $2000, then jumping to $2000, so it looks like it's
|
||||
relocating the code before executing it. We want to do the same thing
|
||||
to our disassembled code, so select the line at address $1017 and then
|
||||
<p>It's time to do something with the code. If you look at what the code
|
||||
does you'll see that it's copying several dozen bytes from $1017
|
||||
to $2000, then jumping to $2000. It appears to be relocating the next
|
||||
part of the code before
|
||||
executing it. We want to let the disassembler know what's going on, so
|
||||
select the line at address $1017 and then
|
||||
Edit > Edit Address. (Or double-click the "1017" in the addr column.)
|
||||
In the Edit Address dialog, type "2000", and hit Enter.</p>
|
||||
|
||||
@ -178,8 +183,8 @@ so you'll be forgiven if you reduce the offset column width to zero.)</p>
|
||||
<p>On the line at address $2000, select Actions > Edit Label, or
|
||||
double-click on the label "L2000". Change the label to "MAIN", and hit
|
||||
Enter. The label changes on that line, and on the two lines that refer
|
||||
to address $2000. (If you're not sure what refers to line $2000, check
|
||||
the References window.)</p>
|
||||
to address $2000. (If you're not sure what refers to line $2000, select
|
||||
it and check the References window.)</p>
|
||||
<p>On that same line, select Actions > Edit Comment. Type a short
|
||||
comment, and hit Enter. Your comment appears in the "comment" column.</p>
|
||||
|
||||
@ -215,12 +220,12 @@ Actions > Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are
|
||||
case-sensitive, so it needs to match the operand at $2005 exactly.) You'll
|
||||
see the new label appear, and the operand at line $2005 will use it.</p>
|
||||
<p>There's an easier way. Use Edit > Undo twice, to get back to the place
|
||||
where line $2005 is using "L2009" as it's operand. Select that line and
|
||||
where line $2005 is using "L2009" as its operand. Select that line and
|
||||
Actions > Edit Operand. Enter "IS_OK", then select "Create label at target
|
||||
address instead". Hit "OK".</p>
|
||||
<p>You should now see that both the operand at $2005 and the label at
|
||||
$2009 have changed to IS_OK, accomplishing what we wanted to do in a
|
||||
single step. (There's actually a sutble difference compared to the two-step
|
||||
single step. (There's actually a subtle difference compared to the two-step
|
||||
process: the operand at $2005 is still a numeric reference. It was
|
||||
automatically changed to match IS_OK in the same way that the references
|
||||
to MAIN were when we renamed "L2000" earlier. If you actually do want the
|
||||
@ -248,7 +253,7 @@ label to "STR1". Move up a bit and select address $2030, then scroll to
|
||||
the bottom and shift-click address $2070. Select Actions > Edit Data
|
||||
Format. At the top it should now say, "65 bytes selected in 2 groups".
|
||||
There are two groups because the presence of a label split the data into
|
||||
two separate regions. Selected "mixed ASCII and non-ASCII", then click
|
||||
two separate regions. Select "mixed ASCII and non-ASCII", then click
|
||||
"OK".</p>
|
||||
<p>We now have two ".STR" lines, one for "string zero ", one with the
|
||||
STR1 label and the rest of the string data. This is okay, but it's not
|
||||
@ -260,8 +265,8 @@ a single ".STR" line at the bottom, split across two lines with a '+'.</p>
|
||||
but that appears to be incorrect, so let's format it as individual bytes
|
||||
instead. There's an easy way to do that: use Actions > Toggle Single-Byte
|
||||
Format (or hit Ctrl+B).</p>
|
||||
<p>The data starting at $2025 appears to be 16-bit addresses into the
|
||||
table of strings, so let's format them appropriately.</p>
|
||||
<p>The data starting at $2025 appears to be 16-bit addresses that point
|
||||
into the table of strings, so let's format them appropriately.</p>
|
||||
<p>Select the line at $2025, then shift-click the line at $202E. Select
|
||||
Actions > Edit Data Format. If you selected the correct set of bytes,
|
||||
the top should say, "10 bytes selected". Click the
|
||||
@ -277,7 +282,7 @@ on their own line, so each string is now in a separate ".STR" statement.
|
||||
|
||||
<h3>Generating Assembly Code</h3>
|
||||
|
||||
<p>You can generate asssembly source code from the disassembled data.
|
||||
<p>You can generate assembly source code from the disassembled data.
|
||||
Select File > Assembler (or hit Ctrl+Shift+A) to open the generation
|
||||
and assembly dialog.</p>
|
||||
<p>Pick your favorite assembler from the drop list at the top right,
|
||||
|