mirror of
https://github.com/fadden/6502bench.git
synced 2024-11-29 10:50:28 +00:00
Various doc fixes
parent
330b4a238a
commit
4aee3af089
@ -13,7 +13,11 @@
<h1>6502bench SourceGen: Instruction and Data Analysis</h1>
<p><a href="index.html">Back to index</a></p>

<p><i>This section discusses the internal workings of SourceGen. It is
not necessary to understand this to use the program.</i></p>

<h2><a name="analysis-process">Analysis Process</a></h2>

<p>Analysis of the file data is a complex multi-step process. Some
changes to the project, such as adding a code entry point hint or
changing the CPU selection, require a full re-analysis of instructions
@ -42,16 +46,16 @@ method in <code>DisasmProject.cs</code>):</p>
attributes, or "anattribs", with one entry per byte in the file.
The Anattrib array tracks most of the state from here on. If we're
doing a partial re-analysis, this step will just clone a copy of the
Anattrib array that was made at this point in a previous run. (This
step is described in more detail below.)</li>
Anattrib array that was made at this point in a previous run. (The
code analysis pass is described in more detail below.)</li>
<li>Apply user-specified labels to Anattribs.</li>
<li>Apply user-specified format descriptors. These are the instruction
and data operand formats.</li>
<li>Run the data analyzer. This looks for patterns in uncategorized
data, and connects instruction and data operands to target offsets.
The "nearby label" stuff is handled here. All of the results are
stored in the Anattribs array. (This step is described in more
detail below.)</li>
stored in the Anattribs array. (The data analysis pass is described in
more detail below.)</li>
<li>Remove hidden labels from the symbol table. These are user-specified
labels that have been placed on offsets that are in the middle of an
instruction or multi-byte data item. They can't be referenced, so we
@ -105,12 +109,12 @@ fill out the Anattribs.</p>
L1003 NOP
</pre>

<p>We haven't formatted anything yet. The data analyzer sees that the
JMP operand is inside the file, and has no label, so it creates an
<p>We haven't explicitly formatted anything yet. The data analyzer sees
that the JMP operand is inside the file, and has no label, so it creates an
auto-label at offset +000003 and a format descriptor with a symbolic
operand reference to "L1003" at +000000.</p>
<p>Now we edit the label, changing L1003 to "FOO". This goes into the
project's "user label" list. The analyzer is
<p>Suppose we now edit the label, changing L1003 to "FOO". This goes into
the project's "user label" list. The analyzer is
run, and applies the new "user label" to the Anattrib array. The
data analyzer finds the numeric reference in the JMP operand, and finds
a label at the target address, so it creates a symbolic operand reference
@ -119,8 +123,9 @@ in both places.</p>
<p>Even though the JMP operand changed from "L1003" to "FOO", the only
change actually written to the project file is the label edit. The
contents of the Anattrib array are disposable, so it can be used to
add labels and "fix up" numeric references. Generated labels and
format descriptors are never added to the project file.</p>
hold auto-generated labels and "fix up" numeric references. Labels and
format descriptors generated by SourceGen are never added to the
project file.</p>

<p>If the JMP operand were edited, a format descriptor would be added
to the user-specified descriptor list. During the analysis pass it would
@ -167,9 +172,10 @@ reformat them:</p>

<p>Each entry in the change set has "before" and "after" states for the
format descriptor at a specific offset. Only the state for the affected
offsets is included -- the program doesn't take a complete state snapshot
(even with the RAM on a modern system that would add up quickly). When
undoing a change, before and after are simply reversed.</p>
offsets is included -- the program doesn't record the state of the full
project after each change (even with the RAM on a modern system that would
add up quickly). When undoing a change, before and after are simply
reversed.</p>
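The before/after scheme described above can be sketched in a few lines of Python. This is only an illustration, not SourceGen's actual code (which is C#): the names FormatChange and apply_changes, and the use of strings to stand in for format descriptors, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FormatChange:
    """One change-set entry (hypothetical type, for illustration only)."""
    offset: int
    before: Optional[str]  # None means "no descriptor at this offset"
    after: Optional[str]


def apply_changes(formats: Dict[int, str], changes: List[FormatChange],
                  undo: bool = False) -> None:
    """Apply a change set in place; undoing just swaps before and after."""
    ordered = reversed(changes) if undo else changes
    for change in ordered:
        value = change.before if undo else change.after
        if value is None:
            formats.pop(change.offset, None)
        else:
            formats[change.offset] = value


# Only the affected offsets are recorded, not the whole project state.
formats = {0x10: "hex"}
change_set = [
    FormatChange(0x10, before="hex", after="ascii"),
    FormatChange(0x20, before=None, after="word"),
]

apply_changes(formats, change_set)
after_do = dict(formats)

apply_changes(formats, change_set, undo=True)
after_undo = dict(formats)
```

Because each entry carries both states, the same function serves for do, undo, and redo, and memory use grows with the number of edited offsets instead of the size of the project.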


<h2><a name="code-analysis">Code Analysis</a></h2>
@ -222,9 +228,9 @@ performed, we assume a transition to emulation mode (E=1).</p>

<p>There are three ways in which code can set a flag to a definite value:</p>
<ol>
<li>By explicit instructions, like <code>SEC</code> or
<li>With explicit instructions, like <code>SEC</code> or
<code>CLD</code>.</li>
<li>By immediate-operand instructions. <code>LDA #$00</code> sets Z=1
<li>With immediate-operand instructions. <code>LDA #$00</code> sets Z=1
and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
<li>By inference. For example, if we see a <code>BCC</code> instruction,
we know that the carry will be clear at the branch target address, and
@ -274,8 +280,9 @@ time a JSR/JSL instruction is encountered, all loaded extension scripts
are offered a chance to act.</p>

<p>The first script that applies a format wins. Attempts to re-format
instructions or data will fail. This rule ensure that anything explicitly
formatted by the user will not be overridden by a script.</p>
instructions or data that has already been formatted will fail. This rule
ensures that anything explicitly formatted by the user will not be
overridden by a script.</p>

<p>If code jumps into a region that is marked as inline data, the
branch will be ignored. If an extension script tries to flag bytes
@ -82,8 +82,8 @@ rather than generated by a compiler, which means it won't conform to a
standard set of conventions. However, most programmers will use
subroutines, which can be identified and analyzed in isolation. Subroutines
are often interspersed with variable storage, referred to as a "stash".
Variables may be single-byte or multi-byte, the latter typically
in little-endian byte order.</p>
Variables and constants may be single-byte or multi-byte, the latter
typically in little-endian byte order.</p>

<p>Much of the data in a typical program is read-only, often in the
form of graphics or ASCII string data. Graphics can be difficult
@ -100,14 +100,15 @@ data ends and code resumes: 6502 instructions are variable-length, so if
the last byte of the data area appears to be a three-byte instruction,
the first two bytes of the next instruction area will be gobbled up.</p>

<p>Some programmers will use a trick where they "embed" an instruction
<p>To make things even more difficult (sometimes deliberately), programmers
will sometimes use a trick where they "embed" an instruction
inside another instruction. This allows code to branch to two different
entry points, one of which will set a flag or load a register, and then
continue on to common code.</p>

<p>Another trick is to embed "inline data" after a JSR or JSL instruction.
The caller pulls the calling address off the stack, uses it to access
the parameters, then pushes the address back on after modifying it to
The called subroutine pulls the caller's address off the stack, uses it to
access the parameters, then pushes the address back on after modifying it to
point to an address past the inline data. This can be very confusing
for the disassembler, which will try to interpret the inline data as
instructions.</p>
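The address arithmetic behind the inline-data trick can be worked through in Python. This sketch covers only the 16-bit JSR case and relies on standard 6502 behavior: JSR pushes the address of its own last byte, and RTS adds one to the address it pulls. The function names are hypothetical, chosen for the example.

```python
def jsr_pushed_address(jsr_addr: int) -> int:
    """JSR is three bytes long; the 6502 pushes the address of its last byte."""
    return (jsr_addr + 2) & 0xFFFF


def inline_data_range(pushed: int, length: int) -> range:
    """Addresses of the inline data, which starts right after the JSR."""
    start = pushed + 1
    return range(start, start + length)


def adjusted_push(pushed: int, length: int) -> int:
    """Value the subroutine pushes back so that RTS skips the inline data."""
    return (pushed + length) & 0xFFFF


def rts_target(pushed: int) -> int:
    """RTS pulls the address from the stack and adds one."""
    return (pushed + 1) & 0xFFFF


# A JSR at $1000 followed by three bytes of inline data.
pushed = jsr_pushed_address(0x1000)        # $1002
data = inline_data_range(pushed, 3)        # $1003..$1005
resume = rts_target(adjusted_push(pushed, 3))  # $1006, first real instruction
```

Without the adjustment, RTS would resume at $1003 and execute the inline data, which is exactly what a disassembler unaware of the trick assumes will happen.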
@ -293,7 +294,7 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:</p>

<p>The three bytes following the <code>JSR $bf00</code> should be hinted
as inline data, so that the code analyzer skips them and continues the
analysis at the <code>BCS</code>. Because you need to hint *every* byte
analysis at the <code>BCS</code>. Because you need to hint <i>every</i> byte
of inline data, all bytes in a selected line will receive hints.</p>
<p>If code branches into a region that is marked as inline data, the
branch will be ignored.</p>
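The effect of inline-data hints can be illustrated with a toy linear scan in Python. This is a deliberately simplified stand-in, not SourceGen's analyzer: it does not follow branches, it knows only the five opcodes listed in its table, and the byte sequence (including the command number and parameter pointer after the JSR) is made up for the example.

```python
# Lengths of the few 6502 opcodes used in this example.
INSTR_LEN = {
    0x20: 3,  # JSR abs
    0xB0: 2,  # BCS rel
    0x10: 2,  # BPL rel
    0xC8: 1,  # INY
    0x60: 1,  # RTS
}


def instruction_starts(data: bytes, inline_offsets: set) -> list:
    """Return offsets treated as instruction starts, skipping hinted bytes."""
    starts = []
    offset = 0
    while offset < len(data):
        if offset in inline_offsets:
            offset += 1  # hinted as inline data: skip one byte at a time
            continue
        starts.append(offset)
        offset += INSTR_LEN[data[offset]]
    return starts


# JSR $BF00, then a command byte and a two-byte parameter pointer,
# then BCS and two RTS instructions.
code = bytes([0x20, 0x00, 0xBF, 0xC8, 0x10, 0x03, 0xB0, 0x01, 0x60, 0x60])

with_hints = instruction_starts(code, {3, 4, 5})
without_hints = instruction_starts(code, set())
```

With all three bytes hinted, the scan resumes at the BCS at offset 6. Without hints, the same bytes are decoded as INY and BPL, which is why every byte of the inline data needs a hint.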
@ -660,7 +661,7 @@ use the shortest instruction possible.</p>
way. Some use opcode suffixes, others use operand prefixes, some
allow both. You can configure how they appear in the
<a href="settings.html#app-settings">application settings</a>.</p>
<p>SourcGen will only add width disambiguators to opcodes or operands when
<p>SourceGen will only add width disambiguators to opcodes or operands when
they are needed, with one exception: the opcode suffix for long
(24-bit address) operations is always applied. This is done because some
assemblers require it, insisting on "LDAL" rather than "LDA" for an
@ -45,7 +45,7 @@ limitations under the License.
<TextBlock FontSize="24"
Text="{Binding ProgramVersionString, StringFormat={}Version {0},
FallbackValue=Version X.Y.Z-alpha1}"/>
<TextBlock Text="Copyright 2018 faddenSoft" Margin="0,30,0,0"/>
<TextBlock Text="Copyright 2019 faddenSoft" Margin="0,30,0,0"/>
<TextBlock Text="Created by Andy McFadden"/>
</StackPanel>