mirror of https://github.com/fadden/6502bench.git synced 2024-12-01 22:50:35 +00:00

Various doc fixes

Andy McFadden 2019-07-29 13:20:03 -07:00
parent 330b4a238a
commit 4aee3af089
3 changed files with 33 additions and 25 deletions

View File

@ -13,7 +13,11 @@
 <h1>6502bench SourceGen: Instruction and Data Analysis</h1>
 <p><a href="index.html">Back to index</a></p>
+<p><i>This section discusses the internal workings of SourceGen. It is
+not necessary to understand this to use the program.</i></p>
 <h2><a name="analysis-process">Analysis Process</a></h2>
 <p>Analysis of the file data is a complex multi-step process. Some
 changes to the project, such as adding a code entry point hint or
 changing the CPU selection, require a full re-analysis of instructions
@ -42,16 +46,16 @@ method in <code>DisasmProject.cs</code>):</p>
 attributes, or "anattribs", with one entry per byte in the file.
 The Anattrib array tracks most of the state from here on. If we're
 doing a partial re-analysis, this step will just clone a copy of the
-Anattrib array that was made at this point in a previous run. (This
-step is described in more detail below.)</li>
+Anattrib array that was made at this point in a previous run. (The
+code analysis pass is described in more detail below.)</li>
 <li>Apply user-specified labels to Anattribs.</li>
 <li>Apply user-specified format descriptors. These are the instruction
 and data operand formats.</li>
 <li>Run the data analyzer. This looks for patterns in uncategorized
 data, and connects instruction and data operands to target offsets.
 The "nearby label" stuff is handled here. All of the results are
-stored in the Anattribs array. (This step is described in more
-detail below.)</li>
+stored in the Anattribs array. (The data analysis pass is described in
+more detail below.)</li>
 <li>Remove hidden labels from the symbol table. These are user-specified
 labels that have been placed on offsets that are in the middle of an
 instruction or multi-byte data item. They can't be referenced, so we
@ -105,12 +109,12 @@ fill out the Anattribs.</p>
 L1003 NOP
 </pre>
-<p>We haven't formatted anything yet. The data analyzer sees that the
-JMP operand is inside the file, and has no label, so it creates an
+<p>We haven't explicitly formatted anything yet. The data analyzer sees
+that the JMP operand is inside the file, and has no label, so it creates an
 auto-label at offset +000003 and a format descriptor with a symbolic
 operand reference to "L1003" at +000000.</p>
-<p>Now we edit the label, changing L1003 to "FOO". This goes into the
-project's "user label" list. The analyzer is
+<p>Suppose we now edit the label, changing L1003 to "FOO". This goes into
+the project's "user label" list. The analyzer is
 run, and applies the new "user label" to the Anattrib array. The
 data analyzer finds the numeric reference in the JMP operand, and finds
 a label at the target address, so it creates a symbolic operand reference
@ -119,8 +123,9 @@ in both places.</p>
 <p>Even though the JMP operand changed from "L1003" to "FOO", the only
 change actually written to the project file is the label edit. The
 contents of the Anattrib array are disposable, so it can be used to
-add labels and "fix up" numeric references. Generated labels and
-format descriptors are never added to the project file.</p>
+hold auto-generated labels and "fix up" numeric references. Labels and
+format descriptors generated by SourceGen are never added to the
+project file.</p>
 <p>If the JMP operand were edited, a format descriptor would be added
 to the user-specified descriptor list. During the analysis pass it would
@ -167,9 +172,10 @@ reformat them:</p>
 <p>Each entry in the change set has "before" and "after" states for the
 format descriptor at a specific offset. Only the state for the affected
-offsets is included -- the program doesn't take a complete state snapshot
-(even with the RAM on a modern system that would add up quickly). When
-undoing a change, before and after are simply reversed.</p>
+offsets is included -- the program doesn't record the state of the full
+project after each change (even with the RAM on a modern system that would
+add up quickly). When undoing a change, before and after are simply
+reversed.</p>
 <h2><a name="code-analysis">Code Analysis</a></h2>
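The undo behavior described in the change-set paragraph above can be sketched in a few lines. This is not SourceGen's code (the project itself is written in C#); `FormatChange`, `apply_change`, `undo_change`, and the two change-set helpers are hypothetical names, and format descriptors are modeled as plain strings keyed by file offset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FormatChange:
    """Hypothetical change-set entry: one offset, with before and after states."""
    offset: int
    before: Optional[str]  # None means "no descriptor at this offset"
    after: Optional[str]


def apply_change(formats: Dict[int, str], change: FormatChange) -> None:
    """Move the descriptor at change.offset to its 'after' state."""
    if change.after is None:
        formats.pop(change.offset, None)
    else:
        formats[change.offset] = change.after


def undo_change(formats: Dict[int, str], change: FormatChange) -> None:
    """Undo by applying the same entry with before and after reversed."""
    apply_change(formats, FormatChange(change.offset, change.after, change.before))


def apply_change_set(formats: Dict[int, str], changes: List[FormatChange]) -> None:
    for change in changes:
        apply_change(formats, change)


def undo_change_set(formats: Dict[int, str], changes: List[FormatChange]) -> None:
    # Walk the entries backwards so overlapping edits unwind in order.
    for change in reversed(changes):
        undo_change(formats, change)
```

Only the touched offsets appear in the change set, so its size tracks the size of the edit rather than the size of the project.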
@ -222,9 +228,9 @@ performed, we assume a transition to emulation mode (E=1).</p>
 <p>There are three ways in which code can set a flag to a definite value:</p>
 <ol>
-<li>By explicit instructions, like <code>SEC</code> or
+<li>With explicit instructions, like <code>SEC</code> or
 <code>CLD</code>.</li>
-<li>By immediate-operand instructions. <code>LDA #$00</code> sets Z=1
+<li>With immediate-operand instructions. <code>LDA #$00</code> sets Z=1
 and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
 <li>By inference. For example, if we see a <code>BCC</code> instruction,
 we know that the carry will be clear at the branch target address, and
@ -274,8 +280,9 @@ time a JSR/JSL instruction is encountered, all loaded extension scripts
 are offered a chance to act.</p>
 <p>The first script that applies a format wins. Attempts to re-format
-instructions or data will fail. This rule ensure that anything explicitly
-formatted by the user will not be overridden by a script.</p>
+instructions or data that has already been formatted will fail. This rule
+ensures that anything explicitly formatted by the user will not be
+overridden by a script.</p>
 <p>If code jumps into a region that is marked as inline data, the
 branch will be ignored. If an extension script tries to flag bytes

View File

@ -82,8 +82,8 @@ rather than generated by a compiler, which means it won't conform to a
 standard set of conventions. However, most programmers will use
 subroutines, which can be identified and analyzed in isolation. Subroutines
 are often interspersed with variable storage, referred to as a "stash".
-Variables may be single-byte or multi-byte, the latter typically
-in little-endian byte order.</p>
+Variables and constants may be single-byte or multi-byte, the latter
+typically in little-endian byte order.</p>
 <p>Much of the data in a typical program is read-only, often in the
 form of graphics or ASCII string data. Graphics can be difficult
@ -100,14 +100,15 @@ data ends and code resumes: 6502 instructions are variable-length, so if
 the last byte of the data area appears to be a three-byte instruction,
 the first two bytes of the next instruction area will be gobbled up.</p>
-<p>Some programmers will use a trick where they "embed" an instruction
+<p>To make things even more difficult (sometimes deliberately), programmers
+will sometimes use a trick where they "embed" an instruction
 inside another instruction. This allows code to branch to two different
 entry points, one of which will set a flag or load a register, and then
 continue on to common code.</p>
 <p>Another trick is to embed "inline data" after a JSR or JSL instruction.
-The caller pulls the calling address off the stack, uses it to access
-the parameters, then pushes the address back on after modifying it to
+The called subroutine pulls the caller's address off the stack, uses it to
+access the parameters, then pushes the address back on after modifying it to
 point to an address past the inline data. This can be very confusing
 for the disassembler, which will try to interpret the inline data as
 instructions.</p>
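A minimal sketch of the address arithmetic behind the inline-data trick described above, for the 16-bit JSR case only. It relies on standard 6502 behavior: JSR pushes the address of its own last byte, and RTS adds one to the address it pulls. The function name `inline_data_call` is hypothetical and exists only for illustration.

```python
from typing import Tuple


def inline_data_call(jsr_addr: int, inline_len: int) -> Tuple[int, int]:
    """Model a subroutine that consumes inline data placed after a JSR.

    jsr_addr is the address of the JSR opcode; inline_len is the number of
    inline data bytes. Returns (address of first data byte, resume address).
    """
    # JSR is three bytes long and pushes the address of its last byte.
    pushed = (jsr_addr + 2) & 0xFFFF
    # The called subroutine pulls that value; the data starts one byte later.
    data_start = (pushed + 1) & 0xFFFF
    # It pushes back an address that points at the last inline data byte...
    adjusted = (pushed + inline_len) & 0xFFFF
    # ...so that RTS, which adds one, resumes just past the inline data.
    resume = (adjusted + 1) & 0xFFFF
    return data_start, resume
```

For a JSR at $1000 followed by three inline bytes, the data starts at $1003 and execution resumes at $1006. A disassembler that does not know about the adjustment would instead try to decode an instruction at $1003.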
@ -293,7 +294,7 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
 <p>The three bytes following the <code>JSR $bf00</code> should be hinted
 as inline data, so that the code analyzer skips them and continues the
-analysis at the <code>BCS</code>. Because you need to hint *every* byte
+analysis at the <code>BCS</code>. Because you need to hint <i>every</i> byte
 of inline data, all bytes in a selected line will receive hints.</p>
 <p>If code branches into a region that is marked as inline data, the
 branch will be ignored.</p>
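The hinting rule for the ProDOS 8 call can be illustrated with a small sketch that computes which offsets need inline-data hints. It is an assumption-laden illustration, not SourceGen's implementation: `prodos_inline_hints` is a hypothetical helper, and the only facts it uses are that `JSR` absolute is opcode $20 and that the operand is stored in little-endian order.

```python
from typing import List, Tuple

JSR_ABS = 0x20       # 6502 opcode for JSR with a 16-bit absolute operand
PRODOS_MLI = 0xBF00  # ProDOS 8 entry point from the text above


def prodos_inline_hints(data: bytes, jsr_offset: int) -> Tuple[List[int], int]:
    """Return (offsets to hint as inline data, offset where code resumes)."""
    if jsr_offset < 0 or jsr_offset + 6 > len(data):
        raise ValueError("JSR plus three inline bytes must fit in the file")
    opcode = data[jsr_offset]
    # The operand is little-endian: low byte first, then high byte.
    target = data[jsr_offset + 1] | (data[jsr_offset + 2] << 8)
    if opcode != JSR_ABS or target != PRODOS_MLI:
        raise ValueError("not a JSR $BF00")
    first = jsr_offset + 3
    # Every one of the three bytes gets its own hint.
    return [first, first + 1, first + 2], first + 3
```

With the bytes `20 00 BF C8 10 20 B0 03`, the hinted offsets are 3, 4, and 5, and analysis resumes at offset 6, where `$B0` is the `BCS` opcode.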
@ -660,7 +661,7 @@ use the shortest instruction possible.</p>
 way. Some use opcode suffixes, others use operand prefixes, some
 allow both. You can configure how they appear in the
 <a href="settings.html#app-settings">application settings</a>.</p>
-<p>SourcGen will only add width disambiguators to opcodes or operands when
+<p>SourceGen will only add width disambiguators to opcodes or operands when
 they are needed, with one exception: the opcode suffix for long
 (24-bit address) operations is always applied. This is done because some
 assemblers require it, insisting on "LDAL" rather than "LDA" for an

View File

@ -45,7 +45,7 @@ limitations under the License.
 <TextBlock FontSize="24"
 Text="{Binding ProgramVersionString, StringFormat={}Version {0},
 FallbackValue=Version X.Y.Z-alpha1}"/>
-<TextBlock Text="Copyright 2018 faddenSoft" Margin="0,30,0,0"/>
+<TextBlock Text="Copyright 2019 faddenSoft" Margin="0,30,0,0"/>
 <TextBlock Text="Created by Andy McFadden"/>
 </StackPanel>