mirror of https://github.com/fadden/6502bench.git synced 2026-04-20 19:16:34 +00:00

Relocate manual

Move the SourceGen manual to a subdirectory in "docs", so that it can
be accessed directly from the 6502bench web site.  The place where
it's installed in the distribution doesn't change.
Andy McFadden
2021-10-08 08:43:12 -07:00
parent a395909574
commit ed4cc84782
15 changed files with 2 additions and 0 deletions
+495
@@ -0,0 +1,495 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Advanced Topics - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Advanced Topics</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="platform-symbols">Platform Symbol Files (.sym65)</a></h2>
<p>Platform symbol files contain lists of symbols, each of which has a
label and a value. SourceGen comes with a collection of symbols for
popular systems, but you can create your own. This can be handy if a
few different projects are coded against a common library.</p>
<p>If two symbols have the same value, the older symbol is replaced by
the newer one. This is why the order in which symbol files are loaded
matters.</p>
<p>Platform symbol files consist of comments, commands, and symbols.
Blank lines, and lines that begin with a semicolon (';'), are ignored. Lines
that begin with an asterisk ('*') are commands. Three are currently
defined:</p>
<ul>
<li><code>*SYNOPSIS</code> - a short summary of the file contents.</li>
<li><code>*TAG</code> - a tag string to apply to all symbols that follow
in this file.</li>
<li><code>*MULTI_MASK</code> - specify a mask for symbols that appear
at multiple addresses.</li>
</ul>
<p>Tags can be used by extension scripts to identify a subset of symbols.
The symbols are still part of the global set; the tag just provides a
way to extract a subset. Tags should consist of non-whitespace ASCII
characters. Tags are global, so use a long, descriptive string. If
<code>*TAG</code> is not followed by a string, the symbols that follow
are treated as untagged.</p>
<p>All other lines are symbols, which have the form:</p>
<pre>
LABEL {=|@|&lt;|&gt;} VALUE [WIDTH] [;COMMENT]
</pre>
<p>The LABEL must be at least two characters long, begin with a letter or
underscore, and consist entirely of alphanumeric ASCII characters
(A-Z, a-z, 0-9) and the underscore ('_'). (This is the same format
required for line labels in SourceGen.)</p>
<p>The next token can be one of:</p>
<ul>
<li>@: general addresses</li>
<li>&lt;: read-only addresses</li>
<li>&gt;: write-only addresses</li>
<li>=: constants</li>
</ul>
<p>If an instruction references an address, and that address is outside
the bounds of the file, the list of address symbols (i.e. everything
that's not a constant) will be scanned for a match.
If found, the symbol is applied automatically. You normally want to
use '@', but can use '&lt;' and '&gt;' for memory-mapped I/O locations
that have different behavior depending on whether they are read or
written.</p>
<p>The VALUE is a number in decimal, hexadecimal (with a leading '$'), or
binary (with a leading '%'). The numeric base will be recorded and used when
formatting the symbol in generated output, so use whichever form is most
appropriate. Values are unsigned 24-bit numbers. The special value
"erase" may be used for an address to erase a symbol defined in an earlier
platform file.</p>
<p>The WIDTH is optional, and ignored for constants. It must be a
decimal or hexadecimal value between 1 and 65536, inclusive. If omitted,
the default width is 1.</p>
<p>The COMMENT is optional. If present, it will be saved and used as the
end-of-line comment on the .EQ directive if the symbol is used.</p>
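<p>The line format described above can be sketched in a few lines of Python.
This is only an illustration of the rules as stated here, not SourceGen's
actual parser (which is written in C#). The names <code>PlatformSymbol</code>,
<code>parse_number</code>, and <code>parse_symbol_line</code> are made up for
the example, and the sketch does not record the numeric base or handle
<code>*TAG</code> and <code>*MULTI_MASK</code>.</p>

```python
import re
from typing import NamedTuple, Optional


class PlatformSymbol(NamedTuple):
    label: str
    kind: str              # '@', '<', '>' (addresses) or '=' (constant)
    value: Optional[int]   # None when the value is the special word "erase"
    width: int
    comment: str


# LABEL: at least two characters, starts with a letter or underscore.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")
# LABEL {=|@|<|>} VALUE [WIDTH]; the comment is split off beforehand.
LINE_RE = re.compile(r"(\S+?)\s*([=@<>])\s*(\S+)(?:\s+(\S+))?\s*")


def parse_number(text):
    """Parse decimal, hexadecimal ('$' prefix), or binary ('%' prefix)."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)


def parse_symbol_line(line):
    """Return a PlatformSymbol, or None for blank lines, comments, commands."""
    stripped = line.strip()
    if not stripped or stripped[0] in ";*":
        return None
    body, _, comment = stripped.partition(";")
    match = LINE_RE.fullmatch(body)
    if match is None:
        raise ValueError("malformed symbol line: " + line)
    label, kind, value_text, width_text = match.groups()
    if LABEL_RE.fullmatch(label) is None:
        raise ValueError("invalid label: " + label)
    if value_text == "erase":
        value = None
    else:
        value = parse_number(value_text)
        if not 0 <= value <= 0xFFFFFF:
            raise ValueError("value is not an unsigned 24-bit number")
    width = 1 if width_text is None else parse_number(width_text)
    if not 1 <= width <= 65536:
        raise ValueError("width must be between 1 and 65536")
    return PlatformSymbol(label, kind, value, width, comment.strip())
```

<p>For example, <code>parse_symbol_line("REG1 @ $0281 2 ;a register")</code>
yields a general address symbol with value $0281 and width 2.</p>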
<h4>Using MULTI_MASK</h4>
<p>The multi-address mask is used for systems like the Atari 2600, where
RAM, ROM, and I/O registers appear at multiple addresses. The hardware
looks for certain address lines to be set or clear, and if the pattern
matches, another set of bits is examined to determine which register or
RAM address is being accessed.</p>
<p>This is expressed in symbol files with the MULTI_MASK statement.
Address symbol declarations that follow have the mask set applied. Symbols
whose addresses don't fit the pattern cause a warning and will be
ignored. Constants are not affected.</p>
<p>The mask set is best explained with an example. Suppose the address
pattern for a set of registers is <code>???0 ??1? 1??x xxxx</code>
(where '?' can be any value, 0/1 must be that value, and 'x' means the bit
is used to determine the register).
So any address from $0280 through $029F matches, as does $23C0 through $23DF,
but $0480 and $1280 don't. The register number is found in the low five bits.</p>
<p>The corresponding MULTI_MASK line, with values specified in binary,
would be:</p>
<pre> *MULTI_MASK %0001001010000000 %0000001010000000 %0000000000011111</pre>
<p>The values are CompareMask, CompareValue, and AddressMask. To
determine if an address is in the register set, we check to see if
<code>(address &amp; CompareMask) == CompareValue</code>. If so, we can
extract the register number with <code>(address &amp; AddressMask)</code>.</p>
<p>We don't want to have a huge collection of equates at the top of the
generated source file, so whatever value is used in the symbol declaration
is considered the "canonical" value. All other matching values are output
with an offset.</p>
<p>All mask values must fall between 0 and $00FFFFFF. The set bits in
CompareMask and AddressMask must not overlap, and CompareValue must not
have any bits set that aren't also set in CompareMask.</p>
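<p>The matching and validity rules can be checked with a short Python
sketch. The constants are the ones from the example above; the function
names <code>validate_mask_set</code> and <code>match_register</code> are
invented for illustration and are not part of SourceGen.</p>

```python
COMPARE_MASK = 0b0001001010000000   # bits the hardware examines
COMPARE_VALUE = 0b0000001010000000  # required state of those bits
ADDRESS_MASK = 0b0000000000011111   # bits that select the register


def validate_mask_set(compare_mask, compare_value, address_mask):
    """Raise ValueError if the three values break the rules in the text."""
    for value in (compare_mask, compare_value, address_mask):
        if not 0 <= value <= 0x00FFFFFF:
            raise ValueError("mask values must be between 0 and $00FFFFFF")
    if compare_mask & address_mask:
        raise ValueError("CompareMask and AddressMask must not overlap")
    if compare_value & ~compare_mask:
        raise ValueError("CompareValue has bits set outside CompareMask")


def match_register(address, compare_mask, compare_value, address_mask):
    """Return the register number, or None if the address doesn't match."""
    if (address & compare_mask) != compare_value:
        return None
    return address & address_mask


validate_mask_set(COMPARE_MASK, COMPARE_VALUE, ADDRESS_MASK)
```

<p>With these values, $0281 and $23C1 both map to register 1, while $0480
and $1280 are rejected.</p>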
<p>If an address can be mapped to a masked value and an unmasked value,
the unmasked value takes precedence for exact matches. In the example
above, if you declare <code>REG1 @ $0281</code> outside the MULTI_MASK
declaration, the disassembler will use <code>REG1</code> for all operands
that reference $0281. If other code accesses the same register as $23C1,
the symbol established for the masked value will be used instead.</p>
<p>If there are multiple masked values for a given address, the precedence
is undefined.</p>
<p>To disable the MULTI_MASK and resume normal declarations, write the
command without arguments:</p>
<pre> *MULTI_MASK</pre>
<h3>Creating a Project-Specific Symbol File</h3>
<p>To create a platform symbol file for your project, just create a new
text file, named with a ".sym65" extension. (If your text editor of choice
doesn't like that, you can put a ".txt" on the end while you're editing.)
Make sure you create it in the same directory where your project file
(the file that ends with ".dis65") lives. Add a <code>*SYNOPSIS</code>,
then add the desired symbols.</p>
<p>Finally, add it to your project. Select Edit &gt; Project Properties,
switch to the Symbol Files tab, click Add Symbol Files from Project, and
select your symbol file. It should appear in the list with a
"PROJ:" prefix.</p>
<p>If an example helps, the A2-Amper-fdraw project in the Examples
directory has a project-local symbol file, called "fdraw-exports".
(fdraw-exports is a list of exported symbols from the fdraw library,
for which Amper-fdraw provides an Applesoft BASIC interface.)</p>
<p>NOTE: in the current version of SourceGen, changes to .sym65 files are
not detected automatically. Use File &gt; Reload External Files to
import the changes.</p>
<h2><a name="extension-scripts">Extension Scripts</a></h2>
<p>Extension scripts, also called "plugins", are C# programs with access to
the full .NET Standard 2.0 APIs. They're compiled at run time by SourceGen
and executed in a sandbox with security restrictions.</p>
<p>SourceGen defines an interface that plugins must implement, and an
interface that plugins can use to interact with SourceGen. See
Interfaces.cs in the PluginCommon directory.</p>
<p>The current interfaces can be used to generate visualizations, to
identify inline data that follows JSR, JSL, or BRK instructions, and to
format operands. The latter can be used to format code and data, e.g.
replacing immediate load operands with symbolic constants.</p>
<p>Scripts may be loaded from the RuntimeData directory, or from the directory
where the project file lives. Attempts to load them from other locations
will fail.</p>
<p>A project may load multiple scripts. The order in which they are
invoked is not defined.</p>
<h4>Known Issues and Limitations</h4>
<p>Scripts are currently limited to C# version 5, because the compiler
built into .NET only handles that. C# 6 and later require installing an
additional package ("Roslyn"), so SourceGen does not support this.</p>
<p>When a project is opened, any errors encountered by the script compiler
are reported to the user. If the project is already open, and a script
is added to the project through the Project Properties editor, compiler
messages are silently discarded. (This also applies if you undo/redo across
the property edit.) Use File &gt; Reload External Files to see the
compiler messages.</p>
<h4>Development</h4>
<p>The easiest way to develop extension scripts is inside the 6502bench
solution in Visual Studio. This way you have the interfaces available
for IntelliSense completion, and get all the usual syntax and compile
checking in the editor. (This is why there's a RuntimeData project for
Visual Studio.)</p>
<p>If you have the solution configured for debug builds, SourceGen will pass
<code>IncludeDebugInformation=true</code> to the script compiler. This
causes a .PDB file to be created. While this can help with debugging,
it can sometimes get in the way: if you edit the script source code and
reload the project without restarting the app, SourceGen will recompile
the script, but the old .PDB file will still be held open by Visual Studio
and you'll see some failure messages. Exiting and restarting SourceGen
will allow regeneration of the PDB files.</p>
<p>Some commonly useful functions are defined in the
<code>PluginCommon.Util</code> class, which is available to plugins. These
call into the CommonUtil library, which is shared with SourceGen.
While plugins could use CommonUtil directly, they should avoid doing so. The
APIs there are not guaranteed to be stable, so plugins that rely on them
may break in a subsequent release of SourceGen.</p>
<h4>PluginDllCache Directory</h4>
<p>Extension scripts are compiled into .DLLs, and saved in the PluginDllCache
directory, which lives next to the application executable and RuntimeData.
If the extension script is the same age as, or older than, the DLL, SourceGen
will continue to use the existing DLL.</p>
<p>The DLL names are a combination of the script filename and script location.
The compiled name for "MyPlatform/MyScript.cs" in the RuntimeData directory
will be "RT_MyPlatform_MyScript.dll". For a project-specific script, it
would look like "PROJ_MyProject_MyScript.dll".</p>
<p>The PluginCommon and CommonUtil DLLs will be copied into the directory, so
that code in the sandbox has access to them.</p>
<p>The contents of the directory are generated as needed, and can be deleted
entirely whenever SourceGen isn't running.</p>
<h4>Sandboxing</h4>
<p>Extension scripts are executed in an App Domain sandbox. App domains are
a .NET feature that creates a partition inside the virtual machine, isolating
code. It still runs in the same address space, on the same threads, so the
isolation is only effective for "partially trusted" code that has been
declared safe by the bytecode verifier.</p>
<p>SourceGen disallows most actions, notably file access. An exception is
made for reading files from the directory where the plugin DLLs live, but
scripts are otherwise unable to read or write from the filesystem. (A
future version of SourceGen may provide an API that allows limited access
to data files.)</p>
<p>App domain security is not absolute. I don't really expect SourceGen to
be used as a malware vector, so there's no value in forcing scripts to
execute in an isolated server process, or to jump through the other hoops
required to really lock things down. I do believe there's value in
defining the API in such a way that we <b>could</b> implement full security if
circumstances change, so I'm using app domains as a way to keep the API
honest.</p>
<h2><a name="multi-bin">Working With Multiple Binaries</a></h2>
<p>Sometimes a program is split into multiple files on disk. They
may be all loaded at once, or some may be loaded into the same place
at different times. In such situations it's not uncommon for one
file to provide a set of interfaces that other files use. It's
useful to have symbols for these interfaces be available to all
projects.</p>
<p>There are two ways to do this: (1) define a common platform symbol
file with the relevant addresses, and keep it up to date as you work;
or (2) declare the labels as global and exported, and import them
as project symbols into the other projects.</p>
<p>Support for this is currently somewhat weak, requiring a manual
symbol-import step in every interested project. This step must be
repeated whenever the labels are updated.</p>
<p>A different but related problem is typified by arcade ROM sets,
where files are split apart because each file must be burned into a
separate PROM. All files are expected to be present in memory at
once, so there's no reason to treat them as separate projects. Currently,
the best way to deal with this is to concatenate the files into a single
file, and operate on that.</p>
<h2><a name="overlap">Overlapping Address Spaces</a></h2>
<p>Some programs use memory overlays, where multiple parts of the
code run in the same address in RAM. Others use bank switching to access
parts of the program that reside in separate physical RAM or ROM,
but appear at the same address. Nested address regions allow for a
variety of configurations, which can make address resolution complicated.</p>
<p>The general goal is to have references to an address resolve to
the "nearest" match. For example, consider a simple overlay:</p>
<pre>
.ADDRS $1000
JMP L1100
.ADDRS $1100
L1100 BIT L1100
L1103 LDA #$11
BRA L1103
.ADREND
.ADDRS $1100
L1100_0 BIT L1100_0
L1103_0 LDA #$22
JMP L1103_0
.ADREND
.ADREND
</pre>
<p>Both sections start at $1100, and have branches to $1103. The branch
in the first section resolves to the label in the first version of
that address chunk, while the branch in the second section resolves to
the label in the second chunk. When branches originate outside the current
address chunk, the first chunk that includes that address is used, as
it is with the <code>JMP L1100</code> at $1000, the start of the file.</p>
<p>The full address-to-offset algorithm is as follows.
There are two inputs: the file offset of the instruction or data item
that has the reference (e.g. the JMP or LDA), and the address
it is referring to.</p>
<ul>
<li>Create a tree with all address regions. Each "node" in the tree
has an offset, length, and start address.</li>
<li>Search the tree for a node that includes the offset of the
reference source.
When there are multiple overlapping regions, descend until the
deepest child that spans the offset is found. This node will be
the starting point of the search.</li>
<li>Loop until we hit the top of the tree:
<ul>
<li>Perform a recursive depth-first search of all children of the
current node. They're searched in order of ascending file offset.</li>
<li>If the address wasn't found in the children, check the current
node. If we find it here, return this node as the result.</li>
<li>Move up to the parent node.</li>
</ul></li>
</ul>
<p>This searches all children and all siblings before checking the parent.
If we hit the top of the tree without finding a match, we conclude
that the reference is to an external address.</p>
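<p>The search can be sketched in Python as follows. This is a simplified
model rather than SourceGen's implementation: the <code>Region</code> class
and the function names are invented for the example, a region is assumed not
to "own" offsets that are covered by one of its children, and the recursive
search is assumed to look at a node's children before the node itself.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    offset: int      # file offset where the region starts
    length: int      # number of bytes spanned, including children
    address: int     # address assigned to the first byte
    children: List["Region"] = field(default_factory=list)
    parent: Optional["Region"] = field(default=None, repr=False, compare=False)

    def __post_init__(self):
        # Children are searched in order of ascending file offset.
        self.children.sort(key=lambda child: child.offset)
        for child in self.children:
            child.parent = self

    def spans_offset(self, offset):
        return self.offset <= offset < self.offset + self.length

    def offset_of(self, address):
        """File offset of address if this region itself holds it, else None."""
        delta = address - self.address
        if not 0 <= delta < self.length:
            return None
        offset = self.offset + delta
        if any(child.spans_offset(offset) for child in self.children):
            return None      # that part of the file belongs to a child
        return offset


def deepest_region(node, offset):
    """Descend to the deepest region that spans the given file offset."""
    for child in node.children:
        if child.spans_offset(offset):
            return deepest_region(child, offset)
    return node


def search_subtree(node, address):
    """Depth-first search: all children first, then the node itself."""
    for child in node.children:
        found = search_subtree(child, address)
        if found is not None:
            return found
    return node.offset_of(address)


def resolve(root, source_offset, address):
    """Return the target file offset, or None for an external address."""
    node = deepest_region(root, source_offset)
    while node is not None:
        found = search_subtree(node, address)
        if found is not None:
            return found
        node = node.parent   # re-searching the old subtree is redundant but harmless
    return None


# The overlay example: an outer region at $1000 with two regions at $1100.
example = Region(0, 18, 0x1000, [Region(3, 7, 0x1100), Region(10, 8, 0x1100)])
```

<p>In this model the <code>JMP</code> at offset 0 resolves $1100 to offset 3
(the first chunk), while references to $1103 from inside each chunk resolve
to that chunk's own copy.</p>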
<h2><a name="reloc-data">OMF Relocation Dictionaries</a></h2>
<p><i>This feature is considered experimental. Some features,
like cross-reference tracking, may not work correctly with it.</i></p>
<p>65816 code can be tricky to disassemble for a number of reasons.
24-bit addresses are formed from 16-bit data-access operands by combining
with the Data Bank Register (DBR), which often requires a bit of manual
intervention. But the problems go beyond that. Consider the following
bit of source code for the Apple IIgs:</p>
<pre>
rsrcmsg pea rsrcmsg2|-16
pea rsrcmsg2
_WriteCString
lda #buffer
sta pArcRead+$04
lda #buffer|-16
sta pArcRead+$06
</pre>
<p>In both cases we're referencing a 24-bit address as two 16-bit values.
Without context, the disassembler will treat the two PEA operands as
independent 16-bit addresses, and the immediate values as constants:</p>
<pre>
.dbank $02
02/317c: f4 02 00 L2317C pea L20002 &amp; $ffff
02/317f: f4 54 32 pea L23254 &amp; $ffff
02/3182: a2 0c 20 ldx #WriteCString
02/3185: 22 00 00 e1 jsl Toolbox
02/3189: a9 00 00 L23189 lda #$0000
02/318c: 8d 78 3f sta L23F78 &amp; $ffff
02/318f: a9 03 00 lda #$0003
02/3192: 8d 7a 3f sta L23F78 &amp; $ffff +2
</pre>
<p>Worse yet, those <code>STA</code> instruction operands would have been
shown as hex values or incorrect labels if the DBR had been set incorrectly.
However, if we have the relocation data, we know the full
address from which the addresses were formed, and we can tell when
immediate values are addresses rather than constants. And we can do this
even without setting the DBR.</p>
<pre>
02/317c: f4 02 00 L2317C pea L23254 &gt;&gt; 16
02/317f: f4 54 32 pea L23254 &amp; $ffff
02/3182: a2 0c 20 ldx #WriteCString
02/3185: 22 00 00 e1 jsl Toolbox
02/3189: a9 00 00 L23189 lda #L30000 &amp; $ffff
02/318c: 8d 78 3f sta L23F78 &amp; $ffff
02/318f: a9 03 00 lda #L30000 &gt;&gt; 16
02/3192: 8d 7a 3f sta L23F78 &amp; $ffff +2
</pre>
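<p>The arithmetic behind those operands is simple to verify. The Python
below splits a 24-bit address into its bank and 16-bit halves and shows the
bytes a <code>PEA</code> of each half would produce. The helper names are
hypothetical, and the addresses are the ones implied by the labels in the
listing ($023254 for L23254, $030000 for L30000).</p>

```python
def split_address(address):
    """Split a 24-bit address into (bank, low 16 bits)."""
    return address >> 16, address & 0xFFFF


def pea_bytes(value16):
    """Encode PEA (opcode $F4) with a little-endian 16-bit operand."""
    return bytes([0xF4, value16 & 0xFF, value16 >> 8])


bank, low = split_address(0x023254)
bank_push = pea_bytes(bank)   # f4 02 00, as at 02/317c
low_push = pea_bytes(low)     # f4 54 32, as at 02/317f
```

<p>Without relocation data there is no way to tell that the $0002 in the
first <code>PEA</code> is the top part of $023254 rather than an unrelated
value, which is why the unassisted listing shows <code>L20002</code>.</p>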
<p>The absence of relocation data can be a useful signal as well. For
example, when pushing arguments for a toolbox call, the disassembler
can tell the difference between addresses and constants without needing
emulation or pattern-matching, because only the addresses get
relocated. Consider this bit of source code:</p>
<pre>
lda &lt;total_records
pha
pea linebuf|-16
pea linebuf+65
pea $0005
pea $0000
_Int2Dec
</pre>
<p>Without relocation data, it becomes:</p>
<pre>
02/0aa8: a5 42 lda $42
02/0aaa: 48 pha
02/0aab: f4 02 00 pea L20002 &amp; $ffff
02/0aae: f4 03 31 pea L23103 &amp; $ffff
02/0ab1: f4 05 00 pea L20005 &amp; $ffff
02/0ab4: f4 00 00 pea L20000 &amp; $ffff
02/0ab7: a2 0b 26 ldx #Int2Dec
02/0aba: 22 00 00 e1 jsl Toolbox
</pre>
<p>If we treat the non-relocated operands as constants:</p>
<pre>
02/0aa8: a5 42 lda $42
02/0aaa: 48 pha
02/0aab: f4 02 00 pea L230C2 &gt;&gt; 16
02/0aae: f4 03 31 pea L23103 &amp; $ffff
02/0ab1: f4 05 00 pea $0005
02/0ab4: f4 00 00 pea $0000
02/0ab7: a2 0b 26 ldx #Int2Dec
02/0aba: 22 00 00 e1 jsl Toolbox
</pre>
<h2><a name="debug">Debug Menu Options</a></h2>
<p>The DEBUG menu is hidden by default in release builds, but can be
exposed by checking the "enable DEBUG menu" box in the application
settings. These features are used for debugging SourceGen. They will
not help you debug 6502 projects.</p>
<p>Features:</p>
<ul>
<li>Re-analyze (F5). Causes a full re-analysis. Useful if you think
the display is out of sync.</li>
<li>Source Generation Tests. Opens the regression test harness. See
<code>README.md</code> in the SGTestData directory for more information.
If the regression tests weren't included in the SourceGen distribution,
this will have nothing to do.</li>
<li>Show Analyzer Output. Opens a floating window with a text log from
the most recent analysis pass. The exact contents will vary depending
on how the verbosity level is configured internally. Debug messages
from extension scripts appear here.</li>
<li>Show Analysis Timers. Opens a floating window with a dump of
timer results from the most recent analysis pass. Times for individual
stages are noted, as are times for groups of functions. This
provides a crude sense of where time is being spent.</li>
<li>Show Undo/Redo History. Opens a floating window that lets you
watch the contents of the undo buffer while you work.</li>
<li>Extension Script Info. Shows a bit about the currently-loaded
extension scripts.</li>
<li>Show Comment Rulers. Adds a string of digits above every
multi-line comment (long comment, note). Useful for confirming that
the width limitation is being obeyed. These are added exactly
as shown, without comment delimiters, into generated assembly output,
which doesn't work out well if you run the assembler.</li>
<li>Disable Security Sandbox. Extension scripts are loaded and run in
a "sandbox" to prevent security issues. Setting this flag allows
them to execute with full permissions.
This setting is not persistent.</li>
<li>Disable Keep-Alive Hack. The hack sends a "ping" to the extension
script sandbox every 60 seconds. This seems to be required to avoid
an infrequently-encountered Windows bug. (See code for notes and
stackoverflow.com links.)
This setting is not persistent.</li>
<li>Reboot Security Sandbox. Discards the sandbox, creates a new one,
and reloads it. Only useful for exercising the sandbox code that
runs when the keep-alives are unsuccessful.</li>
<li>Applesoft to HTML. An experimental feature that formats an
Applesoft program as HTML.</li>
<li>Export Edit Commands. Outputs comments and notes in
SourceGen Edit Command format. This is an experimental feature.</li>
<li>Apply Edit Commands. Reads a file in SourceGen Edit Command
format and applies the commands.</li>
<li>Apply External Symbols. An experimental feature for turning platform
and project symbols into address labels. This will run through the list
of all symbols loaded from .sym65 files and find addresses that fall
within the bounds of the file. If it finds an address that is the start
of a code/data line and doesn't already have a user-supplied label,
and the platform symbol's label isn't already defined elsewhere, the
platform label will be applied. Useful when disassembling ROM images
or other code with an established set of public entry points.
(Tip: disable "analyze uncategorized data" from the project
properties editor first, as this will not set labels in the middle
of multi-byte data items.)</li>
</ul>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+340
@@ -0,0 +1,340 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Instruction and Data Analysis - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Instruction and Data Analysis</h1>
<p><a href="index.html">Back to index</a></p>
<p><i>This section discusses the internal workings of SourceGen. It is
not necessary to understand this to use the program.</i></p>
<h2><a name="analysis-process">Analysis Process</a></h2>
<p>Analysis of the file data is a complex multi-step process. Some
changes to the project, such as adding a code start point or
changing the CPU selection, require a full re-analysis of instructions
and data. Other changes, such as adding or removing a label, don't
affect the code tracing and only require a re-analysis of the data areas.
And some changes, such as editing a comment, only require a refresh
of the displayed lines.</p>
<p>It should be noted that none of the analysis results are stored in
the project file. Only user-supplied data, such as the locations of
code entry points and label definitions, is written to the file. This
does create the possibility that two different users might get different
results when opening the same project file with different versions of
SourceGen, but these effects are expected to be minor.</p>
<p>The analyzer performs the following steps (see the <code>Analyze</code>
method in <code>DisasmProject.cs</code>):</p>
<ul>
<li>Reset the symbol table.</li>
<li>Merge platform symbols into the symbol table, loading the files
in order.</li>
<li>Merge project symbols into the symbol table, stomping on any
platform symbols that conflict.</li>
<li>Merge user label symbols into the table, stomping any previous
entries.</li>
<li>Run the code analyzer. The outcome of this is an array of analysis
attributes, or "anattribs", with one entry per byte in the file.
The Anattrib array tracks most of the state from here on. If we're
doing a partial re-analysis, this step will just clone a copy of the
Anattrib array that was made at this point in a previous run. (The
code analysis pass is described in more detail below.)</li>
<li>Apply user-specified labels to Anattribs.</li>
<li>Apply user-specified format descriptors. These are the instruction
and data operand formats.</li>
<li>Run the data analyzer. This looks for patterns in uncategorized
data, and connects instruction and data operands to target offsets.
The "nearby label" stuff is handled here. Auto-labels are generated
for references to internal addresses. All of the results are
stored in the Anattribs array. (The data analysis pass is described in
more detail below.)</li>
<li>Remove hidden labels from the symbol table. These are user-specified
labels that have been placed on offsets that are in the middle of an
instruction or multi-byte data item. They can't be referenced, so we
want to pull them out of the symbol table. (Remember, symbolic
operands use "weak references", so a missing symbol just means the
operand is shown as a hex value.)</li>
<li>Resolve references to local variables. This sets the operand symbol
in Anattrib so we won't try to apply platform/project symbols to
zero-page addresses. If we somehow ended up with a variable that has
the same name as a user label, we rename the variable.</li>
<li>Resolve references to platform and project external symbols.
This sets the operand symbol in Anattrib, and adds the symbol to
the list that is displayed in .EQ directives.</li>
<li>Generate cross-reference lists. This is done for internal references,
for local variables, and for any platform/project symbols that are
referenced.</li>
<li>If annotated auto-labels are enabled, the simple labels are
replaced with the annotated versions here. (This can't be done earlier
because the annotations are generated from the cross-reference data.)</li>
<li>In a debug build, some validity checks are performed.</li>
</ul>
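<p>The first few steps, where the symbol table is rebuilt, amount to a
series of overwriting merges. Here is a rough Python sketch of that
precedence, assuming the table is keyed by label (an assumption made for the
example, as is the <code>build_symbol_table</code> name; the real table is
more involved).</p>

```python
def build_symbol_table(platform_files, project_symbols, user_labels):
    """Merge symbol sources in order; later sources stomp earlier ones.

    platform_files is a list of {label: value} dicts in load order;
    project_symbols and user_labels are single {label: value} dicts.
    """
    table = {}                                    # reset the symbol table
    for file_symbols in platform_files:           # platform files, in order
        for label, value in file_symbols.items():
            table[label] = ("platform", value)
    for label, value in project_symbols.items():  # project symbols win
        table[label] = ("project", value)
    for label, value in user_labels.items():      # user labels win over all
        table[label] = ("user", value)
    return table
```

<p>Because each merge simply overwrites, a platform file loaded later wins
over one loaded earlier, and a user label with the same name hides both.</p>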
<p>Once analysis is complete, a line-by-line display list is generated
by walking through the annotated file data. Most of the actual text
isn't rendered until it's needed. For complicated multi-line items
like string operands, the formatted text must be generated to know how
many lines it will occupy, so it's done immediately and cached for re-use
on subsequent runs.</p>
<h3><a name="auto-format">Automatic Formatting</a></h3>
<p>Every offset in the file is marked as an instruction byte, data byte, or
inline data byte. Some offsets are also marked as the start of an instruction
or data area. The start offsets may have a format descriptor associated
with them.</p>
<p>Format descriptors have a format (like "numeric" or
"null-terminated string"), a sub-format (like "hexadecimal" or
"high ASCII"), and a length. For
an instruction operand the length is redundant, but for a data operand it
determines the width of the numeric value or length of the string. For
this reason, instructions do not need a format descriptor, but all
data items do.</p>
<p>Symbolic references are format descriptors with a symbol attached.
The symbol reference also specifies low/high/bank, for partial symbol
references like <code>LDA #&gt;symbol</code>.</p>
<p>Every offset marked as a start point gets its own line in the on-screen
display list. Embedded instructions are identified internally by
looking for instruction-start offsets inside instructions.</p>
<p>The Anattrib array holds the post-analysis state for every offset,
including comments and formatting, but any changes you make in the
editors are applied to the data structures that are saved in the project
file. After a change is made, a full or partial re-analysis is done to
fill out the Anattribs.</p>
<p>Consider a simple example:</p>
<pre>
.ADDRS $1000
JMP L1003
L1003 NOP
</pre>
<p>We haven't explicitly formatted anything yet. The data analyzer sees
that the JMP operand is inside the file, and has no label, so it creates an
auto-label at offset +000003 and a format descriptor with a symbolic
operand reference to "L1003" at +000000.</p>
<p>Suppose we now edit the label, changing L1003 to "FOO". This goes into
the project's "user label" list. The analyzer is
run, and applies the new "user label" to the Anattrib array. The
data analyzer finds the numeric reference in the JMP operand, and finds
a label at the target address, so it creates a symbolic operand reference
to "FOO". When the display list is generated, the symbol "FOO" appears
in both places.</p>
<p>Even though the JMP operand changed from "L1003" to "FOO", the only
change actually written to the project file is the label edit. The
contents of the Anattrib array are disposable, so it can be used to
hold auto-generated labels and "fix up" numeric references. Labels and
format descriptors generated by SourceGen are never added to the
project file.</p>
<p>If the JMP operand were edited, a format descriptor would be added
to the user-specified descriptor list. During the analysis pass it would
be added to the Anattrib array at offset +000000.</p>
<h3><a name="undo-redo">Interaction With Undo/Redo</a></h3>
<p>The analysis pass always considers the current state of the user
data structures. Whether you're adding a label or removing one, the
code runs through the same set of steps. The advantage of this approach
is that the act of doing a thing, undoing a thing, and redoing a thing
are all handled the same way.</p>
<p>None of the editors modify the project data structures directly. All
changes are added to a change set, which is processed by a single
"apply changes" function. The change sets are kept in the undo/redo
buffer indefinitely. After
the changes are made, the Anattrib array and other data structures are
regenerated.</p>
<p>Data format editing can create some tricky situations. For example,
suppose you have 8 bytes that have been formatted as two 32-bit words:</p>
<pre>
1000: 68690074 .dd4 $74006968
1004: 65737400 .dd4 $00747365
</pre>
<p>You realize these are null-terminated strings, select both words, and
reformat them:</p>
<pre>
1000: 686900 .zstr "hi"
1003: 74657374+ .zstr "test"
</pre>
<p>Seems simple enough. Under the hood, SourceGen created three changes:</p>
<ol>
<li>At offset +000000, replace the current format descriptor (4-byte
numeric) with a 3-byte null-terminated string descriptor.</li>
<li>At offset +000003, add a new 5-byte null-terminated string
descriptor.</li>
<li>At offset +000004, remove the 4-byte numeric descriptor.</li>
</ol>
<p>Each entry in the change set has "before" and "after" states for the
format descriptor at a specific offset. Only the state for the affected
offsets is included -- the program doesn't record the state of the full
project after each change (even with the RAM on a modern system that would
add up quickly). When undoing a change, before and after are simply
reversed.</p>
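<p>A minimal Python sketch of the idea, using the three changes from the
example above. The <code>FormatChange</code> type, the
<code>apply_changes</code> function, and the descriptor strings are all
invented for illustration; <code>None</code> stands for "no descriptor at
this offset".</p>

```python
from typing import NamedTuple, Optional


class FormatChange(NamedTuple):
    offset: int
    before: Optional[str]   # descriptor prior to the change, or None
    after: Optional[str]    # descriptor following the change, or None


def apply_changes(descriptors, change_set, undo=False):
    """Apply a change set in place; with undo=True, swap before and after."""
    ordered = reversed(change_set) if undo else change_set
    for change in ordered:
        new_value = change.before if undo else change.after
        if new_value is None:
            descriptors.pop(change.offset, None)
        else:
            descriptors[change.offset] = new_value


# Two 32-bit words reformatted as two null-terminated strings.
descriptors = {0: "numeric/4", 4: "numeric/4"}
change_set = [
    FormatChange(0, "numeric/4", "zstr/3"),
    FormatChange(3, None, "zstr/5"),
    FormatChange(4, "numeric/4", None),
]
apply_changes(descriptors, change_set)                # do (or redo)
after_do = dict(descriptors)
apply_changes(descriptors, change_set, undo=True)     # undo
```

<p>Only the three affected offsets are recorded, and the same function
handles do, undo, and redo.</p>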
<h2><a name="code-analysis">Code Analysis</a></h2>
<p>The code tracer walks through the instructions, examining them to
determine where execution will proceed next. There are five possibilities
for every instruction:</p>
<ol>
<li>Continue. Execution always continues at the next instruction.
Examples: <code>LDA</code>, <code>STA</code>, <code>AND</code>,
<code>NOP</code>.</li>
<li>Don't continue. The next instruction to be executed can't be
determined from the file data (unless you're disassembling the
system ROM around the BRK vector).
Examples: <code>RTS</code>, <code>BRK</code>.</li>
<li>Branch always. The operand specifies the next instruction address.
Examples: <code>JMP</code>, <code>BRA</code>, <code>BRL</code>.</li>
<li>Branch sometimes. Execution may continue at the operand address,
or may execute the following instruction. If we know the value of
the flags in the processor status register, we can eliminate one
possibility. Examples: <code>BCC</code>, <code>BEQ</code>,
<code>BVS</code>.</li>
<li>Call subroutine. Execution will continue at the operand address,
and is expected to also continue at the following instruction.
Examples: <code>JSR</code>, <code>JSL</code>.</li>
</ol>
<p>Branch targets are added to a list. When the current run of instructions
is exhausted (i.e. a "don't continue" or "branch always" instruction is
reached), the next target is pulled off of the list.</p>
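<p>A minimal sketch of the work-list approach follows. It is a simplification under stated assumptions: the <code>program</code> table and the <code>trace</code> function are hypothetical, instructions are pre-decoded, and status flag tracking is left out entirely.</p>

```python
from collections import deque

# Flow kinds matching the five possibilities listed above.
CONTINUE, NO_CONTINUE, BRANCH_ALWAYS, BRANCH_SOMETIMES, CALL_SUB = range(5)


def trace(program, entry):
    """Walk code starting at 'entry'.

    'program' maps address -> (flow kind, length in bytes, operand address
    or None).  Returns the set of addresses reached as instruction starts.
    """
    visited = set()
    pending = deque([entry])
    while pending:
        addr = pending.popleft()
        while addr in program and addr not in visited:
            visited.add(addr)
            kind, length, target = program[addr]
            if kind in (BRANCH_ALWAYS, BRANCH_SOMETIMES, CALL_SUB):
                pending.append(target)      # remember the branch target
            if kind in (NO_CONTINUE, BRANCH_ALWAYS):
                break                       # current run is exhausted
            addr += length
    return visited


# Toy program (hypothetical):
#   1000: LDA #$00 / 1002: BEQ $1007 / 1004: JMP $1000 / 1007: RTS
#   1008: NOP (never reached)
program = {
    0x1000: (CONTINUE, 2, None),
    0x1002: (BRANCH_SOMETIMES, 2, 0x1007),
    0x1004: (BRANCH_ALWAYS, 3, 0x1000),
    0x1007: (NO_CONTINUE, 1, None),
    0x1008: (CONTINUE, 1, None),
}
reached = trace(program, 0x1000)
```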
<p>The state of the processor status flags is recorded for every
instruction. When execution proceeds to the next instruction or branches
to a new address, the flags are merged with the flags at the new
location. If one execution path through a given address has the flags
in one state (say, the carry is clear), while another execution path
sees a different state (carry is set), the merged flag is
"indeterminate". Indeterminate values cannot become determinate through
a merge, but can be set by an instruction.</p>
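<p>The merge rule can be written as a small three-state function. The constant names and the dictionary representation below are assumptions made for the sketch, not SourceGen's internal types.</p>

```python
ZERO, ONE, INDETERMINATE = 0, 1, -1


def merge_flag(a, b):
    """Two paths agree only if both carry the same definite value."""
    return a if a == b else INDETERMINATE


def merge_flags(current, incoming):
    """Merge the flags arriving on a new path with those already recorded."""
    return {name: merge_flag(current[name], incoming[name]) for name in current}


merged = merge_flags({"C": ZERO, "Z": ONE}, {"C": ONE, "Z": ONE})
```

<p>Note that once a flag is indeterminate, merging it with anything leaves it indeterminate, which is the behavior described above.</p>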
<p>There can be multiple paths to a single address. If the analyzer
sees that an instruction has been visited before, with an identical set
of status flags, the analyzer stops pursuing that path.</p>
<p>The analyzer must always know the width of immediate load instructions
when examining 65816 code, but it's possible for the status flag values
to be indeterminate. In such a situation, short registers are assumed.
Similarly, if the carry flag is unknown when an <code>XCE</code> is
performed, we assume a transition to emulation mode (E=1).</p>
<p>There are three ways in which code can set a flag to a definite value:</p>
<ol>
<li>With explicit instructions, like <code>SEC</code> or
<code>CLD</code>.</li>
<li>With immediate-operand instructions. <code>LDA #$00</code> sets Z=1
and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
<li>By inference. For example, if we see a <code>BCC</code> instruction,
we know that the carry will be clear at the branch target address, and
set at the following instruction. The instruction doesn't affect the
value of the flag, but we know what the value will be at both
addresses.</li>
</ol>
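<p>The second and third cases above can be illustrated as follows. This is a sketch assuming an 8-bit accumulator; the function names are hypothetical, and only the flags whose values become known are returned.</p>

```python
def flags_after_lda_imm(value):
    """LDA #imm: the loaded value fully determines Z and N."""
    return {"Z": 1 if value == 0 else 0, "N": 1 if value & 0x80 else 0}


def flags_after_ora_imm(value):
    """ORA #imm can only set bits, so set bits in the operand are guaranteed
    to appear in the result even though the accumulator is unknown."""
    known = {}
    if value != 0:
        known["Z"] = 0      # result cannot be zero
    if value & 0x80:
        known["N"] = 1      # high bit is certainly set
    return known


def flags_after_bcc(flags):
    """BCC doesn't change C, but each successor sees a known value."""
    taken = dict(flags, C=0)        # flags at the branch target
    not_taken = dict(flags, C=1)    # flags at the following instruction
    return taken, not_taken


taken, not_taken = flags_after_bcc({"C": -1, "Z": 0})
```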
<p>Self-modifying code can spoil any of these, possibly requiring a
status flag override to get correct disassembly.</p>
<p>The instruction that is most likely to cause problems is <code>PLP</code>,
which pulls the processor status flags off of the stack. SourceGen
doesn't try to track stack contents, so it can't know what values may
be pulled. In many cases the <code>PLP</code> appears not long after a
<code>PHP</code>, so SourceGen can scan backward through the file to
find the nearest <code>PHP</code>, and use the status flags from that.
In practice this doesn't work well, but the "smart" behavior can be
enabled from the project properties if desired. Otherwise, a
<code>PLP</code> causes all flags to be set to "indeterminate", except
for the M/X flags on the 65816 which are left unmodified.</p>
<p>Some other things that the code analyzer can't recognize automatically:</p>
<ul>
<li>Jumping indirectly through an address outside the file, e.g.
storing an address in zero-page memory and jumping through it.</li>
<li>Jumping to an address by pushing the location onto the stack,
then executing an <code>RTS</code>.</li>
<li>Self-modifying code, e.g. overwriting a <code>JMP</code> instruction.</li>
<li>Addresses invoked by external code, e.g. interrupt handlers.</li>
</ul>
<p>Sometimes the indirect jump targets are coming from a table of
addresses in the file. If so, these can be formatted as addresses,
and then the target locations tagged as code entry points.</p>
<p>The 65816 adds an additional twist: some instructions combine their
operands with the Data Bank Register ("B") to form a 24-bit address.
SourceGen can't automatically determine what the register holds, so it
assumes that it's equal to the program bank register ("K"), and provides
a way to override the value.</p>
<h3><a name="extension-scripts">Extension Scripts</a></h3>
<p>Extension scripts can mark data that follows a JSR, JSL, or BRK as inline
data, or change the format of nearby data or instructions. The first
time a JSR/JSL/BRK instruction is encountered, all loaded extension scripts
that implement the appropriate interface are offered a chance to act.</p>
<p>The first script that applies a format wins. Attempts to re-format
instructions or data that have already been formatted will fail. This rule
ensures that anything explicitly formatted by the user will not be
overridden by a script.</p>
<p>If code jumps into a region that is marked as inline data, the
branch will be ignored. If an extension script tries to flag bytes
as inline data that have already been executed, the script will be
ignored. This can lead to a race condition in the analyzer if
an extension script is doing the wrong thing. (The race doesn't exist
with inline data tags specified by the user, because those are applied
before code analysis starts.)</p>
<h2><a name="data-analysis">Data Analysis</a></h2>
<p>The data analyzer performs two tasks. It matches operands with
offsets, and it analyzes uncategorized data. This behavior can be
modified in the
<a href="settings.html#project-properties">project properties</a>.</p>
<p>The data target analyzer examines every instruction and data operand
to see if it's referring to an offset within the data file. If the
target is within the file, and has a label, a format descriptor with a
weak symbolic reference to that label is added to the Anattrib array. If
the target doesn't have a label, the analyzer will either use a nearby
label, or generate a unique label and use that.</p>
<p>While most of the "nearby label" logic can be disabled, targets that
land in the middle of an instruction are always adjusted backward to
the instruction start. This is necessary because labels are only visible
if they're associated with the first (opcode) byte of an instruction.</p>
<p>The uncategorized data analyzer tries to find character strings and
opportunities to use the ".FILL" operation. It breaks the file into
pieces, where contiguous regions hold nothing but data, are not split
across address region start/end directives, are not interrupted by labels,
and do not contain anything that the user has chosen to format. Each
region is scanned for matching patterns. If a match is found, a format entry
is added to the Anattrib array. Otherwise, data is added as single-byte
values.</p>
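<p>As a rough illustration of the pattern scan, here is how runs suitable for a ".FILL" operation might be found within one region. The function name and the minimum run length of 5 are assumptions for the sketch; the actual thresholds are controlled by the project properties.</p>

```python
def find_fill_runs(data, min_run=5):
    """Return (offset, length, value) for each run of identical bytes that is
    at least 'min_run' bytes long.  'min_run' is an assumed threshold."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if j - i >= min_run:
            runs.append((i, j - i, data[i]))
        i = j
    return runs


region = bytes([0x01, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03])
runs = find_fill_runs(region)
```

<p>Bytes not covered by any match would then be emitted as single-byte values.</p>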
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
@@ -0,0 +1,415 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Code Generation &amp; Assembly - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Code Generation &amp; Assembly</h1>
<p><a href="index.html">Back to index</a></p>
<p>SourceGen can generate an assembly source file that, when fed into
the target assembler, will recreate the original data file exactly.
Every assembler is different, so support must be added to SourceGen
for each.</p>
<p>The generation / assembly dialog can be opened with File &gt; Assemble.</p>
<p>If you want to show code to others, perhaps by adding a page to
your web site, you can "export" the formatted code as text or HTML.
This is explained in more detail <a href="#export-source">below</a>.</p>
<h2><a name="generate">Generating Source Code</a></h2>
<p>Cross assemblers tend to generate additional files, either compiler
intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
generators may produce multiple source files, perhaps a link script or
symbol definition header to go with the assembly source. To avoid
spreading files across the filesystem, SourceGen does all of its work
in the same directory where the project lives. Before you can generate
code, you have to have assigned your project a directory. This is why
you can't assemble a project until you've saved it for the first time.</p>
<p>The Generate and Assemble dialog has a drop-down list near the top
that lets you pick which assembler to target. The name of the assembler
will be shown with the detected version number. If the assembler
executable isn't configured, "[latest version]" will be shown instead
of a version number.</p>
<p>The Settings button will take you directly to the assembler configuration
tab in the application settings dialog.</p>
<p>Hit the Generate button to generate the source code into a file on disk.
The file will use the project name, with the <code>.dis65</code> extension
replaced by <code>_&lt;assembler&gt;.S</code>.</p>
<p>The first 64KiB of each generated file will be shown in the preview
window. If multiple files were generated, you can use the "preview file"
drop-down to select between them. Line numbers are
prepended to each line to make it easier to track down errors.</p>
<h3><a name="localizer">Label Localizer</a></h3>
<p>The label localizer is an optional feature that automatically converts
some labels to an assembler-specific less-than-global label format. Local
labels may be reusable (e.g. using "]LOOP" for multiple consecutive
loops is easier to understand than giving each one a unique label) or
reduce the size of a generated link table. There are usually restrictions
on local labels, e.g. references to them may not be allowed to cross a
global label definition, which the localizer factors in automatically.</p>
<h3><a name="reserved-labels">Reserved Label Names</a></h3>
<p>Some label names aren't allowed. For example, 64tass reserves the
use of labels that begin with two underscores. Most assemblers will
also prevent you from using opcode mnemonics as labels (which means
you can't assemble <code>jmp jmp jmp</code>).</p>
<p>If a label doesn't appear to be legal, the generated code will use
a suitable replacement (e.g. <code>jmp_1 jmp jmp_1</code>).</p>
<h3><a name="platform-features">Platform-Specific Features</a></h3>
<p>SourceGen needs to be able to assemble binaries for any system
with any assembler, so it generally avoids platform-specific features.
One exception to that is C64 PRG files.</p>
<p>PRG files start with a 16-bit value that tells the OS where the
rest of the file should be loaded. The value is not usually part of
the source code, but instead is generated by the assembler, based on
the address of the first byte output. If SourceGen detects that
a file is PRG, the source generators for some assemblers will suppress
the first 2 bytes, and instead pass appropriate meta-data (such as
an additional command-line option) to the assembler.</p>
<p>A file is treated as a PRG if:</p>
<ul>
<li>it is between 3 and 65536 bytes long (inclusive)</li>
<li>the format at offset +000000 is a 16-bit numeric data item
(not executable code, not two 8-bit values, not the first part
of a 24-bit value, etc.)</li>
<li>there is an address region start directive at +000002</li>
<li>the 16-bit value at +000000 is equal to the address of the
byte at +000002</li>
<li>there is no label at offset +000000 (explicit or auto-generated)</li>
</ul>
<p>The definition is sufficiently narrow to avoid most false-positives.
If a file is being treated as PRG and you'd rather it weren't, you
can add a label or reformat the bytes. This feature is currently only
enabled for 64tass.</p>
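<p>The checks listed above can be sketched as a single predicate. The function and its parameters are hypothetical: the caller is assumed to supply whether +000000 is formatted as a 16-bit numeric item, a map of address region start offsets to addresses, and the set of offsets that carry labels.</p>

```python
def looks_like_prg(data, is_16bit_numeric_at_0, region_starts, labeled_offsets):
    """Apply the PRG criteria to a file.

    data                  -- file contents (bytes)
    is_16bit_numeric_at_0 -- True if +000000 is formatted as a 16-bit value
    region_starts         -- dict: offset -> address of region start directives
    labeled_offsets       -- set of offsets with explicit or auto labels
    """
    if not 3 <= len(data) <= 65536:
        return False
    if not is_16bit_numeric_at_0:
        return False
    if 2 not in region_starts:
        return False
    load_addr = data[0] | (data[1] << 8)    # little-endian load address
    if load_addr != region_starts[2]:
        return False
    return 0 not in labeled_offsets


# Load address $0801 followed by two bytes of code.
prg = bytes([0x01, 0x08, 0xA9, 0x00])
```

<p>Adding a label at +000000, as suggested above, makes the last check fail and turns the special handling off.</p>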
<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>
<p>After generating sources, if you have a cross-assembler executable
configured, you can run it by clicking the "Run Assembler" button. The
command-line output will be displayed, with stdout and stderr separated.
(I'd prefer them to be interleaved, but that's not what the system
provides.)</p>
<p>The output will show the assembler's exit code, which will be zero
on success (note: sometimes they lie). If it appeared to succeed,
SourceGen will then compare the assembler's output to the original file,
and report any differences.</p>
<p>Failures here may be due to bugs in the cross-assembler or in
SourceGen. However, SourceGen can generally work around assembler bugs,
so any failure is an opportunity for improvement.</p>
<h2><a name="supported">Supported Assemblers</a></h2>
<p>SourceGen currently supports the following cross-assemblers:</p>
<ul>
<li><a href="#64tass">64tass</a></li>
<li><a href="#acme">ACME</a></li>
<li><a href="#cc65">cc65</a></li>
<li><a href="#merlin32">Merlin 32</a></li>
</ul>
<h3><a name="version">Version-Specific Code Generation</a></h3>
<p>Code generation must be tailored to the specific version of the
assembler. This is most easily understood with an example.</p>
<p>If the code has a statement like <code>MVN #$01,#$02</code>, the
assembler is expected to output <code>54 02 01</code>, with the arguments
reversed. cc65 v2.17 got it backward; the behavior was fixed in v2.18. The
bug means we can't generate the same <code>MVN</code>/<code>MVP</code>
instructions for both versions of the assembler.</p>
<p>Having version-dependent source code is a bad idea. If we generated
reversed operands (<code>MVN #$02,#$01</code>), we'd get the correct
output with v2.17, but the wrong output for v2.18. Unambiguous code can
be generated for all versions of the assembler by just outputting raw hex
bytes, but that's ugly and annoying, so we don't want to be stuck doing
that forever. We want to detect which version of the assembler is in
use, and output actual <code>MVN</code>/<code>MVP</code> instructions
when producing code for newer versions of the assembler.</p>
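<p>A sketch of that decision for cc65 is shown below. The helper names and the version tuple are assumptions for illustration; the byte order follows the <code>54 02 01</code> example above, and <code>.byte</code> is the ca65 directive for raw data.</p>

```python
MVN_OPCODE = 0x54


def encode_mvn(src_bank, dst_bank):
    """Machine encoding: opcode, then destination bank, then source bank."""
    return bytes([MVN_OPCODE, dst_bank, src_bank])


def mvn_source_line(src_bank, dst_bank, assembler_version):
    """Emit a real MVN for versions known to assemble it correctly,
    and raw hex bytes otherwise."""
    if assembler_version >= (2, 18):
        return "mvn #$%02x,#$%02x" % (src_bank, dst_bank)
    raw = encode_mvn(src_bank, dst_bank)
    return ".byte " + ",".join("$%02x" % b for b in raw)


old_line = mvn_source_line(0x01, 0x02, (2, 17))
new_line = mvn_source_line(0x01, 0x02, (2, 18))
```

<p>Both lines assemble to the same three bytes; only the newer one is pleasant to read.</p>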
<p>When you configure a cross-assembler, SourceGen runs the executable with
version query args, and extracts the version information from the output
stream. This is used by the generator to ensure that the output will compile.
If no assembler is configured, SourceGen will produce code optimized
for the latest version of the assembler.</p>
<h3><a name="quirks">Assembler-Specific Bugs &amp; Quirks</a></h3>
<p>This is a list of bugs and quirky behavior in cross-assemblers that
SourceGen works around when generating code.</p>
<p>Every assembler seems to have a different way of dealing with expressions.
Most of them will let you group expressions with parentheses, but that
doesn't always help. For example, <code>PEA label &gt;&gt; 8 + 1</code> is
perfectly valid, but writing <code>PEA (label &gt;&gt; 8) + 1</code> will cause
most assemblers to assume you're trying to use an alternate (and non-existent)
form of <code>PEA</code> with indirect addressing, causing the assembler
to halt with an error message. The code generator needs
to understand expression syntax and operator precedence to generate correct
code, but also needs to know how to handle the corner cases.</p>
<h3><a name="64tass">64tass</a></h3>
<p>Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625
<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>
<p>Bugs:</p>
<ul>
<li>[Fixed in v1.55.2176]
Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
<li>[Fixed in v1.55.2176] WDM is not supported.</li>
</ul>
<p>Quirks:</p>
<ul>
<li>The underscore character ('_') is allowed as a character in labels,
but when used as the first character in a label it indicates the
label is local. If you create labels with leading underscores that
are not local, the labels must be altered to start with some other
character, and made unique.</li>
<li>Labels starting with two underscores are "reserved". Trying to
use them causes an error.</li>
<li>By default, 64tass sets the first two bytes of the output file to
the load address. The <code>--nostart</code> flag is used to
suppress this.</li>
<li>By default, 64tass is case-insensitive, but SourceGen treats labels
as case-sensitive. The <code>--case-sensitive</code> flag must be passed
to the assembler.</li>
<li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
and operands must be lower-case. Most of the SourceGen options that
cause things to appear in upper case must be disabled.</li>
<li>For 65816, selecting the bank byte is done with the grave accent
character ('`') rather than the caret ('^'). (There's a note in the
docs to the effect that they plan to move to carets.)</li>
<li>Instructions whose argument is formed by combining with the
65816 Program Bank Register (16-bit JMP/JSR) must be specified
as 24-bit values for code that lives outside bank 0. This is
true for both symbols and raw hex (e.g. <code>JSR $1234</code>
is invalid outside bank 0). Attempting to JSR to a label in bank
0 from outside bank 0 causes an error, even though it is technically
a 16-bit operand.</li>
<li>The arguments to COP and BRK require immediate-mode syntax
(<code>COP #$03</code> rather than <code>COP $03</code>).</li>
<li>For historical reasons, the default behavior of the assembler is to
assume that the source file is PETSCII, and the desired encoding for
strings is also PETSCII. No character conversion is done, so anybody
assembling ASCII files will get ASCII strings (which works out pretty
well if you're assembling code for a non-Commodore target). However,
the documentation says you're required to pass the "--ascii" flag when
the input is ASCII/UTF-8, so to build files that want ASCII operands
an explicit character encoding definition must be provided.</li>
</ul>
<h3><a name="acme">ACME</a></h3>
<p>Tested versions: v0.96.4, v0.97
<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>
<p>Bugs:</p>
<ul>
<li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
outside bank zero cannot be assembled. SourceGen currently deals with
this by outputting the entire file as a hex dump.</li>
<li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
<li>BRK and WDM are not allowed to have operands.</li>
</ul>
<p>Quirks:</p>
<ul>
<li>The assembler shares some traits with one-pass assemblers. In
particular, if you forward-reference a zero-page label, the reference
generates a 16-bit absolute address instead of an 8-bit zero-page
address. Unlike other one-pass assemblers, the width is "sticky",
and backward references appearing later in the file also use absolute
addressing even though the proper width is known at that point. This is
worked around by using explicit "force zero page" annotations on
all references to zero-page labels.</li>
<li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
<code>ASR</code> instead.</li>
<li>Does not allow the accumulator to be specified explicitly as an
operand, e.g. you can't write <code>LSR A</code>.</li>
<li>[Fixed in v0.97.]
Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
before 8-bit operands.</li>
<li>Officially, the preferred file extension for ACME source code is ".a",
but this is already used on UNIX systems for static libraries (which
means shell filename completion tends to ignore them). Since ".S" is
pretty universally recognized as assembly source, code generated by
SourceGen for ACME also uses ".S".</li>
<li>Version 0.97 started interpreting '\' in strings as an escape
character, to allow C-style escapes like "\n". This requires escaping
all occurrences of '\' in data strings as "\\". Compiling an older
source file with a newer version of ACME may fail unless you pass
a backward-compatibility command-line argument.</li>
</ul>
<h3><a name="cc65">cc65</a></h3>
<p>Tested versions: v2.17, v2.18
<a href="https://cc65.github.io/">[web site]</a></p>
<p>Bugs:</p>
<ul>
<li>PC relative branches don't wrap around at bank boundaries.</li>
<li>BRK can only be given an argument in 65816 mode.</li>
<li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
<li>[Fixed in v2.18] <code>BRK &lt;arg&gt;</code> is assembled to opcode
$05 rather than $00.</li>
<li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
</ul>
<p>Quirks:</p>
<ul>
<li>Operator precedence is unusual. Consider <code>label &gt;&gt; 8 - 16</code>.
cc65 puts shift higher than subtraction, whereas languages like C
and assemblers like 64tass do it the other way around. So cc65
regards the expression as <code>(label &gt;&gt; 8) - 16</code>, while the
more common interpretation would be <code>label &gt;&gt; (8 - 16)</code>.
(This is actually somewhat convenient, since none of the expressions
SourceGen currently generates require parentheses.)</li>
<li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic AXS. All
other opcodes match up with the "unintended opcodes" document.</li>
<li>ca65 is implemented as a single-pass assembler, so label widths
can't always be known in time. For example, if you use some zero-page
labels, but they're defined via <code>.ORG $0000</code> after the point
where the labels are used, the assembler will already have generated them
as absolute values. Width disambiguation must be applied to operands
that wouldn't be ambiguous to a multi-pass assembler.</li>
<li>Assignment of constants and variables (<code>=</code> and
<code>.set</code>) ends local label scope, so the label localizer
has to take variable assignment into account.</li>
<li>The assembler is geared toward generating relocatable code with
multiple segments (it is, after all, an assembler for a C compiler).
A linker configuration script is expected to be provided for anything
complex. SourceGen generates a custom config file for each project.</li>
</ul>
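<p>The effect of the precedence difference in the first quirk above is easy to see numerically. Python follows the C convention (additive operators bind tighter than shifts), so it can stand in for the "common" interpretation; the expression and label value here are made up for the example.</p>

```python
label = 0x123456

# C-style precedence: parsed as label >> (8 + 1).
c_style = label >> 8 + 1

# Shift-first precedence, as cc65 would read the same text.
shift_first = (label >> 8) + 1
```

<p>The two readings give $091A and $1235 respectively, so a generator that ignores the target assembler's precedence rules can silently produce different output.</p>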
<h3><a name="merlin32">Merlin 32</a></h3>
<p>Tested versions: v1.0
<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
</p>
<p>Bugs:</p>
<ul>
<li>PC relative branches don't wrap around at bank boundaries.</li>
<li>For some failures, an exit code of zero is returned.</li>
<li>Immediate operands with a comma (e.g. <code>LDA #','</code>)
or curly braces (e.g. <code>LDA #'{'</code>) cause an error.</li>
<li>Some DP indexed store instructions cause errors if the label isn't
unambiguously DP (e.g. <code>STX $00,X</code> vs.
<code>STX $0000,X</code>). This isn't a problem with project/platform
symbols, which are output as two-digit hex values when possible, but
causes failures when direct page locations are included in the project
and given labels.</li>
<li>The check for 64KiB overflow appears to happen before instructions
that might be absolute or direct page are resolved and reduced in size.
This makes it unlikely that a full 64KiB bank of code can be
assembled.</li>
</ul>
<p>Quirks:</p>
<ul>
<li>Operator precedence is unusual. Expressions are generally processed
from left to right. The byte-selection operators have a lower
precedence than all of the others, and so are always processed last.</li>
<li>The byte selection operators ('&lt;', '&gt;', '^') are actually
word-selection operators, yielding 16-bit values when wide registers
are enabled on the 65816.</li>
<li>Values loaded into registers are implicitly mod 256 or 65536. There
is no need to explicitly mask an expression.</li>
<li>The assembler tracks register widths when it sees SEP/REP instructions,
but doesn't attempt to track the emulation flag. So if you issue a
<code>REP #$20</code>
while in emulation mode, the assembler will incorrectly assume long
registers. Ideally it would be possible to configure that off, but
there's no way to do that, so instead we occasionally generate
additional width directives.</li>
<li>Non-unique local labels should cause an error, but don't.</li>
<li>No undocumented opcodes are supported, nor are the Rockwell
65C02 instructions.</li>
</ul>
<h2><a name="export-source">Exporting Source Code</a></h2>
<p>The "export" function takes what you see in the code list in the app
and converts it to text or HTML. The options you've set in the app
settings, such as capitalization, text delimiters, pseudo-opcode names,
operand expression style, and display of cycle counts are all taken into
account. The file generated is not expected to work with an actual
assembler.</p>
<p>The text output is similar to what you'd get by copying lines to the
clipboard and pasting them into a text file, except that you have greater
control over which columns are included. The HTML version is augmented
with links and (optionally) images.</p>
<p>Use File &gt; Export to open the export dialog. You have several
options:</p>
<ul>
<li><b>Include only selected lines</b>. This allows you to choose between
exporting all or part of a file. If no lines are selected, the entire
file will be exported. This setting does <b>not</b> affect link generation
for HTML output, so you may have some dead internal links if you don't
export the entire file.</li>
<li><b>Include notes</b>. Notes are normally excluded from generated
sources. Check this to include them.</li>
<li><b>Show &lt;Column&gt;</b>. The leftmost five columns are optional,
and will not appear in the output unless the appropriate option is
checked.</li>
<li><b>Column widths</b>. These determine the minimum widths of the
rightmost four columns. These are not hard limits: if the contents
of the column are too wide, the next column will start farther over.
The widths are not used at all for CSV output.</li>
<li><b>Text vs. CSV</b>. For text generation, you can choose between
plain text and Comma-Separated Value format. The latter is useful
for importing source code into another application, such as a
spreadsheet.</li>
<li><b>Generate image files</b>. When exporting to HTML, selecting this
will cause GIF images to be generated for visualizations.</li>
<li><b>Overwrite CSS file</b>. Some aspects of the HTML output's format
are defined by a file called "SGStyle.css", which may be shared between
multiple HTML files and customized. The file is copied out
of the RuntimeData directory without modification. It will be
created if it doesn't exist, but will not be overwritten unless this
box is checked. The setting is <b>not</b> sticky, and will revert
to unchecked. (Think of this as a proactive alternative to "are you
sure you wish to overwrite SGStyle.css?")</li>
</ul>
<p>Once you've picked your options, click either "Generate HTML" or
"Generate Text", then select an output file name from the standard file
dialog. Any additional files generated, such as graphics for HTML pages,
will be written to the same directory.</p>
<p>All output uses UTF-8 encoding. Filenames of HTML files will have '#'
replaced with '_' to make linking easier.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
@@ -0,0 +1,467 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Editors - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Editors</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="address">Define Address Region</a></h2>
<p><a href="intro-details.html#address-regions">Address regions</a>
may be created, edited, resized, or removed. Which
operation is performed depends on the current selection. You can
specify the start and end points of a region by selecting the entire
region, or by selecting just the first and last lines.</p>
<p>In all cases, you can specify the range's initial address
as a hexadecimal value. You can prefix it with '$', but that's not
required.
24-bit addresses may be written with a bank separator, e.g. "12/3456"
would resolve to address $123456.
If you want to set the region to be non-addressable, enter
"<code>NA</code>".</p>
<p>You can also enter a <a href="intro-details.html#pre-labels">pre-label</a>
or specify that the operand should be formatted as a
<a href="intro-details.html#relative-addr">relative address</a>.</p>
<p>To delete a region, click the "Delete Region" button.</p>
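<p>The address field syntax described above (optional '$', optional bank separator, or "NA") could be parsed along these lines. This is a sketch, not the editor's actual parser: the function name is hypothetical, and treating "NA" case-insensitively is an assumption.</p>

```python
def parse_address(text):
    """Return the address as an int, or None for a non-addressable region.

    Raises ValueError if the text is not a valid hexadecimal address.
    """
    s = text.strip()
    if s.upper() == "NA":           # assumed to be case-insensitive
        return None
    if s.startswith("$"):           # '$' prefix is optional
        s = s[1:]
    if "/" in s:
        bank, addr = s.split("/", 1)
        bank_val, addr_val = int(bank, 16), int(addr, 16)
        if not (0 <= bank_val <= 0xFF and 0 <= addr_val <= 0xFFFF):
            raise ValueError("bank or address out of range: " + text)
        return (bank_val << 16) | addr_val
    value = int(s, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("address out of range: " + text)
    return value
```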
<h4>Create</h4>
<p>If your selection starts with a code or data line, the editor
will allow you to create a new address region. If a single line was
selected, the default behavior will be to create a region with a
floating end point. If multiple lines were selected, the default
behavior will be to create a region with a fixed end point.</p>
<p>The address field will be initialized to the address of the
first selected line.</p>
<p>You can create a child region that shares the same start offset
as an existing region by selecting the first code or data line
within that region. Note that regions with floating end points cannot
have the same start offset as another region.</p>
<h4>Edit</h4>
<p>If you select only the address region start line, perhaps by
double-clicking the operand there, you will be able to edit the
current region's properties.</p>
<p>If the region has a floating end point, you can choose to convert
it to a fixed end. The end doesn't move; it just gets fixed in place.
This is a quick way to "lock down" regions once you've established
their end points.</p>
<h4>Resize</h4>
<p>If you select multiple lines, and the first line is an address
region start directive, you will be able to resize that region to
the selection. By definition, the updated region will have a fixed
end point.</p>
<h4>Other notes</h4>
<p>There is no affordance for moving the start offset of a region. You
must create a new region and then delete the old one.</p>
<p>Regions may not "straddle" the start or end points of other regions.</p>
<p>Double-clicking on the pseudo-opcode of a region start or end
declaration will move the selection to the other end, rather than
opening the editor.</p>
<p>To see detailed information about an address region in the "Info"
window, select the region start or end directive. You can see the
current arrangement of address regions across your entire
project with Navigate &gt; View Address Map.</p>
<h2><a name="flags">Override Status Flags</a></h2>
<p>The state of the processor status flags is tracked for every
instruction. Each individual flag is recorded as zero, one, or
"indeterminate", meaning it could hold either value at the start of
that instruction. You can override the value of individual flags.</p>
<p>The 65816 emulation bit, which is not part of the processor status
register, may also be set in the editor.</p>
<p>The M, X, and E flags will not be editable unless your CPU configuration
is set to 65816.</p>
<h2><a name="label">Edit Label</a></h2>
<p>Sets or clears a label at the selected offset. The label must have the
<a href="intro-details.html#about-symbols">proper form</a>, and not have the same
name as another symbol, unless it's specified to be non-unique. If you
edit an auto-generated label you will be required to change the name.</p>
<p>The label may be marked as non-unique local, unique local, global,
or global and exported. The default is global. If you start typing
a label with the non-unique label prefix character (usually '@',
configurable in
<a href="settings.html#appset-displayformat">application settings</a>),
the selection will automatically switch to non-unique local.</p>
<p>Local labels may be "promoted" to global if the assembler requires it.
Most assemblers define local scope as starting clean after each global
label, but there are exceptions. If a label's name conflicts or is
incompatible with the assembler, it will be renamed.</p>
<p>Exported labels are added to a table that may
be imported by other projects (see
<a href="advanced.html#multi-bin">Working With Multiple Binaries</a>).</p>
<h2><a name="instruction-operand">Edit Operand (Instruction)</a></h2>
<p>Operands can be formatted explicitly, or you can let the disassembler
select the format for you. By default, immediate constants and
addresses with no matching symbol are formatted as hex. Symbols
defined as address labels, platform/project symbols, and local
variables will be identified and applied automatically.</p>
<h3><a name="explicit-format">Explicit Formats</a></h3>
<p>Operands can be displayed in a variety of numeric formats, or as a
symbol. The character formats are only available for operands
whose value falls into the proper range. The ASCII format handles
both plain and high ASCII; the correct encoding is chosen based on
the operand's value.</p>
<p>Symbols may be used in their entirety, or, when used as constants,
can be shifted and masked.
The low / high / bank selector determines which byte is used as the
low byte. For 16-bit operands, this acts as a shift rather than a byte
select. If the symbol is wider than the operand field, e.g. you're
referencing a 16-bit address in an 8-bit constant, a mask will be
applied automatically.</p>
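<p>A small sketch of the selector, using a made-up symbol
<code>HANDLER</code> with the 16-bit value $1234 (the name and value are
assumptions chosen for illustration). Two 8-bit immediate operands that
reference the same symbol might be shown like this:</p>
<pre>
HANDLER  .EQ   $1234
         LDA   #&lt;HANDLER    ;low part selected, operand is $34
         LDA   #&gt;HANDLER    ;high part selected, operand is $12
</pre>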
<p>The editor will try to prevent you from using auto-generated
labels and local variables in the symbol field. These types of symbols
can be freely renamed by SourceGen, and thus cannot be reliably
referenced by name.
You can reference a non-unique local by writing it with the non-unique
label prefix character (default '@'). Ambiguous non-unique references
are not allowed, so if the symbol can't be found the label will
be discarded.</p>
<p>When you select a non-default format option, a "preview" of the
formatted operand will be shown.</p>
<p>The <code>MVN</code> and <code>MVP</code> instructions on the 65816
are a bit peculiar, because they have two operands rather than one.
SourceGen currently only allows you to set one format, which will be
applied to both operands. If you specify a symbol, the symbol will
be used twice, adjusted if necessary. (This limitation may be addressed
in a future release.)</p>
<p>The <code>BBR</code> and <code>BBS</code> instructions on the W65C02
also have two operands: a direct page address, and a relative branch.
In general the direct page address is ignored, so these are treated as
branch instructions.</p>
<p>The bottom part of the window has some shortcuts for working with
address references and local variables. These are primarily used to
change the way things work when "Default" is selected. The shortcuts
don't cause any changes to the recorded format of the instruction
being edited. All of the actions can be performed elsewhere, by
editing the label at the target address, editing the project symbol
set, or editing a local variable table.</p>
<h3><a name="shortcut-nar">Numeric Address References</a></h3>
<p>For operands that are 8-bit, 16-bit, or 24-bit addresses, you can
define a symbol for the address as a label or
<a href="intro-details.html#symbol-types">project symbol</a>.</p>
<p>If the operand is an address inside the project, you can set a
label at that address. If the address falls in the middle of an
instruction or multi-byte data item, its position will be adjusted to
the start. Labels may be created, modified, or (by erasing the label)
deleted.</p>
<p>The label finder does not do the optional search for "nearby" labels
that the main analyzer does, so there will be times when an instruction
that is shown with a symbol in the code list won't have a symbol
in the editor.</p>
<p>If the operand is an address outside the project, e.g. a ROM
address or I/O location, you can define a project symbol. If a
match was found in the configured platform definition files, it will be
shown; it can't be edited, but it can be overridden by a project symbol.
You can create or modify a project symbol by clicking on "Create Project
Symbol" or "Edit Project Symbol". You can't delete project symbols
from this editor (use Project Properties instead).</p>
<p>It's possible to have more than one project symbol for the same
address. For example, on the Apple II, reading from the memory-mapped
I/O address $C000 returns the last key pressed, but writing to it
changes the state of the 80-column display hardware, so it's useful to
have two different names for it. If more than one project symbol has the
same address, the first one found will be used, which may not be
what is desired. In such situations, you should create the project
symbol and then copy the symbol name into the operand. You can do this
in one step by clicking the "Copy to Operand" button.
(In most cases you don't want to do this, because if the project
symbol is deleted or renamed, you'll have operands that refer to a
nonexistent symbol. Unlike labels, project symbol renames do not
refactor the rest of the project.)</p>
<h3><a name="shortcut-local-var">Local Variable References</a></h3>
<p>For zero-page address operands and (65816-only) stack-relative
constant operands, a local variable can be created or modified. This
requires that a local variable table has been defined at or before
the instruction being edited.</p>
<p>If an existing entry is found, you will be able to edit the name
and comment fields. If not, a new entry with a generic name and
pre-filled value field will be created in the nearest table.</p>
<h2><a name="data-operand">Edit Operand (Data)</a></h2>
<p>This dialog offers a variety of choices, and can be used to apply a
format to multiple lines. You must select all of the bytes you want
to format. For example, to format two bytes as a 16-bit word, you must
select both bytes in the editor. (If you click on the first item, then
Shift+double-click on the operand field of the last item, you can do
this very quickly.) The selection does not need to be contiguous: you
can use Control+click to select scattered items.</p>
<p>If the range is discontiguous, crosses a logical boundary
such as a change in address or a user-specified label, or crosses a
visual boundary like a long comment, note, or visualization, the selection
will be split into smaller regions. A message at the
top of the dialog indicates how many bytes have been selected, and how
many regions they have been divided into.</p>
<p>(End-of-line comments do <i>not</i> split a region, and will
disappear if they end up inside a multi-byte data item.)</p>
<p>The "Simple Data" items behave the same as their equivalents in the
Edit Operand dialog. However, because the width is not determined by
an instruction opcode, and multiple items can be selected, you will need
to specify how wide each item is and what its byte order is. For data
you also have the option of setting the format to "Address", which marks
the selected bytes as a numeric reference.</p>
<p>Consider a simple example: suppose you find a table of 16-bit
addresses in the code. Click on
the first byte, shift-click the last byte, then select the Edit Data menu
item. The number of bytes selected should be even. Select
"16-bit words, little-endian", then over to the right click on
"Address". When you click OK, the selected data will be formatted as a
series of 16-bit address values. If the addresses can be resolved inside
the data file, each address will be assigned a label.</p>
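<p>To make this concrete, here is a rough before-and-after sketch. It
assumes a file loaded at address $1000 and uses illustrative
auto-generated label names. Before formatting, the table appears as
individual bytes:</p>
<pre>
         .DD1  $00
         .DD1  $10
         .DD1  $20
         .DD1  $10
</pre>
<p>After formatting as little-endian 16-bit addresses, the same four bytes
might be shown as:</p>
<pre>
         .DD2  L1000       ;bytes $00 $10
         .DD2  L1020       ;bytes $20 $10
</pre>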
<p>The "Bulk Data" items can represent large chunks of data compactly.
The "fill" option is only available if all selected bytes have the
same value.
If a region of bytes is irrelevant, perhaps used only as padding, you
can mark it as "junk". If it appears to be adding bytes to reach a
power-of-two address boundary, you can designate it as an alignment
directive. If you have multiple regions selected, only options that
work for all regions will be shown.</p>
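<p>As a rough sketch, bulk items might appear in the code list along
these lines. The pseudo-op names and operand layout shown here are
assumptions; the actual text depends on your pseudo-op settings and, in
generated source, on the target assembler:</p>
<pre>
         .FILL 16,$ff      ;sixteen bytes with the same value
         .JUNK 5           ;padding of no interest
         .ALIGN $0100      ;padding up to the next 256-byte boundary
</pre>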
<p>The "String" items are enabled or disabled depending on whether the
data you have selected is in the appropriate format. For example,
"Null-terminated strings" is only enabled if the data regions are
composed entirely of characters followed by $00. Zero-length strings
are allowed.
DCI (Dextral Character Inverted) strings have the high bit on the last
byte flipped; for PETSCII this will usually look like a series of
lower-case letters followed by a capital letter, but may look odd if the
last character is punctuation (e.g. '!' becomes $A1, which is a
rectangle character that SourceGen will only display as hex).</p>
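<p>For example, the word "HELLO" stored as a DCI string in plain ASCII
would occupy the following five bytes (a sketch written out by hand, not
output captured from SourceGen):</p>
<pre>
         .DD1  $48         ;'H'
         .DD1  $45         ;'E'
         .DD1  $4c         ;'L'
         .DD1  $4c         ;'L'
         .DD1  $cf         ;'O' ($4f) with the high bit flipped
</pre>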
<p>The character encoding can be selected, offering a choice between
plain ASCII, low + high ASCII, C64 PETSCII, and C64 screen codes. When
you change the encoding, your available options may change. The
low + high ASCII setting will accept both, configuring the appropriate
encoding based on the data values, but when identifying multiple strings
it requires that each individual string be entirely one or the other.</p>
<p>Due to fundamental limitations of the character set, C64 screen code
strings cannot be null terminated ($00 is '@').</p>
<p>As noted earlier, to avoid burying elements such as labels in the middle
of a data item, contiguous areas may be split into smaller regions. This
can sometimes have unexpected effects. For example, this can be formatted
as two 16-bit words or one 32-bit word:</p>
<pre>
.DD1 $01
.DD1 $ef
.DD1 $01
.DD1 $f0
</pre>
<p>With a label in the middle, it can be formatted as two 16-bit words, but
not as a 32-bit word:</p>
<pre>
.DD1 $01
.DD1 $ef
LABEL .DD1 $01
.DD1 $f0
CODE LDA LABEL
</pre>
<p>If this is undesirable, you can add a label at a 32-bit boundary, and
reference that instead:</p>
<pre>
LABEL .DD1 $01
.DD1 $ef
.DD1 $01
.DD1 $f0
CODE LDA LABEL+2
</pre>
<p>With the label out of the way, the data can be formatted as desired.</p>
<h2><a name="comment">Edit Comment</a></h2>
<p>Enter an end-of-line (EOL) comment, or leave the text field blank to
delete it. EOL comments may be placed on instruction and data lines, but
not on assembler directives.</p>
<p>It's wise to restrict comments to the ASCII character set, because
not all assemblers can accept UTF-8 input. Code generators for such
assemblers will convert non-ASCII characters to '?' or something similar.
If this isn't a concern, you can enter any characters you like.</p>
<p>There is no fixed limit on the number of characters, but you may
want to limit the overall length if you're hoping to create 80-column
output. Some retro assemblers may have hard line length limitations,
which could result in the comment being truncated in generated sources.</p>
<p>A semicolon (';') is placed at the start of the comment. If an assembler
has different conventions, a different delimiter character may be used. You
don't need to include a delimiter explicitly in the comment field.</p>
<p>Comments on platform symbols are read from the platform symbol file, and
cannot be edited from within SourceGen. Comments on project symbols are
stored in the project file, and can be edited with the project symbol
editor.</p>
<h2><a name="long-comment">Edit Long Comment</a></h2>
<p>Long comments can be arbitrarily long and span multiple lines. They
will be word-wrapped at a line width of your choosing. They're always
drawn with a fixed-width font, so you can create ASCII-art diagrams.
Comment delimiters are added automatically at the start of each line.</p>
<p>For a true retro look you can "box" the comment with asterisks. You
can create a full-width row of asterisks by putting a '*' on a line by
itself. (Assembly source generators are allowed to use a character
other than '*' for the output, e.g. they might use a full set of
box outline characters, though that's somewhat against the spirit of
the thing. Regardless, a solo '*' results in a line.)</p>
<p>The bottom window will update automatically as you type, showing what
the output is expected to look like. The actual assembler source output
will depend on features of the target assembler, such as comment
delimiter choices and maximum line length limitations. For example,
Merlin allows a leading '*' to indicate a comment, while cc65 does not,
so cc65 code uses ";*" instead. Because the length limitation affects
the length of the line, not just the comment text, an asterisk-boxed
comment will have one fewer character per line in cc65 output.</p>
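<p>The difference can be sketched as follows, assuming a 28-column line
limit and made-up comment text. With a leading '*' as the comment
delimiter:</p>
<pre>
****************************
* Initialize the display.  *
****************************
</pre>
<p>With ";*" at the start of each line, the total width is unchanged, so
the text area inside the box is one character narrower:</p>
<pre>
;***************************
;* Initialize the display. *
;***************************
</pre>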
<p>Clear the text field to delete the comment.</p>
<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
<p>The long comment at the very top of the project is special, as it's
not associated with a file offset. If you delete it, you can get it
back by using Edit &gt; Edit Header Comment.</p>
<h2><a name="data-bank">Edit Data Bank (65816 only)</a></h2>
<p>Sets the Data Bank Register (DBR) value for 65816 code. This is used
when matching 16-bit address operands with labels. The new value is
in effect from the line where it's declared to the end of the file, even
across bank boundaries.
If you leave the text field blank, the directive will be removed.</p>
<p>A hexadecimal value from $00 to $ff can be entered directly. As
with other address inputs, a leading '$' is not required. Entering
"K" will set the DBR to the bank of the current address, and the value
will automatically update if you change the address to a different bank.</p>
<p>The pop-up menu has a list of all banks that hold code or data.
To make them easier to identify, each is shown with the label on the
first address in the bank, if any.</p>
<p>While you can override automatically-generated data bank change
directives, you can't remove them individually. You can disable
automatic generation by un-checking "smart PLB handling" in the project
properties.</p>
<p>Because the directive is frequently associated with <code>PLB</code>
instructions, double-clicking on a <code>PLB</code> opcode in the
code list will open the editor.</p>
<h2><a name="note">Edit Note</a></h2>
<p>Notes are similar to long comments, in that they can be arbitrarily
long and span multiple lines. However, because they're never included
in generated output, options like line width formatting and boxing
aren't relevant.</p>
<p>Instead, you can select a highlight color for the note to make it
stand out. You may want to assign certain colors to specific things,
e.g. blue for "I don't know what this is" or green for "this is a
bookmark for the really interesting stuff". The color will be applied
to the note in the code list and in the "Notes" window.</p>
<p>If you don't like the standard colors you can define your own.
You can do this with web RGB syntax, which uses a '#' followed by
two hex digits per channel. For example, bright red is
<code>#ff0000</code>, while teal is <code>#008080</code>. You can
also simply type a color name like "violet" so long as it appears in the
<a href="https://docs.microsoft.com/en-us/dotnet/media/art-color-table.png?view=netframework-4.8">list of Microsoft .NET colors</a>.</p>
<p>Clear the text field to delete the note.</p>
<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
<h2><a name="project-symbol">Edit Project Symbol</a></h2>
<p>This is used to edit the properties of a project symbol.</p>
<p>Symbols marked as "address" will be applied automatically when an
operand references an address outside the scope of the data file. They
will not be applied to addresses inside the data file. Symbols
marked as "constant" are not applied automatically, and must be
explicitly specified as an operand.</p>
<p>The label must meet the criteria for symbols (see
<a href="intro-details.html#about-symbols">All About Symbols</a>), and must
not have the same name as another project symbol. It can overlap
with platform symbols and user labels.</p>
<p>The value may be entered in decimal, hexadecimal, or binary. The numeric
base you choose will be remembered, so that the value will be displayed
the same way when used in a .EQ directive.</p>
<p>You can optionally provide a width for address symbols. For example,
if the address is of a two-byte pointer or a 64-byte buffer, you would
set the width field to cause all references to any location in that range
to be set to the symbol. Widths may be entered in hex or decimal. If
the field is left blank, a width of 1 is assumed. Overlapping symbols
are allowed. The width is ignored for constants.</p>
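<p>As an illustration, consider a hypothetical two-byte pointer at
address $06, defined with a width of 2 (the symbol name and address are
assumptions). References to either byte would then be resolved to the
same symbol:</p>
<pre>
PTR      .EQ   $06         ;two-byte pointer, width 2
         LDA   (PTR),Y     ;operand $06
         INC   PTR+1       ;operand $07
</pre>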
<p>If you enter a comment, it will be placed at the end of the line of
the .EQ directive.</p>
<p>For address symbols that represent a memory-mapped I/O location, it
can be useful to have different symbols for reads and writes. Use
the Read/Write checkboxes to specify the desired behavior.</p>
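<p>For instance, on the Apple II the address $C000 behaves differently
for reads and writes. A sketch with two project symbols, one marked
read-only and the other write-only, might look like this (the symbol
names follow common Apple II usage but are illustrative here):</p>
<pre>
KBD      .EQ   $c000       ;read: last key pressed
CLR80COL .EQ   $c000       ;write: 80-column display switch
         LDA   KBD         ;read access uses the "read" symbol
         STA   CLR80COL    ;write access uses the "write" symbol
</pre>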
<h2><a name="lvtable">Create/Edit Local Variable Table</a></h2>
<p><a href="intro-details.html#local-vars">Local variables</a> are arranged in
tables, which are created at a specific file offset. They must be
associated with a line of code, and are usually placed at the start of
a subroutine.
The "Create Local Variable Table" action creates a new table, and
opens the editor. The "Edit Prior Local Variable Table" action searches
for the closest table that appears at or before the selected line,
and edits that.</p>
<p>The editor allows you to create, edit, and delete entries, as well
as move and delete entire tables (though these last two options are not
available when creating a new table). Empty tables are allowed. These
can be useful if the "clear previous" flag is set. If you want to
delete the table, click the "Delete Table" button.</p>
<p>Use the buttons to add, edit, or remove individual variables. Each
variable has a name, a value, a width, and an optional comment. The
standard naming rules for symbols apply. Variables are only used for
zero-page and stack-relative operands, so all values must fall in the
range 0-255. The width may extend one byte past the end (to address $0100)
to allow 16-bit accesses to $ff (particularly useful on 65816).</p>
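<p>A minimal sketch of a table with two entries, followed by code that
uses them. The variable names, values, and the <code>.VAR</code>
pseudo-op spelling are assumptions for illustration:</p>
<pre>
PTR      .VAR  $02         ;width 2, covers $02-$03
COUNT    .VAR  $04         ;width 1
         LDA   (PTR),Y     ;operand $02
         DEC   COUNT       ;operand $04
</pre>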
<p>You can move a table to any offset that is the start of an instruction
and doesn't already have a local variable table present. Click the
"Move Table" button and enter the new offset in hex. You can also click
on the up/down buttons to move to the next valid offset.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+86
@@ -0,0 +1,86 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>End notes - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: End Notes</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="origins">Origins</a></h2>
<p>The inspiration for SourceGen goes a long way back. While in high
school in the late 1980s, I read Don Lancaster's
<i>Enhancing Your Apple II, Vol. 1</i> (available for download
<a href="https://www.tinaja.com/ebksamp1.shtml">here</a>). This
included a very detailed methodology for disassembling 6502 software
(nicely reformatted
<a href="https://www.tinaja.com/ebooks/tearing_rework.pdf">here</a>).
I wanted to give it a try, so I generated a monitor listing of an
operating system (called "RDOS") that SSI used on their games, and
printed it out on my Epson RX-80 -- tractor feed paper was helpful for
this sort of thing -- then set to work.</p>
<p>Lancaster's methodology involved highlighting different types of
instructions with different colors, making notes, and adding labels.
All of this was done with felt-tip and colored highlighter pens. The
process worked remarkably well: by the time I was finished marking
things up, I knew how everything in the code worked.</p>
<p>I really wanted a better system though. The disassembler built into
the Apple II could get out of sync when it walked through a data area,
so sometimes you had to hand-write in the correct instruction. Applying
a label to every place that referenced it was tedious. When you got to
the end, you had a colorful printout, but you couldn't run that through
an assembler.</p>
<p>There were commercially-available disassemblers that generated source
code and removed some of the tedium from the process, and for many tasks
they solved the problem nicely. What I really wanted, though, looked more
like a modern IDE, because I didn't just want it to translate machine code
into readable form. I wanted it to help me with the process of
understanding the code, by providing cross-reference tables and symbol
lists and giving me a place to scribble notes to myself while I worked.
I especially wanted the note-scribbling, because learning how something
works is usually an iterative process, where the function of a chunk of
code gradually reveals itself over time.</p>
<p>In 2002, while writing the 6502/65816 disassembler for CiderPress, I
ran into the same problems I had with the original Apple II monitor: it
blundered through data sections and got lost briefly when a new code
section started. You had to pick long or short registers for the entire
disassembly, which made 65816 code something of a disaster. I
jotted down some notes on what I thought the core features of a good
6502 disassembler should be, then moved on to work on other features. It
was another 15 years before I picked up the idea again.</p>
<p>More recently, I disassembled some code by dumping it to a text
file with CiderPress and then fiddling with it in a text editor. I could
leave free-form notes, but when I found some code that I wanted to
exercise a bit I realized that getting it into an assembler was going
to take some effort. Raw addresses needed to be converted to labels,
the address and byte dump in the left column needed to be stripped out --
really just some basic text and string replace operations, but tedious
to do by hand.</p>
<p>The original design for SourceGen was substantially less feature-rich
than the final result. I kept discovering opportunities for features
that I wanted to have, or at least wanted to write. The result is
something of a monument to creeping featurism. Hopefully the core features
are solid enough to excuse the excesses.</p>
<p>-- Andy McFadden, September 2018</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+214
@@ -0,0 +1,214 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Contents - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen Reference Manual</h1>
<p>SourceGen is an interactive disassembler for 6502, 65C02,
and 65816 code. The official web site is
<a href="https://6502bench.com/">https://6502bench.com/</a>.</p>
<p>If you want to get up to speed quickly, start with the
<a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
<h2>Contents</h2>
<ul>
<li><a href="intro.html">Overview</a>
<ul>
<li><a href="intro.html#fundamental-concepts">Fundamentals</a>
<ul>
<li><a href="intro.html#begin">About 6502 Code</a></li>
<li><a href="intro.html#charenc">Character Encoding</a></li>
<li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
</ul></li>
<li><a href="intro.html#sgintro">How SourceGen Works</a></li>
</ul></li>
<li><a href="intro-details.html">Digging Deeper</a>
<ul>
<li><a href="intro-details.html#about-symbols">All About Symbols</a>
<ul>
<li><a href="intro-details.html#connecting-operands">Connecting Operands With Labels</a></li>
<li><a href="intro-details.html#internal-address-symbols">Internal Address Symbols</a></li>
<li><a href="intro-details.html#external-address-symbols">External Address Symbols</a></li>
<li><a href="intro-details.html#unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></li>
<li><a href="intro-details.html#weak-refs">Weak Symbolic References</a></li>
<li><a href="intro-details.html#symbol-parts">Parts and Adjustments</a></li>
<li><a href="intro-details.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
</ul></li>
<li><a href="intro-details.html#width-disambiguation">Width Disambiguation</a></li>
<li><a href="intro-details.html#address-regions">Address Regions</a>
<ul>
<li><a href="intro-details.html#fixed-float">Fixed vs. Floating</a></li>
<li><a href="intro-details.html#non-addr">Non-Addressable Areas</a></li>
<li><a href="intro-details.html#pre-labels">Pre-Labels</a></li>
<li><a href="intro-details.html#relative-addr">Relative Addressing</a></li>
</ul></li>
<li><a href="intro-details.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
<li><a href="intro-details.html#atags">Directing the Code Analyzer</a>
<ul>
<li><a href="intro-details.html#scripts">Extension Scripts</a></li>
</ul></li>
</ul></li>
<li><a href="mainwin.html">Using SourceGen</a>
<ul>
<li><a href="mainwin.html#starting-new">Starting a New Project</a></li>
<li><a href="mainwin.html#opening">Opening an Existing Project</a></li>
<li><a href="mainwin.html#working">Working With a Project</a>
<ul>
<li><a href="mainwin.html#code-list">Code List</a></li>
<li><a href="mainwin.html#undo">Undo &amp; Redo</a></li>
<li><a href="mainwin.html#references">References Window</a></li>
<li><a href="mainwin.html#notes">Notes Window</a></li>
<li><a href="mainwin.html#symbols">Symbols Window</a></li>
<li><a href="mainwin.html#info">Info Window</a></li>
<li><a href="mainwin.html#messages">Messages Window</a></li>
<li><a href="mainwin.html#navigation">Navigation</a></li>
<li><a href="mainwin.html#atags">Adding and Removing Analyzer Tags</a></li>
<li><a href="mainwin.html#address-table">Format Address Table</a></li>
<li><a href="mainwin.html#toggle-single">Toggle Single-Byte Format</a></li>
<li><a href="mainwin.html#format-as-word">Format As Word</a></li>
<li><a href="mainwin.html#toggle-data">Toggle Data Scan</a></li>
<li><a href="mainwin.html#clipboard">Copying to Clipboard</a></li>
</ul></li>
</ul></li>
<li><a href="editors.html">Editors</a>
<ul>
<li><a href="editors.html#address">Define Address Region</a></li>
<li><a href="editors.html#flags">Override Status Flags</a></li>
<li><a href="editors.html#label">Edit Label</a></li>
<li><a href="editors.html#instruction-operand">Edit Operand (Instruction)</a>
<ul>
<li><a href="editors.html#explicit-format">Explicit Formats</a></li>
<li><a href="editors.html#shortcut-nar">Numeric Address References</a></li>
<li><a href="editors.html#shortcut-local-var">Local Variable References</a></li>
</ul></li>
<li><a href="editors.html#data-operand">Edit Operand (Data)</a></li>
<li><a href="editors.html#comment">Edit Comment</a></li>
<li><a href="editors.html#long-comment">Edit Long Comment</a></li>
<li><a href="editors.html#data-bank">Edit Data Bank (65816 only)</a></li>
<li><a href="editors.html#note">Edit Note</a></li>
<li><a href="editors.html#project-symbol">Edit Project Symbol</a></li>
<li><a href="editors.html#lvtable">Create / Edit Local Variable Table</a></li>
</ul></li>
<li><a href="visualization.html">Visualizations</a>
<ul>
<li><a href="visualization.html#overview">Overview</a></li>
<li><a href="visualization.html#vis-and-sets">Visualizations and Visualization Sets</a></li>
<li><a href="visualization.html#runtime">Scripts Included with SourceGen</a></li>
</ul></li>
<li><a href="codegen.html">Code Generation &amp; Assembly</a>
<ul>
<li><a href="codegen.html#generate">Generating Source Code</a>
<ul>
<li><a href="codegen.html#localizer">Label Localizer</a></li>
<li><a href="codegen.html#reserved-labels">Reserved Label Names</a></li>
<li><a href="codegen.html#platform-features">Platform-Specific Features</a></li>
</ul></li>
<li><a href="codegen.html#assemble">Cross-Assembling Generated Code</a></li>
<li><a href="codegen.html#supported">Supported Assemblers</a>
<ul>
<li><a href="codegen.html#version">Version-Specific Code Generation</a></li>
<li><a href="codegen.html#quirks">Assembler-Specific Bugs &amp; Quirks</a>
<ul>
<li><a href="codegen.html#64tass">64tass</a></li>
<li><a href="codegen.html#acme">ACME</a></li>
<li><a href="codegen.html#cc65">cc65</a></li>
<li><a href="codegen.html#merlin32">Merlin 32</a></li>
</ul></li>
</ul></li>
<li><a href="codegen.html#export-source">Exporting Source Code</a></li>
</ul></li>
<li><a href="settings.html">Properties &amp; Settings</a>
<ul>
<li><a href="settings.html#app-settings">Application Settings</a>
<ul>
<li><a href="settings.html#appset-codeview">Code View</a></li>
<li><a href="settings.html#appset-textdelim">Text Delimiters</a></li>
<li><a href="settings.html#appset-asmconfig">Asm Config</a></li>
<li><a href="settings.html#appset-displayformat">Display Format</a></li>
<li><a href="settings.html#appset-pseudoop">Pseudo-Op</a></li>
</ul></li>
<li><a href="settings.html#project-properties">Project Properties</a>
<ul>
<li><a href="settings.html#projprop-general">General</a></li>
<li><a href="settings.html#projprop-projsym">Project Symbols</a></li>
<li><a href="settings.html#projprop-symfiles">Symbol Files</a></li>
<li><a href="settings.html#projprop-extscripts">Extension Scripts</a></li>
</ul></li>
</ul></li>
<li><a href="tools.html">Tools</a>
<ul>
<li><a href="tools.html#instruction-chart">Instruction Chart</a></li>
<li><a href="tools.html#ascii-chart">ASCII Chart</a></li>
<li><a href="tools.html#apple2-screen-chart">Apple II Screen Chart</a></li>
<li><a href="tools.html#hexdump">Hex Dump Viewer</a></li>
<li><a href="tools.html#file-concat">File Concatenator</a></li>
<li><a href="tools.html#file-slicer">File Slicer</a></li>
<li><a href="tools.html#omf-converter">OMF Converter</a></li>
</ul></li>
<li><a href="advanced.html">Advanced Topics</a>
<ul>
<li><a href="advanced.html#platform-symbols">Platform Symbol Files (.sym65)</a></li>
<li><a href="advanced.html#extension-scripts">Extension Scripts</a></li>
<li><a href="advanced.html#multi-bin">Working With Multiple Binaries</a></li>
<li><a href="advanced.html#overlap">Overlapping Address Spaces</a></li>
<li><a href="advanced.html#reloc-data">OMF Relocation Dictionaries</a></li>
<li><a href="advanced.html#debug">Debug Menu Options</a></li>
</ul></li>
<li><a href="analysis.html">Appendix: Instruction and Data Analysis</a>
<ul>
<li><a href="analysis.html#analysis-process">Analysis Process</a>
<ul>
<li><a href="analysis.html#auto-format">Automatic Formatting</a></li>
<li><a href="analysis.html#undo-redo">Interaction With Undo/Redo</a></li>
</ul></li>
<li><a href="analysis.html#code-analysis">Code Analysis</a>
<ul>
<li><a href="analysis.html#extension-scripts">Extension Scripts</a></li>
</ul></li>
<li><a href="analysis.html#data-analysis">Data Analysis</a></li>
</ul></li>
<li><a href="end-notes.html">End Notes</a> </li>
<br/>
<!--
<li><a href="tutorials.html">Tutorials</a>
<ul>
<li><a href="tutorials.html#basic-features">Tutorial #1: Basic Features</a></li>
<li><a href="tutorials.html#advanced-features">Tutorial #2: Advanced Features</a></li>
<li><a href="tutorials.html#address-tables">Tutorial #3: Address Table Formatting</a></li>
<li><a href="tutorials.html#extension-scripts">Tutorial #4: Extension Scripts</a></li>
<li><a href="tutorials.html#visualizations">Tutorial #5: Visualizations</a></li>
</ul></li>
-->
</ul>
</div>
<div id="footer">
<hr/>
<p>Copyright 2020 faddenSoft</p>
</div>
</body>
</html>
+958
@@ -0,0 +1,958 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>More Details - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Intro Details</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="more-details">More Details</a></h2>
<p>This section digs a little deeper into how SourceGen works.</p>
<h2><a name="about-symbols">All About Symbols</a></h2>
<p>A symbol has two essential parts, a label and a value. The label is a short
ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
constant. Symbols can be defined in different ways, and applied in
different ways.</p>
<p>The label syntax is restricted to a format that should be compatible
with most assemblers:</p>
<ul>
<li>2-32 characters long.</li>
<li>Starts with a letter or underscore.</li>
<li>Composed of ASCII letters, digits, and the underscore.</li>
</ul>
<p>Label comparisons are case-sensitive, as is customary for programming
languages.</p>
<p>Sometimes the purpose of a subroutine or variable isn't immediately
clear, but you can take a reasonable guess. You can document your
uncertainty by adding a question mark ('?') to the end of the label.
This isn't really part of the label, so it won't appear in the assembled
output, and you don't have to include it when searching for a symbol.</p>
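<p>The label rules above can be written as a single pattern. The following
Python is a minimal sketch, not SourceGen's implementation; the names
<code>LABEL_RE</code>, <code>split_annotation</code>, and
<code>is_valid_label</code> are hypothetical, and it assumes the trailing
'?' is stripped before the length is checked.</p>

```python
import re

# 2-32 characters: a letter or underscore, followed by 1-31 ASCII
# letters, digits, or underscores.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")


def split_annotation(text):
    """Separate a trailing '?' (the uncertainty marker) from the label."""
    if text.endswith("?"):
        return text[:-1], True
    return text, False


def is_valid_label(text):
    """True if text, minus any trailing '?', follows the label rules."""
    label, _uncertain = split_annotation(text)
    return LABEL_RE.fullmatch(label) is not None
```

<p>With this sketch, <code>MYSTRING</code> and <code>draw_sprite?</code> are
accepted, while a one-character label or one starting with a digit is
rejected.</p>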
<p>Some assemblers restrict the set of valid labels further. For example,
64tass uses a leading underscore to indicate a local label, and reserves
a double leading underscore (e.g. <code>__label</code>) for its own
purposes. In such cases, the label will be modified to comply with the
target assembler syntax.</p>
<p>Operands may use parts of symbols. For example, if you have a label
<code>MYSTRING</code>, you can write:</p>
<pre>
MYSTRING .STR "hello"
LDA #&lt;MYSTRING
STA $00
LDA #&gt;MYSTRING
STA $01
</pre>
<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>
<p>Symbols that represent a memory address within a project are treated
differently from those outside a project. We refer to these as internal
and external addresses, respectively.</p>
<h3><a name="connecting-operands">Connecting Operands with Labels</a></h3>
<p>Suppose you have the following code:</p>
<pre>
LDA $1234
JSR $2345
</pre>
<p>If we put that in a source file, it will assemble correctly.
However, if those addresses are part of the file, the code may break if
changes are made and things assemble to different addresses. It would
be better to generate code that references labels, e.g.:</p>
<pre>
LDA my_data
JSR nifty_func
</pre>
<p>SourceGen tries to establish labels for address operands automatically.
How this works depends on whether the operand's address is inside the file or
external, and whether there are existing labels at or near the target
address. The details are explored in the next few sections.</p>
<p>On the 65816 this process is trickier, because addresses are 24 bits
instead of 16. For a control-transfer instruction like <code>JSR</code>,
the high 8 bits come from the Program Bank Register (K). For a data-access
instruction like <code>LDA</code>, the high 8 bits come from the Data
Bank Register (B). The PBR value is determined by the address in which
the code is executing, so it's easy to determine. The DBR value can be
set arbitrarily. Sometimes it's easy to figure out, sometimes it has
to be specified manually.</p>
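<p>The bank selection described above can be sketched in Python. This is an
illustration only: <code>effective_address</code> and
<code>resolve_operand</code> are hypothetical names, the set of
control-transfer mnemonics is deliberately incomplete, and a missing DBR
value is modeled as <code>None</code>.</p>

```python
def effective_address(operand16, bank):
    """Combine an 8-bit bank value with a 16-bit operand."""
    return ((bank & 0xFF) << 16) | (operand16 & 0xFFFF)


def resolve_operand(mnemonic, operand16, pc, dbr=None):
    """Return the 24-bit target of an absolute operand, or None if unknown.

    pc is the 24-bit address of the instruction; dbr is the Data Bank
    Register value if it is known at this point.
    """
    control_transfer = {"JSR", "JMP"}  # assumption: partial list
    if mnemonic in control_transfer:
        program_bank = (pc >> 16) & 0xFF  # K follows the executing code
        return effective_address(operand16, program_bank)
    if dbr is None:
        return None  # B must be supplied by the user
    return effective_address(operand16, dbr)
```

<p>For code executing at $02/1000, <code>JSR $2345</code> resolves to
$02/2345 without help, while <code>LDA $1234</code> cannot be resolved until
a DBR value is supplied.</p>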
<h3><a name="internal-address-symbols">Internal Address Symbols</a></h3>
<p>Symbols that represent an address inside the file being disassembled
are referred to as <i>internal</i>. They come in two varieties.</p>
<p><b>User labels</b> are labels added to instructions or data by the user.
The editor will try to prevent you from creating a label that has the same
name as another symbol, but if you manage to do so, the user label takes
precedence over symbols from other sources. User labels may be tagged
as non-unique local, unique local, global, or global and exported. Local
vs. global is important for the label localizer, while exported symbols
can be pulled directly into other projects.</p>
<p><b>Auto labels</b> are automatically generated labels placed on
instructions or data offsets that are the target of operands. They're
formed by appending the hexadecimal address to the letter "L", with
additional characters added if some other symbol has already defined
that label. Options can be set that change the "L" to a character or
characters based on how the label is referenced, e.g. "B" for branch targets.
Auto labels are only added where they are needed, and are removed when
no longer necessary. Because auto labels may be renamed or vanish, the
editor will try to prevent you from referring to them explicitly when
editing operands.</p>
<h3><a name="external-address-symbols">External Address Symbols</a></h3>
<p>Symbols that represent an address outside the file being disassembled
are referred to as <i>external</i>. These may be ROM entry points,
data buffers, zero-page variables, or a number of other things. Because
the memory addresses they appear at aren't within the bounds of the file,
we can't simply put an address label on them. Four different mechanisms
exist for defining them. If an instruction or data operand refers to
an address outside the file bounds, SourceGen looks for a symbol with
a matching address value.</p>
<p><b>Platform symbols</b> are defined in platform symbol files. These
are named with a ".sym65" extension, and have a fairly straightforward
name/value syntax. Several files for popular platforms come with SourceGen
and live in the <code>RuntimeData</code> directory. You can also create your
own, but they have to live in the same directory as the project file.</p>
<p>Platform symbols can be addresses or constants. Addresses are
limited to 24-bit values, and are matched automatically. Constants may
be 32-bit values, but must be specified manually.</p>
<p>If two platform symbols have the same label, only the most recently read
one is kept. If two platform symbols have different labels but the
same value, both symbols will be kept, but the one in the file loaded
last will take priority when doing a lookup by address. If symbols with
the same value are defined in the same file, the one whose label appears
first alphabetically takes priority.</p>
<p>Platform address symbols have an optional width. This can be used
to define multi-byte items, such as two-byte pointers or 256-byte stacks.
If no width is specified, a default value of 1 is used. Widths are ignored
for constants.
Overlapping symbols are resolved as described earlier, with symbols loaded
later taking priority over previously-loaded symbols. In addition,
symbols defined closer to the target address take priority, so if you put
a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
be visible because the start point is closer to the addresses it covers
than the start of the 256-byte range.</p>
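<p>The priority rules for overlapping platform symbols can be sketched as a
sort key. This Python is a guess at one consistent ordering, not SourceGen's
actual lookup: it assumes closeness to the target is checked first, then
load order, then alphabetical order. <code>Symbol</code>,
<code>load_order</code>, and <code>lookup</code> are hypothetical names, and
the decimal adjustment format is illustrative.</p>

```python
from collections import namedtuple

# load_order: position of the defining file in the load sequence
# (a higher number means the file was loaded later).
Symbol = namedtuple("Symbol", "label value width load_order")


def lookup(symbols, address):
    """Return 'LABEL' or 'LABEL+n' for address, or None if nothing covers it."""
    covering = [s for s in symbols if s.value <= address < s.value + s.width]
    if not covering:
        return None
    best = min(
        covering,
        key=lambda s: (address - s.value, -s.load_order, s.label),
    )
    adjust = address - best.value
    return best.label if adjust == 0 else f"{best.label}+{adjust}"
```

<p>Given a 256-byte <code>STACK</code> at $0100 and a 4-byte
<code>PTR4</code> at $0180, a lookup of $0182 yields <code>PTR4+2</code>
because the start of <code>PTR4</code> is closer to the target.</p>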
<p>Platform symbols can be designated for reading, writing, or both.
Normally you'd want both, but if an address is a memory-mapped I/O
location that has different behavior for reads and writes, you'd want
to define two different symbols, and have the correct one applied
based on the access type.</p>
<p><b>Project symbols</b> behave like platform symbols, but they are
defined in the project file itself, through the Project Properties editor.
The editor will try to prevent you from creating two symbols with the same
name. If two symbols have the same value, the one whose label comes
first alphabetically is used.</p>
<p>Project symbols always have precedence over platform symbols, allowing
you to redefine symbols within a project. (You can "hide" a platform
symbol by creating a project symbol constant with the same name. Use a
value like $ffffffff or $deadbeef so you'll know why it's there.)</p>
<p><b>Address region pre-labels</b> are an oddity: they're external
address symbols that also act like user labels. These are explained
in more detail <a href="#pre-labels">later</a>.</p>
<p><b>Local variables</b> are redefinable symbols that are organized
into tables. They're used to specify labels for zero-page addresses
and 65816 stack-relative instructions. These are explained in more
detail in the next section.</p>
<h4><a name="local-vars">How Local Variables Work</a></h4>
<p>Local variables are applied to instructions that have zero
page operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
65816 stack relative operands
(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>). While they must be
unique relative to other kinds of labels, they don't have to be unique
with respect to earlier variable definitions. So you can define
<code>TMP .EQ $10</code>, and a few lines later define
<code>TMP .EQ $20</code>. This is handy because zero-page addresses are
often used in different ways by different parts of the program. For
example:</p>
<pre>
LDA ($00),Y
INC $02
... elsewhere ...
DEC $00
STA ($01),Y
</pre>
<p>If we had given <code>$00</code> the label <code>PTR</code> and
<code>$02</code> the label <code>COUNT</code> globally,
the second pair of instructions would look all wrong. With local
variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>
<p>Local variables have a value and a width. If we create a pair of
variable definitions like this:</p>
<pre>
PTR .eq $00 ;2 bytes
COUNT .eq $02 ;1 byte
</pre>
<p>Then this:</p>
<pre>
STA $00
STX $01
LDY $02
</pre>
<p>Would become:</p>
<pre>
STA PTR
STX PTR+1
LDY COUNT
</pre>
<p>The scope of a variable definition starts at the point where it is
defined, and stops when its definition is erased. There are three
ways for a table to erase an earlier definition:</p>
<ol>
<li>Create a new definition with the same name.</li>
<li>Create a new definition that has an overlapping value. For
example, if you have a two-byte variable <code>PTR = $00</code>,
and define a one-byte variable <code>COUNT = $01</code>, the
definition for <code>PTR</code> will be cleared because its second
byte overlaps.</li>
<li>Tables have a "clear previous" flag that erases all previous
definitions. This doesn't usually cause anything to be generated in the
assembly sources; instead, it just causes SourceGen to stop using
that label.</li>
</ol>
<p>As you might expect, you're not allowed to have duplicate labels or
overlapping values in an individual table.</p>
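<p>The three erasure rules can be modeled with a small table class. This
Python is a sketch under stated assumptions, not the real data structure:
<code>LocalVars</code>, <code>apply_table</code>, and
<code>format_operand</code> are hypothetical names, and only zero-page
addresses are considered.</p>

```python
class LocalVars:
    """Tracks which local variable definitions are currently in scope."""

    def __init__(self):
        self.active = {}  # label -> (value, width)

    def apply_table(self, entries, clear_previous=False):
        """entries: iterable of (label, value, width) from one table."""
        if clear_previous:
            self.active.clear()  # rule 3: the "clear previous" flag
        for label, value, width in entries:
            # Rule 1 drops a same-named definition; rule 2 drops any
            # definition whose byte range overlaps the new one.
            self.active = {
                name: (v, w)
                for name, (v, w) in self.active.items()
                if name != label and not (v < value + width and value < v + w)
            }
            self.active[label] = (value, width)

    def format_operand(self, addr):
        """Return the variable name (with adjustment) or a hex operand."""
        for name, (v, w) in self.active.items():
            if v <= addr < v + w:
                offset = addr - v
                return name if offset == 0 else f"{name}+{offset}"
        return f"${addr:02x}"
```

<p>After applying a table with a two-byte <code>PTR</code> at $00 and a
one-byte <code>COUNT</code> at $02, address $01 formats as
<code>PTR+1</code>. Applying a later table that defines <code>COUNT</code>
at $01 erases both earlier definitions, so $00 reverts to hex.</p>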
<p>If a platform/project symbol has the same value as a local variable,
the local variable is used. If the local variable definition is cleared,
use of the platform/project symbol will resume.</p>
<p>Not all assemblers support redefinable variables. In those cases,
the symbol names will be modified to be unique (e.g. the second definition
of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
global scope.</p>
<h3><a name="unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></h3>
<p>Most assemblers have a notion of "local" labels, which have a scope
that is book-ended by global labels. These are handy for generic branch
target names like "loop" or "notzero" that you might want to use in
multiple places. The exact definition of local label scope varies
between assemblers, so labels that you want to be local might have to
be promoted to global (and probably renamed).</p>
<p>SourceGen has a similar concept with a slight twist: they're called
non-unique labels, because the goal is to be able to use the same
label in more than one place. Whether or not they actually turn out
to be local is a decision deferred to assembly source generation time.
(You can also declare a label to be a unique local if you like; the
auto-generated labels like "L1234" do this.)</p>
<p>When you're writing code for an assembler, it has to be unambiguous,
because the assembler can't guess at what the output should be. For a
disassembler, the output is known, so a greater degree of ambiguity is
tolerable. Instead of throwing errors and refusing to continue, the
source generator can modify the output until it works. For example:</p>
<pre>
@LOOP LDX #$02
@LOOP DEX
BNE @LOOP
DEY
BNE @LOOP
</pre>
<p>This would confuse an assembler. SourceGen already knows which @LOOP
is being branched to, so it can just rename one of them to "@LOOP1".</p>
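<p>Because each label is tied to a file offset, duplicates can be renamed
without changing which instruction a branch targets. The Python below is a
minimal sketch of that idea; <code>uniquify</code> is a hypothetical name,
and the numeric-suffix scheme is an assumption rather than the exact rule
the source generator applies.</p>

```python
def uniquify(labels):
    """Map each offset to a unique label name.

    labels: list of (offset, name) pairs in file order.
    Returns a dict of offset -> name with duplicates renamed.
    """
    used = set()
    result = {}
    for offset, name in labels:
        candidate = name
        suffix = 0
        while candidate in used:
            suffix += 1
            candidate = f"{name}{suffix}"
        used.add(candidate)
        result[offset] = candidate
    return result
```

<p>Branch operands are then printed by looking up the target offset in the
returned map, so the inner <code>BNE</code> and the outer <code>BNE</code>
each get the correct name.</p>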
<p>One situation where non-unique labels cause difficulty is with
weak symbolic references (see next section). For example, suppose
the above code then did this:</p>
<pre>
LDA #&lt;@LOOP
</pre>
<p>While it's possible to make an educated guess at which @LOOP was
meant, it's easy to get wrong. In situations like this, it's best to
give the labels different names.</p>
<h3><a name="weak-refs">Weak Symbolic References</a></h3>
<p>Symbolic references in operands are "weak references". If the named
symbol exists, the reference is used. If the symbol can't be found, the
operand is formatted in hex instead. They're called "weak" because
failing to resolve the reference isn't considered an error.</p>
<p>It's important to know this when editing a project. Consider the
following trivial chunk of code:</p>
<pre>
1000: 4c0310 JMP $1003
1003: ea NOP
</pre>
<p>When you load it into SourceGen, it will be formatted like this:</p>
<pre>
.ADDRS $1000
JMP L1003
L1003 NOP
</pre>
<p>The analyzer found the JMP operand, and created an auto label for
address $1003. It then created a weak reference to "L1003" in the JMP
operand.</p>
<p>If you edit the JMP instruction's operand to use the symbol "FOO", the
results are probably not what you want:</p>
<pre>
.ADDRS $1000
JMP $1003
NOP
</pre>
<p>This happened because you added a weak reference to "FOO" in the operand,
but the label doesn't exist. The operand is formatted as hex. Because
there's no longer a reference to L1003, SourceGen removed the auto-label
as well.</p>
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
probably wanted:</p>
<pre>
.ADDRS $1000
JMP FOO
FOO NOP
</pre>
<p>You don't actually need the explicit reference in the JMP instruction.
If you edit the JMP operand and set it back to "Default", the code will
still look the same. This is because SourceGen identified the numeric
reference, and automatically added a symbolic reference to the label on
the NOP instruction.</p>
<p>However, suppose you didn't actually want FOO as the operand label.
You can create a project symbol, BAR, with the value $1003, and then edit
the operand to reference BAR instead. Your code would then look like:</p>
<pre>
BAR .EQ $1003
.ADDRS $1000
JMP BAR
FOO NOP
</pre>
<p>If you change the value of the BAR project symbol, the operand
will continue to refer to it, but with an adjustment. For example, if
you changed BAR from $1003 to $1007, the code would become:</p>
<pre>
BAR .EQ $1007
.ADDRS $1000
JMP BAR-4
FOO NOP
</pre>
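<p>The behavior of an explicit weak reference in the examples above can be
summarized in a few lines of Python. This is a sketch, not SourceGen's
formatter: <code>format_operand</code> is a hypothetical name, the
adjustment is printed in decimal, and the automatic lookup used for
"Default" operands is left out.</p>

```python
def format_operand(value, weak_ref, symbols):
    """Format a 16-bit operand.

    value:    the numeric operand from the binary
    weak_ref: the symbol name stored on the operand, or None
    symbols:  dict of symbol name -> value for symbols that exist
    """
    if weak_ref is not None and weak_ref in symbols:
        adjust = value - symbols[weak_ref]
        if adjust == 0:
            return weak_ref
        sign = "+" if adjust > 0 else "-"
        return f"{weak_ref}{sign}{abs(adjust)}"
    # Unresolved reference: not an error, fall back to hex.
    return f"${value:04x}"
```

<p>With <code>BAR</code> defined as $1007, the operand $1003 formats as
<code>BAR-4</code>. A reference to the undefined symbol <code>FOO</code>
formats as <code>$1003</code>.</p>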
<p>If you rename a label, all references to that label are updated. For
numeric references that happens implicitly. For explicit operand
references, the weak references are updated individually. (Modern IDEs
call this "refactoring".)</p>
<p>If you remove a label, all of the numeric references to it will
reference something else, probably a new auto label. Weak references
to the symbol will break and be formatted as hex, but will not be
removed. Similarly, removing symbols from a platform or project file
will break the reference but won't modify the operands.</p>
<h3><a name="symbol-parts">Parts and Adjustments</a></h3>
<p>Sometimes you want to use part of a label, or adjust the value slightly.
(I use "adjustment" rather than "offset" to avoid confusing it with file
offsets.) Consider the following example:</p>
<pre>
1000: a910 LDA #$10
1002: 48 PHA
1003: a906 LDA #$06
1005: 48 PHA
1006: 60 RTS
1007: 4c3aff JMP $ff3a
</pre>
<p>This pushes the address of the JMP instruction ($1007) onto the stack,
and jumps to it with the RTS instruction. However, RTS requires the
address of the byte before the target instruction, so we actually push
$1006.</p>
<p>The disassembler won't know that offset $1007 is code because nothing
appears to reference it. After tagging $1007 as a code start point, the
project looks like this:</p>
<pre>
LDA #$10
PHA
LDA #$06
PHA
RTS
JMP $ff3a
</pre>
<p>We set a label called "NEXT" on the JMP instruction, and then edit
the two LDA instructions to reference the high and low parts, yielding:</p>
<pre>
.ADDRS $1000
LDA #&gt;NEXT
PHA
LDA #&lt;NEXT-1
PHA
RTS
NEXT JMP $ff3a
</pre>
<p>SourceGen will adjust label values by whatever amount is required to
generate the original value. If the adjustment seems wrong, make sure
you're selecting the right part of the symbol.</p>
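<p>The arithmetic behind the <code>NEXT</code> example can be checked with a
short Python sketch. The helper names are hypothetical, and the adjustment
is computed on the selected byte only; how an adjustment is rendered in an
expression differs between assemblers.</p>

```python
def low_byte(value):
    """Bits 0-7 of a value (the '<' part)."""
    return value & 0xFF


def high_byte(value):
    """Bits 8-15 of a value (the '>' part)."""
    return (value >> 8) & 0xFF


def part_adjustment(symbol_value, operand_byte, part):
    """Difference between the operand byte and the selected symbol part.

    part is "low" or "high".
    """
    if part == "low":
        selected = low_byte(symbol_value)
    else:
        selected = high_byte(symbol_value)
    return operand_byte - selected
```

<p>For <code>NEXT</code> at $1007, the operand $10 matches the high part
with no adjustment, and $06 matches the low part with an adjustment of -1.
Selecting the low part for $10 instead would require +9, which is the kind
of odd result that suggests the wrong part was chosen.</p>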
<p>Different assemblers use different syntaxes to form expressions. This
is particularly noticeable in 65816 code. You can adjust how it appears
on-screen from the app settings.</p>
<h3><a name="nearby-targets">Automatic Use of Nearby Targets</a></h3>
<p>Sometimes you want to use a symbol that doesn't match up with the
operand. SourceGen tries to anticipate situations where that might be
the case, and apply adjustments for you.</p>
<p>Suppose you have the following:</p>
<pre>
.ADDRS $1000
LDA #$00
STA L1010
LDA #$20
STA L1011
LDA #$e1
STA L1012
RTS
L1010 .DD1 $00
L1011 .DD1 $00
L1012 .DD1 $00
</pre>
<p>Showing stores to three different labeled addresses is fine, but
the code is actually setting up a single 24-bit address. For clarity,
you'd like the output to reflect the fact that it's a single, multi-byte
variable. So, if you set a label at $1010, SourceGen removes the
nearby auto labels, and sets the numeric references to use your label:</p>
<pre>
.ADDRS $1000
LDA #$00
STA DATA
LDA #$20
STA DATA+1
LDA #$e1
STA DATA+2
RTS
DATA .DD1 $00
.DD1 $00
.DD1 $00
</pre>
<p>If you decide that you really wanted each store to have its own
label, you can set labels on the other two addresses. SourceGen won't
search for alternate labels if the numeric reference target has a
user-defined label.</p>
<p>This is also used for self-modifying code. For example:</p>
<pre>
1000: a9ff LDA #$ff
1002: 8d0610 STA $1006
1005: 4900 EOR #$00
</pre>
<p>The above changes the <code>EOR #$00</code> instruction to
<code>EOR #$ff</code>. The operand target is $1006, but we can't
put a label there because it's in the middle of the instruction. So
SourceGen puts a label at $1005 and adjusts it:</p>
<pre>
LDA #$ff
STA L1005+1
L1005 EOR #$00
</pre>
<p>If you really don't like the way this works, you can disable the
search for nearby targets entirely from the
<a href="settings.html#project-properties">project properties</a>.
Self-modifying code will always be adjusted because of the limitation
on mid-instruction labels.</p>
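<p>The mid-instruction case can be sketched as a search for the instruction
that contains the target address. This Python is illustrative only;
<code>label_for_target</code> is a hypothetical name, and auto labels are
assumed to use the plain "L" prefix with lower-case hex.</p>

```python
def label_for_target(instructions, target):
    """Return an operand string for a numeric reference to target.

    instructions: dict mapping instruction start address -> length in bytes.
    """
    for start, length in instructions.items():
        if start < target < start + length:
            # Target is inside an instruction: label the opcode, then adjust.
            return f"L{start:04x}+{target - start}"
    return f"L{target:04x}"
```

<p>For the listing above, the instructions start at $1000, $1002, and $1005,
so a store to $1006 is shown as <code>L1005+1</code>.</p>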
<h2><a name="width-disambiguation">Width Disambiguation</a></h2>
<p>It's possible to interpret certain instructions in multiple ways.
For example, "LDA $0000" might be an absolute load from a 16-bit
address, or it might be a direct page load from an 8-bit address.
Humans can infer from the fact that it was written with a 4-digit address
that it's meant to be absolute, but assemblers often treat operands
purely as numbers, and would just see "LDA 0". Common practice is to
use the shortest instruction possible.</p>
<p>Every assembler seems to address the problem in a slightly different
way. Some use opcode suffixes, others use operand prefixes, some
allow both. You can configure how they appear in the
<a href="settings.html#app-settings">application settings</a>.</p>
<p>SourceGen will only add width disambiguators to opcodes or operands when
they are needed, with one exception: the opcode suffix for long
(24-bit address) operations is always applied. This is done because some
assemblers require it, insisting on "LDAL" rather than "LDA" for an
absolute long load, and because it can make 65816 code easier to read.</p>
<h2 id="address-regions">Address Regions</h2>
<p>Simple programs are loaded at a particular address and executed there.
The source code starts with a directive that tells the assembler what the
initial address is, and the code and data statements that follow are
placed appropriately. More complicated programs might relocate parts
of themselves to other parts of memory, or be composed of multiple
"overlay" segments that, through disk I/O or bank-switching, all execute
at the same address.</p>
<p>Consider the code in the first tutorial. It loads at $1000, copies
part of itself to $2000, and transfers execution there:</p>
<pre>
.ADDRS $1000
1000: a0 71 LDY #$71
1002: b9 17 10 L1002 LDA SRC,y
1005: 99 00 20 STA MAIN,y
1008: 88 DEY
1009: 30 09 BMI L1014
100b: 10 f5 BPL L1002
100d: 00 .DD1 $00
100e: 68 65 6c 6c+ .STR "hello!"
1014: 4c 00 20 L1014 JMP MAIN
1017: SRC
.ADDRS $2000
2000: ad 00 30 MAIN LDA $3000
[...]
</pre>
<p>The arrangement of this code can be viewed in a couple of ways. One
way is to see it linearly: the code starts at $1000, continues to $1017,
then restarts at $2000:</p>
<pre>
+000000 +- start
| $1000 - $1016 length=23 ($0017)
+000016 +- end (floating)
+000017 +- start 'MAIN'
| $2000 - $2070 length=113 ($0071)
+000087 +- end (floating)
</pre>
<p>The other way to picture it is hierarchical: the file loads
fully at $1000, and has a "child" region at offset +000017 in which the
address changes to $2000:</p>
<pre>
+000000 +- start
| $1000 - $1016 length=23 ($0017)
+000017 | +- start 'MAIN' pre='SRC'
| | $2000 - $2070 length=113 ($0071)
+000087 | +- end
+000087 +- end
</pre>
<p>The latter is closer to what many assemblers expect, with a "physical"
PC that starts where the file is loaded, and a "logical" or "pseudo" PC
that determines how the code is generated. SourceGen supports both
approaches. The only thing that would change in this example is that
the nested approach allows the "SRC" label to exist. (More on this
later, in the section on <a href="#pre-labels">pre-labels</a>.)</p>
<p>The real value of a hierarchical arrangement becomes apparent when
the area copied out of the file is only a small part of it. For
example, suppose something like:</p>
<pre>
.ADDRS $1000
LDA SUB_SRC,Y
STA SUB_DST,Y
JMP CONT
SUB_SRC
.ADDRS $2000
SUB_DST [small routine]
.ADREND
CONT LDA #$12
JSR SUB_DST
</pre>
<p>In this case, a small routine is copied out of the middle of the
code that lives at $1000. We want the code at CONT to pick up where
things left off. If SUB_SRC is at $1009, and is 23 bytes long, then
CONT should be $1020. We could output <code>.ADDRS $1020</code>
directly before CONT, but it's inconvenient to work with the generated
code if we want to modify the subroutine (changing its length)
and re-assemble it.</p>
<h3 id="fixed-float">Fixed vs. Floating</h3>
<p>Sometimes when disassembling code you know exactly where an address
region starts and ends. Other times you know where it starts, but won't
know where it stops until you've had a chance to look at the updated
disassembly. In the former case you create a region with a "fixed" end
point, in the latter you create one with a "floating" end point.</p>
<p>Address regions with fixed end points always stop in the same place.
Regions with floating end points stop at the next address region boundary,
which means they can change size as regions are added or removed.
The end will be placed at either the start of a new region (a "sibling"),
or the end of an encapsulating region (the "parent").</p>
<p>Regions that overlap must have a parent/child relationship. Whichever
one starts last or ends first is the child. A strict ordering is necessary
because a given file offset can only have one address, and if we don't
know which region is the child we can't know which address to assign.
Regions cannot straddle the start or end of another region, and cannot
exactly overlap (have the same start and length as) another region.
One consequence of these rules is that "floating" regions cannot share
a start offset with another region, because their end point would be
adjusted to match the end of the other region.</p>
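<p>The parent/child rule gives every file offset exactly one address: the
one from the innermost region that covers it. The Python below is a minimal
sketch of that mapping using the tutorial layout shown earlier;
<code>Region</code> and <code>address_at</code> are hypothetical names, and
regions are assumed to have already been checked for straddling.</p>

```python
from collections import namedtuple

Region = namedtuple("Region", "start length address")


def address_at(regions, offset):
    """Return the address for a file offset, or None if non-addressable."""
    covering = [r for r in regions if r.start <= offset < r.start + r.length]
    if not covering:
        return None
    # The child is the region that starts last or, on a tie, ends first.
    inner = max(covering, key=lambda r: (r.start, -r.length))
    return inner.address + (offset - inner.start)


# Tutorial layout: parent at $1000, child at +000017 mapped to $2000.
TUTORIAL = [Region(0x00, 0x88, 0x1000), Region(0x17, 0x71, 0x2000)]
```

<p>Here offset +000016 maps to $1016 through the parent, while +000017 maps
to $2000 through the child.</p>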
<p>The arrangement of regions is particularly important when attempting
to resolve an address operand (such as a JSR) to a location within the
file. The process is straightforward if the address only appears once,
but when overlays cause multiple parts of the file to have the same
address, the operand target may be in different places depending on where
the call is being made from.
The algorithm for resolving addresses is described
in the <a href="advanced.html#overlap">advanced topics</a> section.</p>
<h3 id="non-addr">Non-Addressable Areas</h3>
<p>Some files have contents that aren't actually loaded into memory
addressable by the 6502. One example is a file header, such as a load
address extracted by the system when reading the program into memory, or
something intended to be read by an emulator. Another example is the
CHR graphic data on the NES, which is loaded into an area inaccessible
to the CPU.</p>
<p>The generated source file must recreate the original binary exactly,
but we don't really want to assign an address to non-addressable data,
because it should never be resolved as the target of a JSR or LDA. To
handle this case, you can set a region's address to "NA". The assembler
needs to have <i>some</i> notion of address, so the start address will
be treated as zero.</p>
<p>Non-addressable regions cannot include executable code. You may put
labels on data items, but attempting to reference them will cause a
warning and will likely generate code that doesn't assemble.</p>
<p>It's possible to delete all address regions from a project, or edit
them so that there are "holes" not covered by a region.
To handle this, all projects are effectively covered by a non-addressable
region that spans the entire file. Any part of the file that isn't
explicitly covered by a user-specified region will be provided an
auto-generated non-addressable region. Such regions don't actually exist,
so attempting to edit one will actually cause a new region to be created.</p>
<h3 id="pre-labels">Pre-Labels</h3>
<p>The need for pre-labels was illustrated in the earlier example, where
code in Tutorial1 was copied from $1017 to $2000. The fundamental issue
is that offset +000017 has <i>two</i> addresses: $1017 and $2000. The
assembler can only generate code for one. Pre-labels allow you to do
the same thing you'd do in the source code, which is to add a label
immediately before the address is changed.</p>
<p>Pre-labels are "external" symbols, similar to project symbols,
because they refer to an address that is outside the file bounds.
They're always treated as having global scope.
However, they also behave like user labels, because they're generated
as part of the instruction stream and interfere with local label
references that cross them.</p>
<p>The address of a pre-label is determined by the parent region.
Suppose you have a file with an arrangement like:</p>
<pre>
region1 start
...
region2 start
...
region2 end
region1 end
</pre>
<p>You can put a pre-label on <code>region2</code>, which will be the
address of the byte in <code>region1</code> right before the address
changed. You can't put a pre-label on <code>region1</code>, because
before <code>region1</code> there was no address. Similarly:</p>
<pre>
region1 start
...
region1 end
region2 start
...
region2 end
</pre>
<p>You can't put a pre-label on <code>region2</code> because its parent
is non-addressable. <code>region1</code>'s address doesn't apply,
because <code>region1</code> ended before the label would be issued.</p>
<h3 id="relative-addr">Relative Addressing</h3>
<p>It is occasionally useful to output an address region start directive
that uses relative addressing instead of absolute addressing. For
example, given:</p>
<pre>
.ADDRS $1000
[...]
.ADDRS $2000
</pre>
<p>We could instead generate:</p>
<pre>
.ADDRS $1000
[...]
.ADDRS *+$0fe9
</pre>
<p>This has no effect on the definition of the region. It only affects
how the start directive is generated in the assembly source file.</p>
<p>The value is an offset from the current assembler program counter.
If the new region is the child of a non-addressable region, a relative
offset cannot be used.</p>
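<p>The relative operand is the difference between the new region's address
and the assembler's program counter at the point of the directive. A short
Python sketch follows; <code>relative_directive</code> is a hypothetical
name, and the example assumes the first region is 23 bytes long, as in the
tutorial, so the program counter is $1017.</p>

```python
def relative_directive(pc_before, new_address):
    """Format an address region start as an offset from the current PC."""
    delta = new_address - pc_before
    sign = "+" if delta >= 0 else "-"
    return f".ADDRS *{sign}${abs(delta):04x}"
```

<p>Calling <code>relative_directive(0x1017, 0x2000)</code> produces
<code>.ADDRS *+$0fe9</code>.</p>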
<h2><a name="atags">Directing the Code Analyzer</a></h2>
<p>Sometimes SourceGen can't automatically find the start or end of an
instruction stream, or gets confused by inline data. These situations
can be resolved by adding analyzer tags.</p>
<p><b>Code start point</b> tags tell the analyzer to add the offset
to the list of instruction start points. Suppose you've got a code
library that begins with jump vectors, like this:</p>
<pre>
1000: 4c0910 JMP $1009
1003: 4cef10 JMP $10ef
1006: 4c3012 JMP $1230
1009: 18 CLC
</pre>
<p>When opened with SourceGen, it will look like this:</p>
<pre>
.ADDRS $1000
JMP L1009
.DD1 $4c
.DD1 $ef
.DD1 $10
.DD1 $4c
.DD1 $30
.DD1 $12
L1009 CLC
</pre>
<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
assumes those are data. Further, the functions at those addresses may
also be considered data unless some bit of code reachable from L1009
calls into them. If you tag $1003 and $1006 as code start points,
you'll get better results:</p>
<pre>
.ADDRS $1000
JMP L1009
JMP L10ef
JMP L1230
L1009 CLC
</pre>
<p>Be careful that you only tag the instruction opcode byte. If
you tagged each and every byte from $1003 to $1008, you would
end up with a mess:</p>
<pre>
.ADDRS $1000
JMP L1009
JMP &#x25bc; L10ef
BPL &#x25bc; L1053
JMP &#x25bc; L1230
BMI L101b
L1009 CLC
</pre>
<p>The exact set of instructions shown depends on your CPU configuration.
The problem is that the bytes in the middle of the instruction have
been tagged as start points, so SourceGen is treating them as
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
they're being ignored, but $10 is BPL and $30 is BMI. Because tagging
multiple consecutive bytes is rarely useful, SourceGen only applies code
start tags to the first byte in a selected line.</p>
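<p>The branch targets in the "mess" above follow from the usual 6502
relative-branch arithmetic: the signed offset byte is added to the address
of the next instruction. The Python below is a sketch for checking the
numbers; <code>branch_target</code> and <code>VECTORS</code> are
hypothetical names.</p>

```python
def branch_target(opcode_addr, offset_byte):
    """Target of a 2-byte relative branch located at opcode_addr."""
    displacement = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (opcode_addr + 2 + displacement) & 0xFFFF


# Bytes at $1000-$1009 from the jump vector example.
BASE = 0x1000
VECTORS = bytes.fromhex("4c09104cef104c301218")


def embedded_branch(addr):
    """Decode the byte pair at addr as if it were a relative branch."""
    index = addr - BASE
    return branch_target(addr, VECTORS[index + 1])
```

<p>The $10 at $1005 is followed by $4c, giving a <code>BPL</code> to $1053,
and the $30 at $1007 is followed by $12, giving a <code>BMI</code> to
$101b.</p>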
<p><b>Code stop point</b> tags tell the analyzer when it should stop. For
example, suppose address $ff00 is known to always be nonzero, and the code
uses that fact to get a branch-always on the 6502:</p>
<pre>
.ADDRS $1000
LDA $ff00
BNE L1010
BRK $11
</pre>
<p>By tagging the BRK as a code stop point, you're telling the analyzer that
it should stop trying to execute code when it reaches that point. (Note
that this example would actually be better solved by setting a status flag
override on the BNE that sets Z=0, so the code tracer will know it's a
branch-always and just do the right thing.) As with code start points,
code stop points should only be placed on the opcode byte. Placing a
code stop point in the middle of what SourceGen believes to be an instruction
will have no effect.</p>
<p>As with code start points, only the first byte in each selected line will
be tagged.</p>
<p><b>Inline data</b> tags identify bytes as being part of the
instruction stream, but not instructions. A simple example of this
is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
<pre>
JSR $bf00
.DD1 $function
.DD2 $address
BCS BAD
</pre>
<p>The three bytes following the <code>JSR $bf00</code> should be tagged
as inline data, so that the code analyzer skips over them and continues the
analysis at the <code>BCS</code> instruction. You can think of these as
"code skip" tags, but they're different from stop/start points, because
every byte of inline data must be tagged. When
applying the tag, all bytes in a selected line will be modified.</p>
<p>If code branches into a region that is tagged as inline data, the
branch will be ignored.</p>
<h3><a name="scripts">Extension Scripts</a></h3>
<p>Extension scripts are C# source files that are compiled and
executed by SourceGen. They can be added to a project from SourceGen's
runtime data directory, or can live in the directory next to the project
file. They're used to generate visualizations of graphical data, and
to format inline data automatically.</p>
<p>The inline data formatting feature can significantly reduce the tedium
in certain projects. For example, suppose the code uses a string print
routine that embeds a null-terminated string right after a JSR. Ordinarily
you'd have to walk through the code, marking every instance by hand so
the disassembler would know where the string ends and execution resumes.
With an extension script, you can just pass in the print routine's label,
and let the script do the formatting automatically.</p>
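<p>The core of such a script is a small search. The Python sketch below is
not the extension script API (real scripts are C# and are driven by the
analyzer); it only illustrates the idea by scanning a byte buffer for a
<code>JSR</code> to a hypothetical print routine at $3000 and measuring the
null-terminated string that follows. The function name, the address, and the
sample bytes are all made up for the example.</p>
<pre>
```python
JSR = 0x20  # 6502 opcode for JSR with an absolute operand


def find_inline_strings(data, print_addr):
    """Return (offset, length) pairs for strings that follow JSR print_addr.

    Each length includes the terminating zero byte.
    """
    # The operand is stored little-endian: low byte, then high byte.
    pattern = bytes([JSR, print_addr % 256, print_addr // 256])
    found = []
    pos = data.find(pattern)
    while pos != -1:
        start = pos + 3
        end = data.find(b"\x00", start)
        if end == -1:
            break  # unterminated string: give up
        found.append((start, end - start + 1))
        pos = data.find(pattern, end + 1)
    return found


# JSR $3000 / "HI",0 / RTS
sample = bytes([0x20, 0x00, 0x30]) + b"HI\x00" + bytes([0x60])
print(find_inline_strings(sample, 0x3000))  # [(3, 3)]
```
</pre>
<p>A plain byte search like this can be fooled by data that happens to look
like a <code>JSR</code>, which is one reason to let the code analyzer decide
where the calls are.</p>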
<p>To reduce the chances of a script causing problems, all scripts are
executed in a sandbox with severely restricted access. Notably, nothing
in the sandbox can access files, except to read files from the PluginDll
directory.</p>
<p>The PluginDll directory lives next to the SourceGen executable, and
contains all of the compiled script DLLs, as well as two pre-built
application DLLs that plugins are allowed access to. The contents
are persistent, to avoid recompiling the scripts every time SourceGen
is launched, but may be manually deleted without harm.</p>
<p>More details can be found in the
<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>
<h2><a name="pseudo-ops">Data and Directive Pseudo-Opcodes</a></h2>
<p>The on-screen code list shows assembler directives that are similar
to what the various cross-assemblers provide. The actual directives
generated for a given assembler may match exactly or be totally different.
The idea is to represent the concept behind the directive, then let the
code generator figure out the implementation details.</p>
<p>There are eight assembler directives that appear in the code list:</p>
<ul>
<li>.EQ - defines a symbol's value. These are generated automatically
when an operand that matches a platform or project symbol is found.</li>
<li>.VAR - defines a local variable. These are generated for
local variable tables.</li>
<li>.ADDRS/.ADREND - specifies the start or end of an
address region.</li>
<li>.RWID - specifies the width of the accumulator and index registers
(65816 only). Note this doesn't change the actual width, just tells
the assembler that the width has changed.</li>
<li>.DBANK - specifies what value the Data Bank Register holds
(65816 only). Used when matching operands to labels.</li>
<li>.JUNK - indicates that the data in a range of bytes is irrelevant.
(When generating sources, this will become .FILL or .BULK
depending on the contents of the memory region and the assembler's
capabilities.)</li>
<li>.ALIGN - a special case of .JUNK that indicates the irrelevant
bytes exist to force alignment to a memory boundary (usually a
256-byte page). Depending on the memory contents, it may be possible
to output this as an assembler-specific alignment directive.</li>
</ul>
<p>Every data item is represented by a pseudo-op. Some of them may
represent hundreds of bytes and span multiple lines.</p>
<ul>
<li>.DD1, .DD2, .DD3, .DD4 - basic "define data" op. A 1-4 byte
little-endian value.</li>
<li>.DBD2, .DBD3, .DBD4 - "define big-endian data". 2-4 bytes of
big-endian data. (The 3- and 4-byte versions are not currently
available in the UI, since they're very unusual and few assemblers
support them.)</li>
<li>.BULK - data packed in as compact a form as the assembler allows.
Useful for chunks of graphics data.</li>
<li>.FILL - a series of identical bytes. The operand
has two parts, the byte count followed by the byte value.</li>
</ul>
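<p>As a rough illustration of the byte layouts behind these pseudo-ops, here
is a short Python sketch. It is not SourceGen code; the value $1234 and the
fill count are arbitrary choices for the example.</p>
<pre>
```python
value = 0x1234

# .DD2 $1234: little-endian, low byte first.
dd2 = value.to_bytes(2, "little")
# .DBD2 $1234: big-endian, high byte first.
dbd2 = value.to_bytes(2, "big")
# .FILL with a count of 4 and a value of $ff: four identical bytes.
fill = bytes([0xFF]) * 4

print(dd2.hex(), dbd2.hex(), fill.hex())  # 3412 1234 ffffffff
```
</pre>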
<p>In addition, several pseudo-ops are defined for string constants:</p>
<ul>
<li>.STR - basic character string.</li>
<li>.RSTR - string in reverse order.</li>
<li>.ZSTR - null-terminated string.</li>
<li>.DSTR - Dextral Character Inverted string. The high bit of the
last byte is flipped.</li>
<li>.L1STR - string prefixed with a length byte.</li>
<li>.L2STR - string prefixed with a length word.</li>
</ul>
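<p>The following Python sketch shows one way the bytes behind each string
pseudo-op could be laid out for a non-empty ASCII string. It is an
illustration, not SourceGen's implementation: the <code>encode</code> helper is
hypothetical, ".RSTR" is taken to mean the bytes are stored back to front, and
the ".L2STR" length word is assumed to be little-endian.</p>
<pre>
```python
def encode(text, kind):
    """Return the bytes for a non-empty ASCII string under one pseudo-op."""
    raw = text.encode("ascii")
    if kind == ".STR":
        return raw
    if kind == ".RSTR":
        return raw[::-1]
    if kind == ".ZSTR":
        return raw + b"\x00"
    if kind == ".DSTR":
        # Flip the high bit of the last byte only.
        return raw[:-1] + bytes([raw[-1] ^ 0x80])
    if kind == ".L1STR":
        return bytes([len(raw)]) + raw
    if kind == ".L2STR":
        return len(raw).to_bytes(2, "little") + raw
    raise ValueError(kind)


for kind in (".STR", ".RSTR", ".ZSTR", ".DSTR", ".L1STR", ".L2STR"):
    print(kind, encode("HI", kind).hex())
```
</pre>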
<p>You can configure the pseudo-ops to look more like what your
favorite assembler uses in the
<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
application settings.</p>
<p>String constants start and end with delimiter characters, typically
single or double quotes. You can configure the delimiters differently
for each character encoding, so that it's obvious whether the text is
in ASCII or PETSCII. See the
<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
the application settings.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+292
@@ -0,0 +1,292 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Intro - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Intro</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="overview">Overview</a></h2>
<p>SourceGen converts 6502/65C02/65816 machine-language programs to
assembly-language source.</p>
<p>SourceGen has two purposes. The first is to be a really nice
disassembler for the 6502 and related CPUs. Code tracing with status
flag tracking makes it easier to separate the code from the data,
automatic formatting of character strings and filled-data areas helps
get the data regions sorted out, and modern IDE-style features like
cross-reference generation and color-highlighted bookmarks help
navigate the code while trying to figure out what it does. A
disassembler should help you understand the code, not just dump the
instructions to a text file.</p>
<p>The computer I built back in 2014 has a 4GHz CPU and 8GB of RAM. I
figured we should put the power of modern computing hardware to good use.</p>
<p>The second purpose is to facilitate sharing and collaboration. Most
disassemblers generate output for a specific assembler, or in a way that's
generic enough to match most any assembler; either way, you're left with
a text file in somebody's idea of the "correct" format. SourceGen keeps
everything in an assembler-neutral format, and provides numerous options
for customizing the display, so that multiple people viewing the same
project can each do so with the conventions they are accustomed to.
Code and data operands can be formatted in various numeric formats or
as symbols.
The project file uses a text format that is fairly diff-friendly, so
sharing projects through git works reasonably well. If you want source
code you can assemble, SourceGen will generate code optimized for the
assembler of your choice.</p>
<p>The sharing and collaboration ideas only work if the formatting
capabilities within SourceGen are sufficiently flexible. If you need to
generate assembly source and tweak it a bunch to express the intent of
the original code, then passing a SourceGen project around won't work.
This sort of thing is a bit outside the bounds of what a typical
disassembler does, so it remains to be seen whether SourceGen succeeds at
what it's trying to do, and also whether what it's trying to do is
something that people actually want.</p>
<p>You can get started by watching a
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and working through
the <a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
<h2><a name="fundamental-concepts">Fundamentals</a></h2>
<p>The next few sections present some general concepts and terminology. The
rest of the documentation assumes you've read and understood this.</p>
<p>It will be helpful if you already understand something about the 6502
instruction set and assembly-language programming, but disassembling
other programs is actually a pretty good way to learn how to code in
assembly. You will need to be familiar with hexadecimal numbers and
general programming concepts to make sense of this, however.</p>
<h3><a name="begin">About 6502 Code</a></h3>
<p>For brevity's sake, "6502 code" should be taken to mean "code for
the 6502 CPU or any of its derivatives, including but not limited to
the 65C02 and 65816". So let's talk about 6502 code.</p>
<p>Code usually arrives in a big binary blob. Some of it will be
instructions, some of it will be data, some will be empty space used
for variable storage. Part of the challenge of disassembly is
identifying which parts of the file contain which.</p>
<p>Much of the code you'll find for the 6502 was written by humans,
rather than generated by a compiler, which means it won't conform to a
standard set of conventions. However, most programmers will use
subroutines, which can be identified and analyzed in isolation. Subroutines
are often interspersed with variable storage, referred to as a "stash".
Variables and constants may be single-byte or multi-byte, the latter
typically in little-endian byte order.</p>
<p>Much of the data in a typical program is read-only, often in the
form of graphics or character string data. Graphics can be difficult
to recognize automatically, but strings can be identified with a
reasonable degree of confidence. Address tables, which are a collection
of addresses to other things, are also fairly common.</p>
<p>A simple disassembler would start at the top of the file and just
start converting bytes to instructions. Unfortunately there's no reliable
way to tell the difference between instructions, data, and variable
stashes. When the converter hits data bytes it'll start generating
instructions that won't make sense. You'll have another problem when the
data ends and code resumes: 6502 instructions are variable-length, so if
the last byte of the data area appears to be a three-byte instruction,
the first two bytes of the next instruction area will be gobbled up.</p>
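<p>The loss of synchronization is easy to reproduce. The Python sketch below
uses a deliberately tiny opcode table (three real 6502 opcodes, everything else
treated as one byte) and a made-up four-byte file; it is a toy, not how any
particular disassembler is written.</p>
<pre>
```python
# Instruction lengths for the handful of opcodes used here.
LENGTHS = {
    0xAD: 3,  # LDA absolute
    0xA9: 2,  # LDA immediate
    0x60: 1,  # RTS
}


def linear_sweep(data, start):
    """Return the offsets a start-to-end disassembler treats as opcodes."""
    offsets = []
    pos = start
    while len(data) > pos:
        offsets.append(pos)
        pos += LENGTHS.get(data[pos], 1)
    return offsets


# One data byte ($ad) followed by the real code: LDA #$00 / RTS.
blob = bytes([0xAD, 0xA9, 0x00, 0x60])
print(linear_sweep(blob, 0))  # [0, 3]: the LDA #$00 was swallowed
print(linear_sweep(blob, 1))  # [1, 3]: the instructions as intended
```
</pre>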
<p>To make things even more difficult (sometimes deliberately), programmers
will sometimes use a trick where they "embed" an instruction
inside another instruction. This allows code to branch to two different
entry points, one of which will set a flag or load a register, and then
continue on to common code.</p>
<p>Another trick is to embed "inline data" after a JSR or JSL instruction.
The called subroutine pulls the caller's address off the stack, uses it to
access the parameters, then pushes the address back on after modifying it to
point to an address past the inline data. This can be very confusing
for the disassembler, which will try to interpret the inline data as
instructions.</p>
<p>Sometimes code is loaded at one location, then moved to another and
executed there. If you're disassembling an executing program you don't
have to worry about this, but if you're disassembling the binary from the
loadable file on disk then you need to track the address changes. The
address is communicated to the assembler with a "pseudo-opcode", usually
something like "ORG" (short for "origin"). Other pseudo-op directives
are used to define things like constants and (for 65816 code)
register widths.</p>
<p>The 8-bit CPUs have a 16-bit (64KiB) address space, so addresses can
range from $0000 to $ffff. (I'm going to write hex values with a
preceding '$', like "$12ab", rather than "0x12ab" or "12abh", because
that's what 6502 systems commonly used.) The 65816 has a 24-bit address
space, but it's not contiguous -- a branch that extends past the end will
wrap around to the start of the 64KiB "bank". For 16-bit instruction
operands, the bank is identified for instruction and data addresses
by the program bank register and the data bank register, respectively.
The disassembler can't always discern the value of the data bank
register through static analysis, so some user input may be required.</p>
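<p>The bank wrap-around can be expressed as a little arithmetic. This Python
sketch assumes <code>pc</code> is the 24-bit address of the instruction that
follows the branch and <code>offset</code> is the signed branch displacement;
the function name is made up for the example.</p>
<pre>
```python
def branch_target(pc, offset):
    """Target of a relative branch; the bank byte never changes."""
    bank = pc // 0x10000
    within_bank = (pc % 0x10000 + offset) % 0x10000
    return bank * 0x10000 + within_bank


print(f"${branch_target(0x01FFFE, 4):06x}")   # $010002, not $020002
print(f"${branch_target(0x020001, -3):06x}")  # $02fffe, not $01fffe
```
</pre>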
<p>The 6502 has an 8-bit processor status register ("P") with a bunch of flags
in it. Some of the flags determine whether a conditional branch is taken
or not, which is important because some branches appear to be conditional
but actually are always or never taken in practice. The disassembler needs
to be able to figure this out so that it doesn't try to disassemble the
bytes that follow an always-taken branch.
A more significant concern is the M and X flags found on the 65802/65816,
which determine the width of the registers and of immediate load
instructions. If you don't know what state the flags are in, you can't
know whether <code>LDA #value</code> is two bytes or three, and the
disassembly of the instruction stream will come out wrong.</p>
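<p>To make the M-flag dependency concrete, here is a minimal Python sketch.
It assumes the 65816 is in native mode and models the flag as 1 (8-bit
accumulator), 0 (16-bit accumulator), or <code>None</code> (unknown); the
function name is hypothetical.</p>
<pre>
```python
def lda_immediate_length(m_flag):
    """Byte length of LDA #value on a 65816 in native mode."""
    if m_flag is None:
        raise ValueError("accumulator width unknown")
    return 2 if m_flag == 1 else 3


# The same three bytes read two ways:
code = bytes([0xA9, 0x00, 0x60])
# M=1: LDA #$00 (2 bytes), then RTS at offset 2.
# M=0: LDA #$6000 (3 bytes); the RTS has become part of the operand.
print(lda_immediate_length(1), lda_immediate_length(0))  # 2 3
```
</pre>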
<p>Some addresses correspond to memory-mapped I/O, rather than RAM or ROM.
Accessing the address can have side effects, like changing between text
and graphics modes. Sometimes reading and writing have different effects.
For example, on later models of the Apple II, reading from
$C000 returns the most recently hit key, while writing to $C000 changes
how 80-column display memory is mapped.</p>
<p>On a few systems, such as the Atari 2600, RAM, ROM, and registers can
appear at multiple locations, "mirrored" across the address space.</p>
<h3><a name="charenc">Character Encoding</a></h3>
<p>The American Standard Code for Information Interchange (ASCII) was
developed in the 1960s, and became widely used as the means for representing
text data on a computer. It's compatible with Unicode, in that the
binary representation of an ASCII string is exactly the same when
expressed as a Unicode string with UTF-8 encoding.</p>
<p>Not all 6502-based computers used ASCII, notably those from Commodore
International (e.g. PET, VIC-20, 64, 128), which used variants
collectively known as "PETSCII". PETSCII had most of the same symbols,
but rearranged them, and added a number of graphical symbols. This was
further complicated by the use of two different character sets, one of
which dropped lower-case letters in favor of additional symbols, and
the use of a separate encoding for characters stored in the text frame
buffer ("screen codes").</p>
<p>Apple II computers were based on ASCII, but tended to store bytes
with the high bit set rather than clear. This is known as "high ASCII".</p>
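<p>Converting between the two forms only involves the top bit of each byte.
A small Python sketch, with helper names invented for the example:</p>
<pre>
```python
def to_high_ascii(text):
    """Encode text as ASCII with the high bit of every byte set."""
    return bytes(b | 0x80 for b in text.encode("ascii"))


def from_high_ascii(data):
    """Decode high ASCII by keeping only the low seven bits of each byte."""
    return bytes(b % 0x80 for b in data).decode("ascii")


print(to_high_ascii("HI").hex())         # c8c9
print(from_high_ascii(b"\xc8\xc9"))      # HI
```
</pre>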
<p>SourceGen allows you to specify that a string is encoded with ASCII,
High ASCII, C64 PETSCII, or C64 Screen Codes. Because the goal is to
generate assembly sources for cross-assemblers, the C64 character
support is limited to the set that overlaps with ASCII.</p>
<p>For the most part only printable characters are accepted in strings,
but certain control characters are also allowed. The characters for
bell ($07), linefeed ($0a), and carriage return ($0d) are recognized as
string data, and in C64 PETSCII a number of text color and formatting
control codes are also allowed.</p>
<h3><a name="sgconcepts">SourceGen Concepts</a></h3>
<p>As you work on a disassembled file, formatting operands and adding
comments, everything you do is saved in the project file as "metadata".
None of the data from the file being disassembled is included. This
should allow project files to be shared without violating the copyright
of the work being disassembled. (This will vary by region. Also, note
that the mere act of disassembling a piece of software may be illegal in
some cases.)</p>
<p>To avoid mix-ups where the wrong data file is used, the file's length
and CRC are stored in the project file. SourceGen will refuse to open a
project if the data file's length and CRC don't match.</p>
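<p>A check of this kind could look like the Python sketch below. The manual
doesn't say which CRC variant SourceGen uses, so the CRC-32 from
<code>zlib</code> is an assumption here, as is the function name.</p>
<pre>
```python
import zlib


def matches_project(data, expected_length, expected_crc):
    """True if data has the length and CRC-32 that were recorded earlier."""
    return len(data) == expected_length and zlib.crc32(data) == expected_crc


original = bytes([0xA9, 0x00, 0x60])
saved_length, saved_crc = len(original), zlib.crc32(original)

print(matches_project(original, saved_length, saved_crc))            # True
print(matches_project(original + b"\x00", saved_length, saved_crc))  # False
```
</pre>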
<p>Most of the data in the project file is associated with a file offset.
When you create a comment, you aren't associating it with line 53, you're
associating it with the 127th byte in the file. This ensures that, as the
project evolves, the comment you wrote is always connected to the
same instruction or data item. This also means you can't have two
comments on the same line -- each offset only has room for one. By
convention, file offsets are always shown as a six-digit hexadecimal value
with a leading '+', e.g. "+0012ab". This makes it easy to distinguish
between an address and an offset.</p>
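<p>The two notations are simple to produce. A Python sketch of the display
conventions described in this manual (the helper names are made up, and the
address form shown is the 4-digit one used for 8-bit CPUs):</p>
<pre>
```python
def format_offset(offset):
    """File offset: six hex digits with a leading '+'."""
    return f"+{offset:06x}"


def format_address(addr):
    """16-bit address: four hex digits with a leading '$'."""
    return f"${addr:04x}"


print(format_offset(0x12AB), format_address(0x12AB))  # +0012ab $12ab
```
</pre>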
<p>Instruction and data operands can be formatted in various ways. The
formatting choice is associated with the first offset of the item. For
instructions the number of bytes in the operand is determined by the opcode
(and, on the 65816, the M/X status flags). For data items the length
can be a single byte or an entire file. Operand formats are not allowed
to overlap.</p>
<p>When an instruction or data operand references an address, we call
it a <b>numeric reference</b>. When the target address has a label, and
the operand uses that symbol, we call that a <b>symbolic reference</b>.
SourceGen tries to establish symbolic references whenever possible,
so that the generated assembly source doesn't refer to hard-coded
locations within the program. Labels are generated automatically for
the targets of numeric references.</p>
<p>As your understanding of the disassembled code develops, you will want
to add comments explaining it. SourceGen projects have three kinds of
comments:</p>
<ol>
<li>End-of-line comments. As the name implies, these appear at the
end of a line, to the right of the opcode or operand.</li>
<li>Long comments, also known as multi-line comments. These get a
line all to themselves, and may span multiple lines.</li>
<li>Notes. Like long comments, these get a line to themselves. Unlike
long comments, these do not appear in generated assembly code. They
are a way for you to leave notes to yourself, perhaps "don't forget
to figure this out" or "this is the cool part".</li>
</ol>
<p>Every file offset can have one of each.</p>
<p>Labels and comments may disappear if you associate them with a file
offset that is in the middle of a multi-byte instruction or data item.
For example, suppose you put a long comment at offset +000010, and then
mark a 50-byte region starting at offset +000008 as an ASCII string. The
comment won't be deleted, but won't be displayed either. The same thing
can happen to labels. SourceGen will try to prevent this from happening
by splitting formatted data into sub-regions at label boundaries.</p>
<h2><a name="sgintro">How SourceGen Works</a></h2>
<p>SourceGen employs a partial emulation technique that traces the flow
of execution through the program. Most of what a given instruction does
isn't important; only its effect on the flow of execution matters. This
makes SourceGen different from most other disassemblers, because instead
of assuming everything is code and expecting the user to separate out the
data, it assumes everything is data and asks the user to identify where the
code starts executing.</p>
<p>SourceGen uses "code start points" to tag places where execution may
begin. By default, the first byte of the file is marked as a start point.
From there, the tracing process walks through the code, pursuing all
branches. In many cases, if you tag all external entry points, SourceGen
will automatically find all executable code and separate it from variable
storage and data areas.</p>
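<p>The tracing idea can be sketched as a worklist algorithm. The Python below
is a toy, not SourceGen's analyzer: it knows only five real 6502 opcodes,
ignores status flags, and the function and table names are invented for the
example.</p>
<pre>
```python
# opcode: (length, kind) for a tiny subset of the 6502 instruction set.
OPS = {
    0xA9: (2, "next"),    # LDA #imm: falls through to the next instruction
    0x20: (3, "call"),    # JSR abs: follow the target, then fall through
    0x4C: (3, "jump"),    # JMP abs: follow the target only
    0xD0: (2, "branch"),  # BNE rel: follow both paths
    0x60: (1, "stop"),    # RTS: this path ends
}


def trace(data, base, entry_offsets):
    """Return the set of file offsets at which an instruction starts."""
    code = set()
    work = list(entry_offsets)
    while work:
        pos = work.pop()
        if pos in code or pos not in range(len(data)):
            continue
        if data[pos] not in OPS:
            continue  # opcode outside the toy table: abandon this path
        length, kind = OPS[data[pos]]
        if pos + length > len(data):
            continue  # instruction runs off the end of the file
        code.add(pos)
        if kind in ("call", "jump"):
            work.append(data[pos + 1] + 256 * data[pos + 2] - base)
        elif kind == "branch":
            rel = data[pos + 1]
            work.append(pos + 2 + (rel - 256 if rel >= 128 else rel))
        if kind in ("next", "call", "branch"):
            work.append(pos + length)
    return code


# Loaded at $1000: JMP $1005 / "HI" / LDA #$00 / RTS
blob = bytes([0x4C, 0x05, 0x10, 0x48, 0x49, 0xA9, 0x00, 0x60])
print(sorted(trace(blob, 0x1000, [0])))  # [0, 5, 7]
```
</pre>
<p>Offsets 3 and 4 are never reached, so they stay classified as data even
though both bytes are valid 6502 opcodes.</p>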
<p>As noted earlier, tracking the processor status flags can make the
analysis more accurate. Identifying situations where a branch instruction
is always or never taken avoids mis-categorizing a data region as code.
On the 65816, it's absolutely crucial to track the M/X flags, since those
affect the width of instructions. SourceGen tracks the value of the
processor flags at every instruction, blending sets of flags together when
multiple paths of execution converge.</p>
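<p>One simple way to picture the blending step is to give each flag three
possible states: known 0, known 1, or unknown. The Python sketch below uses
<code>None</code> for unknown; this representation and the
<code>blend</code> name are assumptions made for illustration, not a
description of SourceGen's internals.</p>
<pre>
```python
def blend(a, b):
    """Merge two flag states; a flag that differs becomes unknown (None)."""
    return {name: a[name] if a[name] == b[name] else None for name in a}


path1 = {"M": 1, "X": 1, "C": 0}
path2 = {"M": 1, "X": 0, "C": None}
print(blend(path1, path2))  # {'M': 1, 'X': None, 'C': None}
```
</pre>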
<p>Once instructions and data have been separated, the instruction operands
can be examined. Branches, loads, and stores that reference an address
that falls inside the address space covered by the file can be replaced
with a symbol. Operands that refer to addresses outside the file, such
as ROM or operating system routines, can be replaced with a symbol defined
by an equate directive.</p>
<p>(For more details on how this works, see the
<a href="analysis.html">analysis appendix</a>.)</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+18
@@ -0,0 +1,18 @@
/*
* Overall look and feel.
*/
body {
font-family: Arial, Helvetica, sans-serif;
padding: 0px;
margin: 0px;
}
#content {
/* top right bottom left */
margin: 20px 10px 10px 10px;
/*position: relative;*/
}
#footer {
/* top right bottom left */
margin: 20px 10px 10px 10px;
/*position: relative;*/
}
+615
@@ -0,0 +1,615 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Using SourceGen - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Using SourceGen</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="starting-new">Starting a New Project</a></h2>
<p>Select File &gt; New, or if no project is open, click "Start new project".
This opens the Create New Project window.</p>
<p>Start by selecting your target system from the tree on the left.
The panel on the right will show the CPU that will be selected, as well
as the symbol files and extension scripts that will be loaded by default.
All of these may be overridden later from the project properties.
(If the description in the panel on the right says "[placeholder]", it
means that the system doesn't yet have a set of symbols defined for it.)</p>
<p>Next, click the "Select File..." button. Pick the file you wish to
disassemble. The dialog will update with the pathname and some notes
about the file's size. Click "OK" if all looks good to create the
project.</p>
<p><strong>NOTE:</strong> Support for very large 65816 programs is
incomplete. The maximum size for a data file is limited to 1 MiB.</p>
<p>The first time you save the project (with File &gt; Save), you will be
prompted for the project name. It's best to use the data file's name
with ".dis65" added, so this will be set as the default. The data
file's name is not stored in the project file, so if you pick a different
name, or save the project in a different directory, you will have to
select the data file manually whenever you open the project.</p>
<h2><a name="opening">Opening an Existing Project</a></h2>
<p>Select File &gt; Open, or if no project is open, click "Open
existing project". Select the .dis65 project file from the standard
file dialog.</p>
<p>SourceGen will try to open a data file with the project's name,
minus the ".dis65". If it can't find a file with that name, or if there's
something wrong with it (e.g. the CRC doesn't match), you will be given
the opportunity to specify the location of the data file to use.</p>
<p>If non-fatal problems with the file are detected, a warning will be
shown. If it's something simple, like a missing .sym65 or extension
script file, you'll be notified. If it's something more complicated,
e.g. the project has a comment on an offset that doesn't exist, you
will be warned that the problematic data has been deleted, and will be
lost if the project is saved. By default, such a project will be opened
in read-only mode, though you can override this in the dialog. You will
also be given the opportunity to simply cancel loading the project.</p>
<p>The locations of the last few projects you've worked with are saved
in the application settings. You can access them from
File &gt; Recent Projects. If no project is open, links to the two
most-recently-opened projects will be available.</p>
<h2><a name="working">Working With a Project</a></h2>
<p>The main project window is divided into five areas:</p>
<ol>
<li>Center: the code list. If no project is open, this will instead
have buttons to open a new or existing project.</li>
<li>Top left: cross-reference list.</li>
<li>Bottom left: notes list.</li>
<li>Top right: symbols list.</li>
<li>Bottom right: info on selected line.</li>
</ol>
<p>Most actions are performed in the center code list. All of the
sub-windows can be resized. The window sizes and column widths are
saved in the application settings file.</p>
<p>A toolbar near the top of the screen has some shortcut buttons.
If you hover your mouse over them, a tooltip with an explanation will
appear.</p>
<h3><a name="code-list">Code List</a></h3>
<p>The code list provides a view of the code being disassembled. Each
line may be an instruction, data item, long comment, note, or
assembler directive.</p>
<p>The list is divided into columns:</p>
<ul>
<li><b>Offset</b>. The offset within the file where the instruction
or data item starts. Throughout the UI, file offsets are shown as
six-digit hex values with a leading '+'.</li>
<li><b>Address</b>. The address where the assembled code will execute.
For 8-bit CPUs this is shown as a 4-digit hex number; for 16-bit
CPUs the bank is shown as well. Double-click on this field to open the
<a href="editors.html#address">Edit Address</a> dialog.</li>
<li><b>Bytes</b>. Shows up to four bytes from the data file that
correspond to the instruction or data. To see the full dump of
a longer item, such as an ASCII string, double-click on the field
to open the
<a href="tools.html#hexdump">Hex Dump Viewer</a>. This is
a floating window, so you can keep it open while you work.
Double-clicking in the bytes column while the window is open will
update the viewer's position and selection.</li>
<li><b>Flags</b>. This shows the state of the status flags as they
are before the instruction is executed. Double-click on this
field to open the
<a href="editors.html#flags">Edit Status Flag Override</a> dialog.</li>
<li><b>Attributes</b>. Some instructions and data items have
interesting attributes.
'@' indicates an entry point,
'T' means one or more bytes have an analyzer tag (code start/stop/skip),
'#' means execution will not continue to the following instruction,
'>' is shown for branch targets, and
'!' appears when a conditional branch is never taken.
(This column is rarely useful and can be hidden.)</li>
<li><b>Label</b>. If a label has been defined for this offset, by
the user or generated automatically, it will appear here. Also,
full-line items like long comments and notes will start in this
field. Double-click on this field to open the
<a href="editors.html#label">Edit Label</a> dialog.</li>
<li><b>Opcode</b>. The instruction or pseudo-opcode mnemonic.
If an instruction is embedded inside this one, a &#x25bc; symbol
will appear.
If you double-click this field for an instruction or data item
whose operand refers to an address in the file, the selection will
jump to that location. If the operand is a local variable, the
selection will jump to the point where the variable was defined.</li>
<li><b>Operand</b>. The instruction or data operand. Data operands
may span a large number of bytes. Double-click on this field to
open the
<a href="editors.html#instruction-operand">Edit Instruction Operand</a>
or <a href="editors.html#data-operand">Edit Data Operand</a> dialog, as
appropriate. (Note you can shift-double-click on data items to
edit multiple lines.)</li>
<li><b>Comment</b>. End-of-line comment, generally shown with a ';'
prefix. If enabled, cycle counts will appear here. Double-click
on this field to open the
<a href="editors.html#comment">Edit Comment</a> dialog.</li>
</ul>
<p>Double-clicking anywhere on a line with a note or long comment will
open the
<a href="editors.html#note">Edit Note</a> or
<a href="editors.html#long-comment">Edit Long Comment</a> dialog,
respectively.</p>
<p>The code list is a standard Windows list view. You can left-click
to select an item, ctrl-left-click to toggle individual items on and
off, and shift-left-click to select a range. You can select all lines
with Edit &gt; Select All. Resize columns by
left-clicking on the divider in the header and dragging it.</p>
<p>Selecting any part of a multi-line item, such as a long comment
or character string, effectively selects the entire item.</p>
<p>Right-clicking opens a menu. The contents are the same as those in
the Actions menu item in the menu bar. The set of options that are
enabled will depend on what you have selected in the main window.</p>
<ul>
<li><a href="editors.html#address">Set Address</a>. Sets the
target address at that offset. When multiple lines are selected,
the target addresses at the start and end of the range are set.
Enabled when the first line selected is code, data, or an address
override, and the full selected range does not overlap with another
address override.</li>
<li><a href="editors.html#flags">Override Status Flags</a>. Changes
the status flags at that offset. Enabled when a single instruction
line is selected.</li>
<li><a href="editors.html#label">Edit Label</a>. Sets the label
at that offset. Enabled when a single instruction or data line is
selected.</li>
<li><a href="editors.html#instruction-operand">Edit Operand</a>. Opens the
Edit Instruction Operand or Edit Data Operand window, depending on
what's selected.
Enabled when a single instruction line is selected, or when one
or more data lines are selected.</li>
<li><a href="editors.html#comment">Edit Comment</a>. Sets the
comment at that offset. Enabled when a single instruction or data
line is selected.</li>
<li><a href="editors.html#long-comment">Edit Long Comment</a>. Sets
the long comment at that offset. Enabled when a single instruction
or data line, or an existing long comment, is selected.</li>
<li><a href="editors.html#note">Edit Note</a>. Sets the note at
that offset. Enabled when a single instruction or data line, or
an existing note, is selected.</li>
<li><a href="editors.html#project-symbol">Edit Project Symbol</a>.
Sets the name, value, and comment of the project symbol. Enabled
when a single equate directive, generated from a project symbol, is
selected.</li>
<li><a href="editors.html#lvtable">Create Local Variable Table</a>.
Create a new local variable table.</li>
<li><a href="editors.html#lvtable">Edit Prior Local Variable Table</a>.
Modify or delete entries in the most recently defined local
variable table.</li>
<li><a href="visualization.html#vis-and-sets">Create/Edit Visualization Set</a>.
Create a new visualization set or edit an existing set.</li>
<li><a href="#atags">Analyzer Tags</a> (Tag Address As Code Start Point,
Tag Address As Code Stop Point, Tag Bytes As Inline Data,
Remove Analyzer Tags).
Enabled when one or more code and data lines are selected. Remove
Analyzer Tags is only enabled when at least one line has tags. The
keyboard shortcuts are two-key combinations.</li>
<li><a href="#address-table">Format Address Table</a>. Formats
a series of bytes as parts of a table of addresses.</li>
<li><a href="#toggle-single">Toggle Single-Byte Format</a>. Toggles
a range of lines between default format and single-byte format. Enabled
when one or more data lines are selected.</li>
<li><a href="#format-as-word">Format As Word</a>. Formats two bytes as
a 16-bit little-endian word.</li>
<li>Delete Note / Long Comment. Deletes the selected note or long
comment. Enabled when a single note or long comment is selected.</li>
<li><a href="tools.html#hexdump">Show Hex Dump</a>. Opens the hex dump
viewer, with the current selection highlighted. Always enabled. If
nothing is selected, the viewer will open at the top of the file.</li>
</ul>
<h3><a name="undo">Undo &amp; Redo</a></h3>
<p>You can undo a change with Edit &gt; Undo, or Ctrl+Z. You can redo a
change with Edit &gt; Redo, Ctrl+Y, or Ctrl+Shift+Z.</p>
<p>All changes to the project, including changes to the project properties,
are added to the undo/redo buffer. This has no fixed size limit, so no
matter how much you change, you can always undo back to the point where
the project was opened.</p>
<p>The undo history is not saved as part of the project. Closing a project
clears it.</p>
<h3><a name="references">References Window</a></h3>
<p>When a single instruction or data line is selected in the main window,
all references to that offset will be shown in the References window.
For each reference, the file offset, address, and some details about the
type of reference will be shown.</p>
<p>The reference type indicates whether the origin is an instruction or
data operand, and provides an indication of the nature of the reference:</p>
<ul>
<li>call - subroutine call
(e.g. <code>JSR addr</code>, <code>JSL addr</code>)</li>
<li>branch - conditional or unconditional branch
(e.g. <code>JMP addr</code>, <code>BCC addr</code>)</li>
<li>read - read from memory
(e.g. <code>LDA addr</code>, <code>BIT addr</code>)</li>
<li>write - write to memory
(e.g. <code>STA addr</code>)</li>
<li>rmw - read-modify-write
(e.g. <code>LSR addr</code>, <code>TSB addr</code>)</li>
<li>ref - reference to address by instruction
(e.g. <code>LDA #&lt;addr</code>, <code>PEA addr</code>)</li>
<li>data - reference to address by data
(e.g. <code>.DD2 addr</code>)</li>
</ul>
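<p>As a rough sketch, the hypothetical fragment below (labels and values
are invented for illustration) is annotated with the reference type that
each line would be expected to contribute. Selecting <code>sub</code>
would list the call and data references, selecting <code>flag</code> would
list the read, write, and rmw references, and so on. The "ref" entry
assumes the immediate operand has been explicitly formatted as a symbol.</p>
<pre>
         JSR   sub          ;call
         BCC   done         ;branch
         LDA   flag         ;read
         STA   flag         ;write
         LSR   flag         ;rmw
         LDA   #&lt;table      ;ref
done     RTS
sub      RTS
flag     .DD1  $00
table    .DD2  sub          ;data
</pre>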
<p>References from instructions that use indexed addressing
(e.g. <code>LDA addr,Y</code>) will also show "idx" to indicate that
the instruction is using the location as a base address.</p>
<p>References from instructions that treat the address as a pointer
(e.g. <code>LDA (dp),Y</code>) will show "ptr". This makes it easy
to identify the locations that are reading or writing through the
pointer from those that are reading or writing the pointer itself.</p>
<p>The reference type will be prefixed with "Sym" or "Oth" to indicate
whether or not the reference used the label at the current address. To
understand this, consider that addresses can be referenced in different ways.
For example:</p>
<pre>
LDA DATA0
LDX DATA0+1
RTS
DATA0 .DD1 $80
DATA1 .DD2 $90
</pre>
<p>Both <code>DATA0</code> and <code>DATA1</code> are accessed, but
both operands used <code>DATA0</code>. When the <code>DATA0</code> line
is selected in the code list, the references window will show the
<code>LDA</code> and <code>LDX</code> instructions, because both
instructions referenced it. When <code>DATA1</code> is selected, the
references window will show the <code>LDX</code>, because that
instruction accessed <code>DATA1</code>'s location even though it didn't
use the symbol. To make the difference clear, the lines in the references
window will either show "Sym" (to indicate that the symbol at the selected
line was referenced) or "Oth" (to indicate that some other symbol, or no
symbol, was used).</p>
<p>When an equate directive (generated for platform and project
symbols) or local variable assignment is selected, the References
window will show all references to that symbol. Unlike in-file
references, only the uses of that symbol are shown. So if you have
both a project symbol and a local variable for address $30, they
will show disjoint sets of references. Furthermore, if you explicitly
format an instruction operand as hex, e.g. <code>LDA $30</code>, it will
not appear in either set because it's not a symbolic reference.</p>
<p>The cross-reference data is used to generate the set of equate
directives at the top of the listing. If nothing references a platform
or project symbol, an equate directive will not be generated for it.</p>
<p>Double-clicking on a reference moves the code list selection to that
reference, and adds the previous selection to the navigation stack.</p>
<h3><a name="notes">Notes Window</a></h3>
<p>When you add a note, it will also be added to this window.
Double-clicking on a note will jump directly to it, and add the previous
selection to the navigation stack. This makes notes useful as bookmarks.</p>
<h3><a name="symbols">Symbols Window</a></h3>
<p>All known <a href="intro-details.html#about-symbols">symbols</a> are shown
here. The filter buttons allow you to screen out symbols you're not
interested in, such as platform symbols or constants.</p>
<p>Clicking on one of the column headers will sort the list on that
field. Click a second time to reverse the sort direction.</p>
<p>Double-clicking on an auto or user label will jump to that label, and
add the previous selection to the navigation stack. This can be a handy
way to move around the file, jumping from label to label.</p>
<p>The "type" column uses a two-letter code to identify the symbol's
type and scope. The first letter is one of A (auto), U (user),
P (platform), J (project), R (pre-label), or V (variable).
The second letter is one of N (non-unique local), L (local), G (global),
X (exported), E (external), or C (constant).</p>
<h3><a name="info">Info Window</a></h3>
<p>Some additional information about the currently-selected line is
shown, such as the formatting applied to the operand. If the operand
has a default format, any automatically-generated format will be noted.
For an instruction,
a summary is shown that includes the cycle count, flags affected, and a
brief description of what the instruction does. The latter can be
especially handy for undocumented instructions.</p>
<h3><a name="messages">Messages Window</a></h3>
<p>Sometimes a change will invalidate an earlier change. For example,
suppose you add a code stop point, and format the data that follows
as a string. Later on you change it to a code start point. You now have
a block of executable code with a string format record sitting in the
middle of it. SourceGen tries very hard not to throw away anything
you've done, but it will ignore anything invalid.</p>
<p>If a problem like this is encountered, an entry is added to a list
of messages displayed at the bottom of the main window. Each entry identifies
the nature of the problem, the severity of the problem, the offset where
it occurred, and what was done to resolve it. The problem categories
include:</p>
<ul>
<li>Hidden label: a label placed on code or data is now stuck in the
middle of a multi-byte instruction or data item.</li>
<li>Unresolved weak ref: a reference to a non-existent symbol was found.</li>
<li>Invalid offset or length: the offset or length in a format object
had an invalid value.</li>
<li>Invalid descriptor: the format descriptor is inappropriate,
e.g. formatting an instruction as a string.</li>
</ul>
<p>The "context" column will provide additional detail about the problem,
and the "resolution" column will indicate how it's being handled. In most
cases, the offending item will be ignored.</p>
<p>Double-clicking on an entry will jump to that offset.</p>
<p>The message list will not appear if there are no messages. You can
hide the list by clicking on the "Hide" button to the left of the messages.
Un-hide the list by clicking on the "N messages" button at the bottom-right
corner of the application window.</p>
<h3><a name="navigation">Navigation</a></h3>
<p>The simplest way to move through the code list is with the scroll wheel
on your mouse, or by left-clicking and dragging the scroll bar. You
can also use PgUp/PgDn and the arrow keys.</p>
<p>Use Navigate &gt; Find to search for text. This performs a case-insensitive
text search on the label, opcode, operand, and comment fields.
Use Navigate &gt; Find Next to find the next match, and
Navigate &gt; Find Previous to find the previous match. Note that "next" is
always downward, and "previous" is always upward, regardless of the
direction of the initial search chosen in the Find dialog.</p>
<p>Use Navigate &gt; Go To to jump to an offset, address, or label. Remember
that offsets and addresses are always hexadecimal, and offsets start
with a '+'. If you have a label that is also a valid hexadecimal
address, like "FEED", the label takes precedence. To jump to the address,
write "$FEED" instead. If you enter a non-unique label, the selection
will jump to the nearest instance.</p>
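<p>For illustration, here is how a few Go To inputs would be interpreted
under the rules above (the values, and the existence of a label named
<code>FEED</code>, are assumptions made for the example):</p>
<pre>
+1f0     file offset $1f0
1000     address $1000
FEED     the label FEED if one exists, otherwise address $feed
$FEED    address $feed, even if a label named FEED exists
</pre>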
<p>If an instruction or data line has an operand that references an address
in the file, you can navigate to the operand's location with
Navigate &gt; Jump to Operand. You can also do this by double-clicking
in the opcode column.</p>
<p>When you edit something, lines throughout the listing can change. This
is different from a source code editor, where editing a line just changes
that line. To allow you to watch the effects changes have, the undo/redo
commands try to keep the listing in the same position.
If you want to go to the place where the last change (i.e. the change
that will be undone by the next Undo operation) was made,
Navigate &gt; Go to Last Change will jump to the first offset
associated with the most recent change.
If the last change was to the project properties, it will jump to the
first offset in the file.</p>
<p>When you jump around, e.g. by double-clicking on an opcode or an entry
in one of the side windows, the previously-selected line is added to
a navigation stack. You can use Navigate &gt; Nav Forward and
Navigate &gt; Nav Backward to move forward and backward through the
stack. (The curly arrows on the left side of the toolbar may be more
convenient. You can use Alt+Left/Right Arrow, or
Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)</p>
<h3><a name="atags">Adding and Removing Analyzer Tags</a></h3>
<p><i>(Note: These were referred to as code/data "hints" in older
versions of SourceGen.)</i></p>
<p>To set code start or stop points, select the desired offsets and
use Actions &gt; Tag Address As Code Start Point (or Stop Point). Because
these indicate a transition between code and data regions, there is rarely
any need to tag multiple consecutive bytes.
For this reason, only the first byte on each selected line will be tagged.</p>
<p>For inline data, you need to cover the entire range, so every byte in every
selected line is tagged when you select Tag Bytes As Inline Data. Similarly,
the Remove Analyzer Tags menu item will remove tags from every byte.</p>
<p>If you're having a hard time selecting just the right bytes because
the instructions are caught up in a multi-byte data item, such as an
auto-detected character string, you can disable uncategorized data analysis
(the thing that creates the .STR and .FILL ops for you). You can do this
from the
<a href="settings.html#project-properties">project properties</a> editor,
or simply by hitting Ctrl+D. Hit that, tag the byte or bytes, then hit it
again to re-enable the string &amp; fill analyzer.</p>
<p>Another approach is to use the "Toggle Single-Byte Format"
menu item to "flatten" the item.</p>
<h3><a name="address-table">Format Address Table</a></h3>
<p>Tables of addresses are fairly common. Sometimes you'll find them as a
series of 16-bit words, like this:</p>
<pre>
jmptab .dd2 func1
.dd2 func2
.dd2 func3
</pre>
<p>While that's fairly common in 16-bit software, 8-bit software often splits
the high and low bytes into separate arrays, like this:</p>
<pre>
jmptabl .dd1 &lt;func1
.dd1 &lt;func2
.dd1 &lt;func3
jmptabh .dd1 &gt;func1
.dd1 &gt;func2
.dd1 &gt;func3
</pre>
<p>Sometimes the tables contain <code>address - 1</code>, because the
values are to be pushed onto the stack for an RTS call.</p>
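<p>A minimal sketch of that dispatch idiom, using a hypothetical table and
assuming the entry index is in the accumulator:</p>
<pre>
         asl   A            ;two bytes per entry
         tax
         lda   jmptab+1,x   ;push high byte first
         pha
         lda   jmptab,x     ;then low byte
         pha
         rts                ;pulls the address, adds 1, and jumps
jmptab   .dd2  func1-1
         .dd2  func2-1
         .dd2  func3-1
</pre>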
<p>While the .dd2 case is easy to format with the data operand editor,
formatting addresses whose components are split into multiple tables can
be tedious. Even in the easy case, you may want to create labels and set
code start points for each item.</p>
<p>The Address Table Formatter helps you associate symbols with the
addresses in the table. It works for simple and "split" tables.</p>
<p>To use it, start by selecting the entire table. In the examples above,
you would select all 6 bytes. The number of bytes in each part of a
split table must be equal: here, it's 3 low bytes, followed by 3 high
bytes. If the number of bytes selected can't be evenly divided by the
number of parts -- two parts for 16-bit data, three parts for 24-bit data --
the formatter will report an error.</p>
<p>With the data selected, open the format dialog with
Actions &gt; Format Address Table. The rather complicated dialog
is split into sections.</p>
<ul>
<li>Address Characteristics: select whether the table has 16-bit
addresses or 24-bit addresses. (24-bit addresses are disabled if you
don't have the CPU set to 65816.) If the table is split into individual
sub-tables for low bytes and high bytes, check the "Parts are split
across sub-tables" box. If the address parts are being pushed
on the stack for an RTS/RTL, check the "Adjusted for RTS/RTL" box to
adjust them by 1.</li>
<li>Low Byte Source: indicate which part of the table or word holds the
low bytes. For common little-endian words, the low bytes come first. In
the split-table example above, the low bytes came first, followed by the
high bytes, so you would select "first part of selection". If they were
stored the other way around, you would click "second part" instead.</li>
<li>High Byte Source: indicate which part of the table or word holds
the high bytes. For a 16-bit address this will be the part you didn't
pick for the low bytes.
Sometimes, if all addresses land on the same 256-byte page, the high byte
will be a constant in the code, and only the low bytes will be stored in
a table. If that's the case, select "Constant", and enter the high byte
in the text box. (Decimal, hex, and binary are accepted.)</li>
<li>Bank Byte Source: for 24-bit addresses, you can select "Nth part of
selection", which will just use whichever part you didn't specify for
the low and high bytes. If the table holds 16-bit addresses, you can
use the "Constant" field to specify the data bank.</li>
<li>Options: if the table holds the addresses of executable code, check
the "Tag targets as code start points" box. If the target address
hasn't been identified by the code analyzer through some other execution
path, it will be tagged as a code start point.</li>
<li>Generated Addresses: this shows the full list of addresses that are
generated with the current set of parameters. Each address is shown with
a file offset and a symbol. If the address can't be mapped within the
file, the offset is shown as dashes instead. If the address can be
mapped, and it already has a user-specified label, the label will be
shown. If no label was found, the table will show "(+)", indicating
that a permanent label will be added at the target offset. If everything
is set up correctly, and the addresses fall entirely within the program,
you shouldn't see any unknown entries here.</li>
</ul>
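<p>As a worked example with made-up values, suppose the six selected bytes
of the split table are <code>34 50 a0 12 12 12</code>. With 16-bit
addresses, "Parts are split across sub-tables" checked, and the low bytes
taken from the first part, the formatter pairs the bytes up like this:</p>
<pre>
low   high   address
$34   $12    $1234
$50   $12    $1250
$a0   $12    $12a0
</pre>
<p>Because every high byte is the same here, selecting only the three low
bytes and entering $12 as the "Constant" high byte would be expected to
produce the same list. If the table instead held
<code>33 4f 9f 12 12 12</code> and "Adjusted for RTS/RTL" were checked,
each value would be incremented by one, again yielding $1234, $1250,
and $12a0.</p>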
<p>For a 16-bit address, you have three choices: low byte first, high byte
first, or low byte only with a constant high byte. For a 24-bit address
the set of possibilities expands, but is essentially the same: pick the
order in which things appear, using fixed constants if desired.</p>
<p>A message at the top of the screen shows how many bytes are selected.
It also tells you how many groups there are, but unlike the data operand
formatter, the split-address table formatter doesn't care about group
boundaries. For this reason, tables do not have to be contiguous in
memory. The low bytes and high bytes could be on separate 256-byte
pages. You just need to have all of the data selected.</p>
<p>It should be mentioned that SourceGen does not record the fact that the
data in question is part of a table. The formatting, labels, and code
start point tags are applied as if you entered them all individually by
hand. The formatter is just significantly more convenient. It also
does everything as a single undoable action, so if it comes out looking
wrong, just hit "undo" and try something else.</p>
<h3><a name="toggle-single">Toggle Single-Byte Format</a></h3>
<p>The "Toggle Single-Byte Format" feature provides a quick way to
change a range of bytes to single bytes
or back to their default format. It's equivalent to opening the Edit
Data Operand dialog and selecting "Single bytes" displayed as hex, or
selecting "Default".</p>
<p>This can be handy if the default format for a range of bytes is a
string, but you want to see it as bytes or set a label in the middle.</p>
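<p>For example, a hypothetical five-byte ASCII string that the data
analyzer has formatted as:</p>
<pre>
msg      .str  "Hello"
</pre>
<p>would be expected to look something like this after toggling to
single-byte format, at which point a label can be set on any of the
bytes:</p>
<pre>
msg      .dd1  $48
         .dd1  $65
         .dd1  $6c
         .dd1  $6c
         .dd1  $6f
</pre>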
<h3><a name="format-as-word">Format As Word</a></h3>
<p>This is a quick way to format pairs of bytes as 16-bit words. It's
equivalent to opening the Edit Data Operand dialog and selecting
"16-bit words, little-endian", displayed as hex.</p>
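<p>As a small illustration with invented values, two selected single-byte
lines such as:</p>
<pre>
         .dd1  $34
         .dd1  $12
</pre>
<p>would become one little-endian word:</p>
<pre>
         .dd2  $1234
</pre>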
<p>To avoid some confusing situations, it only works on sets of
single-byte values. This means, for example, that you can't select a
10-byte string and have it turn into five 16-bit words. You can select as
many bytes as you want, but they must come in pairs. (Remember that you
can turn off auto-generation of strings and .FILLs with
<a href="#toggle-data">Toggle Data Scan</a>.)</p>
<p>As a special case, if you select a single byte, the following byte will
also be selected. This won't work if the following byte is part of a
multi-byte data item, is the start of a new region (see
<a href="editors.html#data-operand">Edit Data Operand</a> for a definition of
what splits a region), or is the last byte in the file.</p>
<h3><a name="toggle-data">Toggle Data Scan</a></h3>
<p>This menu item is in the Edit menu, and acts as a shortcut to opening
the Project Properties editor, and clicking on the "Analyze Uncategorized
Data" checkbox. When enabled, SourceGen will look for character strings and
regions of identical bytes, and generate .STR and .FILL directives. When
disabled, uncategorized data is presented as one byte per line, which can
be handy if you're trying to get at a byte in the middle of a string.</p>
<p>As with all other project property changes, this is an undoable
action.</p>
<h3><a name="clipboard">Copying to Clipboard</a></h3>
<p>When you use Edit &gt; Copy, all lines selected in the code list are
copied to the system clipboard. This can be a convenient way to post
code snippets into forum postings or documentation. The text is
copied from the data shown on screen, so your chosen capitalization
and pseudo-ops will appear in the copy.</p>
<p>Long comments are included, but notes are not.</p>
<p>By default, only the label, opcode, operand, and comment fields are
included. From the
<a href="settings.html#app-settings">app settings</a> dialog you can select
alternative formats that include additional columns.</p>
<p>A copy of all of the fields is also written to the clipboard in CSV
format. If you have a spreadsheet like Excel, you can use Paste Special
to put the data into individual cells.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+393
@@ -0,0 +1,393 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Properties &amp; Settings - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Properties &amp; Settings</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="overview">Settings Overview</a></h2>
<p>There are two kinds of settings: application settings, and
project properties.</p>
<h2><a name="app-settings">Application Settings</a></h2>
<p>Application settings are stored in a file called "SourceGen-settings"
in the SourceGen installation directory. If the file is missing or
corrupted, default settings will be used. These settings are local
to your system, and include everything from window sizes to whether or not
you prefer hexadecimal values to be shown in upper case. None of them
affect the way the project analyzes code and data, though they may affect
the way generated assembly sources look.</p>
<p>The settings editor is divided into five tabs. Changes don't take
effect until you hit Apply or OK.</p>
<h3><a name="appset-codeview">Code View</a></h3>
<p>These settings change the way the code looks on screen.</p>
<p>Click the Column Visibility buttons to hide columns. Click them
again to restore the column to a width appropriate for the current font.
A "hidden" column just has a width of zero, so with careful mouse
positioning you can show and hide columns by dragging the column headers.
The buttons may be more convenient though.</p>
<p>You can select a different font for the code list, and make it as large
or as small as you want. Mono-space fonts like Courier or Consolas are
recommended (and will be the only ones shown).</p>
<p>You can choose to show different parts of the listing in upper or
lower case, using the "all lower" and "all upper" buttons as a quick way
to set all values. These settings are also used for generated assembly
code, unless the assembler has specific case-sensitivity requirements. There
is no setting for labels, which are always case-sensitive.</p>
<p>The Clipboard drop-down list lets you choose the format for text
<a href="mainwin.html#clipboard">copied to the clipboard</a>. The
"Assembler Source" format includes the rightmost columns (label,
opcode, operand, and comment), like assembly source code does. The
"Disassembly" format adds the address and bytes on the left. Use
the "All Columns" format to get all columns.</p>
<p>When "show cycle counts for instructions" is checked, every instruction
line will have an end-of-line comment that indicates the number of cycles
required for that instruction. If the cycle count can't be determined
solely from a static analysis, e.g. an extra cycle is required if
<code>LDA (dp),Y</code> crosses a page boundary, a '+' will be shown.
In some cases the variability can be factored out if the state of
certain status flags is known, e.g. 65C02 instructions that take longer
in decimal mode won't be shown as variable if the analyzer can determine
that D=0 or D=1. This checkbox enables display in the on-screen list, but
does not affect generated source code, which can be configured independently
on the Asm Config tab.</p>
<p>Check "use 'dark' color scheme" to change the main disassembly list
to use white text on a black background, and mute the Note highlight
colors.
(Most of the GUI uses standard Windows controls that take their colors
from the system theme, but the disassembly list uses a custom style. You
can change the rest of the UI from the Windows display "personalization"
controls.)</p>
<h3><a name="appset-textdelim">Text Delimiters</a></h3>
<p>Character and string operands are shown surrounded by quotes, e.g.
<code>LDA #'*'</code> or <code>.STR "Hello, world!"</code>. It's
handy to be able to tell at a glance how characters are encoded, so
SourceGen allows you to set the delimiters independently for every
supported character encoding.</p>
<p>String operands may contain a mixture of text and hexadecimal values.
For example, in ASCII data, the control characters for linefeed and
carriage return ($0a and $0d) are considered part of the string, but
don't have a printable symbol. (Unicode defines some glyphs, but they
don't look very good at smaller font sizes.)</p>
<p>If one of the delimiter characters appears in the string itself,
the character will be output as hex to avoid confusion. For this
reason, it's generally wise to use delimiter characters that aren't
part of the ASCII character set. The "Sample Characters" box holds some
characters that you can copy and paste (with Ctrl+C / Ctrl+V) into the
delimiter fields.</p>
<p>For character operands, the prefix and suffix are added to the start
and end of the operand. For string operands, the prefix is added to the
start of the first line, and suffixes aren't allowed.</p>
<p>These options change the way the code list looks on screen. They
do not affect generated code, which must use the delimiter characters
specified by the chosen assembler.</p>
<h3><a name="appset-displayformat">Display Format</a></h3>
<p>These options change the way the code list looks on screen. They
do not affect generated code.</p>
<p>The
<a href="intro-details.html#width-disambiguation">operand width disambiguator</a>
strings are used when the width of an instruction operand is unclear.
You may specify values for all of them or none of them.</p>
<p>Different assemblers have different ways of forming expressions.
Sometimes the rules allow expressions to be written simply, other times
explicit grouping with parentheses is required. Select whichever style
you are most comfortable with.</p>
<p>Non-unique labels are identified with a prefix character, typically
'@' or ':'. The default is '@', but you can configure it to any character
that isn't valid for the start of a label. (64tass uses '_' for locals,
but that's a valid label start character, and so isn't allowed here.)
The setting affects label editing as well as display.</p>
<p>If you would like your local variables to be shown with a prefix
character, you can set it in the "local variable prefix" box.</p>
<p>The "quick set" pop-up configures the fields on the left side of the
tab to match the conventions of the specified assembler. Select your
preferred assembler in the combo box to set the fields. The setting
automatically switches to "custom" when you edit a field.
(64tass and ACME use the "common"
expression style, cc65 and Merlin 32 have their own unique styles.)</p>
<p>The "add spaces in Bytes column" checkbox changes the format of the
hex data in the code list "bytes" column from dense (<code>20edfd</code>)
to spaced (<code>20 ed fd</code>). This also affects the format of
clipboard copies and exports.</p>
<p>The "comma-separated format for bulk data" checkbox determines whether large
blocks of hex look like <code>ABC123</code> or
<code>$AB,$C1,$23</code>. The former reduces the number of lines
required, the latter is more readable.</p>
<p>Long operands, such as strings and bulk data, are wrapped to a new
line after a certain number of characters. Use the pop-up to configure
the value. Larger values can make the code easier to read, but smaller
values allow you to shrink the width of the operand column in the
on-screen listing, moving the comment field closer in.</p>
<h3><a name="appset-pseudoop">Pseudo-Op</a></h3>
<p>These options change the way the code list looks on screen. Assembler
directives and data pseudo-opcodes will use these values. This does
not affect generated source code, which always matches the conventions
of the target assembler.</p>
<p>Enter the string you want to use for the various data formats. If
a field is left blank, a default value is used.</p>
<p>The "quick set" pop-up configures the fields on this tab to match
the conventions of the specified assembler. Select your preferred assembler
in the combo box to set the fields. The setting automatically switches to
"custom" when you edit a field.</p>
<h3><a name="appset-asmconfig">Asm Config</a></h3>
<p>These settings configure cross-assemblers and modify assembly source
generation in various ways.</p>
<p>To configure an assembler, select it in the pop-up menu. The fields
will initially contain assembler-specific default values. All of
the values in the Assembler Configuration box may be configured
differently for each assembler.</p>
<p>The "executable" box holds the full path to the cross-assembler
executable.</p>
<ul>
<li>64tass: <code>64tass.exe</code></li>
<li>ACME: <code>acme.exe</code></li>
<li>cc65: <code>bin/cl65.exe</code> -- full installation required,
with all configuration files and libraries</li>
<li>Merlin 32: <code>Merlin32.exe</code></li>
</ul>
<p>The "column widths" section allows you to specify the minimum
width of the label, opcode, operand, and comment fields. If the width
is less than 1, or isn't a valid number, 1 will be used. These are
not hard stops: if the contents of a field are too wide, the contents
of the next column will be pushed over. (The comment field width is
not currently being used, but may be used to fold lines in the future.)</p>
<p>When "show cycle counts in comments" is checked, cycle counts are
inserted into end-of-line comments. This works the same as the option
in the Code View tab, but applies to generated source code rather than
the on-screen display.</p>
<p>If "put long labels on separate line" is checked, labels that are
longer than the label column are placed on their own line. This looks
a bit nicer because otherwise the opcode gets pushed out of alignment.
(Some assemblers get bent out of shape if you split an equate
directive, so those might stay on one line.)</p>
<p>If you enable "identify assembler in output", a comment will be
added to the top of the generated assembly output that identifies the
target assembler and version. It also shows the command-line options
passed to the assembler. This can be very helpful if the source
file is sent to other people, since it may not otherwise be obvious from
the source file what the intended target assembler is, or what options
are required to process the file correctly.</p>
<h2><a name="project-properties">Project Properties</a></h2>
<p>Project properties are stored in the .dis65 project file.
They specify which CPU to use, which extension scripts to load, and a
variety of other things that directly impact how SourceGen processes
the project. Because of the potential impact, all changes to
the project properties are made through the undo/redo buffer,
which means you hit "undo" to revert a property change.</p>
<p>The properties editor is divided into four tabs. Changes aren't pushed
out to the main application until you close the dialog. Clicking Apply
will capture the current changes, ensuring that they're applied even if
you later hit Cancel, but the changes are not applied immediately.</p>
<h3><a name="projprop-general">General</a></h3>
<p>The choice of CPU determines the set of available instructions, as
well as cycle costs and register widths. There are many variations
on the 6502, but from the perspective of a disassembler most can be
treated as one of these four:</p>
<ol>
<li>MOS 6502. The original 8-bit instruction set.</li>
<li>WDC 65C02. Expanded the instruction set and smoothed
some rough edges.</li>
<li>WDC W65C02S. An enhanced version of the 65C02, with some
additional instructions introduced by Rockwell (R65C02), as well
as WDC's STP and WAI instructions. The Rockwell additions overlap
with 65816 instructions, so code that uses them will not work on
16-bit CPUs.</li>
<li>WDC W65C816S. Expanded instruction set, 24-bit address space,
and 16-bit registers.</li>
</ol>
<p>The Hudson Soft HuC6280 and Commodore CSG 4510 / 65CE02 are very
similar, but they have additional instructions and some fundamental
architectural changes. These are not currently supported by SourceGen.</p>
<p>If "enable undocumented instructions" is checked, some additional
opcodes are recognized on the 6502 and 65C02. These instructions are
not part of the chip specification, but most of them have consistent
behavior and can be used. If the box is not checked, the instructions
are treated as invalid and cause the code analyzer to assume that it
has run into a data area. This option has no effect on the 65816.</p>
<p>The "treat BRK as two-byte instruction" checkbox determines whether
BRK instructions should be handled as if they have an operand.</p>
<p>The entry flags determine the initial value for the processor status
flag register. Code that is unreachable internally (requiring a code
start point tag) will use this value. This is chiefly of value for
65816 code, where the initial value of the M/X/E flags has a significant
impact on how instructions are disassembled.</p>
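<p>To see why this matters, consider the hypothetical byte sequence
<code>a9 00 8d 00 20</code>. The sketch below shows how the same bytes
decode depending on whether the accumulator is 8 bits wide (emulation mode,
or native mode with M=1) or 16 bits wide (native mode with M=0):</p>
<pre>
;8-bit accumulator
a9 00       LDA   #$00
8d 00 20    STA   $2000

;16-bit accumulator
a9 00 8d    LDA   #$8d00
00 ...      (the remaining bytes no longer line up with the STA)
</pre>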
<p>If "analyze uncategorized data" is checked, SourceGen will attempt to
identify character strings and regions that are filled with a repeated
value. If it's not checked, anything that isn't detected as code or
explicitly formatted as data will be shown as individual byte values.</p>
<p>If "seek nearby targets" is checked, the analyzer will try to use
nearby labels for data loads and stores, adjusting them to fit
(e.g. <code>LDA LABEL+1</code>). If not enabled, labels are not applied
unless they match exactly. Note that references into the middle of an
instruction or formatted data area are always adjusted, regardless of
how this is set. This setting has no effect on local variables, and
only enables a 1-byte backward search on project/platform symbols.</p>
<p>The "use relocation data" checkbox is only available if the project
was created from a relocatable source, e.g. by the OMF Converter tool.
If checked, information from the relocation dictionary will be used to
improve automatic operand formatting.</p>
<p>If "smart PLP handling" is checked, the analyzer will try to use
the processor status flags from a nearby <code>PHP</code> when a
<code>PLP</code> is encountered. If not enabled, all flags are set to
"indeterminate" following a <code>PLP</code>, except for the M/X
flags on the 65816, which are left unmodified. (In practice this
approach doesn't seem to work all that well, so the setting is
un-checked by default.)</p>
<p>If "smart PLB handling" is checked, the analyzer will watch for
code patterns like <code>PLB</code> preceded by <code>PHK</code>,
and generate appropriate Data Bank Register changes. If not enabled,
the DBR is set to the bank of the address of the start of the file,
and does not change unless explicitly set. Only useful for 65816 code.</p>
<p>The "default text encoding" setting has two effects. First, it
specifies which character encoding to use when searching for strings in
uncategorized data. Second, if an assembler has a notion of preferred
character encoding (e.g. you can default string operands to PETSCII),
this setting will determine which encoding is preferred.</p>
<p>The "min chars for string detection" setting determines how many
ASCII characters need to appear consecutively for the data analyzer to
declare it a string. Smaller values are prone to false-positive
identifications, while larger values miss short strings. You can also
set it to "none" to disable automatic string identification.</p>
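<p>The effect of the threshold can be sketched as follows. This is an
illustration only, not SourceGen's actual detector: it assumes plain
ASCII, ignores the other text encodings, and the function name is
hypothetical.</p>

```python
def find_ascii_runs(data, min_chars=4):
    """Return (offset, length) pairs for runs of printable ASCII bytes
    that are at least min_chars long. (Hypothetical helper.)"""
    runs = []
    start = None
    for i, b in enumerate(data):
        printable = b in range(0x20, 0x7F)
        if printable and start is None:
            start = i                      # a new run begins here
        elif not printable and start is not None:
            if i - start >= min_chars:     # keep only long-enough runs
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_chars:
        runs.append((start, len(data) - start))
    return runs
```

<p>With the data <code>00 'HI' 00 'HELLO' 00</code>, a threshold of 4
reports only "HELLO", while a threshold of 2 also reports "HI", which
may or may not be a real string.</p>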
<p>The auto-label style setting determines the format for labels that are
generated automatically. By default the label will be the letter 'L'
followed by the hexadecimal address, but the label can be annotated based
on usage. For example, addresses that are the target of branch instructions
can be labeled with the letter 'B'.</p>
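<p>A rough sketch of the naming scheme described above. The function is
hypothetical, and the choice of four uppercase hex digits is an
assumption rather than a statement about SourceGen's exact output.</p>

```python
def auto_label(address, is_branch_target=False):
    """Build an auto-generated label for an address. (Hypothetical helper.)"""
    # 'L' is the default prefix; 'B' annotates branch targets.
    prefix = "B" if is_branch_target else "L"
    # Assumption: at least four uppercase hexadecimal digits.
    return f"{prefix}{address:04X}"
```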
<h3><a name="projprop-projsym">Project Symbols</a></h3>
<p>You can add, edit, and delete individual symbols and constants.
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
explanation of how project symbols work.</p>
<p>The Edit Symbol button opens the
<a href="editors.html#project-symbol">Edit Project Symbol</a> dialog, which
allows changing any part of a symbol definition. You're not allowed to
create two symbols with the same label.</p>
<p>The Import button allows you to import symbols from another project.
Only labels that have been tagged as global and exported will be imported.
Existing symbols with identical labels will be replaced, so it's okay to
run the importer multiple times. Labels that aren't found will not be
removed, so you can safely import from multiple projects, but will need
to manually delete any symbols that are no longer being exported.</p>
<p>Shortcut: you can open the project properties window with the
Project Symbols tab selected by hitting F6 from the main code list.</p>
<h3><a name="projprop-symfiles">Symbol Files</a></h3>
<p>From here, you can add and remove platform symbol files, or change
the order in which they are loaded.
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
explanation of how platform symbols work, and the
<a href="advanced.html#platform-symbols">advanced topics</a> section
for a description of the file syntax.</p>
<p>Platform symbol files must live in the RuntimeData directory that comes
with SourceGen, or in the directory where the project file lives. This
is mostly to keep things manageable when projects are distributed to
other people, but also acts as a minor security check, to prevent a
wayward project from trying to open files it shouldn't.</p>
<p>Click one of the "Add Symbol Files" buttons to include one or more
symbol files in the project.
The "Add Symbol Files from Runtime" button sets the directory
to the SourceGen RuntimeData directory, while "Add Symbol Files from Project"
starts in the project directory. If you haven't yet saved the project,
the latter button will be disabled. The only difference between the
buttons is the initial directory.</p>
<p>In the list, files loaded from the RuntimeData directory will be
prefixed with <code>RT:</code>. Files loaded from the project directory
will be prefixed with <code>PROJ:</code>.</p>
<p>If a platform symbol file can't be found when the project is opened,
you will receive a warning.</p>
<h3><a name="projprop-extscripts">Extension Scripts</a></h3>
<p>From here, you can add and remove extension script files.
See the <a href="advanced.html#extension-scripts">extension scripts</a>
section for details on how extension scripts work.</p>
<p>Extension script files must live in the RuntimeData directory that comes
with SourceGen, or in the directory where the project file lives. This
is mostly to keep things manageable when projects are distributed to
other people, but also acts as a minor security check, to prevent a
wayward project from trying to open files it shouldn't.</p>
<p>Click one of the "Add Scripts" buttons to include one or more scripts in
the project. The "Add Scripts from Runtime" button sets the directory
to the SourceGen RuntimeData directory, while "Add Scripts from Project"
starts in the project directory. If you haven't yet saved the project,
the latter button will be disabled. The only difference between the
buttons is the initial directory.</p>
<p>In the list, files loaded from the RuntimeData directory will be
prefixed with <code>RT:</code>. Files loaded from the project directory
will be prefixed with <code>PROJ:</code>.</p>
<p>If an extension script file can't be found when the project is opened,
you will receive a warning.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+159
View File
@@ -0,0 +1,159 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Tools - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Tools</h1>
<h2><a name="instruction-chart">Instruction Chart</a></h2>
<p>This opens a window with a summary of all 256 opcodes. The CPU can
be chosen from the pop-up list at the bottom. Undocumented opcodes for
6502/65C02 are shown in italics, and can be excluded from the list
by unchecking the box at the bottom.</p>
<p>The status flags affected by each instruction reflect their behavior
on the 65816. The only significant difference between 65816 and
6502/65C02 is the way the BRK instruction affects the D and B/X flags.</p>
<h2><a name="ascii-chart">ASCII Chart</a></h2>
<p>This opens a window with the ASCII character set. Each character is
displayed next to its numeric value in decimal and hexadecimal. The
pop-up list at the bottom allows you to flip between standard and "high"
ASCII.</p>
<h2><a name="apple2-screen-chart">Apple II Screen Chart</a></h2>
<p>The Apple II text and hi-res screens are mapped to memory in a way
that makes sense to computers but is a little confusing for humans. This
chart maps line numbers to addresses and vice-versa. Select different
screens and sort orders from the list at the bottom.</p>
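<p>The mapping the chart displays can be computed with the well-known
Apple II address formulas. The sketch below assumes text page 1 at
<code>$0400</code> and hi-res page 1 at <code>$2000</code>; the function
names are hypothetical.</p>

```python
def text_row_address(row, page_base=0x0400):
    """Address of the first byte of a 40-column text row (0-23)."""
    if row not in range(24):
        raise ValueError("row must be 0-23")
    return page_base + 0x80 * (row % 8) + 0x28 * (row // 8)


def hires_line_address(line, page_base=0x2000):
    """Address of the first byte of a hi-res scan line (0-191)."""
    if line not in range(192):
        raise ValueError("line must be 0-191")
    return (page_base + 0x400 * (line % 8)
            + 0x80 * ((line // 8) % 8) + 0x28 * (line // 64))
```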
<h2><a name="hexdump">Hex Dump Viewer</a></h2>
<p>You can use this to view the contents of the project data file
by double-clicking the "bytes" column, or with Actions &gt; Show Hex Dump.
The viewer is displayed in a "modeless" dialog that does not
prevent you from continuing to work with the project. If you
double-click a different line in the project, the viewer will automatically
highlight those bytes.</p>
<p>You can also use this to view the contents of arbitrary files by
using Tools &gt; Hex Dump. There is no fixed limit on the number of
viewers you can have open simultaneously. (Be aware that the viewer
currently loads the entire file into memory, and you will run out of room
eventually. Not coincidentally, the viewer has a size limit of 16MiB
per file.)</p>
<p>You can select lines with the mouse as you would in any other list
view. Ctrl+A selects all lines. Ctrl+C copies the selected lines to
the system clipboard.</p>
<p>The "character conversion" selector allows you to choose how the
bytes are converted to characters for the Text column. Choose from
the usual set of encodings.</p>
<p>If "ASCII-only dump" is not checked, non-printable bytes are shown in
the ASCII dump as a middle dot ('&#183;'). If the box is checked,
non-printable bytes are represented by a period ('.') instead. The
use of non-ASCII characters makes the dump unambiguous when unprintable
characters are mixed with periods, but the lines may be unsuitable for
pasting in some forums.</p>
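<p>The difference between the two modes can be sketched like this. The
function is hypothetical and only handles plain ASCII conversion, not
the other encodings offered by the viewer.</p>

```python
def text_column(data, ascii_only=False):
    """Render bytes for the text column of a hex dump. (Hypothetical helper.)"""
    # Non-printable bytes become '.' in ASCII-only mode, else a middle dot.
    filler = "." if ascii_only else "\u00b7"
    return "".join(chr(b) if b in range(0x20, 0x7F) else filler
                   for b in data)
```

<p>For the bytes <code>41 2E 00</code>, the default mode yields an 'A',
a period, and a middle dot, so the real period can be told apart from
the unprintable byte. In ASCII-only mode the last two characters are
both periods.</p>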
<p>If "always on top" is checked, the window will stay above all other
windows that don't also declare that they should always be on top. By
default this box is checked when displaying project data, and not checked for
external files.</p>
<h2><a name="file-concat">File Concatenator</a></h2>
<p>The File Concatenator combines multiple files into a single file.
Select the files to add, arrange them in the proper order, then hit
"Save". CRC-32 values are shown for reference.</p>
<h2><a name="file-slicer">File Slicer</a></h2>
<p>The File Slicer allows you to "slice" a piece out of a file, saving
it to a new file. Specify the start and length in decimal or hex. If
you leave either field blank, the start defaults to offset 0 and the
length defaults to the remainder of the file.</p>
<p>The hex dumps show the area just before and after the chunk to be
sliced, allowing you to confirm the placement.</p>
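<p>The defaulting rules for the start and length fields can be sketched
as follows. This operates on bytes in memory rather than on files, and
the function name is hypothetical.</p>

```python
def slice_bytes(data, start=None, length=None):
    """Return a slice of data. A blank field is passed as None: start
    defaults to offset 0, length to the remainder of the data."""
    if start is None:
        start = 0
    if start not in range(len(data) + 1):
        raise ValueError("start is outside the file")
    if length is None:
        length = len(data) - start
    if length not in range(len(data) - start + 1):
        raise ValueError("length runs past the end of the file")
    return data[start:start + length]
```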
<h2><a name="omf-converter">OMF Converter</a></h2>
<p>This tool allows you to view Apple IIgs Object Module Format (OMF)
executables, and convert them for disassembly.</p>
<p>OMF executables have multiple segments with relocatable code. References
to addresses aren't filled in until the program is loaded into memory,
which makes it difficult to disassemble the file. The conversion tool
loads the OMF file in roughly the same way the GS/OS System Loader would,
placing each segment at the start of a bank unless otherwise directed.
The loaded image is saved to a new file, and a SourceGen project file is
created with some basic attributes filled in.</p>
<p>Only "Load" files (S16, PIF, TOL, etc) may be converted. Compiler object
files and libraries contain references that must be resolved by
a IIgs linker, and are not supported.</p>
<p>Before you can examine or convert a file, you must first extract
it from the Apple II disk image, using a mode that does not modify the
original (e.g. extract with "configure to preserve Apple II formats"
in CiderPress). Then, open it with Tools &gt; Convert OMF.</p>
<p>The initial view shows all of the OMF segments in the file. Double-clicking
on an entry opens a detailed view that shows the segment header and a
list of all the OMF records. For load files, the relocation dictionary is
also shown.</p>
<p>To convert the file, click "Generate" to create a modified binary and a
SourceGen project file.</p>
<p>If "offset segment start by $0100" is checked, the converter will try
to shift the segment's load address from <code>$xx/0000</code> to
<code>$xx/0100</code>. This can make the generated code a little nicer
to work with because it removes potential ambiguity with direct page
addresses. For example, <code>LDA $56</code> and <code>LDA $0056</code>
may be interpreted as the same thing by the assembler, requiring
generation of operand width disambiguators. By shifting the initial
address we avoid the potential ambiguity.</p>
<p>Check "add comments and notes for each segment" to add a long comment
and a note at the start of each segment. The comments include the
segment name, type, and optional flags. The notes just provide a quick
way to jump to a segment.</p>
<p>The binary generated by the tool is not in OMF format and will not
execute on an Apple IIgs. To be functional, the generated sources must be
assembled by a program capable of generating OMF output, such as Merlin.</p>
<p>The <a href="advanced.html#reloc-data">relocation dictionaries</a> from
the executable are included in the project file, and can be used to guide
the disassembler's analysis. The "use relocation data" setting in the project
properties controls this feature.</p>
<p>A full explanation of the structure of OMF is beyond the scope of this
manual. For more information on OMF, see Appendix F of the GS/OS Reference
Manual.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+25
View File
@@ -0,0 +1,25 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Tutorials - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Tutorials</h1>
<p><strong>NOTE:</strong> this tutorial has been replaced by
content on the 6502bench web site. Visit
<a href="https://6502bench.com/sgtutorial/">https://6502bench.com/sgtutorial/</a>.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>
+315
View File
@@ -0,0 +1,315 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Visualizations - 6502bench SourceGen</title>
</head>
<body>
<div id="content">
<h1>6502bench SourceGen: Visualizations</h1>
<p><a href="index.html">Back to index</a></p>
<h2><a name="overview">Overview</a></h2>
<p>Programs are generally a combination of code and data. Sometimes
the data is graphical in nature, e.g. a bitmap used as a font or
game sprite. Being able to see the data in graphic form can make it
easier to determine the purpose of associated code.</p>
<p>While modern systems use GIF, JPEG, and PNG to hold 2D bitmaps,
graphical elements embedded in 6502 applications are almost always
in a platform-specific form. For this reason, the task of generating
images from data is performed by
<a href="advanced.html#extension-scripts">extension scripts</a>. Some
scripts for common formats are included in the SourceGen runtime directory.
If these don't do what you need, you can write your own scripts and
include them in your project.</p>
<p>The project file doesn't store the converted graphics. Instead, the
project file holds a string that identifies the converter, and a list of
parameters that are passed to the converter. Images are generated when
the project is first opened, and updated when certain things change in
the project.</p>
<p>Visualizations are not included in generated assembly output. They
may be included in HTML exports.</p>
<p>Because visualizations are associated with a specific file offset,
they will become "hidden" if the offset isn't at the start of a line,
e.g. it's in the middle of a multi-byte instruction or data item. The
editors will try to prevent you from doing this.</p>
<p>Bitmaps will always be scaled up as much as possible to make them
easy to see. This means that small shapes and large shapes may appear
to be the same size when displayed as thumbnails in the code list.</p>
<p>The role of a visualization generator is to take a collection of input
parameters and generate graphical data. It's most useful for graphical
sources like bitmaps, but it's not limited to that. You could, for example,
write a script that generates random flowers, and use it to make your
source listings more cheerful.</p>
<h2><a name="vis-and-sets">Visualizations and Visualization Sets</a></h2>
<p>Visualizations are essentially decorative: they do not affect the
assembled output, and do not change how code is analyzed. They are
contained in sets that are placed at arbitrary offsets. Each set can
contain multiple items. For example, if a file has data for
10 bitmaps, you can place a visualization near each, or create a single
visualization set with all 10 items and put it at the start of the file.
You can display a visualization near the data or near the instructions
that perform the drawing. Or both.</p>
<p>To create a visualization set, select a code or data line, and use
Actions &gt; Create/Edit Visualization Set. To edit a visualization set,
select it and use the same menu item, or just double-click on it. This
opens the Visualization Set Editor window.</p>
<p>The visualization set editor shows a list of visualizations associated
with the selected file offset. You can create a new visualization, edit
or remove an existing entry, or rearrange them.
If you select "New Bitmap" or edit an existing bitmap entry, the
Bitmap Visualization Editor window will open.
Similarly, if you select "New Bitmap Animation" or edit an existing
bitmap animation, the Bitmap Animation Editor will open.</p>
<h4>Visualization Editor</h4>
<p>The combo box at the top of the screen lists every visualization
generator defined by an active extension script. Select the one that is
appropriate for the data you're trying to visualize. Every visualizer may
have different parameters, so as you select different entries the set of
input parameters below the preview window may change.</p>
<p>There are two categories of visualization generator: bitmap, and
wireframe. Bitmaps are simple 2D images, but wireframes are 2D or 3D
meshes that can be viewed from different angles. When you select a
wireframe generator, additional view controls will be added at the bottom.
(See below.)</p>
<p>The "tag" is a unique string that will be shown in the display list.
This is not a label, and may contain any characters you want (but leading
and trailing whitespace will be trimmed). The only requirement is that
it be unique across all visualizations (bitmaps, animations, etc).</p>
<p>The preview window shows the visualizer output. The generated image is
expanded to fill the window, so small bitmaps will be shown with very
large pixels.
If you resize the editor window, the preview window will expand, which
can make it easier to see detail on larger images.
If the generator fails, the preview window will show a red 'X', and an
error message will appear below it.</p>
<p>Parameters may be numeric or boolean. The latter use a simple checkbox,
the former a text entry field that accepts decimal and hexadecimal values.
The range of allowable values is shown to the right of the entry field.
If you enter an invalid value, the parameter description will turn red.</p>
<p>The "Export" button at the top right can be used to save a copy of
the bitmap or wireframe rendering with the current parameters.</p>
<h5>Wireframe View Controls</h5>
<p>The wireframe generator may offer the choice of perspective vs.
orthographic projection, and whether or not to enable backface
culling. These are declared in the visualization generator script,
but implemented in the viewer. If the generator doesn't
declare them, the default is to render with a perspective projection
and without culling.</p>
<p>The viewer allows you to rotate the image about the X, Y, and Z
axes. The viewer provides a left-handed coordinate system,
with +X toward the right, +Y toward the top of the screen, and +Z
going into the screen. The object will be placed a short distance
down the Z axis and scaled to fit the window.
Positive rotations cause a counter-clockwise rotation when the axis
about which rotations are performed points toward the viewer. The
rotations are performed with a matrix using Euler angles, and are
subject to gimbal lock (e.g. if you set Y to 90 degrees, X and Z rotate
about the same axis).</p>
<p>If you check the "Animated" box, you can add a simple spin. Choose
the number of degrees to rotate per frame, how many frames to generate before
resetting, and the delay between each frame. Clicking the "Auto" button
will automatically select the number of frames needed to display the
animation in an unbroken loop (useful for animated GIFs). Click
the "Test Animation" button to see it in action.</p>
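<p>One plausible way to pick a frame count that loops cleanly is to find
the smallest number of steps that adds up to a whole number of full
turns. This is a sketch under the assumption of whole-degree steps
about a single axis; it is not necessarily how the "Auto" button is
implemented, and the function name is hypothetical.</p>

```python
from math import gcd


def frames_for_loop(degrees_per_frame):
    """Smallest frame count after which the rotation returns to the
    starting angle. (Hypothetical helper, integer degrees only.)"""
    step = abs(degrees_per_frame) % 360
    if step == 0:
        return 1          # no visible rotation, one frame is enough
    return 360 // gcd(360, step)
```

<p>For example, 10 degrees per frame needs 36 frames, while 7 degrees
per frame needs 360 frames before the animation repeats exactly.</p>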
<h4>Bitmap Animation Editor</h4>
<p>Bitmap animations allow you to create a simple animation from a
collection of other visualizations. This can be useful when a program
stores animated graphics as a series of frames.</p>
<p>The "tag" is a unique string that will be shown in the display list.
The same rules apply as for bitmap visualizations.</p>
<p>The list at the top left holds all visualizations. Select items on
the left and use the "Add" button to add them to the list on the right,
which has the set that is included in the animation. You can reorder
the list with the up/down buttons. Adding the same frame multiple times
is allowed.</p>
<p>The "frame delay" field lets you specify how long each frame is shown
on screen, in milliseconds. Some animation formats may use a different
time resolution; for example, animated GIFs use units of 1/100th of a
second. The closest value will be used. Note also that some viewers
(notably web browsers) will cap the update rate.</p>
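<p>As a small worked example of the time-resolution issue, the sketch
below converts a delay in milliseconds to the 1/100th-second units used
by animated GIFs. Rounding halves upward is an assumption, and the
function name is hypothetical.</p>

```python
def gif_delay(delay_msec):
    """Return (units, actual_msec): the delay in 1/100 s units and the
    delay that results after rounding. (Hypothetical helper.)"""
    units = (delay_msec + 5) // 10   # nearest unit, halves round up
    return units, units * 10
```

<p>A requested delay of 33 msec becomes 3 units, so the frame is
actually shown for 30 msec.</p>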
<p>When you have one or more frames in the animation list, you can preview
the result in the window at the bottom. The actual appearance may be
slightly different, especially if the frames are different sizes. For
example, the preview window scales individual frames, but animated GIFs
will be scaled to the size of the largest frame.</p>
<h2><a name="runtime">Scripts Included with SourceGen</a></h2>
<p>A number of visualization generation scripts are included with
SourceGen, in the platform-specific runtime data directories.</p>
<p>Most generators will take the file offset, bitmap width, and bitmap
height as parameters. Offsets are handled as they are elsewhere, i.e.
always in hexadecimal, with a leading '+'.
Some less-common parameters include:</p>
<ul>
<li><b>Column stride</b> - number of bytes used to hold a column.
This is uncommon, but could be used if (say) a pair of bitmaps
was stored with interleaved bytes. If you set this to zero the
visualizer will default to no interleave (col_stride = 1).</li>
<li><b>Row stride</b> - number of bytes between the start of each
row. This is used when a row has padding on the end, e.g. a
bitmap that's 7 bytes wide might be padded to 8 for easy indexing,
or when bitmap data is interleaved. If you set this to zero the
visualizer will default to no padding
(row_stride = width * column_stride).</li>
<li><b>Cell stride</b> - for multi-bitmap data like a font or sprite
sheet, this determines the number of bytes between the start of
one item and the next. If set to zero a "dense" arrangement is
assumed (cell_stride = row_stride * item_height).</li>
</ul>
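<p>The default rules for the three stride values, and the way they
combine to locate a byte, can be sketched as follows. The default
formulas come from the list above; the addressing function is an
assumption about how a generator would typically use them, and both
function names are hypothetical.</p>

```python
def resolve_strides(width, height, col_stride=0, row_stride=0,
                    cell_stride=0):
    """Replace zero stride values with the defaults described above.
    width is the number of bytes per row, height the number of rows."""
    if col_stride == 0:
        col_stride = 1                       # no interleave
    if row_stride == 0:
        row_stride = width * col_stride      # no padding
    if cell_stride == 0:
        cell_stride = row_stride * height    # dense arrangement
    return col_stride, row_stride, cell_stride


def byte_offset(base, cell, row, col, strides):
    """File offset of byte (row, col) in the given cell. (Assumed layout.)"""
    col_stride, row_stride, cell_stride = strides
    return base + cell * cell_stride + row * row_stride + col * col_stride
```

<p>For a 7x8 bitmap padded to 8 bytes per row,
<code>resolve_strides(7, 8, row_stride=8)</code> yields a cell stride of
64, so the third cell starts 128 bytes after the first.</p>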
<p>Remember that this is a disassembler, not an image converter. The
results do not need to be perfectly accurate to be useful when disassembling
code.</p>
<h3>Apple II - Apple/VisHiRes and Apple/VisShapeTable</h3>
<p>There is no standard format for small hi-res bitmaps, but certain
arrangements are common. The VisHiRes script defines four generators:</p>
<ul>
<li><b>Hi-Res Bitmap</b> - converts an MxN row-major bitmap.</li>
<li><b>Hi-Res Sprite Sheet</b> - converts a series of bitmaps and
renders them in a grid. Useful for games that use cell
animation. The generated bitmap has a 1-pixel transparent gap
between elements.</li>
<li><b>Hi-Res Bitmap Font</b> - a simplified version of the
Sprite Sheet, intended for the common 7x8 monochrome fonts.
Most fonts have 96 or 128 glyphs, though some drop the last
character.
(This also works for Apple /// fonts, but currently ignores
the high bit in each byte.)</li>
<li><b>Hi-Res Screen Image</b> - used for 8KiB screen images. The
data is linearized and converted to a 280x192 bitmap. Because
images are relatively large, the generator does not require them
to be contiguous in the file, i.e. two halves of the image can be
in different parts of the file so long as they end up contiguous
in memory.</li>
</ul>
<p>Widths are specified in bytes, not pixels. Each byte represents 7
pixels (with some hand-waving).</p>
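<p>The "7 pixels per byte" layout can be sketched as follows for the
monochrome case. Bit 0 is the leftmost pixel, and the high bit selects
the color palette rather than a pixel. The function name is
hypothetical, and color and half-pixel shifts are deliberately
ignored.</p>

```python
def hires_byte_pixels(value):
    """Split a hi-res byte into 7 monochrome pixels (leftmost first)
    and the high (palette) bit. (Hypothetical helper.)"""
    pixels = [(value >> bit) % 2 for bit in range(7)]
    high_bit = (value >> 7) % 2
    return pixels, high_bit
```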
<p>In addition to offset, dimensions, and stride values, the bitmap
converter has a checkbox for monochrome or color, and two checkboxes
that affect the color. The first causes the first byte to be treated
as being in an odd column rather than an even one, which affects
green vs. purple and orange vs. blue. The second flips the high bits
on every byte, switching green vs. orange and purple vs. blue.
Neither has any effect on black &amp; white bitmaps.</p>
<p>The converter generates one output pixel for every source pixel, so
half-pixel shifts are not represented.</p>
<p>The VisShapeTable script renders Applesoft shape tables, which can
have multiple vector shapes. The only parameter other than the offset
is the shape number.</p>
<h3>Atari 2600 - Atari/Vis2600</h3>
<p>The Atari 2600 graphics system has registers that determine the
appearance of a sprite or playfield on a single row. The register
values are typically changed as the screen is drawn to get different
data on successive rows. The visualization generator doesn't attempt
to emulate this behavior, but works well for data stored in a
straightforward fashion.</p>
<ul>
<li><b>Sprite</b> - basic 1xN sprite, converted to an image 8 pixels
wide. Square pixels are assumed.</li>
<li><b>Playfield</b> - assumes PF0,PF1,PF2 are stored in that order,
multiple entries following each other. Specify the number of
3-byte entries as the height.
Since most playfields aren't the full height of the screen,
it will tend to look squashed. Use the "row thickness" feature
to repeat each row N times to adjust the proportions.
The "Reflected" checkbox determines whether the right-side image is
repeated as-is or flipped.</li>
</ul>
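<p>The playfield register layout is a little unusual, so a sketch of how
one 3-byte entry expands to a 40-pixel row may help. PF0 contributes
its upper four bits starting with bit 4, PF1 is shown from bit 7 down to
bit 0, and PF2 from bit 0 up to bit 7. The function name is
hypothetical, and this is not the code used by the Vis2600 script.</p>

```python
def playfield_row(pf0, pf1, pf2, reflected=False):
    """Expand PF0/PF1/PF2 into 40 playfield pixels. (Hypothetical helper.)"""
    left = [(pf0 >> bit) % 2 for bit in range(4, 8)]        # PF0 bits 4..7
    left += [(pf1 >> bit) % 2 for bit in range(7, -1, -1)]  # PF1 bits 7..0
    left += [(pf2 >> bit) % 2 for bit in range(8)]          # PF2 bits 0..7
    # The right half either repeats the left half or mirrors it.
    right = left[::-1] if reflected else left
    return left + right
```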
<h3>Atari Arcade - Atari/VisAVG</h3>
<p>Different versions of Atari's Analog Vector Graphics were used in
several games, notably Battlezone, Tempest, and Star Wars. The commands
drove a vector display monitor. SourceGen visualizes them as 2D
wireframes, which isn't a perfect fit since they can describe points as
well as lines, but works fine for annotating a disassembly.</p>
<p>The visualizer takes two arguments: the offset of the start of
the commands to visualize, and the base address of vector RAM. The latter
is necessary to convert AVG JMP/JSR commands into offsets.</p>
<h3>Commodore 64 - Commodore/VisC64</h3>
<p>The Commodore 64 has a 64-byte sprite format defined by the hardware.
It comes in two basic varieties:</p>
<ul>
<li><b>High-resolution sprite</b> - 24x21 monochrome. Pixels are either
colored or transparent.</li>
  <li><b>Multi-color sprite</b> - 12x21 3-color. The width of each pixel
    is doubled to make it 24x21.</li>
</ul>
<p>Sprites can be doubled in width and/or height.</p>
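<p>Both varieties use 3 bytes per row for 21 rows, i.e. 63 bytes of
pixel data. The sketch below shows how the bits are unpacked. The
function names are hypothetical, and mapping the resulting values to
palette colors is left out.</p>

```python
def hires_sprite_rows(data):
    """Unpack a sprite into 21 rows of 24 bits (1 = colored,
    0 = transparent), most significant bit first."""
    if len(data) not in (63, 64):
        raise ValueError("expected 63 bytes, optionally padded to 64")
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3:y * 3 + 3]:
            row += [(b >> bit) % 2 for bit in range(7, -1, -1)]
        rows.append(row)
    return rows


def multicolor_sprite_rows(data):
    """Unpack a sprite into 21 rows of 12 two-bit color selectors
    (0 = transparent)."""
    if len(data) not in (63, 64):
        raise ValueError("expected 63 bytes, optionally padded to 64")
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3:y * 3 + 3]:
            row += [(b >> shift) % 4 for shift in (6, 4, 2, 0)]
        rows.append(row)
    return rows
```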
<p>Colors come from a hardware-defined palette of 16:</p>
<ol start="0" style="columns:2; -webkit-columns:2; -moz-columns:2;">
<li><span style="color:#ffffff;background-color:#000000">&nbsp;black&nbsp;</span></li>
<li><span style="color:#000000;background-color:#ffffff">&nbsp;white&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#67372b">&nbsp;red&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#70a4b2">&nbsp;cyan&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#6f3d86">&nbsp;purple&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#588d43">&nbsp;green&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#352879">&nbsp;blue&nbsp;</span></li>
<li><span style="color:#000000;background-color:#b8c76f">&nbsp;yellow&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#6f4f25">&nbsp;orange&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#433900">&nbsp;brown&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#9a6759">&nbsp;light red&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#444444">&nbsp;dark grey&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#6c6c6c">&nbsp;grey&nbsp;</span></li>
<li><span style="color:#000000;background-color:#9ad284">&nbsp;light green&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#6c5eb5">&nbsp;light blue&nbsp;</span></li>
<li><span style="color:#ffffff;background-color:#959595">&nbsp;light grey&nbsp;</span></li>
</ol>
<p>Bear in mind that the editor scales images to their maximum size, so
a sprite that is doubled in both width and height will look exactly like
a sprite that is not doubled at all.</p>
<h3>Nintendo Entertainment System - Nintendo/VisNES</h3>
<p>NES PPU pattern tables hold 8x8 tiles with 2 bits of color per pixel.
Converting the full collection to a reference bitmap is straightforward.
A few color palette options are offered.</p>
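<p>Each 8x8 tile occupies 16 bytes: eight bytes holding the low bit of
every pixel, followed by eight bytes holding the high bit. A minimal
sketch of the decoding, with a hypothetical function name and no
palette lookup:</p>

```python
def decode_tile(tile):
    """Convert a 16-byte pattern table entry into 8 rows of 8 color
    indices (0-3), leftmost pixel first. (Hypothetical helper.)"""
    if len(tile) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        lo, hi = tile[y], tile[y + 8]   # low and high bit planes
        rows.append([(lo >> bit) % 2 + 2 * ((hi >> bit) % 2)
                     for bit in range(7, -1, -1)])
    return rows
```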
<p>Sprites and backgrounds are formed from collections of tiles. In
some cases this is straightforward, in others it's not. A visualization
generator that renders a "tile grid" is available for simpler cases.</p>
</div>
<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2019 faddenSoft -->
</html>