Relocate manual

Move the SourceGen manual to a subdirectory in "docs", so that it can be accessed directly from the 6502bench web site. The place where it's installed in the distribution doesn't change.
2026-04-20 19:16:34 +00:00 · 2021-10-08 08:43:12 -07:00
parent a395909574
commit ed4cc84782
15 changed files with 2 additions and 0 deletions
@@ -0,0 +1,495 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Advanced Topics - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Advanced Topics</h1>
+<p><a href="index.html">Back to index</a></p>
+
+
+<h2><a name="platform-symbols">Platform Symbol Files (.sym65)</a></h2>
+
+<p>Platform symbol files contain lists of symbols, each of which has a
+label and a value.  SourceGen comes with a collection of symbols for
+popular systems, but you can create your own.  This can be handy if a
+few different projects are coded against a common library.</p>
+
+<p>If two symbols have the same value, the older symbol is replaced by
+the newer one.  This is why the order in which symbol files are loaded
+matters.</p>
+
+<p>Platform symbol files consist of comments, commands, and symbols.
+Blank lines, and lines that begin with a semicolon (';'), are ignored.  Lines
+that begin with an asterisk ('*') are commands.  Three are currently
+defined:</p>
+<ul>
+  <li><code>*SYNOPSIS</code> - a short summary of the file contents.</li>
+  <li><code>*TAG</code> - a tag string to apply to all symbols that follow
+    in this file.</li>
+  <li><code>*MULTI_MASK</code> - specify a mask for symbols that appear
+    at multiple addresses.</li>
+</ul>
+
+<p>Tags can be used by extension scripts to identify a subset of symbols.
+The symbols are still part of the global set; the tag just provides a
+way to extract a subset.  Tags should consist of non-whitespace ASCII
+characters.  Tags are global, so use a long, descriptive string.  If
+<code>*TAG</code> is not followed by a string, the symbols that follow
+are treated as untagged.</p>
+
+<p>All other lines are symbols, which have the form:</p>
+<pre>
+  LABEL {=|@|&lt;|&gt;} VALUE [WIDTH] [;COMMENT]
+</pre>
+
+<p>The LABEL must be at least two characters long, begin with a letter or
+underscore, and consist entirely of alphanumeric ASCII characters
+(A-Z, a-z, 0-9) and the underscore ('_').  (This is the same format
+required for line labels in SourceGen.)</p>
+<p>The next token can be one of:</p>
+<ul>
+  <li>@: general addresses</li>
+  <li>&lt;: read-only addresses</li>
+  <li>&gt;: write-only addresses</li>
+  <li>=: constants</li>
+</ul>
+<p>If an instruction references an address, and that address is outside
+the bounds of the file, the list of address symbols (i.e. everything
+that's not a constant) will be scanned for a match.
+If found, the symbol is applied automatically.  You normally want to
+use '@', but can use '&lt;' and '&gt;' for memory-mapped I/O locations
+that have different behavior depending on whether they are read or
+written.</p>
+
+<p>The VALUE is a number in decimal, hexadecimal (with a leading '$'), or
+binary (with a leading '%').  The numeric base will be recorded and used when
+formatting the symbol in generated output, so use whichever form is most
+appropriate.  Values are unsigned 24-bit numbers.  The special value
+"erase" may be used for an address to erase a symbol defined in an earlier
+platform file.</p>
+
+<p>The WIDTH is optional, and ignored for constants.  It must be a
+decimal or hexadecimal value between 1 and 65536, inclusive.  If omitted,
+the default width is 1.</p>
+
+<p>The COMMENT is optional.  If present, it will be saved and used as the
+end-of-line comment on the .EQ directive if the symbol is used.</p>
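+
+<p>As an illustration, a small symbol file that follows the rules above
+might look like this.  The labels, addresses, and tag string are made up
+for the example and don't describe any real system, and placing the
+synopsis text on the same line as <code>*SYNOPSIS</code> is an
+assumption:</p>
+<pre>
+; Hypothetical symbols for an imaginary machine.
+*SYNOPSIS Example symbols (illustrative only)
+
+SCREEN_BASE @ $0400 1024    ;general address, 1024 bytes wide
+KBD_DATA    &lt; $C000         ;read-only address
+SND_CTRL    &gt; $C030         ;write-only address
+MAX_ROWS    = 24            ;constant, recorded as decimal
+FLAG_BITS   = %10000001     ;constant, recorded as binary
+
+*TAG example-machine-graphics-v1
+GFX_PAGE1   @ $2000 $2000   ;tagged symbol, width given in hexadecimal
+*TAG
+IRQ_VEC     @ $FFFE 2       ;untagged again
+</pre>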
+
+<h4>Using MULTI_MASK</h4>
+
+<p>The multi-address mask is used for systems like the Atari 2600, where
+RAM, ROM, and I/O registers appear at multiple addresses.  The hardware
+looks for certain address lines to be set or clear, and if the pattern
+matches, another set of bits is examined to determine which register or
+RAM address is being accessed.</p>
+<p>This is expressed in symbol files with the MULTI_MASK statement.
+Address symbol declarations that follow have the mask set applied.  Symbols
+whose addresses don't fit the pattern cause a warning and will be
+ignored.  Constants are not affected.</p>
+
+<p>The mask set is best explained with an example.  Suppose the address
+pattern for a set of registers is <code>???0 ??1? 1??x xxxx</code>
+(where '?' can be any value, 0/1 must be that value, and 'x' means the bit
+is used to determine the register).
+So any address in $0280-$029F matches, as does any in $23C0-$23DF, but
+$0480 and $1280 don't.  The register number is found in the low five bits.</p>
+<p>The corresponding MULTI_MASK line, with values specified in binary,
+would be:</p>
+<pre>  *MULTI_MASK %0001001010000000 %0000001010000000 %0000000000011111</pre>
+<p>The values are CompareMask, CompareValue, and AddressMask.  To
+determine if an address is in the register set, we check to see if
+<code>(address &amp; CompareMask) == CompareValue</code>.  If so, we can
+extract the register number with <code>(address &amp; AddressMask)</code>.</p>
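+<p>Working a few addresses through those two expressions by hand, with
+the mask values from the example line written in hex (CompareMask $1280,
+CompareValue $0280, AddressMask $001F), gives:</p>
+<pre>
+  address   address &amp; $1280   equals $0280?   address &amp; $001F
+  $0281     $0280             yes             $01
+  $029F     $0280             yes             $1F
+  $23C1     $0280             yes             $01
+  $0480     $0080             no              (not in the set)
+  $1280     $1280             no              (not in the set)
+</pre>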
+
+<p>We don't want to have a huge collection of equates at the top of the
+generated source file, so whatever value is used in the symbol declaration
+is considered the "canonical" value.  All other matching values are output
+with an offset.</p>
+<p>All mask values must fall between 0 and $00FFFFFF.  The set bits in
+CompareMask and AddressMask must not overlap, and CompareValue must not
+have any bits set that aren't also set in CompareMask.</p>
+<p>If an address can be mapped to a masked value and an unmasked value,
+the unmasked value takes precedence for exact matches.  In the example
+above, if you declare <code>REG1 @ $0281</code> outside the MULTI_MASK
+declaration, the disassembler will use <code>REG1</code> for all operands
+that reference $0281.  If other code accesses the same register as $23C1,
+the symbol established for the masked value will be used instead.</p>
+<p>If there are multiple masked values for a given address, the precedence
+is undefined.</p>
+<p>To disable the MULTI_MASK and resume normal declarations, write the
+command without arguments:</p>
+<pre>  *MULTI_MASK</pre>
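+<p>Put together, a masked block in a symbol file could be laid out roughly
+as follows.  This is only a sketch: the labels are hypothetical, and the
+mask values are the ones from the example above.</p>
+<pre>
+*MULTI_MASK %0001001010000000 %0000001010000000 %0000000000011111
+REG0    @ $0280     ;canonical address for register 0
+REG1    @ $0281     ;also matches $23C1 and the other mirrors
+; A declaration at $0480 here would not fit the pattern, so it would
+; be ignored with a warning.
+*MULTI_MASK
+PLAIN   @ $0300     ;declared after the mask was disabled
+</pre>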
+
+
+<h3>Creating a Project-Specific Symbol File</h3>
+
+<p>To create a platform symbol file for your project, just create a new
+text file, named with a ".sym65" extension.  (If your text editor of choice
+doesn't like that, you can put a ".txt" on the end while you're editing.)
+Make sure you create it in the same directory where your project file
+(the file that ends with ".dis65") lives.  Add a <code>*SYNOPSIS</code>,
+then add the desired symbols.</p>
+<p>Finally, add it to your project.  Select Edit &gt; Project Properties,
+switch to the Symbol Files tab, click Add Symbol Files from Project, and
+select your symbol file.  It should appear in the list with a
+"PROJ:" prefix.</p>
+
+<p>If an example helps, the A2-Amper-fdraw project in the Examples
+directory has a project-local symbol file, called "fdraw-exports".
+(fdraw-exports is a list of exported symbols from the fdraw library,
+for which Amper-fdraw provides an Applesoft BASIC interface.)</p>
+
+<p>NOTE: in the current version of SourceGen, changes to .sym65 files are
+not detected automatically.  Use File &gt; Reload External Files to
+import the changes.</p>
+
+
+<h2><a name="extension-scripts">Extension Scripts</a></h2>
+
+<p>Extension scripts, also called "plugins", are C# programs with access to
+the full .NET Standard 2.0 APIs.  They're compiled at run time by SourceGen
+and executed in a sandbox with security restrictions.</p>
+
+<p>SourceGen defines an interface that plugins must implement, and an
+interface that plugins can use to interact with SourceGen.  See
+Interfaces.cs in the PluginCommon directory.</p>
+
+<p>The current interfaces can be used to generate visualizations, to
+identify inline data that follows JSR, JSL, or BRK instructions, and to
+format operands.  The latter can be used to format code and data, e.g.
+replacing immediate load operands with symbolic constants.</p>
+
+<p>Scripts may be loaded from the RuntimeData directory, or from the directory
+where the project file lives.  Attempts to load them from other locations
+will fail.</p>
+<p>A project may load multiple scripts.  The order in which they are
+invoked is not defined.</p>
+
+<h4>Known Issues and Limitations</h4>
+
+<p>Scripts are currently limited to C# version 5, because the compiler
+built into .NET only handles that.  C# 6 and later require installing an
+additional package ("Roslyn"), so SourceGen does not support this.</p>
+
+<p>When a project is opened, any errors encountered by the script compiler
+are reported to the user.  If the project is already open, and a script
+is added to the project through the Project Properties editor, compiler
+messages are silently discarded.  (This also applies if you undo/redo across
+the property edit.)  Use File &gt; Reload External Files to see the
+compiler messages.</p>
+
+<h4>Development</h4>
+
+<p>The easiest way to develop extension scripts is inside the 6502bench
+solution in Visual Studio.  This way you have the interfaces available
+for IntelliSense completion, and get all the usual syntax and compile
+checking in the editor.  (This is why there's a RuntimeData project for
+Visual Studio.)</p>
+
+<p>If you have the solution configured for debug builds, SourceGen will pass
+<code>IncludeDebugInformation=true</code> to the script compiler.  This
+causes a .PDB file to be created.  While this can help with debugging,
+it can sometimes get in the way: if you edit the script source code and
+reload the project without restarting the app, SourceGen will recompile
+the script, but the old .PDB file will still be held open by Visual Studio
+and you'll see some failure messages.  Exiting and restarting SourceGen
+will allow regeneration of the PDB files.</p>
+
+<p>Some commonly useful functions are defined in the
+<code>PluginCommon.Util</code> class, which is available to plugins.  These
+call into the CommonUtil library, which is shared with SourceGen.
+While plugins could use CommonUtil directly, they should avoid doing so.  The
+APIs there are not guaranteed to be stable, so plugins that rely on them
+may break in a subsequent release of SourceGen.</p>
+
+<h4>PluginDllCache Directory</h4>
+
+<p>Extension scripts are compiled into .DLLs, and saved in the PluginDllCache
+directory, which lives next to the application executable and RuntimeData.
+If the extension script is the same age or older than the DLL, SourceGen
+will continue to use the existing DLL.</p>
+
+<p>The DLL names are a combination of the script filename and script location.
+The compiled name for "MyPlatform/MyScript.cs" in the RuntimeData directory
+will be "RT_MyPlatform_MyScript.dll".  For a project-specific script, it
+would look like "PROJ_MyProject_MyScript.dll".</p>
+
+<p>The PluginCommon and CommonUtil DLLs will be copied into the directory, so
+that code in the sandbox has access to them.</p>
+
+<p>The contents of the directory are generated as needed, and can be deleted
+entirely whenever SourceGen isn't running.</p>
+
+<h4>Sandboxing</h4>
+
+<p>Extension scripts are executed in an App Domain sandbox.  App domains are
+a .NET feature that creates a partition inside the virtual machine, isolating
+code.  It still runs in the same address space, on the same threads, so the
+isolation is only effective for "partially trusted" code that has been
+declared safe by the bytecode verifier.</p>
+
+<p>SourceGen disallows most actions, notably file access.  An exception is
+made for reading files from the directory where the plugin DLLs live, but
+scripts are otherwise unable to read or write from the filesystem.  (A
+future version of SourceGen may provide an API that allows limited access
+to data files.)</p>
+
+<p>App domain security is not absolute.  I don't really expect SourceGen to
+be used as a malware vector, so there's no value in forcing scripts to
+execute in an isolated server process, or to jump through the other hoops
+required to really lock things down.  I do believe there's value in
+defining the API in such a way that we <b>could</b> implement full security if
+circumstances change, so I'm using app domains as a way to keep the API
+honest.</p>
+
+
+<h2><a name="multi-bin">Working With Multiple Binaries</a></h2>
+
+<p>Sometimes a program is split into multiple files on disk.  They
+may be all loaded at once, or some may be loaded into the same place
+at different times.  In such situations it's not uncommon for one
+file to provide a set of interfaces that other files use.  It's
+useful to have symbols for these interfaces be available to all
+projects.</p>
+<p>There are two ways to do this: (1) define a common platform symbol
+file with the relevant addresses, and keep it up to date as you work;
+or (2) declare the labels as global and exported, and import them
+as project symbols into the other projects.</p>
+<p>Support for this is currently somewhat weak, requiring a manual
+symbol-import step in every interested project.  This step must be
+repeated whenever the labels are updated.</p>
+<p>A different but related problem is typified by arcade ROM sets,
+where files are split apart because each file must be burned into a
+separate PROM.  All files are expected to be present in memory at
+once, so there's no reason to treat them as separate projects. Currently,
+the best way to deal with this is to concatenate the files into a single
+file, and operate on that.</p>
+
+<h2><a name="overlap">Overlapping Address Spaces</a></h2>
+<p>Some programs use memory overlays, where multiple parts of the
+code run in the same address in RAM.  Others use bank switching to access
+parts of the program that reside in separate physical RAM or ROM,
+but appear at the same address.  Nested address regions allow for a
+variety of configurations, which can make address resolution complicated.</p>
+
+<p>The general goal is to have references to an address resolve to
+the "nearest" match.  For example, consider a simple overlay:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1100
+
+         .ADDRS  $1100
+L1100    BIT     L1100
+L1103    LDA     #$11
+         BRA     L1103
+         .ADREND
+
+         .ADDRS  $1100
+L1100_0  BIT     L1100_0
+L1103_0  LDA     #$22
+         JMP     L1103_0
+         .ADREND
+
+         .ADREND
+</pre>
+
+<p>Both sections start at $1100, and have branches to $1103.  The branch
+in the first section resolves to the label in the first version of
+that address chunk, while the branch in the second section resolves to
+the label in the second chunk.  When branches originate outside the current
+address chunk, the first chunk that includes that address is used, as
+it is with the <code>JMP L1100</code> at $1000, the start of the file.</p>
+
+<p>The full address-to-offset algorithm is as follows.
+There are two inputs: the file offset of the instruction or data item
+that has the reference (e.g. the JMP or LDA), and the address
+it is referring to.</p>
+<ul>
+  <li>Create a tree with all address regions.  Each "node" in the tree
+    has an offset, length, and start address.</li>
+  <li>Search the tree for a node that includes the offset of the
+    reference source.
+    When there are multiple overlapping regions, descend until the
+    deepest child that spans the offset is found.  This node will be
+    the starting point of the search.</li>
+  <li>Loop until we hit the top of the tree:
+  <ul>
+    <li>Perform a recursive depth-first search of all children of the
+      current node.  They're searched in order of ascending file offset.</li>
+    <li>If the address wasn't found in the children, check the current
+      node.  If we find it here, return this node as the result.</li>
+    <li>Move up to the parent node.</li>
+  </ul></li>
+</ul>
+
+<p>This searches all children and all siblings before checking the parent.
+If we hit the top of the tree without finding a match, we conclude
+that the reference is to an external address.</p>
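+
+<p>As a rough trace, here is how the steps would play out for the simple
+overlay shown earlier.  The region names A, B, and C are just labels for
+this walk-through, and the offsets and lengths are not in the listing;
+they are worked out by assuming BIT and JMP with absolute operands take
+3 bytes, and LDA immediate and BRA take 2.</p>
+<pre>
+Regions:
+  A: offset +000000, length 18, address $1000  (outer region)
+  B: offset +000003, length 7,  address $1100  (first overlay, child of A)
+  C: offset +00000a, length 8,  address $1100  (second overlay, child of A)
+
+BRA L1103 at +000008, target $1103:
+  start node is B; B has no children; B spans $1100-$1106, so the
+  reference resolves to L1103 in B.
+
+JMP L1103_0 at +00000f, target $1103:
+  start node is C; C has no children; C spans $1100-$1107, so the
+  reference resolves to L1103_0 in C.
+
+JMP L1100 at +000000, target $1100:
+  start node is A; its children are searched in offset order, and B
+  includes $1100, so the reference resolves to L1100 in B.
+</pre>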
+
+
+<h2><a name="reloc-data">OMF Relocation Dictionaries</a></h2>
+
+<p><i>This feature is considered experimental.  Some features,
+like cross-reference tracking, may not work correctly with it.</i></p>
+
+<p>65816 code can be tricky to disassemble for a number of reasons.
+24-bit addresses are formed from 16-bit data-access operands by combining
+with the Data Bank Register (DBR), which often requires a bit of manual
+intervention.  But the problems go beyond that.  Consider the following
+bit of source code for the Apple IIgs:</p>
+<pre>
+rsrcmsg    pea   rsrcmsg2|-16
+           pea   rsrcmsg2
+           _WriteCString
+           lda   #buffer
+           sta   pArcRead+$04
+           lda   #buffer|-16
+           sta   pArcRead+$06
+</pre>
+<p>In both cases we're referencing a 24-bit address as two 16-bit values.
+Without context, the disassembler will treat the PEA operands as two
+independent 16-bit addresses, and the immediate values as constants:</p>
+<pre>
+                               .dbank  $02
+02/317c: f4 02 00     L2317C   pea     L20002 &amp; $ffff
+02/317f: f4 54 32              pea     L23254 &amp; $ffff
+02/3182: a2 0c 20              ldx     #WriteCString
+02/3185: 22 00 00 e1           jsl     Toolbox
+02/3189: a9 00 00     L23189   lda     #$0000
+02/318c: 8d 78 3f              sta     L23F78 &amp; $ffff
+02/318f: a9 03 00              lda     #$0003
+02/3192: 8d 7a 3f              sta     L23F78 &amp; $ffff +2
+</pre>
+<p>Worse yet, those <code>STA</code> instruction operands would have been
+shown as hex values or incorrect labels if the DBR had been set incorrectly.
+However, if we have the relocation data, we know the full
+address from which the addresses were formed, and we can tell when
+immediate values are addresses rather than constants.  And we can do this
+even without setting the DBR.</p>
+<pre>
+02/317c: f4 02 00     L2317C   pea     L23254 >> 16
+02/317f: f4 54 32              pea     L23254 &amp; $ffff
+02/3182: a2 0c 20              ldx     #WriteCString
+02/3185: 22 00 00 e1           jsl     Toolbox
+02/3189: a9 00 00     L23189   lda     #L30000 &amp; $ffff
+02/318c: 8d 78 3f              sta     L23F78 &amp; $ffff
+02/318f: a9 03 00              lda     #L30000 >> 16
+02/3192: 8d 7a 3f              sta     L23F78 &amp; $ffff +2
+</pre>
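+<p>The split can be checked by hand against the operand bytes in the
+listing.  Taking the two 24-bit addresses that the labels stand for:</p>
+<pre>
+  L23254 = $023254
+    L23254 >> 16      = $0002   (operand bytes 02 00)
+    L23254 &amp; $ffff    = $3254   (operand bytes 54 32)
+
+  L30000 = $030000
+    L30000 &amp; $ffff    = $0000   (operand bytes 00 00)
+    L30000 >> 16      = $0003   (operand bytes 03 00)
+</pre>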
+<p>The absence of relocation data can be a useful signal as well.  For
+example, when pushing arguments for a toolbox call, the disassembler
+can tell the difference between addresses and constants without needing
+emulation or pattern-matching, because only the addresses get
+relocated.  Consider this bit of source code:</p>
+<pre>
+           lda   &lt;total_records
+           pha
+           pea   linebuf|-16
+           pea   linebuf+65
+           pea   $0005
+           pea   $0000
+           _Int2Dec
+</pre>
+<p>Without relocation data, it becomes:</p>
+<pre>
+02/0aa8: a5 42                 lda     $42
+02/0aaa: 48                    pha
+02/0aab: f4 02 00              pea     L20002 &amp; $ffff
+02/0aae: f4 03 31              pea     L23103 &amp; $ffff
+02/0ab1: f4 05 00              pea     L20005 &amp; $ffff
+02/0ab4: f4 00 00              pea     L20000 &amp; $ffff
+02/0ab7: a2 0b 26              ldx     #Int2Dec
+02/0aba: 22 00 00 e1           jsl     Toolbox
+</pre>
+<p>With relocation data, we can treat the non-relocated operands as constants:</p>
+<pre>
+02/0aa8: a5 42                 lda     $42
+02/0aaa: 48                    pha
+02/0aab: f4 02 00              pea     L230C2 >> 16
+02/0aae: f4 03 31              pea     L23103 &amp; $ffff
+02/0ab1: f4 05 00              pea     $0005
+02/0ab4: f4 00 00              pea     $0000
+02/0ab7: a2 0b 26              ldx     #Int2Dec
+02/0aba: 22 00 00 e1           jsl     Toolbox
+</pre>
+
+
+<h2><a name="debug">Debug Menu Options</a></h2>
+
+<p>The DEBUG menu is hidden by default in release builds, but can be
+exposed by checking the "enable DEBUG menu" box in the application
+settings.  These features are used for debugging SourceGen.  They will
+not help you debug 6502 projects.</p>
+
+<p>Features:</p>
+<ul>
+  <li>Re-analyze (F5).  Causes a full re-analysis.  Useful if you think
+    the display is out of sync.</li>
+  <li>Source Generation Tests.  Opens the regression test harness.  See
+    <code>README.md</code> in the SGTestData directory for more information.
+    If the regression tests weren't included in the SourceGen distribution,
+    this will have nothing to do.</li>
+  <li>Show Analyzer Output.  Opens a floating window with a text log from
+    the most recent analysis pass.  The exact contents will vary depending
+    on how the verbosity level is configured internally.  Debug messages
+    from extension scripts appear here.</li>
+  <li>Show Analysis Timers.  Opens a floating window with a dump of
+    timer results from the most recent analysis pass.  Times for individual
+    stages are noted, as are times for groups of functions.  This
+    provides a crude sense of where time is being spent.</li>
+  <li>Show Undo/Redo History.  Opens a floating window that lets you
+    watch the contents of the undo buffer while you work.</li>
+  <li>Extension Script Info.  Shows a bit about the currently-loaded
+    extension scripts.</li>
+  <li>Show Comment Rulers.  Adds a string of digits above every
+    multi-line comment (long comment, note).  Useful for confirming that
+    the width limitation is being obeyed.  These are added exactly
+    as shown, without comment delimiters, into generated assembly output,
+    which doesn't work out well if you run the assembler.</li>
+  <li>Disable Security Sandbox.  Extension scripts are loaded and run in
+    a "sandbox" to prevent security issues.  Setting this flag allows
+    them to execute with full permissions.
+    This setting is not persistent.</li>
+  <li>Disable Keep-Alive Hack.  The hack sends a "ping" to the extension
+    script sandbox every 60 seconds.  This seems to be required to avoid
+    an infrequently-encountered Windows bug.  (See code for notes and
+    stackoverflow.com links.)
+    This setting is not persistent.</li>
+  <li>Reboot Security Sandbox.  Discards the sandbox, creates a new one,
+    and reloads it.  Only useful for exercising the sandbox code that
+    runs when the keep-alives are unsuccessful.</li>
+  <li>Applesoft to HTML.  An experimental feature that formats an
+    Applesoft program as HTML.</li>
+  <li>Export Edit Commands.  Outputs comments and notes in
+    SourceGen Edit Command format.  This is an experimental feature.</li>
+  <li>Apply Edit Commands.  Reads a file in SourceGen Edit Command
+    format and applies the commands.</li>
+  <li>Apply External Symbols.  An experimental feature for turning platform
+    and project symbols into address labels.  This will run through the list
+    of all symbols loaded from .sym65 files and find addresses that fall
+    within the bounds of the file.  If it finds an address that is the start
+    of a code/data line and doesn't already have a user-supplied label,
+    and the platform symbol's label isn't already defined elsewhere, the
+    platform label will be applied.  Useful when disassembling ROM images
+    or other code with an established set of public entry points.
+    (Tip: disable "analyze uncategorized data" from the project
+    properties editor first, as this will not set labels in the middle
+    of multi-byte data items.)</li>
+</ul>
+
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,340 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Instruction and Data Analysis - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Instruction and Data Analysis</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<p><i>This section discusses the internal workings of SourceGen.  It is
+not necessary to understand this to use the program.</i></p>
+
+<h2><a name="analysis-process">Analysis Process</a></h2>
+
+<p>Analysis of the file data is a complex multi-step process.  Some
+changes to the project, such as adding a code start point or
+changing the CPU selection, require a full re-analysis of instructions
+and data.  Other changes, such as adding or removing a label, don't
+affect the code tracing and only require a re-analysis of the data areas.
+And some changes, such as editing a comment, only require a refresh
+of the displayed lines.</p>
+<p>It should be noted that none of the analysis results are stored in
+the project file.  Only user-supplied data, such as the locations of
+code entry points and label definitions, is written to the file.  This
+does create the possibility that two different users might get different
+results when opening the same project file with different versions of
+SourceGen, but these effects are expected to be minor.</p>
+
+<p>The analyzer performs the following steps (see the <code>Analyze</code>
+method in <code>DisasmProject.cs</code>):</p>
+<ul>
+  <li>Reset the symbol table.</li>
+  <li>Merge platform symbols into the symbol table, loading the files
+    in order.</li>
+  <li>Merge project symbols into the symbol table, stomping on any
+    platform symbols that conflict.</li>
+  <li>Merge user label symbols into the table, stomping any previous
+    entries.</li>
+  <li>Run the code analyzer.  The outcome of this is an array of analysis
+    attributes, or "anattribs", with one entry per byte in the file.
+    The Anattrib array tracks most of the state from here on.  If we're
+    doing a partial re-analysis, this step will just clone a copy of the
+    Anattrib array that was made at this point in a previous run.  (The
+    code analysis pass is described in more detail below.)</li>
+  <li>Apply user-specified labels to Anattribs.</li>
+  <li>Apply user-specified format descriptors.  These are the instruction
+    and data operand formats.</li>
+  <li>Run the data analyzer.  This looks for patterns in uncategorized
+    data, and connects instruction and data operands to target offsets.
+    The "nearby label" stuff is handled here.  Auto-labels are generated
+    for references to internal addresses.  All of the results are
+    stored in the Anattribs array.  (The data analysis pass is described in
+    more detail below.)</li>
+  <li>Remove hidden labels from the symbol table.  These are user-specified
+    labels that have been placed on offsets that are in the middle of an
+    instruction or multi-byte data item.  They can't be referenced, so we
+    want to pull them out of the symbol table.  (Remember, symbolic
+    operands use "weak references", so a missing symbol just means the
+    operand is shown as a hex value.)</li>
+  <li>Resolve references to local variables.  This sets the operand symbol
+    in Anattrib so we won't try to apply platform/project symbols to
+    zero-page addresses.  If we somehow ended up with a variable that has
+    the same name as a user label, we rename the variable.</li>
+  <li>Resolve references to platform and project external symbols.
+    This sets the operand symbol in Anattrib, and adds the symbol to
+    the list that is displayed in .EQ directives.</li>
+  <li>Generate cross-reference lists.  This is done for internal references,
+    for local variables, and for any platform/project symbols that are
+    referenced.</li>
+  <li>If annotated auto-labels are enabled, the simple labels are
+    replaced with the annotated versions here.  (This can't be done earlier
+    because the annotations are generated from the cross-reference data.)</li>
+  <li>In a debug build, some validity checks are performed.</li>
+</ul>
+
+<p>Once analysis is complete, a line-by-line display list is generated
+by walking through the annotated file data.  Most of the actual text
+isn't rendered until it's needed.  For complicated multi-line items
+like string operands, the formatted text must be generated to know how
+many lines it will occupy, so it's done immediately and cached for re-use
+on subsequent runs.</p>
+
+
+<h3><a name="auto-format">Automatic Formatting</a></h3>
+
+<p>Every offset in the file is marked as an instruction byte, data byte, or
+inline data byte.  Some offsets are also marked as the start of an instruction
+or data area.  The start offsets may have a format descriptor associated
+with them.</p>
+<p>Format descriptors have a format (like "numeric" or
+"null-terminated string"), a sub-format (like "hexadecimal" or
+"high ASCII"), and a length.  For
+an instruction operand the length is redundant, but for a data operand it
+determines the width of the numeric value or length of the string.  For
+this reason, instructions do not need a format descriptor, but all
+data items do.</p>
+<p>Symbolic references are format descriptors with a symbol attached.
+The symbol reference also specifies low/high/bank, for partial symbol
+references like <code>LDA #&gt;symbol</code>.</p>
+<p>Every offset marked as a start point gets its own line in the on-screen
+display list.  Embedded instructions are identified internally by
+looking for instruction-start offsets inside instructions.</p>
+
+<p>The Anattrib array holds the post-analysis state for every offset,
+including comments and formatting, but any changes you make in the
+editors are applied to the data structures that are saved in the project
+file.  After a change is made, a full or partial re-analysis is done to
+fill out the Anattribs.</p>
+<p>Consider a simple example:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1003
+L1003    NOP
+</pre>
+
+<p>We haven't explicitly formatted anything yet.  The data analyzer sees
+that the JMP operand is inside the file, and has no label, so it creates an
+auto-label at offset +000003 and a format descriptor with a symbolic
+operand reference to "L1003" at +000000.</p>
+<p>Suppose we now edit the label, changing L1003 to "FOO".  This goes into
+the project's "user label" list.  The analyzer is
+run, and applies the new "user label" to the Anattrib array.  The
+data analyzer finds the numeric reference in the JMP operand, and finds
+a label at the target address, so it creates a symbolic operand reference
+to "FOO".  When the display list is generated, the symbol "FOO" appears
+in both places.</p>
+<p>Even though the JMP operand changed from "L1003" to "FOO", the only
+change actually written to the project file is the label edit.  The
+contents of the Anattrib array are disposable, so it can be used to
+hold auto-generated labels and "fix up" numeric references.  Labels and
+format descriptors generated by SourceGen are never added to the
+project file.</p>
+
+<p>If the JMP operand were edited, a format descriptor would be added
+to the user-specified descriptor list.  During the analysis pass it would
+be added to the Anattrib array at offset +000000.</p>
+
+
+<h3><a name="undo-redo">Interaction With Undo/Redo</a></h3>
+
+<p>The analysis pass always considers the current state of the user
+data structures.  Whether you're adding a label or removing one, the
+code runs through the same set of steps.  The advantage of this approach
+is that the act of doing a thing, undoing a thing, and redoing a thing
+are all handled the same way.</p>
+<p>None of the editors modify the project data structures directly.  All
+changes are added to a change set, which is processed by a single
+"apply changes" function.  The change sets are kept in the undo/redo
+buffer indefinitely.  After
+the changes are made, the Anattrib array and other data structures are
+regenerated.</p>
+
+<p>Data format editing can create some tricky situations.  For example,
+suppose you have 8 bytes that have been formatted as two 32-bit words:</p>
+
+<pre>
+1000: 68690074           .dd4    $74006968
+1004: 65737400           .dd4    $00747365
+</pre>
+
+<p>You realize these are null-terminated strings, select both words, and
+reformat them:</p>
+
+<pre>
+1000: 686900             .zstr   "hi"
+1003: 74657374+          .zstr   "test"
+</pre>
+
+<p>Seems simple enough.  Under the hood, SourceGen created three changes:</p>
+<ol>
+  <li>At offset +000000, replace the current format descriptor (4-byte
+    numeric) with a 3-byte null-terminated string descriptor.</li>
+  <li>At offset +000003, add a new 5-byte null-terminated string
+    descriptor.</li>
+  <li>At offset +000004, remove the 4-byte numeric descriptor.</li>
+</ol>
+
+<p>Each entry in the change set has "before" and "after" states for the
+format descriptor at a specific offset.  Only the state for the affected
+offsets is included -- the program doesn't record the state of the full
+project after each change (even with the RAM on a modern system that would
+add up quickly).  When undoing a change, before and after are simply
+reversed.</p>
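+<p>A minimal Python sketch of the before/after idea follows.  The
+<code>FormatChange</code> class, the <code>apply_changes</code> function,
+and the descriptor strings are assumptions made for illustration; the real
+change sets carry more than format descriptors.</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FormatChange:
    """One change-set entry: the descriptor at `offset`, before and after."""
    offset: int
    before: Optional[str]  # None means "no descriptor at this offset"
    after: Optional[str]


def apply_changes(formats: Dict[int, str], changes: List[FormatChange],
                  undo: bool = False) -> None:
    """Apply a change set in place; undoing swaps the before/after roles."""
    # Undo walks the entries in reverse so the edit unwinds in order.
    for change in (reversed(changes) if undo else changes):
        new_value = change.before if undo else change.after
        if new_value is None:
            formats.pop(change.offset, None)
        else:
            formats[change.offset] = new_value


# The two-words-to-two-strings edit from the text, as three entries.
formats = {0: "numeric/4", 4: "numeric/4"}
change_set = [
    FormatChange(0, "numeric/4", "zstr/3"),
    FormatChange(3, None, "zstr/5"),
    FormatChange(4, "numeric/4", None),
]
apply_changes(formats, change_set)
after_edit = dict(formats)
apply_changes(formats, change_set, undo=True)
after_undo = dict(formats)
```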
+
+
+<h2><a name="code-analysis">Code Analysis</a></h2>
+
+<p>The code tracer walks through the instructions, examining them to
+determine where execution will proceed next.  There are five possibilities
+for every instruction:</p>
+<ol>
+  <li>Continue.  Execution always continues at the next instruction.
+    Examples: <code>LDA</code>, <code>STA</code>, <code>AND</code>,
+    <code>NOP</code>.</li>
+  <li>Don't continue.  The next instruction to be executed can't be
+    determined from the file data (unless you're disassembling the
+    system ROM around the BRK vector).
+    Examples: <code>RTS</code>, <code>BRK</code>.</li>
+  <li>Branch always.  The operand specifies the next instruction address.
+    Examples: <code>JMP</code>, <code>BRA</code>, <code>BRL</code>.</li>
+  <li>Branch sometimes.  Execution may continue at the operand address,
+    or may execute the following instruction.  If we know the value of
+    the flags in the processor status register, we can eliminate one
+    possibility.  Examples: <code>BCC</code>, <code>BEQ</code>,
+    <code>BVS</code>.</li>
+  <li>Call subroutine.  Execution will continue at the operand address,
+    and is expected to also continue at the following instruction.
+    Examples: <code>JSR</code>, <code>JSL</code>.</li>
+</ol>
+
+<p>Branch targets are added to a list.  When the current run of instructions
+is exhausted (i.e. a "don't continue" or "branch always" instruction is
+reached), the next target is pulled off of the list.</p>
+
+<p>The state of the processor status flags is recorded for every
+instruction.  When execution proceeds to the next instruction or branches
+to a new address, the flags are merged with the flags at the new
+location.  If one execution path through a given address has the flags
+in one state (say, the carry is clear), while another execution path
+sees a different state (carry is set), the merged flag is
+"indeterminate".  Indeterminate values cannot become determinate through
+a merge, but can be set by an instruction.</p>
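+<p>The merge rule can be written as a small three-valued function.  This
+Python sketch uses hypothetical names (<code>Flag</code>,
+<code>merge_flag</code>) and assumes the first visit to an address simply
+copies the incoming flags, so only later visits go through the merge.</p>

```python
from enum import Enum


class Flag(Enum):
    ZERO = 0
    ONE = 1
    INDETERMINATE = 2


def merge_flag(existing: Flag, incoming: Flag) -> Flag:
    """Merge one status flag from a new execution path into the recorded value."""
    if existing == incoming:
        return existing
    # Disagreement, or either side already unknown: the result is unknown.
    return Flag.INDETERMINATE


def merge_flags(existing: dict, incoming: dict) -> dict:
    """Merge every named flag (e.g. "C", "Z", "N") independently."""
    return {name: merge_flag(existing[name], incoming[name]) for name in existing}


merged = merge_flags({"C": Flag.ZERO, "Z": Flag.ONE},
                     {"C": Flag.ONE, "Z": Flag.ONE})
```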
+
+<p>There can be multiple paths to a single address.  If the analyzer
+sees that an instruction has been visited before, with an identical set
+of status flags, the analyzer stops pursuing that path.</p>
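+<p>Put together, the tracer behaves like a simple worklist algorithm.  The
+Python sketch below is a simplification under stated assumptions: the
+program is a hypothetical pre-decoded table rather than raw bytes, and the
+"visited" test ignores status flags, whereas the real analyzer also compares
+the flags before abandoning a path.</p>

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Set, Tuple


class Flow(Enum):
    CONTINUE = auto()
    NO_CONTINUE = auto()
    BRANCH_ALWAYS = auto()
    BRANCH_SOMETIMES = auto()
    CALL_SUBROUTINE = auto()


# Hypothetical decoded form: offset -> (length, flow type, target offset).
Instr = Tuple[int, Flow, Optional[int]]


def trace(program: Dict[int, Instr], entry: int) -> Set[int]:
    """Return the set of offsets reached as instruction starts."""
    visited: Set[int] = set()
    pending: List[int] = [entry]
    while pending:
        offset = pending.pop()
        # Follow one run of instructions until it ends or rejoins known code.
        while offset in program and offset not in visited:
            visited.add(offset)
            length, flow, target = program[offset]
            if target is not None and flow in (Flow.BRANCH_ALWAYS,
                                               Flow.BRANCH_SOMETIMES,
                                               Flow.CALL_SUBROUTINE):
                pending.append(target)
            if flow in (Flow.NO_CONTINUE, Flow.BRANCH_ALWAYS):
                break
            offset += length
    return visited


program = {
    0: (3, Flow.CALL_SUBROUTINE, 6),   # JSR to offset 6
    3: (2, Flow.BRANCH_SOMETIMES, 0),  # BEQ back to the start
    5: (1, Flow.NO_CONTINUE, None),    # RTS
    6: (1, Flow.CONTINUE, None),       # NOP
    7: (1, Flow.NO_CONTINUE, None),    # RTS
    8: (1, Flow.CONTINUE, None),       # never reached from the entry point
}
reached = trace(program, 0)
```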
+
+<p>The analyzer must always know the width of immediate load instructions
+when examining 65816 code, but it's possible for the status flag values
+to be indeterminate.  In such a situation, short registers are assumed.
+Similarly, if the carry flag is unknown when an <code>XCE</code> is
+performed, we assume a transition to emulation mode (E=1).</p>
+
+<p>There are three ways in which code can set a flag to a definite value:</p>
+<ol>
+  <li>With explicit instructions, like <code>SEC</code> or
+    <code>CLD</code>.</li>
+  <li>With immediate-operand instructions.  <code>LDA #$00</code> sets Z=1
+    and N=0.  <code>ORA #$80</code> sets Z=0 and N=1.</li>
+  <li>By inference.  For example, if we see a <code>BCC</code> instruction,
+    we know that the carry will be clear at the branch target address, and
+    set at the following instruction.  The instruction doesn't affect the
+    value of the flag, but we know what the value will be at both
+    addresses.</li>
+</ol>
+<p>Self-modifying code can spoil any of these, possibly requiring a
+status flag override to get correct disassembly.</p>
+
+<p>The instruction that is most likely to cause problems is <code>PLP</code>,
+which pulls the processor status flags off of the stack.  SourceGen
+doesn't try to track stack contents, so it can't know what values may
+be pulled.  In many cases the <code>PLP</code> appears not long after a
+<code>PHP</code>, so SourceGen can scan backward through the file to
+find the nearest <code>PHP</code>, and use the status flags from that.
+In practice this doesn't work well, but the "smart" behavior can be
+enabled from the project properties if desired.  Otherwise, a
+<code>PLP</code> causes all flags to be set to "indeterminate", except
+for the M/X flags on the 65816 which are left unmodified.</p>
+
+<p>Some other things that the code analyzer can't recognize automatically:</p>
+<ul>
+  <li>Jumping indirectly through an address outside the file, e.g.
+    storing an address in zero-page memory and jumping through it.</li>
+  <li>Jumping to an address by pushing the location onto the stack,
+    then executing an <code>RTS</code>.</li>
+  <li>Self-modifying code, e.g. overwriting a <code>JMP</code> instruction.</li>
+  <li>Addresses invoked by external code, e.g. interrupt handlers.</li>
+</ul>
+<p>Sometimes the indirect jump targets are coming from a table of
+addresses in the file.  If so, these can be formatted as addresses,
+and then the target locations tagged as code entry points.</p>
+<p>The 65816 adds an additional twist: some instructions combine their
+operands with the Data Bank Register ("B") to form a 24-bit address.
+SourceGen can't automatically determine what the register holds, so it
+assumes that it's equal to the program bank register ("K"), and provides
+a way to override the value.</p>
+
+
+<h3><a name="extension-scripts">Extension Scripts</a></h3>
+
+<p>Extension scripts can mark data that follows a JSR, JSL, or BRK as inline
+data, or change the format of nearby data or instructions.  The first
+time a JSR/JSL/BRK instruction is encountered, all loaded extension scripts
+that implement the appropriate interface are offered a chance to act.</p>
+
+<p>The first script that applies a format wins.  Attempts to re-format
+instructions or data that have already been formatted will fail.  This rule
+ensures that anything explicitly formatted by the user will not be
+overridden by a script.</p>
+
+<p>If code jumps into a region that is marked as inline data, the
+branch will be ignored.  If an extension script tries to flag bytes
+as inline data that have already been executed, the script will be
+ignored.  This can lead to a race condition in the analyzer if
+an extension script is doing the wrong thing.  (The race doesn't exist
+with inline data tags specified by the user, because those are applied
+before code analysis starts.)</p>
+
+
+<h2><a name="data-analysis">Data Analysis</a></h2>
+<p>The data analyzer performs two tasks.  It matches operands with
+offsets, and it analyzes uncategorized data.  This behavior can be
+modified in the
+<a href="settings.html#project-properties">project properties</a>.</p>
+
+<p>The data target analyzer examines every instruction and data operand
+to see if it's referring to an offset within the data file.  If the
+target is within the file, and has a label, a format descriptor with a
+weak symbolic reference to that label is added to the Anattrib array.  If
+the target doesn't have a label, the analyzer will either use a nearby
+label, or generate a unique label and use that.</p>
+<p>While most of the "nearby label" logic can be disabled, targets that
+land in the middle of an instruction are always adjusted backward to
+the instruction start.  This is necessary because labels are only visible
+if they're associated with the first (opcode) byte of an instruction.</p>
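+<p>The backward adjustment amounts to finding the last instruction start
+at or before the target.  A hedged Python sketch, assuming a sorted list of
+instruction-start offsets is available and that the caller already knows
+the target lies inside code (the function name is hypothetical):</p>

```python
from bisect import bisect_right
from typing import List


def adjust_to_instruction_start(target: int, instr_starts: List[int]) -> int:
    """Map an offset to the start of the instruction that contains it.

    instr_starts must be sorted in ascending order.
    """
    index = bisect_right(instr_starts, target) - 1
    if index < 0:
        raise ValueError("target precedes the first instruction")
    return instr_starts[index]


# Instructions start at +0, +3, +5, and +6; offset +4 is an operand byte.
starts = [0, 3, 5, 6]
adjusted = adjust_to_instruction_start(4, starts)
```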
+
+<p>The uncategorized data analyzer tries to find character strings and
+opportunities to use the ".FILL" operation.  It breaks the file into
+pieces, where contiguous regions hold nothing but data, are not split
+across address region start/end directives, are not interrupted by labels,
+and do not contain anything that the user has chosen to format.  Each
+region is scanned for matching patterns.  If a match is found, a format entry
+is added to the Anattrib array.  Otherwise, data is added as single-byte
+values.</p>
+
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,415 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Code Generation &amp; Assembly - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Code Generation &amp; Assembly</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<p>SourceGen can generate an assembly source file that, when fed into
+the target assembler, will recreate the original data file exactly.
+Every assembler is different, so support must be added to SourceGen
+for each.</p>
+<p>The generation / assembly dialog can be opened with File &gt; Assemble.</p>
+<p>If you want to show code to others, perhaps by adding a page to
+your web site, you can "export" the formatted code as text or HTML.
+This is explained in more detail <a href="#export-source">below</a>.</p>
+
+
+<h2><a name="generate">Generating Source Code</a></h2>
+
+<p>Cross assemblers tend to generate additional files, either compiler
+intermediaries ("file.o") or metadata ("_FileInformation.txt").  Some
+generators may produce multiple source files, perhaps a link script or
+symbol definition header to go with the assembly source.  To avoid
+spreading files across the filesystem, SourceGen does all of its work
+in the same directory where the project lives.  Before you can generate
+code, you have to have assigned your project a directory.  This is why
+you can't assemble a project until you've saved it for the first time.</p>
+
+<p>The Generate and Assemble dialog has a drop-down list near the top
+that lets you pick which assembler to target.  The name of the assembler
+will be shown with the detected version number.  If the assembler
+executable isn't configured, "[latest version]" will be shown instead
+of a version number.</p>
+<p>The Settings button will take you directly to the assembler configuration
+tab in the application settings dialog.</p>
+<p>Hit the Generate button to generate the source code into a file on disk.
+The file will use the project name, with the <code>.dis65</code> extension
+replaced by <code>_&lt;assembler&gt;.S</code>.</p>
+<p>The first 64KiB of each generated file will be shown in the preview
+window.  If multiple files were generated, you can use the "preview file"
+drop-down to select between them.  Line numbers are
+prepended to each line to make it easier to track down errors.</p>
+
+
+
+<h3><a name="localizer">Label Localizer</a></h3>
+<p>The label localizer is an optional feature that automatically converts
+some labels to an assembler-specific less-than-global label format.  Local
+labels may be reusable (e.g. using "]LOOP" for multiple consecutive
+loops is easier to understand than giving each one a unique label) or
+reduce the size of a generated link table.  There are usually restrictions
+on local labels, e.g. references to them may not be allowed to cross a
+global label definition, which the localizer factors in automatically.</p>
+
+
+<h3><a name="reserved-labels">Reserved Label Names</a></h3>
+<p>Some label names aren't allowed.  For example, 64tass reserves the
+use of labels that begin with two underscores.  Most assemblers will
+also prevent you from using opcode mnemonics as labels (which means
+you can't assemble <code>jmp jmp jmp</code>).</p>
+<p>If a label doesn't appear to be legal, the generated code will use
+a suitable replacement (e.g. <code>jmp_1 jmp jmp_1</code>).</p>
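+<p>One way such a replacement could be chosen is sketched below in Python.
+The mnemonic list is a small hypothetical subset, the double-underscore
+rule is the 64tass one mentioned above, and the numbering scheme is an
+assumption that happens to reproduce the <code>jmp_1</code> example; the
+actual generator may pick names differently.</p>

```python
from typing import Set

# Hypothetical subset of mnemonics; a real generator would use the full set.
OPCODE_MNEMONICS = {"jmp", "jsr", "lda", "sta", "nop", "rts"}


def is_legal(label: str) -> bool:
    """Reject opcode mnemonics and labels that start with two underscores."""
    return label.lower() not in OPCODE_MNEMONICS and not label.startswith("__")


def make_legal(label: str, in_use: Set[str]) -> str:
    """Return the label unchanged if legal, else a unique numbered variant."""
    if is_legal(label):
        return label
    base = label.lstrip("_") or "L"
    suffix = 1
    while True:
        candidate = f"{base}_{suffix}"
        if is_legal(candidate) and candidate not in in_use:
            return candidate
        suffix += 1


replacement = make_legal("jmp", set())
```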
+
+
+<h3><a name="platform-features">Platform-Specific Features</a></h3>
+<p>SourceGen needs to be able to assemble binaries for any system
+with any assembler, so it generally avoids platform-specific features.
+One exception to that is C64 PRG files.</p>
+<p>PRG files start with a 16-bit value that tells the OS where the
+rest of the file should be loaded.  The value is not usually part of
+the source code, but instead is generated by the assembler, based on
+the address of the first byte output.  If SourceGen detects that
+a file is PRG, the source generators for some assemblers will suppress
+the first 2 bytes, and instead pass appropriate meta-data (such as
+an additional command-line option) to the assembler.</p>
+<p>A file is treated as a PRG if:</p>
+<ul>
+  <li>it is between 3 and 65536 bytes long (inclusive)</li>
+  <li>the format at offset +000000 is a 16-bit numeric data item
+    (not executable code, not two 8-bit values, not the first part
+    of a 24-bit value, etc.)</li>
+  <li>there is an address region start directive at +000002</li>
+  <li>the 16-bit value at +000000 is equal to the address of the
+    byte at +000002</li>
+  <li>there is no label at offset +000000 (explicit or auto-generated)</li>
+</ul>
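+<p>The checks in the list above translate directly into code.  The Python
+sketch below is illustrative: <code>ProjectView</code> and its fields are a
+hypothetical summary of project state, not SourceGen types, and the load
+address is read little-endian as on the 6502.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectView:
    """Hypothetical summary of the project state that the check needs."""
    data: bytes
    format_at_0: str                  # e.g. "numeric16", "numeric8", "code"
    region_start_at_2: Optional[int]  # address of a region starting at +000002
    has_label_at_0: bool              # explicit or auto-generated


def looks_like_prg(p: ProjectView) -> bool:
    if not 3 <= len(p.data) <= 65536:
        return False
    if p.format_at_0 != "numeric16":
        return False
    if p.region_start_at_2 is None:
        return False
    # The header must equal the address assigned to the byte at +000002.
    load_address = int.from_bytes(p.data[0:2], "little")
    if load_address != p.region_start_at_2:
        return False
    return not p.has_label_at_0


sample = ProjectView(bytes([0x01, 0x08, 0xA9]), "numeric16", 0x0801, False)
labeled = ProjectView(bytes([0x01, 0x08, 0xA9]), "numeric16", 0x0801, True)
```

+<p>Adding a label at +000000, as in <code>labeled</code>, is enough to make
+the check fail, which matches the opt-out described below.</p>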
+<p>The definition is sufficiently narrow to avoid most false-positives.
+If a file is being treated as PRG and you'd rather it weren't, you
+can add a label or reformat the bytes.  This feature is currently only
+enabled for 64tass.</p>
+
+
+<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>
+
+<p>After generating sources, if you have a cross-assembler executable
+configured, you can run it by clicking the "Run Assembler" button.  The
+command-line output will be displayed, with stdout and stderr separated.
+(I'd prefer them to be interleaved, but that's not what the system
+provides.)</p>
+
+<p>The output will show the assembler's exit code, which will be zero
+on success (note: sometimes they lie.)  If it appeared to succeed,
+SourceGen will then compare the assembler's output to the original file,
+and report any differences.</p>
+<p>Failures here may be due to bugs in the cross-assembler or in
+SourceGen.  However, SourceGen can generally work around assembler bugs,
+so any failure is an opportunity for improvement.</p>
+
+
+<h2><a name="supported">Supported Assemblers</a></h2>
+
+<p>SourceGen currently supports the following cross-assemblers:</p>
+<ul>
+  <li><a href="#64tass">64tass</a></li>
+  <li><a href="#acme">ACME</a></li>
+  <li><a href="#cc65">cc65</a></li>
+  <li><a href="#merlin32">Merlin 32</a></li>
+</ul>
+
+<h3><a name="version">Version-Specific Code Generation</a></h3>
+
+<p>Code generation must be tailored to the specific version of the
+assembler.  This is most easily understood with an example.</p>
+<p>If the code has a statement like <code>MVN #$01,#$02</code>, the
+assembler is expected to output <code>54 02 01</code>, with the arguments
+reversed.  cc65 v2.17 got it backward; the behavior was fixed in v2.18.  The
+bug means we can't generate the same <code>MVN</code>/<code>MVP</code>
+instructions for both versions of the assembler.</p>
+<p>Having version-dependent source code is a bad idea.  If we generated
+reversed operands (<code>MVN #$02,#$01</code>), we'd get the correct
+output with v2.17, but the wrong output for v2.18.  Unambiguous code can
+be generated for all versions of the assembler by just outputting raw hex
+bytes, but that's ugly and annoying, so we don't want to be stuck doing
+that forever.  We want to detect which version of the assembler is in
+use, and output actual <code>MVN</code>/<code>MVP</code> instructions
+when producing code for newer versions of the assembler.</p>
+<p>When you configure a cross-assembler, SourceGen runs the executable with
+version query args, and extracts the version information from the output
+stream.  This is used by the generator to ensure that the output will compile.
+If no assembler is configured, SourceGen will produce code optimized
+for the latest version of the assembler.</p>
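+<p>The decision can be sketched as a version check in the generator.  In
+this Python illustration the function name and the version tuple are
+hypothetical, and <code>.byte</code> is assumed as the raw-data directive;
+the opcode values ($54 for <code>MVN</code>, $44 for <code>MVP</code>) and
+the destination-first byte order come from the 65816 encoding.</p>

```python
from typing import Optional, Tuple

BLOCK_MOVE_OPCODES = {"MVN": 0x54, "MVP": 0x44}


def gen_block_move(mnemonic: str, src_bank: int, dst_bank: int,
                   version: Optional[Tuple[int, int]]) -> str:
    """Emit a block move that assembles correctly for the given cc65 version.

    version is None when no assembler is configured; that case targets
    the latest version, as the text describes.
    """
    if version is None or version >= (2, 18):
        return f"{mnemonic} #${src_bank:02x},#${dst_bank:02x}"
    # Older versions swap the operands, so fall back to raw hex bytes.
    opcode = BLOCK_MOVE_OPCODES[mnemonic]
    return f".byte ${opcode:02x},${dst_bank:02x},${src_bank:02x}"


old_output = gen_block_move("MVN", 0x01, 0x02, (2, 17))
new_output = gen_block_move("MVN", 0x01, 0x02, (2, 18))
```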
+
+
+<h3><a name="quirks">Assembler-Specific Bugs &amp; Quirks</a></h3>
+
+<p>This is a list of bugs and quirky behavior in cross-assemblers that
+SourceGen works around when generating code.</p>
+<p>Every assembler seems to have a different way of dealing with expressions.
+Most of them will let you group expressions with parentheses, but that
+doesn't always help.  For example, <code>PEA label &gt;&gt; 8 + 1</code> is
+perfectly valid, but writing <code>PEA (label &gt;&gt; 8) + 1</code> will cause
+most assemblers to assume you're trying to use an alternate (and non-existent)
+form of <code>PEA</code> with indirect addressing, causing the assembler
+to halt with an error message.  The code generator needs
+to understand expression syntax and operator precedence to generate correct
+code, but also needs to know how to handle the corner cases.</p>
+
+
+<h3><a name="64tass">64tass</a></h3>
+
+<p>Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625
+<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>
+
+<p>Bugs:</p>
+<ul>
+  <li>[Fixed in v1.55.2176]
+    Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
+    the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
+  <li>[Fixed in v1.55.2176] WDM is not supported.</li>
+</ul>
+
+<p>Quirks:</p>
+<ul>
+  <li>The underscore character ('_') is allowed as a character in labels,
+    but when used as the first character in a label it indicates the
+    label is local.  If you create labels with leading underscores that
+    are not local, the labels must be altered to start with some other
+    character, and made unique.</li>
+  <li>Labels starting with two underscores are "reserved".  Trying to
+    use them causes an error.</li>
+  <li>By default, 64tass sets the first two bytes of the output file to
+    the load address.  The <code>--nostart</code> flag is used to
+    suppress this.</li>
+  <li>By default, 64tass is case-insensitive, but SourceGen treats labels
+    as case-sensitive.  The <code>--case-sensitive</code> flag must be passed
+    to the assembler.</li>
+  <li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
+    and operands must be lower-case.  Most of the SourceGen options that
+    cause things to appear in upper case must be disabled.</li>
+  <li>For 65816, selecting the bank byte is done with the grave accent
+    character ('`') rather than the caret ('^').  (There's a note in the
+    docs to the effect that they plan to move to carets.)</li>
+  <li>Instructions whose argument is formed by combining with the
+    65816 Program Bank Register (16-bit JMP/JSR) must be specified
+    as 24-bit values for code that lives outside bank 0.  This is
+    true for both symbols and raw hex (e.g. <code>JSR $1234</code>
+    is invalid outside bank 0).  Attempting to JSR to a label in bank
+    0 from outside bank 0 causes an error, even though it is technically
+    a 16-bit operand.</li>
+  <li>The arguments to COP and BRK require immediate-mode syntax
+    (<code>COP #$03</code> rather than <code>COP $03</code>).</li>
+  <li>For historical reasons, the default behavior of the assembler is to
+    assume that the source file is PETSCII, and the desired encoding for
+    strings is also PETSCII.  No character conversion is done, so anybody
+    assembling ASCII files will get ASCII strings (which works out pretty
+    well if you're assembling code for a non-Commodore target).  However,
+    the documentation says you're required to pass the "--ascii" flag when
+    the input is ASCII/UTF-8, so to build files that want ASCII operands
+    an explicit character encoding definition must be provided.</li>
+</ul>
+
+
+<h3><a name="acme">ACME</a></h3>
+
+<p>Tested versions: v0.96.4, v0.97
+<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>
+
+<p>Bugs:</p>
+<ul>
+  <li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
+    outside bank zero cannot be assembled.  SourceGen currently deals with
+    this by outputting the entire file as a hex dump.</li>
+  <li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
+  <li>BRK and WDM are not allowed to have operands.</li>
+</ul>
+
+<p>Quirks:</p>
+<ul>
+  <li>The assembler shares some traits with one-pass assemblers.  In
+    particular, if you forward-reference a zero-page label, the reference
+    generates a 16-bit absolute address instead of an 8-bit zero-page
+    address.  Unlike other one-pass assemblers, the width is "sticky",
+    and backward references appearing later in the file also use absolute
+    addressing even though the proper width is known at that point.  This is
+    worked around by using explicit "force zero page" annotations on
+    all references to zero-page labels.</li>
+  <li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
+    <code>ASR</code> instead.</li>
+  <li>Does not allow the accumulator to be specified explicitly as an
+    operand, e.g. you can't write <code>LSR A</code>.</li>
+  <li>[Fixed in v0.97.]
+    Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
+    before 8-bit operands.</li>
+  <li>Officially, the preferred file extension for ACME source code is ".a",
+    but this is already used on UNIX systems for static libraries (which
+    means shell filename completion tends to ignore them).  Since ".S" is
+    pretty universally recognized as assembly source, code generated by
+    SourceGen for ACME also uses ".S".</li>
+  <li>Version 0.97 started interpreting '\' in strings as an escape
+    character, to allow C-style escapes like "\n".  This requires escaping
+    all occurrences of '\' in data strings as "\\".  Compiling an older
+    source file with a newer version of ACME may fail unless you pass
+    a backward-compatibility command-line argument.</li>
+</ul>
+
+
+<h3><a name="cc65">cc65</a></h3>
+
+<p>Tested versions: v2.17, v2.18
+<a href="https://cc65.github.io/">[web site]</a></p>
+
+<p>Bugs:</p>
+<ul>
+  <li>PC relative branches don't wrap around at bank boundaries.</li>
+  <li>BRK can only be given an argument in 65816 mode.</li>
+  <li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
+  <li>[Fixed in v2.18] <code>BRK &lt;arg&gt;</code> is assembled to opcode
+    $05 rather than $00.</li>
+  <li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
+</ul>
+
+<p>Quirks:</p>
+<ul>
+  <li>Operator precedence is unusual.  Consider <code>label &gt;&gt; 8 - 16</code>.
+    cc65 puts shift higher than subtraction, whereas languages like C
+    and assemblers like 64tass do it the other way around.  So cc65
+    regards the expression as <code>(label &gt;&gt; 8) - 16</code>, while the
+    more common interpretation would be <code>label &gt;&gt; (8 - 16)</code>.
+    (This is actually somewhat convenient, since none of the expressions
+    SourceGen currently generates require parentheses.)</li>
+  <li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic AXS.  All
+    other opcodes match up with the "unintended opcodes" document.</li>
+  <li>ca65 is implemented as a single-pass assembler, so label widths
+    can't always be known in time.  For example, if you use some zero-page
+    labels, but they're defined via <code>.ORG $0000</code> after the point
+    where the labels are used, the assembler will already have generated them
+    as absolute values.  Width disambiguation must be applied to operands
+    that wouldn't be ambiguous to a multi-pass assembler.</li>
+  <li>Assignment of constants and variables (<code>=</code> and
+    <code>.set</code>) ends local label scope, so the label localizer
+    has to take variable assignment into account.</li>
+  <li>The assembler is geared toward generating relocatable code with
+    multiple segments (it is, after all, an assembler for a C compiler).
+    A linker configuration script is expected to be provided for anything
+    complex.  SourceGen generates a custom config file for each project.</li>
+</ul>
+
+
+<h3><a name="merlin32">Merlin 32</a></h3>
+
+<p>Tested versions: v1.0
+<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
+<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
+</p>
+
+<p>Bugs:</p>
+<ul>
+  <li>PC relative branches don't wrap around at bank boundaries.</li>
+  <li>For some failures, an exit code of zero is returned.</li>
+  <li>Immediate operands with a comma (e.g. <code>LDA #','</code>)
+    or curly braces (e.g. <code>LDA #'{'</code>) cause an error.</li>
+  <li>Some DP indexed store instructions cause errors if the label isn't
+    unambiguously DP (e.g. <code>STX $00,X</code> vs.
+    <code>STX $0000,X</code>).  This isn't a problem with project/platform
+    symbols, which are output as two-digit hex values when possible, but
+    causes failures when direct page locations are included in the project
+    and given labels.</li>
+  <li>The check for 64KiB overflow appears to happen before instructions
+    that might be absolute or direct page are resolved and reduced in size.
+    This makes it unlikely that a full 64KiB bank of code can be
+    assembled.</li>
+</ul>
+
+<p>Quirks:</p>
+<ul>
+  <li>Operator precedence is unusual.  Expressions are generally processed
+    from left to right.  The byte-selection operators have a lower
+    precedence than all of the others, and so are always processed last.</li>
+  <li>The byte-selection operators ('&lt;', '&gt;', '^') are actually
+    word-selection operators, yielding 16-bit values when wide registers
+    are enabled on the 65816.</li>
+  <li>Values loaded into registers are implicitly mod 256 or 65536.  There
+    is no need to explicitly mask an expression.</li>
+  <li>The assembler tracks register widths when it sees SEP/REP instructions,
+    but doesn't attempt to track the emulation flag.  So if you issue a
+    <code>REP #$20</code>
+    while in emulation mode, the assembler will incorrectly assume long
+    registers.  Ideally it would be possible to configure that off, but
+    there's no way to do that, so instead we occasionally generate
+    additional width directives.</li>
+  <li>Non-unique local labels should cause an error, but don't.</li>
+  <li>No undocumented opcodes are supported, nor are the Rockwell
+    65C02 instructions.</li>
+</ul>
+
+
+
+<h2><a name="export-source">Exporting Source Code</a></h2>
+<p>The "export" function takes what you see in the code list in the app
+and converts it to text or HTML.  The options you've set in the app
+settings, such as capitalization, text delimiters, pseudo-opcode names,
+operand expression style, and display of cycle counts are all taken into
+account.  The file generated is not expected to work with an actual
+assembler.</p>
+<p>The text output is similar to what you'd get by copying lines to the
+clipboard and pasting them into a text file, except that you have greater
+control over which columns are included.  The HTML version is augmented
+with links and (optionally) images.</p>
+
+<p>Use File &gt; Export to open the export dialog.  You have several
+options:</p>
+<ul>
+  <li><b>Include only selected lines</b>.  This allows you to choose between
+    exporting all or part of a file.  If no lines are selected, the entire
+    file will be exported.  This setting does <b>not</b> affect link generation
+    for HTML output, so you may have some dead internal links if you don't
+    export the entire file.</li>
+  <li><b>Include notes</b>.  Notes are normally excluded from generated
+    sources.  Check this to include them.</li>
+  <li><b>Show &lt;Column&gt;</b>.  The leftmost five columns are optional,
+    and will not appear in the output unless the appropriate option is
+    checked.</li>
+  <li><b>Column widths</b>.  These determine the minimum widths of the
+    rightmost four columns.  These are not hard limits: if the contents
+    of the column are too wide, the next column will start farther over.
+    The widths are not used at all for CSV output.</li>
+  <li><b>Text vs. CSV</b>.  For text generation, you can choose between
+    plain text and Comma-Separated Value format.  The latter is useful
+    for importing source code into another application, such as a
+    spreadsheet.</li>
+  <li><b>Generate image files</b>.  When exporting to HTML, selecting this
+    will cause GIF images to be generated for visualizations.</li>
+  <li><b>Overwrite CSS file</b>.  Some aspects of the HTML output's format
+    are defined by a file called "SGStyle.css", which may be shared between
+    multiple HTML files and customized.  The file is copied out
+    of the RuntimeData directory without modification.  It will be
+    created if it doesn't exist, but will not be overwritten unless this
+    box is checked.  The setting is <b>not</b> sticky, and will revert
+    to unchecked.  (Think of this as a proactive alternative to "are you
+    sure you wish to overwrite SGStyle.css?")</li>
+</ul>
+<p>Once you've picked your options, click either "Generate HTML" or
+"Generate Text", then select an output file name from the standard file
+dialog.  Any additional files generated, such as graphics for HTML pages,
+will be written to the same directory.</p>
+
+<p>All output uses UTF-8 encoding.  Filenames of HTML files will have '#'
+replaced with '_' to make linking easier.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,467 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Editors - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Editors</h1>
+<p><a href="index.html">Back to index</a></p>
+
+
+<h2><a name="address">Define Address Region</a></h2>
+
+<p><a href="intro-details.html#address-regions">Address regions</a>
+may be created, edited, resized, or removed.  Which
+operation is performed depends on the current selection.  You can
+specify the start and end points of a region by selecting the entire
+region, or by selecting just the first and last lines.</p>
+<p>In all cases, you can specify the range's initial address
+as a hexadecimal value.  You can prefix it with '$', but that's not
+required.
+24-bit addresses may be written with a bank separator, e.g. "12/3456"
+would resolve to address $123456.
+If you want to set the region to be non-addressable, enter
+"<code>NA</code>".</p>
+
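+<p>As a rough illustration of the input rules above, the following Python
+sketch parses an address field.  The function name
+<code>parse_region_address</code>, the case-insensitive handling of "NA",
+and the use of <code>None</code> for a non-addressable region are
+assumptions made for this example; it is not SourceGen's actual parser.</p>

```python
def parse_region_address(text):
    """Parse an address field; returns an int, or None for "NA".

    Hypothetical helper for illustration only.
    """
    s = text.strip()
    if s.upper() == "NA":                # case-insensitivity is an assumption
        return None                      # non-addressable region
    if s.startswith("$"):
        s = s[1:]                        # the '$' prefix is optional
    if "/" in s:
        bank, addr = s.split("/", 1)     # bank separator, e.g. "12/3456"
        return int(bank, 16) * 0x10000 + int(addr, 16)
    return int(s, 16)                    # plain hexadecimal value


print(hex(parse_region_address("12/3456")))
```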
+<p>You can also enter a <a href="intro-details.html#pre-labels">pre-label</a>
+or specify that the operand should be formatted as a
+<a href="intro-details.html#relative-addr">relative address</a>.</p>
+
+<p>To delete a region, click the "Delete Region" button.</p>
+
+<h4>Create</h4>
+
+<p>If your selection starts with a code or data line, the editor
+will allow you to create a new address region.  If a single line was
+selected, the default behavior will be to create a region with a
+floating end point.  If multiple lines were selected, the default
+behavior will be to create a region with a fixed end point.</p>
+
+<p>The address field will be initialized to the address of the
+first selected line.</p>
+
+<p>You can create a child region that shares the same start offset
+as an existing region by selecting the first code or data line
+within that region.  Note that regions with floating end points cannot
+have the same start offset as another region.</p>
+
+<h4>Edit</h4>
+
+<p>If you select only the address region start line, perhaps by
+double-clicking the operand there, you will be able to edit the
+current region's properties.</p>
+
+<p>If the region has a floating end point, you can choose to convert
+it to a fixed end.  The end doesn't move; it just gets fixed in place.
+This is a quick way to "lock down" regions once you've established
+their end points.</p>
+
+<h4>Resize</h4>
+
+<p>If you select multiple lines, and the first line is an address
+region start directive, you will be able to resize that region to
+the selection.  By definition, the updated region will have a fixed
+end point.</p>
+
+<h4>Other notes</h4>
+
+<p>There is no affordance for moving the start offset of a region.  You
+must create a new region and then delete the old one.</p>
+
+<p>Regions may not "straddle" the start or end points of other regions.</p>
+
+<p>Double-clicking on the pseudo-opcode of a region start or end
+declaration will move the selection to the other end, rather than
+opening the editor.</p>
+
+<p>To see detailed information about an address region in the "Info"
+window, select the region start or end directive.  You can see the
+current arrangement of address regions across your entire
+project with Navigate &gt; View Address Map.</p>
+
+
+
+<h2><a name="flags">Override Status Flags</a></h2>
+
+<p>The state of the processor status flags is tracked for every
+instruction.  Each individual flag is recorded as zero, one, or
+"indeterminate", meaning it could hold either value at the start of
+that instruction.  You can override the value of individual flags.</p>
+<p>The 65816 emulation bit, which is not part of the processor status
+register, may also be set in the editor.</p>
+<p>The M, X, and E flags will not be editable unless your CPU configuration
+is set to 65816.</p>
+
+
+<h2><a name="label">Edit Label</a></h2>
+<p>Sets or clears a label at the selected offset.  The label must have the
+<a href="intro-details.html#about-symbols">proper form</a>, and not have the same
+name as another symbol, unless it's specified to be non-unique.  If you
+edit an auto-generated label you will be required to change the name.</p>
+<p>The label may be marked as non-unique local, unique local, global,
+or global and exported.  The default is global.  If you start typing
+a label with the non-unique label prefix character (usually '@',
+configurable in
+<a href="settings.html#appset-displayformat">application settings</a>),
+the selection will automatically switch to non-unique local.</p>
+<p>Local labels may be "promoted" to global if the assembler requires it.
+Most assemblers define local scope as starting clean after each global
+label, but there are exceptions.  If a label's name conflicts or is
+incompatible with the assembler, it will be renamed.</p>
+<p>Exported labels are added to a table that may
+be imported by other projects (see
+<a href="advanced.html#multi-bin">Working With Multiple Binaries</a>).</p>
+
+
+<h2><a name="instruction-operand">Edit Operand (Instruction)</a></h2>
+<p>Operands can be formatted explicitly, or you can let the disassembler
+select the format for you.  By default, immediate constants and
+addresses with no matching symbol are formatted as hex.  Symbols
+defined as address labels, platform/project symbols, and local
+variables will be identified and applied automatically.</p>
+
+<h3><a name="explicit-format">Explicit Formats</a></h3>
+<p>Operands can be displayed in a variety of numeric formats, or as a
+symbol.  The character formats are only available for operands
+whose value falls into the proper range.  The ASCII format handles
+both plain and high ASCII; the correct encoding is chosen based on
+the operand's value.</p>
+<p>Symbols may be used in their entirety, or, when used as constants,
+can be shifted and masked.
+The low / high / bank selector determines which byte is used as the
+low byte.  For 16-bit operands, this acts as a shift rather than a byte
+select.  If the symbol is wider than the operand field, e.g. you're
+referencing a 16-bit address in an 8-bit constant, a mask will be
+applied automatically.</p>
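+<p>One way to picture the part selector is as a byte shift followed by a
+mask to the operand width.  The sketch below is an interpretation of the
+description above, not SourceGen's implementation; the name
+<code>apply_part</code> and its parameters are hypothetical.</p>

```python
def apply_part(symbol_value, part, operand_bytes):
    """Shift a symbol value by the selected part, then mask to operand width.

    part is "low", "high", or "bank"; operand_bytes is 1 for an 8-bit
    operand and 2 for a 16-bit operand.  Hypothetical helper.
    """
    shift_bytes = {"low": 0, "high": 1, "bank": 2}[part]
    shifted = symbol_value // (256 ** shift_bytes)   # drop the lower bytes
    return shifted % (256 ** operand_bytes)          # automatic mask


# An 8-bit operand picks out a single byte of a 24-bit symbol value.
print(hex(apply_part(0x123456, "high", 1)))
# A 16-bit operand behaves like a shift rather than a byte select.
print(hex(apply_part(0x123456, "high", 2)))
```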
+<p>The editor will try to prevent you from using auto-generated
+labels and local variables in the symbol field.  These types of symbols
+can be freely renamed by SourceGen, and thus cannot be reliably
+referenced by name.
+You can reference a non-unique local by writing it with the non-unique
+label prefix character (default '@').  Ambiguous non-unique references
+are not allowed, so if the symbol can't be found the label will
+be discarded.</p>
+<p>When you select a non-default format option, a "preview" of the
+formatted operand will be shown.</p>
+<p>The <code>MVN</code> and <code>MVP</code> instructions on the 65816
+are a bit peculiar, because they have two operands rather than one.
+SourceGen currently only allows you to set one format, which will be
+applied to both operands.  If you specify a symbol, the symbol will
+be used twice, adjusted if necessary.  (This limitation may be addressed
+in a future release.)</p>
+<p>The <code>BBR</code> and <code>BBS</code> instructions on the W65C02
+also have two operands: a direct page address, and a relative branch.
+In general the direct page address is ignored, so these are treated as
+branch instructions.</p>
+
+<p>The bottom part of the window has some shortcuts for working with
+address references and local variables.  These are primarily used to
+change the way things work when "Default" is selected.  The shortcuts
+don't cause any changes to the recorded format of the instruction
+being edited.  All of the actions can be performed elsewhere, by
+editing the label at the target address, editing the project symbol
+set, or editing a local variable table.</p>
+
+<h3><a name="shortcut-nar">Numeric Address References</a></h3>
+
+<p>For operands that are 8-bit, 16-bit, or 24-bit addresses, you can
+define a symbol for the address as a label or
+<a href="intro-details.html#symbol-types">project symbol</a>.</p>
+<p>If the operand is an address inside the project, you can set a
+label at that address.  If the address falls in the middle of an
+instruction or multi-byte data item, its position will be adjusted to
+the start.  Labels may be created, modified, or (by erasing the label)
+deleted.</p>
+<p>The label finder does not do the optional search for "nearby" labels
+that the main analyzer does, so there will be times when an instruction
+that is shown with a symbol in the code list won't have a symbol
+in the editor.</p>
+
+<p>If the operand is an address outside the project, e.g. a ROM
+address or I/O location, you can define a project symbol.  If a
+match was found in the configured platform definition files, it will be
+shown; it can't be edited, but it can be overridden by a project symbol.
+You can create or modify a project symbol by clicking on "Create Project
+Symbol" or "Edit Project Symbol".  You can't delete project symbols
+from this editor (use Project Properties instead).</p>
+
+<p>It's possible to have more than one project symbol for the same
+address.  For example, on the Apple II, reading from the memory-mapped
+I/O address $C000 returns the last key pressed, but writing to it
+changes the state of the 80-column display hardware, so it's useful to
+have two different names for it.  If more than one project symbol has the
+same address, the first one found will be used, which may not be
+what is desired.  In such situations, you should create the project
+symbol and then copy the symbol name into the operand.  You can do this
+in one step by clicking the "Copy to Operand" button.
+(In most cases you don't want to do this, because if the project
+symbol is deleted or renamed, you'll have operands that refer to a
+nonexistent symbol.  Unlike labels, project symbol renames do not
+refactor the rest of the project.)</p>
+
+<h3><a name="shortcut-local-var">Local Variable References</a></h3>
+
+<p>For zero-page address operands and (65816-only) stack-relative
+constant operands, a local variable can be created or modified.  This
+requires that a local variable table has been defined at or before
+the instruction being edited.</p>
+<p>If an existing entry is found, you will be able to edit the name
+and comment fields.  If not, a new entry with a generic name and
+pre-filled value field will be created in the nearest table.</p>
+
+
+<h2><a name="data-operand">Edit Operand (Data)</a></h2>
+
+<p>This dialog offers a variety of choices, and can be used to apply a
+format to multiple lines.  You must select all of the bytes you want
+to format.  For example, to format two bytes as a 16-bit word, you must
+select both bytes in the editor.  (If you click on the first item, then
+Shift+double-click on the operand field of the last item, you can do
+this very quickly.)  The selection does not need to be contiguous: you
+can use Control+click to select scattered items.</p>
+<p>If the range is discontiguous, crosses a logical boundary
+such as a change in address or a user-specified label, or crosses a
+visual boundary like a long comment, note, or visualization, the selection
+will be split into smaller regions.  A message at the
+top of the dialog indicates how many bytes have been selected, and how
+many regions they have been divided into.</p>
+<p>(End-of-line comments do <i>not</i> split a region, and will
+disappear if they end up inside a multi-byte data item.)</p>
+
+<p>The "Simple Data" items behave the same as their equivalents in the
+Edit Operand dialog.  However, because the width is not determined by
+an instruction opcode, and multiple items can be selected, you will need
+to specify how wide each item is and what its byte order is.  For data
+you also have the option of setting the format to "Address", which marks
+the selected bytes as a numeric reference.</p>
+
+<p>Consider a simple example: suppose you find a table of 16-bit
+addresses in the code.  Click on
+the first byte, shift-click the last byte, then select the Edit Data menu
+item.  The number of bytes selected should be even.  Select
+"16-bit words, little-endian", then over to the right click on
+"Address".  When you click OK, the selected data will be formatted as a
+series of 16-bit address values.  If the addresses can be resolved inside
+the data file, each address will be assigned a label.</p>
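+<p>To make the example concrete, this Python sketch shows how a run of
+bytes is interpreted as 16-bit little-endian addresses.  The function name
+<code>read_address_table</code> is hypothetical, and the code only
+illustrates the byte order; it does not model label assignment.</p>

```python
def read_address_table(data):
    """Decode a byte sequence as a list of 16-bit little-endian addresses."""
    if len(data) % 2 != 0:
        raise ValueError("selection must contain an even number of bytes")
    # The low byte comes first, followed by the high byte.
    return [data[i] + data[i + 1] * 256 for i in range(0, len(data), 2)]


table = bytes([0x00, 0x10, 0x34, 0x12])
print([hex(addr) for addr in read_address_table(table)])
```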
+
+<p>The "Bulk Data" items can represent large chunks of data compactly.
+The "fill" option is only available if all selected bytes have the
+same value.
+If a region of bytes is irrelevant, perhaps used only as padding, you
+can mark it as "junk".  If it appears to be adding bytes to reach a
+power-of-two address boundary, you can designate it as an alignment
+directive.  If you have multiple regions selected, only options that
+work for all regions will be shown.</p>
+
+<p>The "String" items are enabled or disabled depending on whether the
+data you have selected is in the appropriate format.  For example,
+"Null-terminated strings" is only enabled if the data regions are
+composed entirely of characters followed by $00.  Zero-length strings
+are allowed.
+DCI (Dextral Character Inverted) strings have the high bit on the last
+byte flipped; for PETSCII this will usually look like a series of
+lower-case letters followed by a capital letter, but may look odd if the
+last character is punctuation (e.g. '!' becomes $A1, which is a
+rectangle character that SourceGen will only display as hex).</p>
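+<p>The DCI convention can be sketched in a few lines of Python.  This
+example assumes plain ASCII input for simplicity, and the function name
+<code>to_dci</code> is hypothetical; PETSCII data follows the same rule
+but uses different character codes.</p>

```python
def to_dci(text):
    """Encode plain ASCII text as a DCI string (hypothetical helper).

    The high bit of the final byte is flipped to mark the end of the string.
    """
    raw = bytearray(text.encode("ascii"))
    if not raw:
        raise ValueError("a DCI string needs at least one character")
    raw[-1] ^= 0x80                      # flip the high bit on the last byte
    return bytes(raw)


# '!' is $21, so a trailing '!' is stored as $A1.
print(to_dci("HI!").hex())
```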
+<p>The character encoding can be selected, offering a choice between
+plain ASCII, low + high ASCII, C64 PETSCII, and C64 screen codes.  When
+you change the encoding, your available options may change.  The
+low + high ASCII setting will accept both, configuring the appropriate
+encoding based on the data values, but when identifying multiple strings
+it requires that each individual string be entirely one or the other.</p>
+<p>Due to fundamental limitations of the character set, C64 screen code
+strings cannot be null terminated ($00 is '@').</p>
+
+<p>As noted earlier, to avoid burying elements such as labels in the middle
+of a data item, contiguous areas may be split into smaller regions.  This
+can sometimes have unexpected effects.  For example, this can be formatted
+as two 16-bit words or one 32-bit word:</p>
+<pre>
+         .DD1    $01
+         .DD1    $ef
+         .DD1    $01
+         .DD1    $f0
+</pre>
+
+<p>With a label in the middle, it can be formatted as two 16-bit words, but
+not as a 32-bit word:</p>
+<pre>
+         .DD1    $01
+         .DD1    $ef
+LABEL    .DD1    $01
+         .DD1    $f0
+CODE     LDA     LABEL
+</pre>
+
+<p>If this is undesirable, you can add a label at a 32-bit boundary, and
+reference that instead:</p>
+<pre>
+LABEL    .DD1    $01
+         .DD1    $ef
+         .DD1    $01
+         .DD1    $f0
+CODE     LDA     LABEL+2
+</pre>
+
+<p>With the label out of the way, the data can be formatted as desired.</p>
+
+
+<h2><a name="comment">Edit Comment</a></h2>
+<p>Enter an end-of-line (EOL) comment, or leave the text field blank to
+delete it.  EOL comments may be placed on instruction and data lines, but
+not on assembler directives.</p>
+<p>It's wise to restrict comments to the ASCII character set, because
+not all assemblers can accept UTF-8 input.  Code generators for such
+assemblers will convert non-ASCII characters to '?' or something similar.
+If this isn't a concern, you can enter any characters you like.</p>
+<p>There is no fixed limit on the number of characters, but you may
+want to limit the overall length if you're hoping to create 80-column
+output.  Some retro assemblers may have hard line length limitations,
+which could result in the comment being truncated in generated sources.</p>
+<p>A semicolon (';') is placed at the start of the comment.  If an assembler
+has different conventions, a different delimiter character may be used.  You
+don't need to include a delimiter explicitly in the comment field.</p>
+
+<p>Comments on platform symbols are read from the platform symbol file, and
+cannot be edited from within SourceGen.  Comments on project symbols are
+stored in the project file, and can be edited with the project symbol
+editor.</p>
+
+
+<h2><a name="long-comment">Edit Long Comment</a></h2>
+<p>Long comments can be arbitrarily long and span multiple lines.  They
+will be word-wrapped at a line width of your choosing.  They're always
+drawn with a fixed-width font, so you can create ASCII-art diagrams.
+Comment delimiters are added automatically at the start of each line.</p>
+<p>For a true retro look you can "box" the comment with asterisks.  You
+can create a full-width row of asterisks by putting a '*' on a line by
+itself.  (Assembly source generators are allowed to use a character
+other than '*' for the output, e.g. they might use a full set of
+box outline characters, though that's somewhat against the spirit of
+the thing.  Regardless, a solo '*' results in a line.)</p>
+<p>The bottom window will update automatically as you type, showing what
+the output is expected to look like.  The actual assembler source output
+will depend on features of the target assembler, such as comment
+delimiter choices and maximum line length limitations.  For example,
+Merlin allows a leading '*' to indicate a comment, while cc65 does not,
+so cc65 code uses ";*" instead.  Because the length limitation affects
+the length of the line, not just the comment text, an asterisk-boxed
+comment will have one fewer character per line in cc65 output.</p>
+
+<p>Clear the text field to delete the comment.</p>
+<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
+
+<p>The long comment at the very top of the project is special, as it's
+not associated with a file offset.  If you delete it, you can get it
+back by using Edit &gt; Edit Header Comment.</p>
+
+<h2><a name="data-bank">Edit Data Bank (65816 only)</a></h2>
+
+<p>Sets the Data Bank Register (DBR) value for 65816 code.  This is used
+when matching 16-bit address operands with labels.  The new value is
+in effect from the line where it's declared to the end of the file, even
+across bank boundaries.
+If you leave the text field blank, the directive will be removed.</p>
+<p>A hexadecimal value from $00 to $ff can be entered directly.  As
+with other address inputs, a leading '$' is not required.  Entering
+"K" will set the DBR to the bank of the current address, and will
+automatically update if you change the address to a different bank.</p>
+<p>The pop-up menu has a list of all banks that hold code or data.
+To make them easier to identify, each is shown with the label on the
+first address in the bank, if any.</p>
+<p>While you can override automatically-generated data bank change
+directives, you can't remove them individually.  You can disable
+automatic generation by un-checking "smart PLB handling" in the project
+properties.</p>
+<p>Because the directive is frequently associated with <code>PLB</code>
+instructions, double-clicking on a <code>PLB</code> opcode in the
+code list will open the editor.</p>
+
+
+<h2><a name="note">Edit Note</a></h2>
+<p>Notes are similar to long comments, in that they can be arbitrarily
+long and span multiple lines.  However, because they're never included
+in generated output, options like line width formatting and boxing
+aren't relevant.</p>
+<p>Instead, you can select a highlight color for the note to make it
+stand out.  You may want to assign certain colors to specific things,
+e.g. blue for "I don't know what this is" or green for "this is a
+bookmark for the really interesting stuff".  The color will be applied
+to the note in the code list and in the "Notes" window.</p>
+<p>If you don't like the standard colors you can define your own.
+You can do this with web RGB syntax, which uses a '#' followed by
+two hex digits per channel.  For example, bright red is
+<code>#ff0000</code>, while teal is <code>#008080</code>.  You can
+also simply type a color name like "violet" so long as it appears in the
+<a href="https://docs.microsoft.com/en-us/dotnet/media/art-color-table.png?view=netframework-4.8">list of Microsoft .NET colors</a>.</p>
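+<p>For reference, web RGB syntax breaks down into three two-digit hex
+channels.  The Python sketch below only illustrates the notation; the
+function name <code>parse_web_rgb</code> is hypothetical, and named colors
+are not handled.</p>

```python
def parse_web_rgb(text):
    """Split a '#rrggbb' string into (red, green, blue) integers."""
    s = text.strip()
    if len(s) != 7 or not s.startswith("#"):
        raise ValueError("expected '#' followed by six hex digits")
    # Two hex digits per channel, in red / green / blue order.
    return tuple(int(s[i:i + 2], 16) for i in (1, 3, 5))


print(parse_web_rgb("#ff0000"))   # bright red
print(parse_web_rgb("#008080"))   # teal
```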
+
+<p>Clear the text field to delete the note.</p>
+<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
+
+
+<h2><a name="project-symbol">Edit Project Symbol</a></h2>
+<p>This is used to edit the properties of a project symbol.</p>
+<p>Symbols marked as "address" will be applied automatically when an
+operand references an address outside the scope of the data file.  They
+will not be applied to addresses inside the data file.  Symbols
+marked as "constant" are not applied automatically, and must be
+explicitly specified as an operand.</p>
+<p>The label must meet the criteria for symbols (see
+<a href="intro-details.html#about-symbols">All About Symbols</a>), and must
+not have the same name as another project symbol.  It can overlap
+with platform symbols and user labels.</p>
+<p>The value may be entered in decimal, hexadecimal, or binary.  The numeric
+base you choose will be remembered, so that the value will be displayed
+the same way when used in a .EQ directive.</p>
+<p>You can optionally provide a width for address symbols.  For example,
+if the address is of a two-byte pointer or a 64-byte buffer, you would
+set the width field to cause all references to any location in that range
+to be set to the symbol.  Widths may be entered in hex or decimal.  If
+the field is left blank, a width of 1 is assumed.  Overlapping symbols
+are allowed.  The width is ignored for constants.</p>
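+<p>The effect of the width field can be sketched as a range lookup.  In
+this Python example the function name <code>symbol_for_address</code>, the
+tuple layout, the "first match wins" handling of overlaps, and the
+<code>NAME+offset</code> rendering are all assumptions made for
+illustration, not a description of SourceGen's internals.</p>

```python
def symbol_for_address(symbols, address):
    """Find an address symbol whose range covers the given address.

    symbols is a list of (name, value, width) tuples; a width of None
    stands for a blank field and is treated as 1.
    """
    for name, value, width in symbols:
        span = width if width else 1
        if address in range(value, value + span):
            offset = address - value
            return name if offset == 0 else f"{name}+{offset}"
    return None                          # no symbol covers this address


symbols = [("PTR", 0x06, 2), ("BUF", 0x0200, 64), ("FLAG", 0x10, None)]
print(symbol_for_address(symbols, 0x07))
print(symbol_for_address(symbols, 0x0240))
```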
+<p>If you enter a comment, it will be placed at the end of the line of
+the .EQ directive.</p>
+<p>For address symbols that represent a memory-mapped I/O location, it
+can be useful to have different symbols for reads and writes.  Use
+the Read/Write checkboxes to specify the desired behavior.</p>
+
+
+<h2><a name="lvtable">Create/Edit Local Variable Table</a></h2>
+<p><a href="intro-details.html#local-vars">Local variables</a> are arranged in
+tables, which are created at a specific file offset.  They must be
+associated with a line of code, and are usually placed at the start of
+a subroutine.
+The "Create Local Variable Table" action creates a new table, and
+opens the editor.  The "Edit Prior Local Variable Table" searches
+for the closest table that appears at or before the selected line,
+and edits that.</p>
+<p>The editor allows you to create, edit, and delete entries, as well
+as move and delete entire tables (though these last two options are not
+available when creating a new table).  Empty tables are allowed.  These
+can be useful if the "clear previous" flag is set.  If you want to
+delete the table, click the "Delete Table" button.</p>
+<p>Use the buttons to add, edit, or remove individual variables.  Each
+variable has a name, a value, a width, and an optional comment.  The
+standard naming rules for symbols apply.  Variables are only used for
+zero-page and stack-relative operands, so all values must fall in the
+range 0-255.  The width may extend one byte past the end (to address $0100)
+to allow 16-bit accesses to $ff (particularly useful on 65816).</p>
+<p>You can move a table to any offset that is the start of an instruction
+and doesn't already have a local variable table present.  Click the
+"Move Table" button and enter the new offset in hex.  You can also click
+on the up/down buttons to move to the next valid offset.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,86 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>End notes - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: End Notes</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="origins">Origins</a></h2>
+<p>The inspiration for SourceGen goes a long way back.  While in high
+school in the late 1980s, I read Don Lancaster's
+<i>Enhancing Your Apple II, Vol. 1</i> (available for download
+<a href="https://www.tinaja.com/ebksamp1.shtml">here</a>).  This
+included a very detailed methodology for disassembling 6502 software
+(nicely reformatted
+<a href="https://www.tinaja.com/ebooks/tearing_rework.pdf">here</a>).
+I wanted to give it a try, so I generated a monitor listing of an
+operating system (called "RDOS") that SSI used on their games, and
+printed it out on my Epson RX-80 -- tractor feed paper was helpful for
+this sort of thing -- then set to work.</p>
+
+<p>Lancaster's methodology involved highlighting different types of
+instructions with different colors, making notes, and adding labels.
+All of this was done with felt-tip and colored highlighter pens.  The
+process worked remarkably well: by the time I was finished marking
+things up, I knew how everything in the code worked.</p>
+
+<p>I really wanted a better system though.  The disassembler built into
+the Apple II could get out of sync when it walked through a data area,
+so sometimes you had to hand-write in the correct instruction.  Applying
+a label to every place that referenced it was tedious.  When you got to
+the end, you had a colorful printout, but you couldn't run that through
+an assembler.</p>
+
+<p>There were commercially-available disassemblers that generated source
+code and removed some of the tedium from the process, and for many tasks
+they solved the problem nicely.  What I really wanted, though, looked more
+like a modern IDE, because I didn't just want it to translate machine code
+into readable form.  I wanted it to help me with the process of
+understanding the code, by providing cross-reference tables and symbol
+lists and giving me a place to scribble notes to myself while I worked.
+I especially wanted the note-scribbling, because learning how something
+works is usually an iterative process, where the function of a chunk of
+code gradually reveals itself over time.</p>
+
+<p>In 2002, while writing the 6502/65816 disassembler for CiderPress, I
+ran into the same problems I had with the original Apple II monitor: it
+blundered through data sections and got lost briefly when a new code
+section started.  You had to pick long or short registers for the entire
+disassembly, which made 65816 code something of a disaster.  I
+jotted down some notes on what I thought the core features of a good
+6502 disassembler should be, then moved on to work on other features.  It
+was another 15 years before I picked up the idea again.</p>
+
+<p>More recently, I disassembled some code by dumping it to a text
+file with CiderPress and then fiddling with it in a text editor.  I could
+leave free-form notes, but when I found some code that I wanted to
+exercise a bit I realized that getting it into an assembler was going
+to take some effort.  Raw addresses needed to be converted to labels,
+the address and byte dump in the left column needed to be stripped out --
+really just some basic text and string replace operations, but tedious
+to do by hand.</p>
+
+<p>The original design for SourceGen was substantially less feature-rich
+than the final result.  I kept discovering opportunities for features
+that I wanted to have, or at least wanted to write.  The result is
+something of a monument to creeping featurism.  Hopefully the core features
+are solid enough to excuse the excesses.</p>
+
+<p>-- Andy McFadden, September 2018</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,214 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Contents - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen Reference Manual</h1>
+<p>SourceGen is an interactive disassembler for 6502, 65C02,
+and 65816 code.  The official web site is
+<a href="https://6502bench.com/">https://6502bench.com/</a>.</p>
+
+<p>If you want to get up to speed quickly, start with the
+<a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
+
+<h2>Contents</h2>
+<ul>
+  <li><a href="intro.html">Overview</a>
+  <ul>
+    <li><a href="intro.html#fundamental-concepts">Fundamentals</a>
+      <ul>
+        <li><a href="intro.html#begin">About 6502 Code</a></li>
+        <li><a href="intro.html#charenc">Character Encoding</a></li>
+        <li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
+      </ul></li>
+    <li><a href="intro.html#sgintro">How SourceGen Works</a></li>
+  </ul></li>
+  <li><a href="intro-details.html">Digging Deeper</a>
+  <ul>
+    <li><a href="intro-details.html#about-symbols">All About Symbols</a>
+      <ul>
+        <li><a href="intro-details.html#connecting-operands">Connecting Operands With Labels</a></li>
+        <li><a href="intro-details.html#internal-address-symbols">Internal Address Symbols</a></li>
+        <li><a href="intro-details.html#external-address-symbols">External Address Symbols</a></li>
+        <li><a href="intro-details.html#unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></li>
+        <li><a href="intro-details.html#weak-refs">Weak Symbolic References</a></li>
+        <li><a href="intro-details.html#symbol-parts">Parts and Adjustments</a></li>
+        <li><a href="intro-details.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
+      </ul></li>
+    <li><a href="intro-details.html#width-disambiguation">Width Disambiguation</a></li>
+    <li><a href="intro-details.html#address-regions">Address Regions</a>
+      <ul>
+        <li><a href="intro-details.html#fixed-float">Fixed vs. Floating</a></li>
+        <li><a href="intro-details.html#non-addr">Non-Addressable Areas</a></li>
+        <li><a href="intro-details.html#pre-labels">Pre-Labels</a></li>
+        <li><a href="intro-details.html#relative-addr">Relative Addressing</a></li>
+      </ul></li>
+    <li><a href="intro-details.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
+    <li><a href="intro-details.html#atags">Directing the Code Analyzer</a>
+      <ul>
+        <li><a href="intro-details.html#scripts">Extension Scripts</a></li>
+      </ul></li>
+  </ul></li>
+
+  <li><a href="mainwin.html">Using SourceGen</a>
+  <ul>
+    <li><a href="mainwin.html#starting-new">Starting a New Project</a></li>
+    <li><a href="mainwin.html#opening">Opening an Existing Project</a></li>
+    <li><a href="mainwin.html#working">Working With a Project</a>
+    <ul>
+      <li><a href="mainwin.html#code-list">Code List</a></li>
+      <li><a href="mainwin.html#undo">Undo &amp; Redo</a></li>
+      <li><a href="mainwin.html#references">References Window</a></li>
+      <li><a href="mainwin.html#notes">Notes Window</a></li>
+      <li><a href="mainwin.html#symbols">Symbols Window</a></li>
+      <li><a href="mainwin.html#info">Info Window</a></li>
+      <li><a href="mainwin.html#messages">Messages Window</a></li>
+      <li><a href="mainwin.html#navigation">Navigation</a></li>
+      <li><a href="mainwin.html#atags">Adding and Removing Analyzer Tags</a></li>
+      <li><a href="mainwin.html#address-table">Format Address Table</a></li>
+      <li><a href="mainwin.html#toggle-single">Toggle Single-Byte Format</a></li>
+      <li><a href="mainwin.html#format-as-word">Format As Word</a></li>
+      <li><a href="mainwin.html#toggle-data">Toggle Data Scan</a></li>
+      <li><a href="mainwin.html#clipboard">Copying to Clipboard</a></li>
+    </ul></li>
+  </ul></li>
+
+  <li><a href="editors.html">Editors</a>
+  <ul>
+    <li><a href="editors.html#address">Define Address Region</a></li>
+    <li><a href="editors.html#flags">Override Status Flags</a></li>
+    <li><a href="editors.html#label">Edit Label</a></li>
+    <li><a href="editors.html#instruction-operand">Edit Operand (Instruction)</a>
+      <ul>
+        <li><a href="editors.html#explicit-format">Explicit Formats</a></li>
+        <li><a href="editors.html#shortcut-nar">Numeric Address References</a></li>
+        <li><a href="editors.html#shortcut-local-var">Local Variable References</a></li>
+      </ul></li>
+    <li><a href="editors.html#data-operand">Edit Operand (Data)</a></li>
+    <li><a href="editors.html#comment">Edit Comment</a></li>
+    <li><a href="editors.html#long-comment">Edit Long Comment</a></li>
+    <li><a href="editors.html#data-bank">Edit Data Bank (65816 only)</a></li>
+    <li><a href="editors.html#note">Edit Note</a></li>
+    <li><a href="editors.html#project-symbol">Edit Project Symbol</a></li>
+    <li><a href="editors.html#lvtable">Create / Edit Local Variable Table</a></li>
+  </ul></li>
+
+  <li><a href="visualization.html">Visualizations</a>
+  <ul>
+    <li><a href="visualization.html#overview">Overview</a></li>
+    <li><a href="visualization.html#vis-and-sets">Visualizations and Visualization Sets</a></li>
+    <li><a href="visualization.html#runtime">Scripts Included with SourceGen</a></li>
+  </ul></li>
+
+  <li><a href="codegen.html">Code Generation &amp; Assembly</a>
+  <ul>
+    <li><a href="codegen.html#generate">Generating Source Code</a>
+    <ul>
+      <li><a href="codegen.html#localizer">Label Localizer</a></li>
+      <li><a href="codegen.html#reserved-labels">Reserved Label Names</a></li>
+      <li><a href="codegen.html#platform-features">Platform-Specific Features</a></li>
+    </ul></li>
+    <li><a href="codegen.html#assemble">Cross-Assembling Generated Code</a></li>
+    <li><a href="codegen.html#supported">Supported Assemblers</a>
+    <ul>
+      <li><a href="codegen.html#version">Version-Specific Code Generation</a></li>
+      <li><a href="codegen.html#quirks">Assembler-Specific Bugs &amp; Quirks</a>
+      <ul>
+        <li><a href="codegen.html#64tass">64tass</a></li>
+        <li><a href="codegen.html#acme">ACME</a></li>
+        <li><a href="codegen.html#cc65">cc65</a></li>
+        <li><a href="codegen.html#merlin32">Merlin 32</a></li>
+      </ul></li>
+    </ul></li>
+    <li><a href="codegen.html#export-source">Exporting Source Code</a></li>
+  </ul></li>
+
+  <li><a href="settings.html">Properties &amp; Settings</a>
+  <ul>
+    <li><a href="settings.html#app-settings">Application Settings</a>
+    <ul>
+      <li><a href="settings.html#appset-codeview">Code View</a></li>
+      <li><a href="settings.html#appset-textdelim">Text Delimiters</a></li>
+      <li><a href="settings.html#appset-asmconfig">Asm Config</a></li>
+      <li><a href="settings.html#appset-displayformat">Display Format</a></li>
+      <li><a href="settings.html#appset-pseudoop">Pseudo-Op</a></li>
+    </ul></li>
+    <li><a href="settings.html#project-properties">Project Properties</a>
+    <ul>
+      <li><a href="settings.html#projprop-general">General</a></li>
+      <li><a href="settings.html#projprop-projsym">Project Symbols</a></li>
+      <li><a href="settings.html#projprop-symfiles">Symbol Files</a></li>
+      <li><a href="settings.html#projprop-extscripts">Extension Scripts</a></li>
+    </ul></li>
+  </ul></li>
+
+  <li><a href="tools.html">Tools</a>
+  <ul>
+    <li><a href="tools.html#instruction-chart">Instruction Chart</a></li>
+    <li><a href="tools.html#ascii-chart">ASCII Chart</a></li>
+    <li><a href="tools.html#apple2-screen-chart">Apple II Screen Chart</a></li>
+    <li><a href="tools.html#hexdump">Hex Dump Viewer</a></li>
+    <li><a href="tools.html#file-concat">File Concatenator</a></li>
+    <li><a href="tools.html#file-slicer">File Slicer</a></li>
+    <li><a href="tools.html#omf-converter">OMF Converter</a></li>
+  </ul></li>
+
+  <li><a href="advanced.html">Advanced Topics</a>
+  <ul>
+    <li><a href="advanced.html#platform-symbols">Platform Symbol Files (.sym65)</a></li>
+    <li><a href="advanced.html#extension-scripts">Extension Scripts</a></li>
+    <li><a href="advanced.html#multi-bin">Working With Multiple Binaries</a></li>
+    <li><a href="advanced.html#overlap">Overlapping Address Spaces</a></li>
+    <li><a href="advanced.html#reloc-data">OMF Relocation Dictionaries</a></li>
+    <li><a href="advanced.html#debug">Debug Menu Options</a></li>
+  </ul></li>
+
+  <li><a href="analysis.html">Appendix: Instruction and Data Analysis</a>
+  <ul>
+    <li><a href="analysis.html#analysis-process">Analysis Process</a>
+    <ul>
+      <li><a href="analysis.html#auto-format">Automatic Formatting</a></li>
+      <li><a href="analysis.html#undo-redo">Interaction With Undo/Redo</a></li>
+    </ul></li>
+    <li><a href="analysis.html#code-analysis">Code Analysis</a>
+    <ul>
+      <li><a href="analysis.html#extension-scripts">Extension Scripts</a></li>
+    </ul></li>
+    <li><a href="analysis.html#data-analysis">Data Analysis</a></li>
+  </ul></li>
+
+  <li><a href="end-notes.html">End Notes</a> </li>
+
+  <br/>
+
+<!--
+  <li><a href="tutorials.html">Tutorials</a>
+  <ul>
+    <li><a href="tutorials.html#basic-features">Tutorial #1: Basic Features</a></li>
+    <li><a href="tutorials.html#advanced-features">Tutorial #2: Advanced Features</a></li>
+    <li><a href="tutorials.html#address-tables">Tutorial #3: Address Table Formatting</a></li>
+    <li><a href="tutorials.html#extension-scripts">Tutorial #4: Extension Scripts</a></li>
+    <li><a href="tutorials.html#visualizations">Tutorial #5: Visualizations</a></li>
+  </ul></li>
+-->
+
+</ul>
+
+
+
+</div>
+
+<div id="footer">
+<hr/>
+<p>Copyright 2020 faddenSoft</p>
+</div>
+</body>
+</html>
@@ -0,0 +1,958 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>More Details - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Intro Details</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="more-details">More Details</a></h2>
+
+<p>This section digs a little deeper into how SourceGen works.</p>
+
+
+
+<h2><a name="about-symbols">All About Symbols</a></h2>
+
+<p>A symbol has two essential parts, a label and a value.  The label is a short
+ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
+constant.  Symbols can be defined in different ways, and applied in
+different ways.</p>
+
+<p>The label syntax is restricted to a format that should be compatible
+with most assemblers:</p>
+<ul>
+  <li>2-32 characters long.</li>
+  <li>Starts with a letter or underscore.</li>
+  <li>Composed of ASCII letters, numbers, and the underscore.</li>
+</ul>
+<p>Label comparisons are case-sensitive, as is customary for programming
+languages.</p>
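+<p>The syntax rules above can be captured in a short check.  This is only
+an illustrative sketch in Python, not SourceGen's own code; the names
+<code>LABEL_RE</code> and <code>is_valid_label</code> are made up for
+this example.</p>

```python
import re

# Hypothetical helper: 2-32 characters, first is a letter or underscore,
# the rest are ASCII letters, digits, or underscores.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")

def is_valid_label(label):
    """Return True if 'label' follows the syntax rules listed above."""
    return LABEL_RE.fullmatch(label) is not None
```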
+<p>Sometimes the purpose of a subroutine or variable isn't immediately
+clear, but you can take a reasonable guess.  You can document your
+uncertainty by adding a question mark ('?') to the end of the label.
+This isn't really part of the label, so it won't appear in the assembled
+output, and you don't have to include it when searching for a symbol.</p>
+<p>Some assemblers restrict the set of valid labels further.  For example,
+64tass uses a leading underscore to indicate a local label, and reserves
+a double leading underscore (e.g. <code>__label</code>) for its own
+purposes.  In such cases, the label will be modified to comply with the
+target assembler syntax.</p>
+
+<p>Operands may use parts of symbols.  For example, if you have a label
+<code>MYSTRING</code>, you can write:</p>
+<pre>
+MYSTRING .STR    "hello"
+         LDA     #&lt;MYSTRING
+         STA     $00
+         LDA     #&gt;MYSTRING
+         STA     $01
+</pre>
+<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>
+
+<p>Symbols that represent a memory address within a project are treated
+differently from those outside a project.  We refer to these as internal
+and external addresses, respectively.</p>
+
+
+<h3><a name="connecting-operands">Connecting Operands with Labels</a></h3>
+
+<p>Suppose you have the following code:</p>
+<pre>
+         LDA     $1234
+         JSR     $2345
+</pre>
+<p>If we put that in a source file, it will assemble correctly.
+However, if those addresses are part of the file, the code may break if
+changes are made and things assemble to different addresses.  It would
+be better to generate code that references labels, e.g.:</p>
+<pre>
+         LDA     my_data
+         JSR     nifty_func
+</pre>
+<p>SourceGen tries to establish labels for address operands automatically.
+How this works depends on whether the operand's address is inside the file or
+external, and whether there are existing labels at or near the target
+address.  The details are explored in the next few sections.</p>
+<p>On the 65816 this process is trickier, because addresses are 24 bits
+instead of 16.  For a control-transfer instruction like <code>JSR</code>,
+the high 8 bits come from the Program Bank Register (K).  For a data-access
+instruction like <code>LDA</code>, the high 8 bits come from the Data
+Bank Register (B).  The PBR value is determined by the address in which
+the code is executing, so it's easy to determine.  The DBR value can be
+set arbitrarily.  Sometimes it's easy to figure out, sometimes it has
+to be specified manually.</p>
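+<p>As a rough illustration of the arithmetic, the sketch below combines
+a bank register value with a 16-bit operand.  The function name and the
+bank values are assumptions chosen for the example, not something taken
+from SourceGen.</p>

```python
def full_address(bank, addr16):
    """Combine an 8-bit bank register value with a 16-bit operand."""
    return (bank % 0x100) * 0x10000 + (addr16 % 0x10000)

# JSR $2345 executing in bank $02: the Program Bank Register (K)
# supplies the high 8 bits.
jsr_target = full_address(0x02, 0x2345)     # $022345

# LDA $1234 with the Data Bank Register (B) set to $e1.
lda_target = full_address(0xe1, 0x1234)     # $e11234
```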
+
+
+<h3><a name="internal-address-symbols">Internal Address Symbols</a></h3>
+
+<p>Symbols that represent an address inside the file being disassembled
+are referred to as <i>internal</i>.  They come in two varieties.</p>
+
+<p><b>User labels</b> are labels added to instructions or data by the user.
+The editor will try to prevent you from creating a label that has the same
+name as another symbol, but if you manage to do so, the user label takes
+precedence over symbols from other sources.  User labels may be tagged
+as non-unique local, unique local, global, or global and exported.  Local
+vs. global is important for the label localizer, while exported symbols
+can be pulled directly into other projects.</p>
+
+<p><b>Auto labels</b> are automatically generated labels placed on
+instructions or data offsets that are the target of operands.  They're
+formed by appending the hexadecimal address to the letter "L", with
+additional characters added if some other symbol has already defined
+that label.  Options can be set that change the "L" to a character or
+characters based on how the label is referenced, e.g. "B" for branch targets.
+Auto labels are only added where they are needed, and are removed when
+no longer necessary.  Because auto labels may be renamed or vanish, the
+editor will try to prevent you from referring to them explicitly when
+editing operands.</p>
+
+
+<h3><a name="external-address-symbols">External Address Symbols</a></h3>
+
+<p>Symbols that represent an address outside the file being disassembled
+are referred to as <i>external</i>.  These may be ROM entry points,
+data buffers, zero-page variables, or a number of other things.  Because
+the memory addresses they appear at aren't within the bounds of the file,
+we can't simply put an address label on them.  Three different mechanisms
+exist for defining them.  If an instruction or data operand refers to
+an address outside the file bounds, SourceGen looks for a symbol with
+a matching address value.</p>
+
+<p><b>Platform symbols</b> are defined in platform symbol files.  These
+are named with a ".sym65" extension, and have a fairly straightforward
+name/value syntax.  Several files for popular platforms come with SourceGen
+and live in the <code>RuntimeData</code> directory.  You can also create your
+own, but they have to live in the same directory as the project file.</p>
+
+<p>Platform symbols can be addresses or constants.  Addresses are
+limited to 24-bit values, and are matched automatically.  Constants may
+be 32-bit values, but must be specified manually.</p>
+
+<p>If two platform symbols have the same label, only the most recently read
+one is kept.  If two platform symbols have different labels but the
+same value, both symbols will be kept, but the one in the file loaded
+last will take priority when doing a lookup by address.  If symbols with
+the same value are defined in the same file, the one whose symbol appears
+first alphabetically takes priority.</p>
+
+<p>Platform address symbols have an optional width.  This can be used
+to define multi-byte items, such as two-byte pointers or 256-byte stacks.
+If no width is specified, a default value of 1 is used.  Widths are ignored
+for constants.
+Overlapping symbols are resolved as described earlier, with symbols loaded
+later taking priority over previously-loaded symbols.  In addition,
+symbols defined closer to the target address take priority, so if you put
+a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
+be visible because the start point is closer to the addresses it covers
+than the start of the 256-byte range.</p>
+
+<p>Platform symbols can be designated for reading, writing, or both.
+Normally you'd want both, but if an address is a memory-mapped I/O
+location that has different behavior for reads and writes, you'd want
+to define two different symbols, and have the correct one applied
+based on the access type.</p>
+
+<p><b>Project symbols</b> behave like platform symbols, but they are
+defined in the project file itself, through the Project Properties editor.
+The editor will try to prevent you from creating two symbols with the same
+name.  If two symbols have the same value, the one whose label comes
+first alphabetically is used.</p>
+
+<p>Project symbols always have precedence over platform symbols, allowing
+you to redefine symbols within a project.  (You can "hide" a platform
+symbol by creating a project symbol constant with the same name.  Use a
+value like $ffffffff or $deadbeef so you'll know why it's there.)</p>
+
+<p><b>Address region pre-labels</b> are an oddity: they're external
+address symbols that also act like user labels.  These are explained
+in more detail <a href="#pre-labels">later</a>.</p>
+
+<p><b>Local variables</b> are redefinable symbols that are organized
+into tables.  They're used to specify labels for zero-page addresses
+and 65816 stack-relative instructions.  These are explained in more
+detail in the next section.</p>
+
+
+<h4><a name="local-vars">How Local Variables Work</a></h4>
+
+<p>Local variables are applied to instructions that have zero
+page operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
+65816 stack relative operands
+(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>).  While they must be
+unique relative to other kinds of labels, they don't have to be unique
+with respect to earlier variable definitions.  So you can define
+<code>TMP .EQ $10</code>, and a few lines later define
+<code>TMP .EQ $20</code>.  This is handy because zero-page addresses are
+often used in different ways by different parts of the program.  For
+example:</p>
+<pre>
+         LDA     ($00),Y
+         INC     $02
+         ... elsewhere ...
+         DEC     $00
+         STA     ($01),Y
+</pre>
+<p>If we had given <code>$00</code> the label <code>PTR</code> and
+<code>$02</code> the label <code>COUNT</code> globally,
+the second pair of instructions would look all wrong.  With local
+variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
+and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>
+
+<p>Local variables have a value and a width.  If we create a pair of
+variable definitions like this:</p>
+<pre>
+PTR      .eq     $00        ;2 bytes
+COUNT    .eq     $02        ;1 byte
+</pre>
+<p>Then this:</p>
+<pre>
+         STA     $00
+         STX     $01
+         LDY     $02
+</pre>
+<p>Would become:</p>
+<pre>
+         STA     PTR
+         STX     PTR+1
+         LDY     COUNT
+</pre>
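+<p>The substitution above amounts to a range lookup.  The following
+Python sketch shows the idea under the assumption that a table is just
+a list of (label, value, width) entries; the names are hypothetical and
+SourceGen's actual data structures may differ.</p>

```python
# Each definition is (label, value, width), as in the PTR/COUNT table above.
TABLE = [("PTR", 0x00, 2), ("COUNT", 0x02, 1)]

def format_zp_operand(addr, table):
    """Return the symbolic form of a zero-page address, or hex if none applies."""
    for label, value, width in table:
        if addr in range(value, value + width):
            offset = addr - value
            return label if offset == 0 else f"{label}+{offset}"
    return f"${addr:02x}"
```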
+
+<p>The scope of a variable definition starts at the point where it is
+defined, and stops when its definition is erased.  There are three
+ways for a table to erase an earlier definition:</p>
+<ol>
+  <li>Create a new definition with the same name.</li>
+  <li>Create a new definition that has an overlapping value.  For
+    example, if you have a two-byte variable <code>PTR = $00</code>,
+    and define a one-byte variable <code>COUNT = $01</code>, the
+    definition for <code>PTR</code> will be cleared because its second
+    byte overlaps.</li>
+  <li>Tables have a "clear previous" flag that erases all previous
+    definitions.  This doesn't usually cause anything to be generated in the
+    assembly sources; instead, it just causes SourceGen to stop using
+    that label.</li>
+</ol>
+<p>As you might expect, you're not allowed to have duplicate labels or
+overlapping values in an individual table.</p>
+<p>If a platform/project symbol has the same value as a local variable,
+the local variable is used.  If the local variable definition is cleared,
+use of the platform/project symbol will resume.</p>
+<p>Not all assemblers support redefinable variables.  In those cases,
+the symbol names will be modified to be unique (e.g. the second definition
+of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
+global scope.</p>
+
+
+<h3><a name="unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></h3>
+
+<p>Most assemblers have a notion of "local" labels, which have a scope
+that is book-ended by global labels.  These are handy for generic branch
+target names like "loop" or "notzero" that you might want to use in
+multiple places.  The exact definition of local label scope varies
+between assemblers, so labels that you want to be local might have to
+be promoted to global (and probably renamed).</p>
+<p>SourceGen has a similar concept with a slight twist: they're called
+non-unique labels, because the goal is to be able to use the same
+label in more than one place.  Whether or not they actually turn out
+to be local is a decision deferred to assembly source generation time.
+(You can also declare a label to be a unique local if you like; the
+auto-generated labels like "L1234" do this.)</p>
+<p>When you're writing code for an assembler, it has to be unambiguous,
+because the assembler can't guess at what the output should be.  For a
+disassembler, the output is known, so a greater degree of ambiguity is
+tolerable.  Instead of throwing errors and refusing to continue, the
+source generator can modify the output until it works.  For example:</p>
+<pre>
+@LOOP    LDX     #$02
+@LOOP    DEX
+         BNE     @LOOP
+         DEY
+         BNE     @LOOP
+</pre>
+<p>This would confuse an assembler.  SourceGen already knows which @LOOP
+is being branched to, so it can just rename one of them to "@LOOP1".</p>
+<p>One situation where non-unique labels cause difficulty is with
+weak symbolic references (see next section).  For example, suppose
+the above code then did this:</p>
+<pre>
+         LDA     #&lt;@LOOP
+</pre>
+<p>While it's possible to make an educated guess at which @LOOP was
+meant, it's easy to get wrong.  In situations like this, it's best to
+give the labels different names.</p>
+
+
+<h3><a name="weak-refs">Weak Symbolic References</a></h3>
+
+<p>Symbolic references in operands are "weak references".  If the named
+symbol exists, the reference is used.  If the symbol can't be found, the
+operand is formatted in hex instead.  They're called "weak" because
+failing to resolve the reference isn't considered an error.</p>
+
+<p>It's important to know this when editing a project.  Consider the
+following trivial chunk of code:</p>
+
+<pre>
+1000: 4c0310      JMP     $1003
+1003: ea          NOP
+</pre>
+
+<p>When you load it into SourceGen, it will be formatted like this:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1003
+L1003    NOP
+</pre>
+
+<p>The analyzer found the JMP operand, and created an auto label for
+address $1003.  It then created a weak reference to "L1003" in the JMP
+operand.</p>
+
+<p>If you edit the JMP instruction's operand to use the symbol "FOO", the
+results are probably not what you want:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     $1003
+         NOP
+</pre>
+
+<p>This happened because you added a weak reference to "FOO" in the operand,
+but the label doesn't exist.  The operand is formatted as hex.  Because
+there's no longer a reference to L1003, SourceGen removed the auto-label
+as well.</p>
+
+<p>If you set the label "FOO" on the NOP instruction, you'll see what you
+probably wanted:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     FOO
+FOO      NOP
+</pre>
+
+<p>You don't actually need the explicit reference in the JMP instruction.
+If you edit the JMP operand and set it back to "Default", the code will
+still look the same.  This is because SourceGen identified the numeric
+reference, and automatically added a symbolic reference to the label on
+the NOP instruction.</p>
+
+<p>However, suppose you didn't actually want FOO as the operand label.
+You can create a project symbol, BAR, with the value $1003, and then edit
+the operand to reference BAR instead.  Your code would then look like:</p>
+<pre>
+BAR      .EQ     $1003
+         .ADDRS  $1000
+         JMP     BAR
+FOO      NOP
+</pre>
+
+<p>If you change the value of BAR in the project symbol file, the operand
+will continue to refer to it, but with an adjustment.  For example, if
+you changed BAR from $1003 to $1007, the code would become:</p>
+<pre>
+BAR      .EQ     $1007
+         .ADDRS  $1000
+         JMP     BAR-4
+FOO      NOP
+</pre>
+
+<p>If you rename a label, all references to that label are updated.  For
+numeric references that happens implicitly.  For explicit operand
+references, the weak references are updated individually.  (Modern IDEs
+call this "refactoring".)</p>
+<p>If you remove a label, all of the numeric references to it will
+reference something else, probably a new auto label.  Weak references
+to the symbol will break and be formatted as hex, but will not be
+removed.  Similarly, removing symbols from a platform or project file
+will break the reference but won't modify the operands.</p>
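+<p>The behavior described in this section can be summarized in a few
+lines of Python.  This is a simplified sketch, assuming symbols are held
+in a plain dictionary; <code>format_weak_ref</code> is a hypothetical
+name, and only whole-value references are handled.</p>

```python
def format_weak_ref(label, operand_value, symbols):
    """Format an operand that holds a weak reference to 'label'.

    'symbols' maps label names to values.  If the label is missing, the
    reference is quietly ignored and the operand is shown in hex.
    """
    if label not in symbols:
        return f"${operand_value:04x}"
    adjustment = operand_value - symbols[label]
    if adjustment == 0:
        return label
    return f"{label}{adjustment:+d}"
```

+<p>With no symbols defined, a reference to "FOO" on an operand of $1003
+yields <code>$1003</code>.  With BAR defined as $1007, the same operand
+yields <code>BAR-4</code>.</p>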
+
+<h3><a name="symbol-parts">Parts and Adjustments</a></h3>
+
+<p>Sometimes you want to use part of a label, or adjust the value slightly.
+(I use "adjustment" rather than "offset" to avoid confusing it with file
+offsets.) Consider the following example:</p>
+<pre>
+1000: a910      LDA     #$10
+1002: 48        PHA
+1003: a906      LDA     #$06
+1005: 48        PHA
+1006: 60        RTS
+1007: 4c3aff    JMP     $ff3a
+</pre>
+
+<p>This pushes the address of the JMP instruction ($1007) onto the stack,
+and jumps to it with the RTS instruction.  However, RTS requires the
+address of the byte before the target instruction, so we actually push
+$1006.</p>
+
+<p>The disassembler won't know that offset $1007 is code because nothing
+appears to reference it.  After tagging $1007 as a code start point, the
+project looks like this:</p>
+<pre>
+         LDA      #$10
+         PHA
+         LDA      #$06
+         PHA
+         RTS
+
+         JMP      $ff3a
+</pre>
+
+<p>We set a label called "NEXT" on the JMP instruction, and then edit
+the two LDA instructions to reference the high and low parts, yielding:</p>
+<pre>
+         .ADDRS  $1000
+         LDA     #&gt;NEXT
+         PHA
+         LDA     #&lt;NEXT-1
+         PHA
+         RTS
+
+NEXT     JMP     $ff3a
+</pre>
+
+<p>SourceGen will adjust label values by whatever amount is required to
+generate the original value.  If the adjustment seems wrong, make sure
+you're selecting the right part of the symbol.</p>
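+<p>To make the arithmetic concrete, here is a small Python sketch.  It
+assumes the selected part is extracted from the symbol first and the
+adjustment is whatever difference remains; the manual doesn't spell out
+the exact order of operations, and the function name is hypothetical.</p>

```python
def part_adjustment(symbol_value, operand_value, part):
    """Return the adjustment that makes the chosen part match the operand.

    'part' is "low" or "high".  Assumption: the part is taken from the
    symbol value first, then the adjustment covers the remaining difference.
    """
    if part == "low":
        part_value = symbol_value % 0x100
    elif part == "high":
        part_value = (symbol_value // 0x100) % 0x100
    else:
        raise ValueError("part must be 'low' or 'high'")
    return operand_value - part_value

NEXT = 0x1007
high_adj = part_adjustment(NEXT, 0x10, "high")   # high part of NEXT, no adjustment
low_adj = part_adjustment(NEXT, 0x06, "low")     # low part of NEXT, adjusted by -1
```

+<p>Selecting the high part for the $06 operand would give an adjustment
+of -10, which is the sort of result that suggests the wrong part was
+chosen.</p>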
+
+<p>Different assemblers use different syntaxes to form expressions.  This
+is particularly noticeable in 65816 code.  You can adjust how it appears
+on-screen from the app settings.</p>
+
+<h3><a name="nearby-targets">Automatic Use of Nearby Targets</a></h3>
+
+<p>Sometimes you want to use a symbol that doesn't match up with the
+operand.  SourceGen tries to anticipate situations where that might be
+the case, and apply adjustments for you.</p>
+
+<p>Suppose you have the following:</p>
+<pre>
+         .ADDRS  $1000
+         LDA     #$00
+         STA     L1010
+         LDA     #$20
+         STA     L1011
+         LDA     #$e1
+         STA     L1012
+         RTS
+
+L1010    .DD1    $00
+L1011    .DD1    $00
+L1012    .DD1    $00
+</pre>
+
+<p>Showing stores to three different labeled addresses is fine, but
+the code is actually setting up a single 24-bit address.  For clarity,
+you'd like the output to reflect the fact that it's a single, multi-byte
+variable.  So, if you set a label at $1010, SourceGen removes the
+nearby auto labels, and sets the numeric references to use your label:</p>
+
+<pre>
+         .ADDRS  $1000
+         LDA     #$00
+         STA     DATA
+         LDA     #$20
+         STA     DATA+1
+         LDA     #$e1
+         STA     DATA+2
+         RTS
+
+DATA     .DD1    $00
+         .DD1    $00
+         .DD1    $00
+</pre>
+
+<p>If you decide that you really wanted each store to have its own
+label, you can set labels on the other two addresses.  SourceGen won't
+search for alternate labels if the numeric reference target has a
+user-defined label.</p>
+
+<p>This is also used for self-modifying code.  For example:</p>
+<pre>
+1000: a9ff      LDA     #$ff
+1002: 8d0610    STA     $1006
+1005: 4900      EOR     #$00
+</pre>
+
+<p>The above changes the <code>EOR #$00</code> instruction to
+<code>EOR #$ff</code>.  The operand target is $1006, but we can't
+put a label there because it's in the middle of the instruction.  So
+SourceGen puts a label at $1005 and adjusts it:</p>
+<pre>
+         LDA     #$ff
+         STA     L1005+1
+L1005    EOR     #$00
+</pre>
+
+<p>If you really don't like the way this works, you can disable the
+search for nearby targets entirely from the
+<a href="settings.html#project-properties">project properties</a>.
+Self-modifying code will always be adjusted because of the limitation
+on mid-instruction labels.</p>
+
+
+<h2><a name="width-disambiguation">Width Disambiguation</a></h2>
+
+<p>It's possible to interpret certain instructions in multiple ways.
+For example, "LDA $0000" might be an absolute load from a 16-bit
+address, or it might be a direct page load from an 8-bit address.
+Humans can infer from the fact that it was written with a 4-digit address
+that it's meant to be absolute, but assemblers often treat operands
+purely as numbers, and would just see "LDA 0".  Common practice is to
+use the shortest instruction possible.</p>
+<p>Every assembler seems to address the problem in a slightly different
+way.  Some use opcode suffixes, others use operand prefixes, some
+allow both.  You can configure how they appear in the
+<a href="settings.html#app-settings">application settings</a>.</p>
+<p>SourceGen will only add width disambiguators to opcodes or operands when
+they are needed, with one exception: the opcode suffix for long
+(24-bit address) operations is always applied.  This is done because some
+assemblers require it, insisting on "LDAL" rather than "LDA" for an
+absolute long load, and because it can make 65816 code easier to read.</p>
+
+
+
+<h2 id="address-regions">Address Regions</h2>
+
+<p>Simple programs are loaded at a particular address and executed there.
+The source code starts with a directive that tells the assembler what the
+initial address is, and the code and data statements that follow are
+placed appropriately.  More complicated programs might relocate parts
+of themselves to other parts of memory, or be comprised of multiple
+"overlay" segments that, through disk I/O or bank-switching, all execute
+at the same address.</p>
+
+<p>Consider the code in the first tutorial.  It loads at $1000, copies
+part of itself to $2000, and transfers execution there:</p>
+
+<pre>
+                                   .ADDRS $1000
+1000: a0 71                        LDY    #$71
+1002: b9 17 10     L1002           LDA    SRC,y
+1005: 99 00 20                     STA    MAIN,y
+1008: 88                           DEY
+1009: 30 09                        BMI    L1014
+100b: 10 f5                        BPL    L1002
+
+100d: 00                           .DD1   $00
+100e: 68 65 6c 6c+                 .STR   "hello!"
+
+1014: 4c 00 20     L1014           JMP    MAIN
+
+1017:              SRC
+                                   .ADDRS $2000
+2000: ad 00 30     MAIN            LDA    $3000
+[...]
+</pre>
+
+<p>The arrangement of this code can be viewed in a couple of ways.  One
+way is to see it linearly: the code starts at $1000, continues to $1017,
+then restarts at $2000:</p>
+<pre>
+000000  +- start
+         |  $1000 - $1016  length=23 ($0017)
+000016  +- end (floating)
+
+000017  +- start 'MAIN'
+         |  $2000 - $2070  length=113 ($0071)
+000087  +- end (floating)
+</pre>
+
+<p>The other way to picture it is hierarchical: the file loads
+fully at $1000, and has a "child" region at offset +000017 in which the
+address changes to $2000:</p>
+<pre>
+000000  +- start
+         |  $1000 - $1016  length=23 ($0017)
+000017  | +- start 'MAIN'  pre='SRC'
+         | |  $2000 - $2070  length=113 ($0071)
+000087  | +- end
+000087  +- end
+</pre>
+
+<p>The latter is closer to what many assemblers expect, with a "physical"
+PC that starts where the file is loaded, and a "logical" or "pseudo" PC
+that determines how the code is generated.  SourceGen supports both
+approaches.  The only thing that would change in this example is that
+the nested approach allows the "SRC" label to exist.  (More on this
+later, in the section on <a href="#pre-labels">pre-labels</a>.)</p>
+
+<p>The real value of a hierarchical arrangement becomes apparent when
+the area copied out of the file is only a small part of it.  For
+example, suppose something like:</p>
+
+<pre>
+        .ADDRS  $1000
+        LDA     SUB_SRC,Y
+        STA     SUB_DST,Y
+        JMP     CONT
+
+SUB_SRC
+        .ADDRS  $2000
+SUB_DST [small routine]
+        .ADREND
+
+CONT    LDA     #$12
+        JSR     SUB_DST
+</pre>
+<p>In this case, a small routine is copied out of the middle of the
+code that lives at $1000.  We want the code at CONT to pick up where
+things left off.  If SUB_SRC is at $1009, and is 23 bytes long, then
+CONT should be $1020.  We could output <code>.ADDRS $1020</code>
+directly before CONT, but it's inconvenient to work with the generated
+code if we want to modify the subroutine (changing its length)
+and re-assemble it.</p>
+
+
+<h3 id="fixed-float">Fixed vs. Floating</h3>
+
+<p>Sometimes when disassembling code you know exactly where an address
+region starts and ends.  Other times you know where it starts, but won't
+know where it stops until you've had a chance to look at the updated
+disassembly.  In the former case you create a region with a "fixed" end
+point, in the latter you create one with a "floating" end point.</p>
+<p>Address regions with fixed end points always stop in the same place.
+Regions with floating end points stop at the next address region boundary,
+which means they can change size as regions are added or removed.
+The end will be placed at either the start of a new region (a "sibling"),
+or the end of an encapsulating region (the "parent").</p>
+
+<p>Regions that overlap must have a parent/child relationship.  Whichever
+one starts last or ends first is the child.  A strict ordering is necessary
+because a given file offset can only have one address, and if we don't
+know which region is the child we can't know which address to assign.
+Regions cannot straddle the start or end of another region, and cannot
+exactly overlap (have the same start and length as) another region.
+One consequence of these rules is that "floating" regions cannot share
+a start offset with another region, because their end point would be
+adjusted to match the end of the other region.</p>
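+<p>The reason a strict parent/child ordering matters can be seen in a
+sketch of offset-to-address mapping.  The Python below is illustrative
+only: it assumes regions are stored as (file offset, length, address)
+tuples, uses the Tutorial1 layout shown earlier, and picks the shortest
+enclosing region as the innermost one.</p>

```python
# Hypothetical region records: (file_offset, length, address).
REGIONS = [
    (0x00, 0x88, 0x1000),   # whole file, loaded at $1000
    (0x17, 0x71, 0x2000),   # child region at +000017, address $2000
]

def offset_to_address(offset, regions):
    """Map a file offset to an address using the innermost enclosing region."""
    enclosing = [r for r in regions if offset in range(r[0], r[0] + r[1])]
    if not enclosing:
        return None     # not covered by any region
    # Nested regions can't have identical bounds, so the shortest
    # enclosing region is the innermost child.
    start, _length, address = min(enclosing, key=lambda r: r[1])
    return address + (offset - start)
```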
+
+<p>The arrangement of regions is particularly important when attempting
+to resolve an address operand (such as a JSR) to a location within the
+file.  The process is straightforward if the address only appears once,
+but when overlays cause multiple parts of the file to have the same
+address, the operand target may be in different places depending on where
+the call is being made from.
+The algorithm for resolving addresses is described
+in the <a href="advanced.html#overlap">advanced topics</a> section.</p>
+
+
+<h3 id="non-addr">Non-Addressable Areas</h3>
+
+<p>Some files have contents that aren't actually loaded into memory
+addressable by the 6502.  One example is a file header, such as a load
+address extracted by the system when reading the program into memory, or
+something intended to be read by an emulator.  Another example is the
+CHR graphic data on the NES, which is loaded into an area inaccessible
+to the CPU.</p>
+
+<p>The generated source file must recreate the original binary exactly,
+but we don't really want to assign an address to non-addressable data,
+because it should never be resolved as the target of a JSR or LDA.  To
+handle this case, you can set a region's address to "NA".  The assembler
+needs to have <i>some</i> notion of address, so the start address will
+be treated as zero.</p>
+
+<p>Non-addressable regions cannot include executable code.  You may put
+labels on data items, but attempting to reference them will cause a
+warning and will likely generate code that doesn't assemble.</p>
+
+<p>It's possible to delete all address regions from a project, or edit
+them so that there are "holes" not covered by a region.
+To handle this, all projects are effectively covered by a non-addressable
+region that spans the entire file.  Any part of the file that isn't
+explicitly covered by a user-specified region will be provided an
+auto-generated non-addressable region.  Such regions don't actually exist,
+so attempting to edit one will cause a new region to be created.</p>
+
+
+<h3 id="pre-labels">Pre-Labels</h3>
+
+<p>The need for pre-labels was illustrated in the earlier example, where
+code in Tutorial1 was copied from $1017 to $2000.  The fundamental issue
+is that offset +000017 has <i>two</i> addresses: $1017 and $2000.  The
+assembler can only generate code for one.  Pre-labels allow you to do
+the same thing you'd do in the source code, which is to add a label
+immediately before the address is changed.</p>
+
+<p>Pre-labels are "external" symbols, similar to project symbols,
+because they refer to an address that is outside the file bounds.
+They're always treated as having global scope.
+However, they also behave like user labels, because they're generated
+as part of the instruction stream and interfere with local label
+references that cross them.</p>
+
+<p>The address of a pre-label is determined by the parent region.
+Suppose you have a file with an arrangement like:</p>
+<pre>
+  region1 start
+   ...
+    region2 start
+     ...
+    region2 end
+  region1 end
+</pre>
+
+<p>You can put a pre-label on <code>region2</code>, which will be the
+address of the byte in <code>region1</code> right before the address
+changed.  You can't put a pre-label on <code>region1</code>, because
+before <code>region1</code> there was no address.  Similarly:</p>
+<pre>
+  region1 start
+   ...
+  region1 end
+  region2 start
+   ...
+  region2 end
+</pre>
+
+<p>You can't put a pre-label on <code>region2</code> because its parent
+is non-addressable.  <code>region1</code>'s address doesn't apply,
+because <code>region1</code> ended before the label would be issued.</p>
+
+
+<h3 id="relative-addr">Relative Addressing</h3>
+
+<p>It is occasionally useful to output an address region start directive
+that uses relative addressing instead of absolute addressing.  For
+example, given:</p>
+<pre>
+        .ADDRS  $1000
+        [...]
+        .ADDRS  $2000
+</pre>
+<p>We could instead generate:</p>
+<pre>
+        .ADDRS  $1000
+        [...]
+        .ADDRS  *+$0fe9
+</pre>
+
+<p>This has no effect on the definition of the region.  It only affects
+how the start directive is generated in the assembly source file.</p>
+
+<p>The value is an offset from the current assembler program counter.
+If the new region is the child of a non-addressable region, a relative
+offset cannot be used.</p>
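+<p>The offset in the example follows from simple subtraction, sketched
+below in Python.  The sketch assumes the program counter is $1017 when
+the second directive is reached, which is what an offset of $0fe9
+implies.  The function name and the formatting of negative offsets are
+hypothetical.</p>

```python
def relative_operand(current_pc: int, new_addr: int) -> str:
    """Format a region start as an offset from the current program counter."""
    delta = new_addr - current_pc
    sign = "+" if delta >= 0 else "-"
    return f"*{sign}${abs(delta):04x}"


print(relative_operand(0x1017, 0x2000))
```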
+
+
+
+<h2><a name="atags">Directing the Code Analyzer</a></h2>
+
+<p>Sometimes SourceGen can't automatically find the start or end of an
+instruction stream, or gets confused by inline data.  These situations
+can be resolved by adding analyzer tags.</p>
+
+<p><b>Code start point</b> tags tell the analyzer to add the offset
+to the list of instruction start points.  Suppose you've got a code
+library that begins with jump vectors, like this:</p>
+<pre>
+1000: 4c0910    JMP     $1009
+1003: 4cef10    JMP     $10ef
+1006: 4c3012    JMP     $1230
+1009: 18        CLC
+</pre>
+
+<p>When opened with SourceGen, it will look like this:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1009
+
+         .DD1    $4c
+         .DD1    $ef
+         .DD1    $10
+         .DD1    $4c
+         .DD1    $30
+         .DD1    $12
+L1009    CLC
+</pre>
+
+<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
+assumes those are data.  Further, the functions at those addresses may
+also be considered data unless some bit of code reachable from L1009
+calls into them.  If you tag $1003 and $1006 as code start points,
+you'll get better results:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1009
+         JMP     L10ef
+         JMP     L1230
+L1009    CLC
+</pre>
+
+<p>Be careful that you only tag the instruction opcode byte.  If
+you tagged each and every byte from $1003 to $1008, you would
+end up with a mess:</p>
+<pre>
+         .ADDRS  $1000
+         JMP     L1009
+         JMP &#x25bc;   L10ef
+         BPL &#x25bc;   L1053
+         JMP &#x25bc;   L1230
+         BMI     L101b
+L1009    CLC
+</pre>
+
+<p>The exact set of instructions shown depends on your CPU configuration.
+The problem is that the bytes in the middle of the instruction have
+been tagged as start points, so SourceGen is treating them as
+embedded instructions.  $EF and $12 aren't valid 6502 opcodes, so
+they're being ignored, but $10 is BPL and $30 is BMI.  Because tagging
+multiple consecutive bytes is rarely useful, SourceGen only applies code
+start tags to the first byte in a selected line.</p>
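+<p>The branch targets in the messy listing come from reading the
+following byte as a signed 8-bit displacement.  The Python sketch below
+reproduces that calculation; the function name is hypothetical.</p>

```python
def branch_target(opcode_addr: int, operand: int) -> int:
    """Compute the target of a 6502 relative branch.

    The operand is a signed 8-bit displacement measured from the
    address of the instruction that follows the two-byte branch.
    """
    displacement = operand - 0x100 if operand & 0x80 else operand
    return (opcode_addr + 2 + displacement) & 0xffff


# $1005 holds $10 (BPL) followed by $4c; $1007 holds $30 (BMI) followed by $12.
print(f"BPL L{branch_target(0x1005, 0x4c):04x}")
print(f"BMI L{branch_target(0x1007, 0x12):04x}")
```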
+
+<p><b>Code stop point</b> tags tell the analyzer when it should stop.  For
+example, suppose address $ff00 is known to always be nonzero, and the code
+uses that fact to get a branch-always on the 6502:</p>
+<pre>
+         .ADDRS  $1000
+         LDA     $ff00
+         BNE     L1010
+         BRK     $11
+</pre>
+
+<p>By tagging the BRK as a code stop point, you're telling the analyzer that
+it should stop trying to execute code when it reaches that point.  (Note
+that this example would actually be better solved by setting a status flag
+override on the BNE that sets Z=0, so the code tracer will know it's a
+branch-always and just do the right thing.)  As with code start points,
+code stop points should only be placed on the opcode byte.  Placing a
+code stop point in the middle of what SourceGen believes to be an instruction
+will have no effect.</p>
+<p>As with code start points, only the first byte in each selected line will
+be tagged.</p>
+
+<p><b>Inline data</b> tags identify bytes as being part of the
+instruction stream, but not instructions.  A simple example of this
+is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
+<pre>
+         JSR     $bf00
+         .DD1    $function
+         .DD2    $address
+         BCS     BAD
+</pre>
+
+<p>The three bytes following the <code>JSR $bf00</code> should be tagged
+as inline data, so that the code analyzer skips over them and continues the
+analysis at the <code>BCS</code> instruction.  You can think of these as
+"code skip" tags, but they're different from stop/start points, because
+every byte of inline data must be tagged.  When
+applying the tag, all bytes in a selected line will be modified.</p>
+<p>If code branches into a region that is tagged as inline data, the
+branch will be ignored.</p>
+
+
+<h3><a name="scripts">Extension Scripts</a></h3>
+
+<p>Extension scripts are C# source files that are compiled and
+executed by SourceGen.  They can be added to a project from SourceGen's
+runtime data directory, or can live in the directory next to the project
+file.  They're used to generate visualizations of graphical data, and
+to format inline data automatically.</p>
+<p>The inline data formatting feature can significantly reduce the tedium
+in certain projects.  For example, suppose the code uses a string print
+routine that embeds a null-terminated string right after a JSR.  Ordinarily
+you'd have to walk through the code, marking every instance by hand so
+the disassembler would know where the string ends and execution resumes.
+With an extension script, you can just pass in the print routine's label,
+and let the script do the formatting automatically.</p>
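+<p>The core of such a formatter is small.  The sketch below is written
+in Python purely for illustration; real extension scripts are C# and use
+SourceGen's plugin interfaces, which are not shown here.  The function
+name and the sample bytes are assumptions.</p>

```python
def inline_zstr_length(data: bytes, jsr_offset: int) -> int:
    """Length of the null-terminated string that follows a JSR.

    The JSR occupies three bytes (opcode plus 16-bit address), so the
    string starts at jsr_offset + 3.  The terminator is counted.
    """
    start = jsr_offset + 3
    end = data.index(0x00, start)   # raises ValueError if unterminated
    return end - start + 1


# JSR $3000, "HI", $00, RTS
sample = bytes([0x20, 0x00, 0x30]) + b"HI" + bytes([0x00, 0x60])
length = inline_zstr_length(sample, 0)
resume = 3 + length
print(f"string is {length} bytes, execution resumes at +{resume:06x}")
```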
+
+<p>To reduce the chances of a script causing problems, all scripts are
+executed in a sandbox with severely restricted access.  Notably, nothing
+in the sandbox can access files, except to read files from the PluginDll
+directory.</p>
+<p>The PluginDll directory lives next to the SourceGen executable, and
+contains all of the compiled script DLLs, as well as two pre-built
+application DLLs that plugins are allowed access to.  The contents
+are persistent, to avoid recompiling the scripts every time SourceGen
+is launched, but may be manually deleted without harm.</p>
+<p>More details can be found in the
+<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>
+
+
+<h2><a name="pseudo-ops">Data and Directive Pseudo-Opcodes</a></h2>
+
+<p>The on-screen code list shows assembler directives that are similar
+to what the various cross-assemblers provide.  The actual directives
+generated for a given assembler may match exactly or be totally different.
+The idea is to represent the concept behind the directive, then let the
+code generator figure out the implementation details.</p>
+
+<p>There are eight assembler directives that appear in the code list:</p>
+<ul>
+  <li>.EQ - defines a symbol's value.  These are generated automatically
+    when an operand that matches a platform or project symbol is found.</li>
+  <li>.VAR - defines a local variable.  These are generated for
+    local variable tables.</li>
+  <li>.ADDRS/.ADREND - specifies the start or end of an
+    address region.</li>
+  <li>.RWID - specifies the width of the accumulator and index registers
+    (65816 only).  Note this doesn't change the actual width, just tells
+    the assembler that the width has changed.</li>
+  <li>.DBANK - specifies what value the Data Bank Register holds
+    (65816 only).  Used when matching operands to labels.</li>
+  <li>.JUNK - indicates that the data in a range of bytes is irrelevant.
+    (When generating sources, this will become .FILL or .BULK
+    depending on the contents of the memory region and the assembler's
+    capabilities.)</li>
+  <li>.ALIGN - a special case of .JUNK that indicates the irrelevant
+    bytes exist to force alignment to a memory boundary (usually a
+    256-byte page).  Depending on the memory contents, it may be possible
+    to output this as an assembler-specific alignment directive.</li>
+</ul>
+
+<p>Every data item is represented by a pseudo-op.  Some of them may
+represent hundreds of bytes and span multiple lines.</p>
+<ul>
+  <li>.DD1, .DD2, .DD3, .DD4 - basic "define data" op.  A 1-4 byte
+    little-endian value.</li>
+  <li>.DBD2, .DBD3, .DBD4 - "define big-endian data".  2-4 bytes of
+    big-endian data.  (The 3- and 4-byte versions are not currently
+    available in the UI, since they're very unusual and few assemblers
+    support them.)</li>
+  <li>.BULK - data packed in as compact a form as the assembler allows.
+    Useful for chunks of graphics data.</li>
+  <li>.FILL - a series of identical bytes.  The operand
+    has two parts, the byte count followed by the byte value.</li>
+</ul>
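+<p>The difference between the little-endian and big-endian forms is
+easiest to see in the bytes they describe.  The Python helpers below
+are hypothetical names used only for this sketch.</p>

```python
def dd(value: int, width: int) -> bytes:
    """Bytes described by .DD1 through .DD4: little-endian."""
    return value.to_bytes(width, "little")


def dbd(value: int, width: int) -> bytes:
    """Bytes described by .DBD2 through .DBD4: big-endian."""
    return value.to_bytes(width, "big")


def fill(count: int, value: int) -> bytes:
    """Bytes described by .FILL: `count` copies of one byte value."""
    return bytes([value]) * count


print(dd(0x1234, 2).hex(), dbd(0x1234, 2).hex(), fill(4, 0xff).hex())
```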
+
+<p>In addition, several pseudo-ops are defined for string constants:</p>
+<ul>
+  <li>.STR - basic character string.</li>
+  <li>.RSTR - string in reverse order.</li>
+  <li>.ZSTR - null-terminated string.</li>
+  <li>.DSTR - Dextral Character Inverted string.  The high bit of the
+    last byte is flipped.</li>
+  <li>.L1STR - string prefixed with a length byte.</li>
+  <li>.L2STR - string prefixed with a length word.</li>
+</ul>
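+<p>As a rough guide to what each string form looks like in memory, the
+Python sketch below encodes the same ASCII text several ways.  The
+function names are hypothetical, and the little-endian byte order used
+for the two-byte length is an assumption.</p>

```python
def zstr(text: str) -> bytes:
    """Null-terminated string."""
    return text.encode("ascii") + b"\x00"


def rstr(text: str) -> bytes:
    """String stored in reverse order."""
    return text.encode("ascii")[::-1]


def dstr(text: str) -> bytes:
    """DCI string: high bit of the last byte is flipped (text must be non-empty)."""
    raw = bytearray(text.encode("ascii"))
    raw[-1] ^= 0x80
    return bytes(raw)


def l1str(text: str) -> bytes:
    """String prefixed with a one-byte length."""
    raw = text.encode("ascii")
    return bytes([len(raw)]) + raw


def l2str(text: str) -> bytes:
    """String prefixed with a two-byte length (little-endian assumed)."""
    raw = text.encode("ascii")
    return len(raw).to_bytes(2, "little") + raw


for encode in (zstr, rstr, dstr, l1str, l2str):
    print(encode.__name__, encode("HI").hex())
```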
+
+<p>You can configure the pseudo-ops to look more like what your
+favorite assembler uses in the
+<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
+application settings.</p>
+
+<p>String constants start and end with delimiter characters, typically
+single or double quotes.  You can configure the delimiters differently
+for each character encoding, so that it's obvious whether the text is
+in ASCII or PETSCII.  See the
+<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
+the application settings.</p>
+
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,292 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Intro - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Intro</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="overview">Overview</a></h2>
+
+<p>SourceGen converts 6502/65C02/65816 machine-language programs to
+assembly-language source.</p>
+
+<p>SourceGen has two purposes.  The first is to be a really nice
+disassembler for the 6502 and related CPUs.  Code tracing with status
+flag tracking makes it easier to separate the code from the data,
+automatic formatting of character strings and filled-data areas helps
+get the data regions sorted out, and modern IDE-style features like
+cross-reference generation and color-highlighted bookmarks help
+navigate the code while trying to figure out what it does.  A
+disassembler should help you understand the code, not just dump the
+instructions to a text file.</p>
+<p>The computer I built back in 2014 has a 4GHz CPU and 8GB of RAM.  I
+figured we should put the power of modern computing hardware to good use.</p>
+
+<p>The second purpose is to facilitate sharing and collaboration.  Most
+disassemblers generate output for a specific assembler, or in a way that's
+generic enough to match most any assembler; either way, you're left with
+a text file in somebody's idea of the "correct" format.  SourceGen keeps
+everything in an assembler-neutral format, and provides numerous options
+for customizing the display, so that multiple people viewing the same
+project can each do so with the conventions they are accustomed to.
+Code and data operands can be formatted in various numeric formats or
+as symbols.
+The project file uses a text format that is fairly diff-friendly, so
+sharing projects through git works reasonably well.  If you want source
+code you can assemble, SourceGen will generate code optimized for the
+assembler of your choice.</p>
+
+<p>The sharing and collaboration ideas only work if the formatting
+capabilities within SourceGen are sufficiently flexible.  If you need to
+generate assembly source and tweak it a bunch to express the intent of
+the original code, then passing a SourceGen project around won't work.
+This sort of thing is a bit outside the bounds of what a typical
+disassembler does, so it remains to be seen whether SourceGen succeeds at
+what it's trying to do, and also whether what it's trying to do is
+something that people actually want.</p>
+
+<p>You can get started by watching a
+<a href="https://youtu.be/dalISyBPQq8">demo video</a> and working through
+the <a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
+
+
+<h2><a name="fundamental-concepts">Fundamentals</a></h2>
+
+<p>The next few sections present some general concepts and terminology.  The
+rest of the documentation assumes you've read and understood this.</p>
+<p>It will be helpful if you already understand something about the 6502
+instruction set and assembly-language programming, but disassembling
+other programs is actually a pretty good way to learn how to code in
+assembly.  You will need to be familiar with hexadecimal numbers and
+general programming concepts to make sense of this, however.</p>
+
+<h3><a name="begin">About 6502 Code</a></h3>
+
+<p>For brevity's sake, "6502 code" should be taken to mean "code for
+the 6502 CPU or any of its derivatives, including but not limited to
+the 65C02 and 65816".  So let's talk about 6502 code.</p>
+
+<p>Code usually arrives in a big binary blob.  Some of it will be
+instructions, some of it will be data, some will be empty space used
+for variable storage.  Part of the challenge of disassembly is
+identifying which parts of the file contain which.</p>
+
+<p>Much of the code you'll find for the 6502 was written by humans,
+rather than generated by a compiler, which means it won't conform to a
+standard set of conventions.  However, most programmers will use
+subroutines, which can be identified and analyzed in isolation.  Subroutines
+are often interspersed with variable storage, referred to as a "stash".
+Variables and constants may be single-byte or multi-byte, the latter
+typically in little-endian byte order.</p>
+
+<p>Much of the data in a typical program is read-only, often in the
+form of graphics or character string data.  Graphics can be difficult
+to recognize automatically, but strings can be identified with a
+reasonable degree of confidence.  Address tables, which are a collection
+of addresses to other things, are also fairly common.</p>
+
+<p>A simple disassembler would start at the top of the file and just
+start converting bytes to instructions.  Unfortunately there's no reliable
+way to tell the difference between instructions, data, and variable
+stashes.  When the converter hits data bytes it'll start generating
+instructions that won't make sense.  You'll have another problem when the
+data ends and code resumes: 6502 instructions are variable-length, so if
+the last byte of the data area appears to be a three-byte instruction,
+the first two bytes of the next instruction area will be gobbled up.</p>
+
+<p>To make things even more difficult (sometimes deliberately), programmers
+occasionally use a trick where they "embed" an instruction
+inside another instruction.  This allows code to branch to two different
+entry points, one of which will set a flag or load a register, and then
+continue on to common code.</p>
+
+<p>Another trick is to embed "inline data" after a JSR or JSL instruction.
+The called subroutine pulls the caller's address off the stack, uses it to
+access the parameters, then pushes the address back on after modifying it to
+point to an address past the inline data.  This can be very confusing
+for the disassembler, which will try to interpret the inline data as
+instructions.</p>
+
+<p>Sometimes code is loaded at one location, then moved to another and
+executed there.  If you're disassembling an executing program you don't
+have to worry about this, but if you're disassembling the binary from the
+loadable file on disk then you need to track the address changes.  The
+address is communicated to the assembler with a "pseudo-opcode", usually
+something like "ORG" (short for "origin").  Other pseudo-op directives
+are used to define things like constants and (for 65816 code)
+register widths.</p>
+
+<p>The 8-bit CPUs have a 16-bit (64KiB) address space, so addresses can
+range from $0000 to $ffff.  (I'm going to write hex values with a
+preceding '$', like "$12ab", rather than "0x12ab" or "12abh", because
+that's what 6502 systems commonly used.)  The 65816 has a 24-bit address
+space, but it's not contiguous -- a branch that extends past the end will
+wrap around to the start of the 64KiB "bank".  For 16-bit instruction
+operands, the bank is identified for instruction and data addresses
+by the program bank register and the data bank register, respectively.
+The disassembler can't always discern the value of the data bank
+register through static analysis, so some user input may be required.</p>
+
+<p>The 6502 has an 8-bit processor status register ("P") with a bunch of flags
+in it.  Some of the flags determine whether a conditional branch is taken
+or not, which is important because some branches appear to be conditional
+but actually are always or never taken in practice.  The disassembler needs
+to be able to figure this out so that it doesn't try to disassemble the
+bytes that follow an always-taken branch.
+A more significant concern is the M and X flags found on the 65802/65816,
+which determine the width of the registers and of immediate load
+instructions.  If you don't know what state the flags are in, you can't
+know whether <code>LDA #value</code> is two bytes or three, and the
+disassembly of the instruction stream will come out wrong.</p>
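+<p>A small Python sketch shows how the M flag shifts everything that
+follows an immediate load.  The function name and the sample bytes are
+made up for illustration.</p>

```python
def lda_immediate_length(m_flag: bool, emulation: bool = False) -> int:
    """Size in bytes of LDA #value, opcode included.

    m_flag is True when the M bit is set (8-bit accumulator).  In
    emulation mode the accumulator is always 8 bits wide.
    """
    return 2 if (m_flag or emulation) else 3


# $a9 is LDA immediate; where the next instruction starts depends on M.
stream = bytes([0xa9, 0x34, 0x12, 0x60])
for m in (True, False):
    n = lda_immediate_length(m)
    print(f"M={int(m)}: LDA uses {n} bytes, next opcode byte is ${stream[n]:02x}")
```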
+
+<p>Some addresses correspond to memory-mapped I/O, rather than RAM or ROM.
+Accessing the address can have side effects, like changing between text
+and graphics modes.  Sometimes reading and writing have different effects.
+For example, on later models of the Apple II, reading from
+$C000 returns the most recently hit key, while writing to $C000 changes
+how 80-column display memory is mapped.</p>
+<p>On a few systems, such as the Atari 2600, RAM, ROM, and registers can
+appear at multiple locations, "mirrored" across the address space.</p>
+
+<h3><a name="charenc">Character Encoding</a></h3>
+
+<p>The American Standard Code for Information Interchange (ASCII) was
+developed in the 1960s, and became widely used as the means for representing
+text data on a computer.  It's compatible with Unicode, in that the
+binary representation of an ASCII string is exactly the same when
+expressed as a Unicode string with UTF-8 encoding.</p>
+<p>Not all 6502-based computers used ASCII, notably those from Commodore
+International (e.g. PET, VIC-20, 64, 128), which used variants
+collectively known as "PETSCII".  PETSCII had most of the same symbols,
+but rearranged them, and added a number of graphical symbols.  This was
+further complicated by the use of two different character sets, one of
+which dropped lower-case letters in favor of additional symbols, and
+the use of a separate encoding for characters stored in the text frame
+buffer ("screen codes").</p>
+<p>Apple II computers were based on ASCII, but tended to store bytes
+with the high bit set rather than clear.  This is known as "high ASCII".</p>
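+<p>Converting between the two forms is a matter of setting or clearing
+bit 7, as in this minimal Python sketch (the function names are
+hypothetical):</p>

```python
def to_high_ascii(text: str) -> bytes:
    """Encode text as ASCII with the high bit of every byte set."""
    return bytes(b | 0x80 for b in text.encode("ascii"))


def from_high_ascii(data: bytes) -> str:
    """Decode high ASCII by clearing the high bit of every byte."""
    return bytes(b & 0x7f for b in data).decode("ascii")


encoded = to_high_ascii("HI")
print(encoded.hex(), from_high_ascii(encoded))
```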
+
+<p>SourceGen allows you to specify that a string is encoded with ASCII,
+High ASCII, C64 PETSCII, or C64 Screen Codes.  Because the goal is to
+generate assembly sources for cross-assemblers, the C64 character
+support is limited to the set that overlaps with ASCII.</p>
+<p>For the most part only printable characters are accepted in strings,
+but certain control characters are also allowed.  The characters for
+bell ($07), linefeed ($0a), and carriage return ($0d) are recognized as
+string data, and in C64 PETSCII a number of text color and formatting
+control codes are also allowed.</p>
+
+<h3><a name="sgconcepts">SourceGen Concepts</a></h3>
+
+<p>As you work on a disassembled file, formatting operands and adding
+comments, everything you do is saved in the project file as "metadata".
+None of the data from the file being disassembled is included.  This
+should allow project files to be shared without violating the copyright
+of the work being disassembled.  (This will vary by region.  Also, note
+that the mere act of disassembling a piece of software may be illegal in
+some cases.)</p>
+
+<p>To avoid mix-ups where the wrong data file is used, the file's length
+and CRC are stored in the project file.  SourceGen will refuse to open a
+project if the data file's length and CRC don't match.</p>
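+<p>The check amounts to comparing two stored values against the data
+file.  The Python sketch below illustrates the idea with
+<code>zlib.crc32</code>; the text above does not say which CRC variant
+SourceGen uses, so treat that choice, and the function name, as
+assumptions.</p>

```python
import zlib


def matches_project(data: bytes, expected_length: int, expected_crc: int) -> bool:
    """Return True if the data file has the recorded length and CRC."""
    return len(data) == expected_length and zlib.crc32(data) == expected_crc


data = bytes([0x4c, 0x09, 0x10])
recorded_length, recorded_crc = len(data), zlib.crc32(data)
print(matches_project(data, recorded_length, recorded_crc))
print(matches_project(data + b"\x00", recorded_length, recorded_crc))
```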
+
+<p>Most of the data in the project file is associated with a file offset.
+When you create a comment, you aren't associating it with line 53, you're
+associating it with the 127th byte in the file.  This ensures that, as the
+project evolves, the comment you wrote is always connected to the
+same instruction or data item.  This also means you can't have two
+comments on the same line -- each offset only has room for one.  By
+convention, file offsets are always shown as a six-digit hexadecimal value
+with a leading '+', e.g. "+0012ab".  This makes it easy to distinguish
+between an address and an offset.</p>
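+<p>The two notations are easy to produce with format strings, as in
+this short Python sketch (helper names are hypothetical):</p>

```python
def format_offset(offset: int) -> str:
    """File offset: leading '+', six lower-case hex digits."""
    return f"+{offset:06x}"


def format_address(addr: int) -> str:
    """16-bit address: leading '$', four lower-case hex digits."""
    return f"${addr:04x}"


print(format_offset(0x12ab), format_address(0x12ab))
```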
+
+<p>Instruction and data operands can be formatted in various ways.  The
+formatting choice is associated with the first offset of the item.  For
+instructions the number of bytes in the operand is determined by the opcode
+(and, on the 65816, the M/X status flags).  For data items the length
+can be a single byte or an entire file.  Operand formats are not allowed
+to overlap.</p>
+
+<p>When an instruction or data operand references an address, we call
+it a <b>numeric reference</b>.  When the target address has a label, and
+the operand uses that symbol, we call that a <b>symbolic reference</b>.
+SourceGen tries to establish symbolic references whenever possible,
+so that the generated assembly source doesn't refer to hard-coded
+locations within the program.  Labels are generated automatically for
+the targets of numeric references.</p>
+
+<p>As your understanding of the disassembled code develops, you will want
+to add comments explaining it.  SourceGen projects have three kinds of
+comments:</p>
+<ol>
+  <li>End-of-line comments.  As the name implies, these appear at the
+    end of a line, to the right of the opcode or operand.</li>
+  <li>Long comments, also known as multi-line comments.  These get a
+    line all to themselves, and may span multiple lines.</li>
+  <li>Notes.  Like long comments, these get a line to themselves.  Unlike
+    long comments, these do not appear in generated assembly code.  They
+    are a way for you to leave notes to yourself, perhaps "don't forget
+    to figure this out" or "this is the cool part".</li>
+</ol>
+<p>Every file offset can have one of each.</p>
+
+<p>Labels and comments may disappear if you associate them with a file
+offset that is in the middle of a multi-byte instruction or data item.
+For example, suppose you put a long comment at offset +000010, and then
+mark a 50-byte region starting at offset +000008 as an ASCII string.  The
+comment won't be deleted, but won't be displayed either.  The same thing
+can happen to labels.  SourceGen will try to prevent this from happening
+by splitting formatted data into sub-regions at label boundaries.</p>
+
+
+<h2><a name="sgintro">How SourceGen Works</a></h2>
+
+<p>SourceGen employs a partial emulation technique that traces the flow
+of execution through the program.  Most of what a given instruction does
+isn't important; only its effect on the flow of execution matters.  This
+makes SourceGen different from most other disassemblers, because instead
+of assuming everything is code and expecting the user to separate out the
+data, it assumes everything is data and asks the user to identify where the
+code starts executing.</p>
+
+<p>SourceGen uses "code start points" to tag places where execution may
+begin.  By default, the first byte of the file is marked as a start point.
+From there, the tracing process walks through the code, pursuing all
+branches.  In many cases, if you tag all external entry points, SourceGen
+will automatically find all executable code and separate it from variable
+storage and data areas.</p>
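+<p>The tracing idea can be sketched in a few lines of Python.  This is
+a toy model, not SourceGen's analyzer: it knows only five opcodes,
+ignores status flags, and all names in it are hypothetical.  It starts
+at each code start point, follows jumps, calls, and branches, and
+reports the offsets where instructions begin.  Everything else is left
+as data.</p>

```python
# opcode -> (length in bytes, kind)
OPCODES = {
    0x18: (1, "plain"),    # CLC
    0x20: (3, "call"),     # JSR abs
    0x4c: (3, "jump"),     # JMP abs
    0x60: (1, "stop"),     # RTS
    0xd0: (2, "branch"),   # BNE rel
}


def trace(data: bytes, load_addr: int, start_offsets: list[int]) -> list[int]:
    """Return the sorted offsets of every instruction reached."""
    code = set()
    pending = list(start_offsets)
    while pending:
        off = pending.pop()
        while 0 <= off < len(data) and off not in code:
            entry = OPCODES.get(data[off])
            if entry is None:
                break                      # unknown opcode: stop this path
            length, kind = entry
            if off + length > len(data):
                break                      # instruction runs off the end
            code.add(off)
            if kind in ("jump", "call"):
                target = data[off + 1] | (data[off + 2] << 8)
                pending.append(target - load_addr)
            elif kind == "branch":
                rel = data[off + 1]
                rel = rel - 0x100 if rel & 0x80 else rel
                pending.append(off + 2 + rel)
            if kind in ("jump", "stop"):
                break                      # execution does not fall through
            off += length
    return sorted(code)


# Three jump vectors at $1000, then CLC and RTS at $1009.
program = bytes([0x4c, 0x09, 0x10, 0x4c, 0xef, 0x10,
                 0x4c, 0x30, 0x12, 0x18, 0x60])
print(trace(program, 0x1000, [0]))
print(trace(program, 0x1000, [0, 3, 6]))
```

+<p>With only the first byte as a start point, the second and third
+vectors are never reached and stay as data.  Adding start points at
+offsets 3 and 6 picks them up, though their targets lie outside this
+small sample and are simply skipped.</p>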
+
+<p>As noted earlier, tracking the processor status flags can make the
+analysis more accurate.  Identifying situations where a branch instruction
+is always or never taken avoids mis-categorizing a data region as code.
+On the 65816, it's absolutely crucial to track the M/X flags, since those
+affect the width of instructions.  SourceGen tracks the value of the
+processor flags at every instruction, blending sets of flags together when
+multiple paths of execution converge.</p>
+
+<p>Once instructions and data have been separated, the instruction operands
+can be examined.  Branches, loads, and stores that reference an address
+that falls inside the address space covered by the file can be replaced
+with a symbol.  Operands that refer to addresses outside the file, such
+as ROM or operating system routines, can be replaced with a symbol defined
+by an equate directive.</p>
+
+<p>(For more details on how this works, see the
+<a href="analysis.html">analysis appendix</a>.)</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,18 @@
+/*  
+ * Overall look and feel.
+ */ 
+body {
+    font-family: Arial, Helvetica, sans-serif;
+    padding: 0px;
+    margin: 0px;
+}   
+#content {
+    /* top right bottom left */
+    margin: 20px 10px 10px 10px;
+    /*position: relative;*/
+}
+#footer {
+    /* top right bottom left */
+    margin: 20px 10px 10px 10px;
+    /*position: relative;*/
+}
@@ -0,0 +1,615 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Using SourceGen - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Using SourceGen</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="starting-new">Starting a New Project</a></h2>
+
+<p>Select File &gt; New, or if no project is open, click "Start new project".
+This opens the Create New Project window.</p>
+<p>Start by selecting your target system from the tree on the left.
+The panel on the right will show the CPU that will be selected, as well
+as the symbol files and extension scripts that will be loaded by default.
+All of these may be overridden later from the project properties.
+(If the description in the panel on the right says "[placeholder]", it
+means that the system doesn't yet have a set of symbols defined for it.)</p>
+
+<p>Next, click the "Select File..." button.  Pick the file you wish to
+disassemble.  The dialog will update with the pathname and some notes
+about the file's size.  Click "OK" if all looks good to create the
+project.</p>
+<p><strong>NOTE:</strong> Support for very large 65816 programs is
+incomplete.  The maximum size for a data file is limited to 1 MiB.</p>
+
+<p>The first time you save the project (with File &gt; Save), you will be
+prompted for the project name.  It's best to use the data file's name
+with ".dis65" added, so this will be set as the default.  The data
+file's name is not stored in the project file, so if you pick a different
+name, or save the project in a different directory, you will have to
+select the data file manually whenever you open the project.</p>
+
+
+<h2><a name="opening">Opening an Existing Project</a></h2>
+
+<p>Select File &gt; Open, or if no project is open, click "Open
+existing project".  Select the .dis65 project file from the standard
+file dialog.</p>
+<p>SourceGen will try to open a data file with the project's name,
+minus the ".dis65".  If it can't find a file with that name, or if there's
+something wrong with it (e.g. the CRC doesn't match), you will be given
+the opportunity to specify the location of the data file to use.</p>
+
+<p>If non-fatal problems with the file are detected, a warning will be
+shown.  If it's something simple, like a missing .sym65 or extension
+script file, you'll be notified.  If it's something more complicated,
+e.g. the project has a comment on an offset that doesn't exist, you
+will be warned that the problematic data has been deleted, and will be
+lost if the project is saved.  By default, such a project will be opened
+in read-only mode, though you can override this in the dialog.  You will
+also be given the opportunity to simply cancel loading the project.</p>
+
+<p>The locations of the last few projects you've worked with are saved
+in the application settings.  You can access them from
+File &gt; Recent Projects.  If no project is open, links to the two
+most-recently-opened projects will be available.</p>
+
+
+<h2><a name="working">Working With a Project</a></h2>
+
+<p>The main project window is divided into five areas:</p>
+<ol>
+  <li>Center: the code list.  If no project is open, this will instead
+    have buttons to open a new or existing project.</li>
+  <li>Top left: cross-reference list.</li>
+  <li>Bottom left: notes list.</li>
+  <li>Top right: symbols list.</li>
+  <li>Bottom right: info on selected line.</li>
+</ol>
+
+<p>Most actions are performed in the center code list.  All of the
+sub-windows can be resized.  The window sizes and column widths are
+saved in the application settings file.</p>
+
+<p>A toolbar near the top of the screen has some shortcut buttons.
+If you hover your mouse over them, a tooltip with an explanation will
+appear.</p>
+
+
+<h3><a name="code-list">Code List</a></h3>
+
+<p>The code list provides a view of the code being disassembled.  Each
+line may be an instruction, data item, long comment, note, or
+assembler directive.</p>
+<p>The list is divided into columns:</p>
+<ul>
+  <li><b>Offset</b>. The offset within the file where the instruction
+    or data item starts.  Throughout the UI, file offsets are shown as
+    six-digit hex values with a leading '+'.</li>
+  <li><b>Address</b>.  The address where the assembled code will execute.
+    For 8-bit CPUs this is shown as a 4-digit hex number; for 16-bit
+    CPUs the bank is shown as well.  Double-click on this field to open the
+    <a href="editors.html#address">Edit Address</a> dialog.</li>
+  <li><b>Bytes</b>.  Shows up to four bytes from the data file that
+    correspond to the instruction or data.  To see the full dump of
+    a longer item, such as an ASCII string, double-click on the field
+    to open the
+    <a href="tools.html#hexdump">Hex Dump Viewer</a>.  This is
+    a floating window, so you can keep it open while you work.
+    Double-clicking in the bytes column while the window is open will
+    update the viewer's position and selection.</li>
+  <li><b>Flags</b>.  This shows the state of the status flags as they
+    are before the instruction is executed.  Double-click on this
+    field to open the
+    <a href="editors.html#flags">Edit Status Flag Override</a> dialog.</li>
+  <li><b>Attributes</b>.  Some instructions and data items have
+    interesting attributes.
+    '@' indicates an entry point,
+    'T' means one or more bytes have an analyzer tag (code start/stop/skip),
+    '#' means execution will not continue to the following instruction,
+    '&gt;' is shown for branch targets, and
+    '!' appears when a conditional branch is never taken.
+    (This column is rarely useful and can be hidden.)</li>
+  <li><b>Label</b>.  If a label has been defined for this offset, by
+    the user or generated automatically, it will appear here.  Also,
+    full-line items like long comments and notes will start in this
+    field.  Double-click on this field to open the
+    <a href="editors.html#label">Edit Label</a> dialog.</li>
+  <li><b>Opcode</b>.  The instruction or pseudo-opcode mnemonic.
+    If an instruction is embedded inside this one, a &#x25bc; symbol
+    will appear.
+    If you double-click this field for an instruction or data item
+    whose operand refers to an address in the file, the selection will
+    jump to that location.  If the operand is a local variable, the
+    selection will jump to the point where the variable was defined.</li>
+  <li><b>Operand</b>.  The instruction or data operand.  Data operands
+    may span a large number of bytes.  Double-click on this field to
+    open the
+    <a href="editors.html#instruction-operand">Edit Instruction Operand</a>
+    or <a href="editors.html#data-operand">Edit Data Operand</a> dialog, as
+    appropriate.  (Note that you can shift-double-click on data items to
+    edit multiple lines.)</li>
+  <li><b>Comment</b>.  End-of-line comment, generally shown with a ';'
+    prefix.  If enabled, cycle counts will appear here.  Double-click
+    on this field to open the
+    <a href="editors.html#comment">Edit Comment</a> dialog.</li>
+</ul>
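+
+<p>As a rough illustration of the Offset, Address, and Bytes columns, the
+Python sketch below formats values the way the list above describes.  The
+function names are hypothetical, and the lower-case hex, the
+"bank/address" layout, and the trailing ellipsis are assumptions made
+for the example rather than a statement of what SourceGen prints.</p>

```python
def format_offset(offset: int) -> str:
    """File offset as six hex digits with a leading '+'."""
    return "+{:06x}".format(offset)


def format_address(addr: int, show_bank: bool = False) -> str:
    """Four hex digits, optionally preceded by the bank byte.

    The "bank/address" layout is an assumption made for this sketch.
    """
    bank, low16 = divmod(addr, 0x10000)
    if show_bank:
        return "{:02x}/{:04x}".format(bank, low16)
    return "{:04x}".format(low16)


def format_bytes(data: bytes, max_shown: int = 4) -> str:
    """Up to four bytes as hex; the ellipsis for longer items is assumed."""
    text = "".join("{:02x}".format(b) for b in data[:max_shown])
    return text + ("..." if len(data) > max_shown else "")
```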
+
+<p>Double-clicking anywhere on a line with a note or long comment will
+open the
+<a href="editors.html#note">Edit Note</a> or
+<a href="editors.html#long-comment">Edit Long Comment</a> dialog,
+respectively.</p>
+
+<p>The code list is a standard Windows list view.  You can left-click
+to select an item, ctrl-left-click to toggle individual items on and
+off, and shift-left-click to select a range.  You can select all lines
+with Edit &gt; Select All.  Resize columns by
+left-clicking on the divider in the header and dragging it.</p>
+<p>Selecting any part of a multi-line item, such as a long comment
+or character string, effectively selects the entire item.</p>
+
+<p>Right-clicking opens a menu.  The contents are the same as those in
+the Actions menu item in the menu bar.  The set of options that are
+enabled will depend on what you have selected in the main window.</p>
+<ul>
+  <li><a href="editors.html#address">Set Address</a>.  Sets the
+    target address at that offset.  When multiple lines are selected,
+    the target addresses at the start and end of the range are set.
+    Enabled when the first line selected is code, data, or an address
+    override, and the full selected range does not overlap with another
+    address override.</li>
+  <li><a href="editors.html#flags">Override Status Flags</a>.  Changes
+    the status flags at that offset.  Enabled when a single instruction
+    line is selected.</li>
+  <li><a href="editors.html#label">Edit Label</a>.  Sets the label
+    at that offset.  Enabled when a single instruction or data line is
+    selected.</li>
+  <li><a href="editors.html#instruction-operand">Edit Operand</a>.  Opens the
+    Edit Instruction Operand or Edit Data Operand window, depending on
+    what's selected.
+    Enabled when a single instruction line is selected, or when one
+    or more data lines are selected.</li>
+  <li><a href="editors.html#comment">Edit Comment</a>.  Sets the
+    comment at that offset.  Enabled when a single instruction or data
+    line is selected.</li>
+  <li><a href="editors.html#long-comment">Edit Long Comment</a>.  Sets
+    the long comment at that offset.  Enabled when a single instruction
+    or data line, or an existing long comment, is selected.</li>
+  <li><a href="editors.html#note">Edit Note</a>.  Sets the note at
+    that offset.  Enabled when a single instruction or data line, or
+    an existing note, is selected.</li>
+  <li><a href="editors.html#project-symbol">Edit Project Symbol</a>.
+    Sets the name, value, and comment of the project symbol.  Enabled
+    when a single equate directive, generated from a project symbol, is
+    selected.</li>
+  <li><a href="editors.html#lvtable">Create Local Variable Table</a>.
+    Create a new local variable table.</li>
+  <li><a href="editors.html#lvtable">Edit Prior Local Variable Table</a>.
+    Modify or delete entries in the most recently defined local
+    variable table.</li>
+  <li><a href="visualization.html#vis-and-sets">Create/Edit Visualization Set</a>.
+    Create a new visualization set or edit an existing set.</li>
+
+  <li><a href="#atags">Analyzer Tags</a> (Tag Address As Code Start Point,
+    Tag Address As Code Stop Point, Tag Bytes As Inline Data,
+    Remove Analyzer Tags).
+    Enabled when one or more code or data lines are selected.  Remove
+    Analyzer Tags is only enabled when at least one line has tags.  The
+    keyboard shortcuts are two-key combinations.</li>
+
+  <li><a href="#address-table">Format Address Table</a>.  Formats
+    a series of bytes as parts of a table of addresses.</li>
+  <li><a href="#toggle-single">Toggle Single-Byte Format</a>.  Toggles
+    a range of lines between default format and single-byte format.  Enabled
+    when one or more data lines are selected.</li>
+  <li><a href="#format-as-word">Format As Word</a>.  Formats two bytes as
+    a 16-bit little-endian word.</li>
+  <li>Delete Note / Long Comment.  Deletes the selected note or long
+    comment.  Enabled when a single note or long comment is selected.</li>
+  <li><a href="tools.html#hexdump">Show Hex Dump</a>.  Opens the hex dump
+    viewer, with the current selection highlighted.  Always enabled.  If
+    nothing is selected, the viewer will open at the top of the file.</li>
+</ul>
+
+
+<h3><a name="undo">Undo &amp; Redo</a></h3>
+
+<p>You can undo a change with Edit &gt; Undo, or Ctrl+Z.  You can redo a
+change with Edit &gt; Redo, Ctrl+Y, or Ctrl+Shift+Z.</p>
+<p>All changes to the project, including changes to the project properties,
+are added to the undo/redo buffer.  This has no fixed size limit, so no
+matter how much you change, you can always undo back to the point where
+the project was opened.</p>
+<p>The undo history is not saved as part of the project.  Closing a project
+clears it.</p>
+
+
+<h3><a name="references">References Window</a></h3>
+
+<p>When a single instruction or data line is selected in the main window,
+all references to that offset will be shown in the References window.
+For each reference, the file offset, address, and some details about the
+type of reference will be shown.</p>
+
+<p>The reference type indicates whether the origin is an instruction or
+data operand, and provides an indication of the nature of the reference:</p>
+<ul>
+  <li>call - subroutine call
+    (e.g. <code>JSR addr</code>, <code>JSL addr</code>)</li>
+  <li>branch - conditional or unconditional branch
+    (e.g. <code>JMP addr</code>, <code>BCC addr</code>)</li>
+  <li>read - read from memory
+    (e.g. <code>LDA addr</code>, <code>BIT addr</code>)</li>
+  <li>write - write to memory
+    (e.g. <code>STA addr</code>)</li>
+  <li>rmw - read-modify-write
+    (e.g. <code>LSR addr</code>, <code>TSB addr</code>)</li>
+  <li>ref - reference to address by instruction
+    (e.g. <code>LDA #&lt;addr</code>, <code>PEA addr</code>)</li>
+  <li>data - reference to address by data
+    (e.g. <code>.DD2 addr</code>)</li>
+</ul>
+<p>References from instructions that use indexed addressing
+(e.g. <code>LDA addr,Y</code>) will also show "idx" to indicate that
+the instruction is using the location as a base address.</p>
+<p>References from instructions that treat the address as a pointer
+(e.g. <code>LDA (dp),Y</code>) will show "ptr".  This makes it easy
+to distinguish the locations that are reading or writing through the
+pointer from those that are reading or writing the pointer itself.</p>
+<p>The reference type will be prefixed with "Sym" or "Oth" to indicate whether or not
+the reference used the label at the current address.  To understand
+this, consider that addresses can be referenced in different ways.
+For example:</p>
+<pre>
+         LDA     DATA0
+         LDX     DATA0+1
+         RTS
+DATA0    .DD1    $80
+DATA1    .DD2    $90
+</pre>
+<p>Both <code>DATA0</code> and <code>DATA1</code> are accessed, but
+both operands used <code>DATA0</code>.  When the <code>DATA0</code> line
+is selected in the code list, the references window will show the
+<code>LDA</code> and <code>LDX</code> instructions, because both
+instructions referenced it.  When <code>DATA1</code> is selected, the
+references window will show the <code>LDX</code>, because that
+instruction accessed <code>DATA1</code>'s location even though it didn't
+use the symbol.  To make the difference clear, the lines in the references
+window will either show "Sym" (to indicate that the symbol at the selected
+line was referenced) or "Oth" (to indicate that some other symbol, or no
+symbol, was used).</p>
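+
+<p>The "Sym" / "Oth" distinction for the example above can be modeled
+with a short Python sketch.  The addresses, table layout, and function
+name are invented for illustration; the sketch only mirrors the rule
+described here and says nothing about how SourceGen stores
+cross-references internally.</p>

```python
# Addresses are made up for the sketch; only their spacing matters.
LABELS = {"DATA0": 0x1007, "DATA1": 0x1008}

# (mnemonic, label used in the operand, adjustment added to the label)
OPERANDS = [("LDA", "DATA0", 0), ("LDX", "DATA0", 1)]


def references_to(selected: str) -> list:
    """List (mnemonic, "Sym" or "Oth") for every operand touching `selected`."""
    target = LABELS[selected]
    found = []
    for mnemonic, label, adjust in OPERANDS:
        uses_symbol = label == selected
        hits_address = LABELS[label] + adjust == target
        if uses_symbol or hits_address:
            found.append((mnemonic, "Sym" if uses_symbol else "Oth"))
    return found
```

+<p>Selecting <code>DATA0</code> in this model yields both instructions
+marked "Sym", while selecting <code>DATA1</code> yields only the
+<code>LDX</code>, marked "Oth".</p>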
+
+<p>When an equate directive (generated for platform and project
+symbols) or local variable assignment is selected, the References
+window will show all references to that symbol.  Unlike in-file
+references, only the uses of that symbol are shown.  So if you have
+both a project symbol and a local variable for address $30, they
+will show disjoint sets of references.  Furthermore, if you explicitly
+format an instruction operand as hex, e.g. <code>LDA $30</code>, it will
+not appear in either set because it's not a symbolic reference.</p>
+<p>The cross-reference data is used to generate the set of equate
+directives at the top of the listing.  If nothing references a platform
+or project symbol, an equate directive will not be generated for it.</p>
+
+<p>Double-clicking on a reference moves the code list selection to that
+reference, and adds the previous selection to the navigation stack.</p>
+
+
+<h3><a name="notes">Notes Window</a></h3>
+
+<p>When you add a note, it will also be added to this window.
+Double-clicking on a note will jump directly to it, and add the previous
+selection to the navigation stack.  This makes notes useful as bookmarks.</p>
+
+
+<h3><a name="symbols">Symbols Window</a></h3>
+
+<p>All known <a href="intro-details.html#about-symbols">symbols</a> are shown
+here.  The filter buttons allow you to screen out symbols you're not
+interested in, such as platform symbols or constants.</p>
+
+<p>Clicking on one of the column headers will sort the list on that
+field.  Click a second time to reverse the sort direction.</p>
+
+<p>Double-clicking on an auto or user label will jump to that label, and
+add the previous selection to the navigation stack.  This can be a handy
+way to move around the file, jumping from label to label.</p>
+
+<p>The "type" column uses a two-letter code to identify the symbol's
+type and scope.  The first letter is one of A (auto), U (user),
+P (platform), J (project), R (pre-label), or V (variable).
+The second letter is one of N (non-unique local), L (local), G (global),
+X (exported), E (external), or C (constant).</p>
+
+
+<h3><a name="info">Info Window</a></h3>
+
+<p>Some additional information about the currently-selected line is
+shown, such as the formatting applied to the operand.  If the operand
+has a default format, any automatically-generated format will be noted.
+For an instruction,
+a summary is shown that includes the cycle count, flags affected, and a
+brief description of what the instruction does.  The description can be
+especially handy for undocumented instructions.</p>
+
+
+<h3><a name="messages">Messages Window</a></h3>
+
+<p>Sometimes a change will invalidate an earlier change.  For example,
+suppose you add a code stop point, and format the data that follows
+as a string.  Later on you change it to a code start point.  You now have
+a block of executable code with a string format record sitting in the
+middle of it.  SourceGen tries very hard not to throw away anything
+you've done, but it will ignore anything invalid.</p>
+<p>If a problem like this is encountered, an entry is added to a list
+of messages displayed at the bottom of the main window.  Each entry identifies
+the nature of the problem, the severity of the problem, the offset where
+it occurred, and what was done to resolve it.  The problem categories
+include:</p>
+<ul>
+  <li>Hidden label: a label placed on code or data is now stuck in the
+    middle of a multi-byte instruction or data item.</li>
+  <li>Unresolved weak ref: a reference to a non-existent symbol was found.</li>
+  <li>Invalid offset or length: the offset or length in a format object
+    had an invalid value.</li>
+  <li>Invalid descriptor: the format descriptor is inappropriate,
+    e.g. formatting an instruction as a string.</li>
+</ul>
+<p>The "context" column will provide additional detail about the problem,
+and the "resolution" column will indicate how it's being handled.  In most
+cases, the offending item will be ignored.</p>
+<p>Double-clicking on an entry will jump to that offset.</p>
+<p>The message list will not appear if there are no messages.  You can
+hide the list by clicking on the "Hide" button to the left of the messages.
+Un-hide the list by clicking on the "N messages" button at the bottom-right
+corner of the application window.</p>
+
+
+<h3><a name="navigation">Navigation</a></h3>
+
+<p>The simplest way to move through the code list is with the scroll wheel
+on your mouse, or by left-clicking and dragging the scroll bar.  You
+can also use PgUp/PgDn and the arrow keys.</p>
+
+<p>Use Navigate &gt; Find to search for text.  This performs a case-insensitive
+text search on the label, opcode, operand, and comment fields.
+Use Navigate &gt; Find Next to find the next match, and
+Navigate &gt; Find Previous to find the previous match.  Note that "next" is
+always downward, and "previous" is always upward, regardless of the
+direction of the initial search chosen in the Find dialog.</p>
+
+<p>Use Navigate &gt; Go To to jump to an offset, address, or label.  Remember
+that offsets and addresses are always hexadecimal, and offsets start
+with a '+'.  If you have a label that is also a valid hexadecimal
+address, like "FEED", the label takes precedence.  To jump to the address,
+write "$FEED" instead.  If you enter a non-unique label, the selection
+will jump to the nearest instance.</p>
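+
+<p>The precedence rules for Go To can be summarized with a small Python
+sketch.  The function name and the <code>labels</code> dictionary are
+hypothetical stand-ins for the project's symbol table, and the handling
+of non-unique labels ("nearest instance") is not modeled.</p>

```python
def parse_go_to(text: str, labels: dict) -> tuple:
    """Return ("offset" | "address" | "label", value) for a Go To entry.

    `labels` maps label names to addresses; it stands in for the project's
    symbol table.
    """
    text = text.strip()
    if text.startswith("+"):
        return ("offset", int(text[1:], 16))
    if text in labels:
        # A label that looks like hex, e.g. "FEED", wins over the address.
        return ("label", labels[text])
    if text.startswith("$"):
        text = text[1:]
    return ("address", int(text, 16))
```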
+
+<p>If an instruction or data line has an operand that references an address
+in the file, you can navigate to the operand's location with
+Navigate &gt; Jump to Operand.  You can also do this by double-clicking
+in the opcode column.</p>
+
+<p>When you edit something, lines throughout the listing can change.  This
+is different from a source code editor, where editing a line just changes
+that line.  To allow you to watch the effects changes have, the undo/redo
+commands try to keep the listing in the same position.
+If you want to go to the place where the last change (i.e. the change
+that will be undone by the next Undo operation) was made,
+Navigate &gt; Go to Last Change will jump to the first offset
+associated with the most recent change.
+If the last change was to the project properties, it will jump to the
+first offset in the file.</p>
+
+<p>When you jump around, e.g. by double-clicking on an opcode or an entry
+in one of the side windows, the previously-selected line is added to
+a navigation stack.  You can use Navigate &gt; Nav Forward and
+Navigate &gt; Nav Backward to move forward and backward through the
+stack.  (The curly arrows on the left side of the toolbar may be more
+convenient.  You can use Alt+Left/Right Arrow, or
+Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)</p>
+
+
+<h3><a name="atags">Adding and Removing Analyzer Tags</a></h3>
+
+<p><i>(Note: These were referred to as code/data "hints" in older
+versions of SourceGen.)</i></p>
+
+<p>To set code start or stop points, select the desired offsets and
+use Actions &gt; Tag Address As Code Start Point (or Stop Point).  Because
+these indicate a transition between code and data regions, there is rarely
+any need to tag multiple consecutive bytes.
+For this reason, only the first byte on each selected line will be tagged.</p>
+
+<p>For inline data, you need to cover the entire range, so every byte in every
+selected line is tagged when you select Tag Bytes As Inline Data.  Similarly,
+the Remove Analyzer Tags menu item will remove tags from every byte.</p>
+
+<p>If you're having a hard time selecting just the right bytes because
+the instructions are caught up in a multi-byte data item, such as an
+auto-detected character string, you can disable uncategorized data analysis
+(the thing that creates the .STR and .FILL ops for you).  You can do this
+from the
+<a href="settings.html#project-properties">project properties</a> editor,
+or simply by hitting Ctrl+D.  Hit that, tag the byte or bytes, then hit it
+again to re-enable the string &amp; fill analyzer.</p>
+<p>Another approach is to use the "Toggle Single-Byte Format"
+menu item to "flatten" the item.</p>
+
+
+<h3><a name="address-table">Format Address Table</a></h3>
+
+<p>Tables of addresses are fairly common.  Sometimes you'll find them as a
+series of 16-bit words, like this:</p>
+<pre>
+jmptab   .dd2    func1
+         .dd2    func2
+         .dd2    func3
+</pre>
+
+<p>While that's fairly common in 16-bit software, 8-bit software often splits
+the high and low bytes into separate arrays, like this:</p>
+<pre>
+jmptabl  .dd1    &lt;func1
+         .dd1    &lt;func2
+         .dd1    &lt;func3
+jmptabh  .dd1    &gt;func1
+         .dd1    &gt;func2
+         .dd1    &gt;func3
+</pre>
+
+<p>Sometimes the tables contain <code>address - 1</code>, because the
+values are pushed onto the stack and jumped to with an RTS, which adds
+one to the address it pulls.</p>
+
+<p>While the .dd2 case is easy to format with the data operand editor,
+formatting addresses whose components are split into multiple tables can
+be tedious.  Even in the easy case, you may want to create labels and set
+code start points for each item.</p>
+
+<p>The Address Table Formatter helps you associate symbols with the
+addresses in the table.  It works for simple and "split" tables.</p>
+<p>To use it, start by selecting the entire table.  In the examples above,
+you would select all 6 bytes.  The number of bytes in each part of a
+split table must be equal: here, it's 3 low bytes, followed by 3 high
+bytes.  If the number of bytes selected can't be evenly divided by the
+number of parts -- two parts for 16-bit data, three parts for 24-bit data --
+the formatter will report an error.</p>
+<p>With the data selected, open the format dialog with
+Actions &gt; Format Address Table.  The rather complicated dialog
+is split into sections.</p>
+<ul>
+  <li>Address Characteristics: select whether the table has 16-bit
+  addresses or 24-bit addresses.  (24-bit addresses are disabled if you
+  don't have the CPU set to 65816.)  If the table is split into individual
+  sub-tables for low bytes and high bytes, check the "Parts are split
+  across sub-tables" box.  If the address parts are being pushed
+  on the stack for an RTS/RTL, check the "Adjusted for RTS/RTL" box to
+  adjust them by 1.</li>
+  <li>Low Byte Source: indicate which part of the table or word holds the
+  low bytes.  For common little-endian words, the low bytes come first.  In
+  the split-table example above, the low bytes came first, followed by the
+  high bytes, so you would select "first part of selection".  If they were
+  stored the other way around, you would click "second part" instead.</li>
+  <li>High Byte Source: indicate which part of the table or word holds
+  the high bytes.  For a 16-bit address this will be the part you didn't
+  pick for the low bytes.
+  Sometimes, if all addresses land on the same 256-byte page, the high byte
+  will be a constant in the code, and only the low bytes will be stored in
+  a table.  If that's the case, select "Constant", and enter the high byte
+  in the text box.  (Decimal, hex, and binary are accepted.)</li>
+  <li>Bank Byte Source: for 24-bit addresses, you can select "Nth part of
+  selection", which will just use whichever part you didn't specify for
+  the low and high bytes.  If the table holds 16-bit addresses, you can
+  use the "Constant" field to specify the data bank.</li>
+  <li>Options: if the table holds the addresses of executable code, check
+  the "Tag targets as code start points" box.  If the target address
+  hasn't been identified by the code analyzer through some other execution
+  path, it will be tagged as a code start point.</li>
+  <li>Generated Addresses: this shows the full list of addresses that are
+  generated with the current set of parameters.  Each address is shown with
+  a file offset and a symbol.  If the address can't be mapped within the
+  file, the offset is shown as dashes instead.  If the address can be
+  mapped, and it already has a user-specified label, the label will be
+  shown.  If no label was found, the table will show "(+)", indicating
+  that a permanent label will be added at the target offset.  If everything
+  is set up correctly, and the addresses fall entirely within the program,
+  you shouldn't see any unknown entries here.</li>
+</ul>
+
+<p>For a 16-bit address, you have three choices: low byte first, high byte
+first, or low byte only with a constant high byte.  For a 24-bit address
+the set of possibilities expands, but is essentially the same: pick the
+order in which things appear, using fixed constants if desired.</p>
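+
+<p>To make the 16-bit cases concrete, here is a minimal Python sketch of
+how the addresses could be rebuilt from a selection.  The function and
+parameter names are hypothetical, and this is not the formatter's actual
+implementation; it only illustrates the even-division check, the choice
+of which part holds the low bytes, the RTS adjustment, and the
+constant-high-byte case.</p>

```python
def split_table_addresses(data: bytes, low_first: bool = True,
                          rts_adjusted: bool = False) -> list:
    """Rebuild 16-bit addresses from a table split into two equal parts."""
    parts = 2  # low bytes and high bytes
    count, leftover = divmod(len(data), parts)
    if leftover != 0:
        raise ValueError("selection can't be divided evenly into parts")
    first, second = data[:count], data[count:]
    lows, highs = (first, second) if low_first else (second, first)
    adjust = 1 if rts_adjusted else 0  # RTS resumes at the pulled address + 1
    return [(hi * 256 + lo + adjust) % 0x10000 for lo, hi in zip(lows, highs)]


def constant_high_addresses(lows: bytes, high: int) -> list:
    """Low bytes from a table, high byte supplied as a constant."""
    return [high * 256 + lo for lo in lows]
```

+<p>For example, the six bytes <code>34 00 ff 12 13 20</code> with the low
+bytes first produce the addresses $1234, $1300, and $20ff.</p>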
+
+<p>A message at the top of the screen shows how many bytes are selected.
+It also tells you how many groups there are, but unlike the data operand
+formatter, the split-address table formatter doesn't care about group
+boundaries.  For this reason, tables do not have to be contiguous in
+memory.  The low bytes and high bytes could be on separate 256-byte
+pages.  You just need to have all of the data selected.</p>
+
+<p>It should be mentioned that SourceGen does not record the fact that the
+data in question is part of a table.  The formatting, labels, and code
+start point tags are applied as if you entered them all individually by
+hand.  The formatter is just significantly more convenient.  It also
+does everything as a single undoable action, so if it comes out looking
+wrong, just hit "undo" and try something else.</p>
+
+
+<h3><a name="toggle-single">Toggle Single-Byte Format</a></h3>
+
+<p>The "Toggle Single-Byte Format" feature provides a quick way to
+change a range of bytes to single bytes
+or back to their default format.  It's equivalent to opening the Edit
+Data Operand dialog and selecting "Single bytes" displayed as hex, or
+selecting "Default".</p>
+<p>This can be handy if the default format for a range of bytes is a
+string, but you want to see it as bytes or set a label in the middle.</p>
+
+
+<h3><a name="format-as-word">Format As Word</a></h3>
+
+<p>This is a quick way to format pairs of bytes as 16-bit words.  It's
+equivalent to opening the Edit Data Operand dialog and selecting
+"16-bit words, little-endian", displayed as hex.</p>
+
+<p>To avoid some confusing situations, it only works on sets of
+single-byte values.  This means, for example, that you can't select a
+10-byte string and have it turn into five 16-bit words.  You can select as
+many bytes as you want, but they must come in pairs.  (Remember that you
+can turn off auto-generation of strings and .FILLs with
+<a href="#toggle-data">Toggle Data Scan</a>.)</p>
+<p>As a special case, if you select a single byte, the following byte will
+also be selected.  This won't work if the following byte is part of a
+multi-byte data item, is the start of a new region (see
+<a href="editors.html#data-operand">Edit Data Operand</a> for a definition of
+what splits a region), or is the last byte in the file.</p>
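+
+<p>The pairing rule amounts to the following, shown here as a Python
+sketch with hypothetical function names.  It assumes the selection has
+already been reduced to single-byte values and does not model the
+single-byte special case or region boundaries.</p>

```python
def bytes_to_words(data: bytes) -> list:
    """Combine consecutive byte pairs into 16-bit little-endian words."""
    if len(data) % 2 != 0:
        raise ValueError("bytes must come in pairs")
    return [data[i] + data[i + 1] * 256 for i in range(0, len(data), 2)]


def as_dd2(data: bytes) -> list:
    """Render each word as a hex operand (operand layout assumed)."""
    return [".dd2    ${:04x}".format(w) for w in bytes_to_words(data)]
```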
+
+
+<h3><a name="toggle-data">Toggle Data Scan</a></h3>
+
+<p>This menu item is in the Edit menu, and acts as a shortcut to opening
+the Project Properties editor, and clicking on the "Analyze Uncategorized
+Data" checkbox.  When enabled, SourceGen will look for character strings and
+regions of identical bytes, and generate .STR and .FILL directives.  When
+disabled, uncategorized data is presented as one byte per line, which can
+be handy if you're trying to get at a byte in the middle of a string.</p>
+<p>As with all other project property changes, this is an undoable
+action.</p>
+
+
+<h3><a name="clipboard">Copying to Clipboard</a></h3>
+
+<p>When you use Edit &gt; Copy, all lines selected in the code list are
+copied to the system clipboard.  This can be a convenient way to paste
+code snippets into forum posts or documentation.  The text is
+copied from the data shown on screen, so your chosen capitalization
+and pseudo-ops will appear in the copy.</p>
+<p>Long comments are included, but notes are not.</p>
+<p>By default, only the label, opcode, operand, and comment fields are
+included.  From the
+<a href="settings.html#app-settings">app settings</a> dialog you can select
+alternative formats that include additional columns.</p>
+
+<p>A copy of all of the fields is also written to the clipboard in CSV
+format.  If you have a spreadsheet like Excel, you can use Paste Special
+to put the data into individual cells.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,393 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Properties &amp; Settings - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Properties &amp; Settings</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="overview">Settings Overview</a></h2>
+
+<p>There are two kinds of settings: application settings, and
+project properties.</p>
+
+
+<h2><a name="app-settings">Application Settings</a></h2>
+
+<p>Application settings are stored in a file called "SourceGen-settings"
+in the SourceGen installation directory.  If the file is missing or
+corrupted, default settings will be used.  These settings are local
+to your system, and include everything from window sizes to whether or not
+you prefer hexadecimal values to be shown in upper case.  None of them
+affect the way the project analyzes code and data, though they may affect
+the way generated assembly sources look.</p>
+
+<p>The settings editor is divided into four tabs.  Changes don't take
+effect until you hit Apply or OK.</p>
+
+
+<h3><a name="appset-codeview">Code View</a></h3>
+
+<p>These settings change the way the code looks on screen.</p>
+
+<p>Click the Column Visibility buttons to hide columns.  Click them
+again to restore the column to a width appropriate for the current font.
+A "hidden" column just has a width of zero, so with careful mouse
+positioning you can show and hide columns by dragging the column headers.
+The buttons may be more convenient though.</p>
+
+<p>You can select a different font for the code list, and make it as large
+or as small as you want.  Mono-space fonts like Courier or Consolas are
+recommended (and will be the only ones shown).</p>
+
+<p>You can choose to display different parts of the listing in upper or
+lower case, using the "all lower" and "all upper" buttons as a quick way
+to set all values.  These settings are also used for generated assembly
+code, unless the assembler has specific case-sensitivity requirements.  There
+is no setting for labels, which are always case-sensitive.</p>
+
+<p>The Clipboard drop-down list lets you choose the format for text
+<a href="mainwin.html#clipboard">copied to the clipboard</a>.  The
+"Assembler Source" format includes the rightmost columns (label,
+opcode, operand, and comment), like assembly source code does.  The
+"Disassembly" format adds the address and bytes on the left.  Use
+the "All Columns" format to get all columns.</p>
+
+<p>When "show cycle counts for instructions" is checked, every instruction
+line will have an end-of-line comment that indicates the number of cycles
+required for that instruction.  If the cycle count can't be determined
+solely from a static analysis, e.g. an extra cycle is required if
+<code>LDA (dp),Y</code> crosses a page boundary, a '+' will be shown.
+In some cases the variability can be factored out if the state of
+certain status flags is known, e.g. 65C02 instructions that take longer
+in decimal mode won't be shown as variable if the analyzer can determine
+that D=0 or D=1.  This checkbox enables display in the on-screen list, but
+does not affect generated source code, which can be configured independently
+on the Asm Config tab.</p>
+
+<p>Check "use 'dark' color scheme" to change the main disassembly list
+to use white text on a black background, and mute the Note highlight
+colors.
+(Most of the GUI uses standard Windows controls that take their colors
+from the system theme, but the disassembly list uses a custom style.  You
+can change the rest of the UI from the Windows display "personalization"
+controls.)</p>
+
+
+<h3><a name="appset-textdelim">Text Delimiters</a></h3>
+
+<p>Character and string operands are shown surrounded by quotes, e.g.
+<code>LDA #'*'</code> or <code>.STR "Hello, world!"</code>.  It's
+handy to be able to tell at a glance how characters are encoded, so
+SourceGen allows you to set the delimiters independently for every
+supported character encoding.</p>
+<p>String operands may contain a mixture of text and hexadecimal values.
+For example, in ASCII data, the control characters for linefeed and
+carriage return ($0a and $0d) are considered part of the string, but
+don't have a printable symbol.  (Unicode defines some glyphs, but they
+don't look very good at smaller font sizes.)</p>
+<p>If one of the delimiter characters appears in the string itself,
+the character will be output as hex to avoid confusion.  For this
+reason, it's generally wise to use delimiter characters that aren't
+part of the ASCII character set.  The "Sample Characters" box holds some
+characters that you can copy and paste (with Ctrl+C / Ctrl+V) into the
+delimiter fields.</p>
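+<p>As a rough illustration of the rules above, the following Python sketch
+formats a byte sequence as a mix of quoted text and hex values.  The function
+name, the comma separator, and the <code>$xx</code> hex notation are
+assumptions made for the example, not SourceGen's actual implementation.</p>

```python
def format_string_operand(data: bytes, open_delim: str = '"',
                          close_delim: str = '"') -> str:
    """Render bytes as quoted text runs mixed with hex values (illustrative)."""
    parts = []
    run = []

    def flush():
        # Emit any pending run of printable characters as one quoted chunk.
        if run:
            parts.append(open_delim + "".join(run) + close_delim)
            run.clear()

    for b in data:
        ch = chr(b)
        printable = b in range(0x20, 0x7F)
        if printable and ch not in (open_delim, close_delim):
            run.append(ch)
        else:
            # Control characters and delimiter characters are shown as hex.
            flush()
            parts.append(f"${b:02x}")
    flush()
    return ",".join(parts)
```

+<p>With the default straight quotes, the embedded quotes in
+<code>say "hi"</code> are emitted as <code>$22</code>.  With non-ASCII
+delimiters such as curly quotes, the whole string stays readable as text.</p>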
+<p>For character operands, the prefix and suffix are added to the start
+and end of the operand.  For string operands, the prefix is added to the
+start of the first line, and suffixes aren't allowed.</p>
+<p>These options change the way the code list looks on screen.  They
+do not affect generated code, which must use the delimiter characters
+specified by the chosen assembler.</p>
+
+
+<h3><a name="appset-displayformat">Display Format</a></h3>
+
+<p>These options change the way the code list looks on screen.  They
+do not affect generated code.</p>
+
+<p>The
+<a href="intro-details.html#width-disambiguation">operand width disambiguator</a>
+strings are used when the width of an instruction operand is unclear.
+You may specify values for all of them or none of them.</p>
+
+<p>Different assemblers have different ways of forming expressions.
+Sometimes the rules allow expressions to be written simply; other times
+explicit grouping with parentheses is required.  Select whichever style
+you are most comfortable with.</p>
+
+<p>Non-unique labels are identified with a prefix character, typically
+'@' or ':'.  The default is '@', but you can configure it to any character
+that isn't valid for the start of a label.  (64tass uses '_' for locals,
+but that's a valid label start character, and so isn't allowed here.)
+The setting affects label editing as well as display.</p>
+
+<p>If you would like your local variables to be shown with a prefix
+character, you can set it in the "local variable prefix" box.</p>
+
+<p>The "quick set" pop-up configures the fields on the left side of the
+tab to match the conventions of the specified assembler.  Select your
+preferred assembler in the combo box to set the fields.  The setting
+automatically switches to "custom" when you edit a field.
+(64tass and ACME use the "common"
+expression style; cc65 and Merlin 32 each have their own style.)</p>
+
+<p>The "add spaces in Bytes column" checkbox changes the format of the
+hex data in the code list "bytes" column from dense (<code>20edfd</code>)
+to spaced (<code>20 ed fd</code>).  This also affects the format of
+clipboard copies and exports.</p>
+
+<p>The "comma-separated format for bulk data" checkbox determines whether
+large blocks of hex look like <code>ABC123</code> or
+<code>$AB,$C1,$23</code>.  The former reduces the number of lines
+required; the latter is more readable.</p>
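+<p>The two hex formatting checkboxes described above can be sketched in
+Python as follows.  The function names are hypothetical, and the letter case
+simply follows the examples in the text; in practice the case depends on
+your other display settings.</p>

```python
def format_bytes_column(data: bytes, add_spaces: bool) -> str:
    """Hex for the "bytes" column: dense ("20edfd") or spaced ("20 ed fd")."""
    return data.hex(" ") if add_spaces else data.hex()


def format_bulk(data: bytes, comma_separated: bool) -> str:
    """Bulk data operand: dense ("ABC123") or comma-separated ("$AB,$C1,$23")."""
    if comma_separated:
        return ",".join(f"${b:02X}" for b in data)
    return data.hex().upper()
```

+<p>The dense form uses two characters per byte, while the comma-separated
+form uses four, which is why the former needs fewer lines for the same
+data.</p>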
+<p>Long operands, such as strings and bulk data, are wrapped to a new
+line after a certain number of characters.  Use the pop-up to configure
+the value.  Larger values can make the code easier to read, but smaller
+values allow you to shrink the width of the operand column in the
+on-screen listing, moving the comment field closer in.</p>
+
+
+<h3><a name="appset-pseudoop">Pseudo-Op</a></h3>
+
+<p>These options change the way the code list looks on screen.  Assembler
+directives and data pseudo-opcodes will use these values.  This does
+not affect generated source code, which always matches the conventions
+of the target assembler.</p>
+
+<p>Enter the string you want to use for the various data formats.  If
+a field is left blank, a default value is used.</p>
+
+<p>The "quick set" pop-up configures the fields on this tab to match
+the conventions of the specified assembler.  Select your preferred assembler
+in the combo box to set the fields.  The setting automatically switches to
+"custom" when you edit a field.</p>
+
+
+
+<h3><a name="appset-asmconfig">Asm Config</a></h3>
+
+<p>These settings configure cross-assemblers and modify assembly source
+generation in various ways.</p>
+<p>To configure an assembler, select it in the pop-up menu.  The fields
+will initially contain assembler-specific default values.  All of
+the values in the Assembler Configuration box may be configured
+differently for each assembler.</p>
+<p>The "executable" box holds the full path to the cross-assembler
+executable.</p>
+<ul>
+  <li>64tass: <code>64tass.exe</code></li>
+  <li>ACME: <code>acme.exe</code></li>
+  <li>cc65: <code>bin/cl65.exe</code> -- full installation required,
+    with all configuration files and libraries</li>
+  <li>Merlin 32: <code>Merlin32.exe</code></li>
+</ul>
+<p>The "column widths" section allows you to specify the minimum
+width of the label, opcode, operand, and comment fields.  If the width
+is less than 1, or isn't a valid number, 1 will be used.  These are
+not hard stops: if the contents of a field are too wide, the contents
+of the next column will be pushed over.  (The comment field width is
+not currently being used, but may be used to fold lines in the future.)</p>
+
+<p>When "show cycle counts in comments" is checked, cycle counts are
+inserted into end-of-line comments.  This works the same as the option
+in the Code View tab, but applies to generated source code rather than
+the on-screen display.</p>
+
+<p>If "put long labels on separate line" is checked, labels that are
+longer than the label column are placed on their own line.  This looks
+a bit nicer because otherwise the opcode gets pushed out of alignment.
+(Some assemblers get bent out of shape if you split an equate
+directive, so those might stay on one line.)</p>
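+<p>The column-width and long-label behavior described above might be
+modeled roughly like this.  This is a Python sketch under stated
+assumptions: the function names are made up, the exact threshold for a
+"long" label is a guess, and the real generator may handle overflow
+differently.</p>

```python
def parse_width(text: str) -> int:
    """Widths that are invalid or less than 1 fall back to 1."""
    try:
        return max(1, int(text))
    except ValueError:
        return 1


def pad_to(text: str, column: int) -> str:
    """Pad text out to a column; if it already reaches it, add one space."""
    if len(text) >= column:
        return text + " "
    return text.ljust(column)


def format_line(label, opcode, operand, comment, widths,
                long_label_separate=False):
    """Return the output line(s) for one instruction.

    widths holds the minimum (label, opcode, operand, comment) widths;
    the comment width is ignored here, as noted in the text.
    """
    label_w, opcode_w, operand_w = widths[0], widths[1], widths[2]
    lines = []
    if long_label_separate and len(label) >= label_w:
        # Assumption: "too long" means no room is left for a separating space.
        lines.append(label)
        label = ""
    # Widths are minimums, not hard stops: wide fields push later ones over.
    text = pad_to(label, label_w)
    text = pad_to(text + opcode, label_w + opcode_w)
    text = pad_to(text + operand, label_w + opcode_w + operand_w)
    lines.append((text + comment).rstrip())
    return lines
```

+<p>With widths of 8, 8, and 12, a 15-character label pushes the opcode out
+of its column unless the label is moved to its own line.</p>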
+
+<p>If you enable "identify assembler in output", a comment will be
+added to the top of the generated assembly output that identifies the
+target assembler and version.  It also shows the command-line options
+passed to the assembler.  This can be very helpful if the source
+file is sent to other people, since it may not otherwise be obvious from
+the source file what the intended target assembler is, or what options
+are required to process the file correctly.</p>
+
+
+<h2><a name="project-properties">Project Properties</a></h2>
+
+<p>Project properties are stored in the .dis65 project file.
+They specify which CPU to use, which extension scripts to load, and a
+variety of other things that directly impact how SourceGen processes
+the project.  Because of the potential impact, all changes to
+the project properties are made through the undo/redo buffer,
+which means you hit "undo" to revert a property change.</p>
+
+<p>The properties editor is divided into four tabs.  Changes aren't pushed
+out to the main application until you close the dialog.  Clicking Apply
+will capture the current changes, ensuring that they're applied even if
+you later hit Cancel, but the changes are not applied immediately.</p>
+
+
+<h3><a name="projprop-general">General</a></h3>
+
+<p>The choice of CPU determines the set of available instructions, as
+well as cycle costs and register widths.  There are many variations
+on the 6502, but from the perspective of a disassembler most can be
+treated as one of these four:</p>
+<ol>
+  <li>MOS 6502.  The original 8-bit instruction set.</li>
+  <li>WDC 65C02.  Expanded the instruction set and smoothed
+    some rough edges.</li>
+  <li>WDC W65C02S.  An enhanced version of the 65C02, with some
+    additional instructions introduced by Rockwell (R65C02), as well
+    as WDC's STP and WAI instructions.  The Rockwell additions overlap
+    with 65816 instructions, so code that uses them will not work on
+    16-bit CPUs.</li>
+  <li>WDC W65C816S.  Expanded instruction set, 24-bit address space,
+    and 16-bit registers.</li>
+</ol>
+<p>The Hudson Soft HuC6280 and Commodore CSG 4510 / 65CE02 are very
+similar, but they have additional instructions and some fundamental
+architectural changes.  These are not currently supported by SourceGen.</p>
+
+<p>If "enable undocumented instructions" is checked, some additional
+opcodes are recognized on the 6502 and 65C02.  These instructions are
+not part of the chip specification, but most of them have consistent
+behavior and can be used.  If the box is not checked, the instructions
+are treated as invalid and cause the code analyzer to assume that it
+has run into a data area.  This option has no effect on the 65816.</p>
+<p>The "treat BRK as two-byte instruction" checkbox determines whether
+BRK instructions should be handled as if they have an operand.</p>
+
+<p>The entry flags determine the initial value for the processor status
+flag register.  Code that is unreachable internally (requiring a code
+start point tag) will use this value.  This is chiefly of value for
+65816 code, where the initial value of the M/X/E flags has a significant
+impact on how instructions are disassembled.</p>
+
+<p>If "analyze uncategorized data" is checked, SourceGen will attempt to
+identify character strings and regions that are filled with a repeated
+value.  If it's not checked, anything that isn't detected as code or
+explicitly formatted as data will be shown as individual byte values.</p>
+<p>If "seek nearby targets" is checked, the analyzer will try to use
+nearby labels for data loads and stores, adjusting them to fit
+(e.g. <code>LDA LABEL+1</code>).  If not enabled, labels are not applied
+unless they match exactly.  Note that references into the middle of an
+instruction or formatted data area are always adjusted, regardless of
+how this is set.  This setting has no effect on local variables, and
+only enables a 1-byte backward search on project/platform symbols.</p>
+<p>The "use relocation data" checkbox is only available if the project
+was created from a relocatable source, e.g. by the OMF Converter tool.
+If checked, information from the relocation dictionary will be used to
+improve automatic operand formatting.</p>
+<p>If "smart PLP handling" is checked, the analyzer will try to use
+the processor status flags from a nearby <code>PHP</code> when a
+<code>PLP</code> is encountered.  If not enabled, all flags are set to
+"indeterminate" following a <code>PLP</code>, except for the M/X
+flags on the 65816, which are left unmodified.  (In practice this
+approach doesn't seem to work all that well, so the setting is
+un-checked by default.)</p>
+<p>If "smart PLB handling" is checked, the analyzer will watch for
+code patterns like <code>PLB</code> preceded by <code>PHK</code>,
+and generate appropriate Data Bank Register changes.  If not enabled,
+the DBR is set to the bank of the address of the start of the file,
+and does not change unless explicitly set.  Only useful for 65816 code.</p>
+
+<p>The "default text encoding" setting has two effects.  First, it
+specifies which character encoding to use when searching for strings in
+uncategorized data.  Second, if an assembler has a notion of preferred
+character encoding (e.g. you can default string operands to PETSCII),
+this setting will determine which encoding is preferred.</p>
+<p>The "min chars for string detection" setting determines how many
+ASCII characters need to appear consecutively for the data analyzer to
+declare it a string.  Smaller values are prone to false-positive
+identifications, while larger values miss short strings.  You can also
+set it to "none" to disable automatic string identification.</p>
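+<p>To make the trade-off concrete, here is a minimal Python sketch of
+run-length string detection.  It assumes plain printable ASCII and a
+hypothetical function name; the actual analyzer also takes the default text
+encoding into account and may apply additional rules.</p>

```python
from typing import List, Optional, Tuple


def find_strings(data: bytes, min_chars: Optional[int]) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of printable ASCII characters
    that is at least min_chars long.  Passing None disables detection."""
    if min_chars is None:
        return []
    runs = []
    start = None
    for i, b in enumerate(data):
        if b in range(0x20, 0x7F):
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_chars:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_chars:
        runs.append((start, len(data) - start))
    return runs
```

+<p>For the bytes <code>00 01 'H' 'i' 00 'Hello' ff</code>, a minimum of 4
+finds only "Hello", while a minimum of 2 also reports "Hi", which could
+just as easily be two unrelated data bytes.</p>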
+
+<p>The auto-label style setting determines the format for labels that are
+generated automatically.  By default the label will be the letter 'L'
+followed by the hexadecimal address, but the label can be annotated based
+on usage.  For example, addresses that are the target of branch instructions
+can be labeled with the letter 'B'.</p>
+
+
+<h3><a name="projprop-projsym">Project Symbols</a></h3>
+<p>You can add, edit, and delete individual symbols and constants.
+See the <a href="intro-details.html#about-symbols">symbols</a> section for an
+explanation of how project symbols work.</p>
+
+<p>The Edit Symbol button opens the
+<a href="editors.html#project-symbol">Edit Project Symbol</a> dialog, which
+allows changing any part of a symbol definition.  You're not allowed to
+create two symbols with the same label.</p>
+
+<p>The Import button allows you to import symbols from another project.
+Only labels that have been tagged as global and exported will be imported.
+Existing symbols with identical labels will be replaced, so it's okay to
+run the importer multiple times.  Labels that aren't found will not be
+removed, so you can safely import from multiple projects, but will need
+to manually delete any symbols that are no longer being exported.</p>
+
+<p>Shortcut: you can open the project properties window with the
+Project Symbols tab selected by hitting F6 from the main code list.</p>
+
+
+<h3><a name="projprop-symfiles">Symbol Files</a></h3>
+<p>From here, you can add and remove platform symbol files, or change
+the order in which they are loaded.
+See the <a href="intro-details.html#about-symbols">symbols</a> section for an
+explanation of how platform symbols work, and the
+<a href="advanced.html#platform-symbols">advanced topics</a> section
+for a description of the file syntax.</p>
+
+<p>Platform symbol files must live in the RuntimeData directory that comes
+with SourceGen, or in the directory where the project file lives.  This
+is mostly to keep things manageable when projects are distributed to
+other people, but also acts as a minor security check, to prevent a
+wayward project from trying to open files it shouldn't.</p>
+<p>Click one of the "Add Symbol Files" buttons to include one or more
+symbol files in the project.
+The "Add Symbol Files from Runtime" button sets the directory
+to the SourceGen RuntimeData directory, while "Add Symbol Files from Project"
+starts in the project directory.  If you haven't yet saved the project,
+the latter button will be disabled.  The only difference between the
+buttons is the initial directory.</p>
+<p>In the list, files loaded from the RuntimeData directory will be
+prefixed with <code>RT:</code>.  Files loaded from the project directory
+will be prefixed with <code>PROJ:</code>.</p>
+<p>If a platform symbol file can't be found when the project is opened,
+you will receive a warning.</p>
+
+
+<h3><a name="projprop-extscripts">Extension Scripts</a></h3>
+<p>From here, you can add and remove extension script files.
+See the <a href="advanced.html#extension-scripts">extension scripts</a>
+section for details on how extension scripts work.</p>
+
+<p>Extension script files must live in the RuntimeData directory that comes
+with SourceGen, or in the directory where the project file lives.  This
+is mostly to keep things manageable when projects are distributed to
+other people, but also acts as a minor security check, to prevent a
+wayward project from trying to open files it shouldn't.</p>
+<p>Click one of the "Add Scripts" buttons to include one or more scripts in
+the project.  The "Add Scripts from Runtime" button sets the directory
+to the SourceGen RuntimeData directory, while "Add Scripts from Project"
+starts in the project directory.  If you haven't yet saved the project,
+the latter button will be disabled.  The only difference between the
+buttons is the initial directory.</p>
+<p>In the list, files loaded from the RuntimeData directory will be
+prefixed with <code>RT:</code>.  Files loaded from the project directory
+will be prefixed with <code>PROJ:</code>.</p>
+<p>If an extension script file can't be found when the project is opened,
+you will receive a warning.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,159 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Tools - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Tools</h1>
+
+<h2><a name="instruction-chart">Instruction Chart</a></h2>
+
+<p>This opens a window with a summary of all 256 opcodes.  The CPU can
+be chosen from the pop-up list at the bottom.  Undocumented opcodes for
+6502/65C02 are shown in italics, and can be excluded from the list
+by unchecking the box at the bottom.</p>
+<p>The status flags affected by each instruction reflect their behavior
+on the 65816.  The only significant difference between 65816 and
+6502/65C02 is the way the BRK instruction affects the D and B/X flags.</p>
+
+
+<h2><a name="ascii-chart">ASCII Chart</a></h2>
+
+<p>This opens a window with the ASCII character set.  Each character is
+displayed next to its numeric value in decimal and hexadecimal.  The
+pop-up list at the bottom allows you to flip between standard and "high"
+ASCII.</p>
+
+
+<h2><a name="apple2-screen-chart">Apple II Screen Chart</a></h2>
+
+<p>The Apple II text and hi-res screens are mapped to memory in a way
+that makes sense to computers but is a little confusing for humans.  This
+chart maps line numbers to addresses and vice-versa.  Select different
+screens and sort orders from the list at the bottom.</p>
+
+
+<h2><a name="hexdump">Hex Dump Viewer</a></h2>
+
+<p>You can use this to view the contents of the project data file
+by double-clicking the "bytes" column, or with Actions &gt; Show Hex Dump.
+The viewer is displayed in a "modeless" dialog that does not
+prevent you from continuing to work with the project.  If you
+double-click a different line in the project, the viewer will automatically
+highlight those bytes.</p>
+
+<p>You can also use this to view the contents of arbitrary files by
+using Tools &gt; Hex Dump.  There is no fixed limit on the number of
+viewers you can have open simultaneously.  (Be aware that the viewer
+currently loads the entire file into memory, and you will run out of room
+eventually.  Not coincidentally, the viewer has a size limit of 16MiB
+per file.)</p>
+
+<p>You can select lines with the mouse as you would in any other list
+view.  Ctrl+A selects all lines.  Ctrl+C copies the selected lines to
+the system clipboard.</p>
+
+<p>The "character conversion" selector allows you to choose how the
+bytes are converted to characters for the Text column.  Choose from
+the usual set of encodings.</p>
+
+<p>If "ASCII-only dump" is not checked, non-printable bytes are shown in
+the ASCII dump as a middle dot ('&#183;').  If the box is checked,
+non-printable bytes are represented by a period ('.') instead.  The
+use of non-ASCII characters makes the dump unambiguous when unprintable
+characters are mixed with periods, but the lines may be unsuitable for
+pasting in some forums.</p>
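+<p>The effect of the "ASCII-only dump" checkbox can be illustrated with a
+short Python sketch.  It assumes a plain ASCII character conversion and a
+hypothetical function name.</p>

```python
def text_column(data: bytes, ascii_only: bool) -> str:
    """Build the text column for one hex dump line."""
    # Middle dot (U+00B7) by default; a plain period in ASCII-only mode.
    placeholder = "." if ascii_only else "\u00b7"
    return "".join(
        chr(b) if b in range(0x20, 0x7F) else placeholder
        for b in data
    )
```

+<p>For the bytes <code>41 2e 00</code>, the default mode shows an 'A', a
+period, and a middle dot, while ASCII-only mode shows an 'A' followed by two
+periods, so the real period and the unprintable byte can no longer be told
+apart.</p>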
+
+<p>If "always on top" is checked, the window will stay above all other
+windows that don't also declare that they should always be on top.  By
+default this box is checked when displaying project data, and not checked for
+external files.</p>
+
+
+<h2><a name="file-concat">File Concatenator</a></h2>
+
+<p>The File Concatenator combines multiple files into a single file.
+Select the files to add, arrange them in the proper order, then hit
+"Save".  CRC-32 values are shown for reference.</p>
+
+
+<h2><a name="file-slicer">File Slicer</a></h2>
+
+<p>The File Slicer allows you to "slice" a piece out of a file, saving
+it to a new file.  Specify the start and length in decimal or hex.  If
+you leave a field blank, it defaults to offset 0 or the remaining
+length of the file, respectively.</p>
+<p>The hex dumps show the area just before and after the chunk to be
+sliced, allowing you to confirm the placement.</p>
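+<p>The defaulting rules for the start and length fields can be sketched in
+Python as shown below.  The function names and the accepted hex prefixes
+(<code>$</code> and <code>0x</code>) are assumptions for the example; the
+tool's actual input parsing may differ.</p>

```python
from typing import Optional


def parse_number(text: str) -> Optional[int]:
    """Parse decimal or hex ("$1f" / "0x1f"); blank input returns None."""
    text = text.strip()
    if not text:
        return None
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text)


def slice_bytes(data: bytes, start_text: str = "",
                length_text: str = "") -> bytes:
    """Return the requested slice, applying the blank-field defaults."""
    start = parse_number(start_text)
    if start is None:
        start = 0                      # blank start: beginning of file
    length = parse_number(length_text)
    if length is None:
        length = len(data) - start     # blank length: rest of file
    if (start not in range(len(data) + 1)
            or length not in range(len(data) - start + 1)):
        raise ValueError("slice is outside the file")
    return data[start:start + length]
```

+<p>Leaving both fields blank copies the whole file; giving only a start
+copies everything from that offset to the end.</p>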
+
+
+<h2><a name="omf-converter">OMF Converter</a></h2>
+
+<p>This tool allows you to view Apple IIgs Object Module Format (OMF)
+executables, and convert them for disassembly.</p>
+
+<p>OMF executables have multiple segments with relocatable code.  References
+to addresses aren't filled in until the program is loaded into memory,
+which makes it difficult to disassemble the file.  The conversion tool
+loads the OMF file in roughly the same way the GS/OS System Loader would,
+placing each segment at the start of a bank unless otherwise directed.
+The loaded image is saved to a new file, and a SourceGen project file is
+created with some basic attributes filled in.</p>
+
+<p>Only "Load" files (S16, PIF, TOL, etc) may be converted.  Compiler object
+files and libraries contain references that must be resolved by
+a IIgs linker, and are not supported.</p>
+
+<p>Before you can examine or convert a file, you must first extract
+it from the Apple II disk image, using a mode that does not modify the
+original (e.g. extract with "configure to preserve Apple II formats"
+in CiderPress).  Then open it with Tools &gt; Convert OMF.</p>
+
+<p>The initial view shows all of the OMF segments in the file.  Double-clicking
+on an entry opens a detailed view that shows the segment header and a
+list of all the OMF records.  For load files, the relocation dictionary is
+also shown.</p>
+
+<p>To convert the file, click "Generate" to create a modified binary and a
+SourceGen project file.</p>
+
+<p>If "offset segment start by $0100" is checked, the converter will try
+to shift the segment's load address from <code>$xx/0000</code> to
+<code>$xx/0100</code>.  This can make the generated code a little nicer
+to work with because it removes potential ambiguity with direct page
+addresses.  For example, <code>LDA $56</code> and <code>LDA $0056</code>
+may be interpreted as the same thing by the assembler, requiring
+generation of operand width disambiguators.  By shifting the initial
+address we avoid the potential ambiguity.</p>
+<p>Check "add comments and notes for each segment" to add a long comment
+and a note at the start of each segment.  The comments include the
+segment name, type, and optional flags.  The notes just provide a quick
+way to jump to a segment.</p>
+
+<p>The binary generated by the tool is not in OMF format and will not
+execute on an Apple IIgs.  To be functional, the generated sources must be
+assembled by a program capable of generating OMF output, such as Merlin.</p>
+
+<p>The <a href="advanced.html#reloc-data">relocation dictionaries</a> from
+the executable are included in the project file, and can be used to guide
+the disassembler's analysis.  The "use reloc data" setting in the project
+properties controls this feature.</p>
+
+<p>A full explanation of the structure of OMF is beyond the scope of this
+manual.  For more information on OMF, see Appendix F of the GS/OS Reference
+Manual.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,25 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Tutorials - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Tutorials</h1>
+
+<p><strong>NOTE:</strong> this tutorial has been replaced by
+content on the 6502bench web site.  Visit
+<a href="https://6502bench.com/sgtutorial/">https://6502bench.com/sgtutorial/</a>.</p>
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2018 faddenSoft -->
+</html>
@@ -0,0 +1,315 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="main.css" rel="stylesheet" type="text/css" />
+<title>Visualizations - 6502bench SourceGen</title>
+</head>
+
+<body>
+<div id="content">
+<h1>6502bench SourceGen: Visualizations</h1>
+<p><a href="index.html">Back to index</a></p>
+
+<h2><a name="overview">Overview</a></h2>
+
+<p>Programs are generally a combination of code and data.  Sometimes
+the data is graphical in nature, e.g. a bitmap used as a font or
+game sprite.  Being able to see the data in graphic form can make it
+easier to determine the purpose of associated code.</p>
+<p>While modern systems use GIF, JPEG, and PNG to hold 2D bitmaps,
+graphical elements embedded in 6502 applications are almost always
+in a platform-specific form.  For this reason, the task of generating
+images from data is performed by
+<a href="advanced.html#extension-scripts">extension scripts</a>.  Some
+scripts for common formats are included in the SourceGen runtime directory.
+If these don't do what you need, you can write your own scripts and
+include them in your project.</p>
+<p>The project file doesn't store the converted graphics.  Instead, the
+project file holds a string that identifies the converter, and a list of
+parameters that are passed to the converter.  Images are generated when
+the project is first opened, and updated when certain things change in
+the project.</p>
+<p>Visualizations are not included in generated assembly output.  They
+may be included in HTML exports.</p>
+<p>Because visualizations are associated with a specific file offset,
+they will become "hidden" if the offset isn't at the start of a line,
+e.g. it's in the middle of a multi-byte instruction or data item.  The
+editors will try to prevent you from doing this.</p>
+<p>Bitmaps will always be scaled up as much as possible to make them
+easy to see.  This means that small shapes and large shapes may appear
+to be the same size when displayed as thumbnails in the code list.</p>
+<p>The role of a visualization generator is to take a collection of input
+parameters and generate graphical data.  It's most useful for graphical
+sources like bitmaps, but it's not limited to that.  You could, for example,
+write a script that generates random flowers, and use it to make your
+source listings more cheerful.</p>
+
+
+<h2><a name="vis-and-sets">Visualizations and Visualization Sets</a></h2>
+
+<p>Visualizations are essentially decorative: they do not affect the
+assembled output, and do not change how code is analyzed.  They are
+contained in sets that are placed at arbitrary offsets.  Each set can
+contain multiple items.  For example, if a file has data for
+10 bitmaps, you can place a visualization near each, or create a single
+visualization set with all 10 items and put it at the start of the file.
+You can display a visualization near the data or near the instructions
+that perform the drawing.  Or both.</p>
+
+<p>To create a visualization set, select a code or data line, and use
+Actions &gt; Create/Edit Visualization Set.  To edit a visualization set,
+select it and use the same menu item, or just double-click on it.  This
+opens the Visualization Set Editor window.</p>
+
+<p>The visualization set editor shows a list of visualizations associated
+with the selected file offset.  You can create a new visualization, edit
+or remove an existing entry, or rearrange them.
+If you select "New Bitmap" or edit an existing bitmap entry, the
+Bitmap Visualization Editor window will open.
+Similarly, if you select "New Bitmap Animation" or edit an existing
+bitmap animation, the Bitmap Animation Editor will open.</p>
+
+<h4>Visualization Editor</h4>
+
+<p>The combo box at the top of the screen lists every visualization
+generator defined by an active extension script.  Select the one that is
+appropriate for the data you're trying to visualize.  Every visualizer may
+have different parameters, so as you select different entries the set of
+input parameters below the preview window may change.</p>
+<p>There are two categories of visualization generator: bitmap and
+wireframe.  Bitmaps are simple 2D images, but wireframes are 2D or 3D
+meshes that can be viewed from different angles.  When you select a
+wireframe generator, additional view controls will be added at the bottom.
+(See below.)</p>
+
+<p>The "tag" is a unique string that will be shown in the display list.
+This is not a label, and may contain any characters you want (but leading
+and trailing whitespace will be trimmed).  The only requirement is that
+it be unique across all visualizations (bitmaps, animations, etc).</p>
+<p>The preview window shows the visualizer output.  The generated image is
+expanded to fill the window, so small bitmaps will be shown with very
+large pixels.
+If you resize the editor window, the preview window will expand, which
+can make it easier to see detail on larger images.
+If the generator fails, the preview window will show a red 'X', and an
+error message will appear below it.</p>
+<p>Parameters may be numeric or boolean.  The latter use a simple checkbox,
+the former a text entry field that accepts decimal and hexadecimal values.
+The range of allowable values is shown to the right of the entry field.
+If you enter an invalid value, the parameter description will turn red.</p>
+
+<p>The "Export" button at the top right can be used to save a copy of
+the bitmap or wireframe rendering with the current parameters.</p>
+
+<h5>Wireframe View Controls</h5>
+
+<p>The wireframe generator may offer the choice of perspective vs.
+orthographic projection, and whether or not to enable backface
+culling.  These are declared in the visualization generator script,
+but implemented in the viewer.  If the generator doesn't
+declare them, the default is to render with a perspective projection
+and without culling.</p>
+<p>The viewer allows you to rotate the image about the X, Y, and Z
+axes.  The viewer provides a left-handed coordinate system,
+with +X toward the right, +Y toward the top of the screen, and +Z
+going into the screen.  The object will be placed a short distance
+down the Z axis and scaled to fit the window.
+Positive rotations cause a counter-clockwise rotation when the axis
+about which rotations are performed points toward the viewer.  The
+rotations are performed with a matrix using Euler angles, and are
+subject to gimbal lock (e.g. if you set Y to 90 degrees, X and Z rotate
+about the same axis).</p>
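+<p>The gimbal lock case can be checked numerically.  The following is a
+minimal sketch, not the viewer's code: the helper names, the conventional
+rotation matrices, and the X-then-Y-then-Z application order are all
+assumptions made for illustration.  With Y fixed at 90 degrees, adding
+the same amount to X and Z leaves the combined matrix unchanged.</p>

```python
import math

def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_mul(a, b):
    # Plain 3x3 matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_matrix(x_deg, y_deg, z_deg):
    # Assumed order (hypothetical): rotate about X first, then Y, then Z.
    return mat_mul(rot_z(z_deg), mat_mul(rot_y(y_deg), rot_x(x_deg)))

def nearly_equal(a, b, eps=1e-9):
    return all(abs(a[i][j] - b[i][j]) < eps
               for i in range(3) for j in range(3))

# With Y at 90 degrees, only the difference between X and Z matters.
locked = nearly_equal(euler_matrix(10, 90, 30), euler_matrix(25, 90, 45))
# With Y at 0 degrees, the same change produces a different orientation.
free = nearly_equal(euler_matrix(10, 0, 30), euler_matrix(25, 0, 45))
```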
+<p>If you check the "Animated" box, you can add a simple spin.  Choose
+the number of degrees to rotate per frame, how many frames to generate before
+resetting, and the delay between each frame.  Clicking the "Auto" button
+will automatically select the number of frames needed to display the
+animation in an unbroken loop (useful for animated GIFs).  Click
+the "Test Animation" button to see it in action.</p>
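+<p>One plausible way to pick a frame count for an unbroken loop, assuming
+the rotation is a whole number of degrees per frame, is to find the
+smallest count whose total rotation is a multiple of 360.  This is a
+sketch under that assumption; the function name is hypothetical and the
+"Auto" button may compute its value differently.</p>

```python
import math

def frames_for_loop(degrees_per_frame):
    """Smallest frame count whose total rotation is a multiple of 360."""
    step = abs(degrees_per_frame) % 360
    if step == 0:
        # No visible rotation between frames: one frame already loops.
        return 1
    return 360 // math.gcd(360, step)
```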
+
+<h4>Bitmap Animation Editor</h4>
+
+<p>Bitmap animations allow you to create a simple animation from a
+collection of other visualizations.  This can be useful when a program
+stores animated graphics as a series of frames.</p>
+<p>The "tag" is a unique string that will be shown in the display list.
+The same rules apply as for bitmap visualizations.</p>
+<p>The list at the top left holds all visualizations.  Select items on
+the left and use the "Add" button to add them to the list on the right,
+which holds the frames included in the animation.  You can reorder
+the list with the up/down buttons.  Adding the same frame multiple times
+is allowed.</p>
+<p>The "frame delay" field lets you specify how long each frame is shown
+on screen, in milliseconds.  Some animation formats may use a different
+time resolution; for example, animated GIFs use units of 1/100th of a
+second.  The closest value will be used.  Note also that some viewers
+(notably web browsers) will cap the update rate.</p>
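+<p>As an illustration of the resolution change, a delay in milliseconds
+can be rounded to the nearest GIF time unit as sketched below.  The
+function name is hypothetical, and rounding ties upward is an assumption
+rather than a description of the exporter.</p>

```python
def gif_delay_units(frame_delay_msec):
    """Convert milliseconds to the nearest 1/100th-second unit."""
    # Adding half a unit before the integer division rounds to nearest,
    # with exact ties (e.g. 125 msec) rounded up.
    return (frame_delay_msec + 5) // 10
```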
+<p>When you have one or more frames in the animation list, you can preview
+the result in the window at the bottom.  The actual appearance may be
+slightly different, especially if the frames are different sizes.  For
+example, the preview window scales individual frames, but animated GIFs
+will be scaled to the size of the largest frame.</p>
+
+
+<h2><a name="runtime">Scripts Included with SourceGen</a></h2>
+
+<p>A number of visualization generation scripts are included with
+SourceGen, in the platform-specific runtime data directories.</p>
+
+<p>Most generators will take the file offset, bitmap width, and bitmap
+height as parameters.  Offsets are handled as they are elsewhere, i.e.
+always in hexadecimal, with a leading '+'.
+Some less-common parameters include:</p>
+<ul>
+  <li><b>Column stride</b> - number of bytes used to hold a column.
+    This is uncommon, but could be used if (say) a pair of bitmaps
+    was stored with interleaved bytes.  If you set this to zero the
+    visualizer will default to no interleave (col_stride = 1).</li>
+  <li><b>Row stride</b> - number of bytes between the start of each
+    row.  This is used when a row has padding on the end, e.g. a
+    bitmap that's 7 bytes wide might be padded to 8 for easy indexing,
+    or when bitmap data is interleaved.  If you set this to zero the
+    visualizer will default to no padding
+    (row_stride = width * column_stride).</li>
+  <li><b>Cell stride</b> - for multi-bitmap data like a font or sprite
+    sheet, this determines the number of bytes between the start of
+    one item and the next.  If set to zero a "dense" arrangement is
+    assumed (cell_stride = row_stride * item_height).</li>
+</ul>
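+<p>The zero-means-default rules for the three strides can be summarized
+in a short sketch.  The function and parameter names are hypothetical;
+only the default formulas are taken from the descriptions above.</p>

```python
def resolve_strides(width, height, col_stride=0, row_stride=0, cell_stride=0):
    """Replace zero strides with the documented defaults."""
    if col_stride == 0:
        col_stride = 1                      # no interleave
    if row_stride == 0:
        row_stride = width * col_stride     # no padding
    if cell_stride == 0:
        cell_stride = row_stride * height   # dense arrangement
    return col_stride, row_stride, cell_stride

def byte_offset(base, cell, row, col, strides):
    """File offset of one byte in a multi-bitmap layout."""
    col_stride, row_stride, cell_stride = strides
    return base + cell * cell_stride + row * row_stride + col * col_stride
```

+<p>For example, a 7-byte-wide, 8-row bitmap padded to 8 bytes per row
+resolves to strides of (1, 8, 64).</p>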
+
+<p>Remember that this is a disassembler, not an image converter.  The
+results do not need to be perfectly accurate to be useful when disassembling
+code.</p>
+
+
+<h3>Apple II - Apple/VisHiRes and Apple/VisShapeTable</h3>
+
+<p>There is no standard format for small hi-res bitmaps, but certain
+arrangements are common.  The VisHiRes script defines four generators:</p>
+
+<ul>
+  <li><b>Hi-Res Bitmap</b> - converts an MxN row-major bitmap.</li>
+  <li><b>Hi-Res Sprite Sheet</b> - converts a series of bitmaps and
+    renders them in a grid.  Useful for games that use cell
+    animation.  The generated bitmap has a 1-pixel transparent gap
+    between elements.</li>
+  <li><b>Hi-Res Bitmap Font</b> - a simplified version of the
+    Sprite Sheet, intended for the common 7x8 monochrome fonts.
+    Most fonts have 96 or 128 glyphs, though some drop the last
+    character.
+    (This also works for Apple /// fonts, but currently ignores
+    the high bit in each byte.)</li>
+  <li><b>Hi-Res Screen Image</b> - used for 8KiB screen images.  The
+    data is linearized and converted to a 280x192 bitmap.  Because
+    images are relatively large, the generator does not require them
+    to be contiguous in the file, i.e. two halves of the image can be
+    in different parts of the file so long as they end up contiguous
+    in memory.</li>
+</ul>
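+<p>"Linearizing" a screen image means undoing the interleaved row order
+of the hi-res page.  The sketch below computes the byte offset of each
+row within the 8KiB image using the standard Apple II layout; the
+function name is hypothetical and this is not the generator's own
+code.</p>

```python
def hires_row_offset(row):
    """Offset of a hi-res row (0-191) from the start of the 8KiB page."""
    if not 0 <= row < 192:
        raise ValueError("row must be in the range 0-191")
    return ((row & 0x07) * 0x400             # line within a group of 8
            + ((row >> 3) & 0x07) * 0x80     # group within a third of the screen
            + (row >> 6) * 0x28)             # which third of the screen

# Each row holds 40 bytes (280 pixels at 7 pixels per byte).
row_offsets = [hires_row_offset(y) for y in range(192)]
```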
+
+<p>Widths are specified in bytes, not pixels.  Each byte represents 7
+pixels (with some hand-waving).</p>
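+<p>A monochrome conversion along these lines might look like the
+following sketch.  The low seven bits of each byte are pixels, with bit 0
+as the leftmost; the high bit selects the color palette and is ignored
+here.  The function name is hypothetical.</p>

```python
def decode_hires_row(row_bytes):
    """Expand hi-res bytes into a list of 0/1 pixels, 7 per byte."""
    pixels = []
    for b in row_bytes:
        for bit in range(7):        # bit 0 is the leftmost pixel
            pixels.append((b >> bit) & 1)
    return pixels
```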
+
+<p>In addition to offset, dimensions, and stride values, the bitmap
+converter has a checkbox for monochrome or color, and two checkboxes
+that affect the color.  The first causes the first byte to be treated
+as being in an odd column rather than an even one, which affects
+green vs. purple and orange vs. blue.  The second flips the high bits
+on every byte, switching green vs. orange and purple vs. blue.
+Neither has any effect on black &amp; white bitmaps.</p>
+<p>The converter generates one output pixel for every source pixel, so
+half-pixel shifts are not represented.</p>
+
+<p>The VisShapeTable script renders Applesoft shape tables, which can
+have multiple vector shapes.  The only parameter other than the offset
+is the shape number.</p>
+
+
+<h3>Atari 2600 - Atari/Vis2600</h3>
+
+<p>The Atari 2600 graphics system has registers that determine the
+appearance of a sprite or playfield on a single row.  The register
+values are typically changed as the screen is drawn to get different
+data on successive rows.  The visualization generator doesn't attempt
+to emulate this behavior, but works well for data stored in a
+straightforward fashion.</p>
+
+<ul>
+  <li><b>Sprite</b> - basic 1xN sprite, converted to an image 8 pixels
+    wide.  Square pixels are assumed.</li>
+  <li><b>Playfield</b> - assumes PF0,PF1,PF2 are stored in that order,
+    multiple entries following each other.  Specify the number of
+    3-byte entries as the height.
+    Since most playfields aren't the full height of the screen,
+    the image will tend to look squashed.  Use the "row thickness" feature
+    to repeat each row N times to adjust the proportions.
+    The "Reflected" checkbox determines whether the right-side image is
+    repeated as-is or flipped.</li>
+</ul>
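+<p>To illustrate the playfield layout, the sketch below expands one
+3-byte entry into a 40-pixel row.  It relies on the hardware bit order:
+PF0 uses bits 4-7 from low to high, PF1 uses bits 7-0 from high to low,
+and PF2 uses bits 0-7 from low to high.  The function name is
+hypothetical and this is not the generator's own code.</p>

```python
def playfield_row(pf0, pf1, pf2, reflected=False):
    """Expand PF0/PF1/PF2 into 40 playfield pixels (0 or 1)."""
    left = [(pf0 >> bit) & 1 for bit in range(4, 8)]          # 4 pixels
    left += [(pf1 >> bit) & 1 for bit in range(7, -1, -1)]    # 8 pixels
    left += [(pf2 >> bit) & 1 for bit in range(8)]            # 8 pixels
    # The right half either repeats the left half or mirrors it.
    right = left[::-1] if reflected else left
    return left + right
```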
+
+<h3>Atari Arcade - Atari/VisAVG</h3>
+
+<p>Different versions of Atari's Analog Vector Graphics were used in
+several games, notably Battlezone, Tempest, and Star Wars.  The commands
+drove a vector display monitor.  SourceGen visualizes them as 2D
+wireframes, which isn't a perfect fit since they can describe points as
+well as lines, but works fine for annotating a disassembly.</p>
+<p>The visualizer takes two arguments: the offset of the start of
+the commands to visualize, and the base address of vector RAM.  The latter
+is necessary to convert AVG JMP/JSR commands into offsets.</p>
+
+<h3>Commodore 64 - Commodore/VisC64</h3>
+
+<p>The Commodore 64 has a 64-byte sprite format defined by the hardware.
+It comes in two basic varieties:</p>
+<ul>
+  <li><b>High-resolution sprite</b> - 24x21 monochrome.  Pixels are either
+    colored or transparent.</li>
+  <li><b>Multi-color sprite</b> - 12x21 3-color.  The width of each pixel
+    is doubled to make it 24x21.</li>
+</ul>
+<p>Sprites can be doubled in width and/or height.</p>
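+<p>Both varieties use the same storage: each sprite occupies 64 bytes,
+of which 63 hold 21 rows of 3 bytes and the last is padding.  The sketch
+below decodes each variety into rows of 24 values; the function names are
+hypothetical, and mapping the values to actual colors is left out.</p>

```python
def decode_hires_sprite(data):
    """Return 21 rows of 24 pixels, each 0 (transparent) or 1 (colored)."""
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3 : y * 3 + 3]:
            # The most significant bit is the leftmost pixel.
            row.extend((b >> bit) & 1 for bit in range(7, -1, -1))
        rows.append(row)
    return rows

def decode_multicolor_sprite(data):
    """Return 21 rows of 24 pixels with values 0-3, each pixel doubled."""
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3 : y * 3 + 3]:
            for shift in (6, 4, 2, 0):      # four 2-bit pixels per byte
                value = (b >> shift) & 0x03
                row.extend([value, value])  # double the width
        rows.append(row)
    return rows
```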
+<p>Colors come from a hardware-defined palette of 16:</p>
+<ol start="0" style="columns:2; -webkit-columns:2; -moz-columns:2;">
+  <li><span style="color:#ffffff;background-color:#000000">&nbsp;black&nbsp;</span></li>
+  <li><span style="color:#000000;background-color:#ffffff">&nbsp;white&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#67372b">&nbsp;red&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#70a4b2">&nbsp;cyan&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#6f3d86">&nbsp;purple&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#588d43">&nbsp;green&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#352879">&nbsp;blue&nbsp;</span></li>
+  <li><span style="color:#000000;background-color:#b8c76f">&nbsp;yellow&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#6f4f25">&nbsp;orange&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#433900">&nbsp;brown&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#9a6759">&nbsp;light red&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#444444">&nbsp;dark grey&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#6c6c6c">&nbsp;grey&nbsp;</span></li>
+  <li><span style="color:#000000;background-color:#9ad284">&nbsp;light green&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#6c5eb5">&nbsp;light blue&nbsp;</span></li>
+  <li><span style="color:#ffffff;background-color:#959595">&nbsp;light grey&nbsp;</span></li>
+</ol>
+
+<p>Bear in mind that the editor scales images to their maximum size, so
+a sprite that is doubled in both width and height will look exactly like
+a sprite that is not doubled at all.</p>
+
+<h3>Nintendo Entertainment System - Nintendo/VisNES</h3>
+
+<p>NES PPU pattern tables hold 8x8 tiles with 2 bits of color per pixel.
+Converting the full collection to a reference bitmap is straightforward.
+A few color palette options are offered.</p>
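+<p>Each tile occupies 16 bytes: eight bytes holding the low bit of every
+pixel, followed by eight bytes holding the high bit.  A minimal decoding
+sketch, with a hypothetical function name and no palette lookup:</p>

```python
def decode_nes_tile(tile):
    """Decode a 16-byte pattern table tile into 8 rows of 8 values (0-3)."""
    if len(tile) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low, high = tile[y], tile[y + 8]
        # Bit 7 is the leftmost pixel; combine one bit from each plane.
        rows.append([((low >> bit) & 1) | (((high >> bit) & 1) << 1)
                     for bit in range(7, -1, -1)])
    return rows
```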
+
+<p>Sprites and backgrounds are formed from collections of tiles.  In
+some cases this is straightforward, in others it's not.  A visualization
+generator that renders a "tile grid" is available for simpler cases.</p>
+
+</div>
+
+<div id="footer">
+<p><a href="index.html">Back to index</a></p>
+</div>
+</body>
+<!-- Copyright 2019 faddenSoft -->
+</html>