Update documentation

PETSCII, ACME, and other improvements.
2025-08-05 09:25:39 +00:00 · 2019-08-18 16:42:40 -07:00
parent 037b590967
commit 62c15031fd
10 changed files with 180 additions and 92 deletions
--- a/SourceGen/RuntimeData/Help/advanced.html
+++ b/SourceGen/RuntimeData/Help/advanced.html
@@ -100,8 +100,8 @@ feature is still evolving, and the interfaces may change significantly
 in the near future.</b></p>

 <p>The current interfaces can be used to identify inline data that follows
-JSR/JSL instructions, and to format operands.  The latter can be useful for
-replacing immediate load operands with symbolic constants.</p>
+JSR, JSL, or BRK instructions, and to format operands.  The latter can be
+useful for replacing immediate load operands with symbolic constants.</p>

 <p>Scripts may be loaded from the RuntimeData directory, or from the directory
 where the project file lives.  Attempts to load them from other locations
@@ -160,8 +160,8 @@ to data files.)</p>
 be used as a malware vector, so there's no value in forcing scripts to
 execute in an isolated server process, or to jump through the other hoops
 required to really lock things down.  I do believe there's value in
-defining the API in such a way that we *could* implement full security if
-circumstances change, so I'm using app domains as a way to keep everybody
+defining the API in such a way that we <b>could</b> implement full security if
+circumstances change, so I'm using app domains as a way to keep the API
 honest.</p>


--- a/SourceGen/RuntimeData/Help/analysis.html
+++ b/SourceGen/RuntimeData/Help/analysis.html
@@ -310,7 +310,7 @@ land in the middle of an instruction are always adjusted backward to
 the instruction start.  This is necessary because labels are only visible
 if they're associated with the first (opcode) byte of an instruction.</p>

-<p>The uncategorized data analyzer tries to find ASCII strings and
+<p>The uncategorized data analyzer tries to find character strings and
 opportunities to use the ".FILL" operation.  It breaks the file into
 pieces, where contiguous regions hold nothing but data, are not split
 across a ".ORG" directive, are not interrupted by data, and do not
--- a/SourceGen/RuntimeData/Help/codegen.html
+++ b/SourceGen/RuntimeData/Help/codegen.html
@@ -117,8 +117,9 @@ SourceGen works around when generating code.</p>
 Most of them will let you group expressions with parenthesis, but that
 doesn't always help.  For example, <code>PEA label >> 8 + 1</code> is
 perfectly valid, but writing <code>PEA (label >> 8) + 1</code> will cause
-most assemblers to assume you're trying to use an alterate form of PEA
-with indirect addressing (which doesn't exist).  The code generator needs
+most assemblers to assume you're trying to use an alternate (and non-existent)
+form of <code>PEA</code> with indirect addressing, causing the assembler
+to halt with an error message.  The code generator needs
 to understand expression syntax and operator precedence to generate correct
 code, but also needs to know how to handle the corner cases.</p>
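To see why the code generator has to model each assembler's expression rules, the Python sketch below evaluates the operand "label >> 8 + 1" for a hypothetical label value of $1234 under two parsing strategies. The first is C-like precedence, where "+" binds tighter than ">>". The second is strict left-to-right evaluation. The function names are illustrative only, and this is not SourceGen code.

```python
# Hypothetical sketch: how "label >> 8 + 1" can mean different things
# depending on the assembler's expression rules.

LABEL = 0x1234  # hypothetical label value


def intended(value: int) -> int:
    """What the programmer wants: the high byte, plus one."""
    return (value >> 8) + 1


def c_like(value: int) -> int:
    """C-like precedence ("+" binds tighter than ">>"): value >> (8 + 1)."""
    return value >> 8 + 1


def left_to_right(value: int) -> int:
    """Strict left-to-right evaluation, with no operator precedence."""
    shifted = value >> 8
    return shifted + 1


for name, fn in (("intended", intended), ("c_like", c_like),
                 ("left_to_right", left_to_right)):
    print(f"{name:14} ${fn(LABEL):02x}")
```

With $1234, the intended value is $13. C-like precedence yields $09, while left-to-right evaluation produces the intended $13.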

@@ -162,9 +163,20 @@ code, but also needs to know how to handle the corner cases.</p>
    strings.  However, if you use the built-in "screen" encoding, you will
    get the wrong behavior if you compile an ASCII source without the
    "--ascii" command-line flag, because it expects to convert from
-    PETSCII.  To get the behavior expected of a cross-assembler, it's
-    necessary to pass "--ascii" and explicitly define an ASCII encoding
-    for use with ASCII text strings.</li>
+    PETSCII.  To get the behavior expected of a cross-assembler, the
    recommended approach appears to be passing "--ascii" and explicitly
    defining an ASCII encoding for use with ASCII text strings.</li>
+</ul>
+
+<p>Notes:</p>
+<ul>
+  <li>The "default text encoding" project property determines the text
+    encoding for the entire file.  For non-ASCII projects, a small
+    encoding table is output at the top of the file.  This works for
+    C64 PETSCII and C64 screen codes, but not for high ASCII.  This is
+    done without passing "--ascii" on the command line.  If the
+    source file is converted to PETSCII, the encoding table should be
+    removed.</li>
 </ul>


@@ -194,15 +206,15 @@ code, but also needs to know how to handle the corner cases.</p>
    all references to zero-page labels.</li>
  <li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
    <code>ASR</code> instead.</li>
+  <li>Does not allow the accumulator to be specified explicitly as an
+    operand, e.g. you can't write <code>LSR A</code>.</li>
+  <li>Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
+    before 8-bit operands.</li>
  <li>Officially, the preferred file extension for ACME source code is ".a",
    but this is already used on UNIX systems for static libraries (which
    means shell filename completion tends to ignore them).  Since ".S" is
    pretty universally recognized as assembly source, code generated by
    SourceGen for ACME also uses ".S".</li>
-  <li>Does not allow the accumulator to be specified explicitly as an
-    operand, e.g. you can't write <code>LSR A</code>.</li>
-  <li>Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
-    before 8-bit operands.</li>
 </ul>


--- a/SourceGen/RuntimeData/Help/editors.html
+++ b/SourceGen/RuntimeData/Help/editors.html
@@ -53,8 +53,10 @@ be imported by other projects (see

 <h2><a name="operand">Edit Instruction Operand</a></h2>
 <p>Operands can be displayed in a variety of numeric formats, or as a
-symbol.  The ASCII character format is only available for operands
-whose value falls into the range of low- or high-ASCII characters.</p>
+symbol.  The character formats are only available for operands
+whose value falls into the proper range.  The ASCII format handles
+both plain and high ASCII; the correct encoding is chosen based on
+the character data.</p>
 <p>Symbols may be used in their entirety, or shifted and masked.
 The low / high / bank selector determines which byte is used as the
 low byte.  For 16-bit operands, this acts as a shift rather than a byte
@@ -87,14 +89,14 @@ be used twice, adjusted if necessary.  (This may be addressed in a
 future release.)</p>


-<h2><a name="data">Edit Data Format</a></h2>
+<h2><a name="data">Edit Data Operand</a></h2>
 <p>This dialog offers a variety of choices, and can be used to apply a
-format to a range of offsets.  You must select all of the bytes you want
+format to multiple lines.  You must select all of the bytes you want
 to format.  For example, to format two bytes as a 16-bit word, you must
 select both bytes in the editor.  (If you click on the first item, then
 Shift+double-click on the operand field of the last item, you can do
 this very quickly.)  The selection does not need to be contiguous: you
-can use Control+click to select scattered items.)</p>
+can use Control+click to select scattered items.</p>
 <p>If the range is discontiguous, or crosses a visual boundary
 such as a change in address, a user-specified label, or a long comment
 or note, the selection will be split into smaller regions.  A message at the
@@ -125,11 +127,18 @@ same value.</p>

 <p>The "String" items are enabled or disabled depending on whether the
 data you have selected is in the appropriate format.  For example,
-"Null-terminated strings" is only enabled if the data is exclusively
-ASCII characters followed by $00.  Zero-length strings are allowed,
-but only if some non-zero-length strings are present.</p>
-<p>Standard ASCII and "high ASCII" are both supported, but strings must
-be composed exclusively of one or the other.</p>
+"Null-terminated strings" is only enabled if the data regions are
+composed entirely of characters followed by $00.  Zero-length strings
+are allowed, but only if some non-zero-length strings are present.</p>
+<p>The character encoding can be selected, offering a choice between
+plain ASCII, low + high ASCII, C64 PETSCII, and C64 screen codes.  When
+you change the encoding, your available options may change.  The
+low + high ASCII setting will accept both, configuring the appropriate
+encoding based on the data values, but when identifying multiple strings
+it requires that each individual string be entirely one or the other.</p>
+<p>Due to fundamental limitations of the character sets, C64 PETSCII
+does not support DCI, and C64 screen code strings cannot be null
+terminated.</p>
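The null-terminated and low + high ASCII rules can be expressed as a small Python check. This is a hypothetical sketch of the rules described above, not SourceGen's code. It considers only printable low ASCII ($20-$7E) and high ASCII ($A0-$FE), and it ignores the control characters that SourceGen also accepts in strings.

```python
from typing import List, Optional


def classify(s: bytes) -> Optional[str]:
    """'empty', 'low', or 'high' for a valid string body; None otherwise."""
    if not s:
        return "empty"
    if all(0x20 <= b <= 0x7e for b in s):
        return "low"
    if all(0xa0 <= b <= 0xfe for b in s):
        return "high"
    return None  # mixes low and high ASCII, or holds non-character bytes


def zstr_allowed(regions: List[bytes]) -> bool:
    """True if every region is one or more null-terminated strings, each
    entirely low or entirely high ASCII, and at least one is non-empty."""
    strings: List[bytes] = []
    for region in regions:
        if not region or region[-1] != 0x00:
            return False
        # Each $00 ends a string, so split the region at the terminators.
        strings.extend(region[:-1].split(b"\x00"))
    kinds = [classify(s) for s in strings]
    if any(k is None for k in kinds):
        return False
    # Zero-length strings are accepted only alongside real strings.
    return any(k != "empty" for k in kinds)


print(zstr_allowed([b"HI\x00", bytes([0xc8, 0xc9, 0x00])]))
print(zstr_allowed([bytes([0x48, 0xc9, 0x00])]))
```

The first call accepts one low-ASCII and one high-ASCII string; the second rejects a single string that mixes the two.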

 <p>To avoid burying a label in the middle of a data item, contiguous areas
 are split at labels.  This can sometimes have unexpected effects.  For
@@ -165,9 +174,6 @@ CODE     LDA     LABEL+2


 <h2><a name="comment">Edit Comment</a></h2>
-<p>NOTE: the shortcut for Edit Comment is Ctrl+; (semicolon).  For some
-reason Windows likes to show this as "Ctrl+Oem1".</p>
-
 <p>Enter an end-of-line (EOL) comment, or leave the text field blank to
 delete it.  EOL comments may be placed on instruction and data lines, but
 not on assembler directives.</p>
@@ -179,9 +185,9 @@ If this isn't a concern, you can enter any characters you like.</p>
 want to limit the overall length if you're hoping to create 80-column
 output.  Some retro assemblers may have hard line length limitations,
 which could result in the comment being truncated in generated sources.</p>
-<p>A semicolon (';') is placed at the start of the line.  If an assembler
+<p>A semicolon (';') is placed at the start of the comment.  If an assembler
 has different conventions, a different delimiter character may be used.  You
-don't need to include a semicolon in the comment field.</p>
+don't need to include a delimiter explicitly in the comment field.</p>

 <p>Comments on platform symbols are read from the platform symbol file, and
 cannot be edited from within SourceGen.  Comments on project symbols are
@@ -238,7 +244,7 @@ to the note in the code list and in the "Notes" window.</p>
 <a href="intro.html#about-symbols">All About Symbols</a>), and must
 not have the same name as another project symbol.  It can overlap
 with platform symbols and user labels.</p>
-<p>The value may be entered in decimal, hexadecimal, or binary.  The
+<p>The value may be entered in decimal, hexadecimal, or binary.  The numeric
 base you choose will be remembered, so that the value will be displayed
 the same way when used in a .EQ directive.</p>
 <p>If you enter a comment, it will be placed at the end of the line of
--- a/SourceGen/RuntimeData/Help/index.html
+++ b/SourceGen/RuntimeData/Help/index.html
@@ -11,31 +11,34 @@
 <body>
 <div id="content">
 <h1>6502bench SourceGen</h1>
-<p>SourceGen is an interactive disassembler for 6502, 65C02, 65802,
+<p>SourceGen is an interactive disassembler for 6502, 65C02,
 and 65816 code.  The official web site is
 <a href="https://6502bench.com/">https://6502bench.com/</a>.</p>

 <p>If you want to dive right in, try the
-<a href="tutorials.html">Tutorials</a>.</p>
+<a href="tutorials.html">tutorials</a>.</p>

 <h2>Contents</h2>
 <ul>
  <li><a href="intro.html">Overview</a>
  <ul>
    <li><a href="intro.html#fundamental-concepts">Fundamental Concepts</a></li>
-    <li><a href="intro.html#begin">About 6502 Code</a></li>
+    <li><a href="intro.html#begin">About 6502 Code</a>
+      <ul>
+        <li><a href="intro.html#charenc">Character Encoding</a></li>
+      </ul></li>
    <li><a href="intro.html#sgintro">How SourceGen Works</a>
-    <ul>
-      <li><a href="intro.html#scripts">Extension Scripts</a></li>
-      <li><a href="intro.html#hints">Analyzer Hints</a></li>
-    </ul></li>
+      <ul>
+        <li><a href="intro.html#scripts">Extension Scripts</a></li>
+        <li><a href="intro.html#hints">Analyzer Hints</a></li>
+      </ul></li>
    <li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
    <li><a href="intro.html#about-symbols">All About Symbols</a>
-    <ul>
-      <li><a href="intro.html#weak-refs">Weak References</a></li>
-      <li><a href="intro.html#symbol-parts">Parts and Adjustments</a></li>
-      <li><a href="intro.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
-    </ul></li>
+      <ul>
+        <li><a href="intro.html#weak-refs">Weak References</a></li>
+        <li><a href="intro.html#symbol-parts">Parts and Adjustments</a></li>
+        <li><a href="intro.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
+      </ul></li>
    <li><a href="intro.html#width-disambiguation">Width Disambiguation</a></li>
    <li><a href="intro.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
  </ul></li>
@@ -89,6 +92,7 @@ and 65816 code.  The official web site is
    <li><a href="codegen.html#quirks">Assembler-Specific Bugs &amp; Quirks</a>
    <ul>
      <li><a href="codegen.html#64tass">64tass</a></li>
+      <li><a href="codegen.html#acme">ACME</a></li>
      <li><a href="codegen.html#cc65">cc65</a></li>
      <li><a href="codegen.html#merlin32">Merlin 32</a></li>
    </ul></li>
@@ -99,6 +103,7 @@ and 65816 code.  The official web site is
    <li><a href="settings.html#app-settings">Application Settings</a>
    <ul>
      <li><a href="settings.html#appset-codeview">Code View</a></li>
+      <li><a href="settings.html#appset-textdelim">Text Delimiters</a></li>
      <li><a href="settings.html#appset-asmconfig">Asm Config</a></li>
      <li><a href="settings.html#appset-displayformat">Display Format</a></li>
      <li><a href="settings.html#appset-pseudoop">Pseudo-Op</a></li>
--- a/SourceGen/RuntimeData/Help/intro.html
+++ b/SourceGen/RuntimeData/Help/intro.html
@@ -21,7 +21,7 @@ assembly-language source.</p>
 <p>SourceGen has two purposes.  The first is to be a really nice
 disassembler for the 6502 and related CPUs.  Code tracing with status
 flag tracking makes it easier to separate the code from the data,
-automatic formatting of ASCII strings and filled-data areas helps
+automatic formatting of character strings and filled-data areas helps
 get the data regions sorted out, and modern IDE-style features like
 cross-reference generation and color-highlighted bookmarks help
 navigate the code while trying to figure out what it does.  A
@@ -86,7 +86,7 @@ Variables and constants may be single-byte or multi-byte, the latter
 typically in little-endian byte order.</p>

 <p>Much of the data in a typical program is read-only, often in the
-form of graphics or ASCII string data.  Graphics can be difficult
+form of graphics or character string data.  Graphics can be difficult
 to recognize automatically, but strings can be identified with a
 reasonable degree of confidence.  Address tables, which are a collection
 of addresses to other things, are also fairly common.</p>
@@ -144,6 +144,34 @@ instructions.  If you don't know what state the flags are in, you can't
 know whether <code>LDA #value</code> is two bytes or three, and the
 disassembly of the instruction stream will come out wrong.</p>

+<h3><a name="charenc">Character Encoding</a></h3>
+
+<p>The American Standard Code for Information Interchange (ASCII) was
+developed in the 1960s, and became widely used as the means for representing
+text data on a computer.  It's compatible with Unicode, in that the
+binary representation of an ASCII string is exactly the same when
+expressed as a Unicode string with UTF-8 encoding.</p>
+<p>Not all 6502-based computers used ASCII, notably those from Commodore
+International (e.g. PET, VIC-20, 64, 128), which used variants
+collectively known as "PETSCII".  PETSCII had most of the same symbols,
+but rearranged them, and added a number of graphical symbols.  This was
+further complicated by the use of two different character sets, one of
+which dropped lower-case letters in favor of additional symbols, and
+the use of a separate encoding for characters stored in the text frame
+buffer ("screen codes").</p>
+<p>Apple II computers were based on ASCII, but tended to store bytes
+with the high bit set rather than clear.  This is known as "high ASCII".</p>
+
+<p>SourceGen allows you to specify that a string is encoded with ASCII,
+High ASCII, C64 PETSCII, or C64 Screen Codes.  Because the goal is to
+generate assembly sources for cross-assemblers, the C64 character
+support is limited to the set that overlaps with ASCII.</p>
<p>For the most part, only printable characters are accepted in strings,
+but certain control characters are also allowed.  The characters for
+bell ($07), linefeed ($0a), and carriage return ($0d) are recognized as
+string data, and in C64 PETSCII a number of text color and formatting
+control codes are also allowed.</p>
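The Python sketch below shows how a byte might be mapped to a character under each encoding, restricted to the ASCII-overlapping subset described above. It assumes the C64 lower/upper-case character set for PETSCII, omits screen codes and the allowed control characters, and uses hypothetical function names.

```python
from typing import Optional


def decode_char(b: int, encoding: str) -> Optional[str]:
    """Return the printable character for byte b, or None if b falls
    outside the ASCII-overlapping subset this sketch handles."""
    if encoding == "ascii":
        return chr(b) if 0x20 <= b <= 0x7e else None
    if encoding == "high-ascii":
        # Apple II style: the same characters, stored with the high bit set.
        return chr(b & 0x7f) if 0xa0 <= b <= 0xfe else None
    if encoding == "petscii":
        # Assumes the C64 lower/upper-case character set, where letter case
        # is swapped relative to ASCII and capitals live at $C1-$DA.
        if 0x20 <= b <= 0x40:
            return chr(b)            # space, digits, punctuation, '@'
        if 0x41 <= b <= 0x5a:
            return chr(b + 0x20)     # $41 -> 'a'
        if 0xc1 <= b <= 0xda:
            return chr(b - 0x80)     # $C1 -> 'A'
        return None
    raise ValueError(f"unknown encoding: {encoding}")


def decode(data: bytes, encoding: str) -> Optional[str]:
    """Decode a whole string, or return None if any byte doesn't qualify."""
    chars = [decode_char(b, encoding) for b in data]
    if any(c is None for c in chars):
        return None
    return "".join(chars)


print(decode(b"Hi!", "ascii"))
print(decode(bytes([0xc8, 0xe9, 0xa1]), "high-ascii"))
print(decode(bytes([0xc8, 0x49, 0x21]), "petscii"))
```

All three calls produce "Hi!" from three different byte sequences, which is why the encoding has to be known before a string can be displayed or generated correctly.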
+

 <h2><a name="sgintro">How SourceGen Works</a></h2>

@@ -184,8 +212,8 @@ executed by SourceGen.  They can be added to a project from SourceGen's
 runtime data directory, or can live in the directory next to the project
 file.</p>
 <p>In the current implementation, scripts are only called to examine
-JSR/JSL instructions.  They can format nearby bytes as inline data, or
-apply symbols to operands.</p>
+JSR, JSL, and BRK instructions.  They can format nearby bytes as inline
+data, or apply symbols to operands.</p>

 <p>To reduce the chances of a script causing problems, all scripts are
 executed in a sandbox with severely restricted access.  Notably, nothing
@@ -196,6 +224,8 @@ contains all of the compiled script DLLs, as well as two pre-built
 application DLLs that plugins are allowed access to.  The contents
 are persistent, to avoid recompiling the scripts every time SourceGen
 is launched, but may be manually deleted without harm.</p>
+<p>More details can be found in the
+<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>


 <h3><a name="hints">Analyzer Hints</a></h3>
@@ -690,12 +720,12 @@ represent hundreds of bytes and span multiple lines.</p>
  <li>.BULK - data packed in as compact a form as the assembler allows.
    Useful for chunks of graphics data.</li>
  <li>.FILL - a series of identical bytes.  The operand
-    is the byte count, followed by the byte value.</li>
+    has two parts, the byte count followed by the byte value.</li>
 </ul>
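A .FILL candidate is just a run of identical bytes. The Python sketch below scans a buffer for such runs. The minimum run length of 8 is a hypothetical threshold, not a value taken from SourceGen, and the output line format is illustrative.

```python
from typing import List, Tuple


def find_fill_runs(data: bytes, min_run: int = 8) -> List[Tuple[int, int, int]]:
    """Return (offset, count, value) for each run of identical bytes that
    is at least min_run long.  min_run is a hypothetical threshold."""
    runs = []
    start = 0
    while start < len(data):
        end = start
        while end < len(data) and data[end] == data[start]:
            end += 1
        if end - start >= min_run:
            runs.append((start, end - start, data[start]))
        start = end
    return runs


data = bytes([0x01, 0x02]) + bytes(10) + bytes([0xff] * 3)
for offset, count, value in find_fill_runs(data):
    # Operand order follows the text: byte count, then byte value.
    print(f"+{offset:06x}  .FILL {count},${value:02x}")
```

Here the ten zero bytes become a single ".FILL 10,$00", while the three $FF bytes fall below the threshold and stay as plain data.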

 <p>In addition, several pseudo-ops are defined for string constants:</p>
 <ul>
-  <li>.STR - basic ASCII string.</li>
+  <li>.STR - basic character string.</li>
  <li>.RSTR - string in reverse order.</li>
  <li>.ZSTR - null-terminated string.</li>
  <li>.DSTR - Dextral Character Inverted string.  The high bit of the
@@ -704,12 +734,17 @@ represent hundreds of bytes and span multiple lines.</p>
  <li>.L2STR - string prefixed with a length word.</li>
 </ul>
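As a rough illustration of the byte layouts behind these pseudo-ops, the Python sketch below builds the raw bytes for a short string in several of the forms listed above. It is not SourceGen's encoder. It handles only plain or high ASCII text, and it assumes the .L2STR length word is little-endian, as is usual on the 6502.

```python
def encode_str(text: str, kind: str, high: bool = False) -> bytes:
    """Build the bytes a string pseudo-op would describe (sketch)."""
    data = bytearray(text.encode("ascii"))
    if high:
        data = bytearray(b | 0x80 for b in data)   # "high ASCII"
    if kind == ".STR":
        return bytes(data)
    if kind == ".RSTR":
        return bytes(reversed(data))
    if kind == ".ZSTR":
        return bytes(data) + b"\x00"
    if kind == ".DSTR":
        if not data:
            raise ValueError("DCI strings need at least one character")
        data[-1] ^= 0x80    # invert the high bit of the last byte only
        return bytes(data)
    if kind == ".L2STR":
        # Assumes a little-endian 16-bit length.
        return len(data).to_bytes(2, "little") + bytes(data)
    raise ValueError(f"unsupported pseudo-op: {kind}")


for kind in (".STR", ".RSTR", ".ZSTR", ".DSTR", ".L2STR"):
    print(kind, encode_str("AB", kind).hex(" "))
```

The DCI form shows why it needs no terminator: the inverted high bit on the last byte marks the end of the string.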

-<p>If the characters have their high bits set -- commonly referred to
-as "high ASCII" -- an upward arrow will be added to the pseudo-op.  How
-these strings are generated into assembly source varies.</p>
-<p>You can configure these to look more like what your favorite assembler
-uses in the
-<a href="settings.html#app-settings">application settings</a>.</p>
<p>You can configure the pseudo-ops to look more like what your
+favorite assembler uses in the
+<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
+application settings.</p>
+
+<p>String constants start and end with delimiter characters, typically
+single or double quotes.  You can configure the delimiters differently
+for each character encoding, so that it's obvious whether the text is
+in ASCII or PETSCII.  See the
+<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
+the application settings.</p>


 </div>
--- a/SourceGen/RuntimeData/Help/mainwin.html
+++ b/SourceGen/RuntimeData/Help/mainwin.html
@@ -147,7 +147,7 @@ to select an item, ctrl-left-click to toggle individual items on and
 off, and shift-left-click to select a range.  You can select all lines
 with Edit &gt; Select All.  Resize columns by
 left-clicking on the divider in the header and dragging it.</p>
-<p>Multi-line items, such as long comments or ASCII strings, are
+<p>Multi-line items, such as long comments or character strings, are
 selected as a whole when any part is selected.</p>

 <p>Right-clicking opens a menu.  The contents are the same as those in
@@ -309,7 +309,7 @@ the Remove Hints menu item will remove hints from every byte.</p>

 <p>If you're having a hard time selecting just the right bytes because
 the instructions are caught up in a multi-byte data item, such as an
-auto-detected ASCII string, you can disable uncategorized data analysis
+auto-detected character string, you can disable uncategorized data analysis
 (the thing that creates the .STR and .FILL ops for you).  You can do this
 from the
 <a href="settings.html#project-properties">project properties</a> editor,
@@ -442,7 +442,7 @@ what splits a region), or is the last byte in the file.</p>

 <p>This menu item is in the Edit menu, and acts as a shortcut to opening
 the Project Properties editor, and clicking on the "Analyze Uncategorized
-Data" checkbox.  When enabled, SourceGen will look for ASCII strings and
+Data" checkbox.  When enabled, SourceGen will look for character strings and
 regions of identical bytes, and generate .STR and .FILL directives.  When
 disabled, uncategorized data is presented as one byte per line, which can
 be handy if you're trying to get at a byte in the middle of a string.</p>
--- a/SourceGen/RuntimeData/Help/settings.html
+++ b/SourceGen/RuntimeData/Help/settings.html
@@ -23,14 +23,14 @@ project properties.</p>

 <p>Application settings are stored in a file called "SourceGen-settings"
 in the SourceGen installation directory.  If the file is missing or
-corrupted, some default settings will be used.  These settings are local
+corrupted, default settings will be used.  These settings are local
 to your system, and include everything from window sizes to whether or not
 you prefer hexadecimal values to be shown in upper case.  None of them
 affect the way the project analyzes code and data, though they may affect
 the way generated assembly sources look.</p>

-<p>The settings editor is divided into four tabs.  Changes aren't pushed
-out to the main application until you hit Apply or OK.</p>
<p>The settings editor is divided into five tabs.  Changes don't take
+effect until you hit Apply or OK.</p>


 <h3><a name="appset-codeview">Code View</a></h3>
@@ -40,18 +40,18 @@ out to the main application until you hit Apply or OK.</p>
 <p>Click the Column Visibility buttons to hide columns.  Click them
 again to restore the column to a width appropriate for the current font.
 A "hidden" column just has a width of zero, so with careful mouse
-positioning you can show and hide columns from the code list.  The buttons
-may be more convenient though.</p>
positioning you can show and hide columns by dragging the header dividers.
+The buttons may be more convenient though.</p>

-<p>You can select a different font for the code list.  Make it as large
-or small as you want.  Mono-space fonts like Courier or Consolas are
+<p>You can select a different font for the code list, and make it as large
+or as small as you want.  Mono-space fonts like Courier or Consolas are
 recommended (and will be the only ones shown).</p>

 <p>You can choose to display different parts of the display in upper or
 lower case, using the "all lower" and "all upper" buttons as a quick way
-to set all values.  These values are also used for generated assembly
-code.  Note that labels are case-sensitive and can't be forced one way
-or the other.</p>
+to set all values.  These settings are also used for generated assembly
+code, unless the assembler has specific case-sensitivity requirements.  There
+is no setting for labels, which are always case-sensitive.</p>

 <p>The Clipboard drop-down list lets you choose the format for text
 <a href="mainwin.html#clipboard">copied to the clipboard</a>.  The
@@ -65,6 +65,32 @@ to spaced (<code>20 ed fd</code>).  This also affects the way the
 "Disassembly" copy &amp; paste format looks.</p>


+<h3><a name="appset-textdelim">Text Delimiters</a></h3>
+
+<p>Character and string operands are shown surrounded by quotes, e.g.
+<code>LDA #'*'</code> or <code>.STR "Hello, world!"</code>.  It's
+handy to be able to tell at a glance how characters are encoded, so
+SourceGen allows you to set the delimiters independently for every
+supported character encoding.</p>
+<p>String operands may contain a mixture of text and hexadecimal values.
+For example, in ASCII data, the control characters for linefeed and
+carriage return ($0a and $0d) are considered part of the string, but
don't have a printable symbol.  (Unicode defines some glyphs, but they
+don't look very good at smaller font sizes.)</p>
+<p>If one of the delimiter characters appears in the string itself,
+the character will be output as hex to avoid confusion.  For this
+reason, it's generally wise to use delimiter characters that aren't
+part of the ASCII character set.  The "Sample Characters" box holds some
+characters that you can copy and paste (with Ctrl+C / Ctrl+V) into the
+delimiter fields.</p>
+<p>For character operands, the prefix and suffix are added to the start
+and end of the operand.  For string operands, the prefix is added to the
start of the first line, and suffixes aren't allowed.</p>
+<p>These options change the way the code list looks on screen.  They
+do not affect generated code, which must use the delimiter characters
+specified by the assembler.</p>
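The display rules above (printable text inside delimiters; control characters and any character that matches a delimiter shown as hex) can be illustrated with a short Python sketch. The function and parameter names are hypothetical, the comma-separated output style is an assumption, and wrapping across multiple lines is left out.

```python
def format_string_operand(data: bytes, open_q: str = '"', close_q: str = '"',
                          prefix: str = "") -> str:
    """Render bytes as quoted text runs mixed with hex values."""
    parts = []
    run = []

    def flush() -> None:
        if run:
            parts.append(open_q + "".join(run) + close_q)
            run.clear()

    for b in data:
        ch = chr(b)
        if 0x20 <= b <= 0x7e and ch not in (open_q, close_q):
            run.append(ch)             # printable and not a delimiter
        else:
            flush()
            parts.append(f"${b:02x}")  # control char or delimiter: hex
    flush()
    return prefix + ",".join(parts)


print(format_string_operand(b'Say "hi"\r'))
print(format_string_operand(b'Say "hi"', "\u201c", "\u201d"))
```

The first call prints "Say ",$22,"hi",$22,$0d. The second, using curly quotes as delimiters, prints “Say "hi"”, keeping the ASCII double quotes as text because they no longer collide with the delimiters.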
+
+
 <h3><a name="appset-asmconfig">Asm Config</a></h3>

 <p>These settings configure cross-assemblers and modify assembly source
@@ -74,17 +100,17 @@ will initially contain assembler-specific default values.  All of
 the values in the Assembler Configuration box may be configured
 differently for each assembler.</p>
 <p>The "executable" box holds the full path to the cross-assembler
-executable.
-For cc65 this is <code>bin/cl65.exe</code>,
-for Merlin32 you need <code>Merlin32.exe</code>,
-for 64tass it's <code>64tass.exe</code>.
-(On non-Windows platforms, you won't need the ".exe".)  For cc65 you need
-a full installation, with the configuration files and libraries, not just
-the cl65 binary alone.</p>
+executable.</p>
+<ul>
  <li>64tass: <code>64tass.exe</code></li>
  <li>ACME: <code>acme.exe</code></li>
  <li>cc65: <code>bin/cl65.exe</code> -- full installation required,
    with all configuration files and libraries</li>
  <li>Merlin 32: <code>Merlin32.exe</code></li>
+</ul>
 <p>The "column widths" section allows you to specify the width of the
 label, opcode, operand, and comment fields.  If the width is less than 1,
-or isn't a valid number, 1 will be used.  (Note: the comment width isn't
-used at this time.)</p>
+or isn't a valid number, 1 will be used.</p>

 <p>When "show cycle counts" is checked, every instruction line will have
 an end-of-line comment that indicates the number of cycles required for
@@ -132,7 +158,8 @@ you are most comfortable with.</p>

 <p>The "quick set" buttons configure the fields on this tab to match
 the conventions of the specified assembler.  Select your preferred assembler
-with the combox box, then click "set" to set the fields.</p>
with the combo box, then click "set" to set the fields.  (64tass and
+ACME use the "common" expression style.)</p>


 <h3><a name="appset-pseudoop">Pseudo-Op</a></h3>
@@ -195,12 +222,18 @@ entry point hint) will use this value.  This is chiefly of value for
 65816 code, where the initial value of the M/X/E flags is significant.</p>

 <p>If "analyze uncategorized data" is checked, SourceGen will attempt to
-identify strings and regions filled with a single byte value.  If it's
+identify character strings and regions that are filled with a repeated
+value.  If it's
 not checked, anything that isn't detected as code or explicitly formatted
 will simply be shown as a byte value.</p>
 <p>If "seek nearby targets" is checked, the analyzer will try to use
 nearby labels for data loads and stores.</p>
-<p>The "minimum characters for string" setting determines how many
+<p>The "default text encoding" setting has two effects.  First, it
+specifies which character encoding to use when searching for strings in
+uncategorized data.  Second, if an assembler has a notion of preferred
+character encoding (e.g. you can default string operands to PETSCII),
+this setting will determine which encoding is preferred.</p>
+<p>The "min chars for string detection" setting determines how many
 ASCII characters need to appear consecutively for the data analyzer to
 declare it a string.  Shorter values are prone to false-positive
 identifications, longer values miss out on short strings.  You can also
@@ -253,10 +286,8 @@ you will receive a warning.</p>

 <h3><a name="projprop-extscripts">Extension Scripts</a></h3>
 <p>From here, you can add and remove extension script files.
-See the <a href="intro.html#scripts">extension scripts</a> section for
-an overview of how extension scripts work.
-There's a more detailed document in the RuntimeData directory
-("ExtensionScripts.md").</p>
+See the <a href="advanced.html#scripts">extension scripts</a> section for
+details on how extension scripts work.</p>


 <p>Extension script files must live in the RuntimeData directory that comes
--- a/SourceGen/RuntimeData/Help/tools.html
+++ b/SourceGen/RuntimeData/Help/tools.html
@@ -32,17 +32,16 @@ per file.)</p>
 view.  Ctrl+A selects all lines.  Ctrl+C copies the selected lines to
 the system clipboard.</p>

-<p>The "character conversion" drop list allows you to choose between
-basic ASCII and low/high ASCII.  The latter just means that bytes have
-their high bits stripped before a decision is made is to whether they
-should appear in the ASCII dump on the right.</p>
+<p>The "character conversion" selector allows you to choose how the
+bytes are converted to characters for the Text column.  Choose from
+the usual set of encodings.</p>

 <p>If "ASCII-only dump" is not checked, non-printable bytes are shown in
 the ASCII dump as a middle dot ('&#183;').  If the box is checked,
 non-printable bytes are represented by a period ('.') instead.  The
 use of non-ASCII characters makes the dump unambiguous when unprintable
 characters are mixed with periods, but the lines may be unsuitable for
-pasting in some situations.</p>
+pasting in some forums.</p>
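The trade-off between the middle dot and the period shows up in a small Python sketch of a hex-dump text column. The function name and its two flags are hypothetical stand-ins for the "character conversion" and "ASCII-only dump" options, and only plain and low/high ASCII conversion are modeled.

```python
def text_column(data: bytes, high_ascii: bool = False,
                ascii_only: bool = False) -> str:
    """Build the text column of one hex-dump line (hypothetical helper)."""
    dot = "." if ascii_only else "\u00b7"   # period vs. middle dot
    out = []
    for b in data:
        if high_ascii:
            b &= 0x7f       # low/high ASCII: ignore the high bit
        out.append(chr(b) if 0x20 <= b <= 0x7e else dot)
    return "".join(out)


print(text_column(b"A.\x00"))
print(text_column(b"A.\x00", ascii_only=True))
```

With the ASCII-only option, the bytes $41 $2E $00 render as "A..", so the real period and the unprintable $00 look identical. The middle dot keeps them distinct.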

 <p>If "always on top" is checked, the window will stay above all other
 windows that don't also declare that they should always be on top.  By
--- a/SourceGen/RuntimeData/Help/tutorials.html
+++ b/SourceGen/RuntimeData/Help/tutorials.html
@@ -264,15 +264,15 @@ label to "STR1".  Move up a bit and select address $2030, then scroll to
 the bottom and shift-click address $2070.  Select Actions &gt; Edit Operand.
 At the top it should now say, "65 bytes selected in 2 groups".
 There are two groups because the presence of a label split the data into
-two separate regions.  Select "mixed ASCII and non-ASCII", then click
-"OK".</p>
+two separate regions.  Select "Low or High ASCII" encoding, select the
+"mixed character and non-character" string type, then click "OK".</p>
 <p>We now have two ".STR" lines, one for "string zero  ", and one with the
 STR1 label and the rest of the string data.  This is okay, but it's not
 really what we want.  The code at $200B appears to be loading a 16-bit
 address from data at $2025, so we want to use that if we can.</p>
 <p>Select Edit &gt; Undo three times.  You should be back to the state where
-there's a single ".STR" line at the bottom, split across two lines with
-a '+'.</p>
+there's a single ".STR" line at the bottom of the file, split across two
+lines with a '+'.</p>
 <p>Select the line at $2026.  This is currently formatted as a string,
 but that appears to be incorrect, so let's format it as individual bytes
 instead.  There's an easy way to do that: use Actions &gt; Toggle Single-Byte