mirror of
https://github.com/fadden/6502bench.git
synced 2026-04-20 19:16:34 +00:00
Relocate manual
Move the SourceGen manual to a subdirectory in "docs", so that it can be accessed directly from the 6502bench web site. The place where it's installed in the distribution doesn't change.
@@ -0,0 +1,495 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Advanced Topics - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Advanced Topics</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
|
||||
<h2><a name="platform-symbols">Platform Symbol Files (.sym65)</a></h2>
|
||||
|
||||
<p>Platform symbol files contain lists of symbols, each of which has a
|
||||
label and a value. SourceGen comes with a collection of symbols for
|
||||
popular systems, but you can create your own. This can be handy if a
|
||||
few different projects are coded against a common library.</p>
|
||||
|
||||
<p>If two symbols have the same value, the older symbol is replaced by
|
||||
the newer one. This is why the order in which symbol files are loaded
|
||||
matters.</p>
|
||||
|
||||
<p>Platform symbol files consist of comments, commands, and symbols.
|
||||
Blank lines, and lines that begin with a semicolon (';'), are ignored. Lines
|
||||
that begin with an asterisk ('*') are commands. Three are currently
|
||||
defined:</p>
|
||||
<ul>
|
||||
<li><code>*SYNOPSIS</code> - a short summary of the file contents.</li>
|
||||
<li><code>*TAG</code> - a tag string to apply to all symbols that follow
|
||||
in this file.</li>
|
||||
<li><code>*MULTI_MASK</code> - specify a mask for symbols that appear
|
||||
at multiple addresses.</li>
|
||||
</ul>
|
||||
|
||||
<p>Tags can be used by extension scripts to identify a subset of symbols.
|
||||
The symbols are still part of the global set; the tag just provides a
|
||||
way to extract a subset. Tags should consist of non-whitespace ASCII
|
||||
characters. Tags are global, so use a long, descriptive string. If
|
||||
<code>*TAG</code> is not followed by a string, the symbols that follow
|
||||
are treated as untagged.</p>
|
||||
|
||||
<p>All other lines are symbols, which have the form:</p>
|
||||
<pre>
  LABEL {=|@|<|>} VALUE [WIDTH] [;COMMENT]
</pre>
|
||||
|
||||
<p>The LABEL must be at least two characters long, begin with a letter or
|
||||
underscore, and consist entirely of alphanumeric ASCII characters
|
||||
(A-Z, a-z, 0-9) and the underscore ('_'). (This is the same format
|
||||
required for line labels in SourceGen.)</p>
|
||||
<p>The next token can be one of:</p>
|
||||
<ul>
|
||||
<li>@: general addresses</li>
|
||||
<li><: read-only addresses</li>
|
||||
<li>>: write-only addresses</li>
|
||||
<li>=: constants</li>
|
||||
</ul>
|
||||
<p>If an instruction references an address, and that address is outside
|
||||
the bounds of the file, the list of address symbols (i.e. everything
|
||||
that's not a constant) will be scanned for a match.
|
||||
If found, the symbol is applied automatically. You normally want to
|
||||
use '@', but can use '<' and '>' for memory-mapped I/O locations
|
||||
that have different behavior depending on whether they are read or
|
||||
written.</p>
|
||||
|
||||
<p>The VALUE is a number in decimal, hexadecimal (with a leading '$'), or
|
||||
binary (with a leading '%'). The numeric base will be recorded and used when
|
||||
formatting the symbol in generated output, so use whichever form is most
|
||||
appropriate. Values are unsigned 24-bit numbers. The special value
|
||||
"erase" may be used for an address to erase a symbol defined in an earlier
|
||||
platform file.</p>
|
||||
|
||||
<p>The WIDTH is optional, and ignored for constants. It must be a
|
||||
decimal or hexadecimal value between 1 and 65536, inclusive. If omitted,
|
||||
the default width is 1.</p>
|
||||
|
||||
<p>The COMMENT is optional. If present, it will be saved and used as the
|
||||
end-of-line comment on the .EQ directive if the symbol is used.</p>
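<p>As an illustration of the line format described above, here is a minimal
Python sketch of a parser for a single symbol line. It is not SourceGen's own
parser (SourceGen is written in C#). The names <code>parse_symbol</code> and
<code>parse_number</code> are made up for this example, and details the text
leaves open, such as whether whitespace is required around the type
character, are assumptions.</p>

```python
import re

# Hypothetical parser for one .sym65 symbol line; not SourceGen's actual code.
SYMBOL_RE = re.compile(
    r"^(?P<label>[A-Za-z_][A-Za-z0-9_]+)\s*"      # label: two or more characters
    r"(?P<type>[=@<>])\s*"                        # constant or address type
    r"(?P<value>\$[0-9A-Fa-f]+|%[01]+|[0-9]+|erase)"
    r"(?:\s+(?P<width>\$[0-9A-Fa-f]+|[0-9]+))?"   # optional width
    r"\s*(?:;(?P<comment>.*))?$"                  # optional comment
)


def parse_number(text):
    """Convert '$1F' (hex), '%1010' (binary), or '26' (decimal) to an int."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)


def parse_symbol(line):
    """Parse one symbol line into a dict, raising ValueError if malformed."""
    m = SYMBOL_RE.match(line.strip())
    if m is None:
        raise ValueError("not a symbol line: " + line)
    kind = m.group("type")
    raw = m.group("value")
    if raw == "erase":
        # Assumption: "erase" is rejected for constants.
        if kind == "=":
            raise ValueError("'erase' only applies to addresses")
        value = None
    else:
        value = parse_number(raw)
        if not 0 <= value <= 0xFFFFFF:
            raise ValueError("value must be an unsigned 24-bit number")
    width = 1  # default width; the width is ignored for constants
    if m.group("width") is not None and kind != "=":
        width = parse_number(m.group("width"))
        if not 1 <= width <= 65536:
            raise ValueError("width must be between 1 and 65536")
    return {"label": m.group("label"), "type": kind, "value": value,
            "width": width, "comment": m.group("comment")}
```

<p>For example, <code>parse_symbol("BUF @ $0300 256")</code> yields an address
symbol with value $0300 and width 256. A real implementation would also
record the numeric base, which this sketch discards.</p>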
|
||||
|
||||
<h4>Using MULTI_MASK</h4>
|
||||
|
||||
<p>The multi-address mask is used for systems like the Atari 2600, where
|
||||
RAM, ROM, and I/O registers appear at multiple addresses. The hardware
|
||||
looks for certain address lines to be set or clear, and if the pattern
|
||||
matches, another set of bits is examined to determine which register or
|
||||
RAM address is being accessed.</p>
|
||||
<p>This is expressed in symbol files with the MULTI_MASK statement.
|
||||
Address symbol declarations that follow have the mask set applied. Symbols
|
||||
whose addresses don't fit the pattern cause a warning and will be
|
||||
ignored. Constants are not affected.</p>
|
||||
|
||||
<p>The mask set is best explained with an example. Suppose the address
|
||||
pattern for a set of registers is <code>???0 ??1? 1??x xxxx</code>
|
||||
(where '?' can be any value, 0/1 must be that value, and 'x' means the bit
|
||||
is used to determine the register).
|
||||
So any address in $0280-$029F matches, as does any in $23C0-$23DF, but
|
||||
$0480 and $1280 don't. The register number is found in the low five bits.</p>
|
||||
<p>The corresponding MULTI_MASK line, with values specified in binary,
|
||||
would be:</p>
|
||||
<pre> *MULTI_MASK %0001001010000000 %0000001010000000 %0000000000011111</pre>
|
||||
<p>The values are CompareMask, CompareValue, and AddressMask. To
|
||||
determine if an address is in the register set, we check to see if
|
||||
<code>(address & CompareMask) == CompareValue</code>. If so, we can
|
||||
extract the register number with <code>(address & AddressMask)</code>.</p>
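<p>The comparison can be tried out with a few lines of Python. This is only
a sketch using the three values from the example line above; the function
name <code>register_number</code> is invented for illustration and does not
correspond to anything in SourceGen.</p>

```python
# Values from the example *MULTI_MASK line.
COMPARE_MASK = 0b0001001010000000
COMPARE_VALUE = 0b0000001010000000
ADDRESS_MASK = 0b0000000000011111


def register_number(address):
    """Return the register number, or None if the address is not in the set."""
    if (address & COMPARE_MASK) != COMPARE_VALUE:
        return None
    return address & ADDRESS_MASK
```

<p>With these values, $0281 and $23C1 both map to register 1, while $0480
and $1280 are rejected.</p>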
|
||||
|
||||
<p>We don't want to have a huge collection of equates at the top of the
|
||||
generated source file, so whatever value is used in the symbol declaration
|
||||
is considered the "canonical" value. All other matching values are output
|
||||
with an offset.</p>
|
||||
<p>All mask values must fall between 0 and $00FFFFFF. The set bits in
|
||||
CompareMask and AddressMask must not overlap, and CompareValue must not
|
||||
have any bits set that aren't also set in CompareMask.</p>
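<p>These three constraints are simple bit tests. The following Python sketch
shows one way to check them; <code>check_multi_mask</code> is a hypothetical
helper, and the messages it returns are not SourceGen's actual
diagnostics.</p>

```python
def check_multi_mask(compare_mask, compare_value, address_mask):
    """Return a list of problems found; an empty list means the set is valid."""
    problems = []
    for name, value in (("CompareMask", compare_mask),
                        ("CompareValue", compare_value),
                        ("AddressMask", address_mask)):
        if not 0 <= value <= 0x00FFFFFF:
            problems.append(name + " is outside 0..$00FFFFFF")
    if compare_mask & address_mask:
        problems.append("CompareMask and AddressMask overlap")
    if compare_value & ~compare_mask:
        problems.append("CompareValue has bits not set in CompareMask")
    return problems
```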
|
||||
<p>If an address can be mapped to a masked value and an unmasked value,
|
||||
the unmasked value takes precedence for exact matches. In the example
|
||||
above, if you declare <code>REG1 @ $0281</code> outside the MULTI_MASK
|
||||
declaration, the disassembler will use <code>REG1</code> for all operands
|
||||
that reference $0281. If other code accesses the same register as $23C1,
|
||||
the symbol established for the masked value will be used instead.</p>
|
||||
<p>If there are multiple masked values for a given address, the precedence
|
||||
is undefined.</p>
|
||||
<p>To disable the MULTI_MASK and resume normal declarations, write the
command without arguments:</p>
<pre>  *MULTI_MASK</pre>
|
||||
|
||||
|
||||
<h3>Creating a Project-Specific Symbol File</h3>
|
||||
|
||||
<p>To create a platform symbol file for your project, just create a new
|
||||
text file, named with a ".sym65" extension. (If your text editor of choice
|
||||
doesn't like that, you can put a ".txt" on the end while you're editing.)
|
||||
Make sure you create it in the same directory where your project file
|
||||
(the file that ends with ".dis65") lives. Add a <code>*SYNOPSIS</code>,
|
||||
then add the desired symbols.</p>
|
||||
<p>Finally, add it to your project. Select Edit > Project Properties,
|
||||
switch to the Symbol Files tab, click Add Symbol Files from Project, and
|
||||
select your symbol file. It should appear in the list with a
|
||||
"PROJ:" prefix.</p>
|
||||
|
||||
<p>If an example helps, the A2-Amper-fdraw project in the Examples
|
||||
directory has a project-local symbol file, called "fdraw-exports".
|
||||
(fdraw-exports is a list of exported symbols from the fdraw library,
|
||||
for which Amper-fdraw provides an Applesoft BASIC interface.)</p>
|
||||
|
||||
<p>NOTE: in the current version of SourceGen, changes to .sym65 files are
|
||||
not detected automatically. Use File > Reload External Files to
|
||||
import the changes.</p>
|
||||
|
||||
|
||||
<h2><a name="extension-scripts">Extension Scripts</a></h2>
|
||||
|
||||
<p>Extension scripts, also called "plugins", are C# programs with access to
|
||||
the full .NET Standard 2.0 APIs. They're compiled at run time by SourceGen
|
||||
and executed in a sandbox with security restrictions.</p>
|
||||
|
||||
<p>SourceGen defines an interface that plugins must implement, and an
|
||||
interface that plugins can use to interact with SourceGen. See
|
||||
Interfaces.cs in the PluginCommon directory.</p>
|
||||
|
||||
<p>The current interfaces can be used to generate visualizations, to
|
||||
identify inline data that follows JSR, JSL, or BRK instructions, and to
|
||||
format operands. The latter can be used to format code and data, e.g.
|
||||
replacing immediate load operands with symbolic constants.</p>
|
||||
|
||||
<p>Scripts may be loaded from the RuntimeData directory, or from the directory
|
||||
where the project file lives. Attempts to load them from other locations
|
||||
will fail.</p>
|
||||
<p>A project may load multiple scripts. The order in which they are
|
||||
invoked is not defined.</p>
|
||||
|
||||
<h4>Known Issues and Limitations</h4>
|
||||
|
||||
<p>Scripts are currently limited to C# version 5, because the compiler
|
||||
built into .NET only handles that. C# 6 and later require installing an
|
||||
additional package ("Roslyn"), so SourceGen does not support this.</p>
|
||||
|
||||
<p>When a project is opened, any errors encountered by the script compiler
|
||||
are reported to the user. If the project is already open, and a script
|
||||
is added to the project through the Project Properties editor, compiler
|
||||
messages are silently discarded. (This also applies if you undo/redo across
|
||||
the property edit.) Use File > Reload External Files to see the
|
||||
compiler messages.</p>
|
||||
|
||||
<h4>Development</h4>
|
||||
|
||||
<p>The easiest way to develop extension scripts is inside the 6502bench
|
||||
solution in Visual Studio. This way you have the interfaces available
|
||||
for IntelliSense completion, and get all the usual syntax and compile
|
||||
checking in the editor. (This is why there's a RuntimeData project for
|
||||
Visual Studio.)</p>
|
||||
|
||||
<p>If you have the solution configured for debug builds, SourceGen will pass
|
||||
<code>IncludeDebugInformation=true</code> to the script compiler. This
|
||||
causes a .PDB file to be created. While this can help with debugging,
|
||||
it can sometimes get in the way: if you edit the script source code and
|
||||
reload the project without restarting the app, SourceGen will recompile
|
||||
the script, but the old .PDB file will still be held open by Visual Studio
|
||||
and you'll see some failure messages. Exiting and restarting SourceGen
|
||||
will allow regeneration of the PDB files.</p>
|
||||
|
||||
<p>Some commonly useful functions are defined in the
|
||||
<code>PluginCommon.Util</code> class, which is available to plugins. These
|
||||
call into the CommonUtil library, which is shared with SourceGen.
|
||||
While plugins could use CommonUtil directly, they should avoid doing so. The
|
||||
APIs there are not guaranteed to be stable, so plugins that rely on them
|
||||
may break in a subsequent release of SourceGen.</p>
|
||||
|
||||
<h4>PluginDllCache Directory</h4>
|
||||
|
||||
<p>Extension scripts are compiled into .DLLs, and saved in the PluginDllCache
|
||||
directory, which lives next to the application executable and RuntimeData.
|
||||
If the extension script is the same age or older than the DLL, SourceGen
|
||||
will continue to use the existing DLL.</p>
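<p>The reuse rule amounts to a timestamp comparison. Here is a rough Python
sketch of that decision. The function names are invented, the use of file
modification times is an assumption about how "age" is measured, and
treating a missing DLL as stale is likewise an assumption.</p>

```python
import os


def dll_is_current(script_mtime, dll_mtime):
    """The cached DLL is reused when the script is the same age or older."""
    return script_mtime <= dll_mtime


def needs_recompile(script_path, dll_path):
    """Decide whether a script must be compiled again."""
    # Assumption: a DLL that does not exist yet always triggers a compile.
    if not os.path.exists(dll_path):
        return True
    return not dll_is_current(os.path.getmtime(script_path),
                              os.path.getmtime(dll_path))
```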
|
||||
|
||||
<p>The DLL names are a combination of the script filename and script location.
|
||||
The compiled name for "MyPlatform/MyScript.cs" in the RuntimeData directory
|
||||
will be "RT_MyPlatform_MyScript.dll". For a project-specific script, it
|
||||
would look like "PROJ_MyProject_MyScript.dll".</p>
|
||||
|
||||
<p>The PluginCommon and CommonUtil DLLs will be copied into the directory, so
|
||||
that code in the sandbox has access to them.</p>
|
||||
|
||||
<p>The contents of the directory are generated as needed, and can be deleted
|
||||
entirely whenever SourceGen isn't running.</p>
|
||||
|
||||
<h4>Sandboxing</h4>
|
||||
|
||||
<p>Extension scripts are executed in an App Domain sandbox. App domains are
|
||||
a .NET feature that creates a partition inside the virtual machine, isolating
|
||||
code. It still runs in the same address space, on the same threads, so the
|
||||
isolation is only effective for "partially trusted" code that has been
|
||||
declared safe by the bytecode verifier.</p>
|
||||
|
||||
<p>SourceGen disallows most actions, notably file access. An exception is
|
||||
made for reading files from the directory where the plugin DLLs live, but
|
||||
scripts are otherwise unable to read or write from the filesystem. (A
|
||||
future version of SourceGen may provide an API that allows limited access
|
||||
to data files.)</p>
|
||||
|
||||
<p>App domain security is not absolute. I don't really expect SourceGen to
|
||||
be used as a malware vector, so there's no value in forcing scripts to
|
||||
execute in an isolated server process, or to jump through the other hoops
|
||||
required to really lock things down. I do believe there's value in
|
||||
defining the API in such a way that we <b>could</b> implement full security if
|
||||
circumstances change, so I'm using app domains as a way to keep the API
|
||||
honest.</p>
|
||||
|
||||
|
||||
<h2><a name="multi-bin">Working With Multiple Binaries</a></h2>
|
||||
|
||||
<p>Sometimes a program is split into multiple files on disk. They
|
||||
may be all loaded at once, or some may be loaded into the same place
|
||||
at different times. In such situations it's not uncommon for one
|
||||
file to provide a set of interfaces that other files use. It's
|
||||
useful to have symbols for these interfaces be available to all
|
||||
projects.</p>
|
||||
<p>There are two ways to do this: (1) define a common platform symbol
|
||||
file with the relevant addresses, and keep it up to date as you work;
|
||||
or (2) declare the labels as global and exported, and import them
|
||||
as project symbols into the other projects.</p>
|
||||
<p>Support for this is currently somewhat weak, requiring a manual
|
||||
symbol-import step in every interested project. This step must be
|
||||
repeated whenever the labels are updated.</p>
|
||||
<p>A different but related problem is typified by arcade ROM sets,
|
||||
where files are split apart because each file must be burned into a
|
||||
separate PROM. All files are expected to be present in memory at
|
||||
once, so there's no reason to treat them as separate projects. Currently,
|
||||
the best way to deal with this is to concatenate the files into a single
|
||||
file, and operate on that.</p>
|
||||
|
||||
<h2><a name="overlap">Overlapping Address Spaces</a></h2>
|
||||
<p>Some programs use memory overlays, where multiple parts of the
|
||||
code run in the same address in RAM. Others use bank switching to access
|
||||
parts of the program that reside in separate physical RAM or ROM,
|
||||
but appear at the same address. Nested address regions allow for a
|
||||
variety of configurations, which can make address resolution complicated.</p>
|
||||
|
||||
<p>The general goal is to have references to an address resolve to
|
||||
the "nearest" match. For example, consider a simple overlay:</p>
|
||||
<pre>
         .ADDRS  $1000
         JMP     L1100

         .ADDRS  $1100
L1100    BIT     L1100
L1103    LDA     #$11
         BRA     L1103
         .ADREND

         .ADDRS  $1100
L1100_0  BIT     L1100_0
L1103_0  LDA     #$22
         JMP     L1103_0
         .ADREND

         .ADREND
</pre>
|
||||
|
||||
<p>Both sections start at $1100, and have branches to $1103. The branch
|
||||
in the first section resolves to the label in the first version of
|
||||
that address chunk, while the branch in the second section resolves to
|
||||
the label in the second chunk. When branches originate outside the current
|
||||
address chunk, the first chunk that includes that address is used, as
|
||||
it is with the <code>JMP L1100</code> at the start of the file.</p>
|
||||
|
||||
<p>The full address-to-offset algorithm is as follows.
|
||||
There are two inputs: the file offset of the instruction or data item
|
||||
that has the reference (e.g. the JMP or LDA), and the address
|
||||
it is referring to.</p>
|
||||
<ul>
|
||||
<li>Create a tree with all address regions. Each "node" in the tree
|
||||
has an offset, length, and start address.</li>
|
||||
<li>Search the tree for a node that includes the offset of the
|
||||
reference source.
|
||||
When there are multiple overlapping regions, descend until the
|
||||
deepest child that spans the offset is found. This node will be
|
||||
the starting point of the search.</li>
|
||||
<li>Loop until we hit the top of the tree:
|
||||
<ul>
|
||||
<li>Perform a recursive depth-first search of all children of the
|
||||
current node. They're searched in order of ascending file offset.</li>
|
||||
<li>If the address wasn't found in the children, check the current
|
||||
node. If we find it here, return this node as the result.</li>
|
||||
  <li>Move up to the parent node.</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<p>This searches all children and all siblings before checking the parent.
|
||||
If we hit the top of the tree without finding a match, we conclude
|
||||
that the reference is to an external address.</p>
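<p>The search described above can be sketched in Python as follows. This is
an illustration under stated assumptions, not SourceGen's implementation: the
<code>Region</code> class and function names are hypothetical, the sketch
returns the target file offset rather than the node, and it assumes that
offsets covered by a child region belong to the child rather than to the
parent. The example regions model the simple overlay shown earlier, with
byte counts derived from the instruction sizes.</p>

```python
class Region:
    """One address region: a span of file offsets mapped to a start address."""

    def __init__(self, offset, length, address, children=()):
        self.offset = offset
        self.length = length
        self.address = address
        self.parent = None
        # Children are searched in order of ascending file offset.
        self.children = sorted(children, key=lambda c: c.offset)
        for child in self.children:
            child.parent = self

    def contains_offset(self, offset):
        return self.offset <= offset < self.offset + self.length

    def own_offset_for(self, address):
        """Map an address to an offset owned by this node itself, else None."""
        offset = self.offset + (address - self.address)
        if not self.contains_offset(offset):
            return None
        # Assumption: offsets inside a child region belong to the child.
        if any(c.contains_offset(offset) for c in self.children):
            return None
        return offset


def deepest_region(node, offset):
    """Descend to the deepest region that spans the given file offset."""
    for child in node.children:
        if child.contains_offset(offset):
            return deepest_region(child, offset)
    return node


def search_subtree(node, address):
    """Depth-first: check all children first, then the node itself."""
    for child in node.children:
        found = search_subtree(child, address)
        if found is not None:
            return found
    return node.own_offset_for(address)


def address_to_offset(root, src_offset, address):
    """Resolve an address as seen from src_offset; None means external."""
    node = deepest_region(root, src_offset)
    while node is not None:
        found = search_subtree(node, address)
        if found is not None:
            return found
        node = node.parent
    return None


# The overlay example: JMP (3 bytes), then two chunks that both start at $1100.
first = Region(offset=3, length=7, address=0x1100)
second = Region(offset=10, length=8, address=0x1100)
root = Region(offset=0, length=18, address=0x1000, children=[first, second])
```

<p>Here the reference from the <code>JMP</code> at offset 0 resolves into the
first chunk, while a reference to $1103 from inside the second chunk resolves
within that chunk.</p>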
|
||||
|
||||
|
||||
<h2><a name="reloc-data">OMF Relocation Dictionaries</a></h2>
|
||||
|
||||
<p><i>This feature is considered experimental. Some features,
|
||||
like cross-reference tracking, may not work correctly with it.</i></p>
|
||||
|
||||
<p>65816 code can be tricky to disassemble for a number of reasons.
|
||||
24-bit addresses are formed from 16-bit data-access operands by combining
|
||||
with the Data Bank Register (DBR), which often requires a bit of manual
|
||||
intervention. But the problems go beyond that. Consider the following
|
||||
bit of source code for the Apple IIgs:</p>
|
||||
<pre>
rsrcmsg  pea     rsrcmsg2|-16
         pea     rsrcmsg2
         _WriteCString
         lda     #buffer
         sta     pArcRead+$04
         lda     #buffer|-16
         sta     pArcRead+$06
</pre>
|
||||
<p>In both cases we're referencing a 24-bit address as two 16-bit values.
|
||||
Without context, the disassembler will treat the PEA instructions as two
|
||||
independent 16-bit addresses, and the immediate values as constants:</p>
|
||||
<pre>
                              .dbank  $02
02/317c: f4 02 00     L2317C  pea     L20002 & $ffff
02/317f: f4 54 32             pea     L23254 & $ffff
02/3182: a2 0c 20             ldx     #WriteCString
02/3185: 22 00 00 e1          jsl     Toolbox
02/3189: a9 00 00     L23189  lda     #$0000
02/318c: 8d 78 3f             sta     L23F78 & $ffff
02/318f: a9 03 00             lda     #$0003
02/3192: 8d 7a 3f             sta     L23F78 & $ffff +2
</pre>
|
||||
<p>Worse yet, those <code>STA</code> instruction operands would have been
|
||||
shown as hex values or incorrect labels if the DBR had been set incorrectly.
|
||||
However, if we have the relocation data, we know the full
|
||||
address from which the addresses were formed, and we can tell when
|
||||
immediate values are addresses rather than constants. And we can do this
|
||||
even without setting the DBR.</p>
|
||||
<pre>
02/317c: f4 02 00     L2317C  pea     L23254 >> 16
02/317f: f4 54 32             pea     L23254 & $ffff
02/3182: a2 0c 20             ldx     #WriteCString
02/3185: 22 00 00 e1          jsl     Toolbox
02/3189: a9 00 00     L23189  lda     #L30000 & $ffff
02/318c: 8d 78 3f             sta     L23F78 & $ffff
02/318f: a9 03 00             lda     #L30000 >> 16
02/3192: 8d 7a 3f             sta     L23F78 & $ffff +2
</pre>
|
||||
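<p>The arithmetic behind the <code>>> 16</code> and
<code>& $ffff</code> operand expressions is shown in the short Python sketch
below. The function names are made up for this illustration; the values come
from the listing above.</p>

```python
def split_address(address):
    """Split a 24-bit address into the two 16-bit words that get pushed."""
    bank_word = address >> 16      # e.g. "pea L23254 >> 16"
    low_word = address & 0xFFFF    # e.g. "pea L23254 & $ffff"
    return bank_word, low_word


def join_address(bank_word, low_word):
    """Rebuild the 24-bit address from its bank word and low word."""
    return ((bank_word & 0xFF) << 16) | (low_word & 0xFFFF)
```

<p>Splitting $023254 gives $0002 and $3254, which are the operand bytes of
the two <code>PEA</code> instructions. Joining $0003 and $0000 gives $030000,
the address behind the <code>L30000</code> label.</p>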
<p>The absence of relocation data can be a useful signal as well. For
|
||||
example, when pushing arguments for a toolbox call, the disassembler
|
||||
can tell the difference between addresses and constants without needing
|
||||
emulation or pattern-matching, because only the addresses get
|
||||
relocated. Consider this bit of source code:</p>
|
||||
<pre>
         lda     <total_records
         pha
         pea     linebuf|-16
         pea     linebuf+65
         pea     $0005
         pea     $0000
         _Int2Dec
</pre>
|
||||
<p>Without relocation data, it becomes:</p>
|
||||
<pre>
02/0aa8: a5 42                lda     $42
02/0aaa: 48                   pha
02/0aab: f4 02 00             pea     L20002 & $ffff
02/0aae: f4 03 31             pea     L23103 & $ffff
02/0ab1: f4 05 00             pea     L20005 & $ffff
02/0ab4: f4 00 00             pea     L20000 & $ffff
02/0ab7: a2 0b 26             ldx     #Int2Dec
02/0aba: 22 00 00 e1          jsl     Toolbox
</pre>
|
||||
<p>If we treat the non-relocated operands as constants:</p>
|
||||
<pre>
02/0aa8: a5 42                lda     $42
02/0aaa: 48                   pha
02/0aab: f4 02 00             pea     L230C2 >> 16
02/0aae: f4 03 31             pea     L23103 & $ffff
02/0ab1: f4 05 00             pea     $0005
02/0ab4: f4 00 00             pea     $0000
02/0ab7: a2 0b 26             ldx     #Int2Dec
02/0aba: 22 00 00 e1          jsl     Toolbox
</pre>
|
||||
|
||||
|
||||
<h2><a name="debug">Debug Menu Options</a></h2>
|
||||
|
||||
<p>The DEBUG menu is hidden by default in release builds, but can be
|
||||
exposed by checking the "enable DEBUG menu" box in the application
|
||||
settings. These features are used for debugging SourceGen. They will
|
||||
not help you debug 6502 projects.</p>
|
||||
|
||||
<p>Features:</p>
|
||||
<ul>
|
||||
<li>Re-analyze (F5). Causes a full re-analysis. Useful if you think
|
||||
the display is out of sync.</li>
|
||||
<li>Source Generation Tests. Opens the regression test harness. See
|
||||
<code>README.md</code> in the SGTestData directory for more information.
|
||||
If the regression tests weren't included in the SourceGen distribution,
|
||||
this will have nothing to do.</li>
|
||||
<li>Show Analyzer Output. Opens a floating window with a text log from
|
||||
the most recent analysis pass. The exact contents will vary depending
|
||||
on how the verbosity level is configured internally. Debug messages
|
||||
from extension scripts appear here.</li>
|
||||
<li>Show Analysis Timers. Opens a floating window with a dump of
|
||||
timer results from the most recent analysis pass. Times for individual
|
||||
stages are noted, as are times for groups of functions. This
|
||||
provides a crude sense of where time is being spent.</li>
|
||||
<li>Show Undo/Redo History. Opens a floating window that lets you
|
||||
watch the contents of the undo buffer while you work.</li>
|
||||
<li>Extension Script Info. Shows a bit about the currently-loaded
|
||||
extension scripts.</li>
|
||||
<li>Show Comment Rulers. Adds a string of digits above every
|
||||
multi-line comment (long comment, note). Useful for confirming that
|
||||
the width limitation is being obeyed. These are added exactly
|
||||
as shown, without comment delimiters, into generated assembly output,
|
||||
which doesn't work out well if you run the assembler.</li>
|
||||
<li>Disable Security Sandbox. Extension scripts are loaded and run in
|
||||
a "sandbox" to prevent security issues. Setting this flag allows
|
||||
them to execute with full permissions.
|
||||
This setting is not persistent.</li>
|
||||
<li>Disable Keep-Alive Hack. The hack sends a "ping" to the extension
|
||||
script sandbox every 60 seconds. This seems to be required to avoid
|
||||
an infrequently-encountered Windows bug. (See code for notes and
|
||||
stackoverflow.com links.)
|
||||
This setting is not persistent.</li>
|
||||
<li>Reboot Security Sandbox. Discards the sandbox, creates a new one,
|
||||
and reloads it. Only useful for exercising the sandbox code that
|
||||
runs when the keep-alives are unsuccessful.</li>
|
||||
<li>Applesoft to HTML. An experimental feature that formats an
|
||||
Applesoft program as HTML.</li>
|
||||
<li>Export Edit Commands. Outputs comments and notes in
|
||||
SourceGen Edit Command format. This is an experimental feature.</li>
|
||||
<li>Apply Edit Commands. Reads a file in SourceGen Edit Command
|
||||
format and applies the commands.</li>
|
||||
<li>Apply External Symbols. An experimental feature for turning platform
|
||||
and project symbols into address labels. This will run through the list
|
||||
of all symbols loaded from .sym65 files and find addresses that fall
|
||||
within the bounds of the file. If it finds an address that is the start
|
||||
of a code/data line and doesn't already have a user-supplied label,
|
||||
and the platform symbol's label isn't already defined elsewhere, the
|
||||
platform label will be applied. Useful when disassembling ROM images
|
||||
or other code with an established set of public entry points.
|
||||
(Tip: disable "analyze uncategorized data" from the project
|
||||
properties editor first, as this will not set labels in the middle
|
||||
of multi-byte data items.)</li>
|
||||
</ul>
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,340 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Instruction and Data Analysis - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Instruction and Data Analysis</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<p><i>This section discusses the internal workings of SourceGen. It is
|
||||
not necessary to understand this to use the program.</i></p>
|
||||
|
||||
<h2><a name="analysis-process">Analysis Process</a></h2>
|
||||
|
||||
<p>Analysis of the file data is a complex multi-step process. Some
|
||||
changes to the project, such as adding a code start point or
|
||||
changing the CPU selection, require a full re-analysis of instructions
|
||||
and data. Other changes, such as adding or removing a label, don't
|
||||
affect the code tracing and only require a re-analysis of the data areas.
|
||||
And some changes, such as editing a comment, only require a refresh
|
||||
of the displayed lines.</p>
|
||||
<p>It should be noted that none of the analysis results are stored in
|
||||
the project file. Only user-supplied data, such as the locations of
|
||||
code entry points and label definitions, is written to the file. This
|
||||
does create the possibility that two different users might get different
|
||||
results when opening the same project file with different versions of
|
||||
SourceGen, but these effects are expected to be minor.</p>
|
||||
|
||||
<p>The analyzer performs the following steps (see the <code>Analyze</code>
|
||||
method in <code>DisasmProject.cs</code>):</p>
|
||||
<ul>
|
||||
<li>Reset the symbol table.</li>
|
||||
<li>Merge platform symbols into the symbol table, loading the files
|
||||
in order.</li>
|
||||
<li>Merge project symbols into the symbol table, stomping on any
|
||||
platform symbols that conflict.</li>
|
||||
<li>Merge user label symbols into the table, stomping any previous
|
||||
entries.</li>
|
||||
<li>Run the code analyzer. The outcome of this is an array of analysis
|
||||
attributes, or "anattribs", with one entry per byte in the file.
|
||||
The Anattrib array tracks most of the state from here on. If we're
|
||||
doing a partial re-analysis, this step will just clone a copy of the
|
||||
Anattrib array that was made at this point in a previous run. (The
|
||||
code analysis pass is described in more detail below.)</li>
|
||||
<li>Apply user-specified labels to Anattribs.</li>
|
||||
<li>Apply user-specified format descriptors. These are the instruction
|
||||
and data operand formats.</li>
|
||||
<li>Run the data analyzer. This looks for patterns in uncategorized
|
||||
data, and connects instruction and data operands to target offsets.
|
||||
The "nearby label" stuff is handled here. Auto-labels are generated
|
||||
for references to internal addresses. All of the results are
|
||||
stored in the Anattribs array. (The data analysis pass is described in
|
||||
more detail below.)</li>
|
||||
<li>Remove hidden labels from the symbol table. These are user-specified
|
||||
labels that have been placed on offsets that are in the middle of an
|
||||
instruction or multi-byte data item. They can't be referenced, so we
|
||||
want to pull them out of the symbol table. (Remember, symbolic
|
||||
operands use "weak references", so a missing symbol just means the
|
||||
operand is shown as a hex value.)</li>
|
||||
<li>Resolve references to local variables. This sets the operand symbol
|
||||
in Anattrib so we won't try to apply platform/project symbols to
|
||||
zero-page addresses. If we somehow ended up with a variable that has
|
||||
  the same name as a user label, we rename the variable.</li>
|
||||
<li>Resolve references to platform and project external symbols.
|
||||
This sets the operand symbol in Anattrib, and adds the symbol to
|
||||
the list that is displayed in .EQ directives.</li>
|
||||
<li>Generate cross-reference lists. This is done for internal references,
|
||||
for local variables, and for any platform/project symbols that are
|
||||
referenced.</li>
|
||||
<li>If annotated auto-labels are enabled, the simple labels are
|
||||
replaced with the annotated versions here. (This can't be done earlier
|
||||
because the annotations are generated from the cross-reference data.)</li>
|
||||
<li>In a debug build, some validity checks are performed.</li>
|
||||
</ul>
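<p>The first few steps, where later sources "stomp" earlier ones, can be
pictured as successive dictionary updates. The Python sketch below is a
simplification: it assumes the table is keyed by label, and the function
name and data shapes are invented for illustration rather than taken from
<code>DisasmProject.cs</code>.</p>

```python
def build_symbol_table(platform_files, project_symbols, user_labels):
    """Merge symbol sources so that later sources replace earlier ones."""
    table = {}                      # reset the symbol table
    for symbols in platform_files:  # platform files, in load order
        table.update(symbols)
    table.update(project_symbols)   # project symbols stomp platform symbols
    table.update(user_labels)       # user labels stomp everything before
    return table
```

<p>Because each <code>update</code> overwrites matching keys, a label defined
in both a platform file and the project ends up with the project's
definition, and a user label wins over both.</p>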
|
||||
|
||||
<p>Once analysis is complete, a line-by-line display list is generated
|
||||
by walking through the annotated file data. Most of the actual text
|
||||
isn't rendered until it's needed. For complicated multi-line items
|
||||
like string operands, the formatted text must be generated to know how
|
||||
many lines it will occupy, so it's done immediately and cached for re-use
|
||||
on subsequent runs.</p>
|
||||
|
||||
|
||||
<h3><a name="auto-format">Automatic Formatting</a></h3>
|
||||
|
||||
<p>Every offset in the file is marked as an instruction byte, data byte, or
|
||||
inline data byte. Some offsets are also marked as the start of an instruction
|
||||
or data area. The start offsets may have a format descriptor associated
|
||||
with them.</p>
|
||||
<p>Format descriptors have a format (like "numeric" or
"null-terminated string"), a sub-format (like "hexadecimal" or
|
||||
"high ASCII"), and a length. For
|
||||
an instruction operand the length is redundant, but for a data operand it
|
||||
determines the width of the numeric value or length of the string. For
|
||||
this reason, instructions do not need a format descriptor, but all
|
||||
data items do.</p>
|
||||
<p>Symbolic references are format descriptors with a symbol attached.
|
||||
The symbol reference also specifies low/high/bank, for partial symbol
|
||||
references like <code>LDA #>symbol</code>.</p>
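<p>One way to picture these structures is the Python sketch below. The class
and field names are hypothetical and chosen only to mirror the description
above; SourceGen's real types are C# classes and will differ in detail.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolRef:
    """A reference to a symbol, possibly to just one byte of its value."""
    label: str
    part: str  # "low", "high", or "bank"


@dataclass(frozen=True)
class FormatDescriptor:
    """Format, sub-format, and length, with an optional symbol attached."""
    length: int
    fmt: str
    sub_format: str
    symbol: Optional[SymbolRef] = None


def select_part(value, part):
    """Pick the byte of a symbol's value that a partial reference names."""
    shifts = {"low": 0, "high": 8, "bank": 16}
    return (value >> shifts[part]) & 0xFF
```

<p>For instance, the operand of <code>LDA #>symbol</code> would carry a
<code>SymbolRef</code> with <code>part="high"</code>, selecting bits 8-15 of
the symbol's value.</p>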
|
||||
<p>Every offset marked as a start point gets its own line in the on-screen
|
||||
display list. Embedded instructions are identified internally by
|
||||
looking for instruction-start offsets inside instructions.</p>
|
||||
|
||||
<p>The Anattrib array holds the post-analysis state for every offset,
|
||||
including comments and formatting, but any changes you make in the
|
||||
editors are applied to the data structures that are saved in the project
|
||||
file. After a change is made, a full or partial re-analysis is done to
|
||||
fill out the Anattribs.</p>
|
||||
<p>Consider a simple example:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
|
||||
<p>We haven't explicitly formatted anything yet. The data analyzer sees
|
||||
that the JMP operand is inside the file, and has no label, so it creates an
|
||||
auto-label at offset +000003 and a format descriptor with a symbolic
|
||||
operand reference to "L1003" at +000000.</p>
|
||||
<p>Suppose we now edit the label, changing L1003 to "FOO". This goes into
|
||||
the project's "user label" list. The analyzer is
|
||||
run, and applies the new "user label" to the Anattrib array. The
|
||||
data analyzer finds the numeric reference in the JMP operand, and finds
|
||||
a label at the target address, so it creates a symbolic operand reference
|
||||
to "FOO". When the display list is generated, the symbol "FOO" appears
|
||||
in both places.</p>
|
||||
<p>Even though the JMP operand changed from "L1003" to "FOO", the only
|
||||
change actually written to the project file is the label edit. The
|
||||
contents of the Anattrib array are disposable, so it can be used to
|
||||
hold auto-generated labels and "fix up" numeric references. Labels and
|
||||
format descriptors generated by SourceGen are never added to the
|
||||
project file.</p>
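<p>The relationship between the saved user data and the disposable
analysis state can be sketched as follows. This is a simplified
illustration, not SourceGen's actual code: the <code>analyze</code>
function, its arguments, and the dictionary layout are hypothetical
stand-ins for the Anattrib array and the project's label list.</p>

```python
BASE_ADDR = 0x1000  # from the .ADDRS directive in the example above


def analyze(references, user_labels):
    """Rebuild the disposable per-offset state from the saved project data.

    references:  {instruction offset: target offset} found by the analyzer
    user_labels: {offset: name}; in this sketch, the only saved state
    Returns (labels, operand_symbols), regenerated on every pass.
    """
    labels = dict(user_labels)
    operand_symbols = {}
    for instr_offset, target_offset in sorted(references.items()):
        if target_offset not in labels:
            # No user label at the target, so invent an auto-label.
            labels[target_offset] = "L%04X" % (BASE_ADDR + target_offset)
        operand_symbols[instr_offset] = labels[target_offset]
    return labels, operand_symbols


refs = {0: 3}                         # the JMP at +000000 targets +000003
before_edit = analyze(refs, {})       # auto-label appears in both places
after_edit = analyze(refs, {3: "FOO"})  # user label appears in both places
print(before_edit, after_edit)
```

<p>Note that the user label dictionary is the only input that changes
between the two calls; everything else is recomputed.</p>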
|
||||
|
||||
<p>If the JMP operand were edited, a format descriptor would be added
|
||||
to the user-specified descriptor list. During the analysis pass it would
|
||||
be added to the Anattrib array at offset +000000.</p>
|
||||
|
||||
|
||||
<h3><a name="undo-redo">Interaction With Undo/Redo</a></h3>
|
||||
|
||||
<p>The analysis pass always considers the current state of the user
|
||||
data structures. Whether you're adding a label or removing one, the
|
||||
code runs through the same set of steps. The advantage of this approach
|
||||
is that the act of doing a thing, undoing a thing, and redoing a thing
|
||||
are all handled the same way.</p>
|
||||
<p>None of the editors modify the project data structures directly. All
|
||||
changes are added to a change set, which is processed by a single
|
||||
"apply changes" function. The change sets are kept in the undo/redo
|
||||
buffer indefinitely. After
|
||||
the changes are made, the Anattrib array and other data structures are
|
||||
regenerated.</p>
|
||||
|
||||
<p>Data format editing can create some tricky situations. For example,
|
||||
suppose you have 8 bytes that have been formatted as two 32-bit words:</p>
|
||||
|
||||
<pre>
|
||||
1000: 68690074 .dd4 $74006968
|
||||
1004: 65737400 .dd4 $00747365
|
||||
</pre>
|
||||
|
||||
<p>You realize these are null-terminated strings, select both words, and
|
||||
reformat them:</p>
|
||||
|
||||
<pre>
|
||||
1000: 686900 .zstr "hi"
|
||||
1003: 74657374+ .zstr "test"
|
||||
</pre>
|
||||
|
||||
<p>Seems simple enough. Under the hood, SourceGen created three changes:</p>
|
||||
<ol>
|
||||
<li>At offset +000000, replace the current format descriptor (4-byte
|
||||
numeric) with a 3-byte null-terminated string descriptor.</li>
|
||||
<li>At offset +000003, add a new 5-byte null-terminated string
|
||||
descriptor.</li>
|
||||
<li>At offset +000004, remove the 4-byte numeric descriptor.</li>
|
||||
</ol>
|
||||
|
||||
<p>Each entry in the change set has "before" and "after" states for the
|
||||
format descriptor at a specific offset. Only the state for the affected
|
||||
offsets is included -- the program doesn't record the state of the full
|
||||
project after each change (even with the RAM on a modern system that would
|
||||
add up quickly). When undoing a change, before and after are simply
|
||||
reversed.</p>
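<p>A minimal sketch of the before/after idea, using the two-string
example above. The tuple layout and the <code>apply_changes</code>
name are assumptions made for illustration; the real change set
objects hold more than format descriptors.</p>

```python
def apply_changes(descriptors, change_set, undo=False):
    """Apply (or undo) a change set against {offset: descriptor}.

    Each change is (offset, before, after); None means "no descriptor".
    Undo walks the set backward and swaps the roles of before and after.
    """
    ordered = reversed(change_set) if undo else change_set
    for offset, before, after in ordered:
        new_value = before if undo else after
        if new_value is None:
            descriptors.pop(offset, None)
        else:
            descriptors[offset] = new_value


descriptors = {0: ("numeric", 4), 4: ("numeric", 4)}
change_set = [
    (0, ("numeric", 4), ("zstr", 3)),   # replace at +000000
    (3, None, ("zstr", 5)),             # add at +000003
    (4, ("numeric", 4), None),          # remove at +000004
]

apply_changes(descriptors, change_set)
edited = dict(descriptors)
apply_changes(descriptors, change_set, undo=True)
restored = dict(descriptors)
print(edited, restored)
```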
|
||||
|
||||
|
||||
<h2><a name="code-analysis">Code Analysis</a></h2>
|
||||
|
||||
<p>The code tracer walks through the instructions, examining them to
|
||||
determine where execution will proceed next. There are five possibilities
|
||||
for every instruction:</p>
|
||||
<ol>
|
||||
<li>Continue. Execution always continues at the next instruction.
|
||||
Examples: <code>LDA</code>, <code>STA</code>, <code>AND</code>,
|
||||
<code>NOP</code>.</li>
|
||||
<li>Don't continue. The next instruction to be executed can't be
|
||||
determined from the file data (unless you're disassembling the
|
||||
system ROM around the BRK vector).
|
||||
Examples: <code>RTS</code>, <code>BRK</code>.</li>
|
||||
<li>Branch always. The operand specifies the next instruction address.
|
||||
Examples: <code>JMP</code>, <code>BRA</code>, <code>BRL</code>.</li>
|
||||
<li>Branch sometimes. Execution may continue at the operand address,
|
||||
or may execute the following instruction. If we know the value of
|
||||
the flags in the processor status register, we can eliminate one
|
||||
possibility. Examples: <code>BCC</code>, <code>BEQ</code>,
|
||||
<code>BVS</code>.</li>
|
||||
<li>Call subroutine. Execution will continue at the operand address,
|
||||
and is expected to also continue at the following instruction.
|
||||
Examples: <code>JSR</code>, <code>JSL</code>.</li>
|
||||
</ol>
|
||||
|
||||
<p>Branch targets are added to a list. When the current run of instructions
|
||||
is exhausted (i.e. a "don't continue" or "branch always" instruction is
|
||||
reached), the next target is pulled off of the list.</p>
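<p>The target-list approach can be sketched roughly as shown below.
This is a deliberately reduced model: the <code>program</code> table,
the <code>FLOW</code> mapping, and the <code>trace</code> function are
hypothetical, only a handful of opcodes are listed, and status flag
tracking is left out entirely (an address is simply visited once).</p>

```python
FLOW = {
    "LDA": "continue", "NOP": "continue",
    "RTS": "stop", "BRK": "stop",
    "JMP": "branch_always",
    "BCC": "branch_sometimes", "BEQ": "branch_sometimes",
    "JSR": "call",
}


def trace(program, entry):
    """Return the set of instruction addresses reachable from entry.

    program maps address -> (mnemonic, length, operand target or None).
    """
    pending = [entry]
    visited = set()
    while pending:
        addr = pending.pop()
        # Follow one run of instructions until it is exhausted.
        while addr in program and addr not in visited:
            visited.add(addr)
            mnemonic, length, target = program[addr]
            flow = FLOW[mnemonic]
            if flow in ("branch_always", "branch_sometimes", "call"):
                pending.append(target)
            if flow in ("stop", "branch_always"):
                break
            addr += length
    return visited


program = {
    0x1000: ("LDA", 2, None),
    0x1002: ("BEQ", 2, 0x1007),
    0x1004: ("JMP", 3, 0x1008),
    0x1007: ("RTS", 1, None),
    0x1008: ("NOP", 1, None),
    0x1009: ("RTS", 1, None),
    0x100A: ("NOP", 1, None),   # never reached, so it stays data
}
reached = trace(program, 0x1000)
print(sorted(hex(a) for a in reached))
```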
|
||||
|
||||
<p>The state of the processor status flags is recorded for every
|
||||
instruction. When execution proceeds to the next instruction or branches
|
||||
to a new address, the flags are merged with the flags at the new
|
||||
location. If one execution path through a given address has the flags
|
||||
in one state (say, the carry is clear), while another execution path
|
||||
sees a different state (carry is set), the merged flag is
|
||||
"indeterminate". Indeterminate values cannot become determinate through
|
||||
a merge, but can be set by an instruction.</p>
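<p>The merge rule amounts to a three-state comparison. In the sketch
below, representing a flag as 0, 1, or <code>None</code> (for
"indeterminate") is an assumption made for illustration, as are the
function names.</p>

```python
def merge_flag(a, b):
    """Merge one flag: equal definite values survive, anything else doesn't."""
    return a if a == b else None


def merge_flags(existing, incoming):
    """Merge two flag sets that share the same flag names."""
    return {name: merge_flag(existing[name], incoming[name])
            for name in existing}


path_one = {"C": 0, "Z": 1, "N": None}
path_two = {"C": 1, "Z": 1, "N": 0}
merged = merge_flags(path_one, path_two)
print(merged)
```

<p>Carry differs between the two paths and N was already
indeterminate on one of them, so only Z keeps a definite value.</p>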
|
||||
|
||||
<p>There can be multiple paths to a single address. If the analyzer
|
||||
sees that an instruction has been visited before, with an identical set
|
||||
of status flags, the analyzer stops pursuing that path.</p>
|
||||
|
||||
<p>The analyzer must always know the width of immediate load instructions
|
||||
when examining 65816 code, but it's possible for the status flag values
|
||||
to be indeterminate. In such a situation, short registers are assumed.
|
||||
Similarly, if the carry flag is unknown when an <code>XCE</code> is
|
||||
performed, we assume a transition to emulation mode (E=1).</p>
|
||||
|
||||
<p>There are three ways in which code can set a flag to a definite value:</p>
|
||||
<ol>
|
||||
<li>With explicit instructions, like <code>SEC</code> or
|
||||
<code>CLD</code>.</li>
|
||||
<li>With immediate-operand instructions. <code>LDA #$00</code> sets Z=1
|
||||
and N=0. <code>ORA #$80</code> sets Z=0 and N=1.</li>
|
||||
<li>By inference. For example, if we see a <code>BCC</code> instruction,
|
||||
we know that the carry will be clear at the branch target address, and
|
||||
set at the following instruction. The instruction doesn't affect the
|
||||
value of the flag, but we know what the value will be at both
|
||||
addresses.</li>
|
||||
</ol>
|
||||
<p>Self-modifying code can spoil any of these, possibly requiring a
|
||||
status flag override to get correct disassembly.</p>
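<p>The immediate-operand and inference cases can be sketched like
this, for 8-bit operands only. The helper names and the use of
<code>None</code> for "indeterminate" are assumptions for the sake of
the example, not part of SourceGen.</p>

```python
def lda_immediate(flags, value):
    """LDA #imm fully determines Z and N from the operand."""
    new = dict(flags)
    new["Z"] = 1 if value == 0 else 0
    new["N"] = 1 if value >= 0x80 else 0
    return new


def ora_immediate(flags, value):
    """ORA #imm can only prove that bits are set, never that they're clear."""
    new = dict(flags)
    new["Z"] = 0 if value != 0 else None
    new["N"] = 1 if value >= 0x80 else None
    return new


def bcc(flags):
    """BCC changes nothing, but tells us C on each outgoing path."""
    taken = dict(flags, C=0)
    not_taken = dict(flags, C=1)
    return taken, not_taken


unknown = {"C": None, "Z": None, "N": None}
print(lda_immediate(unknown, 0x00))
print(ora_immediate(unknown, 0x80))
print(bcc(unknown))
```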
|
||||
|
||||
<p>The instruction that is most likely to cause problems is <code>PLP</code>,
|
||||
which pulls the processor status flags off of the stack. SourceGen
|
||||
doesn't try to track stack contents, so it can't know what values may
|
||||
be pulled. In many cases the <code>PLP</code> appears not long after a
|
||||
<code>PHP</code>, so SourceGen can scan backward through the file to
|
||||
find the nearest <code>PHP</code>, and use the status flags from that.
|
||||
In practice this doesn't work well, but the "smart" behavior can be
|
||||
enabled from the project properties if desired. Otherwise, a
|
||||
<code>PLP</code> causes all flags to be set to "indeterminate", except
|
||||
for the M/X flags on the 65816 which are left unmodified.</p>
|
||||
|
||||
<p>Some other things that the code analyzer can't recognize automatically:</p>
|
||||
<ul>
|
||||
<li>Jumping indirectly through an address outside the file, e.g.
|
||||
storing an address in zero-page memory and jumping through it.</li>
|
||||
<li>Jumping to an address by pushing the location onto the stack,
|
||||
then executing an <code>RTS</code>.</li>
|
||||
<li>Self-modifying code, e.g. overwriting a <code>JMP</code> instruction.</li>
|
||||
<li>Addresses invoked by external code, e.g. interrupt handlers.</li>
|
||||
</ul>
|
||||
<p>Sometimes the indirect jump targets are coming from a table of
|
||||
addresses in the file. If so, these can be formatted as addresses,
|
||||
and then the target locations tagged as code entry points.</p>
|
||||
<p>The 65816 adds an additional twist: some instructions combine their
|
||||
operands with the Data Bank Register ("B") to form a 24-bit address.
|
||||
SourceGen can't automatically determine what the register holds, so it
|
||||
assumes that it's equal to the program bank register ("K"), and provides
|
||||
a way to override the value.</p>
|
||||
|
||||
|
||||
<h3><a name="extension-scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts can mark data that follows a JSR, JSL, or BRK as inline
|
||||
data, or change the format of nearby data or instructions. The first
|
||||
time a JSR/JSL/BRK instruction is encountered, all loaded extension scripts
|
||||
that implement the appropriate interface are offered a chance to act.</p>
|
||||
|
||||
<p>The first script that applies a format wins. Attempts to re-format
|
||||
instructions or data that have already been formatted will fail. This rule
|
||||
ensures that anything explicitly formatted by the user will not be
|
||||
overridden by a script.</p>
|
||||
|
||||
<p>If code jumps into a region that is marked as inline data, the
|
||||
branch will be ignored. If an extension script tries to flag bytes
|
||||
as inline data that have already been executed, the script will be
|
||||
ignored. This can lead to a race condition in the analyzer if
|
||||
an extension script is doing the wrong thing. (The race doesn't exist
|
||||
with inline data tags specified by the user, because those are applied
|
||||
before code analysis starts.)</p>
|
||||
|
||||
|
||||
<h2><a name="data-analysis">Data Analysis</a></h2>
|
||||
<p>The data analyzer performs two tasks. It matches operands with
|
||||
offsets, and it analyzes uncategorized data. This behavior can be
|
||||
modified in the
|
||||
<a href="settings.html#project-properties">project properties</a>.</p>
|
||||
|
||||
<p>The data target analyzer examines every instruction and data operand
|
||||
to see if it's referring to an offset within the data file. If the
|
||||
target is within the file, and has a label, a format descriptor with a
|
||||
weak symbolic reference to that label is added to the Anattrib array. If
|
||||
the target doesn't have a label, the analyzer will either use a nearby
|
||||
label, or generate a unique label and use that.</p>
|
||||
<p>While most of the "nearby label" logic can be disabled, targets that
|
||||
land in the middle of an instruction are always adjusted backward to
|
||||
the instruction start. This is necessary because labels are only visible
|
||||
if they're associated with the first (opcode) byte of an instruction.</p>
|
||||
|
||||
<p>The uncategorized data analyzer tries to find character strings and
|
||||
opportunities to use the ".FILL" operation. It breaks the file into
|
||||
pieces, where contiguous regions hold nothing but data, are not split
|
||||
across address region start/end directives, are not interrupted by labels,
|
||||
and do not contain anything that the user has chosen to format. Each
|
||||
region is scanned for matching patterns. If a match is found, a format entry
|
||||
is added to the Anattrib array. Otherwise, data is added as single-byte
|
||||
values.</p>
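<p>As an illustration of the pattern scan, here is one way to find
candidate runs for ".FILL" within a single region. The minimum run
length is a made-up threshold and <code>find_fill_runs</code> is a
hypothetical name; the real analyzer's rules are more involved and
also look for strings.</p>

```python
from itertools import groupby

MIN_FILL_RUN = 5  # hypothetical threshold, not SourceGen's actual value


def find_fill_runs(data, min_run=MIN_FILL_RUN):
    """Return (offset, length, value) for each run of identical bytes."""
    runs = []
    offset = 0
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((offset, length, value))
        offset += length
    return runs


region = bytes([0x01, 0x02, 0, 0, 0, 0, 0, 0, 0x03])
fill_runs = find_fill_runs(region)
print(fill_runs)
```

<p>Bytes that aren't covered by a run would fall through to the
single-byte default.</p>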
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,415 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Code Generation &amp; Assembly - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Code Generation &amp; Assembly</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<p>SourceGen can generate an assembly source file that, when fed into
|
||||
the target assembler, will recreate the original data file exactly.
|
||||
Every assembler is different, so support must be added to SourceGen
|
||||
for each.</p>
|
||||
<p>The generation / assembly dialog can be opened with File > Assemble.</p>
|
||||
<p>If you want to show code to others, perhaps by adding a page to
|
||||
your web site, you can "export" the formatted code as text or HTML.
|
||||
This is explained in more detail <a href="#export-source">below</a>.</p>
|
||||
|
||||
|
||||
<h2><a name="generate">Generating Source Code</a></h2>
|
||||
|
||||
<p>Cross assemblers tend to generate additional files, either compiler
|
||||
intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
|
||||
generators may produce multiple source files, perhaps a link script or
|
||||
symbol definition header to go with the assembly source. To avoid
|
||||
spreading files across the filesystem, SourceGen does all of its work
|
||||
in the same directory where the project lives. Before you can generate
|
||||
code, you have to have assigned your project a directory. This is why
|
||||
you can't assemble a project until you've saved it for the first time.</p>
|
||||
|
||||
<p>The Generate and Assemble dialog has a drop-down list near the top
|
||||
that lets you pick which assembler to target. The name of the assembler
|
||||
will be shown with the detected version number. If the assembler
|
||||
executable isn't configured, "[latest version]" will be shown instead
|
||||
of a version number.</p>
|
||||
<p>The Settings button will take you directly to the assembler configuration
|
||||
tab in the application settings dialog.</p>
|
||||
<p>Hit the Generate button to generate the source code into a file on disk.
|
||||
The file will use the project name, with the <code>.dis65</code> extension
|
||||
replaced by <code>_&lt;assembler&gt;.S</code>.</p>
|
||||
<p>The first 64KiB of each generated file will be shown in the preview
|
||||
window. If multiple files were generated, you can use the "preview file"
|
||||
drop-down to select between them. Line numbers are
|
||||
prepended to each line to make it easier to track down errors.</p>
|
||||
|
||||
|
||||
|
||||
<h3><a name="localizer">Label Localizer</a></h3>
|
||||
<p>The label localizer is an optional feature that automatically converts
|
||||
some labels to an assembler-specific less-than-global label format. Local
|
||||
labels may be reusable (e.g. using "]LOOP" for multiple consecutive
|
||||
loops is easier to understand than giving each one a unique label) or
|
||||
reduce the size of a generated link table. There are usually restrictions
|
||||
on local labels, e.g. references to them may not be allowed to cross a
|
||||
global label definition, which the localizer factors in automatically.</p>
|
||||
|
||||
|
||||
<h3><a name="reserved-labels">Reserved Label Names</a></h3>
|
||||
<p>Some label names aren't allowed. For example, 64tass reserves the
|
||||
use of labels that begin with two underscores. Most assemblers will
|
||||
also prevent you from using opcode mnemonics as labels (which means
|
||||
you can't assemble <code>jmp jmp jmp</code>).</p>
|
||||
<p>If a label doesn't appear to be legal, the generated code will use
|
||||
a suitable replacement (e.g. <code>jmp_1 jmp jmp_1</code>).</p>
|
||||
|
||||
|
||||
<h3><a name="platform-features">Platform-Specific Features</a></h3>
|
||||
<p>SourceGen needs to be able to assemble binaries for any system
|
||||
with any assembler, so it generally avoids platform-specific features.
|
||||
One exception to that is C64 PRG files.</p>
|
||||
<p>PRG files start with a 16-bit value that tells the OS where the
|
||||
rest of the file should be loaded. The value is not usually part of
|
||||
the source code, but instead is generated by the assembler, based on
|
||||
the address of the first byte output. If SourceGen detects that
|
||||
a file is PRG, the source generators for some assemblers will suppress
|
||||
the first 2 bytes, and instead pass appropriate meta-data (such as
|
||||
an additional command-line option) to the assembler.</p>
|
||||
<p>A file is treated as a PRG if:</p>
|
||||
<ul>
|
||||
<li>it is between 3 and 65536 bytes long (inclusive)</li>
|
||||
<li>the format at offset +000000 is a 16-bit numeric data item
|
||||
(not executable code, not two 8-bit values, not the first part
|
||||
of a 24-bit value, etc.)</li>
|
||||
<li>there is an address region start directive at +000002</li>
|
||||
<li>the 16-bit value at +000000 is equal to the address of the
|
||||
byte at +000002</li>
|
||||
<li>there is no label at offset +000000 (explicit or auto-generated)</li>
|
||||
</ul>
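<p>The conditions above can be expressed as a single predicate. The
sketch below is illustrative only: <code>is_prg</code> and its
argument shapes are hypothetical, with the format at offset +000000
reduced to a (kind, length) pair.</p>

```python
def is_prg(file_data, format_at_0, region_starts, labeled_offsets):
    """Check the PRG conditions listed above.

    file_data:       the raw file contents (bytes)
    format_at_0:     (kind, length) of the item at +000000
    region_starts:   {offset: address} for address region start directives
    labeled_offsets: offsets that carry a label, explicit or auto-generated
    """
    if len(file_data) not in range(3, 65537):
        return False
    if format_at_0 != ("numeric", 2):
        return False
    if 2 not in region_starts:
        return False
    # The 6502 stores 16-bit values low byte first.
    load_addr = int.from_bytes(file_data[:2], "little")
    if load_addr != region_starts[2]:
        return False
    return 0 not in labeled_offsets


data = bytes([0x01, 0x08, 0xA9, 0x00])   # load address $0801, then code
print(is_prg(data, ("numeric", 2), {2: 0x0801}, set()))
```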
|
||||
<p>The definition is sufficiently narrow to avoid most false-positives.
|
||||
If a file is being treated as PRG and you'd rather it weren't, you
|
||||
can add a label or reformat the bytes. This feature is currently only
|
||||
enabled for 64tass.</p>
|
||||
|
||||
|
||||
<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>
|
||||
|
||||
<p>After generating sources, if you have a cross-assembler executable
|
||||
configured, you can run it by clicking the "Run Assembler" button. The
|
||||
command-line output will be displayed, with stdout and stderr separated.
|
||||
(I'd prefer them to be interleaved, but that's not what the system
|
||||
provides.)</p>
|
||||
|
||||
<p>The output will show the assembler's exit code, which will be zero
|
||||
on success (note: sometimes they lie). If it appeared to succeed,
|
||||
SourceGen will then compare the assembler's output to the original file,
|
||||
and report any differences.</p>
|
||||
<p>Failures here may be due to bugs in the cross-assembler or in
|
||||
SourceGen. However, SourceGen can generally work around assembler bugs,
|
||||
so any failure is an opportunity for improvement.</p>
|
||||
|
||||
|
||||
<h2><a name="supported">Supported Assemblers</a></h2>
|
||||
|
||||
<p>SourceGen currently supports the following cross-assemblers:</p>
|
||||
<ul>
|
||||
<li><a href="#64tass">64tass</a></li>
|
||||
<li><a href="#acme">ACME</a></li>
|
||||
<li><a href="#cc65">cc65</a></li>
|
||||
<li><a href="#merlin32">Merlin 32</a></li>
|
||||
</ul>
|
||||
|
||||
<h3><a name="version">Version-Specific Code Generation</a></h3>
|
||||
|
||||
<p>Code generation must be tailored to the specific version of the
|
||||
assembler. This is most easily understood with an example.</p>
|
||||
<p>If the code has a statement like <code>MVN #$01,#$02</code>, the
|
||||
assembler is expected to output <code>54 02 01</code>, with the arguments
|
||||
reversed. cc65 v2.17 got it backward; the behavior was fixed in v2.18. The
|
||||
bug means we can't generate the same <code>MVN</code>/<code>MVP</code>
|
||||
instructions for both versions of the assembler.</p>
|
||||
<p>Having version-dependent source code is a bad idea. If we generated
|
||||
reversed operands (<code>MVN #$02,#$01</code>), we'd get the correct
|
||||
output with v2.17, but the wrong output for v2.18. Unambiguous code can
|
||||
be generated for all versions of the assembler by just outputting raw hex
|
||||
bytes, but that's ugly and annoying, so we don't want to be stuck doing
|
||||
that forever. We want to detect which version of the assembler is in
|
||||
use, and output actual <code>MVN</code>/<code>MVP</code> instructions
|
||||
when producing code for newer versions of the assembler.</p>
|
||||
<p>When you configure a cross-assembler, SourceGen runs the executable with
|
||||
version query args, and extracts the version information from the output
|
||||
stream. This is used by the generator to ensure that the output will compile.
|
||||
If no assembler is configured, SourceGen will produce code optimized
|
||||
for the latest version of the assembler.</p>
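<p>For the cc65 <code>MVN</code> case described above, the decision
could look something like the sketch below. The function name, the
version tuple, and the exact text emitted are assumptions made for
illustration; <code>.byte</code> stands in for the "raw hex bytes"
fallback.</p>

```python
def gen_mvn(src_bank, dst_bank, version):
    """Emit MVN for cc65, falling back to raw bytes for v2.17 and older.

    version is a (major, minor) tuple, or None if no assembler is
    configured, in which case the latest version is assumed.
    """
    if version is not None and (2, 18) > version:
        # Opcode $54, then destination bank, then source bank.
        return ".byte $54,$%02x,$%02x" % (dst_bank, src_bank)
    return "mvn #$%02x,#$%02x" % (src_bank, dst_bank)


print(gen_mvn(0x01, 0x02, (2, 17)))
print(gen_mvn(0x01, 0x02, (2, 18)))
print(gen_mvn(0x01, 0x02, None))
```

<p>Either form should assemble to <code>54 02 01</code>; only the
readability of the source differs.</p>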
|
||||
|
||||
|
||||
<h3><a name="quirks">Assembler-Specific Bugs &amp; Quirks</a></h3>
|
||||
|
||||
<p>This is a list of bugs and quirky behavior in cross-assemblers that
|
||||
SourceGen works around when generating code.</p>
|
||||
<p>Every assembler seems to have a different way of dealing with expressions.
|
||||
Most of them will let you group expressions with parentheses, but that
|
||||
doesn't always help. For example, <code>PEA label >> 8 + 1</code> is
|
||||
perfectly valid, but writing <code>PEA (label >> 8) + 1</code> will cause
|
||||
most assemblers to assume you're trying to use an alternate (and non-existent)
|
||||
form of <code>PEA</code> with indirect addressing, causing the assembler
|
||||
to halt with an error message. The code generator needs
|
||||
to understand expression syntax and operator precedence to generate correct
|
||||
code, but also needs to know how to handle the corner cases.</p>
|
||||
|
||||
|
||||
<h3><a name="64tass">64tass</a></h3>
|
||||
|
||||
<p>Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625
|
||||
<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>
|
||||
|
||||
<p>Bugs:</p>
|
||||
<ul>
|
||||
<li>[Fixed in v1.55.2176]
|
||||
Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
|
||||
the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
|
||||
<li>[Fixed in v1.55.2176] WDM is not supported.</li>
|
||||
</ul>
|
||||
|
||||
<p>Quirks:</p>
|
||||
<ul>
|
||||
<li>The underscore character ('_') is allowed as a character in labels,
|
||||
but when used as the first character in a label it indicates the
|
||||
label is local. If you create labels with leading underscores that
|
||||
are not local, the labels must be altered to start with some other
|
||||
character, and made unique.</li>
|
||||
<li>Labels starting with two underscores are "reserved". Trying to
|
||||
use them causes an error.</li>
|
||||
<li>By default, 64tass sets the first two bytes of the output file to
|
||||
the load address. The <code>--nostart</code> flag is used to
|
||||
suppress this.</li>
|
||||
<li>By default, 64tass is case-insensitive, but SourceGen treats labels
|
||||
as case-sensitive. The <code>--case-sensitive</code> flag must be passed
|
||||
to the assembler.</li>
|
||||
<li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
|
||||
and operands must be lower-case. Most of the SourceGen options that
|
||||
cause things to appear in upper case must be disabled.</li>
|
||||
<li>For 65816, selecting the bank byte is done with the grave accent
|
||||
character ('`') rather than the caret ('^'). (There's a note in the
|
||||
docs to the effect that they plan to move to carets.)</li>
|
||||
<li>Instructions whose argument is formed by combining with the
|
||||
65816 Program Bank Register (16-bit JMP/JSR) must be specified
|
||||
as 24-bit values for code that lives outside bank 0. This is
|
||||
true for both symbols and raw hex (e.g. <code>JSR $1234</code>
|
||||
is invalid outside bank 0). Attempting to JSR to a label in bank
|
||||
0 from outside bank 0 causes an error, even though it is technically
|
||||
a 16-bit operand.</li>
|
||||
<li>The arguments to COP and BRK require immediate-mode syntax
|
||||
(<code>COP #$03</code> rather than <code>COP $03</code>).</li>
|
||||
<li>For historical reasons, the default behavior of the assembler is to
|
||||
assume that the source file is PETSCII, and the desired encoding for
|
||||
strings is also PETSCII. No character conversion is done, so anybody
|
||||
assembling ASCII files will get ASCII strings (which works out pretty
|
||||
well if you're assembling code for a non-Commodore target). However,
|
||||
the documentation says you're required to pass the "--ascii" flag when
|
||||
the input is ASCII/UTF-8, so to build files that want ASCII operands
|
||||
an explicit character encoding definition must be provided.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="acme">ACME</a></h3>
|
||||
|
||||
<p>Tested versions: v0.96.4, v0.97
|
||||
<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>
|
||||
|
||||
<p>Bugs:</p>
|
||||
<ul>
|
||||
<li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
|
||||
outside bank zero cannot be assembled. SourceGen currently deals with
|
||||
this by outputting the entire file as a hex dump.</li>
|
||||
<li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
|
||||
<li>BRK and WDM are not allowed to have operands.</li>
|
||||
</ul>
|
||||
|
||||
<p>Quirks:</p>
|
||||
<ul>
|
||||
<li>The assembler shares some traits with one-pass assemblers. In
|
||||
particular, if you forward-reference a zero-page label, the reference
|
||||
generates a 16-bit absolute address instead of an 8-bit zero-page
|
||||
address. Unlike other one-pass assemblers, the width is "sticky",
|
||||
and backward references appearing later in the file also use absolute
|
||||
addressing even though the proper width is known at that point. This is
|
||||
worked around by using explicit "force zero page" annotations on
|
||||
all references to zero-page labels.</li>
|
||||
<li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
|
||||
<code>ASR</code> instead.</li>
|
||||
<li>Does not allow the accumulator to be specified explicitly as an
|
||||
operand, e.g. you can't write <code>LSR A</code>.</li>
|
||||
<li>[Fixed in v0.97.]
|
||||
Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
|
||||
before 8-bit operands.</li>
|
||||
<li>Officially, the preferred file extension for ACME source code is ".a",
|
||||
but this is already used on UNIX systems for static libraries (which
|
||||
means shell filename completion tends to ignore them). Since ".S" is
|
||||
pretty universally recognized as assembly source, code generated by
|
||||
SourceGen for ACME also uses ".S".</li>
|
||||
<li>Version 0.97 started interpreting '\' in strings as an escape
|
||||
character, to allow C-style escapes like "\n". This requires escaping
|
||||
all occurrences of '\' in data strings as "\\". Compiling an older
|
||||
source file with a newer version of ACME may fail unless you pass
|
||||
a backward-compatibility command-line argument.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="cc65">cc65</a></h3>
|
||||
|
||||
<p>Tested versions: v2.17, v2.18
|
||||
<a href="https://cc65.github.io/">[web site]</a></p>
|
||||
|
||||
<p>Bugs:</p>
|
||||
<ul>
|
||||
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
||||
<li>BRK can only be given an argument in 65816 mode.</li>
|
||||
<li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
|
||||
<li>[Fixed in v2.18] <code>BRK &lt;arg&gt;</code> is assembled to opcode
|
||||
$05 rather than $00.</li>
|
||||
<li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
|
||||
</ul>
|
||||
|
||||
<p>Quirks:</p>
|
||||
<ul>
|
||||
<li>Operator precedence is unusual. Consider <code>label >> 8 - 16</code>.
|
||||
cc65 puts shift higher than subtraction, whereas languages like C
|
||||
and assemblers like 64tass do it the other way around. So cc65
|
||||
regards the expression as <code>(label >> 8) - 16</code>, while the
|
||||
more common interpretation would be <code>label >> (8 - 16)</code>.
|
||||
(This is actually somewhat convenient, since none of the expressions
|
||||
SourceGen currently generates require parentheses.)</li>
|
||||
<li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic AXS. All
|
||||
other opcodes match up with the "unintended opcodes" document.</li>
|
||||
<li>ca65 is implemented as a single-pass assembler, so label widths
|
||||
can't always be known in time. For example, if you use some zero-page
|
||||
labels, but they're defined via <code>.ORG $0000</code> after the point
|
||||
where the labels are used, the assembler will already have generated them
|
||||
as absolute values. Width disambiguation must be applied to operands
|
||||
that wouldn't be ambiguous to a multi-pass assembler.</li>
|
||||
<li>Assignment of constants and variables (<code>=</code> and
|
||||
<code>.set</code>) ends local label scope, so the label localizer
|
||||
has to take variable assignment into account.</li>
|
||||
<li>The assembler is geared toward generating relocatable code with
|
||||
multiple segments (it is, after all, an assembler for a C compiler).
|
||||
A linker configuration script is expected to be provided for anything
|
||||
complex. SourceGen generates a custom config file for each project.</li>
|
||||
</ul>
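<p>The precedence difference is easy to see with concrete numbers.
The snippet below uses a variant of the expression,
<code>label >> 8 - 4</code>, chosen so that both readings produce a
valid shift count; the label value is arbitrary. Python happens to
follow the C-style ordering, so the bare expression matches the second
interpretation.</p>

```python
label = 0x1234

cc65_style = (label >> 8) - 4   # shift binds tighter than subtraction
c_style = label >> (8 - 4)      # subtraction binds tighter than shift
python_default = label >> 8 - 4  # Python groups this the C-style way

print(cc65_style, c_style, python_default)
```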
|
||||
|
||||
|
||||
<h3><a name="merlin32">Merlin 32</a></h3>
|
||||
|
||||
<p>Tested versions: v1.0
|
||||
<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
|
||||
<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
|
||||
</p>
|
||||
|
||||
<p>Bugs:</p>
|
||||
<ul>
|
||||
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
||||
<li>For some failures, an exit code of zero is returned.</li>
|
||||
<li>Immediate operands with a comma (e.g. <code>LDA #','</code>)
|
||||
or curly braces (e.g. <code>LDA #'{'</code>) cause an error.</li>
|
||||
<li>Some DP indexed store instructions cause errors if the label isn't
|
||||
unambiguously DP (e.g. <code>STX $00,X</code> vs.
|
||||
<code>STX $0000,X</code>). This isn't a problem with project/platform
|
||||
symbols, which are output as two-digit hex values when possible, but
|
||||
causes failures when direct page locations are included in the project
|
||||
and given labels.</li>
|
||||
<li>The check for 64KiB overflow appears to happen before instructions
|
||||
that might be absolute or direct page are resolved and reduced in size.
|
||||
This makes it unlikely that a full 64KiB bank of code can be
|
||||
assembled.</li>
|
||||
</ul>
|
||||
|
||||
<p>Quirks:</p>
|
||||
<ul>
|
||||
<li>Operator precedence is unusual. Expressions are generally processed
|
||||
from left to right. The byte-selection operators have a lower
|
||||
precedence than all of the others, and so are always processed last.</li>
|
||||
<li>The byte selection operators ('&lt;', '&gt;', '^') are actually
|
||||
word-selection operators, yielding 16-bit values when wide registers
|
||||
are enabled on the 65816.</li>
|
||||
<li>Values loaded into registers are implicitly mod 256 or 65536. There
|
||||
is no need to explicitly mask an expression.</li>
|
||||
<li>The assembler tracks register widths when it sees SEP/REP instructions,
|
||||
but doesn't attempt to track the emulation flag. So if you issue a
|
||||
<code>REP #$20</code>
|
||||
while in emulation mode, the assembler will incorrectly assume long
|
||||
registers. Ideally it would be possible to configure that off, but
|
||||
there's no way to do that, so instead we occasionally generate
|
||||
additional width directives.</li>
|
||||
<li>Non-unique local labels should cause an error, but don't.</li>
|
||||
<li>No undocumented opcodes are supported, nor are the Rockwell
|
||||
65C02 instructions.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
<h2><a name="export-source">Exporting Source Code</a></h2>
|
||||
<p>The "export" function takes what you see in the code list in the app
|
||||
and converts it to text or HTML. The options you've set in the app
|
||||
settings, such as capitalization, text delimiters, pseudo-opcode names,
|
||||
operand expression style, and display of cycle counts are all taken into
|
||||
account. The file generated is not expected to work with an actual
|
||||
assembler.</p>
|
||||
<p>The text output is similar to what you'd get by copying lines to the
|
||||
clipboard and pasting them into a text file, except that you have greater
|
||||
control over which columns are included. The HTML version is augmented
|
||||
with links and (optionally) images.</p>
|
||||
|
||||
<p>Use File > Export to open the export dialog. You have several
|
||||
options:</p>
|
||||
<ul>
|
||||
<li><b>Include only selected lines</b>. This allows you to choose between
|
||||
exporting all or part of a file. If no lines are selected, the entire
|
||||
file will be exported.  This setting does <b>not</b> affect link generation
|
||||
for HTML output, so you may have some dead internal links if you don't
|
||||
export the entire file.</li>
|
||||
<li><b>Include notes</b>. Notes are normally excluded from generated
|
||||
sources. Check this to include them.</li>
|
||||
<li><b>Show &lt;Column&gt;</b>.  The leftmost five columns are optional,
|
||||
and will not appear in the output unless the appropriate option is
|
||||
checked.</li>
|
||||
<li><b>Column widths</b>. These determine the minimum widths of the
|
||||
rightmost four columns. These are not hard limits: if the contents
|
||||
of the column are too wide, the next column will start farther over.
|
||||
The widths are not used at all for CSV output.</li>
|
||||
<li><b>Text vs. CSV</b>. For text generation, you can choose between
|
||||
plain text and Comma-Separated Value format. The latter is useful
|
||||
for importing source code into another application, such as a
|
||||
spreadsheet.</li>
|
||||
<li><b>Generate image files</b>. When exporting to HTML, selecting this
|
||||
will cause GIF images to be generated for visualizations.</li>
|
||||
<li><b>Overwrite CSS file</b>. Some aspects of the HTML output's format
|
||||
are defined by a file called "SGStyle.css", which may be shared between
|
||||
multiple HTML files and customized. The file is copied out
|
||||
of the RuntimeData directory without modification. It will be
|
||||
created if it doesn't exist, but will not be overwritten unless this
|
||||
box is checked. The setting is <b>not</b> sticky, and will revert
|
||||
to unchecked. (Think of this as a proactive alternative to "are you
|
||||
sure you wish to overwrite SGStyle.css?")</li>
|
||||
</ul>
|
||||
<p>Once you've picked your options, click either "Generate HTML" or
|
||||
"Generate Text", then select an output file name from the standard file
|
||||
dialog. Any additional files generated, such as graphics for HTML pages,
|
||||
will be written to the same directory.</p>
|
||||
|
||||
<p>All output uses UTF-8 encoding. Filenames of HTML files will have '#'
|
||||
replaced with '_' to make linking easier.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,467 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Editors - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Editors</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
|
||||
<h2><a name="address">Define Address Region</a></h2>
|
||||
|
||||
<p><a href="intro-details.html#address-regions">Address regions</a>
|
||||
may be created, edited, resized, or removed. Which
|
||||
operation is performed depends on the current selection. You can
|
||||
specify the start and end points of a region by selecting the entire
|
||||
region, or by selecting just the first and last lines.</p>
|
||||
<p>In all cases, you can specify the range's initial address
|
||||
as a hexadecimal value. You can prefix it with '$', but that's not
|
||||
required.
|
||||
24-bit addresses may be written with a bank separator, e.g. "12/3456"
|
||||
would resolve to address $123456.
|
||||
If you want to set the region to be non-addressable, enter
|
||||
"<code>NA</code>".</p>
|
||||
|
||||
<p>You can also enter a <a href="intro-details.html#pre-labels">pre-label</a>
|
||||
or specify that the operand should be formatted as a
|
||||
<a href="intro-details.html#relative-addr">relative address</a>.</p>
|
||||
|
||||
<p>To delete a region, click the "Delete Region" button.</p>
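<p>The address field syntax described above can be sketched in a few lines
of Python.  This is only an illustration of the accepted input forms, not
SourceGen's actual parser; the function name
<code>parse_region_address</code> is hypothetical, and treating
"<code>NA</code>" as an exact, case-sensitive match is an assumption.</p>

```python
def parse_region_address(text):
    """Parse an address field value (illustrative sketch only).

    Returns an integer address, or None for a non-addressable region.
    """
    s = text.strip()
    if s == "NA":                 # assumption: exact match marks non-addressable
        return None
    if s.startswith("$"):         # the '$' prefix is optional
        s = s[1:]
    if "/" in s:                  # bank separator, e.g. "12/3456"
        bank, offset = s.split("/", 1)
        return (int(bank, 16) << 16) | int(offset, 16)
    return int(s, 16)             # plain hexadecimal


print(hex(parse_region_address("12/3456")))
print(parse_region_address("NA"))
```

<p>Both "<code>$1000</code>" and "<code>1000</code>" produce the same
value here, since the input is always interpreted as hexadecimal.</p>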
|
||||
|
||||
<h4>Create</h4>
|
||||
|
||||
<p>If your selection starts with a code or data line, the editor
|
||||
will allow you to create a new address region.  If a single line was
|
||||
selected, the default behavior will be to create a region with a
|
||||
floating end point. If multiple lines were selected, the default
|
||||
behavior will be to create a region with a fixed end point.</p>
|
||||
|
||||
<p>The address field will be initialized to the address of the
|
||||
first selected line.</p>
|
||||
|
||||
<p>You can create a child region that shares the same start offset
|
||||
as an existing region by selecting the first code or data line
|
||||
within that region. Note that regions with floating end points cannot
|
||||
have the same start offset as another region.</p>
|
||||
|
||||
<h4>Edit</h4>
|
||||
|
||||
<p>If you select only the address region start line, perhaps by
|
||||
double-clicking the operand there, you will be able to edit the
|
||||
current region's properties.</p>
|
||||
|
||||
<p>If the region has a floating end point, you can choose to convert
|
||||
it to a fixed end. The end doesn't move; it just gets fixed in place.
|
||||
This is a quick way to "lock down" regions once you've established
|
||||
their end points.</p>
|
||||
|
||||
<h4>Resize</h4>
|
||||
|
||||
<p>If you select multiple lines, and the first line is an address
|
||||
region start directive, you will be able to resize that region to
|
||||
the selection. By definition, the updated region will have a fixed
|
||||
end point.</p>
|
||||
|
||||
<h4>Other notes</h4>
|
||||
|
||||
<p>There is no affordance for moving the start offset of a region. You
|
||||
must create a new region and then delete the old one.</p>
|
||||
|
||||
<p>Regions may not "straddle" the start or end points of other regions.</p>
|
||||
|
||||
<p>Double-clicking on the pseudo-opcode of a region start or end
|
||||
declaration will move the selection to the other end, rather than
|
||||
opening the editor.</p>
|
||||
|
||||
<p>To see detailed information about an address region in the "Info"
|
||||
window, select the region start or end directive. You can see the
|
||||
current arrangement of address regions across your entire
|
||||
project with Navigate > View Address Map.</p>
|
||||
|
||||
|
||||
|
||||
<h2><a name="flags">Override Status Flags</a></h2>
|
||||
|
||||
<p>The state of the processor status flags is tracked for every
|
||||
instruction. Each individual flag is recorded as zero, one, or
|
||||
"indeterminate", meaning it could hold either value at the start of
|
||||
that instruction. You can override the value of individual flags.</p>
|
||||
<p>The 65816 emulation bit, which is not part of the processor status
|
||||
register, may also be set in the editor.</p>
|
||||
<p>The M, X, and E flags will not be editable unless your CPU configuration
|
||||
is set to 65816.</p>
|
||||
|
||||
|
||||
<h2><a name="label">Edit Label</a></h2>
|
||||
<p>Sets or clears a label at the selected offset. The label must have the
|
||||
<a href="intro-details.html#about-symbols">proper form</a>, and not have the same
|
||||
name as another symbol, unless it's specified to be non-unique. If you
|
||||
edit an auto-generated label you will be required to change the name.</p>
|
||||
<p>The label may be marked as non-unique local, unique local, global,
|
||||
or global and exported. The default is global. If you start typing
|
||||
a label with the non-unique label prefix character (usually '@',
|
||||
configurable in
|
||||
<a href="settings.html#appset-displayformat">application settings</a>),
|
||||
the selection will automatically switch to non-unique local.</p>
|
||||
<p>Local labels may be "promoted" to global if the assembler requires it.
|
||||
Most assemblers define local scope as starting clean after each global
|
||||
label, but there are exceptions. If a label's name conflicts or is
|
||||
incompatible with the assembler, it will be renamed.</p>
|
||||
<p>Exported labels are added to a table that may
|
||||
be imported by other projects (see
|
||||
<a href="advanced.html#multi-bin">Working With Multiple Binaries</a>).</p>
|
||||
|
||||
|
||||
<h2><a name="instruction-operand">Edit Operand (Instruction)</a></h2>
|
||||
<p>Operands can be formatted explicitly, or you can let the disassembler
|
||||
select the format for you. By default, immediate constants and
|
||||
addresses with no matching symbol are formatted as hex. Symbols
|
||||
defined as address labels, platform/project symbols, and local
|
||||
variables will be identified and applied automatically.</p>
|
||||
|
||||
<h3><a name="explicit-format">Explicit Formats</a></h3>
|
||||
<p>Operands can be displayed in a variety of numeric formats, or as a
|
||||
symbol. The character formats are only available for operands
|
||||
whose value falls into the proper range. The ASCII format handles
|
||||
both plain and high ASCII; the correct encoding is chosen based on
|
||||
the operand's value.</p>
|
||||
<p>Symbols may be used in their entirety, or, when used as constants,
|
||||
can be shifted and masked.
|
||||
The low / high / bank selector determines which byte is used as the
|
||||
low byte. For 16-bit operands, this acts as a shift rather than a byte
|
||||
select. If the symbol is wider than the operand field, e.g. you're
|
||||
referencing a 16-bit address in an 8-bit constant, a mask will be
|
||||
applied automatically.</p>
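<p>As a rough model of the low / high / bank selector, the symbol's value
is shifted right by 0, 8, or 16 bits and then masked to the width of the
operand.  The sketch below assumes exactly that behavior; the function
name <code>select_part</code> and its parameters are hypothetical and do
not correspond to anything in SourceGen itself.</p>

```python
def select_part(value, part, operand_bits=8):
    """Shift a symbol value and mask it to the operand width (sketch)."""
    shift = {"low": 0, "high": 8, "bank": 16}[part]
    mask = (1 << operand_bits) - 1   # 0xff for 8-bit, 0xffff for 16-bit
    return (value >> shift) & mask


print(hex(select_part(0x123456, "high")))       # 8-bit operand: byte select
print(hex(select_part(0x123456, "high", 16)))   # 16-bit operand: shift only
```

<p>With a 16-bit operand the "high" selection keeps two bytes, which is
why it behaves like a shift rather than a byte select.</p>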
|
||||
<p>The editor will try to prevent you from using auto-generated
|
||||
labels and local variables in the symbol field. These types of symbols
|
||||
can be freely renamed by SourceGen, and thus cannot be reliably
|
||||
referenced by name.
|
||||
You can reference a non-unique local by writing it with the non-unique
|
||||
label prefix character (default '@'). Ambiguous non-unique references
|
||||
are not allowed, so if the symbol can't be found the label will
|
||||
be discarded.</p>
|
||||
<p>When you select a non-default format option, a "preview" of the
|
||||
formatted operand will be shown.</p>
|
||||
<p>The <code>MVN</code> and <code>MVP</code> instructions on the 65816
|
||||
are a bit peculiar, because they have two operands rather than one.
|
||||
SourceGen currently only allows you to set one format, which will be
|
||||
applied to both operands. If you specify a symbol, the symbol will
|
||||
be used twice, adjusted if necessary. (This limitation may be addressed
|
||||
in a future release.)</p>
|
||||
<p>The <code>BBR</code> and <code>BBS</code> instructions on the W65C02
|
||||
also have two operands: a direct page address, and a relative branch.
|
||||
In general the direct page address is ignored, so these are treated as
|
||||
branch instructions.</p>
|
||||
|
||||
<p>The bottom part of the window has some shortcuts for working with
|
||||
address references and local variables. These are primarily used to
|
||||
change the way things work when "Default" is selected. The shortcuts
|
||||
don't cause any changes to the recorded format of the instruction
|
||||
being edited. All of the actions can be performed elsewhere, by
|
||||
editing the label at the target address, editing the project symbol
|
||||
set, or editing a local variable table.</p>
|
||||
|
||||
<h3><a name="shortcut-nar">Numeric Address References</a></h3>
|
||||
|
||||
<p>For operands that are 8-bit, 16-bit, or 24-bit addresses, you can
|
||||
define a symbol for the address as a label or
|
||||
<a href="intro-details.html#symbol-types">project symbol</a>.</p>
|
||||
<p>If the operand is an address inside the project, you can set a
|
||||
label at that address. If the address falls in the middle of an
|
||||
instruction or multi-byte data item, its position will be adjusted to
|
||||
the start. Labels may be created, modified, or (by erasing the label)
|
||||
deleted.</p>
|
||||
<p>The label finder does not do the optional search for "nearby" labels
|
||||
that the main analyzer does, so there will be times when an instruction
|
||||
that is shown with a symbol in the code list won't have a symbol
|
||||
in the editor.</p>
|
||||
|
||||
<p>If the operand is an address outside the project, e.g. a ROM
|
||||
address or I/O location, you can define a project symbol. If a
|
||||
match was found in the configured platform definition files, it will be
|
||||
shown; it can't be edited, but it can be overridden by a project symbol.
|
||||
You can create or modify a project symbol by clicking on "Create Project
|
||||
Symbol" or "Edit Project Symbol". You can't delete project symbols
|
||||
from this editor (use Project Properties instead).</p>
|
||||
|
||||
<p>It's possible to have more than one project symbol for the same
|
||||
address. For example, on the Apple II, reading from the memory-mapped
|
||||
I/O address $C000 returns the last key pressed, but writing to it
|
||||
changes the state of the 80-column display hardware, so it's useful to
|
||||
have two different names for it. If more than one project symbol has the
|
||||
same address, the first one found will be used, which may not be
|
||||
what is desired. In such situations, you should create the project
|
||||
symbol and then copy the symbol name into the operand. You can do this
|
||||
in one step by clicking the "Copy to Operand" button.
|
||||
(In most cases you don't want to do this, because if the project
|
||||
symbol is deleted or renamed, you'll have operands that refer to a
|
||||
nonexistent symbol. Unlike labels, project symbol renames do not
|
||||
refactor the rest of the project.)</p>
|
||||
|
||||
<h3><a name="shortcut-local-var">Local Variable References</a></h3>
|
||||
|
||||
<p>For zero-page address operands and (65816-only) stack-relative
|
||||
constant operands, a local variable can be created or modified. This
|
||||
requires that a local variable table has been defined at or before
|
||||
the instruction being edited.</p>
|
||||
<p>If an existing entry is found, you will be able to edit the name
|
||||
and comment fields. If not, a new entry with a generic name and
|
||||
pre-filled value field will be created in the nearest table.</p>
|
||||
|
||||
|
||||
<h2><a name="data-operand">Edit Operand (Data)</a></h2>
|
||||
|
||||
<p>This dialog offers a variety of choices, and can be used to apply a
|
||||
format to multiple lines. You must select all of the bytes you want
|
||||
to format. For example, to format two bytes as a 16-bit word, you must
|
||||
select both bytes in the editor. (If you click on the first item, then
|
||||
Shift+double-click on the operand field of the last item, you can do
|
||||
this very quickly.) The selection does not need to be contiguous: you
|
||||
can use Control+click to select scattered items.</p>
|
||||
<p>If the range is discontiguous, crosses a logical boundary
|
||||
such as a change in address or a user-specified label, or crosses a
|
||||
visual boundary like a long comment, note, or visualization, the selection
|
||||
will be split into smaller regions. A message at the
|
||||
top of the dialog indicates how many bytes have been selected, and how
|
||||
many regions they have been divided into.</p>
|
||||
<p>(End-of-line comments do <i>not</i> split a region, and will
|
||||
disappear if they end up inside a multi-byte data item.)</p>
|
||||
|
||||
<p>The "Simple Data" items behave the same as their equivalents in the
|
||||
Edit Operand dialog. However, because the width is not determined by
|
||||
an instruction opcode, and multiple items can be selected, you will need
|
||||
to specify how wide each item is and what its byte order is. For data
|
||||
you also have the option of setting the format to "Address", which marks
|
||||
the selected bytes as a numeric reference.</p>
|
||||
|
||||
<p>Consider a simple example: suppose you find a table of 16-bit
|
||||
addresses in the code. Click on
|
||||
the first byte, shift-click the last byte, then select the Edit Data menu
|
||||
item. The number of bytes selected should be even. Select
|
||||
"16-bit words, little-endian", then over to the right click on
|
||||
"Address". When you click OK, the selected data will be formatted as a
|
||||
series of 16-bit address values. If the addresses can be resolved inside
|
||||
the data file, each address will be assigned a label.</p>
|
||||
|
||||
<p>The "Bulk Data" items can represent large chunks of data compactly.
|
||||
The "fill" option is only available if all selected bytes have the
|
||||
same value.
|
||||
If a region of bytes is irrelevant, perhaps used only as padding, you
|
||||
can mark it as "junk". If it appears to be adding bytes to reach a
|
||||
power-of-two address boundary, you can designate it as an alignment
|
||||
directive. If you have multiple regions selected, only options that
|
||||
work for all regions will be shown.</p>
|
||||
|
||||
<p>The "String" items are enabled or disabled depending on whether the
|
||||
data you have selected is in the appropriate format. For example,
|
||||
"Null-terminated strings" is only enabled if the data regions are
|
||||
composed entirely of characters followed by $00. Zero-length strings
|
||||
are allowed.
|
||||
DCI (Dextral Character Inverted) strings have the high bit on the last
|
||||
byte flipped; for PETSCII this will usually look like a series of
|
||||
lower-case letters followed by a capital letter, but may look odd if the
|
||||
last character is punctuation (e.g. '!' becomes $A1, which is a
|
||||
rectangle character that SourceGen will only display as hex).</p>
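<p>The DCI transformation is simple enough to show directly.  The sketch
below flips the high bit of the final byte of a string; it uses plain
ASCII byte values for illustration, and the function name
<code>encode_dci</code> is hypothetical.</p>

```python
def encode_dci(text):
    """Return the bytes of text with the high bit of the last byte flipped."""
    if not text:
        raise ValueError("this sketch requires at least one character")
    data = bytearray(text.encode("ascii"))
    data[-1] ^= 0x80              # invert bit 7 of the final byte only
    return bytes(data)


print(encode_dci("HI!").hex())
```

<p>The trailing '!' ($21) becomes $A1, matching the punctuation example
above.</p>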
|
||||
<p>The character encoding can be selected, offering a choice between
|
||||
plain ASCII, low + high ASCII, C64 PETSCII, and C64 screen codes. When
|
||||
you change the encoding, your available options may change. The
|
||||
low + high ASCII setting will accept both, configuring the appropriate
|
||||
encoding based on the data values, but when identifying multiple strings
|
||||
it requires that each individual string be entirely one or the other.</p>
|
||||
<p>Due to fundamental limitations of the character set, C64 screen code
|
||||
strings cannot be null terminated ($00 is '@').</p>
|
||||
|
||||
<p>As noted earlier, to avoid burying elements such as labels in the middle
|
||||
of a data item, contiguous areas may be split into smaller regions. This
|
||||
can sometimes have unexpected effects. For example, this can be formatted
|
||||
as two 16-bit words or one 32-bit word:</p>
|
||||
<pre>
|
||||
.DD1 $01
|
||||
.DD1 $ef
|
||||
.DD1 $01
|
||||
.DD1 $f0
|
||||
</pre>
|
||||
|
||||
<p>With a label in the middle, it can be formatted as two 16-bit words, but
|
||||
not as a 32-bit word:</p>
|
||||
<pre>
|
||||
.DD1 $01
|
||||
.DD1 $ef
|
||||
LABEL .DD1 $01
|
||||
.DD1 $f0
|
||||
CODE LDA LABEL
|
||||
</pre>
|
||||
|
||||
<p>If this is undesirable, you can add a label at a 32-bit boundary, and
|
||||
reference that instead:</p>
|
||||
<pre>
|
||||
LABEL .DD1 $01
|
||||
.DD1 $ef
|
||||
.DD1 $01
|
||||
.DD1 $f0
|
||||
CODE LDA LABEL+2
|
||||
</pre>
|
||||
|
||||
<p>With the label out of the way, the data can be formatted as desired.</p>
|
||||
|
||||
|
||||
<h2><a name="comment">Edit Comment</a></h2>
|
||||
<p>Enter an end-of-line (EOL) comment, or leave the text field blank to
|
||||
delete it. EOL comments may be placed on instruction and data lines, but
|
||||
not on assembler directives.</p>
|
||||
<p>It's wise to restrict comments to the ASCII character set, because
|
||||
not all assemblers can accept UTF-8 input. Code generators for such
|
||||
assemblers will convert non-ASCII characters to '?' or something similar.
|
||||
If this isn't a concern, you can enter any characters you like.</p>
|
||||
<p>There is no fixed limit on the number of characters, but you may
|
||||
want to limit the overall length if you're hoping to create 80-column
|
||||
output. Some retro assemblers may have hard line length limitations,
|
||||
which could result in the comment being truncated in generated sources.</p>
|
||||
<p>A semicolon (';') is placed at the start of the comment. If an assembler
|
||||
has different conventions, a different delimiter character may be used. You
|
||||
don't need to include a delimiter explicitly in the comment field.</p>
|
||||
|
||||
<p>Comments on platform symbols are read from the platform symbol file, and
|
||||
cannot be edited from within SourceGen. Comments on project symbols are
|
||||
stored in the project file, and can be edited with the project symbol
|
||||
editor.</p>
|
||||
|
||||
|
||||
<h2><a name="long-comment">Edit Long Comment</a></h2>
|
||||
<p>Long comments can be arbitrarily long and span multiple lines. They
|
||||
will be word-wrapped at a line width of your choosing. They're always
|
||||
drawn with a fixed-width font, so you can create ASCII-art diagrams.
|
||||
Comment delimiters are added automatically at the start of each line.</p>
|
||||
<p>For a true retro look you can "box" the comment with asterisks. You
|
||||
can create a full-width row of asterisks by putting a '*' on a line by
|
||||
itself. (Assembly source generators are allowed to use a character
|
||||
other than '*' for the output, e.g. they might use a full set of
|
||||
box outline characters, though that's somewhat against the spirit of
|
||||
the thing. Regardless, a solo '*' results in a line.)</p>
|
||||
<p>The bottom window will update automatically as you type, showing what
|
||||
the output is expected to look like. The actual assembler source output
|
||||
will depend on features of the target assembler, such as comment
|
||||
delimiter choices and maximum line length limitations. For example,
|
||||
Merlin allows a leading '*' to indicate a comment, while cc65 does not,
|
||||
so cc65 code uses ";*" instead.  Because the length limitation affects
|
||||
the length of the line, not just the comment text, an asterisk-boxed
|
||||
comment will have one fewer character per line in cc65 output.</p>
|
||||
|
||||
<p>Clear the text field to delete the comment.</p>
|
||||
<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
|
||||
|
||||
<p>The long comment at the very top of the project is special, as it's
|
||||
not associated with a file offset. If you delete it, you can get it
|
||||
back by using Edit > Edit Header Comment.</p>
|
||||
|
||||
<h2><a name="data-bank">Edit Data Bank (65816 only)</a></h2>
|
||||
|
||||
<p>Sets the Data Bank Register (DBR) value for 65816 code. This is used
|
||||
when matching 16-bit address operands with labels. The new value is
|
||||
in effect from the line where it's declared to the end of the file, even
|
||||
across bank boundaries.
|
||||
If you leave the text field blank, the directive will be removed.</p>
|
||||
<p>A hexadecimal value from $00 to $ff can be entered directly. As
|
||||
with other address inputs, a leading '$' is not required. Entering
|
||||
"K" will set the DBR to the current address, and will automatically
|
||||
update if you change the address to a different bank.</p>
|
||||
<p>The pop-up menu has a list of all banks that hold code or data.
|
||||
To make them easier to identify, each is shown with the label on the
|
||||
first address in the bank, if any.</p>
|
||||
<p>While you can override automatically-generated data bank change
|
||||
directives, you can't remove them individually. You can disable
|
||||
automatic generation by un-checking "smart PLB handling" in the project
|
||||
properties.</p>
|
||||
<p>Because the directive is frequently associated with <code>PLB</code>
|
||||
instructions, double-clicking on a <code>PLB</code> opcode in the
|
||||
code list will open the editor.</p>
|
||||
|
||||
|
||||
<h2><a name="note">Edit Note</a></h2>
|
||||
<p>Notes are similar to long comments, in that they can be arbitrarily
|
||||
long and span multiple lines. However, because they're never included
|
||||
in generated output, options like line width formatting and boxing
|
||||
aren't relevant.</p>
|
||||
<p>Instead, you can select a highlight color for the note to make it
|
||||
stand out. You may want to assign certain colors to specific things,
|
||||
e.g. blue for "I don't know what this is" or green for "this is a
|
||||
bookmark for the really interesting stuff". The color will be applied
|
||||
to the note in the code list and in the "Notes" window.</p>
|
||||
<p>If you don't like the standard colors you can define your own.
|
||||
You can do this with web RGB syntax, which uses a '#' followed by
|
||||
two hex digits per channel. For example, bright red is
|
||||
<code>#ff0000</code>, while teal is <code>#008080</code>. You can
|
||||
also simply type a color name like "violet" so long as it appears in the
|
||||
<a href="https://docs.microsoft.com/en-us/dotnet/media/art-color-table.png?view=netframework-4.8">list of Microsoft .NET colors</a>.</p>
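<p>For reference, the web RGB form breaks down into three two-digit hex
channels.  The following sketch shows the decomposition only; it does not
handle color names, and <code>parse_web_rgb</code> is a hypothetical
helper rather than part of SourceGen.</p>

```python
def parse_web_rgb(text):
    """Split a '#rrggbb' string into (red, green, blue) integers."""
    s = text.strip()
    hex_digits = "0123456789abcdefABCDEF"
    if len(s) != 7 or s[0] != "#" or not all(c in hex_digits for c in s[1:]):
        raise ValueError("expected '#' followed by six hex digits")
    return tuple(int(s[i:i + 2], 16) for i in (1, 3, 5))


print(parse_web_rgb("#ff0000"))
print(parse_web_rgb("#008080"))
```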
|
||||
|
||||
<p>Clear the text field to delete the note.</p>
|
||||
<p>You can use Ctrl+Enter as a keyboard shortcut for "OK".</p>
|
||||
|
||||
|
||||
<h2><a name="project-symbol">Edit Project Symbol</a></h2>
|
||||
<p>This is used to edit the properties of a project symbol.</p>
|
||||
<p>Symbols marked as "address" will be applied automatically when an
|
||||
operand references an address outside the scope of the data file. They
|
||||
will not be applied to addresses inside the data file. Symbols
|
||||
marked as "constant" are not applied automatically, and must be
|
||||
explicitly specified as an operand.</p>
|
||||
<p>The label must meet the criteria for symbols (see
|
||||
<a href="intro-details.html#about-symbols">All About Symbols</a>), and must
|
||||
not have the same name as another project symbol. It can overlap
|
||||
with platform symbols and user labels.</p>
|
||||
<p>The value may be entered in decimal, hexadecimal, or binary. The numeric
|
||||
base you choose will be remembered, so that the value will be displayed
|
||||
the same way when used in a .EQ directive.</p>
|
||||
<p>You can optionally provide a width for address symbols. For example,
|
||||
if the address is of a two-byte pointer or a 64-byte buffer, you would
|
||||
set the width field to cause all references to any location in that range
|
||||
to be set to the symbol. Widths may be entered in hex or decimal. If
|
||||
the field is left blank, a width of 1 is assumed. Overlapping symbols
|
||||
are allowed. The width is ignored for constants.</p>
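<p>The effect of the width field can be modeled as a simple range check:
a reference matches the symbol if it falls anywhere within the span that
starts at the symbol's address.  This is a sketch of that idea, with a
hypothetical function name and a default width of 1 standing in for a
blank field; it does not model how overlapping symbols are resolved.</p>

```python
def symbol_covers(sym_addr, ref_addr, width=1):
    """True if ref_addr lies within [sym_addr, sym_addr + width)."""
    return sym_addr <= ref_addr < sym_addr + width


# A two-byte pointer at $06 covers $06 and $07, but not $08.
print(symbol_covers(0x06, 0x07, width=2))
print(symbol_covers(0x06, 0x08, width=2))
```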
|
||||
<p>If you enter a comment, it will be placed at the end of the line of
|
||||
the .EQ directive.</p>
|
||||
<p>For address symbols that represent a memory-mapped I/O location, it
|
||||
can be useful to have different symbols for reads and writes. Use
|
||||
the Read/Write checkboxes to specify the desired behavior.</p>
|
||||
|
||||
|
||||
<h2><a name="lvtable">Create/Edit Local Variable Table</a></h2>
|
||||
<p><a href="intro-details.html#local-vars">Local variables</a> are arranged in
|
||||
tables, which are created at a specific file offset. They must be
|
||||
associated with a line of code, and are usually placed at the start of
|
||||
a subroutine.
|
||||
The "Create Local Variable Table" action creates a new table, and
|
||||
opens the editor.  The "Edit Prior Local Variable Table" action searches
|
||||
for the closest table that appears at or before the selected line,
|
||||
and edits that.</p>
|
||||
<p>The editor allows you to create, edit, and delete entries, as well
|
||||
as move and delete entire tables (though these last two options are not
|
||||
available when creating a new table). Empty tables are allowed. These
|
||||
can be useful if the "clear previous" flag is set. If you want to
|
||||
delete the table, click the "Delete Table" button.</p>
|
||||
<p>Use the buttons to add, edit, or remove individual variables. Each
|
||||
variable has a name, a value, a width, and an optional comment. The
|
||||
standard naming rules for symbols apply. Variables are only used for
|
||||
zero-page and stack-relative operands, so all values must fall in the
|
||||
range 0-255. The width may extend one byte past the end (to address $0100)
|
||||
to allow 16-bit accesses to $ff (particularly useful on 65816).</p>
|
||||
<p>You can move a table to any offset that is the start of an instruction
|
||||
and doesn't already have a local variable table present. Click the
|
||||
"Move Table" button and enter the new offset in hex. You can also click
|
||||
on the up/down buttons to move to the next valid offset.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,86 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>End notes - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: End Notes</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="origins">Origins</a></h2>
|
||||
<p>The inspiration for SourceGen goes a long way back. While in high
|
||||
school in the late 1980s, I read Don Lancaster's
|
||||
<i>Enhancing Your Apple II, Vol. 1</i> (available for download
|
||||
<a href="https://www.tinaja.com/ebksamp1.shtml">here</a>). This
|
||||
included a very detailed methodology for disassembling 6502 software
|
||||
(nicely reformatted
|
||||
<a href="https://www.tinaja.com/ebooks/tearing_rework.pdf">here</a>).
|
||||
I wanted to give it a try, so I generated a monitor listing of an
|
||||
operating system (called "RDOS") that SSI used on their games, and
|
||||
printed it out on my Epson RX-80 -- tractor feed paper was helpful for
|
||||
this sort of thing -- then set to work.</p>
|
||||
|
||||
<p>Lancaster's methodology involved highlighting different types of
|
||||
instructions with different colors, making notes, and adding labels.
|
||||
All of this was done with felt-tip and colored highlighter pens.  The
|
||||
process worked remarkably well: by the time I was finished marking
|
||||
things up, I knew how everything in the code worked.</p>
|
||||
|
||||
<p>I really wanted a better system though. The disassembler built into
|
||||
the Apple II could get out of sync when it walked through a data area,
|
||||
so sometimes you had to hand-write in the correct instruction. Applying
|
||||
a label to every place that referenced it was tedious. When you got to
|
||||
the end, you had a colorful printout, but you couldn't run that through
|
||||
an assembler.</p>
|
||||
|
||||
<p>There were commercially available disassemblers that generated source
|
||||
code and removed some of the tedium from the process, and for many tasks
|
||||
they solved the problem nicely. What I really wanted, though, looked more
|
||||
like a modern IDE, because I didn't just want it to translate machine code
|
||||
into readable form. I wanted it to help me with the process of
|
||||
understanding the code, by providing cross-reference tables and symbol
|
||||
lists and giving me a place to scribble notes to myself while I worked.
|
||||
I especially wanted the note-scribbling, because learning how something
|
||||
works is usually an iterative process, where the function of a chunk of
|
||||
code gradually reveals itself over time.</p>
|
||||
|
||||
<p>In 2002, while writing the 6502/65816 disassembler for CiderPress, I
|
||||
ran into the same problems I had with the original Apple II monitor: it
|
||||
blundered through data sections and got lost briefly when a new code
|
||||
section started. You had to pick long or short registers for the entire
|
||||
disassembly, which made 65816 code something of a disaster. I
|
||||
jotted down some notes on what I thought the core features of a good
|
||||
6502 disassembler should be, then moved on to work on other features. It
|
||||
was another 15 years before I picked up the idea again.</p>
|
||||
|
||||
<p>More recently, I disassembled some code by dumping it to a text
|
||||
file with CiderPress and then fiddling with it in a text editor. I could
|
||||
leave free-form notes, but when I found some code that I wanted to
|
||||
exercise a bit I realized that getting it into an assembler was going
|
||||
to take some effort. Raw addresses needed to be converted to labels,
|
||||
the address and byte dump in the left column needed to be stripped out --
|
||||
really just some basic text and string replace operations, but tedious
|
||||
to do by hand.</p>
|
||||
|
||||
<p>The original design for SourceGen was substantially less feature-rich
|
||||
than the final result. I kept discovering opportunities for features
|
||||
that I wanted to have, or at least wanted to write. The result is
|
||||
something of a monument to creeping featurism. Hopefully the core features
|
||||
are solid enough to excuse the excesses.</p>
|
||||
|
||||
<p>-- Andy McFadden, September 2018</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,214 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Contents - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen Reference Manual</h1>
|
||||
<p>SourceGen is an interactive disassembler for 6502, 65C02,
|
||||
and 65816 code. The official web site is
|
||||
<a href="https://6502bench.com/">https://6502bench.com/</a>.</p>
|
||||
|
||||
<p>If you want to get up to speed quickly, start with the
|
||||
<a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
|
||||
|
||||
<h2>Contents</h2>
|
||||
<ul>
|
||||
<li><a href="intro.html">Overview</a>
|
||||
<ul>
|
||||
<li><a href="intro.html#fundamental-concepts">Fundamentals</a>
|
||||
<ul>
|
||||
<li><a href="intro.html#begin">About 6502 Code</a></li>
|
||||
<li><a href="intro.html#charenc">Character Encoding</a></li>
|
||||
<li><a href="intro.html#sgconcepts">SourceGen Concepts</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro.html#sgintro">How SourceGen Works</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro-details.html">Digging Deeper</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#about-symbols">All About Symbols</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#connecting-operands">Connecting Operands With Labels</a></li>
|
||||
<li><a href="intro-details.html#internal-address-symbols">Internal Address Symbols</a></li>
|
||||
<li><a href="intro-details.html#external-address-symbols">External Address Symbols</a></li>
|
||||
<li><a href="intro-details.html#unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></li>
|
||||
<li><a href="intro-details.html#weak-refs">Weak Symbolic References</a></li>
|
||||
<li><a href="intro-details.html#symbol-parts">Parts and Adjustments</a></li>
|
||||
<li><a href="intro-details.html#nearby-targets">Automatic Use of Nearby Targets</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro-details.html#width-disambiguation">Width Disambiguation</a></li>
|
||||
<li><a href="intro-details.html#address-regions">Address Regions</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#fixed-float">Fixed vs. Floating</a></li>
|
||||
<li><a href="intro-details.html#non-addr">Non-Addressable Areas</a></li>
|
||||
<li><a href="intro-details.html#pre-labels">Pre-Labels</a></li>
|
||||
<li><a href="intro-details.html#relative-addr">Relative Addressing</a></li>
|
||||
</ul></li>
|
||||
<li><a href="intro-details.html#pseudo-ops">Data and Directive Pseudo-Opcodes</a></li>
|
||||
<li><a href="intro-details.html#atags">Directing the Code Analyzer</a>
|
||||
<ul>
|
||||
<li><a href="intro-details.html#scripts">Extension Scripts</a></li>
|
||||
</ul></li>
|
||||
|
||||
</ul></li>
|
||||
|
||||
<li><a href="mainwin.html">Using SourceGen</a>
|
||||
<ul>
|
||||
<li><a href="mainwin.html#starting-new">Starting a New Project</a></li>
|
||||
<li><a href="mainwin.html#opening">Opening an Existing Project</a></li>
|
||||
<li><a href="mainwin.html#working">Working With a Project</a>
|
||||
<ul>
|
||||
<li><a href="mainwin.html#code-list">Code List</a></li>
|
||||
<li><a href="mainwin.html#undo">Undo &amp; Redo</a></li>
|
||||
<li><a href="mainwin.html#references">References Window</a></li>
|
||||
<li><a href="mainwin.html#notes">Notes Window</a></li>
|
||||
<li><a href="mainwin.html#symbols">Symbols Window</a></li>
|
||||
<li><a href="mainwin.html#info">Info Window</a></li>
|
||||
<li><a href="mainwin.html#messages">Messages Window</a></li>
|
||||
<li><a href="mainwin.html#navigation">Navigation</a></li>
|
||||
<li><a href="mainwin.html#atags">Adding and Removing Analyzer Tags</a></li>
|
||||
<li><a href="mainwin.html#address-table">Format Address Table</a></li>
|
||||
<li><a href="mainwin.html#toggle-single">Toggle Single-Byte Format</a></li>
|
||||
<li><a href="mainwin.html#format-as-word">Format As Word</a></li>
|
||||
<li><a href="mainwin.html#toggle-data">Toggle Data Scan</a></li>
|
||||
<li><a href="mainwin.html#clipboard">Copying to Clipboard</a></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="editors.html">Editors</a>
|
||||
<ul>
|
||||
<li><a href="editors.html#address">Define Address Region</a></li>
|
||||
<li><a href="editors.html#flags">Override Status Flags</a></li>
|
||||
<li><a href="editors.html#label">Edit Label</a></li>
|
||||
<li><a href="editors.html#instruction-operand">Edit Operand (Instruction)</a>
|
||||
<ul>
|
||||
<li><a href="editors.html#explicit-format">Explicit Formats</a></li>
|
||||
<li><a href="editors.html#shortcut-nar">Numeric Address References</a></li>
|
||||
<li><a href="editors.html#shortcut-local-var">Local Variable References</a></li>
|
||||
</ul></li>
|
||||
<li><a href="editors.html#data-operand">Edit Operand (Data)</a></li>
|
||||
<li><a href="editors.html#comment">Edit Comment</a></li>
|
||||
<li><a href="editors.html#long-comment">Edit Long Comment</a></li>
|
||||
<li><a href="editors.html#data-bank">Edit Data Bank (65816 only)</a></li>
|
||||
<li><a href="editors.html#note">Edit Note</a></li>
|
||||
<li><a href="editors.html#project-symbol">Edit Project Symbol</a></li>
|
||||
<li><a href="editors.html#lvtable">Create / Edit Local Variable Table</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="visualization.html">Visualizations</a>
|
||||
<ul>
|
||||
<li><a href="visualization.html#overview">Overview</a></li>
|
||||
<li><a href="visualization.html#vis-and-sets">Visualizations and Visualization Sets</a></li>
|
||||
<li><a href="visualization.html#runtime">Scripts Included with SourceGen</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="codegen.html">Code Generation &amp; Assembly</a>
|
||||
<ul>
|
||||
<li><a href="codegen.html#generate">Generating Source Code</a>
|
||||
<ul>
|
||||
<li><a href="codegen.html#localizer">Label Localizer</a></li>
|
||||
<li><a href="codegen.html#reserved-labels">Reserved Label Names</a></li>
|
||||
<li><a href="codegen.html#platform-features">Platform-Specific Features</a></li>
|
||||
</ul></li>
|
||||
<li><a href="codegen.html#assemble">Cross-Assembling Generated Code</a></li>
|
||||
<li><a href="codegen.html#supported">Supported Assemblers</a>
|
||||
<ul>
|
||||
<li><a href="codegen.html#version">Version-Specific Code Generation</a></li>
|
||||
<li><a href="codegen.html#quirks">Assembler-Specific Bugs &amp; Quirks</a>
|
||||
<ul>
|
||||
<li><a href="codegen.html#64tass">64tass</a></li>
|
||||
<li><a href="codegen.html#acme">ACME</a></li>
|
||||
<li><a href="codegen.html#cc65">cc65</a></li>
|
||||
<li><a href="codegen.html#merlin32">Merlin 32</a></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
<li><a href="codegen.html#export-source">Exporting Source Code</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="settings.html">Properties &amp; Settings</a>
|
||||
<ul>
|
||||
<li><a href="settings.html#app-settings">Application Settings</a>
|
||||
<ul>
|
||||
<li><a href="settings.html#appset-codeview">Code View</a></li>
|
||||
<li><a href="settings.html#appset-textdelim">Text Delimiters</a></li>
|
||||
<li><a href="settings.html#appset-asmconfig">Asm Config</a></li>
|
||||
<li><a href="settings.html#appset-displayformat">Display Format</a></li>
|
||||
<li><a href="settings.html#appset-pseudoop">Pseudo-Op</a></li>
|
||||
</ul></li>
|
||||
<li><a href="settings.html#project-properties">Project Properties</a>
|
||||
<ul>
|
||||
<li><a href="settings.html#projprop-general">General</a></li>
|
||||
<li><a href="settings.html#projprop-projsym">Project Symbols</a></li>
|
||||
<li><a href="settings.html#projprop-symfiles">Symbol Files</a></li>
|
||||
<li><a href="settings.html#projprop-extscripts">Extension Scripts</a></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="tools.html">Tools</a>
|
||||
<ul>
|
||||
<li><a href="tools.html#instruction-chart">Instruction Chart</a></li>
|
||||
<li><a href="tools.html#ascii-chart">ASCII Chart</a></li>
|
||||
<li><a href="tools.html#apple2-screen-chart">Apple II Screen Chart</a></li>
|
||||
<li><a href="tools.html#hexdump">Hex Dump Viewer</a></li>
|
||||
<li><a href="tools.html#file-concat">File Concatenator</a></li>
|
||||
<li><a href="tools.html#file-slicer">File Slicer</a></li>
|
||||
<li><a href="tools.html#omf-converter">OMF Converter</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="advanced.html">Advanced Topics</a>
|
||||
<ul>
|
||||
<li><a href="advanced.html#platform-symbols">Platform Symbol Files (.sym65)</a></li>
|
||||
<li><a href="advanced.html#extension-scripts">Extension Scripts</a></li>
|
||||
<li><a href="advanced.html#multi-bin">Working With Multiple Binaries</a></li>
|
||||
<li><a href="advanced.html#overlap">Overlapping Address Spaces</a></li>
|
||||
<li><a href="advanced.html#reloc-data">OMF Relocation Dictionaries</a></li>
|
||||
<li><a href="advanced.html#debug">Debug Menu Options</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="analysis.html">Appendix: Instruction and Data Analysis</a>
|
||||
<ul>
|
||||
<li><a href="analysis.html#analysis-process">Analysis Process</a>
|
||||
<ul>
|
||||
<li><a href="analysis.html#auto-format">Automatic Formatting</a></li>
|
||||
<li><a href="analysis.html#undo-redo">Interaction With Undo/Redo</a></li>
|
||||
</ul></li>
|
||||
<li><a href="analysis.html#code-analysis">Code Analysis</a>
|
||||
<ul>
|
||||
<li><a href="analysis.html#extension-scripts">Extension Scripts</a></li>
|
||||
</ul></li>
|
||||
<li><a href="analysis.html#data-analysis">Data Analysis</a></li>
|
||||
</ul></li>
|
||||
|
||||
<li><a href="end-notes.html">End Notes</a> </li>
|
||||
|
||||
<br/>
|
||||
|
||||
<!--
|
||||
<li><a href="tutorials.html">Tutorials</a>
|
||||
<ul>
|
||||
<li><a href="tutorials.html#basic-features">Tutorial #1: Basic Features</a></li>
|
||||
<li><a href="tutorials.html#advanced-features">Tutorial #2: Advanced Features</a></li>
|
||||
<li><a href="tutorials.html#address-tables">Tutorial #3: Address Table Formatting</a></li>
|
||||
<li><a href="tutorials.html#extension-scripts">Tutorial #4: Extension Scripts</a></li>
|
||||
<li><a href="tutorials.html#visualizations">Tutorial #5: Visualizations</a></li>
|
||||
</ul></li>
|
||||
-->
|
||||
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<hr/>
|
||||
<p>Copyright 2020 faddenSoft</p>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,958 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>More Details - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Intro Details</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="more-details">More Details</a></h2>
|
||||
|
||||
<p>This section digs a little deeper into how SourceGen works.</p>
|
||||
|
||||
|
||||
|
||||
<h2><a name="about-symbols">All About Symbols</a></h2>
|
||||
|
||||
<p>A symbol has two essential parts, a label and a value. The label is a short
|
||||
ASCII string; the value may be an 8-to-24-bit address or a 32-bit numeric
|
||||
constant. Symbols can be defined in different ways, and applied in
|
||||
different ways.</p>
|
||||
|
||||
<p>The label syntax is restricted to a format that should be compatible
|
||||
with most assemblers:</p>
|
||||
<ul>
|
||||
<li>2-32 characters long.</li>
|
||||
<li>Starts with a letter or underscore.</li>
|
||||
<li>Composed of ASCII letters, numbers, and the underscore.</li>
|
||||
</ul>
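<p>The three rules above can be expressed as a single regular expression.
The following is a minimal Python sketch, not SourceGen's own code; the
names <code>LABEL_RE</code> and <code>is_valid_label</code> are made up
for illustration.</p>

```python
import re

# Pattern for the three rules listed above: 2-32 characters, the first a
# letter or underscore, the rest ASCII letters, digits, or underscores.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,31}")


def is_valid_label(text: str) -> bool:
    """Return True if text satisfies the label rules listed above."""
    return LABEL_RE.fullmatch(text) is not None


for candidate in ["MYSTRING", "_x", "A", "1ABC", "foo-bar"]:
    print(candidate, is_valid_label(candidate))
```

<p>The sketch only checks the shape of the label. It does not check for
collisions with other symbols or for names reserved by a particular
assembler.</p>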
|
||||
<p>Label comparisons are case-sensitive, as is customary for programming
|
||||
languages.</p>
|
||||
<p>Sometimes the purpose of a subroutine or variable isn't immediately
|
||||
clear, but you can take a reasonable guess. You can document your
|
||||
uncertainty by adding a question mark ('?') to the end of the label.
|
||||
This isn't really part of the label, so it won't appear in the assembled
|
||||
output, and you don't have to include it when searching for a symbol.</p>
|
||||
<p>Some assemblers restrict the set of valid labels further. For example,
|
||||
64tass uses a leading underscore to indicate a local label, and reserves
|
||||
a double leading underscore (e.g. <code>__label</code>) for its own
|
||||
purposes. In such cases, the label will be modified to comply with the
|
||||
target assembler syntax.</p>
|
||||
|
||||
<p>Operands may use parts of symbols. For example, if you have a label
|
||||
<code>MYSTRING</code>, you can write:</p>
|
||||
<pre>
|
||||
MYSTRING .STR "hello"
|
||||
LDA #&lt;MYSTRING
|
||||
STA $00
|
||||
LDA #>MYSTRING
|
||||
STA $01
|
||||
</pre>
|
||||
<p>See <a href="#symbol-parts">Parts and Adjustments</a> for more details.</p>
|
||||
|
||||
<p>Symbols that represent a memory address within a project are treated
|
||||
differently from those outside a project. We refer to these as internal
|
||||
and external addresses, respectively.</p>
|
||||
|
||||
|
||||
<h3><a name="connecting-operands">Connecting Operands with Labels</a></h3>
|
||||
|
||||
<p>Suppose you have the following code:</p>
|
||||
<pre>
|
||||
LDA $1234
|
||||
JSR $2345
|
||||
</pre>
|
||||
<p>If we put that in a source file, it will assemble correctly.
|
||||
However, if those addresses are part of the file, the code may break if
|
||||
changes are made and things assemble to different addresses. It would
|
||||
be better to generate code that references labels, e.g.:</p>
|
||||
<pre>
|
||||
LDA my_data
|
||||
JSR nifty_func
|
||||
</pre>
|
||||
<p>SourceGen tries to establish labels for address operands automatically.
|
||||
How this works depends on whether the operand's address is inside the file or
|
||||
external, and whether there are existing labels at or near the target
|
||||
address. The details are explored in the next few sections.</p>
|
||||
<p>On the 65816 this process is trickier, because addresses are 24 bits
|
||||
instead of 16. For a control-transfer instruction like <code>JSR</code>,
|
||||
the high 8 bits come from the Program Bank Register (K). For a data-access
|
||||
instruction like <code>LDA</code>, the high 8 bits come from the Data
|
||||
Bank Register (B). The PBR value is determined by the address at which
|
||||
the code is executing, so it's easy to determine. The DBR value can be
|
||||
set arbitrarily. Sometimes it's easy to figure out, sometimes it has
|
||||
to be specified manually.</p>
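<p>As a rough illustration of the arithmetic, the 24-bit target is the bank
register value placed above the 16-bit operand. This is a Python sketch
with a made-up helper name (<code>full_address</code>) and made-up bank
values; it is not taken from SourceGen.</p>

```python
def full_address(bank: int, addr16: int) -> int:
    """Combine an 8-bit bank register value with a 16-bit operand."""
    return ((bank & 0xFF) << 16) | (addr16 & 0xFFFF)


# JSR $2345 executed by code running in bank $02: the bank comes from K.
jsr_target = full_address(0x02, 0x2345)
# LDA $1234 with the Data Bank Register set to $7E: the bank comes from B.
lda_target = full_address(0x7E, 0x1234)
print(f"${jsr_target:06x} ${lda_target:06x}")
```

<p>The first call needs only the location of the code. The second needs a
value for B, which is the part that sometimes has to be supplied by
hand.</p>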
|
||||
|
||||
|
||||
<h3><a name="internal-address-symbols">Internal Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address inside the file being disassembled
|
||||
are referred to as <i>internal</i>. They come in two varieties.</p>
|
||||
|
||||
<p><b>User labels</b> are labels added to instructions or data by the user.
|
||||
The editor will try to prevent you from creating a label that has the same
|
||||
name as another symbol, but if you manage to do so, the user label takes
|
||||
precedence over symbols from other sources. User labels may be tagged
|
||||
as non-unique local, unique local, global, or global and exported. Local
|
||||
vs. global is important for the label localizer, while exported symbols
|
||||
can be pulled directly into other projects.</p>
|
||||
|
||||
<p><b>Auto labels</b> are automatically generated labels placed on
|
||||
instructions or data offsets that are the target of operands. They're
|
||||
formed by appending the hexadecimal address to the letter "L", with
|
||||
additional characters added if some other symbol has already defined
|
||||
that label. Options can be set that change the "L" to a character or
|
||||
characters based on how the label is referenced, e.g. "B" for branch targets.
|
||||
Auto labels are only added where they are needed, and are removed when
|
||||
no longer necessary. Because auto labels may be renamed or vanish, the
|
||||
editor will try to prevent you from referring to them explicitly when
|
||||
editing operands.</p>
|
||||
|
||||
|
||||
<h3><a name="external-address-symbols">External Address Symbols</a></h3>
|
||||
|
||||
<p>Symbols that represent an address outside the file being disassembled
|
||||
are referred to as <i>external</i>. These may be ROM entry points,
|
||||
data buffers, zero-page variables, or a number of other things. Because
|
||||
the memory addresses they appear at aren't within the bounds of the file,
|
||||
we can't simply put an address label on them. Three different mechanisms
|
||||
exist for defining them. If an instruction or data operand refers to
|
||||
an address outside the file bounds, SourceGen looks for a symbol with
|
||||
a matching address value.</p>
|
||||
|
||||
<p><b>Platform symbols</b> are defined in platform symbol files. These
|
||||
are named with a ".sym65" extension, and have a fairly straightforward
|
||||
name/value syntax. Several files for popular platforms come with SourceGen
|
||||
and live in the <code>RuntimeData</code> directory. You can also create your
|
||||
own, but they have to live in the same directory as the project file.</p>
|
||||
|
||||
<p>Platform symbols can be addresses or constants. Addresses are
|
||||
limited to 24-bit values, and are matched automatically. Constants may
|
||||
be 32-bit values, but must be specified manually.</p>
|
||||
|
||||
<p>If two platform symbols have the same label, only the most recently read
|
||||
one is kept. If two platform symbols have different labels but the
|
||||
same value, both symbols will be kept, but the one in the file loaded
|
||||
last will take priority when doing a lookup by address. If symbols with
|
||||
the same value are defined in the same file, the one whose label appears
|
||||
first alphabetically takes priority.</p>
|
||||
|
||||
<p>Platform address symbols have an optional width. This can be used
|
||||
to define multi-byte items, such as two-byte pointers or 256-byte stacks.
|
||||
If no width is specified, a default value of 1 is used. Widths are ignored
|
||||
for constants.
|
||||
Overlapping symbols are resolved as described earlier, with symbols loaded
|
||||
later taking priority over previously-loaded symbols. In addition,
|
||||
symbols defined closer to the target address take priority, so if you put
|
||||
a 4-byte symbol in the middle of a 256-byte symbol, the 4-byte symbol will
|
||||
be visible because the start point is closer to the addresses it covers
|
||||
than the start of the 256-byte range.</p>
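<p>The overlap rule can be sketched as a lookup that prefers the covering
symbol whose start is nearest the target. This Python sketch assumes that
closeness is checked first and that load order only breaks ties, which the
text above does not spell out. The class, function, and symbol names are
hypothetical.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlatSym:
    label: str
    addr: int
    width: int = 1       # default width when none is specified
    load_order: int = 0  # higher means the file was loaded later


def lookup(symbols: List[PlatSym], target: int) -> Optional[PlatSym]:
    """Pick the symbol covering target whose start is closest to it."""
    covering = [s for s in symbols if s.addr <= target < s.addr + s.width]
    if not covering:
        return None
    # Smallest distance from the start wins; later-loaded breaks ties.
    return min(covering, key=lambda s: (target - s.addr, -s.load_order))


syms = [PlatSym("STACK", 0x0100, 256), PlatSym("HOTSPOT", 0x0180, 4)]
for addr in (0x0110, 0x0181, 0x0184):
    print(f"${addr:04x}", lookup(syms, addr).label)
```

<p>Addresses $0180-$0183 resolve to the 4-byte symbol; everything else in
the page resolves to the 256-byte one.</p>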
|
||||
|
||||
<p>Platform symbols can be designated for reading, writing, or both.
|
||||
Normally you'd want both, but if an address is a memory-mapped I/O
|
||||
location that has different behavior for reads and writes, you'd want
|
||||
to define two different symbols, and have the correct one applied
|
||||
based on the access type.</p>
|
||||
|
||||
<p><b>Project symbols</b> behave like platform symbols, but they are
|
||||
defined in the project file itself, through the Project Properties editor.
|
||||
The editor will try to prevent you from creating two symbols with the same
|
||||
name. If two symbols have the same value, the one whose label comes
|
||||
first alphabetically is used.</p>
|
||||
|
||||
<p>Project symbols always have precedence over platform symbols, allowing
|
||||
you to redefine symbols within a project. (You can "hide" a platform
|
||||
symbol by creating a project symbol constant with the same name. Use a
|
||||
value like $ffffffff or $deadbeef so you'll know why it's there.)</p>
|
||||
|
||||
<p><b>Address region pre-labels</b> are an oddity: they're external
|
||||
address symbols that also act like user labels. These are explained
|
||||
in more detail <a href="#pre-labels">later</a>.</p>
|
||||
|
||||
<p><b>Local variables</b> are redefinable symbols that are organized
|
||||
into tables. They're used to specify labels for zero-page addresses
|
||||
and 65816 stack-relative instructions. These are explained in more
|
||||
detail in the next section.</p>
|
||||
|
||||
|
||||
<h4><a name="local-vars">How Local Variables Work</a></h4>
|
||||
|
||||
<p>Local variables are applied to instructions that have zero
|
||||
page operands (<code>op ZP</code>, <code>op (ZP),Y</code>, etc.), or
|
||||
65816 stack relative operands
|
||||
(<code>op OFF,S</code> or <code>op (OFF,S),Y</code>). While they must be
|
||||
unique relative to other kinds of labels, they don't have to be unique
|
||||
with respect to earlier variable definitions. So you can define
|
||||
<code>TMP .EQ $10</code>, and a few lines later define
|
||||
<code>TMP .EQ $20</code>. This is handy because zero-page addresses are
|
||||
often used in different ways by different parts of the program. For
|
||||
example:</p>
|
||||
<pre>
|
||||
LDA ($00),Y
|
||||
INC $02
|
||||
... elsewhere ...
|
||||
DEC $00
|
||||
STA ($01),Y
|
||||
</pre>
|
||||
<p>If we had given <code>$00</code> the label <code>PTR</code> and
|
||||
<code>$02</code> the label <code>COUNT</code> globally,
|
||||
the second pair of instructions would look all wrong. With local
|
||||
variable tables you can set <code>PTR=$00 COUNT=$02</code> for the first chunk,
|
||||
and <code>COUNT=$00 PTR=$01</code> for the second chunk.</p>
|
||||
|
||||
<p>Local variables have a value and a width. If we create a pair of
|
||||
variable definitions like this:</p>
|
||||
<pre>
|
||||
PTR .eq $00 ;2 bytes
|
||||
COUNT .eq $02 ;1 byte
|
||||
</pre>
|
||||
<p>Then this:</p>
|
||||
<pre>
|
||||
STA $00
|
||||
STX $01
|
||||
LDY $02
|
||||
</pre>
|
||||
<p>Would become:</p>
|
||||
<pre>
|
||||
STA PTR
|
||||
STX PTR+1
|
||||
LDY COUNT
|
||||
</pre>
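<p>The substitution shown above amounts to finding the variable whose byte
range covers the operand and appending the distance from its start. A
minimal Python sketch, with a hypothetical function name and table
layout:</p>

```python
def format_zp_operand(addr, table):
    """table maps label -> (start address, width in bytes)."""
    for label, (start, width) in table.items():
        if start <= addr < start + width:
            offset = addr - start
            return label if offset == 0 else f"{label}+{offset}"
    return f"${addr:02x}"  # no variable covers this address


table = {"PTR": (0x00, 2), "COUNT": (0x02, 1)}
for op, addr in [("STA", 0x00), ("STX", 0x01), ("LDY", 0x02)]:
    print(op, format_zp_operand(addr, table))
```

<p>An address that no variable covers, such as $03 here, is left in
hex.</p>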
|
||||
|
||||
<p>The scope of a variable definition starts at the point where it is
|
||||
defined, and stops when its definition is erased. There are three
|
||||
ways for a table to erase an earlier definition:</p>
|
||||
<ol>
|
||||
<li>Create a new definition with the same name.</li>
|
||||
<li>Create a new definition that has an overlapping value. For
|
||||
example, if you have a two-byte variable <code>PTR = $00</code>,
|
||||
and define a one-byte variable <code>COUNT = $01</code>, the
|
||||
definition for <code>PTR</code> will be cleared because its second
|
||||
byte overlaps.</li>
|
||||
<li>Tables have a "clear previous" flag that erases all previous
|
||||
definitions. This doesn't usually cause anything to be generated in the
|
||||
assembly sources; instead, it just causes SourceGen to stop using
|
||||
that label.</li>
|
||||
</ol>
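<p>The three erase rules can be modeled as a function that takes the
definitions currently in effect and a new table, and returns what is in
effect afterward. This is a Python sketch under that assumption; the
function name and data layout are illustrative, not SourceGen's.</p>

```python
def apply_table(active, new_defs, clear_previous=False):
    """Return the definitions in effect after a new table is applied.

    active and new_defs map label -> (start address, width in bytes).
    """
    result = {} if clear_previous else dict(active)   # rule 3
    for label, (start, width) in new_defs.items():
        result.pop(label, None)                       # rule 1: same name
        for old, (o_start, o_width) in list(result.items()):
            # rule 2: the two byte ranges intersect
            if start < o_start + o_width and o_start < start + width:
                del result[old]
        result[label] = (start, width)
    return result


active = apply_table({}, {"PTR": (0x00, 2)})
active = apply_table(active, {"COUNT": (0x01, 1)})
print(active)  # {'COUNT': (1, 1)}
```

<p>The second table's <code>COUNT</code> lands on the second byte of
<code>PTR</code>, so <code>PTR</code> drops out, matching the example in
rule 2.</p>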
|
||||
<p>As you might expect, you're not allowed to have duplicate labels or
|
||||
overlapping values in an individual table.</p>
|
||||
<p>If a platform/project symbol has the same value as a local variable,
|
||||
the local variable is used. If the local variable definition is cleared,
|
||||
use of the platform/project symbol will resume.</p>
|
||||
<p>Not all assemblers support redefinable variables. In those cases,
|
||||
the symbol names will be modified to be unique (e.g. the second definition
|
||||
of <code>PTR</code> becomes <code>PTR_1</code>), and variables will have
|
||||
global scope.</p>
|
||||
|
||||
|
||||
<h3><a name="unique-local-global">Unique vs. Non-Unique and Local vs. Global</a></h3>
|
||||
|
||||
<p>Most assemblers have a notion of "local" labels, which have a scope
|
||||
that is book-ended by global labels. These are handy for generic branch
|
||||
target names like "loop" or "notzero" that you might want to use in
|
||||
multiple places. The exact definition of local label scope varies
|
||||
between assemblers, so labels that you want to be local might have to
|
||||
be promoted to global (and probably renamed).</p>
|
||||
<p>SourceGen has a similar concept with a slight twist: they're called
|
||||
non-unique labels, because the goal is to be able to use the same
|
||||
label in more than one place. Whether or not they actually turn out
|
||||
to be local is a decision deferred to assembly source generation time.
|
||||
(You can also declare a label to be a unique local if you like; the
|
||||
auto-generated labels like "L1234" do this.)</p>
|
||||
<p>When you're writing code for an assembler, it has to be unambiguous,
|
||||
because the assembler can't guess at what the output should be. For a
|
||||
disassembler, the output is known, so a greater degree of ambiguity is
|
||||
tolerable. Instead of throwing errors and refusing to continue, the
|
||||
source generator can modify the output until it works. For example:</p>
|
||||
<pre>
|
||||
@LOOP LDX #$02
|
||||
@LOOP DEX
|
||||
BNE @LOOP
|
||||
DEY
|
||||
BNE @LOOP
|
||||
</pre>
|
||||
<p>This would confuse an assembler. SourceGen already knows which @LOOP
|
||||
is being branched to, so it can just rename one of them to "@LOOP1".</p>
|
||||
<p>One situation where non-unique labels cause difficulty is with
|
||||
weak symbolic references (see next section). For example, suppose
|
||||
the above code then did this:</p>
|
||||
<pre>
|
||||
LDA #&lt;@LOOP
|
||||
</pre>
|
||||
<p>While it's possible to make an educated guess at which @LOOP was
|
||||
meant, it's easy to get wrong. In situations like this, it's best to
|
||||
give the labels different names.</p>
|
||||
|
||||
|
||||
<h3><a name="weak-refs">Weak Symbolic References</a></h3>
|
||||
|
||||
<p>Symbolic references in operands are "weak references". If the named
|
||||
symbol exists, the reference is used. If the symbol can't be found, the
|
||||
operand is formatted in hex instead. They're called "weak" because
|
||||
failing to resolve the reference isn't considered an error.</p>
|
||||
|
||||
<p>It's important to know this when editing a project. Consider the
|
||||
following trivial chunk of code:</p>
|
||||
|
||||
<pre>
|
||||
1000: 4c0310 JMP $1003
|
||||
1003: ea NOP
|
||||
</pre>
|
||||
|
||||
<p>When you load it into SourceGen, it will be formatted like this:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1003
|
||||
L1003 NOP
|
||||
</pre>
|
||||
|
||||
<p>The analyzer found the JMP operand, and created an auto label for
|
||||
address $1003. It then created a weak reference to "L1003" in the JMP
|
||||
operand.</p>
|
||||
|
||||
<p>If you edit the JMP instruction's operand to use the symbol "FOO", the
|
||||
results are probably not what you want:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP $1003
|
||||
NOP
|
||||
</pre>
|
||||
|
||||
<p>This happened because you added a weak reference to "FOO" in the operand,
|
||||
but the label doesn't exist. The operand is formatted as hex. Because
|
||||
there's no longer a reference to L1003, SourceGen removed the auto-label
|
||||
as well.</p>
|
||||
|
||||
<p>If you set the label "FOO" on the NOP instruction, you'll see what you
|
||||
probably wanted:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP FOO
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>You don't actually need the explicit reference in the JMP instruction.
|
||||
If you edit the JMP operand and set it back to "Default", the code will
|
||||
still look the same. This is because SourceGen identified the numeric
|
||||
reference, and automatically added a symbolic reference to the label on
|
||||
the NOP instruction.</p>
|
||||
|
||||
<p>However, suppose you didn't actually want FOO as the operand label.
|
||||
You can create a project symbol, BAR, with the value $1003, and then edit
|
||||
the operand to reference BAR instead. Your code would then look like:</p>
|
||||
<pre>
|
||||
BAR .EQ $1003
|
||||
.ADDRS $1000
|
||||
JMP BAR
|
||||
FOO NOP
|
||||
</pre>
|
||||
|
||||
<p>If you change the value of BAR in the project symbol file, the operand
|
||||
will continue to refer to it, but with an adjustment. For example, if
|
||||
you changed BAR from $1003 to $1007, the code would become:</p>
|
||||
<pre>
|
||||
BAR .EQ $1007
|
||||
.ADDRS $1000
|
||||
JMP BAR-4
|
||||
FOO NOP
|
||||
</pre>
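<p>The behavior walked through above (use the symbol if it exists, add an
adjustment if its value differs, fall back to hex if it is missing) can be
summarized in a few lines. This is a Python sketch of the idea only; the
function name and the dictionary-based symbol table are assumptions.</p>

```python
def format_operand(value, symbol_name, symbols):
    """Format a 16-bit operand that holds a weak reference to symbol_name.

    symbols maps label -> address for every label that currently exists.
    """
    if symbol_name not in symbols:
        return f"${value:04x}"            # unresolved: fall back to hex
    adjustment = value - symbols[symbol_name]
    if adjustment == 0:
        return symbol_name
    return f"{symbol_name}{adjustment:+d}"


print(format_operand(0x1003, "FOO", {}))               # $1003
print(format_operand(0x1003, "BAR", {"BAR": 0x1003}))  # BAR
print(format_operand(0x1003, "BAR", {"BAR": 0x1007}))  # BAR-4
```

<p>In all three cases the operand still assembles to $1003; only the way
it is written changes.</p>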
|
||||
|
||||
<p>If you rename a label, all references to that label are updated. For
|
||||
numeric references that happens implicitly. For explicit operand
|
||||
references, the weak references are updated individually. (Modern IDEs
|
||||
call this "refactoring".)</p>
|
||||
<p>If you remove a label, all of the numeric references to it will
|
||||
reference something else, probably a new auto label. Weak references
|
||||
to the symbol will break and be formatted as hex, but will not be
|
||||
removed. Similarly, removing symbols from a platform or project file
|
||||
will break the reference but won't modify the operands.</p>
|
||||
|
||||
<h3><a name="symbol-parts">Parts and Adjustments</a></h3>
|
||||
|
||||
<p>Sometimes you want to use part of a label, or adjust the value slightly.
|
||||
(I use "adjustment" rather than "offset" to avoid confusing it with file
|
||||
offsets.) Consider the following example:</p>
|
||||
<pre>
|
||||
1000: a910 LDA #$10
|
||||
1002: 48 PHA
|
||||
1003: a906 LDA #$06
|
||||
1005: 48 PHA
|
||||
1006: 60 RTS
|
||||
1007: 4c3aff JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>This pushes the address of the JMP instruction ($1007) onto the stack,
|
||||
and jumps to it with the RTS instruction. However, RTS requires the
|
||||
address of the byte before the target instruction, so we actually push
|
||||
$1006.</p>
|
||||
|
||||
<p>The disassembler won't know that address $1007 is code because nothing
|
||||
appears to reference it. After tagging $1007 as a code start point, the
|
||||
project looks like this:</p>
|
||||
<pre>
|
||||
LDA #$10
|
||||
PHA
|
||||
LDA #$06
|
||||
PHA
|
||||
RTS
|
||||
|
||||
JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>We set a label called "NEXT" on the JMP instruction, and then edit
|
||||
the two LDA instructions to reference the high and low parts, yielding:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
         LDA     #&gt;NEXT
|
||||
PHA
|
||||
         LDA     #&lt;NEXT-1
|
||||
PHA
|
||||
RTS
|
||||
|
||||
NEXT JMP $ff3a
|
||||
</pre>
|
||||
|
||||
<p>SourceGen will adjust label values by whatever amount is required to
|
||||
generate the original value. If the adjustment seems wrong, make sure
|
||||
you're selecting the right part of the symbol.</p>
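<p>As a rough illustration of how the part selection and the adjustment
interact, here is a small Python sketch.  The function names are made up
for this example, and the real implementation may differ; the point is
only that the adjustment is whatever makes the selected part of the symbol
equal the byte in the instruction.</p>

```python
def select_part(value: int, part: str) -> int:
    """Return the low or high byte of a 16-bit value."""
    if part == "high":
        return (value >> 8) & 0xFF
    return value & 0xFF


def part_adjustment(symbol_value: int, part: str, operand: int) -> int:
    """Adjustment needed so the selected part of the symbol equals the operand."""
    return operand - select_part(symbol_value, part)


NEXT = 0x1007
print(part_adjustment(NEXT, "high", 0x10))  # 0   (high part of NEXT, no adjustment)
print(part_adjustment(NEXT, "low", 0x06))   # -1  (low part of NEXT, minus one)
print(part_adjustment(NEXT, "low", 0x10))   # 9   (wrong part selected: odd adjustment)
```

<p>The last line shows what an unexpected adjustment usually means: the
low part was selected for an operand that holds the high byte.</p>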
|
||||
|
||||
<p>Different assemblers use different syntaxes to form expressions. This
|
||||
is particularly noticeable in 65816 code.  You can adjust how expressions
appear on-screen in the application settings.</p>
|
||||
|
||||
<h3><a name="nearby-targets">Automatic Use of Nearby Targets</a></h3>
|
||||
|
||||
<p>Sometimes you want to use a symbol that doesn't match up with the
|
||||
operand. SourceGen tries to anticipate situations where that might be
|
||||
the case, and apply adjustments for you.</p>
|
||||
|
||||
<p>Suppose you have the following:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA #$00
|
||||
STA L1010
|
||||
LDA #$20
|
||||
STA L1011
|
||||
LDA #$e1
|
||||
STA L1012
|
||||
RTS
|
||||
|
||||
L1010 .DD1 $00
|
||||
L1011 .DD1 $00
|
||||
L1012 .DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>Showing stores to three different labeled addresses is fine, but
|
||||
the code is actually setting up a single 24-bit address. For clarity,
|
||||
you'd like the output to reflect the fact that it's a single, multi-byte
|
||||
variable.  So, if you set a label such as "DATA" at $1010, SourceGen removes the
|
||||
nearby auto labels, and sets the numeric references to use your label:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA #$00
|
||||
STA DATA
|
||||
LDA #$20
|
||||
STA DATA+1
|
||||
LDA #$e1
|
||||
STA DATA+2
|
||||
RTS
|
||||
|
||||
DATA .DD1 $00
|
||||
.DD1 $00
|
||||
.DD1 $00
|
||||
</pre>
|
||||
|
||||
<p>If you decide that you really wanted each store to have its own
|
||||
label, you can set labels on the other two addresses. SourceGen won't
|
||||
search for alternate labels if the numeric reference target has a
|
||||
user-defined label.</p>
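<p>The behavior described above can be approximated with a short Python
sketch.  This is not SourceGen's actual algorithm: the function name, the
backward-only search, and the <code>max_distance</code> limit are all
assumptions made for illustration.</p>

```python
def nearby_label(target: int, user_labels: dict, max_distance: int = 2) -> str:
    """Pick an operand string for `target`, preferring a user label at or just before it."""
    for distance in range(max_distance + 1):
        name = user_labels.get(target - distance)
        if name is not None:
            # An exact match needs no adjustment; otherwise add the distance.
            return name if distance == 0 else f"{name}+{distance}"
    return f"L{target:04x}"  # no user label nearby: fall back to an auto label


labels = {0x1010: "DATA"}
print([nearby_label(a, labels) for a in (0x1010, 0x1011, 0x1012)])
# ['DATA', 'DATA+1', 'DATA+2']
```

<p>Because an exact match is checked first, adding a user label at $1011
would make that store use the new label instead of DATA+1.</p>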
|
||||
|
||||
<p>This is also used for self-modifying code. For example:</p>
|
||||
<pre>
|
||||
1000: a9ff LDA #$ff
|
||||
1002: 8d0610 STA $1006
|
||||
1005: 4900 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>The above changes the <code>EOR #$00</code> instruction to
|
||||
<code>EOR #$ff</code>. The operand target is $1006, but we can't
|
||||
put a label there because it's in the middle of the instruction. So
|
||||
SourceGen puts a label at $1005 and adjusts it:</p>
|
||||
<pre>
|
||||
LDA #$ff
|
||||
STA L1005+1
|
||||
L1005 EOR #$00
|
||||
</pre>
|
||||
|
||||
<p>If you really don't like the way this works, you can disable the
|
||||
search for nearby targets entirely from the
|
||||
<a href="settings.html#project-properties">project properties</a>.
|
||||
Self-modifying code will always be adjusted because of the limitation
|
||||
on mid-instruction labels.</p>
|
||||
|
||||
|
||||
<h2><a name="width-disambiguation">Width Disambiguation</a></h2>
|
||||
|
||||
<p>It's possible to interpret certain instructions in multiple ways.
|
||||
For example, "LDA $0000" might be an absolute load from a 16-bit
|
||||
address, or it might be a direct page load from an 8-bit address.
|
||||
Humans can infer from the fact that it was written with a 4-digit address
|
||||
that it's meant to be absolute, but assemblers often treat operands
|
||||
purely as numbers, and would just see "LDA 0". Common practice is to
|
||||
use the shortest instruction possible.</p>
|
||||
<p>Every assembler seems to address the problem in a slightly different
|
||||
way. Some use opcode suffixes, others use operand prefixes, some
|
||||
allow both. You can configure how they appear in the
|
||||
<a href="settings.html#app-settings">application settings</a>.</p>
|
||||
<p>SourceGen will only add width disambiguators to opcodes or operands when
|
||||
they are needed, with one exception: the opcode suffix for long
|
||||
(24-bit address) operations is always applied. This is done because some
|
||||
assemblers require it, insisting on "LDAL" rather than "LDA" for an
|
||||
absolute long load, and because it can make 65816 code easier to read.</p>
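<p>To make the ambiguity concrete, here is a Python sketch of a toy
assembler routine that encodes LDA.  The opcode values ($A5 for the
8-bit form, $AD for the 16-bit form) are the standard 6502 encodings;
the function and its <code>force_absolute</code> flag are hypothetical
stand-ins for whatever disambiguation syntax a real assembler uses.</p>

```python
LDA_ZERO_PAGE = 0xA5  # LDA with an 8-bit (zero page / direct page) address
LDA_ABSOLUTE = 0xAD   # LDA with a 16-bit address


def assemble_lda(address: int, force_absolute: bool = False) -> bytes:
    """Encode LDA, picking the shortest form unless told otherwise."""
    if address <= 0xFF and not force_absolute:
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, (address >> 8) & 0xFF])


print(assemble_lda(0x0000).hex())                       # a500
print(assemble_lda(0x0000, force_absolute=True).hex())  # ad0000
```

<p>If the original binary contained the three-byte form, the generated
source must carry a width disambiguator, or the assembled output will be
one byte shorter than the original.</p>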
|
||||
|
||||
|
||||
|
||||
<h2 id="address-regions">Address Regions</h2>
|
||||
|
||||
<p>Simple programs are loaded at a particular address and executed there.
|
||||
The source code starts with a directive that tells the assembler what the
|
||||
initial address is, and the code and data statements that follow are
|
||||
placed appropriately. More complicated programs might relocate parts
|
||||
of themselves to other parts of memory, or consist of multiple
|
||||
"overlay" segments that, through disk I/O or bank-switching, all execute
|
||||
at the same address.</p>
|
||||
|
||||
<p>Consider the code in the first tutorial. It loads at $1000, copies
|
||||
part of itself to $2000, and transfers execution there:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
1000: a0 71 LDY #$71
|
||||
1002: b9 17 10 L1002 LDA SRC,y
|
||||
1005: 99 00 20 STA MAIN,y
|
||||
1008: 88 DEY
|
||||
1009: 30 09 BMI L1014
|
||||
100b: 10 f5 BPL L1002
|
||||
|
||||
100d: 00 .DD1 $00
|
||||
100e: 68 65 6c 6c+ .STR "hello!"
|
||||
|
||||
1014: 4c 00 20 L1014 JMP MAIN
|
||||
|
||||
1017: SRC
|
||||
.ADDRS $2000
|
||||
2000: ad 00 30 MAIN LDA $3000
|
||||
[...]
|
||||
</pre>
|
||||
|
||||
<p>The arrangement of this code can be viewed in a couple of ways. One
|
||||
way is to see it linearly: the code starts at $1000, continues to $1017,
|
||||
then restarts at $2000:</p>
|
||||
<pre>
|
||||
+000000 +- start
|
||||
| $1000 - $1016 length=23 ($0017)
|
||||
+000016 +- end (floating)
|
||||
|
||||
+000017 +- start 'MAIN'
|
||||
| $2000 - $2070 length=113 ($0071)
|
||||
+000087 +- end (floating)
|
||||
</pre>
|
||||
|
||||
<p>The other way to picture it is hierarchical: the file loads
|
||||
fully at $1000, and has a "child" region at offset +000017 in which the
|
||||
address changes to $2000:</p>
|
||||
<pre>
|
||||
+000000 +- start
|
||||
| $1000 - $1016 length=23 ($0017)
|
||||
+000017 | +- start 'MAIN' pre='SRC'
|
||||
| | $2000 - $2070 length=113 ($0071)
|
||||
+000087 | +- end
|
||||
+000087 +- end
|
||||
</pre>
|
||||
|
||||
<p>The latter is closer to what many assemblers expect, with a "physical"
|
||||
PC that starts where the file is loaded, and a "logical" or "pseudo" PC
|
||||
that determines how the code is generated. SourceGen supports both
|
||||
approaches. The only thing that would change in this example is that
|
||||
the nested approach allows the "SRC" label to exist. (More on this
|
||||
later, in the section on <a href="#pre-labels">pre-labels</a>.)</p>
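<p>One way to think about the hierarchical view is as a lookup from file
offset to address, where the innermost region wins.  The Python sketch
below uses the two regions from the example; the tuple layout and the
function name are assumptions for illustration, not SourceGen's internal
data structures.</p>

```python
# (start offset, length, address) for the hierarchical layout shown above.
REGIONS = [
    (0x000000, 0x88, 0x1000),  # parent: the whole file loads at $1000
    (0x000017, 0x71, 0x2000),  # child 'MAIN'
]


def offset_to_address(offset, regions=REGIONS):
    """Map a file offset to an address using the innermost enclosing region."""
    enclosing = [r for r in regions if offset in range(r[0], r[0] + r[1])]
    if not enclosing:
        return None  # not covered by any region
    # Regions nest strictly, so the shortest enclosing region is the innermost.
    start, _length, address = min(enclosing, key=lambda r: r[1])
    return address + (offset - start)


for offset in (0x16, 0x17, 0x87):
    print(f"+{offset:06x} -> ${offset_to_address(offset):04x}")
# +000016 -> $1016
# +000017 -> $2000
# +000087 -> $2070
```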
|
||||
|
||||
<p>The real value of a hierarchical arrangement becomes apparent when
|
||||
the area copied out of the file is only a small part of it. For
|
||||
example, suppose something like:</p>
|
||||
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA SUB_SRC,Y
|
||||
STA SUB_DST,Y
|
||||
JMP CONT
|
||||
|
||||
SUB_SRC
|
||||
.ADDRS $2000
|
||||
SUB_DST [small routine]
|
||||
.ADREND
|
||||
|
||||
CONT LDA #$12
|
||||
JSR SUB_DST
|
||||
</pre>
|
||||
<p>In this case, a small routine is copied out of the middle of the
|
||||
code that lives at $1000. We want the code at CONT to pick up where
|
||||
things left off. If SUB_SRC is at $1009, and is 23 bytes long, then
|
||||
CONT should be $1020. We could output <code>.ADDRS $1020</code>
|
||||
directly before CONT, but it's inconvenient to work with the generated
|
||||
code if we want to modify the subroutine (changing its length)
|
||||
and re-assemble it.</p>
|
||||
|
||||
|
||||
<h3 id="fixed-float">Fixed vs. Floating</h3>
|
||||
|
||||
<p>Sometimes when disassembling code you know exactly where an address
|
||||
region starts and ends. Other times you know where it starts, but won't
|
||||
know where it stops until you've had a chance to look at the updated
|
||||
disassembly. In the former case you create a region with a "fixed" end
|
||||
point, in the latter you create one with a "floating" end point.</p>
|
||||
<p>Address regions with fixed end points always stop in the same place.
|
||||
Regions with floating end points stop at the next address region boundary,
|
||||
which means they can change size as regions are added or removed.
|
||||
The end will be placed at either the start of a new region (a "sibling"),
|
||||
or the end of an encapsulating region (the "parent").</p>
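<p>The rule for floating end points can be sketched in a few lines of
Python.  The function below is a simplified model written for this
example (its name and arguments are assumptions); it reproduces the end
offsets of the linear layout of the tutorial file shown earlier.</p>

```python
def floating_end(start, region_starts, parent_end):
    """Return the last offset covered by a floating region that begins at `start`.

    region_starts: start offsets of the other regions inside the same parent.
    parent_end: last offset of the enclosing region (or of the file).
    """
    later_starts = [s for s in region_starts if s > start]
    # Stop just before the next sibling, or at the parent's end if there is none.
    boundary = min(later_starts + [parent_end + 1])
    return boundary - 1


# Linear layout of the tutorial file: regions start at +000000 and +000017.
print(hex(floating_end(0x00, [0x17], 0x87)))  # 0x16
print(hex(floating_end(0x17, [0x00], 0x87)))  # 0x87
```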
|
||||
|
||||
<p>Regions that overlap must have a parent/child relationship. Whichever
|
||||
one starts last or ends first is the child. A strict ordering is necessary
|
||||
because a given file offset can only have one address, and if we don't
|
||||
know which region is the child we can't know which address to assign.
|
||||
Regions cannot straddle the start or end of another region, and cannot
|
||||
exactly overlap another region (have the same start and length).
|
||||
One consequence of these rules is that "floating" regions cannot share
|
||||
a start offset with another region, because their end point would be
|
||||
adjusted to match the end of the other region.</p>
|
||||
|
||||
<p>The arrangement of regions is particularly important when attempting
|
||||
to resolve an address operand (such as a JSR) to a location within the
|
||||
file. The process is straightforward if the address only appears once,
|
||||
but when overlays cause multiple parts of the file to have the same
|
||||
address, the operand target may be in different places depending on where
|
||||
the call is being made from.
|
||||
The algorithm for resolving addresses is described
|
||||
in the <a href="advanced.html#overlap">advanced topics</a> section.</p>
|
||||
|
||||
|
||||
<h3 id="non-addr">Non-Addressable Areas</h3>
|
||||
|
||||
<p>Some files have contents that aren't actually loaded into memory
|
||||
addressable by the 6502. One example is a file header, such as a load
|
||||
address extracted by the system when reading the program into memory, or
|
||||
something intended to be read by an emulator. Another example is the
|
||||
CHR graphic data on the NES, which is loaded into an area inaccessible
|
||||
to the CPU.</p>
|
||||
|
||||
<p>The generated source file must recreate the original binary exactly,
|
||||
but we don't really want to assign an address to non-addressable data,
|
||||
because it should never be resolved as the target of a JSR or LDA. To
|
||||
handle this case, you can set a region's address to "NA". The assembler
|
||||
needs to have <i>some</i> notion of address, so the start address will
|
||||
be treated as zero.</p>
|
||||
|
||||
<p>Non-addressable regions cannot include executable code. You may put
|
||||
labels on data items, but attempting to reference them will cause a
|
||||
warning and will likely generate code that doesn't assemble.</p>
|
||||
|
||||
<p>It's possible to delete all address regions from a project, or edit
|
||||
them so that there are "holes" not covered by a region.
|
||||
To handle this, all projects are effectively covered by a non-addressable
|
||||
region that spans the entire file. Any part of the file that isn't
|
||||
explicitly covered by a user-specified region will be provided an
|
||||
auto-generated non-addressable region. Such regions don't actually exist,
|
||||
so attempting to edit one will actually cause a new region to be created.</p>
|
||||
|
||||
|
||||
<h3 id="pre-labels">Pre-Labels</h3>
|
||||
|
||||
<p>The need for pre-labels was illustrated in the earlier example, where
|
||||
code in Tutorial1 was copied from $1017 to $2000. The fundamental issue
|
||||
is that offset +000017 has <i>two</i> addresses: $1017 and $2000. The
|
||||
assembler can only generate code for one. Pre-labels allow you to do
|
||||
the same thing you'd do in the source code, which is to add a label
|
||||
immediately before the address is changed.</p>
|
||||
|
||||
<p>Pre-labels are "external" symbols, similar to project symbols,
|
||||
because they refer to an address that is outside the file bounds.
|
||||
They're always treated as having global scope.
|
||||
However, they also behave like user labels, because they're generated
|
||||
as part of the instruction stream and interfere with local label
|
||||
references that cross them.</p>
|
||||
|
||||
<p>The address of a pre-label is determined by the parent region.
|
||||
Suppose you have a file with an arrangement like:</p>
|
||||
<pre>
|
||||
region1 start
|
||||
...
|
||||
region2 start
|
||||
...
|
||||
region2 end
|
||||
region1 end
|
||||
</pre>
|
||||
|
||||
<p>You can put a pre-label on <code>region2</code>, which will be the
|
||||
address of the byte in <code>region1</code> right before the address
|
||||
changed. You can't put a pre-label on <code>region1</code>, because
|
||||
before <code>region1</code> there was no address. Similarly:</p>
|
||||
<pre>
|
||||
region1 start
|
||||
...
|
||||
region1 end
|
||||
region2 start
|
||||
...
|
||||
region2 end
|
||||
</pre>
|
||||
|
||||
<p>You can't put a pre-label on <code>region2</code> because its parent
|
||||
is non-addressable. <code>region1</code>'s address doesn't apply,
|
||||
because <code>region1</code> ended before the label would be issued.</p>
|
||||
|
||||
|
||||
<h3 id="relative-addr">Relative Addressing</h3>
|
||||
|
||||
<p>It is occasionally useful to output an address region start directive
|
||||
that uses relative addressing instead of absolute addressing. For
|
||||
example, given:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
[...]
|
||||
.ADDRS $2000
|
||||
</pre>
|
||||
<p>We could instead generate:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
[...]
|
||||
.ADDRS *+$0fe9
|
||||
</pre>
|
||||
|
||||
<p>This has no effect on the definition of the region. It only affects
|
||||
how the start directive is generated in the assembly source file.</p>
|
||||
|
||||
<p>The value is an offset from the current assembler program counter.
|
||||
If the new region is the child of a non-addressable region, a relative
|
||||
offset cannot be used.</p>
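<p>The relative operand is the new address minus the assembler's program
counter at that point.  The Python sketch below assumes the first region
is 23 bytes long, as in the tutorial example, so the program counter is
$1017 when the directive is reached.  The helper name and the handling of
negative offsets are assumptions for illustration.</p>

```python
def relative_operand(current_pc: int, new_address: int) -> str:
    """Format the operand of a relative address directive."""
    delta = new_address - current_pc
    sign = "+" if delta >= 0 else "-"
    return f"*{sign}${abs(delta):04x}"


print(relative_operand(0x1017, 0x2000))  # *+$0fe9
```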
|
||||
|
||||
|
||||
|
||||
<h2><a name="atags">Directing the Code Analyzer</a></h2>
|
||||
|
||||
<p>Sometimes SourceGen can't automatically find the start or end of an
|
||||
instruction stream, or gets confused by inline data. These situations
|
||||
can be resolved by adding analyzer tags.</p>
|
||||
|
||||
<p><b>Code start point</b> tags tell the analyzer to add the offset
|
||||
to the list of instruction start points. Suppose you've got a code
|
||||
library that begins with jump vectors, like this:</p>
|
||||
<pre>
|
||||
1000: 4c0910 JMP $1009
|
||||
1003: 4cef10 JMP $10ef
|
||||
1006: 4c3012 JMP $1230
|
||||
1009: 18 CLC
|
||||
</pre>
|
||||
|
||||
<p>When opened with SourceGen, it will look like this:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
|
||||
.DD1 $4c
|
||||
.DD1 $ef
|
||||
.DD1 $10
|
||||
.DD1 $4c
|
||||
.DD1 $30
|
||||
.DD1 $12
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>SourceGen doesn't see any code that jumps to $1003 or $1006, so it
|
||||
assumes those are data. Further, the functions at those addresses may
|
||||
also be considered data unless some bit of code reachable from L1009
|
||||
calls into them. If you tag $1003 and $1006 as code start points,
|
||||
you'll get better results:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
JMP L10ef
|
||||
JMP L1230
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>Be careful that you only tag the instruction opcode byte. If
|
||||
you tagged each and every byte from $1003 to $1008, you would
|
||||
end up with a mess:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
JMP L1009
|
||||
JMP ▼ L10ef
|
||||
BPL ▼ L1053
|
||||
JMP ▼ L1230
|
||||
BMI L101b
|
||||
L1009 CLC
|
||||
</pre>
|
||||
|
||||
<p>The exact set of instructions shown depends on your CPU configuration.
|
||||
The problem is that the bytes in the middle of the instruction have
|
||||
been tagged as start points, so SourceGen is treating them as
|
||||
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
|
||||
they're being ignored, but $10 is BPL and $30 is BMI. Because tagging
|
||||
multiple consecutive bytes is rarely useful, SourceGen only applies code
|
||||
start tags to the first byte in a selected line.</p>
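<p>The odd branch targets in the listing above follow directly from the
6502's relative addressing: the target is the address of the byte after
the two-byte branch, plus the signed operand.  A short Python sketch
(the helper name is made up for this example):</p>

```python
def branch_target(opcode_address: int, offset_byte: int) -> int:
    """Target of a 6502 relative branch whose opcode is at `opcode_address`."""
    signed = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (opcode_address + 2 + signed) & 0xFFFF


# Bytes at $1003-$1008: 4c ef 10 4c 30 12
print(f"${branch_target(0x1005, 0x4c):04x}")  # $1053 (BPL embedded in the first JMP)
print(f"${branch_target(0x1007, 0x12):04x}")  # $101b (BMI embedded in the second JMP)
```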
|
||||
|
||||
<p><b>Code stop point</b> tags tell the analyzer when it should stop. For
|
||||
example, suppose address $ff00 is known to always be nonzero, and the code
|
||||
uses that fact to get a branch-always on the 6502:</p>
|
||||
<pre>
|
||||
.ADDRS $1000
|
||||
LDA $ff00
|
||||
BNE L1010
|
||||
BRK $11
|
||||
</pre>
|
||||
|
||||
<p>By tagging the BRK as a code stop point, you're telling the analyzer that
|
||||
it should stop trying to execute code when it reaches that point. (Note
|
||||
that this example would actually be better solved by setting a status flag
|
||||
override on the BNE that sets Z=0, so the code tracer will know it's a
|
||||
branch-always and just do the right thing.) As with code start points,
|
||||
code stop points should only be placed on the opcode byte. Placing a
|
||||
code stop point in the middle of what SourceGen believes to be an instruction
|
||||
will have no effect.</p>
|
||||
<p>As with code start points, only the first byte in each selected line will
|
||||
be tagged.</p>
|
||||
|
||||
<p><b>Inline data</b> tags identify bytes as being part of the
|
||||
instruction stream, but not instructions. A simple example of this
|
||||
is the ProDOS 8 call interface on the Apple II, which looks like this:</p>
|
||||
<pre>
|
||||
JSR $bf00
|
||||
.DD1 $function
|
||||
.DD2 $address
|
||||
BCS BAD
|
||||
</pre>
|
||||
|
||||
<p>The three bytes following the <code>JSR $bf00</code> should be tagged
|
||||
as inline data, so that the code analyzer skips over them and continues the
|
||||
analysis at the <code>BCS</code> instruction. You can think of these as
|
||||
"code skip" tags, but they're different from stop/start points, because
|
||||
every byte of inline data must be tagged. When
|
||||
applying the tag, all bytes in a selected line will be modified.</p>
|
||||
<p>If code branches into a region that is tagged as inline data, the
|
||||
branch will be ignored.</p>
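<p>To show what "skipping over" inline data means for a code tracer, here
is a minimal Python sketch for the ProDOS 8 case.  It is not how
SourceGen is implemented; the function and constant names are
hypothetical, and the sample bytes are invented.</p>

```python
JSR_ABSOLUTE = 0x20    # 6502 opcode for JSR
PRODOS_ENTRY = 0xBF00  # entry point used in the example above
INLINE_LENGTH = 3      # one function byte plus a two-byte address


def skip_inline_data(code: bytes, offset: int):
    """If a JSR $bf00 starts at `offset`, return (inline_offsets, resume_offset)."""
    target = code[offset + 1] | (code[offset + 2] << 8)
    if code[offset] != JSR_ABSOLUTE or target != PRODOS_ENTRY:
        return None
    first_inline = offset + 3
    resume = first_inline + INLINE_LENGTH
    return list(range(first_inline, resume)), resume


# JSR $bf00 / .DD1 $c8 / .DD2 $1234 / BCS ...
code = bytes([0x20, 0x00, 0xBF, 0xC8, 0x34, 0x12, 0xB0, 0x05])
print(skip_inline_data(code, 0))  # ([3, 4, 5], 6)
```

<p>Offsets 3 through 5 are the bytes that would each need an inline data
tag, and offset 6 is where analysis resumes at the <code>BCS</code>.</p>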
|
||||
|
||||
|
||||
<h3><a name="scripts">Extension Scripts</a></h3>
|
||||
|
||||
<p>Extension scripts are C# source files that are compiled and
|
||||
executed by SourceGen. They can be added to a project from SourceGen's
|
||||
runtime data directory, or can live in the directory next to the project
|
||||
file. They're used to generate visualizations of graphical data, and
|
||||
to format inline data automatically.</p>
|
||||
<p>The inline data formatting feature can significantly reduce the tedium
|
||||
in certain projects. For example, suppose the code uses a string print
|
||||
routine that embeds a null-terminated string right after a JSR. Ordinarily
|
||||
you'd have to walk through the code, marking every instance by hand so
|
||||
the disassembler would know where the string ends and execution resumes.
|
||||
With an extension script, you can just pass in the print routine's label,
|
||||
and let the script do the formatting automatically.</p>
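<p>The core of such a script is small.  Real extension scripts are written
in C# against SourceGen's plugin interfaces, which aren't shown here; the
Python sketch below only illustrates the logic of finding where an inline
null-terminated string ends, using a made-up function name and sample
bytes.</p>

```python
def inline_zstring(data: bytes, jsr_offset: int):
    """Return (string_bytes, resume_offset) for a null-terminated string after a JSR."""
    start = jsr_offset + 3          # JSR is three bytes long
    end = data.index(0x00, start)   # offset of the terminator; ValueError if missing
    return data[start:end], end + 1


# JSR $3000 / "hi" / $00 / RTS
data = bytes([0x20, 0x00, 0x30]) + b"hi\x00" + bytes([0x60])
print(inline_zstring(data, 0))  # (b'hi', 6)
```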
|
||||
|
||||
<p>To reduce the chances of a script causing problems, all scripts are
|
||||
executed in a sandbox with severely restricted access. Notably, nothing
|
||||
in the sandbox can access files, except to read files from the PluginDll
|
||||
directory.</p>
|
||||
<p>The PluginDll directory lives next to the SourceGen executable, and
|
||||
contains all of the compiled script DLLs, as well as two pre-built
|
||||
application DLLs that plugins are allowed access to. The contents
|
||||
are persistent, to avoid recompiling the scripts every time SourceGen
|
||||
is launched, but may be manually deleted without harm.</p>
|
||||
<p>More details can be found in the
|
||||
<a href="advanced.html#extension-scripts">advanced topics</a> section.</p>
|
||||
|
||||
|
||||
<h2><a name="pseudo-ops">Data and Directive Pseudo-Opcodes</a></h2>
|
||||
|
||||
<p>The on-screen code list shows assembler directives that are similar
|
||||
to what the various cross-assemblers provide. The actual directives
|
||||
generated for a given assembler may match exactly or be totally different.
|
||||
The idea is to represent the concept behind the directive, then let the
|
||||
code generator figure out the implementation details.</p>
|
||||
|
||||
<p>There are eight assembler directives that appear in the code list:</p>
|
||||
<ul>
|
||||
<li>.EQ - defines a symbol's value. These are generated automatically
|
||||
when an operand that matches a platform or project symbol is found.</li>
|
||||
<li>.VAR - defines a local variable. These are generated for
|
||||
local variable tables.</li>
|
||||
<li>.ADDRS/.ADREND - specifies the start or end of an
|
||||
address region.</li>
|
||||
<li>.RWID - specifies the width of the accumulator and index registers
|
||||
(65816 only). Note this doesn't change the actual width, just tells
|
||||
the assembler that the width has changed.</li>
|
||||
<li>.DBANK - specifies what value the Data Bank Register holds
|
||||
(65816 only). Used when matching operands to labels.</li>
|
||||
<li>.JUNK - indicates that the data in a range of bytes is irrelevant.
|
||||
(When generating sources, this will become .FILL or .BULK
|
||||
depending on the contents of the memory region and the assembler's
|
||||
capabilities.)</li>
|
||||
<li>.ALIGN - a special case of .JUNK that indicates the irrelevant
|
||||
bytes exist to force alignment to a memory boundary (usually a
|
||||
256-byte page). Depending on the memory contents, it may be possible
|
||||
to output this as an assembler-specific alignment directive.</li>
|
||||
</ul>
|
||||
|
||||
<p>Every data item is represented by a pseudo-op. Some of them may
|
||||
represent hundreds of bytes and span multiple lines.</p>
|
||||
<ul>
|
||||
<li>.DD1, .DD2, .DD3, .DD4 - basic "define data" op. A 1-4 byte
|
||||
little-endian value.</li>
|
||||
<li>.DBD2, .DBD3, .DBD4 - "define big-endian data". 2-4 bytes of
|
||||
big-endian data. (The 3- and 4-byte versions are not currently
|
||||
available in the UI, since they're very unusual and few assemblers
|
||||
support them.)</li>
|
||||
<li>.BULK - data packed in as compact a form as the assembler allows.
|
||||
Useful for chunks of graphics data.</li>
|
||||
<li>.FILL - a series of identical bytes. The operand
|
||||
has two parts, the byte count followed by the byte value.</li>
|
||||
</ul>
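<p>The difference between the little-endian and big-endian forms is easy
to see with two bytes.  The helpers in this Python sketch are named after
the pseudo-ops purely for illustration.</p>

```python
def dd(raw: bytes) -> int:
    """Value of a .DD1-.DD4 item: bytes are stored least-significant first."""
    return int.from_bytes(raw, "little")


def dbd(raw: bytes) -> int:
    """Value of a .DBD2-.DBD4 item: bytes are stored most-significant first."""
    return int.from_bytes(raw, "big")


raw = bytes([0x34, 0x12])
print(f"${dd(raw):04x}")   # $1234
print(f"${dbd(raw):04x}")  # $3412
```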
|
||||
|
||||
<p>In addition, several pseudo-ops are defined for string constants:</p>
|
||||
<ul>
|
||||
<li>.STR - basic character string.</li>
|
||||
<li>.RSTR - string in reverse order.</li>
|
||||
<li>.ZSTR - null-terminated string.</li>
|
||||
<li>.DSTR - Dextral Character Inverted string. The high bit of the
|
||||
last byte is flipped.</li>
|
||||
<li>.L1STR - string prefixed with a length byte.</li>
|
||||
<li>.L2STR - string prefixed with a length word.</li>
|
||||
</ul>
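<p>As a rough guide to the byte layouts these pseudo-ops describe, the
Python sketch below encodes a string each way.  It assumes plain ASCII
text and a little-endian length word for .L2STR; both are assumptions
for the example, and other character encodings will produce different
bytes.</p>

```python
def zstr(text: str) -> bytes:
    """Null-terminated string."""
    return text.encode("ascii") + b"\x00"


def rstr(text: str) -> bytes:
    """String stored in reverse order."""
    return text.encode("ascii")[::-1]


def l1str(text: str) -> bytes:
    """String prefixed with a one-byte length."""
    data = text.encode("ascii")
    return bytes([len(data)]) + data


def l2str(text: str) -> bytes:
    """String prefixed with a two-byte length (assumed little-endian)."""
    data = text.encode("ascii")
    return len(data).to_bytes(2, "little") + data


def dstr(text: str) -> bytes:
    """String whose last byte has its high bit flipped."""
    data = bytearray(text.encode("ascii"))
    if data:
        data[-1] ^= 0x80
    return bytes(data)


print(dstr("hello").hex())   # 68656c6cef
print(l1str("hi").hex())     # 026869
```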
|
||||
|
||||
<p>You can configure the pseudo-ops to look more like what your
|
||||
favorite assembler uses in the
|
||||
<a href="settings.html#appset-pseudoop">Pseudo-Op</a> tab in the
|
||||
application settings.</p>
|
||||
|
||||
<p>String constants start and end with delimiter characters, typically
|
||||
single or double quotes. You can configure the delimiters differently
|
||||
for each character encoding, so that it's obvious whether the text is
|
||||
in ASCII or PETSCII. See the
|
||||
<a href="settings.html#appset-textdelim">Text Delimiters</a> tab in
|
||||
the application settings.</p>
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,292 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Intro - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Intro</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="overview">Overview</a></h2>
|
||||
|
||||
<p>SourceGen converts 6502/65C02/65816 machine-language programs to
|
||||
assembly-language source.</p>
|
||||
|
||||
<p>SourceGen has two purposes. The first is to be a really nice
|
||||
disassembler for the 6502 and related CPUs. Code tracing with status
|
||||
flag tracking makes it easier to separate the code from the data,
|
||||
automatic formatting of character strings and filled-data areas helps
|
||||
get the data regions sorted out, and modern IDE-style features like
|
||||
cross-reference generation and color-highlighted bookmarks help
|
||||
navigate the code while trying to figure out what it does. A
|
||||
disassembler should help you understand the code, not just dump the
|
||||
instructions to a text file.</p>
|
||||
<p>The computer I built back in 2014 has a 4GHz CPU and 8GB of RAM. I
|
||||
figured we should put the power of modern computing hardware to good use.</p>
|
||||
|
||||
<p>The second purpose is to facilitate sharing and collaboration. Most
|
||||
disassemblers generate output for a specific assembler, or in a way that's
|
||||
generic enough to match most any assembler; either way, you're left with
|
||||
a text file in somebody's idea of the "correct" format. SourceGen keeps
|
||||
everything in an assembler-neutral format, and provides numerous options
|
||||
for customizing the display, so that multiple people viewing the same
|
||||
project can each do so with the conventions they are accustomed to.
|
||||
Code and data operands can be formatted in various numeric formats or
|
||||
as symbols.
|
||||
The project file uses a text format that is fairly diff-friendly, so
|
||||
sharing projects through git works reasonably well. If you want source
|
||||
code you can assemble, SourceGen will generate code optimized for the
|
||||
assembler of your choice.</p>
|
||||
|
||||
<p>The sharing and collaboration ideas only work if the formatting
|
||||
capabilities within SourceGen are sufficiently flexible. If you need to
|
||||
generate assembly source and tweak it a bunch to express the intent of
|
||||
the original code, then passing a SourceGen project around won't work.
|
||||
This sort of thing is a bit outside the bounds of what a typical
|
||||
disassembler does, so it remains to be seen whether SourceGen succeeds at
|
||||
what it's trying to do, and also whether what it's trying to do is
|
||||
something that people actually want.</p>
|
||||
|
||||
<p>You can get started by watching a
|
||||
<a href="https://youtu.be/dalISyBPQq8">demo video</a> and working through
|
||||
the <a href="https://6502bench.com/sgtutorial/">tutorials</a>.</p>
|
||||
|
||||
|
||||
<h2><a name="fundamental-concepts">Fundamentals</a></h2>
|
||||
|
||||
<p>The next few sections present some general concepts and terminology. The
|
||||
rest of the documentation assumes you've read and understood this.</p>
|
||||
<p>It will be helpful if you already understand something about the 6502
|
||||
instruction set and assembly-language programming, but disassembling
|
||||
other programs is actually a pretty good way to learn how to code in
|
||||
assembly. You will need to be familiar with hexadecimal numbers and
|
||||
general programming concepts to make sense of this, however.</p>
|
||||
|
||||
<h3><a name="begin">About 6502 Code</a></h3>
|
||||
|
||||
<p>For brevity's sake, "6502 code" should be taken to mean "code for
|
||||
the 6502 CPU or any of its derivatives, including but not limited to
|
||||
the 65C02 and 65816". So let's talk about 6502 code.</p>
|
||||
|
||||
<p>Code usually arrives in a big binary blob. Some of it will be
|
||||
instructions, some of it will be data, some will be empty space used
|
||||
for variable storage. Part of the challenge of disassembly is
|
||||
identifying which parts of the file contain which.</p>
|
||||
|
||||
<p>Much of the code you'll find for the 6502 was written by humans,
|
||||
rather than generated by a compiler, which means it won't conform to a
|
||||
standard set of conventions. However, most programmers will use
|
||||
subroutines, which can be identified and analyzed in isolation. Subroutines
|
||||
are often interspersed with variable storage, referred to as a "stash".
|
||||
Variables and constants may be single-byte or multi-byte, the latter
|
||||
typically in little-endian byte order.</p>
|
||||
|
||||
<p>Much of the data in a typical program is read-only, often in the
|
||||
form of graphics or character string data. Graphics can be difficult
|
||||
to recognize automatically, but strings can be identified with a
|
||||
reasonable degree of confidence. Address tables, which are a collection
|
||||
of addresses to other things, are also fairly common.</p>
|
||||
|
||||
<p>A simple disassembler would start at the top of the file and just
|
||||
start converting bytes to instructions. Unfortunately there's no reliable
|
||||
way to tell the difference between instructions, data, and variable
|
||||
stashes. When the converter hits data bytes it'll start generating
|
||||
instructions that won't make sense. You'll have another problem when the
|
||||
data ends and code resumes: 6502 instructions are variable-length, so if
|
||||
the last byte of the data area appears to be a three-byte instruction,
|
||||
the first two bytes of the next instruction area will be gobbled up.</p>
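<p>The problem is easy to reproduce.  The Python sketch below is a toy
linear disassembler that knows the lengths of only three opcodes (the
table and names are invented for this example).  A single stray data
byte is enough to make it miss the real code that follows.</p>

```python
# Lengths for the handful of opcodes used here; anything else counts as one byte.
LENGTHS = {0x4C: 3, 0xA9: 2, 0x60: 1}  # JMP abs, LDA #imm, RTS


def linear_sweep(data: bytes):
    """Return the offsets a naive top-to-bottom disassembler treats as instruction starts."""
    starts, offset = [], 0
    while offset < len(data):
        starts.append(offset)
        offset += LENGTHS.get(data[offset], 1)
    return starts


# One data byte ($4c) followed by real code: LDA #$00 / RTS.
blob = bytes([0x4C, 0xA9, 0x00, 0x60])
print(linear_sweep(blob))  # [0, 3]  (the LDA at offset 1 was swallowed)
```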
|
||||
|
||||
<p>To make things even more difficult (sometimes deliberately), programmers
|
||||
will sometimes use a trick where they "embed" an instruction
|
||||
inside another instruction. This allows code to branch to two different
|
||||
entry points, one of which will set a flag or load a register, and then
|
||||
continue on to common code.</p>
|
||||
|
||||
<p>Another trick is to embed "inline data" after a JSR or JSL instruction.
|
||||
The called subroutine pulls the caller's address off the stack, uses it to
|
||||
access the parameters, then pushes the address back on after modifying it to
|
||||
point to an address past the inline data. This can be very confusing
|
||||
for the disassembler, which will try to interpret the inline data as
|
||||
instructions.</p>
|
||||
|
||||
<p>Sometimes code is loaded at one location, then moved to another and
|
||||
executed there. If you're disassembling an executing program you don't
|
||||
have to worry about this, but if you're disassembling the binary from the
|
||||
loadable file on disk then you need to track the address changes. The
|
||||
address is communicated to the assembler with a "pseudo-opcode", usually
|
||||
something like "ORG" (short for "origin"). Other pseudo-op directives
|
||||
are used to define things like constants and (for 65816 code)
|
||||
register widths.</p>
|
||||
|
||||
<p>The 8-bit CPUs have a 16-bit (64KiB) address space, so addresses can
|
||||
range from $0000 to $ffff. (I'm going to write hex values with a
|
||||
preceding '$', like "$12ab", rather than "0x12ab" or "12abh", because
|
||||
that's what 6502 systems commonly used.) The 65816 has a 24-bit address
|
||||
space, but it's not contiguous -- a branch that extends past the end will
|
||||
wrap around to the start of the 64KiB "bank". For 16-bit instruction
|
||||
operands, the bank is identified for instruction and data addresses
|
||||
by the program bank register and the data bank register, respectively.
|
||||
The disassembler can't always discern the value of the data bank
|
||||
register through static analysis, so some user input may be required.</p>
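<p>The bank wrap-around can be expressed as a small calculation. This is a
sketch for a two-byte relative branch; <code>branch_target</code> is a
hypothetical helper, with <code>pc</code> as the 24-bit address of the branch
instruction and <code>rel</code> as the signed 8-bit offset.</p>

```python
def branch_target(pc: int, rel: int) -> int:
    bank = pc & 0xFF0000
    # Only the low 16 bits take part in the addition; the bank never changes.
    return bank | ((pc + 2 + rel) & 0xFFFF)

print(f"${branch_target(0x01FFF0, 0x20):06x}")  # prints $010012
```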
|
||||
|
||||
<p>The 6502 has an 8-bit processor status register ("P") with a bunch of flags
|
||||
in it. Some of the flags determine whether a conditional branch is taken
|
||||
or not, which is important because some branches appear to be conditional
|
||||
but actually are always or never taken in practice. The disassembler needs
|
||||
to be able to figure this out so that it doesn't try to disassemble the
|
||||
bytes that follow an always-taken branch.
|
||||
A more significant concern is the M and X flags found on the 65802/65816,
|
||||
which determine the width of the registers and of immediate load
|
||||
instructions. If you don't know what state the flags are in, you can't
|
||||
know whether <code>LDA #value</code> is two bytes or three, and the
|
||||
disassembly of the instruction stream will come out wrong.</p>
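<p>A sketch of the width rule, assuming the 65816 is in native mode: the M
flag controls the accumulator and the X flag controls the index registers,
and a set flag means 8-bit. The function name is hypothetical; the three
opcodes are the immediate forms of <code>LDA</code>, <code>LDX</code>, and
<code>LDY</code>.</p>

```python
# Which status flag governs each immediate-load opcode.
WIDTH_FLAG = {0xA9: "M", 0xA2: "X", 0xA0: "X"}   # LDA #, LDX #, LDY #

def immediate_length(opcode: int, flags: dict) -> int:
    flag = flags.get(WIDTH_FLAG[opcode])
    if flag is None:
        raise ValueError("flag state unknown; instruction length is ambiguous")
    # Flag set: 8-bit register, one operand byte. Flag clear: 16-bit, two bytes.
    return 2 if flag else 3

print(immediate_length(0xA9, {"M": 1, "X": 1}))  # prints 2
print(immediate_length(0xA9, {"M": 0, "X": 1}))  # prints 3
```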
|
||||
|
||||
<p>Some addresses correspond to memory-mapped I/O, rather than RAM or ROM.
|
||||
Accessing the address can have side effects, like changing between text
|
||||
and graphics modes. Sometimes reading and writing have different effects.
|
||||
For example, on later models of the Apple II, reading from
|
||||
$C000 returns the most recently hit key, while writing to $C000 changes
|
||||
how 80-column display memory is mapped.</p>
|
||||
<p>On a few systems, such as the Atari 2600, RAM, ROM, and registers can
|
||||
appear at multiple locations, "mirrored" across the address space.</p>
|
||||
|
||||
<h3><a name="charenc">Character Encoding</a></h3>
|
||||
|
||||
<p>The American Standard Code for Information Interchange (ASCII) was
|
||||
developed in the 1960s, and became widely used as the means for representing
|
||||
text data on a computer. It's compatible with Unicode, in that the
|
||||
binary representation of an ASCII string is exactly the same when
|
||||
expressed as a Unicode string with UTF-8 encoding.</p>
|
||||
<p>Not all 6502-based computers used ASCII, notably those from Commodore
|
||||
International (e.g. PET, VIC-20, 64, 128), which used variants
|
||||
collectively known as "PETSCII". PETSCII had most of the same symbols,
|
||||
but rearranged them, and added a number of graphical symbols. This was
|
||||
further complicated by the use of two different character sets, one of
|
||||
which dropped lower-case letters in favor of additional symbols, and
|
||||
the use of a separate encoding for characters stored in the text frame
|
||||
buffer ("screen codes").</p>
|
||||
<p>Apple II computers were based on ASCII, but tended to store bytes
|
||||
with the high bit set rather than clear. This is known as "high ASCII".</p>
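<p>Converting between the two forms only requires setting or clearing bit 7,
as this short sketch shows. The function name is made up for the example.</p>

```python
def high_ascii_to_text(data: bytes) -> str:
    # Clear the high bit of every byte, then decode as plain ASCII.
    return bytes(b & 0x7F for b in data).decode("ascii")

high = bytes([0xC8, 0xC5, 0xCC, 0xCC, 0xCF])
print(high_ascii_to_text(high))  # prints HELLO
```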
|
||||
|
||||
<p>SourceGen allows you to specify that a string is encoded with ASCII,
|
||||
High ASCII, C64 PETSCII, or C64 Screen Codes. Because the goal is to
|
||||
generate assembly sources for cross-assemblers, the C64 character
|
||||
support is limited to the set that overlaps with ASCII.</p>
|
||||
<p>For the most part only printable characters are accepted in strings,
|
||||
but certain control characters are also allowed. The characters for
|
||||
bell ($07), linefeed ($0a), and carriage return ($0d) are recognized as
|
||||
string data, and in C64 PETSCII a number of text color and formatting
|
||||
control codes are also allowed.</p>
|
||||
|
||||
<h3><a name="sgconcepts">SourceGen Concepts</a></h3>
|
||||
|
||||
<p>As you work on a disassembled file, formatting operands and adding
|
||||
comments, everything you do is saved in the project file as "metadata".
|
||||
None of the data from the file being disassembled is included. This
|
||||
should allow project files to be shared without violating the copyright
|
||||
of the work being disassembled. (This will vary by region. Also, note
|
||||
that the mere act of disassembling a piece of software may be illegal in
|
||||
some cases.)</p>
|
||||
|
||||
<p>To avoid mix-ups where the wrong data file is used, the file's length
|
||||
and CRC are stored in the project file. SourceGen will refuse to open a
|
||||
project if the data file's length and CRC don't match.</p>
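<p>The check can be pictured roughly as follows. This is only a sketch: the
text above says "CRC" without naming the algorithm, so the use of CRC-32 via
Python's <code>zlib.crc32</code> is an assumption, and
<code>data_file_matches</code> is a hypothetical name.</p>

```python
import zlib

def data_file_matches(data: bytes, expected_len: int, expected_crc: int) -> bool:
    # Both values must agree with what was recorded in the project file.
    return len(data) == expected_len and zlib.crc32(data) == expected_crc

original = bytes([0xA9, 0x00, 0x60])
saved_len, saved_crc = len(original), zlib.crc32(original)

print(data_file_matches(original, saved_len, saved_crc))
print(data_file_matches(bytes([0xA9, 0x01, 0x60]), saved_len, saved_crc))
```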
|
||||
|
||||
<p>Most of the data in the project file is associated with a file offset.
|
||||
When you create a comment, you aren't associating it with line 53, you're
|
||||
associating it with the 127th byte in the file. This ensures that, as the
|
||||
project evolves, the comment you wrote is always connected to the
|
||||
same instruction or data item. This also means you can't have two
|
||||
comments of the same kind on the same line -- each offset only has room for one. By
|
||||
convention, file offsets are always shown as a six-digit hexadecimal value
|
||||
with a leading '+', e.g. "+0012ab". This makes it easy to distinguish
|
||||
between an address and an offset.</p>
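<p>The two notations are easy to produce side by side. The helper names
below are hypothetical; the formats follow the conventions described
above.</p>

```python
def format_offset(offset: int) -> str:
    return f"+{offset:06x}"      # six hex digits with a leading '+'

def format_address(addr: int) -> str:
    return f"${addr:04x}"        # 16-bit address with a leading '$'

print(format_offset(0x12AB))   # prints +0012ab
print(format_address(0x12AB))  # prints $12ab
```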
|
||||
|
||||
<p>Instruction and data operands can be formatted in various ways. The
|
||||
formatting choice is associated with the first offset of the item. For
|
||||
instructions the number of bytes in the operand is determined by the opcode
|
||||
(and, on the 65816, the M/X status flags). For data items the length
|
||||
can be anything from a single byte to the entire file. Operand formats are not allowed
|
||||
to overlap.</p>
|
||||
|
||||
<p>When an instruction or data operand references an address, we call
|
||||
it a <b>numeric reference</b>. When the target address has a label, and
|
||||
the operand uses that symbol, we call that a <b>symbolic reference</b>.
|
||||
SourceGen tries to establish symbolic references whenever possible,
|
||||
so that the generated assembly source doesn't refer to hard-coded
|
||||
locations within the program. Labels are generated automatically for
|
||||
the targets of numeric references.</p>
|
||||
|
||||
<p>As your understanding of the disassembled code develops, you will want
|
||||
to add comments explaining it. SourceGen projects have three kinds of
|
||||
comments:</p>
|
||||
<ol>
|
||||
<li>End-of-line comments. As the name implies, these appear at the
|
||||
end of a line, to the right of the opcode or operand.</li>
|
||||
<li>Long comments, also known as multi-line comments. These get a
|
||||
line all to themselves, and may span multiple lines.</li>
|
||||
<li>Notes. Like long comments, these get a line to themselves. Unlike
|
||||
long comments, these do not appear in generated assembly code. They
|
||||
are a way for you to leave notes to yourself, perhaps "don't forget
|
||||
to figure this out" or "this is the cool part".</li>
|
||||
</ol>
|
||||
<p>Every file offset can have one of each.</p>
|
||||
|
||||
<p>Labels and comments may disappear if you associate them with a file
|
||||
offset that is in the middle of a multi-byte instruction or data item.
|
||||
For example, suppose you put a long comment at offset +000010, and then
|
||||
mark a 50-byte region starting at offset +000008 as an ASCII string. The
|
||||
comment won't be deleted, but won't be displayed either. The same thing
|
||||
can happen to labels. SourceGen will try to prevent this from happening
|
||||
by splitting formatted data into sub-regions at label boundaries.</p>
|
||||
|
||||
|
||||
<h2><a name="sgintro">How SourceGen Works</a></h2>
|
||||
|
||||
<p>SourceGen employs a partial emulation technique that traces the flow
|
||||
of execution through the program. Most of what a given instruction does
|
||||
isn't important; only its effect on the flow of execution matters. This
|
||||
makes SourceGen different from most other disassemblers, because instead
|
||||
of assuming everything is code and expecting the user to separate out the
|
||||
data, it assumes everything is data and asks the user to identify where the
|
||||
code starts executing.</p>
|
||||
|
||||
<p>SourceGen uses "code start points" to tag places where execution may
|
||||
begin. By default, the first byte of the file is marked as a start point.
|
||||
From there, the tracing process walks through the code, pursuing all
|
||||
branches. In many cases, if you tag all external entry points, SourceGen
|
||||
will automatically find all executable code and separate it from variable
|
||||
storage and data areas.</p>
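<p>The general idea of tracing from start points can be sketched with a
work list. This is a toy model, not SourceGen's analyzer: it knows only four
real 6502 opcodes, treats file offsets as addresses, and stops a path when it
meets anything else.</p>

```python
# LDA #imm, BNE rel, JMP abs, RTS
LENGTHS = {0xA9: 2, 0xD0: 2, 0x4C: 3, 0x60: 1}

def trace(data, start_points):
    """Return the set of offsets that belong to reachable instructions."""
    code = set()
    work = list(start_points)
    while work:
        off = work.pop()
        while 0 <= off < len(data) and off not in code:
            op = data[off]
            length = LENGTHS.get(op)
            if length is None or off + length > len(data):
                break                       # unknown opcode or truncated: stop this path
            code.update(range(off, off + length))
            if op == 0x60:                  # RTS: execution does not continue
                break
            if op == 0x4C:                  # JMP: follow the target only
                off = data[off + 1] | (data[off + 2] << 8)
                continue
            if op == 0xD0:                  # BNE: queue the target, then fall through
                rel = data[off + 1]
                work.append(off + 2 + (rel - 256 if rel >= 0x80 else rel))
            off += length
    return code

# LDA #$01 / JMP $0008 / "HI!" / LDA #$05 / RTS
blob = bytes([0xA9, 0x01, 0x4C, 0x08, 0x00, 0x48, 0x49, 0x21, 0xA9, 0x05, 0x60])
code = trace(blob, [0])
print(sorted(set(range(len(blob))) - code))  # offsets left as data
```

<p>Everything not reached from a start point stays classified as data, which
matches the "assume data, find code" approach described above.</p>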
|
||||
|
||||
<p>As noted earlier, tracking the processor status flags can make the
|
||||
analysis more accurate. Identifying situations where a branch instruction
|
||||
is always or never taken avoids mis-categorizing a data region as code.
|
||||
On the 65816, it's absolutely crucial to track the M/X flags, since those
|
||||
affect the width of instructions. SourceGen tracks the value of the
|
||||
processor flags at every instruction, blending sets of flags together when
|
||||
multiple paths of execution converge.</p>
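<p>One way to picture the blending step is with three-valued flags: known
clear, known set, or unknown. The representation below (0, 1, or
<code>None</code> in a dictionary) is an assumption made for illustration and
is not how SourceGen necessarily stores flag state.</p>

```python
def blend(a: dict, b: dict) -> dict:
    # A flag stays known only if both incoming paths agree on its value.
    return {name: (a[name] if a[name] == b[name] else None) for name in a}

path1 = {"C": 1, "Z": 0, "M": 1}
path2 = {"C": 0, "Z": 0, "M": 1}
print(blend(path1, path2))  # C becomes unknown; Z and M stay known
```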
|
||||
|
||||
<p>Once instructions and data have been separated, the instruction operands
|
||||
can be examined. Branches, loads, and stores that reference an address
|
||||
that falls inside the address space covered by the file can be replaced
|
||||
with a symbol. Operands that refer to addresses outside the file, such
|
||||
as ROM or operating system routines, can be replaced with a symbol defined
|
||||
by an equate directive.</p>
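<p>A rough sketch of that decision follows. The function and parameter names
are hypothetical, and the auto-label naming scheme ("L" plus the hex address)
is an assumption for the example.</p>

```python
def operand_symbol(addr, file_start, file_len, labels, equates):
    if file_start <= addr < file_start + file_len:
        # Inside the file: use the existing label or create an automatic one.
        return labels.setdefault(addr, f"L{addr:04X}")
    # Outside the file: use an equate if one is defined, else leave it numeric.
    return equates.get(addr, f"${addr:04x}")

labels = {0x1000: "START"}
equates = {0xFDED: "COUT"}
print(operand_symbol(0x1000, 0x1000, 0x100, labels, equates))
print(operand_symbol(0x1020, 0x1000, 0x100, labels, equates))
print(operand_symbol(0xFDED, 0x1000, 0x100, labels, equates))
```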
|
||||
|
||||
<p>(For more details on how this works, see the
|
||||
<a href="analysis.html">analysis appendix</a>.)</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,18 @@
|
||||
/*
|
||||
* Overall look and feel.
|
||||
*/
|
||||
body {
|
||||
font-family: Arial, Helvetica, sans-serif;
|
||||
padding: 0px;
|
||||
margin: 0px;
|
||||
}
|
||||
#content {
|
||||
/* top right bottom left */
|
||||
margin: 20px 10px 10px 10px;
|
||||
/*position: relative;*/
|
||||
}
|
||||
#footer {
|
||||
/* top right bottom left */
|
||||
margin: 20px 10px 10px 10px;
|
||||
/*position: relative;*/
|
||||
}
|
||||
@@ -0,0 +1,615 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Using SourceGen - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Using SourceGen</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="starting-new">Starting a New Project</a></h2>
|
||||
|
||||
<p>Select File > New, or if no project is open, click "Start new project".
|
||||
This opens the Create New Project window.</p>
|
||||
<p>Start by selecting your target system from the tree on the left.
|
||||
The panel on the right will show the CPU that will be selected, as well
|
||||
as the symbol files and extension scripts that will be loaded by default.
|
||||
All of these may be overridden later from the project properties.
|
||||
(If the description in the panel on the right says "[placeholder]", it
|
||||
means that the system doesn't yet have a set of symbols defined for it.)</p>
|
||||
|
||||
<p>Next, click the "Select File..." button. Pick the file you wish to
|
||||
disassemble. The dialog will update with the pathname and some notes
|
||||
about the file's size. Click "OK" if all looks good to create the
|
||||
project.</p>
|
||||
<p><strong>NOTE:</strong> Support for very large 65816 programs is
|
||||
incomplete. The maximum size for a data file is limited to 1 MiB.</p>
|
||||
|
||||
<p>The first time you save the project (with File > Save), you will be
|
||||
prompted for the project name. It's best to use the data file's name
|
||||
with ".dis65" added, so this will be set as the default. The data
|
||||
file's name is not stored in the project file, so if you pick a different
|
||||
name, or save the project in a different directory, you will have to
|
||||
select the data file manually whenever you open the project.</p>
|
||||
|
||||
|
||||
<h2><a name="opening">Opening an Existing Project</a></h2>
|
||||
|
||||
<p>Select File > Open, or if no project is open, click "Open
|
||||
existing project". Select the .dis65 project file from the standard
|
||||
file dialog.</p>
|
||||
<p>SourceGen will try to open a data file with the project's name,
|
||||
minus the ".dis65". If it can't find a file with that name, or if there's
|
||||
something wrong with it (e.g. the CRC doesn't match), you will be given
|
||||
the opportunity to specify the location of the data file to use.</p>
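<p>The default name lookup amounts to stripping the project suffix. A
minimal sketch, with a hypothetical function name and file name:</p>

```python
def default_data_path(project_path: str) -> str:
    suffix = ".dis65"
    if not project_path.endswith(suffix):
        raise ValueError("not a .dis65 project file")
    return project_path[: -len(suffix)]

print(default_data_path("GAME.BIN.dis65"))  # prints GAME.BIN
```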
|
||||
|
||||
<p>If non-fatal problems with the file are detected, a warning will be
|
||||
shown. If it's something simple, like a missing .sym65 or extension
|
||||
script file, you'll be notified. If it's something more complicated,
|
||||
e.g. the project has a comment on an offset that doesn't exist, you
|
||||
will be warned that the problematic data has been deleted, and will be
|
||||
lost if the project is saved. By default, such a project will be opened
|
||||
in read-only mode, though you can override this in the dialog. You will
|
||||
also be given the opportunity to simply cancel loading the project.</p>
|
||||
|
||||
<p>The locations of the last few projects you've worked with are saved
|
||||
in the application settings. You can access them from
|
||||
File > Recent Projects. If no project is open, links to the two
|
||||
most-recently-opened projects will be available.</p>
|
||||
|
||||
|
||||
<h2><a name="working">Working With a Project</a></h2>
|
||||
|
||||
<p>The main project window is divided into five areas:</p>
|
||||
<ol>
|
||||
<li>Center: the code list. If no project is open, this will instead
|
||||
have buttons to open a new or existing project.</li>
|
||||
<li>Top left: cross-reference list.</li>
|
||||
<li>Bottom left: notes list.</li>
|
||||
<li>Top right: symbols list.</li>
|
||||
<li>Bottom right: info on selected line.</li>
|
||||
</ol>
|
||||
|
||||
<p>Most actions are performed in the center code list. All of the
|
||||
sub-windows can be resized. The window sizes and column widths are
|
||||
saved in the application settings file.</p>
|
||||
|
||||
<p>A toolbar near the top of the screen has some shortcut buttons.
|
||||
If you hover your mouse over them, a tooltip with an explanation will
|
||||
appear.</p>
|
||||
|
||||
|
||||
<h3><a name="code-list">Code List</a></h3>
|
||||
|
||||
<p>The code list provides a view of the code being disassembled. Each
|
||||
line may be an instruction, data item, long comment, note, or
|
||||
assembler directive.</p>
|
||||
<p>The list is divided into columns:</p>
|
||||
<ul>
|
||||
<li><b>Offset</b>. The offset within the file where the instruction
|
||||
or data item starts. Throughout the UI, file offsets are shown as
|
||||
six-digit hex values with a leading '+'.</li>
|
||||
<li><b>Address</b>. The address where the assembled code will execute.
|
||||
For 8-bit CPUs this is shown as a 4-digit hex number; for 16-bit
|
||||
CPUs the bank is shown as well. Double-click on this field to open the
|
||||
<a href="editors.html#address">Edit Address</a> dialog.</li>
|
||||
<li><b>Bytes</b>. Shows up to four bytes from the data file that
|
||||
correspond to the instruction or data. To see the full dump of
|
||||
a longer item, such as an ASCII string, double-click on the field
|
||||
to open the
|
||||
<a href="tools.html#hexdump">Hex Dump Viewer</a>. This is
|
||||
a floating window, so you can keep it open while you work.
|
||||
Double-clicking in the bytes column while the window is open will
|
||||
update the viewer's position and selection.</li>
|
||||
<li><b>Flags</b>. This shows the state of the status flags as they
|
||||
are before the instruction is executed. Double-click on this
|
||||
field to open the
|
||||
<a href="editors.html#flags">Edit Status Flag Override</a> dialog.</li>
|
||||
<li><b>Attributes</b>. Some instructions and data items have
|
||||
interesting attributes.
|
||||
'@' indicates an entry point,
|
||||
'T' means one or more bytes have an analyzer tag (code start/stop/skip),
|
||||
'#' means execution will not continue to the following instruction,
|
||||
'>' is shown for branch targets, and
|
||||
'!' appears when a conditional branch is never taken.
|
||||
(This column is rarely useful and can be hidden.)</li>
|
||||
<li><b>Label</b>. If a label has been defined for this offset, by
|
||||
the user or generated automatically, it will appear here. Also,
|
||||
full-line items like long comments and notes will start in this
|
||||
field. Double-click on this field to open the
|
||||
<a href="editors.html#label">Edit Label</a> dialog.</li>
|
||||
<li><b>Opcode</b>. The instruction or pseudo-opcode mnemonic.
|
||||
If an instruction is embedded inside this one, a ▼ symbol
|
||||
will appear.
|
||||
If you double-click this field for an instruction or data item
|
||||
whose operand refers to an address in the file, the selection will
|
||||
jump to that location. If the operand is a local variable, the
|
||||
selection will jump to the point where the variable was defined.</li>
|
||||
<li><b>Operand</b>. The instruction or data operand. Data operands
|
||||
may span a large number of bytes. Double-click on this field to
|
||||
open the
|
||||
<a href="editors.html#instruction-operand">Edit Instruction Operand</a>
|
||||
or <a href="editors.html#data-operand">Edit Data Operand</a> dialog, as
|
||||
appropriate. (Note you can shift-double-click on data items to
|
||||
edit multiple lines.)</li>
|
||||
<li><b>Comment</b>. End-of-line comment, generally shown with a ';'
|
||||
prefix. If enabled, cycle counts will appear here. Double-click
|
||||
on this field to open the
|
||||
<a href="editors.html#comment">Edit Comment</a> dialog.</li>
|
||||
</ul>
|
||||
|
||||
<p>Double-clicking anywhere on a line with a note or long comment will
|
||||
open the
|
||||
<a href="editors.html#note">Edit Note</a> or
|
||||
<a href="editors.html#long-comment">Edit Long Comment</a> dialog,
|
||||
respectively.</p>
|
||||
|
||||
<p>The code list is a standard Windows list view. You can left-click
|
||||
to select an item, ctrl-left-click to toggle individual items on and
|
||||
off, and shift-left-click to select a range. You can select all lines
|
||||
with Edit > Select All. Resize columns by
|
||||
left-clicking on the divider in the header and dragging it.</p>
|
||||
<p>Selecting any part of a multi-line item, such as a long comment
|
||||
or character string, effectively selects the entire item.</p>
|
||||
|
||||
<p>Right-clicking opens a menu. The contents are the same as those in
|
||||
the Actions menu item in the menu bar. The set of options that are
|
||||
enabled will depend on what you have selected in the main window.</p>
|
||||
<ul>
|
||||
<li><a href="editors.html#address">Set Address</a>. Sets the
|
||||
target address at that offset. When multiple lines are selected,
|
||||
the target addresses at the start and end of the range are set.
|
||||
Enabled when the first line selected is code, data, or an address
|
||||
override, and the full selected range does not overlap with another
|
||||
address override.</li>
|
||||
<li><a href="editors.html#flags">Override Status Flags</a>. Changes
|
||||
the status flags at that offset. Enabled when a single instruction
|
||||
line is selected.</li>
|
||||
<li><a href="editors.html#label">Edit Label</a>. Sets the label
|
||||
at that offset. Enabled when a single instruction or data line is
|
||||
selected.</li>
|
||||
<li><a href="editors.html#instruction-operand">Edit Operand</a>. Opens the
|
||||
Edit Instruction Operand or Edit Data Operand window, depending on
|
||||
what's selected.
|
||||
Enabled when a single instruction line is selected, or when one
|
||||
or more data lines are selected.</li>
|
||||
<li><a href="editors.html#comment">Edit Comment</a>. Sets the
|
||||
comment at that offset. Enabled when a single instruction or data
|
||||
line is selected.</li>
|
||||
<li><a href="editors.html#long-comment">Edit Long Comment</a>. Sets
|
||||
the long comment at that offset. Enabled when a single instruction
|
||||
or data line, or an existing long comment, is selected.</li>
|
||||
<li><a href="editors.html#note">Edit Note</a>. Sets the note at
|
||||
that offset. Enabled when a single instruction or data line, or
|
||||
an existing note, is selected.</li>
|
||||
<li><a href="editors.html#project-symbol">Edit Project Symbol</a>.
|
||||
Sets the name, value, and comment of the project symbol. Enabled
|
||||
when a single equate directive, generated from a project symbol, is
|
||||
selected.</li>
|
||||
<li><a href="editors.html#lvtable">Create Local Variable Table</a>.
|
||||
Create a new local variable table.</li>
|
||||
<li><a href="editors.html#lvtable">Edit Prior Local Variable Table</a>.
|
||||
Modify or delete entries in the most recently defined local
|
||||
variable table.</li>
|
||||
<li><a href="visualization.html#vis-and-sets">Create/Edit Visualization Set</a>.
|
||||
Create a new visualization set or edit an existing set.</li>
|
||||
|
||||
<li><a href="#atags">Analyzer Tags</a> (Tag Address As Code Start Point,
|
||||
Tag Address As Code Stop Point, Tag Bytes As Inline Data,
|
||||
Remove Analyzer Tags).
|
||||
Enabled when one or more code or data lines are selected. Remove
|
||||
Analyzer Tags is only enabled when at least one line has tags. The
|
||||
keyboard shortcuts are two-key combinations.</li>
|
||||
|
||||
<li><a href="#address-table">Format Address Table</a>. Formats
|
||||
a series of bytes as parts of a table of addresses.</li>
|
||||
<li><a href="#toggle-single">Toggle Single-Byte Format</a>. Toggles
|
||||
a range of lines between default format and single-byte format. Enabled
|
||||
when one or more data lines are selected.</li>
|
||||
<li><a href="#format-as-word">Format As Word</a>. Formats two bytes as
|
||||
a 16-bit little-endian word.</li>
|
||||
<li>Delete Note / Long Comment. Deletes the selected note or long
|
||||
comment. Enabled when a single note or long comment is selected.</li>
|
||||
<li><a href="tools.html#hexdump">Show Hex Dump</a>. Opens the hex dump
|
||||
viewer, with the current selection highlighted. Always enabled. If
|
||||
nothing is selected, the viewer will open at the top of the file.</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="undo">Undo & Redo</a></h3>
|
||||
|
||||
<p>You can undo a change with Edit > Undo, or Ctrl+Z. You can redo a
|
||||
change with Edit > Redo, Ctrl+Y, or Ctrl+Shift+Z.</p>
|
||||
<p>All changes to the project, including changes to the project properties,
|
||||
are added to the undo/redo buffer. This has no fixed size limit, so no
|
||||
matter how much you change, you can always undo back to the point where
|
||||
the project was opened.</p>
|
||||
<p>The undo history is not saved as part of the project. Closing a project
|
||||
clears it.</p>
|
||||
|
||||
|
||||
<h3><a name="references">References Window</a></h3>
|
||||
|
||||
<p>When a single instruction or data line is selected in the main window,
|
||||
all references to that offset will be shown in the References window.
|
||||
For each reference, the file offset, address, and some details about the
|
||||
type of reference will be shown.</p>
|
||||
|
||||
<p>The reference type indicates whether the origin is an instruction or
|
||||
data operand, and provides an indication of the nature of the reference:</p>
|
||||
<ul>
|
||||
<li>call - subroutine call
|
||||
(e.g. <code>JSR addr</code>, <code>JSL addr</code>)</li>
|
||||
<li>branch - conditional or unconditional branch
|
||||
(e.g. <code>JMP addr</code>, <code>BCC addr</code>)</li>
|
||||
<li>read - read from memory
|
||||
(e.g. <code>LDA addr</code>, <code>BIT addr</code>)</li>
|
||||
<li>write - write to memory
|
||||
(e.g. <code>STA addr</code>)</li>
|
||||
<li>rmw - read-modify-write
|
||||
(e.g. <code>LSR addr</code>, <code>TSB addr</code>)</li>
|
||||
<li>ref - reference to address by instruction
|
||||
(e.g. <code>LDA #<addr</code>, <code>PEA addr</code>)</li>
|
||||
<li>data - reference to address by data
|
||||
(e.g. <code>.DD2 addr</code>)</li>
|
||||
</ul>
|
||||
<p>References from instructions that use indexed addressing
|
||||
(e.g. <code>LDA addr,Y</code>) will also show "idx" to indicate that
|
||||
the instruction is using the location as a base address.</p>
|
||||
<p>References from instructions that treat the address as a pointer
|
||||
(e.g. <code>LDA (dp),Y</code>) will show "ptr". This makes it easy
|
||||
to distinguish the locations that are reading or writing through the
|
||||
pointer from those that are reading or writing the pointer itself.</p>
|
||||
<p>The reference type will be prefixed with "Sym" or "Oth" to indicate whether or not
|
||||
the reference used the label at the current address. To understand
|
||||
this, consider that addresses can be referenced in different ways.
|
||||
For example:</p>
|
||||
<pre>
|
||||
LDA DATA0
|
||||
LDX DATA0+1
|
||||
RTS
|
||||
DATA0 .DD1 $80
|
||||
DATA1 .DD2 $90
|
||||
</pre>
|
||||
<p>Both <code>DATA0</code> and <code>DATA1</code> are accessed, but
|
||||
both operands used <code>DATA0</code>. When the <code>DATA0</code> line
|
||||
is selected in the code list, the references window will show the
|
||||
<code>LDA</code> and <code>LDX</code> instructions, because both
|
||||
instructions referenced it. When <code>DATA1</code> is selected, the
|
||||
references window will show the <code>LDX</code>, because that
|
||||
instruction accessed <code>DATA1</code>'s location even though it didn't
|
||||
use the symbol. To make the difference clear, the lines in the references
|
||||
window will either show "Sym" (to indicate that the symbol at the selected
|
||||
line was referenced) or "Oth" (to indicate that some other symbol, or no
|
||||
symbol, was used).</p>
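<p>The rule can be modeled in a few lines. This sketch reflects one reading
of the description above: an operand is listed if it either uses the selected
symbol or resolves to the selected address. The addresses and data structures
are invented for the example.</p>

```python
# Addresses are assumed; DATA0 is one byte long, so DATA1 follows it directly.
ADDR_OF = {"DATA0": 0x1005, "DATA1": 0x1006}

# (mnemonic, symbol used in the operand, adjustment)
OPERANDS = [("LDA", "DATA0", 0), ("LDX", "DATA0", 1)]

def references_to(selected):
    target = ADDR_OF[selected]
    refs = []
    for mnemonic, symbol, adjust in OPERANDS:
        uses_symbol = symbol == selected
        hits_address = ADDR_OF[symbol] + adjust == target
        if uses_symbol or hits_address:
            refs.append((mnemonic, "Sym" if uses_symbol else "Oth"))
    return refs

print(references_to("DATA0"))
print(references_to("DATA1"))
```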
|
||||
|
||||
<p>When an equate directive (generated for platform and project
|
||||
symbols) or local variable assignment is selected, the References
|
||||
window will show all references to that symbol. Unlike in-file
|
||||
references, only the uses of that symbol are shown. So if you have
|
||||
both a project symbol and a local variable for address $30, they
|
||||
will show disjoint sets of references. Furthermore, if you explicitly
|
||||
format an instruction operand as hex, e.g. <code>LDA $30</code>, it will
|
||||
not appear in either set because it's not a symbolic reference.</p>
|
||||
<p>The cross-reference data is used to generate the set of equate
|
||||
directives at the top of the listing. If nothing references a platform
|
||||
or project symbol, an equate directive will not be generated for it.</p>
|
||||
|
||||
<p>Double-clicking on a reference moves the code list selection to that
|
||||
reference, and adds the previous selection to the navigation stack.</p>
|
||||
|
||||
|
||||
<h3><a name="notes">Notes Window</a></h3>
|
||||
|
||||
<p>When you add a note, it will also be added to this window.
|
||||
Double-clicking on a note will jump directly to it, and add the previous
|
||||
selection to the navigation stack. This makes notes useful as bookmarks.</p>
|
||||
|
||||
|
||||
<h3><a name="symbols">Symbols Window</a></h3>
|
||||
|
||||
<p>All known <a href="intro-details.html#about-symbols">symbols</a> are shown
|
||||
here. The filter buttons allow you to screen out symbols you're not
|
||||
interested in, such as platform symbols or constants.</p>
|
||||
|
||||
<p>Clicking on one of the column headers will sort the list on that
|
||||
field. Click a second time to reverse the sort direction.</p>
|
||||
|
||||
<p>Double-clicking on an auto or user label will jump to that label, and
|
||||
add the previous selection to the navigation stack. This can be a handy
|
||||
way to move around the file, jumping from label to label.</p>
|
||||
|
||||
<p>The "type" column uses a two-letter code to identify the symbol's
|
||||
type and scope. The first letter is one of A (auto), U (user),
|
||||
P (platform), J (project), R (pre-label), or V (variable).
|
||||
The second letter is one of N (non-unique local), L (local), G (global),
|
||||
X (exported), E (external), or C (constant).</p>
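<p>For reference, the two-letter codes can be expanded with a pair of lookup
tables. The tables below simply restate the letters listed above;
<code>describe_type</code> is a hypothetical helper.</p>

```python
SOURCE = {"A": "auto", "U": "user", "P": "platform",
          "J": "project", "R": "pre-label", "V": "variable"}
SCOPE = {"N": "non-unique local", "L": "local", "G": "global",
         "X": "exported", "E": "external", "C": "constant"}

def describe_type(code: str) -> str:
    return f"{SOURCE[code[0]]}, {SCOPE[code[1]]}"

print(describe_type("PE"))  # prints platform, external
print(describe_type("UG"))  # prints user, global
```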
|
||||
|
||||
|
||||
<h3><a name="info">Info Window</a></h3>
|
||||
|
||||
<p>Some additional information about the currently-selected line is
|
||||
shown, such as the formatting applied to the operand. If the operand
|
||||
has a default format, any automatically-generated format will be noted.
|
||||
For an instruction,
|
||||
a summary is shown that includes the cycle count, flags affected, and a
|
||||
brief description of what the instruction does. The latter can be
|
||||
especially handy for undocumented instructions.</p>
|
||||
|
||||
|
||||
<h3><a name="messages">Messages Window</a></h3>
|
||||
|
||||
<p>Sometimes a change will invalidate an earlier change. For example,
|
||||
suppose you add a code stop point, and format the data that follows
|
||||
as a string. Later on you change it to a code start point. You now have
|
||||
a block of executable code with a string format record sitting in the
|
||||
middle of it. SourceGen tries very hard not to throw away anything
|
||||
you've done, but it will ignore anything invalid.</p>
|
||||
<p>If a problem like this is encountered, an entry is added to a list
|
||||
of messages displayed at the bottom of the main window. Each entry identifies
|
||||
the nature of the problem, the severity of the problem, the offset where
|
||||
it occurred, and what was done to resolve it. The problem categories
|
||||
include:</p>
|
||||
<ul>
|
||||
<li>Hidden label: a label placed on code or data is now stuck in the
|
||||
middle of a multi-byte instruction or data item.</li>
|
||||
<li>Unresolved weak ref: a reference to a non-existent symbol was found.</li>
|
||||
<li>Invalid offset or length: the offset or length in a format object
|
||||
had an invalid value.</li>
|
||||
<li>Invalid descriptor: the format descriptor is inappropriate,
|
||||
e.g. formatting an instruction as a string.</li>
|
||||
</ul>
|
||||
<p>The "context" column will provide additional detail about the problem,
|
||||
and the "resolution" column will indicate how it's being handled. In most
|
||||
cases, the offending item will be ignored.</p>
|
||||
<p>Double-clicking on an entry will jump to that offset.</p>
|
||||
<p>The message list will not appear if there are no messages. You can
|
||||
hide the list by clicking on the "Hide" button to the left of the messages.
|
||||
Un-hide the list by clicking on the "N messages" button at the bottom-right
|
||||
corner of the application window.</p>
|
||||
|
||||
|
||||
<h3><a name="navigation">Navigation</a></h3>
|
||||
|
||||
<p>The simplest way to move through the code list is with the scroll wheel
|
||||
on your mouse, or by left-clicking and dragging the scroll bar. You
|
||||
can also use PgUp/PgDn and the arrow keys.</p>
|
||||
|
||||
<p>Use Navigate > Find to search for text. This performs a case-insensitive
|
||||
text search on the label, opcode, operand, and comment fields.
|
||||
Use Navigate > Find Next to find the next match, and
|
||||
Navigate > Find Previous to find the previous match. Note that "next" is
|
||||
always downward, and "previous" is always upward, regardless of the
|
||||
direction of the initial search chosen in the Find dialog.</p>
|
||||
|
||||
<p>Use Navigate > Go To to jump to an offset, address, or label. Remember
|
||||
that offsets and addresses are always hexadecimal, and offsets start
|
||||
with a '+'. If you have a label that is also a valid hexadecimal
|
||||
address, like "FEED", the label takes precedence. To jump to the address
|
||||
write "$FEED" instead. If you enter a non-unique label, the selection
|
||||
will jump to the nearest instance.</p>
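<p>As a rough illustration of those rules, the lookup order can be modeled
in a few lines of Python. This is only a sketch: <code>parse_goto</code>
and its return values are hypothetical names, not part of SourceGen, and it
assumes the '$' prefix is optional on addresses.</p>

```python
import re

def parse_goto(text, labels):
    """Classify Go To input as (kind, value), or None if it is not valid."""
    text = text.strip()
    if text in labels:                      # a matching label wins over bare hex
        return ("label", text)
    if text.startswith("+"):                # '+' marks a file offset
        kind, digits = "offset", text[1:]
    else:                                   # an optional '$' forces an address
        kind, digits = "address", text[1:] if text.startswith("$") else text
    if re.fullmatch(r"[0-9A-Fa-f]+", digits):
        return (kind, int(digits, 16))      # offsets and addresses are hex
    return None
```

<p>With a label named "FEED" defined, <code>parse_goto("FEED", labels)</code>
yields the label, while <code>parse_goto("$FEED", labels)</code> yields the
address.</p>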
|
||||
|
||||
<p>If an instruction or data line has an operand that references an address
|
||||
in the file, you can navigate to the operand's location with
|
||||
Navigate > Jump to Operand. You can also do this by double-clicking
|
||||
in the opcode column.</p>
|
||||
|
||||
<p>When you edit something, lines throughout the listing can change. This
|
||||
is different from a source code editor, where editing a line just changes
|
||||
that line. To allow you to watch the effects changes have, the undo/redo
|
||||
commands try to keep the listing in the same position.
|
||||
If you want to go to the place where the last change (i.e. the change
|
||||
that will be undone by the next Undo operation) was made,
|
||||
Navigate > Go to Last Change will jump to the first offset
|
||||
associated with the most recent change.
|
||||
If the last change was to the project properties, it will jump to the
|
||||
first offset in the file.</p>
|
||||
|
||||
<p>When you jump around, e.g. by double-clicking on an opcode or an entry
|
||||
in one of the side windows, the previously-selected line is added to
|
||||
a navigation stack. You can use Navigate > Nav Forward and
|
||||
Navigate > Nav Backward to move forward and backward through the
|
||||
stack. (The curly arrows on the left side of the toolbar may be more
|
||||
convenient. You can use Alt+Left/Right Arrow, or
|
||||
Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)</p>
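<p>The navigation stack behaves much like a web browser's history. The
Python sketch below models that idea; the class and method names are
hypothetical, and clearing the forward entries on a new jump is an
assumption based on how browser history usually works, not something this
manual states.</p>

```python
class NavStack:
    """Browser-style history of visited offsets (illustrative model only)."""

    def __init__(self):
        self.back, self.forward = [], []

    def jump(self, current, target):
        self.back.append(current)    # remember the previously-selected line
        self.forward.clear()         # assumption: a new jump drops the forward path
        return target

    def nav_backward(self, current):
        if not self.back:            # nothing to go back to
            return current
        self.forward.append(current)
        return self.back.pop()

    def nav_forward(self, current):
        if not self.forward:         # nothing to go forward to
            return current
        self.back.append(current)
        return self.forward.pop()
```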
|
||||
|
||||
|
||||
<h3><a name="atags">Adding and Removing Analyzer Tags</a></h3>
|
||||
|
||||
<p><i>(Note: These were referred to as code/data "hints" in older
|
||||
versions of SourceGen.)</i></p>
|
||||
|
||||
<p>To set code start or stop points, select the desired offsets and
|
||||
use Actions > Tag Address As Code Start Point (or Stop Point). Because
|
||||
these indicate a transition between code and data regions, there is rarely
|
||||
any need to tag multiple consecutive bytes.
|
||||
For this reason, only the first byte on each selected line will be tagged.</p>
|
||||
|
||||
<p>For inline data, you need to cover the entire range, so every byte in every
|
||||
selected line is tagged when you select Tag Bytes As Inline Data. Similarly,
|
||||
the Remove Analyzer Tags menu item will remove tags from every byte.</p>
|
||||
|
||||
<p>If you're having a hard time selecting just the right bytes because
|
||||
the instructions are caught up in a multi-byte data item, such as an
|
||||
auto-detected character string, you can disable uncategorized data analysis
|
||||
(the thing that creates the .STR and .FILL ops for you). You can do this
|
||||
from the
|
||||
<a href="settings.html#project-properties">project properties</a> editor,
|
||||
or simply by hitting Ctrl+D. Hit that, tag the byte or bytes, then hit it
|
||||
again to re-enable the string &amp; fill analyzer.</p>
|
||||
<p>Another approach is to use the "Toggle Single-Byte Format"
|
||||
menu item to "flatten" the item.</p>
|
||||
|
||||
|
||||
<h3><a name="address-table">Format Address Table</a></h3>
|
||||
|
||||
<p>Tables of addresses are fairly common. Sometimes you'll find them as a
|
||||
series of 16-bit words, like this:</p>
|
||||
<pre>
jmptab  .dd2    func1
        .dd2    func2
        .dd2    func3
</pre>
|
||||
|
||||
<p>While that's fairly common in 16-bit software, 8-bit software often splits
|
||||
the high and low bytes into separate arrays, like this:</p>
|
||||
<pre>
jmptabl .dd1    &lt;func1
        .dd1    &lt;func2
        .dd1    &lt;func3
jmptabh .dd1    &gt;func1
        .dd1    &gt;func2
        .dd1    &gt;func3
</pre>
|
||||
|
||||
<p>Sometimes the tables contain <code>address - 1</code>, because the
|
||||
values are to be pushed onto the stack for an RTS call.</p>
|
||||
|
||||
<p>While the .dd2 case is easy to format with the data operand editor,
|
||||
formatting addresses whose components are split into multiple tables can
|
||||
be tedious. Even in the easy case, you may want to create labels and set
|
||||
code start points for each item.</p>
|
||||
|
||||
<p>The Address Table Formatter helps you associate symbols with the
|
||||
addresses in the table. It works for simple and "split" tables.</p>
|
||||
<p>To use it, start by selecting the entire table. In the examples above,
|
||||
you would select all 6 bytes. The number of bytes in each part of a
|
||||
split table must be equal: here, it's 3 low bytes, followed by 3 high
|
||||
bytes. If the number of bytes selected can't be evenly divided by the
|
||||
number of parts -- two parts for 16-bit data, three parts for 24-bit data --
|
||||
the formatter will report an error.</p>
|
||||
<p>With the data selected, open the format dialog with
|
||||
Actions > Format Split-Address Table. The rather complicated dialog
|
||||
is split into sections.</p>
|
||||
<ul>
|
||||
<li>Address Characteristics: select whether the table has 16-bit
|
||||
addresses or 24-bit addresses. (24-bit addresses are disabled if you
|
||||
don't have the CPU set to 65816.) If the table is split into individual
|
||||
sub-tables for low bytes and high bytes, check the "Parts are split
|
||||
across sub-tables" box. If the address parts are being pushed
|
||||
on the stack for an RTS/RTL, check the "Adjusted for RTS/RTL" box to
|
||||
adjust them by 1.</li>
|
||||
<li>Low Byte Source: indicate which part of the table or word holds the
|
||||
low bytes. For common little-endian words, the low bytes come first. In
|
||||
the split-table example above, the low bytes came first, followed by the
|
||||
high bytes, so you would select "first part of selection". If they were
|
||||
stored the other way around, you would click "second part" instead.</li>
|
||||
<li>High Byte Source: indicate which part of the table or word holds
|
||||
the high bytes. For a 16-bit address this will be the part you didn't
|
||||
pick for the low bytes.
|
||||
Sometimes, if all addresses land on the same 256-byte page, the high byte
|
||||
will be a constant in the code, and only the low bytes will be stored in
|
||||
a table. If that's the case, select "Constant", and enter the high byte
|
||||
in the text box. (Decimal, hex, and binary are accepted.)</li>
|
||||
<li>Bank Byte Source: for 24-bit addresses, you can select "Nth part of
|
||||
selection", which will just use whichever part you didn't specify for
|
||||
the low and high bytes. If the table holds 16-bit addresses, you can
|
||||
use the "Constant" field to specify the data bank.</li>
|
||||
<li>Options: if the table holds the addresses of executable code, check
|
||||
the "Tag targets as code start points" box. If the target address
|
||||
hasn't been identified by the code analyzer through some other execution
|
||||
path, it will be tagged as a code start point.</li>
|
||||
<li>Generated Addresses: this shows the full list of addresses that are
|
||||
generated with the current set of parameters. Each address is shown with
|
||||
a file offset and a symbol. If the address can't be mapped within the
|
||||
file, the offset is shown as dashes instead. If the address can be
|
||||
mapped, and it already has a user-specified label, the label will be
|
||||
shown. If no label was found, the table will show "(+)", indicating
|
||||
that a permanent label will be added at the target offset. If everything
|
||||
is set up correctly, and the addresses fall entirely within the program,
|
||||
you shouldn't see any unknown entries here.</li>
|
||||
</ul>
|
||||
|
||||
<p>For a 16-bit address, you have three choices: low byte first, high byte
|
||||
first, or low byte only with a constant high byte. For a 24-bit address
|
||||
the set of possibilities expands, but is essentially the same: pick the
|
||||
order in which things appear, using fixed constants if desired.</p>
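<p>To make the 16-bit choices concrete, here is a small Python sketch of how
addresses could be rebuilt from a split table. It is not SourceGen's code:
the function name and parameters are invented for illustration, 24-bit
tables are left out, and wrapping the RTS adjustment at 64KB is an
assumption.</p>

```python
def split_table_addresses(data, rts_adjusted=False, low_first=True,
                          const_high=None):
    """Rebuild 16-bit addresses from a split table (illustrative only)."""
    if const_high is not None:              # only the low bytes are stored
        lows, highs = data, bytes([const_high]) * len(data)
    else:
        if len(data) % 2 != 0:
            raise ValueError("selection can't be divided into two parts")
        half = len(data) // 2
        first, second = data[:half], data[half:]
        lows, highs = (first, second) if low_first else (second, first)
    adjust = 1 if rts_adjusted else 0       # RTS tables hold address - 1
    return [((hi * 256 + lo) + adjust) % 0x10000
            for lo, hi in zip(lows, highs)]
```

<p>For example, the six bytes <code>00 34 ff 10 12 20</code> with the low
bytes first produce $1000, $1234, and $20ff.</p>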
|
||||
|
||||
<p>A message at the top of the screen shows how many bytes are selected.
|
||||
It also tells you how many groups there are, but unlike the data operand
|
||||
formatter, the split-address table formatter doesn't care about group
|
||||
boundaries. For this reason, tables do not have to be contiguous in
|
||||
memory. The low bytes and high bytes could be on separate 256-byte
|
||||
pages. You just need to have all of the data selected.</p>
|
||||
|
||||
<p>It should be mentioned that SourceGen does not record the fact that the
|
||||
data in question is part of a table. The formatting, labels, and code
|
||||
start point tags are applied as if you entered them all individually by
|
||||
hand. The formatter is just significantly more convenient. It also
|
||||
does everything as a single undoable action, so if it comes out looking
|
||||
wrong, just hit "undo" and try something else.</p>
|
||||
|
||||
|
||||
<h3><a name="toggle-single">Toggle Single-Byte Format</a></h3>
|
||||
|
||||
<p>The "Toggle Single-Byte Format" feature provides a quick way to
|
||||
change a range of bytes to single bytes
|
||||
or back to their default format. It's equivalent to opening the Edit
|
||||
Data Operand dialog and selecting "Single bytes" displayed as hex, or
|
||||
selecting "Default".</p>
|
||||
<p>This can be handy if the default format for a range of bytes is a
|
||||
string, but you want to see it as bytes or set a label in the middle.</p>
|
||||
|
||||
|
||||
<h3><a name="format-as-word">Format As Word</a></h3>
|
||||
|
||||
<p>This is a quick way to format pairs of bytes as 16-bit words. It's
|
||||
equivalent to opening the Edit Data Operand dialog and selecting
|
||||
"16-bit words, little-endian", displayed as hex.</p>
|
||||
|
||||
<p>To avoid some confusing situations, it only works on sets of
|
||||
single-byte values. This means, for example, that you can't select a
|
||||
10-byte string and have it turn into five 16-bit words. You can select as
|
||||
many bytes as you want, but they must come in pairs. (Remember that you
|
||||
can turn off auto-generation of strings and .FILLs with
|
||||
<a href="#toggle-data">Toggle Data Scan</a>.)</p>
|
||||
<p>As a special case, if you select a single byte, the following byte will
|
||||
also be selected. This won't work if the following byte is part of a
|
||||
multi-byte data item, is the start of a new region (see
|
||||
<a href="editors.html#data-operand">Edit Data Operand</a> for a definition of
|
||||
what splits a region), or is the last byte in the file.</p>
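<p>The pairing rule is simple enough to show directly. This Python sketch
(with a hypothetical function name) groups single bytes into little-endian
16-bit values and rejects an odd count, mirroring the "must come in pairs"
requirement above.</p>

```python
def as_words(data):
    """Group pairs of bytes into little-endian 16-bit values."""
    if len(data) % 2 != 0:
        raise ValueError("bytes must come in pairs")
    # The first byte of each pair is the low byte, the second is the high byte.
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]
```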
|
||||
|
||||
|
||||
<h3><a name="toggle-data">Toggle Data Scan</a></h3>
|
||||
|
||||
<p>This menu item is in the Edit menu, and acts as a shortcut to opening
|
||||
the Project Properties editor, and clicking on the "Analyze Uncategorized
|
||||
Data" checkbox. When enabled, SourceGen will look for character strings and
|
||||
regions of identical bytes, and generate .STR and .FILL directives. When
|
||||
disabled, uncategorized data is presented as one byte per line, which can
|
||||
be handy if you're trying to get at a byte in the middle of a string.</p>
|
||||
<p>As with all other project property changes, this is an undoable
|
||||
action.</p>
|
||||
|
||||
|
||||
<h3><a name="clipboard">Copying to Clipboard</a></h3>
|
||||
|
||||
<p>When you use Edit > Copy, all lines selected in the code list are
|
||||
copied to the system clipboard. This can be a convenient way to post
|
||||
code snippets into forum postings or documentation. The text is
|
||||
copied from the data shown on screen, so your chosen capitalization
|
||||
and pseudo-ops will appear in the copy.</p>
|
||||
<p>Long comments are included, but notes are not.</p>
|
||||
<p>By default, only the label, opcode, operand, and comment fields are
|
||||
included. From the
|
||||
<a href="settings.html#app-settings">app settings</a> dialog you can select
|
||||
alternative formats that include additional columns.</p>
|
||||
|
||||
<p>A copy of all of the fields is also written to the clipboard in CSV
|
||||
format. If you have a spreadsheet like Excel, you can use Paste Special
|
||||
to put the data into individual cells.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,393 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Properties &amp; Settings - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Properties &amp; Settings</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="overview">Settings Overview</a></h2>
|
||||
|
||||
<p>There are two kinds of settings: application settings, and
|
||||
project properties.</p>
|
||||
|
||||
|
||||
<h2><a name="app-settings">Application Settings</a></h2>
|
||||
|
||||
<p>Application settings are stored in a file called "SourceGen-settings"
|
||||
in the SourceGen installation directory. If the file is missing or
|
||||
corrupted, default settings will be used. These settings are local
|
||||
to your system, and include everything from window sizes to whether or not
|
||||
you prefer hexadecimal values to be shown in upper case. None of them
|
||||
affect the way the project analyzes code and data, though they may affect
|
||||
the way generated assembly sources look.</p>
|
||||
|
||||
<p>The settings editor is divided into five tabs. Changes don't take
|
||||
effect until you hit Apply or OK.</p>
|
||||
|
||||
|
||||
<h3><a name="appset-codeview">Code View</a></h3>
|
||||
|
||||
<p>These settings change the way the code looks on screen.</p>
|
||||
|
||||
<p>Click the Column Visibility buttons to hide columns. Click them
|
||||
again to restore the column to a width appropriate for the current font.
|
||||
A "hidden" column just has a width of zero, so with careful mouse
|
||||
positioning you can show and hide columns by dragging the column headers.
|
||||
The buttons may be more convenient though.</p>
|
||||
|
||||
<p>You can select a different font for the code list, and make it as large
|
||||
or as small as you want. Mono-space fonts like Courier or Consolas are
|
||||
recommended (and will be the only ones shown).</p>
|
||||
|
||||
<p>You can choose to show different parts of the display in upper or
|
||||
lower case, using the "all lower" and "all upper" buttons as a quick way
|
||||
to set all values. These settings are also used for generated assembly
|
||||
code, unless the assembler has specific case-sensitivity requirements. There
|
||||
is no setting for labels, which are always case-sensitive.</p>
|
||||
|
||||
<p>The Clipboard drop-down list lets you choose the format for text
|
||||
<a href="mainwin.html#clipboard">copied to the clipboard</a>. The
|
||||
"Assembler Source" format includes the rightmost columns (label,
|
||||
opcode, operand, and comment), like assembly source code does. The
|
||||
"Disassembly" format adds the address and bytes on the left. Use
|
||||
the "All Columns" format to get all columns.</p>
|
||||
|
||||
<p>When "show cycle counts for instructions" is checked, every instruction
|
||||
line will have an end-of-line comment that indicates the number of cycles
|
||||
required for that instruction. If the cycle count can't be determined
|
||||
solely from a static analysis, e.g. an extra cycle is required if
|
||||
<code>LDA (dp),Y</code> crosses a page boundary, a '+' will be shown.
|
||||
In some cases the variability can be factored out if the state of
|
||||
certain status flags is known, e.g. 65C02 instructions that take longer
|
||||
in decimal mode won't be shown as variable if the analyzer can determine
|
||||
that D=0 or D=1. This checkbox enables display in the on-screen list, but
|
||||
does not affect generated source code, which can be configured independently
|
||||
on the Asm Config tab.</p>
|
||||
|
||||
<p>Check "use 'dark' color scheme" to change the main disassembly list
|
||||
to use white text on a black background, and mute the Note highlight
|
||||
colors.
|
||||
(Most of the GUI uses standard Windows controls that take their colors
|
||||
from the system theme, but the disassembly list uses a custom style. You
|
||||
can change the rest of the UI from the Windows display "personalization"
|
||||
controls.)</p>
|
||||
|
||||
|
||||
<h3><a name="appset-textdelim">Text Delimiters</a></h3>
|
||||
|
||||
<p>Character and string operands are shown surrounded by quotes, e.g.
|
||||
<code>LDA #'*'</code> or <code>.STR "Hello, world!"</code>. It's
|
||||
handy to be able to tell at a glance how characters are encoded, so
|
||||
SourceGen allows you to set the delimiters independently for every
|
||||
supported character encoding.</p>
|
||||
<p>String operands may contain a mixture of text and hexadecimal values.
|
||||
For example, in ASCII data, the control characters for linefeed and
|
||||
carriage return ($0a and $0d) are considered part of the string, but
|
||||
don't have a printable symbol. (Unicode defines some glyphs, but they
|
||||
don't look very good at smaller font sizes.)</p>
|
||||
<p>If one of the delimiter characters appears in the string itself,
|
||||
the character will be output as hex to avoid confusion. For this
|
||||
reason, it's generally wise to use delimiter characters that aren't
|
||||
part of the ASCII character set. The "Sample Characters" box holds some
|
||||
characters that you can copy and paste (with Ctrl+C / Ctrl+V) into the
|
||||
delimiter fields.</p>
|
||||
<p>For character operands, the prefix and suffix are added to the start
|
||||
and end of the operand. For string operands, the prefix is added to the
|
||||
start of the first line, and suffixes aren't allowed.</p>
|
||||
<p>These options change the way the code list looks on screen. They
|
||||
do not affect generated code, which must use the delimiter characters
|
||||
specified by the chosen assembler.</p>
|
||||
|
||||
|
||||
<h3><a name="appset-displayformat">Display Format</a></h3>
|
||||
|
||||
<p>These options change the way the code list looks on screen. They
|
||||
do not affect generated code.</p>
|
||||
|
||||
<p>The
|
||||
<a href="intro-details.html#width-disambiguation">operand width disambiguator</a>
|
||||
strings are used when the width of an instruction operand is unclear.
|
||||
You may specify values for all of them or none of them.</p>
|
||||
|
||||
<p>Different assemblers have different ways of forming expressions.
|
||||
Sometimes the rules allow expressions to be written simply, other times
|
||||
explicit grouping with parentheses is required. Select whichever style
|
||||
you are most comfortable with.</p>
|
||||
|
||||
<p>Non-unique labels are identified with a prefix character, typically
|
||||
'@' or ':'. The default is '@', but you can configure it to any character
|
||||
that isn't valid for the start of a label. (64tass uses '_' for locals,
|
||||
but that's a valid label start character, and so isn't allowed here.)
|
||||
The setting affects label editing as well as display.</p>
|
||||
|
||||
<p>If you would like your local variables to be shown with a prefix
|
||||
character, you can set it in the "local variable prefix" box.</p>
|
||||
|
||||
<p>The "quick set" pop-up configures the fields on the left side of the
|
||||
tab to match the conventions of the specified assembler. Select your
|
||||
preferred assembler in the combo box to set the fields. The setting
|
||||
automatically switches to "custom" when you edit a field.
|
||||
(64tass and ACME use the "common"
|
||||
expression style; cc65 and Merlin 32 have their own unique styles.)</p>
|
||||
|
||||
<p>The "add spaces in Bytes column" checkbox changes the format of the
|
||||
hex data in the code list "bytes" column from dense (<code>20edfd</code>)
|
||||
to spaced (<code>20 ed fd</code>). This also affects the format of
|
||||
clipboard copies and exports.</p>
|
||||
|
||||
<p>The "comma-separated format for bulk data" option determines whether large
|
||||
blocks of hex look like <code>ABC123</code> or
|
||||
<code>$AB,$C1,$23</code>. The former reduces the number of lines
|
||||
required, the latter is more readable.</p>
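<p>The two byte-formatting options can be pictured with a short Python
sketch. The function names are made up for this example; only the four
output shapes come from the text above.</p>

```python
def format_bytes(data, spaced=False):
    """Bytes column: dense ("20edfd") or spaced ("20 ed fd")."""
    return (" " if spaced else "").join(f"{b:02x}" for b in data)

def format_bulk(data, comma_separated=False):
    """Bulk data operand: "ABC123" or "$AB,$C1,$23"."""
    if comma_separated:
        return ",".join(f"${b:02X}" for b in data)
    return "".join(f"{b:02X}" for b in data)
```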
|
||||
<p>Long operands, such as strings and bulk data, are wrapped to a new
|
||||
line after a certain number of characters. Use the pop-up to configure
|
||||
the value. Larger values can make the code easier to read, but smaller
|
||||
values allow you to shrink the width of the operand column in the
|
||||
on-screen listing, moving the comment field closer in.</p>
|
||||
|
||||
|
||||
<h3><a name="appset-pseudoop">Pseudo-Op</a></h3>
|
||||
|
||||
<p>These options change the way the code list looks on screen. Assembler
|
||||
directives and data pseudo-opcodes will use these values. This does
|
||||
not affect generated source code, which always matches the conventions
|
||||
of the target assembler.</p>
|
||||
|
||||
<p>Enter the string you want to use for the various data formats. If
|
||||
a field is left blank, a default value is used.</p>
|
||||
|
||||
<p>The "quick set" pop-up configures the fields on this tab to match
|
||||
the conventions of the specified assembler. Select your preferred assembler
|
||||
in the combo box to set the fields. The setting automatically switches to
|
||||
"custom" when you edit a field.</p>
|
||||
|
||||
|
||||
|
||||
<h3><a name="appset-asmconfig">Asm Config</a></h3>
|
||||
|
||||
<p>These settings configure cross-assemblers and modify assembly source
|
||||
generation in various ways.</p>
|
||||
<p>To configure an assembler, select it in the pop-up menu. The fields
|
||||
will initially contain assembler-specific default values. All of
|
||||
the values in the Assembler Configuration box may be configured
|
||||
differently for each assembler.</p>
|
||||
<p>The "executable" box holds the full path to the cross-assembler
|
||||
executable.</p>
|
||||
<ul>
|
||||
<li>64tass: <code>64tass.exe</code></li>
<li>ACME: <code>acme.exe</code></li>
<li>cc65: <code>bin/cl65.exe</code> -- full installation required,
with all configuration files and libraries</li>
<li>Merlin 32: <code>Merlin32.exe</code></li>
|
||||
</ul>
|
||||
<p>The "column widths" section allows you to specify the minimum
|
||||
width of the label, opcode, operand, and comment fields. If the width
|
||||
is less than 1, or isn't a valid number, 1 will be used. These are
|
||||
not hard stops: if the contents of a field are too wide, the contents
|
||||
of the next column will be pushed over. (The comment field width is
|
||||
not currently being used, but may be used to fold lines in the future.)</p>
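<p>A minimal Python sketch of the "minimum width, not a hard stop" idea
follows. The helper names are hypothetical, and inserting a single space
after an overlong field is an assumption; the manual only says the next
column is pushed over.</p>

```python
def parse_width(text):
    """Invalid or too-small width entries fall back to 1."""
    try:
        width = int(text)
    except ValueError:
        return 1
    return max(width, 1)

def format_line(fields, widths):
    """Lay out fields at their column starts, pushing columns right if needed."""
    line, column_start = "", 0
    for field, width in zip(fields, widths):
        if field:
            line = line.ljust(column_start)     # pad out to this column
            if line and not line.endswith(" "):
                line += " "                     # overlong field: push this column over
            line += field
        column_start += width
    return line
```

<p>With widths of 8 and 6, a 13-character label followed by
<code>RTS</code> comes out as the label, one space, and the opcode, rather
than being truncated.</p>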
|
||||
|
||||
<p>When "show cycle counts in comments" is checked, cycle counts are
|
||||
inserted into end-of-line comments. This works the same as the option
|
||||
in the Code View tab, but applies to generated source code rather than
|
||||
the on-screen display.</p>
|
||||
|
||||
<p>If "put long labels on separate line" is checked, labels that are
|
||||
longer than the label column are placed on their own line. This looks
|
||||
a bit nicer because otherwise the opcode gets pushed out of alignment.
|
||||
(Some assemblers get bent out of shape if you split an equate
|
||||
directive, so those might stay on one line.)</p>
|
||||
|
||||
<p>If you enable "identify assembler in output", a comment will be
|
||||
added to the top of the generated assembly output that identifies the
|
||||
target assembler and version. It also shows the command-line options
|
||||
passed to the assembler. This can be very helpful if the source
|
||||
file is sent to other people, since it may not otherwise be obvious from
|
||||
the source file what the intended target assembler is, or what options
|
||||
are required to process the file correctly.</p>
|
||||
|
||||
|
||||
<h2><a name="project-properties">Project Properties</a></h2>
|
||||
|
||||
<p>Project properties are stored in the .dis65 project file.
|
||||
They specify which CPU to use, which extension scripts to load, and a
|
||||
variety of other things that directly impact how SourceGen processes
|
||||
the project. Because of the potential impact, all changes to
|
||||
the project properties are made through the undo/redo buffer,
|
||||
which means you hit "undo" to revert a property change.</p>
|
||||
|
||||
<p>The properties editor is divided into four tabs. Changes aren't pushed
|
||||
out to the main application until you close the dialog. Clicking Apply
|
||||
will capture the current changes, ensuring that they're applied even if
|
||||
you later hit Cancel, but the changes are not applied immediately.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-general">General</a></h3>
|
||||
|
||||
<p>The choice of CPU determines the set of available instructions, as
|
||||
well as cycle costs and register widths. There are many variations
|
||||
on the 6502, but from the perspective of a disassembler most can be
|
||||
treated as one of these four:</p>
|
||||
<ol>
|
||||
<li>MOS 6502. The original 8-bit instruction set.</li>
|
||||
<li>WDC 65C02. Expanded the instruction set and smoothed
|
||||
some rough edges.</li>
|
||||
<li>WDC W65C02S. An enhanced version of the 65C02, with some
|
||||
additional instructions introduced by Rockwell (R65C02), as well
|
||||
as WDC's STP and WAI instructions. The Rockwell additions overlap
|
||||
with 65816 instructions, so code that uses them will not work on
|
||||
16-bit CPUs.</li>
|
||||
<li>WDC W65C816S. Expanded instruction set, 24-bit address space,
|
||||
and 16-bit registers.</li>
|
||||
</ol>
|
||||
<p>The Hudson Soft HuC6280 and Commodore CSG 4510 / 65CE02 are very
|
||||
similar, but they have additional instructions and some fundamental
|
||||
architectural changes. These are not currently supported by SourceGen.</p>
|
||||
|
||||
<p>If "enable undocumented instructions" is checked, some additional
|
||||
opcodes are recognized on the 6502 and 65C02. These instructions are
|
||||
not part of the chip specification, but most of them have consistent
|
||||
behavior and can be used. If the box is not checked, the instructions
|
||||
are treated as invalid and cause the code analyzer to assume that it
|
||||
has run into a data area. This option has no effect on the 65816.</p>
|
||||
<p>The "treat BRK as two-byte instruction" checkbox determines whether
|
||||
BRK instructions should be handled as if they have an operand.</p>
|
||||
|
||||
<p>The entry flags determine the initial value for the processor status
|
||||
flag register. Code that is unreachable internally (requiring a code
|
||||
start point tag) will use this value. This is chiefly of value for
|
||||
65816 code, where the initial value of the M/X/E flags has a significant
|
||||
impact on how instructions are disassembled.</p>
|
||||
|
||||
<p>If "analyze uncategorized data" is checked, SourceGen will attempt to
|
||||
identify character strings and regions that are filled with a repeated
|
||||
value. If it's not checked, anything that isn't detected as code or
|
||||
explicitly formatted as data will be shown as individual byte values.</p>
|
||||
<p>If "seek nearby targets" is checked, the analyzer will try to use
|
||||
nearby labels for data loads and stores, adjusting them to fit
|
||||
(e.g. <code>LDA LABEL+1</code>). If not enabled, labels are not applied
|
||||
unless they match exactly. Note that references into the middle of an
|
||||
instruction or formatted data area are always adjusted, regardless of
|
||||
how this is set. This setting has no effect on local variables, and
|
||||
only enables a 1-byte backward search on project/platform symbols.</p>
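<p>The 1-byte backward search mentioned for project/platform symbols can be
sketched in Python as follows. The function name and the
<code>max_back</code> parameter are hypothetical; how far the real analyzer
searches for other kinds of labels is not stated here.</p>

```python
def nearby_operand(address, labels, max_back=1):
    """Return 'LABEL' or 'LABEL+n' for an address, or None if nothing is near."""
    for delta in range(max_back + 1):
        name = labels.get(address - delta)      # exact match first, then backward
        if name is not None:
            return name if delta == 0 else f"{name}+{delta}"
    return None
```

<p>Setting <code>max_back=0</code> models the "exact match only" behavior
used when the option is off.</p>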
|
||||
<p>The "use relocation data" checkbox is only available if the project
|
||||
was created from a relocatable source, e.g. by the OMF Converter tool.
|
||||
If checked, information from the relocation dictionary will be used to
|
||||
improve automatic operand formatting.</p>
|
||||
<p>If "smart PLP handling" is checked, the analyzer will try to use
|
||||
the processor status flags from a nearby <code>PHP</code> when a
|
||||
<code>PLP</code> is encountered. If not enabled, all flags are set to
|
||||
"indeterminate" following a <code>PLP</code>, except for the M/X
|
||||
flags on the 65816, which are left unmodified. (In practice this
|
||||
approach doesn't seem to work all that well, so the setting is
|
||||
un-checked by default.)</p>
|
||||
<p>If "smart PLB handling" is checked, the analyzer will watch for
|
||||
code patterns like <code>PLB</code> preceded by <code>PHK</code>,
|
||||
and generate appropriate Data Bank Register changes. If not enabled,
|
||||
the DBR is set to the bank of the address of the start of the file,
|
||||
and does not change unless explicitly set. Only useful for 65816 code.</p>
|
||||
|
||||
<p>The "default text encoding" setting has two effects. First, it
|
||||
specifies which character encoding to use when searching for strings in
|
||||
uncategorized data. Second, if an assembler has a notion of preferred
|
||||
character encoding (e.g. you can default string operands to PETSCII),
|
||||
this setting will determine which encoding is preferred.</p>
|
||||
<p>The "min chars for string detection" setting determines how many
|
||||
ASCII characters need to appear consecutively for the data analyzer to
|
||||
declare it a string. Smaller values are prone to false-positive
identifications, while larger values miss short strings. You can also
|
||||
set it to "none" to disable automatic string identification.</p>
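<p>To show how the threshold trades false positives against missed strings,
here is a simplified Python sketch. It only looks for runs of printable
ASCII ($20-$7e); the real analyzer's rules (other encodings, control
characters) are not described here, and <code>find_strings</code> is a
hypothetical name.</p>

```python
def find_strings(data, min_chars=4):
    """Return (offset, length) for each run of printable ASCII bytes."""
    runs, start = [], None
    for i, b in enumerate(data + b"\x00"):      # sentinel ends a final run
        printable = b in range(0x20, 0x7F)
        if printable and start is None:
            start = i                           # a new run begins here
        elif not printable and start is not None:
            if i - start >= min_chars:          # keep only long-enough runs
                runs.append((start, i - start))
            start = None
    return runs
```

<p>On the bytes <code>00 "Hi" 00 "Hello!" ff</code>, a minimum of 4 reports
only "Hello!", while a minimum of 2 also reports "Hi".</p>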
|
||||
|
||||
<p>The auto-label style setting determines the format for labels that are
|
||||
generated automatically. By default the label will be the letter 'L'
|
||||
followed by the hexadecimal address, but the label can be annotated based
|
||||
on usage. For example, addresses that are the target of branch instructions
|
||||
can be labeled with the letter 'B'.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-projsym">Project Symbols</a></h3>
|
||||
<p>You can add, edit, and delete individual symbols and constants.
|
||||
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
|
||||
explanation of how project symbols work.</p>
|
||||
|
||||
<p>The Edit Symbol button opens the
|
||||
<a href="editors.html#project-symbol">Edit Project Symbol</a> dialog, which
|
||||
allows changing any part of a symbol definition. You're not allowed to
|
||||
create two symbols with the same label.</p>
|
||||
|
||||
<p>The Import button allows you to import symbols from another project.
|
||||
Only labels that have been tagged as global and exported will be imported.
|
||||
Existing symbols with identical labels will be replaced, so it's okay to
|
||||
run the importer multiple times. Labels that aren't found will not be
|
||||
removed, so you can safely import from multiple projects, but will need
|
||||
to manually delete any symbols that are no longer being exported.</p>
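<p>The import behavior described above amounts to a dictionary merge keyed
by label. The following is a minimal sketch of that idea. The function name
and the plain label-to-value dictionaries are hypothetical, and real project
symbols carry more fields than a value.</p>

```python
def import_symbols(project_symbols: dict, exported: dict) -> dict:
    """Merge exported symbols into the project's symbols, keyed by label.

    Labels present in both are replaced by the imported definition.
    Labels missing from the import are left alone, never removed.
    """
    merged = dict(project_symbols)
    merged.update(exported)
    return merged
```

<p>Running the merge twice with the same input yields the same result, which
is why repeating an import is harmless. A symbol that the other project
stops exporting simply stays in place until you delete it.</p>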
|
||||
|
||||
<p>Shortcut: you can open the project properties window with the
|
||||
Project Symbols tab selected by hitting F6 from the main code list.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-symfiles">Symbol Files</a></h3>
|
||||
<p>From here, you can add and remove platform symbol files, or change
|
||||
the order in which they are loaded.
|
||||
See the <a href="intro-details.html#about-symbols">symbols</a> section for an
|
||||
explanation of how platform symbols work, and the
|
||||
<a href="advanced.html#platform-symbols">advanced topics</a> section
|
||||
for a description of the file syntax.</p>
|
||||
|
||||
<p>Platform symbol files must live in the RuntimeData directory that comes
|
||||
with SourceGen, or in the directory where the project file lives. This
|
||||
is mostly to keep things manageable when projects are distributed to
|
||||
other people, but also acts as a minor security check, to prevent a
|
||||
wayward project from trying to open files it shouldn't.</p>
|
||||
<p>Click one of the "Add Symbol Files" buttons to include one or more
|
||||
symbol files in the project.
|
||||
The "Add Symbol Files from Runtime" button sets the directory
|
||||
to the SourceGen RuntimeData directory, while "Add Symbol Files from Project"
|
||||
starts in the project directory. If you haven't yet saved the project,
|
||||
the latter button will be disabled. The only difference between the
|
||||
buttons is the initial directory.</p>
|
||||
<p>In the list, files loaded from the RuntimeData directory will be
|
||||
prefixed with <code>RT:</code>. Files loaded from the project directory
|
||||
will be prefixed with <code>PROJ:</code>.</p>
|
||||
<p>If a platform symbol file can't be found when the project is opened,
|
||||
you will receive a warning.</p>
|
||||
|
||||
|
||||
<h3><a name="projprop-extscripts">Extension Scripts</a></h3>
|
||||
<p>From here, you can add and remove extension script files.
|
||||
See the <a href="advanced.html#extension-scripts">extension scripts</a>
|
||||
section for details on how extension scripts work.</p>
|
||||
|
||||
<p>Extension script files must live in the RuntimeData directory that comes
|
||||
with SourceGen, or in the directory where the project file lives. This
|
||||
is mostly to keep things manageable when projects are distributed to
|
||||
other people, but also acts as a minor security check, to prevent a
|
||||
wayward project from trying to open files it shouldn't.</p>
|
||||
<p>Click one of the "Add Scripts" buttons to include one or more scripts in
|
||||
the project. The "Add Scripts from Runtime" button sets the directory
|
||||
to the SourceGen RuntimeData directory, while "Add Scripts from Project"
|
||||
starts in the project directory. If you haven't yet saved the project,
|
||||
the latter button will be disabled. The only difference between the
|
||||
buttons is the initial directory.</p>
|
||||
<p>In the list, files loaded from the RuntimeData directory will be
|
||||
prefixed with <code>RT:</code>. Files loaded from the project directory
|
||||
will be prefixed with <code>PROJ:</code>.</p>
|
||||
<p>If an extension script file can't be found when the project is opened,
|
||||
you will receive a warning.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,159 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Tools - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Tools</h1>
|
||||
|
||||
<h2><a name="instruction-chart">Instruction Chart</a></h2>
|
||||
|
||||
<p>This opens a window with a summary of all 256 opcodes. The CPU can
|
||||
be chosen from the pop-up list at the bottom. Undocumented opcodes for
|
||||
6502/65C02 are shown in italics, and can be excluded from the list
|
||||
by unchecking the box at the bottom.</p>
|
||||
<p>The status flags affected by each instruction reflect their behavior
|
||||
on the 65816. The only significant difference between 65816 and
|
||||
6502/65C02 is the way the BRK instruction affects the D and B/X flags.</p>
|
||||
|
||||
|
||||
<h2><a name="ascii-chart">ASCII Chart</a></h2>
|
||||
|
||||
<p>This opens a window with the ASCII character set. Each character is
|
||||
displayed next to its numeric value in decimal and hexadecimal. The
|
||||
pop-up list at the bottom allows you to flip between standard and "high"
|
||||
ASCII.</p>
|
||||
|
||||
|
||||
<h2><a name="apple2-screen-chart">Apple II Screen Chart</a></h2>
|
||||
|
||||
<p>The Apple II text and hi-res screens are mapped to memory in a way
|
||||
that makes sense to computers but is a little confusing for humans. This
|
||||
chart maps line numbers to addresses and vice-versa. Select different
|
||||
screens and sort orders from the list at the bottom.</p>
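<p>For reference, the row-to-address mapping the chart displays can be
computed directly. The sketch below uses the standard Apple II layout for
text page 1 (<code>$0400</code>) and hi-res page 1 (<code>$2000</code>). The
function names are chosen for this example only.</p>

```python
def text_row_address(row: int, base: int = 0x0400) -> int:
    """Address of the first byte of a text row (0-23)."""
    if not 0 <= row < 24:
        raise ValueError("text row must be 0-23")
    # Rows are interleaved in three groups of eight.
    return base + (row & 7) * 0x80 + (row >> 3) * 0x28


def hires_row_address(row: int, base: int = 0x2000) -> int:
    """Address of the first byte of a hi-res row (0-191)."""
    if not 0 <= row < 192:
        raise ValueError("hi-res row must be 0-191")
    return (base
            + (row & 7) * 0x400          # line within an 8-line character cell
            + ((row >> 3) & 7) * 0x80    # cell row within a group of eight
            + (row >> 6) * 0x28)         # which third of the screen
```

<p>Pass <code>base=0x0800</code> or <code>base=0x4000</code> to get addresses
for the second text or hi-res page.</p>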
|
||||
|
||||
|
||||
<h2><a name="hexdump">Hex Dump Viewer</a></h2>
|
||||
|
||||
<p>You can use this to view the contents of the project data file
|
||||
by double-clicking the "bytes" column, or with Actions > Show Hex Dump.
|
||||
The viewer is displayed in a "modeless" dialog that does not
|
||||
prevent you from continuing to work with the project. If you
|
||||
double-click a different line in the project, the viewer will automatically
|
||||
highlight those bytes.</p>
|
||||
|
||||
<p>You can also use this to view the contents of arbitrary files by
|
||||
using Tools > Hex Dump. There is no fixed limit on the number of
|
||||
viewers you can have open simultaneously. (Be aware that the viewer
|
||||
currently loads the entire file into memory, and you will run out of room
|
||||
eventually. Not coincidentally, the viewer has a size limit of 16MiB
|
||||
per file.)</p>
|
||||
|
||||
<p>You can select lines with the mouse as you would in any other list
|
||||
view. Ctrl+A selects all lines. Ctrl+C copies the selected lines to
|
||||
the system clipboard.</p>
|
||||
|
||||
<p>The "character conversion" selector allows you to choose how the
|
||||
bytes are converted to characters for the Text column. Choose from
|
||||
the usual set of encodings.</p>
|
||||
|
||||
<p>If "ASCII-only dump" is not checked, non-printable bytes are shown in
|
||||
the ASCII dump as a middle dot ('·'). If the box is checked,
|
||||
non-printable bytes are represented by a period ('.') instead. The
|
||||
use of non-ASCII characters makes the dump unambiguous when unprintable
|
||||
characters are mixed with periods, but the lines may be unsuitable for
|
||||
pasting in some forums.</p>
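<p>The difference between the two modes can be sketched as follows. The line
layout, the function name <code>dump_line</code>, and the plain-ASCII
definition of "printable" are assumptions for illustration. The real viewer's
output format and character conversions may differ.</p>

```python
def dump_line(data: bytes, offset: int = 0, ascii_only: bool = False) -> str:
    """Format up to 16 bytes as one hex dump line."""
    # A period is ambiguous with a literal '.', the middle dot is not.
    placeholder = "." if ascii_only else "\u00b7"
    hex_part = " ".join(f"{b:02x}" for b in data)
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else placeholder for b in data)
    # 47 columns holds 16 two-digit values separated by spaces.
    return f"{offset:06x}: {hex_part:<47}  {text}"
```

<p>For the bytes <code>41 2e 00</code>, the ASCII-only form renders the text
column as <code>A..</code>, which hides which byte was the real period. The
default form renders the zero byte as a middle dot instead.</p>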
|
||||
|
||||
<p>If "always on top" is checked, the window will stay above all other
|
||||
windows that don't also declare that they should always be on top. By
|
||||
default this box is checked when displaying project data, and not checked for
|
||||
external files.</p>
|
||||
|
||||
|
||||
<h2><a name="file-concat">File Concatenator</a></h2>
|
||||
|
||||
<p>The File Concatenator combines multiple files into a single file.
|
||||
Select the files to add, arrange them in the proper order, then hit
|
||||
"Save". CRC-32 values are shown for reference.</p>
|
||||
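<p>A minimal sketch of the same operation, using in-memory byte strings
instead of files. The function name is hypothetical.
<code>zlib.crc32</code> is the standard Python library call for CRC-32, which
is assumed here to be the same CRC-32 variant the tool reports.</p>

```python
import zlib


def concatenate(parts):
    """Join byte strings in order; return the result and CRC-32 values.

    Returns (combined, per_part_crcs, combined_crc).
    """
    per_part_crcs = [zlib.crc32(p) & 0xFFFFFFFF for p in parts]
    combined = b"".join(parts)
    return combined, per_part_crcs, zlib.crc32(combined) & 0xFFFFFFFF
```

<p>Comparing the per-part values against known-good checksums is a quick way
to confirm that the right files were selected before saving.</p>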
|
||||
|
||||
<h2><a name="file-slicer">File Slicer</a></h2>
|
||||
|
||||
<p>The File Slicer allows you to "slice" a piece out of a file, saving
|
||||
it to a new file. Specify the start and length in decimal or hex. If
|
||||
you leave the fields blank, they will default to offset 0 and the remaining
|
||||
length of the file, respectively.</p>
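<p>The defaulting rules can be sketched like this. The accepted hex
notations (a <code>$</code> or <code>0x</code> prefix) and the function names
are assumptions for this example. The manual only says that decimal and hex
are accepted.</p>

```python
from typing import Optional


def parse_field(text: str) -> Optional[int]:
    """Parse a decimal or hex field; return None if the field is blank."""
    text = text.strip()
    if not text:
        return None
    if text.startswith("$"):                 # assumed hex notation
        return int(text[1:], 16)
    if text.lower().startswith("0x"):        # assumed hex notation
        return int(text[2:], 16)
    return int(text, 10)


def slice_bytes(data: bytes, start_text: str = "", length_text: str = "") -> bytes:
    """Return the requested chunk, applying the blank-field defaults."""
    start = parse_field(start_text)
    if start is None:
        start = 0                            # blank start: beginning of file
    length = parse_field(length_text)
    if length is None:
        length = len(data) - start           # blank length: rest of file
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("slice is outside the file")
    return data[start:start + length]
```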
|
||||
<p>The hex dumps show the area just before and after the chunk to be
|
||||
sliced, allowing you to confirm the placement.</p>
|
||||
|
||||
|
||||
<h2><a name="omf-converter">OMF Converter</a></h2>
|
||||
|
||||
<p>This tool allows you to view Apple IIgs Object Module Format (OMF)
|
||||
executables, and convert them for disassembly.</p>
|
||||
|
||||
<p>OMF executables have multiple segments with relocatable code. References
|
||||
to addresses aren't filled in until the program is loaded into memory,
|
||||
which makes it difficult to disassemble the file. The conversion tool
|
||||
loads the OMF file in roughly the same way the GS/OS System Loader would,
|
||||
placing each segment at the start of a bank unless otherwise directed.
|
||||
The loaded image is saved to a new file, and a SourceGen project file is
|
||||
created with some basic attributes filled in.</p>
|
||||
|
||||
<p>Only "Load" files (S16, PIF, TOL, etc) may be converted. Compiler object
|
||||
files and libraries contain references that must be resolved by
|
||||
a IIgs linker, and are not supported.</p>
|
||||
|
||||
<p>Before you can examine or convert a file, you must first extract
|
||||
it from the Apple II disk image, using a mode that does not modify the
|
||||
original (e.g. extract with "configure to preserve Apple II formats"
|
||||
in CiderPress). Then, open it with Tools > Convert OMF.</p>
|
||||
|
||||
<p>The initial view shows all of the OMF segments in the file. Double-clicking
|
||||
on an entry opens a detailed view that shows the segment header and a
|
||||
list of all the OMF records. For load files, the relocation dictionary is
|
||||
also shown.</p>
|
||||
|
||||
<p>To convert the file, click "Generate" to create a modified binary and a
|
||||
SourceGen project file.</p>
|
||||
|
||||
<p>If "offset segment start by $0100" is checked, the converter will try
|
||||
to shift the segment's load address from <code>$xx/0000</code> to
|
||||
<code>$xx/0100</code>. This can make the generated code a little nicer
|
||||
to work with because it removes potential ambiguity with direct page
|
||||
addresses. For example, <code>LDA $56</code> and <code>LDA $0056</code>
|
||||
may be interpreted as the same thing by the assembler, requiring
|
||||
generation of operand width disambiguators. By shifting the initial
|
||||
address we avoid the potential ambiguity.</p>
|
||||
<p>Check "add comments and notes for each segment" to add a long comment
|
||||
and a note at the start of each segment. The comments include the
|
||||
segment name, type, and optional flags. The notes just provide a quick
|
||||
way to jump to a segment.</p>
|
||||
|
||||
<p>The binary generated by the tool is not in OMF format and will not
|
||||
execute on an Apple IIgs. To be functional, the generated sources must be
|
||||
assembled by a program capable of generating OMF output, such as Merlin.</p>
|
||||
|
||||
<p>The <a href="advanced.html#reloc-data">relocation dictionaries</a> from
|
||||
the executable are included in the project file, and can be used to guide
|
||||
the disassembler's analysis. The "use reloc data" setting in the project
|
||||
properties controls this feature.</p>
|
||||
|
||||
<p>A full explanation of the structure of OMF is beyond the scope of this
|
||||
manual. For more information on OMF, see Appendix F of the GS/OS Reference
|
||||
Manual.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,25 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Tutorials - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Tutorials</h1>
|
||||
|
||||
<p><strong>NOTE:</strong> this tutorial has been replaced by
|
||||
content on the 6502bench web site. Visit
|
||||
<a href="https://6502bench.com/sgtutorial/">https://6502bench.com/sgtutorial/</a>.</p>
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2018 faddenSoft -->
|
||||
</html>
|
||||
@@ -0,0 +1,315 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<head>
|
||||
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<link href="main.css" rel="stylesheet" type="text/css" />
|
||||
<title>Visualizations - 6502bench SourceGen</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<div id="content">
|
||||
<h1>6502bench SourceGen: Visualizations</h1>
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
|
||||
<h2><a name="overview">Overview</a></h2>
|
||||
|
||||
<p>Programs are generally a combination of code and data. Sometimes
|
||||
the data is graphical in nature, e.g. a bitmap used as a font or
|
||||
game sprite. Being able to see the data in graphic form can make it
|
||||
easier to determine the purpose of associated code.</p>
|
||||
<p>While modern systems use GIF, JPEG, and PNG to hold 2D bitmaps,
|
||||
graphical elements embedded in 6502 applications are almost always
|
||||
in a platform-specific form. For this reason, the task of generating
|
||||
images from data is performed by
|
||||
<a href="advanced.html#extension-scripts">extension scripts</a>. Some
|
||||
scripts for common formats are included in the SourceGen runtime directory.
|
||||
If these don't do what you need, you can write your own scripts and
|
||||
include them in your project.</p>
|
||||
<p>The project file doesn't store the converted graphics. Instead, the
|
||||
project file holds a string that identifies the converter, and a list of
|
||||
parameters that are passed to the converter. Images are generated when
|
||||
the project is first opened, and updated when certain things change in
|
||||
the project.</p>
|
||||
<p>Visualizations are not included in generated assembly output. They
|
||||
may be included in HTML exports.</p>
|
||||
<p>Because visualizations are associated with a specific file offset,
|
||||
they will become "hidden" if the offset isn't at the start of a line,
|
||||
e.g. it's in the middle of a multi-byte instruction or data item. The
|
||||
editors will try to prevent you from doing this.</p>
|
||||
<p>Bitmaps will always be scaled up as much as possible to make them
|
||||
easy to see. This means that small shapes and large shapes may appear
|
||||
to be the same size when displayed as thumbnails in the code list.</p>
|
||||
<p>The role of a visualization generator is to take a collection of input
|
||||
parameters and generate graphical data. It's most useful for graphical
|
||||
sources like bitmaps, but it's not limited to that. You could, for example,
|
||||
write a script that generates random flowers, and use it to make your
|
||||
source listings more cheerful.</p>
|
||||
|
||||
|
||||
<h2><a name="vis-and-sets">Visualizations and Visualization Sets</a></h2>
|
||||
|
||||
<p>Visualizations are essentially decorative: they do not affect the
|
||||
assembled output, and do not change how code is analyzed. They are
|
||||
contained in sets that are placed at arbitrary offsets. Each set can
|
||||
contain multiple items. For example, if a file has data for
|
||||
10 bitmaps, you can place a visualization near each, or create a single
|
||||
visualization set with all 10 items and put it at the start of the file.
|
||||
You can display a visualization near the data or near the instructions
|
||||
that perform the drawing. Or both.</p>
|
||||
|
||||
<p>To create a visualization set, select a code or data line, and use
|
||||
Actions > Create/Edit Visualization Set. To edit a visualization set,
|
||||
select it and use the same menu item, or just double-click on it. This
|
||||
opens the Visualization Set Editor window.</p>
|
||||
|
||||
<p>The visualization set editor shows a list of visualizations associated
|
||||
with the selected file offset. You can create a new visualization, edit
|
||||
or remove an existing entry, or rearrange them.
|
||||
If you select "New Bitmap" or edit an existing bitmap entry, the
|
||||
Bitmap Visualization Editor window will open.
|
||||
Similarly, if you select "New Bitmap Animation" or edit an existing
|
||||
bitmap animation, the Bitmap Animation Editor will open.</p>
|
||||
|
||||
<h4>Visualization Editor</h4>
|
||||
|
||||
<p>The combo box at the top of the screen lists every visualization
|
||||
generator defined by an active extension script. Select the one that is
|
||||
appropriate for the data you're trying to visualize. Every visualizer may
|
||||
have different parameters, so as you select different entries the set of
|
||||
input parameters below the preview window may change.</p>
|
||||
<p>There are two categories of visualization generator: bitmap and
|
||||
wireframe. Bitmaps are simple 2D images, but wireframes are 2D or 3D
|
||||
meshes that can be viewed from different angles. When you select a
|
||||
wireframe generator, additional view controls will be added at the bottom.
|
||||
(See below.)</p>
|
||||
|
||||
<p>The "tag" is a unique string that will be shown in the display list.
|
||||
This is not a label, and may contain any characters you want (but leading
|
||||
and trailing whitespace will be trimmed). The only requirement is that
|
||||
it be unique across all visualizations (bitmaps, animations, etc).</p>
|
||||
<p>The preview window shows the visualizer output. The generated image is
|
||||
expanded to fill the window, so small bitmaps will be shown with very
|
||||
large pixels.
|
||||
If you resize the editor window, the preview window will expand, which
|
||||
can make it easier to see detail on larger images.
|
||||
If the generator fails, the preview window will show a red 'X', and an
|
||||
error message will appear below it.</p>
|
||||
<p>Parameters may be numeric or boolean. The latter use a simple checkbox,
|
||||
the former a text entry field that accepts decimal and hexadecimal values.
|
||||
The range of allowable values is shown to the right of the entry field.
|
||||
If you enter an invalid value, the parameter description will turn red.</p>
|
||||
|
||||
<p>The "Export" button at the top right can be used to save a copy of
|
||||
the bitmap or wireframe rendering with the current parameters.</p>
|
||||
|
||||
<h5>Wireframe View Controls</h5>
|
||||
|
||||
<p>The wireframe generator may offer the choice of perspective vs.
|
||||
orthographic projection, and whether or not to enable backface
|
||||
culling. These are declared in the visualization generator script,
|
||||
but implemented in the viewer. If the generator doesn't
|
||||
declare them, the default is to render with a perspective projection
|
||||
and without culling.</p>
|
||||
<p>The viewer allows you to rotate the image about the X, Y, and Z
|
||||
axes. The viewer provides a left-handed coordinate system,
|
||||
with +X toward the right, +Y toward the top of the screen, and +Z
|
||||
going into the screen. The object will be placed a short distance
|
||||
down the Z axis and scaled to fit the window.
|
||||
Positive rotations cause a counter-clockwise rotation when the axis
|
||||
about which rotations are performed points toward the viewer. The
|
||||
rotations are performed with a matrix using Euler angles, and are
|
||||
subject to gimbal lock (e.g. if you set Y to 90 degrees, X and Z rotate
|
||||
about the same axis).</p>
|
||||
<p>If you check the "Animated" box, you can add a simple spin. Choose
|
||||
the number of degrees to rotate per frame, how many frames to generate before
|
||||
resetting, and the delay between each frame. Clicking the "Auto" button
|
||||
will automatically select the number of frames needed to display the
|
||||
animation in an unbroken loop (useful for animated GIFs). Click
|
||||
the "Test Animation" button to see it in action.</p>
|
||||
|
||||
<h4>Bitmap Animation Editor</h4>
|
||||
|
||||
<p>Bitmap animations allow you to create a simple animation from a
|
||||
collection of other visualizations. This can be useful when a program
|
||||
stores animated graphics as a series of frames.</p>
|
||||
<p>The "tag" is a unique string that will be shown in the display list.
|
||||
The same rules apply as for bitmap visualizations.</p>
|
||||
<p>The list at the top left holds all visualizations. Select items on
|
||||
the left and use the "Add" button to add them to the list on the right,
|
||||
which has the set that is included in the animation. You can reorder
|
||||
the list with the up/down buttons. Adding the same frame multiple times
|
||||
is allowed.</p>
|
||||
<p>The "frame delay" field lets you specify how long each frame is shown
|
||||
on screen, in milliseconds. Some animation formats may use a different
|
||||
time resolution; for example, animated GIFs use units of 1/100th of a
|
||||
second. The closest value will be used. Note also that some viewers
|
||||
(notably web browsers) will cap the update rate.</p>
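<p>As a small worked example of the unit conversion, the sketch below maps a
delay in milliseconds to the nearest GIF delay unit (1/100th of a second).
Rounding halves upward is an assumption. The manual only says that the
closest value is used.</p>

```python
def gif_delay_units(delay_msec: int) -> int:
    """Convert milliseconds to the nearest 1/100 s unit (halves round up)."""
    if delay_msec < 0:
        raise ValueError("delay must be non-negative")
    return (delay_msec + 5) // 10
```

<p>A 33 msec delay becomes 3 units (30 msec) and a 16 msec delay becomes
2 units (20 msec), so an exported GIF may play at a slightly different rate
than the preview.</p>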
|
||||
<p>When you have one or more frames in the animation list, you can preview
|
||||
the result in the window at the bottom. The actual appearance may be
|
||||
slightly different, especially if the frames are different sizes. For
|
||||
example, the preview window scales individual frames, but animated GIFs
|
||||
will be scaled to the size of the largest frame.</p>
|
||||
|
||||
|
||||
<h2><a name="runtime">Scripts Included with SourceGen</a></h2>
|
||||
|
||||
<p>A number of visualization generation scripts are included with
|
||||
SourceGen, in the platform-specific runtime data directories.</p>
|
||||
|
||||
<p>Most generators will take the file offset, bitmap width, and bitmap
|
||||
height as parameters. Offsets are handled as they are elsewhere, i.e.
|
||||
always in hexadecimal, with a leading '+'.
|
||||
Some less-common parameters include:</p>
|
||||
<ul>
|
||||
<li><b>Column stride</b> - number of bytes used to hold a column.
|
||||
This is uncommon, but could be used if (say) a pair of bitmaps
|
||||
was stored with interleaved bytes. If you set this to zero the
|
||||
visualizer will default to no interleave (col_stride = 1).</li>
|
||||
<li><b>Row stride</b> - number of bytes between the start of each
|
||||
row. This is used when a row has padding on the end, e.g. a
|
||||
bitmap that's 7 bytes wide might be padded to 8 for easy indexing,
|
||||
or when bitmap data is interleaved. If you set this to zero the
|
||||
visualizer will default to no padding
|
||||
(row_stride = width * column_stride).</li>
|
||||
<li><b>Cell stride</b> - for multi-bitmap data like a font or sprite
|
||||
sheet, this determines the number of bytes between the start of
|
||||
one item and the next. If set to zero a "dense" arrangement is
|
||||
assumed (cell_stride = row_stride * item_height).</li>
|
||||
</ul>
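<p>The zero-means-default rules for the three strides can be written out as
follows. The function names are hypothetical, and
<code>byte_offset</code> shows one plausible way the strides combine to
locate a byte. The included scripts may organize the calculation
differently.</p>

```python
def resolve_strides(width, height, col_stride=0, row_stride=0, cell_stride=0):
    """Replace zero strides with defaults; width and height are in bytes/rows."""
    if col_stride == 0:
        col_stride = 1                      # no interleave
    if row_stride == 0:
        row_stride = width * col_stride     # no padding at end of row
    if cell_stride == 0:
        cell_stride = row_stride * height   # "dense" arrangement
    return col_stride, row_stride, cell_stride


def byte_offset(base, cell, row, col, strides):
    """File offset of the byte at (cell, row, col) for the given strides."""
    col_stride, row_stride, cell_stride = strides
    return base + cell * cell_stride + row * row_stride + col * col_stride
```

<p>For a bitmap 7 bytes wide and 8 rows high, the defaults are
<code>(1, 7, 56)</code>. If each row is padded to 8 bytes, setting the row
stride to 8 gives <code>(1, 8, 64)</code>.</p>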
|
||||
|
||||
<p>Remember that this is a disassembler, not an image converter. The
|
||||
results do not need to be perfectly accurate to be useful when disassembling
|
||||
code.</p>
|
||||
|
||||
|
||||
<h3>Apple II - Apple/VisHiRes and Apple/VisShapeTable</h3>
|
||||
|
||||
<p>There is no standard format for small hi-res bitmaps, but certain
|
||||
arrangements are common. The VisHiRes script defines four generators:</p>
|
||||
|
||||
<ul>
|
||||
<li><b>Hi-Res Bitmap</b> - converts an MxN row-major bitmap.</li>
|
||||
<li><b>Hi-Res Sprite Sheet</b> - converts a series of bitmaps and
|
||||
renders them in a grid. Useful for games that use cell
|
||||
animation. The generated bitmap has a 1-pixel transparent gap
|
||||
between elements.</li>
|
||||
<li><b>Hi-Res Bitmap Font</b> - a simplified version of the
|
||||
Sprite Sheet, intended for the common 7x8 monochrome fonts.
|
||||
Most fonts have 96 or 128 glyphs, though some drop the last
|
||||
character.
|
||||
(This also works for Apple /// fonts, but currently ignores
|
||||
the high bit in each byte.)</li>
|
||||
<li><b>Hi-Res Screen Image</b> - used for 8KiB screen images. The
|
||||
data is linearized and converted to a 280x192 bitmap. Because
|
||||
images are relatively large, the generator does not require them
|
||||
to be contiguous in the file, i.e. two halves of the image can be
|
||||
in different parts of the file so long as they end up contiguous
|
||||
in memory.</li>
|
||||
</ul>
|
||||
|
||||
<p>Widths are specified in bytes, not pixels. Each byte represents 7
|
||||
pixels (with some hand-waving).</p>
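<p>To make the 7-pixels-per-byte relationship concrete, here is a sketch of
a monochrome decode for a single row. On the Apple II the low bit of each
byte is the leftmost pixel, and the high bit selects the color palette rather
than a pixel. The function name is chosen for this example and is not taken
from the VisHiRes script.</p>

```python
def hires_row_to_mono(row_bytes: bytes):
    """Expand hi-res bytes to a list of 0/1 pixels, 7 pixels per byte."""
    pixels = []
    for b in row_bytes:
        for bit in range(7):        # bit 0 is leftmost; bit 7 is not a pixel
            pixels.append((b >> bit) & 1)
    return pixels
```

<p>A bitmap with a width of 2 therefore yields 14 pixels per row.</p>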
|
||||
|
||||
<p>In addition to offset, dimensions, and stride values, the bitmap
|
||||
converter has a checkbox for monochrome or color, and two checkboxes
|
||||
that affect the color. The first causes the first byte to be treated
|
||||
as being in an odd column rather than an even one, which affects
|
||||
green vs. purple and orange vs. blue. The second flips the high bits
|
||||
on every byte, switching green vs. orange and purple vs. blue.
|
||||
Neither has any effect on black & white bitmaps.</p>
|
||||
<p>The converter generates one output pixel for every source pixel, so
|
||||
half-pixel shifts are not represented.</p>
|
||||
|
||||
<p>The VisShapeTable script renders Applesoft shape tables, which can
|
||||
have multiple vector shapes. The only parameter other than the offset
|
||||
is the shape number.</p>
|
||||
|
||||
|
||||
<h3>Atari 2600 - Atari/Vis2600</h3>
|
||||
|
||||
<p>The Atari 2600 graphics system has registers that determine the
|
||||
appearance of a sprite or playfield on a single row. The register
|
||||
values are typically changed as the screen is drawn to get different
|
||||
data on successive rows. The visualization generator doesn't attempt
|
||||
to emulate this behavior, but works well for data stored in a
|
||||
straightforward fashion.</p>
|
||||
|
||||
<ul>
|
||||
<li><b>Sprite</b> - basic 1xN sprite, converted to an image 8 pixels
|
||||
wide. Square pixels are assumed.</li>
|
||||
<li><b>Playfield</b> - assumes PF0,PF1,PF2 are stored in that order,
|
||||
multiple entries following each other. Specify the number of
|
||||
3-byte entries as the height.
|
||||
Since most playfields aren't the full height of the screen,
|
||||
it will tend to look squashed. Use the "row thickness" feature
|
||||
to repeat each row N times to adjust the proportions.
|
||||
The "Reflected" checkbox determines whether the right-side image is
|
||||
repeated as-is or flipped.</li>
|
||||
</ul>
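<p>The playfield layout described above can be sketched as follows. The bit
ordering follows the 2600 hardware: PF0 uses only its upper four bits,
starting with bit 4, PF1 is drawn from bit 7 down to bit 0, and PF2 from
bit 0 up to bit 7. The function names and the list-of-lists output are
assumptions for illustration, not the Vis2600 script's code.</p>

```python
def playfield_row(pf0: int, pf1: int, pf2: int, reflected: bool = False):
    """Return 40 playfield bits (0/1) for one PF0,PF1,PF2 entry."""
    left = [(pf0 >> bit) & 1 for bit in range(4, 8)]        # PF0 bits 4..7
    left += [(pf1 >> bit) & 1 for bit in range(7, -1, -1)]  # PF1 bits 7..0
    left += [(pf2 >> bit) & 1 for bit in range(8)]          # PF2 bits 0..7
    right = left[::-1] if reflected else left
    return left + right


def playfield_rows(data: bytes, height: int, row_thickness: int = 1,
                   reflected: bool = False):
    """Decode `height` 3-byte entries, repeating each row `row_thickness` times."""
    rows = []
    for i in range(height):
        pf0, pf1, pf2 = data[i * 3:i * 3 + 3]
        row = playfield_row(pf0, pf1, pf2, reflected)
        rows.extend([row] * row_thickness)
    return rows
```

<p>Increasing <code>row_thickness</code> makes a short playfield look less
squashed without changing the underlying data.</p>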
|
||||
|
||||
<h3>Atari Arcade - Atari/VisAVG</h3>
|
||||
|
||||
<p>Different versions of Atari's Analog Vector Graphics were used in
|
||||
several games, notably Battlezone, Tempest, and Star Wars. The commands
|
||||
drove a vector display monitor. SourceGen visualizes them as 2D
|
||||
wireframes, which isn't a perfect fit since they can describe points as
|
||||
well as lines, but works fine for annotating a disassembly.</p>
|
||||
<p>The visualizer takes two arguments: the offset of the start of
|
||||
the commands to visualize, and the base address of vector RAM. The latter
|
||||
is necessary to convert AVG JMP/JSR commands into offsets.</p>
|
||||
|
||||
<h3>Commodore 64 - Commodore/VisC64</h3>
|
||||
|
||||
<p>The Commodore 64 has a 64-byte sprite format defined by the hardware.
|
||||
It comes in two basic varieties:</p>
|
||||
<ul>
|
||||
<li><b>High-resolution sprite</b> - 24x21 monochrome. Pixels are either
|
||||
colored or transparent.</li>
|
||||
<li><b>Multi-color sprite</b> - 12x21 3-color. The width of each pixel
|
||||
is doubled to make it 24x21.</li>
|
||||
</ul>
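<p>Both varieties use the same storage: 21 rows of 3 bytes each, 63 bytes of
pixel data in total, with the leftmost pixel in the most significant bits.
The sketch below decodes each form into a 24x21 grid. The function names are
hypothetical, and the mapping of pixel values to actual colors is left out.</p>

```python
def decode_hires_sprite(data: bytes):
    """Return 21 rows of 24 values: 1 = colored, 0 = transparent."""
    if len(data) < 63:
        raise ValueError("sprite needs 63 bytes of pixel data")
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3:y * 3 + 3]:
            row.extend((b >> bit) & 1 for bit in range(7, -1, -1))
        rows.append(row)
    return rows


def decode_multicolor_sprite(data: bytes):
    """Return 21 rows of 24 values in 0-3; each 2-bit pixel is doubled in width.

    Value 0 is transparent; 1-3 select one of three colors.
    """
    if len(data) < 63:
        raise ValueError("sprite needs 63 bytes of pixel data")
    rows = []
    for y in range(21):
        row = []
        for b in data[y * 3:y * 3 + 3]:
            for shift in (6, 4, 2, 0):
                value = (b >> shift) & 3
                row.extend([value, value])   # double the pixel width
        rows.append(row)
    return rows
```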
|
||||
<p>Sprites can be doubled in width and/or height.</p>
|
||||
<p>Colors come from a hardware-defined palette of 16:</p>
|
||||
<ol start="0" style="columns:2; -webkit-columns:2; -moz-columns:2;">
|
||||
<li><span style="color:#ffffff;background-color:#000000"> black </span></li>
|
||||
<li><span style="color:#000000;background-color:#ffffff"> white </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#67372b"> red </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#70a4b2"> cyan </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#6f3d86"> purple </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#588d43"> green </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#352879"> blue </span></li>
|
||||
<li><span style="color:#000000;background-color:#b8c76f"> yellow </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#6f4f25"> orange </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#433900"> brown </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#9a6759"> light red </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#444444"> dark grey </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#6c6c6c"> grey </span></li>
|
||||
<li><span style="color:#000000;background-color:#9ad284"> light green </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#6c5eb5"> light blue </span></li>
|
||||
<li><span style="color:#ffffff;background-color:#959595"> light grey </span></li>
|
||||
</ol>
|
||||
|
||||
<p>Bear in mind that the editor scales images to their maximum size, so
|
||||
a sprite that is doubled in both width and height will look exactly like
|
||||
a sprite that is not doubled at all.</p>
|
||||
|
||||
<h3>Nintendo Entertainment System - Nintendo/VisNES</h3>
|
||||
|
||||
<p>NES PPU pattern tables hold 8x8 tiles with 2 bits of color per pixel.
|
||||
Converting the full collection to a reference bitmap is straightforward.
|
||||
A few color palette options are offered.</p>
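<p>Each tile occupies 16 bytes: eight bytes holding the low bit of every
pixel, followed by eight bytes holding the high bit. A minimal decode looks
like the sketch below. The function name is chosen for this example, and
the result is a grid of palette indices rather than colors.</p>

```python
def decode_tile(tile: bytes):
    """Decode one 16-byte pattern table tile to 8 rows of 8 values in 0-3."""
    if len(tile) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        lo, hi = tile[y], tile[y + 8]       # the two bit planes for this row
        rows.append([((lo >> bit) & 1) | (((hi >> bit) & 1) << 1)
                     for bit in range(7, -1, -1)])
    return rows
```

<p>A 4KiB pattern table holds 256 such tiles, so rendering the full
collection is a matter of calling this for each 16-byte chunk and laying the
results out in a grid.</p>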
|
||||
|
||||
<p>Sprites and backgrounds are formed from collections of tiles. In
|
||||
some cases this is straightforward, in others it's not. A visualization
|
||||
generator that renders a "tile grid" is available for simpler cases.</p>
|
||||
|
||||
</div>
|
||||
|
||||
<div id="footer">
|
||||
<p><a href="index.html">Back to index</a></p>
|
||||
</div>
|
||||
</body>
|
||||
<!-- Copyright 2019 faddenSoft -->
|
||||
</html>
|
||||