mirror of
https://github.com/fadden/6502bench.git
synced 2025-01-07 06:30:52 +00:00
465 lines
21 KiB
HTML
465 lines
21 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
|
|
<head>
|
|
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
|
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
|
<link href="main.css" rel="stylesheet" type="text/css" />
|
|
<title>Advanced Topics - 6502bench SourceGen</title>
|
|
</head>
|
|
|
|
<body>
|
|
<div id="content">
|
|
<h1>6502bench SourceGen: Advanced Topics</h1>
|
|
<p><a href="index.html">Back to index</a></p>
|
|
|
|
|
|
<h2><a name="platform-symbols">Platform Symbol Files (.sym65)</a></h2>
|
|
|
|
<p>Platform symbol files contain lists of symbols, each of which has a
|
|
label and a value. SourceGen comes with a collection of symbols for
|
|
popular systems, but you can create your own. This can be handy if a
|
|
few different projects are coded against a common library.</p>
|
|
|
|
<p>If two symbols have the same value, the older symbol is replaced by
|
|
the newer one. This is why the order in which symbol files are loaded
|
|
matters.</p>
|
|
|
|
<p>Platform symbol files consist of comments, commands, and symbols.
|
|
Blank lines, and lines that begin with a semicolon (';'), are ignored. Lines
|
|
that begin with an asterisk ('*') are commands. Three are currently
|
|
defined:</p>
|
|
<ul>
|
|
<li><code>*SYNOPSIS</code> - a short summary of the file contents.</li>
|
|
<li><code>*TAG</code> - a tag string to apply to all symbols that follow
|
|
in this file.</li>
|
|
<li><code>*MULTI_MASK</code> - specify a mask for symbols that appear
|
|
at multiple addresses.</li>
|
|
</ul>
|
|
|
|
<p>Tags can be used by extension scripts to identify a subset of symbols.
|
|
The symbols are still part of the global set; the tag just provides a
|
|
way to extract a subset. Tags should be comprised of non-whitespace ASCII
|
|
characters. Tags are global, so use a long, descriptive string. If
|
|
<code>*TAG</code> is not followed by a string, the symbols that follow
|
|
are treated as untagged.</p>
|
|
|
|
<p>All other lines are symbols, which have the form:</p>
|
|
<pre>
|
|
LABEL {=|@|<|>} VALUE [WIDTH] [;COMMENT]
|
|
</pre>
|
|
|
|
<p>The LABEL must be at least two characters long, begin with a letter or
|
|
underscore, and consist entirely of alphanumeric ASCII characters
|
|
(A-Z, a-z, 0-9) and the underscore ('_'). (This is the same format
|
|
required for line labels in SourceGen.)</p>
|
|
<p>The next token can be one of:</p>
|
|
<ul>
|
|
<li>@: general addresses</li>
|
|
<li><: read-only addresses</li>
|
|
<li>>: write-only addresses</li>
|
|
<li>=: constants</li>
|
|
</ul>
|
|
<p>If an instruction references an address, and that address is outside
|
|
the bounds of the file, the list of address symbols (i.e. everything
|
|
that's not a constant) will be scanned for a match.
|
|
If found, the symbol is applied automatically. You normally want to
|
|
use '@', but can use '<' and '>' for memory-mapped I/O locations
|
|
that have different behavior depending on whether they are read or
|
|
written.</p>
|
|
|
|
<p>The VALUE is a number in decimal, hexadecimal (with a leading '$'), or
|
|
binary (with a leading '%'). The numeric base will be recorded and used when
|
|
formatting the symbol in generated output, so use whichever form is most
|
|
appropriate. Values are unsigned 24-bit numbers. The special value
|
|
"erase" may be used for an address to erase a symbol defined in an earlier
|
|
platform file.</p>
|
|
|
|
<p>The WIDTH is optional, and ignored for constants. It must be a
|
|
decimal or hexadecimal value between 1 and 65536, inclusive. If omitted,
|
|
the default width is 1.</p>
|
|
|
|
<p>The COMMENT is optional. If present, it will be saved and used as the
|
|
end-of-line comment on the .EQ directive if the symbol is used.</p>
|
|
|
|
<h4>Using MULTI_MASK</h4>
|
|
|
|
<p>The multi-address mask is used for systems like the Atari 2600, where
|
|
RAM, ROM, and I/O registers appear at multiple addresses. The hardware
|
|
looks for certain address lines to be set or clear, and if the pattern
|
|
matches, another set of bits is examined to determine which register or
|
|
RAM address is being accessed.</p>
|
|
<p>This is expressed in symbol files with the MULTI_MASK statement.
|
|
Address symbol declarations that follow have the mask set applied. Symbols
|
|
whose addresses don't fit the pattern cause a warning and will be
|
|
ignored. Constants are not affected.</p>
|
|
|
|
<p>The mask set is best explained with an example. Suppose the address
|
|
pattern for a set of registers is <code>???0 ??1? 1??x xxxx</code>
|
|
(where '?' can be any value, 0/1 must be that value, and 'x' means the bit
|
|
is used to determine the register).
|
|
So any address between $0280-029F matches, as does $23C0-23DF, but
|
|
$0480 and $1280 don't. The register number is found in the low five bits.</p>
|
|
<p>The corresponding MULTI_MASK line, with values specifed in binary,
|
|
would be:</p>
|
|
<pre> *MULTI_MASK %0001001010000000 %0000001010000000 %0000000000011111</pre>
|
|
<p>The values are CompareMask, CompareValue, and AddressMask. To
|
|
determine if an address is in the register set, we check to see if
|
|
<code>(address & CompareMask) == CompareValue</code>. If so, we can
|
|
extract the register number with <code>(address & AddressMask)</code>.</p>
|
|
|
|
<p>We don't want to have a huge collection of equates at the top of the
|
|
generated source file, so whatever value is used in the symbol declaration
|
|
is considered the "canonical" value. All other matching values are output
|
|
with an offset.</p>
|
|
<p>All mask values must fall between 0 and $00FFFFFF. The set bits in
|
|
CompareMask and AddressMask must not overlap, and CompareValue must not
|
|
have any bits set that aren't also set in CompareMask.</p>
|
|
<p>If an address can be mapped to a masked value and an unmasked value,
|
|
the unmasked value takes precedence for exact matches. In the example
|
|
above, if you declare <code>REG1 @ $0281</code> outside the MULTI_MASK
|
|
declaration, the disassembler will use <code>REG1</code> for all operands
|
|
that reference $0281. If other code accesses the same register as $23C1,
|
|
the symbol established for the masked value will be used instead.</p>
|
|
<p>If there are multiple masked values for a given address, the precedence
|
|
is undefined.</p>
|
|
<p>To disable the MULTI_MASK and resume normal declarations, write the
|
|
tag without arguments:
|
|
<pre> *MULTI_MASK</pre></p>
|
|
|
|
|
|
<h3>Creating a Project-Specific Symbol File</h3>
|
|
|
|
<p>To create a platform symbol file for your project, just create a new
|
|
text file, named with a ".sym65" extension. (If your text editor of choice
|
|
doesn't like that, you can put a ".txt" on the end while you're editing.)
|
|
Make sure you create it in the same directory where your project file
|
|
(the file that ends with ".dis65") lives. Add a <code>*SYNOPSIS</code>,
|
|
then add the desired symbols.</p>
|
|
<p>Finally, add it to your project. Select Edit > Project Properties,
|
|
switch to the Symbol Files tab, click Add Symbol Files from Project, and
|
|
select your symbol file. It should appear in the list with a
|
|
"PROJ:" prefix.</p>
|
|
|
|
<p>If an example helps, the A2-Amper-fdraw project in the Examples
|
|
directory has a project-local symbol file, called "fdraw-exports".
|
|
(fdraw-exports is a list of exported symbols from the fdraw library,
|
|
for which Amper-fdraw provides an Applesoft BASIC interface.)
|
|
|
|
<p>NOTE: in the current version of SourceGen, changes to .sym65 files are
|
|
not detected automatically. Use File > Reload External Files to
|
|
import the changes.</p>
|
|
|
|
|
|
<h2><a name="extension-scripts">Extension Scripts</a></h2>
|
|
|
|
<p>Extension scripts, also called "plugins", are C# programs with access to
|
|
the full .NET Standard 2.0 APIs. They're compiled at run time by SourceGen
|
|
and executed in a sandbox with security restrictions.</p>
|
|
|
|
<p>SourceGen defines an interface that plugins must implement, and an
|
|
interface that plugins can use to interact with SourceGen. See
|
|
Interfaces.cs in the PluginCommon directory.</p>
|
|
|
|
<p>The current interfaces can be used to generate visualizations, to
|
|
identify inline data that follows JSR, JSL, or BRK instructions, and to
|
|
format operands. The latter can be used to format code and data, e.g.
|
|
replacing immediate load operands with symbolic constants.</p>
|
|
|
|
<p>Scripts may be loaded from the RuntimeData directory, or from the directory
|
|
where the project file lives. Attempts to load them from other locations
|
|
will fail.</p>
|
|
<p>A project may load multiple scripts. The order in which they are
|
|
invoked is not defined.</p>
|
|
|
|
<h4>Known Issues and Limitations</h4>
|
|
|
|
<p>Scripts are currently limited to C# version 5, because the compiler
|
|
built into .NET only handles that. C# 6 and later require installing an
|
|
additional package ("Roslyn"), so SourceGen does not support this.</p>
|
|
|
|
<p>When a project is opened, any errors encountered by the script compiler
|
|
are reported to the user. If the project is already open, and a script
|
|
is added to the project through the Project Properties editor, compiler
|
|
messages are silently discarded. (This also applies if you undo/redo across
|
|
the property edit.) Use File > Reload External Files to see the
|
|
compiler messages.</p>
|
|
|
|
<h4>Development</h4>
|
|
|
|
<p>The easiest way to develop extension scripts is inside the 6502bench
|
|
solution in Visual Studio. This way you have the interfaces available
|
|
for IntelliSense completion, and get all the usual syntax and compile
|
|
checking in the editor. (This is why there's a RuntimeData project for
|
|
Visual Studio.)</p>
|
|
|
|
<p>If you have the solution configured for debug builds, SourceGen will pass
|
|
<code>IncludeDebugInformation=true</code> to the script compiler. This
|
|
causes a .PDB file to be created. While this can help with debugging,
|
|
it can sometimes get in the way: if you edit the script source code and
|
|
reload the project without restarting the app, SourceGen will recompile
|
|
the script, but the old .PDB file will still be open by VisualStudio
|
|
and you'll see some failure messages. Exiting and restarting SourceGen
|
|
will allow regeneration of the PDB files.</p>
|
|
|
|
<p>Some commonly useful functions are defined in the
|
|
<code>PluginCommon.Util</code> class, which is available to plugins. These
|
|
call into the CommonUtil library, which is shared with SourceGen.
|
|
While plugins could use CommonUtil directly, they should avoid doing so. The
|
|
APIs there are not guaranteed to be stable, so plugins that rely on them
|
|
may break in a subsequent release of SourceGen.</p>
|
|
|
|
<h4>PluginDllCache Directory</h4>
|
|
|
|
<p>Extension scripts are compiled into .DLLs, and saved in the PluginDllCache
|
|
directory, which lives next to the application executable and RuntimeData.
|
|
If the extension script is the same age or older than the DLL, SourceGen
|
|
will continue to use the existing DLL.</p>
|
|
|
|
<p>The DLL names are a combination of the script filename and script location.
|
|
The compiled name for "MyPlatform/MyScript.cs" in the RuntimeData directory
|
|
will be "RT_MyPlatform_MyScript.dll". For a project-specific script, it
|
|
would look like "PROJ_MyProject_MyScript.dll".</p>
|
|
|
|
<p>The PluginCommon and CommonUtil DLLs will be copied into the directory, so
|
|
that code in the sandbox has access to them.</p>
|
|
|
|
<p>The contents of the directory are generated as needed, and can be deleted
|
|
entirely whenever SourceGen isn't running.</p>
|
|
|
|
<h4>Sandboxing</h4>
|
|
|
|
<p>Extension scripts are executed in an App Domain sandbox. App domains are
|
|
a .NET feature that creates a partition inside the virtual machine, isolating
|
|
code. It still runs in the same address space, on the same threads, so the
|
|
isolation is only effective for "partially trusted" code that has been
|
|
declared safe by the bytecode verifier.</p>
|
|
|
|
<p>SourceGen disallows most actions, notably file access. An exception is
|
|
made for reading files from the directory where the plugin DLLs live, but
|
|
scripts are otherwise unable to read or write from the filesystem. (A
|
|
future version of SourceGen may provide an API that allows limited access
|
|
to data files.)</p>
|
|
|
|
<p>App domain security is not absolute. I don't really expect SourceGen to
|
|
be used as a malware vector, so there's no value in forcing scripts to
|
|
execute in an isolated server process, or to jump through the other hoops
|
|
required to really lock things down. I do believe there's value in
|
|
defining the API in such a way that we <b>could</b> implement full security if
|
|
circumstances change, so I'm using app domains as a way to keep the API
|
|
honest.</p>
|
|
|
|
|
|
<h2><a name="multi-bin">Working With Multiple Binaries</a></h2>
|
|
|
|
<p>Sometimes a program is split into multiple files on disk. They
|
|
may be all loaded at once, or some may be loaded into the same place
|
|
at different times. In such situations it's not uncommon for one
|
|
file to provide a set of interfaces that other files use. It's
|
|
useful to have symbols for these interfaces be available to all
|
|
projects.</p>
|
|
<p>There are two ways to do this: (1) define a common platform symbol
|
|
file with the relevant addresses, and keep it up to date as you work;
|
|
or (2) declare the labels as global and exported, and import them
|
|
as project symbols into the other projects.</p>
|
|
<p>Support for this is currently somewhat weak, requiring a manual
|
|
symbol-import step in every interested project. This step must be
|
|
repeated whenever the labels are updated.</p>
|
|
<p>A different but related problem is typified by arcade ROM sets,
|
|
where files are split apart because each file must be burned into a
|
|
separate PROM. All files are expected to be present in memory at
|
|
once, so there's no reason to treat them as separate projects. Currently,
|
|
the best way to deal with this is to concatenate the files into a single
|
|
file, and operate on that.</p>
|
|
|
|
<h2><a name="overlap">Overlapping Address Spaces</a></h2>
|
|
<p>Some programs use memory overlays, where multiple parts of the
|
|
code run in the same address in RAM. Others use bank switching to access
|
|
parts of the program that reside in separate physical RAM, but appear at
|
|
the same address.</p>
|
|
<p>SourceGen allows you to set the same address on multiple parts of
|
|
a file. Branches to a given address are resolved against the current
|
|
segment first. For example, consider this:</p>
|
|
<pre>
|
|
.ORG $1000
|
|
JMP L1100
|
|
|
|
.ORG $1100
|
|
L1100 BIT L1100
|
|
L1103 LDA #$11
|
|
BRA L1103
|
|
|
|
.ORG $1100
|
|
L1100_0 BIT L1100_0
|
|
L1103_0 LDA #$22
|
|
JMP L1103_0
|
|
</pre>
|
|
|
|
<p>Both sections start at $1100, and have branches to $1103. The branch
|
|
in the first section resolves to the label in the first version of
|
|
that address chunk, while the branch in the second section resolves to
|
|
the label in the second chunk. When branches originate outside the current
|
|
address chunk, the first chunk that includes that address is used, as
|
|
it is with the <code>JMP $1000</code> at the start of the file.</p>
|
|
|
|
|
|
<h2><a name="reloc-data">OMF Relocation Dictionaries</a></h2>
|
|
|
|
<p><i>This feature is considered experimental. Some features,
|
|
like cross-reference tracking, may not work correctly with it.</i></p>
|
|
|
|
<p>65816 code can be tricky to disassemble for a number of reasons.
|
|
24-bit addresses are formed from 16-bit data-access operands by combining
|
|
with the Data Bank Register (DBR), which often requires a bit of manual
|
|
intervention. But the problems go beyond that. Consider the following
|
|
bit of source code for the Apple IIgs:</p>
|
|
<pre>
|
|
rsrcmsg pea rsrcmsg2|-16
|
|
pea rsrcmsg2
|
|
_WriteCString
|
|
lda #buffer
|
|
sta pArcRead+$04
|
|
lda #buffer|-16
|
|
sta pArcRead+$06
|
|
</pre>
|
|
<p>In both cases we're referencing a 24-bit address as two 16-bit values.
|
|
Without context, the disassembler will treat the PEA instruction as two
|
|
independent 16-bit addresses, and the immediate values as constants:</p>
|
|
<pre>
|
|
.dbank $02
|
|
02/317c: f4 02 00 L2317C pea L20002 & $ffff
|
|
02/317f: f4 54 32 pea L23254 & $ffff
|
|
02/3182: a2 0c 20 ldx #WriteCString
|
|
02/3185: 22 00 00 e1 jsl Toolbox
|
|
02/3189: a9 00 00 L23189 lda #$0000
|
|
02/318c: 8d 78 3f sta L23F78 & $ffff
|
|
02/318f: a9 03 00 lda #$0003
|
|
02/3192: 8d 7a 3f sta L23F78 & $ffff +2
|
|
</pre>
|
|
<p>Worse yet, those <code>STA</code> instruction operands would have been
|
|
shown as hex values or incorrect labels if the DBR had been set incorrectly.
|
|
However, if we have the relocation data, we know the full
|
|
address from which the addresses were formed, and we can tell when
|
|
immediate values are addresses rather than constants. And we can do this
|
|
even without setting the DBR.</p>
|
|
<pre>
|
|
02/317c: f4 02 00 L2317C pea L23254 >> 16
|
|
02/317f: f4 54 32 pea L23254 & $ffff
|
|
02/3182: a2 0c 20 ldx #WriteCString
|
|
02/3185: 22 00 00 e1 jsl Toolbox
|
|
02/3189: a9 00 00 L23189 lda #L30000 & $ffff
|
|
02/318c: 8d 78 3f sta L23F78 & $ffff
|
|
02/318f: a9 03 00 lda #L30000 >> 16
|
|
02/3192: 8d 7a 3f sta L23F78 & $ffff +2
|
|
</pre>
|
|
<p>The absence of relocation data can be a useful signal as well. For
|
|
example, when pushing arguments for a toolbox call, the disassembler
|
|
can tell the difference between addresses and constants without needing
|
|
emulation or pattern-matching, because only the addresses get
|
|
relocated. Consider this bit of source code:</p>
|
|
<pre>
|
|
lda <total_records
|
|
pha
|
|
pea linebuf|-16
|
|
pea linebuf+65
|
|
pea $0005
|
|
pea $0000
|
|
_Int2Dec
|
|
</pre>
|
|
<p>Without relocation data, it becomes:</p>
|
|
<pre>
|
|
02/0aa8: a5 42 lda $42
|
|
02/0aaa: 48 pha
|
|
02/0aab: f4 02 00 pea L20002 & $ffff
|
|
02/0aae: f4 03 31 pea L23103 & $ffff
|
|
02/0ab1: f4 05 00 pea L20005 & $ffff
|
|
02/0ab4: f4 00 00 pea L20000 & $ffff
|
|
02/0ab7: a2 0b 26 ldx #Int2Dec
|
|
02/0aba: 22 00 00 e1 jsl Toolbox
|
|
</pre>
|
|
<p>If we treat the non-relocated operands as constants:</p>
|
|
<pre>
|
|
02/0aa8: a5 42 lda $42
|
|
02/0aaa: 48 pha
|
|
02/0aab: f4 02 00 pea L230C2 >> 16
|
|
02/0aae: f4 03 31 pea L23103 & $ffff
|
|
02/0ab1: f4 05 00 pea $0005
|
|
02/0ab4: f4 00 00 pea $0000
|
|
02/0ab7: a2 0b 26 ldx #Int2Dec
|
|
02/0aba: 22 00 00 e1 jsl Toolbox
|
|
</pre>
|
|
|
|
|
|
<h2><a name="debug">Debug Menu Options</a></h2>
|
|
|
|
<p>The DEBUG menu is hidden by default in release builds, but can be
|
|
exposed by checking the "enable DEBUG menu" box in the application
|
|
settings. These features are used for debugging SourceGen. They will
|
|
not help you debug 6502 projects.</p>
|
|
|
|
<p>Features:</p>
|
|
<ul>
|
|
<li>Re-analyze (F5). Causes a full re-analysis. Useful if you think
|
|
the display is out of sync.</li>
|
|
<li>Source Generation Tests. Opens the regression test harness. See
|
|
<code>README.md</code> in the SGTestData directory for more information.
|
|
If the regression tests weren't included in the SourceGen distribution,
|
|
this will have nothing to do.</li>
|
|
<li>Show Analyzer Output. Opens a floating window with a text log from
|
|
the most recent analysis pass. The exact contents will vary depending
|
|
on how the verbosity level is configured internally. Debug messages
|
|
from extension scripts appear here.</li>
|
|
<li>Show Analysis Timers. Opens a floating window with a dump of
|
|
timer results from the most recent analysis pass. Times for individual
|
|
stages are noted, as are times for groups of functions. This
|
|
provides a crude sense of where time is being spent.</li>
|
|
<li>Show Undo/Redo History. Opens a floating window that lets you
|
|
watch the contents of the undo buffer while you work.</li>
|
|
<li>Extension Script Info. Shows a bit about the currently-loaded
|
|
extension scripts.</li>
|
|
<li>Show Comment Rulers. Adds a string of digits above every
|
|
multi-line comment (long comment, note). Useful for confirming that
|
|
the width limitation is being obeyed. These are added exactly
|
|
as shown, without comment delimiters, into generated assembly output,
|
|
which doesn't work out well if you run the assembler.</li>
|
|
<li>Disable Security Sandbox. Extension scripts are loaded and run in
|
|
a "sandbox" to prevent security issues. Setting this flag allows
|
|
them to execute with full permissions.
|
|
This setting is not persistent.</li>
|
|
<li>Disable Keep-Alive Hack. The hack sends a "ping" to the extension
|
|
script sandbox every 60 seconds. This seems to be required to avoid
|
|
an infrequently-encountered Windows bug. (See code for notes and
|
|
stackoverflow.com links.)
|
|
This setting is not persistent.</li>
|
|
<li>Reboot Security Sandbox. Discards the sandbox, creates a new one,
|
|
and reloads it. Only useful for exercising the sandbox code that
|
|
runs when the keep-alives are unsuccessful.</li>
|
|
<li>Applesoft to HTML. An experimental feature that formats an
|
|
Applesoft program as HTML.</li>
|
|
<li>Export Edit Commands. Outputs comments and notes in
|
|
SourceGen Edit Command format. This is an experimental feature.</li>
|
|
<li>Apply Edit Commands. Reads a file in SourceGen Edit Command
|
|
format and applies the commands.</li>
|
|
<li>Apply External Symbols. An experimental feature for turning platform
|
|
and project symbols into address labels. This will run through the list
|
|
of all symbols loaded from .sym65 files and find addresses that fall
|
|
within the bounds of the file. If it finds an address that is the start
|
|
of a code/data line and doesn't already have a user-supplied label,
|
|
and the platform symbol's label isn't already defined elsewhere, the
|
|
platform label will be applied. Useful when disassembling ROM images
|
|
or other code with an established set of public entry points.
|
|
(Tip: disable "analyze uncategorized data" from the project
|
|
properties editor first, as this will not set labels in the middle
|
|
of multi-byte data items.)</li>
|
|
</ul>
|
|
|
|
|
|
</div>
|
|
|
|
<div id="footer">
|
|
<p><a href="index.html">Back to index</a></p>
|
|
</div>
|
|
</body>
|
|
<!-- Copyright 2018 faddenSoft -->
|
|
</html>
|