mirror of
https://github.com/fadden/6502bench.git
synced 2025-01-07 06:30:52 +00:00
ed4cc84782
Move the SourceGen manual to a subdirectory in "docs", so that it can be accessed directly from the 6502bench web site. The place where it's installed in the distribution doesn't change.
416 lines
20 KiB
HTML
416 lines
20 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
|
|
<head>
|
|
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
|
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
|
<link href="main.css" rel="stylesheet" type="text/css" />
|
|
<title>Code Generation & Assembly - 6502bench SourceGen</title>
|
|
</head>
|
|
|
|
<body>
|
|
<div id="content">
|
|
<h1>6502bench SourceGen: Code Generation & Assembly</h1>
|
|
<p><a href="index.html">Back to index</a></p>
|
|
|
|
<p>SourceGen can generate an assembly source file that, when fed into
|
|
the target assembler, will recreate the original data file exactly.
|
|
Every assembler is different, so support must be added to SourceGen
|
|
for each.</p>
|
|
<p>The generation / assembly dialog can be opened with File > Assemble.</p>
|
|
<p>If you want to show code to others, perhaps by adding a page to
|
|
your web site, you can "export" the formatted code as text or HTML.
|
|
This is explained in more detail <a href="#export-source">below</a>.
|
|
|
|
|
|
<h2><a name="generate">Generating Source Code</a></h2>
|
|
|
|
<p>Cross assemblers tend to generate additional files, either compiler
|
|
intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
|
|
generators may produce multiple source files, perhaps a link script or
|
|
symbol definition header to go with the assembly source. To avoid
|
|
spreading files across the filesystem, SourceGen does all of its work
|
|
in the same directory where the project lives. Before you can generate
|
|
code, you have to have assigned your project a directory. This is why
|
|
you can't assemble a project until you've saved it for the first time.</p>
|
|
|
|
<p>The Generate and Assemble dialog has a drop-down list near the top
|
|
that lets you pick which assembler to target. The name of the assembler
|
|
will be shown with the detected version number. If the assembler
|
|
executable isn't configured, "[latest version]" will be shown instead
|
|
of a version number.</p>
|
|
<p>The Settings button will take you directly to the assembler configuration
|
|
tab in the application settings dialog.</p>
|
|
<p>Hit the Generate button to generate the source code into a file on disk.
|
|
The file will use the project name, with the <code>.dis65</code> extension
|
|
replaced by <code>_<assembler>.S</code>.</p>
|
|
<p>The first 64KiB of each generated file will be shown in the preview
|
|
window. If multiple files were generated, you can use the "preview file"
|
|
drop-down to select between them. Line numbers are
|
|
prepended to each line to make it easier to track down errors.</p>
|
|
|
|
|
|
|
|
<h3><a name="localizer">Label Localizer</a></h3>
|
|
<p>The label localizer is an optional feature that automatically converts
|
|
some labels to an assembler-specific less-than-global label format. Local
|
|
labels may be reusable (e.g. using "]LOOP" for multiple consecutive
|
|
loops is easier to understand than giving each one a unique label) or
|
|
reduce the size of a generated link table. There are usually restrictions
|
|
on local labels, e.g. references to them may not be allowed to cross a
|
|
global label definition, which the localizer factors in automatically.</p>
|
|
|
|
|
|
<h3><a name="reserved-labels">Reserved Label Names</a></h3>
|
|
<p>Some label names aren't allowed. For example, 64tass reserves the
|
|
use of labels that begin with two underscores. Most assemblers will
|
|
also prevent you from using opcode mnemonics as labels (which means
|
|
you can't assemble <code>jmp jmp jmp</code>).</p>
|
|
<p>If a label doesn't appear to be legal, the generated code will use
|
|
a suitable replacement (e.g. <code>jmp_1 jmp jmp_1</code>).</p>
|
|
|
|
|
|
<h3><a name="platform-features">Platform-Specific Features</a></h3>
|
|
<p>SourceGen needs to be able to assemble binaries for any system
|
|
with any assembler, so it generally avoids platform-specific features.
|
|
One exception to that is C64 PRG files.</p>
|
|
<p>PRG files start with a 16-bit value that tells the OS where the
|
|
rest of the file should be loaded. The value is not usually part of
|
|
the source code, but instead is generated by the assembler, based on
|
|
the address of the first byte output. If SourceGen detects that
|
|
a file is PRG, the source generators for some assemblers will suppress
|
|
the first 2 bytes, and instead pass appropriate meta-data (such as
|
|
an additional command-line option) to the assembler.</p>
|
|
<p>A file is treated as a PRG if:</p>
|
|
<ul>
|
|
<li>it is between 3 and 65536 bytes long (inclusive)</li>
|
|
<li>the format at offset +000000 is a 16-bit numeric data item
|
|
(not executable code, not two 8-byte values, not the first part
|
|
of a 24-bit value, etc.)</li>
|
|
<li>there is an address region start directive at +000002
|
|
<li>the 16-bit value at +000000 is equal to the address of the
|
|
byte at +000002</li>
|
|
<li>there is no label at offset +000000 (explicit or auto-generated)</li>
|
|
</ul>
|
|
<p>The definition is sufficiently narrow to avoid most false-positives.
|
|
If a file is being treated as PRG and you'd rather it weren't, you
|
|
can add a label or reformat the bytes. This feature is currently only
|
|
enabled for 64tass.</p>
|
|
|
|
|
|
<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>
|
|
|
|
<p>After generating sources, if you have a cross-assembler executable
|
|
configured, you can run it by clicking the "Run Assembler" button. The
|
|
command-line output will be displayed, with stdout and stderr separated.
|
|
(I'd prefer them to be interleaved, but that's not what the system
|
|
provides.)</p>
|
|
|
|
<p>The output will show the assembler's exit code, which will be zero
|
|
on success (note: sometimes they lie.) If it appeared to succeed,
|
|
SourceGen will then compare the assembler's output to the original file,
|
|
and report any differences.</p>
|
|
<p>Failures here may be due to bugs in the cross-assembler or in
|
|
SourceGen. However, SourceGen can generally work around assembler bugs,
|
|
so any failure is an opportunity for improvement.</p>
|
|
|
|
|
|
<h2><a name="supported">Supported Assemblers</a></h2>
|
|
|
|
<p>SourceGen currently supports the following cross-assemblers:</p>
|
|
<ul>
|
|
<li><a href="#64tass">64tass</a></li>
|
|
<li><a href="#acme">ACME</a></li>
|
|
<li><a href="#cc65">cc65</a></li>
|
|
<li><a href="#merlin32">Merlin 32</a></li>
|
|
</ul>
|
|
|
|
<h3><a name="version">Version-Specific Code Generation</a></h3>
|
|
|
|
<p>Code generation must be tailored to the specific version of the
|
|
assembler. This is most easily understood with an example.</p>
|
|
<p>If the code has a statement like <code>MVN #$01,#$02</code>, the
|
|
assembler is expected to output <code>54 02 01</code>, with the arguments
|
|
reversed. cc65 v2.17 got it backward; the behavior was fixed in v2.18. The
|
|
bug means we can't generate the same <code>MVN</code>/<code>MVP</code>
|
|
instructions for both versions of the assembler.</p>
|
|
<p>Having version-dependent source code is a bad idea. If we generated
|
|
reversed operands (<code>MVN #$02,#$01</code>), we'd get the correct
|
|
output with v2.17, but the wrong output for v2.18. Unambiguous code can
|
|
be generated for all versions of the assembler by just outputting raw hex
|
|
bytes, but that's ugly and annoying, so we don't want to be stuck doing
|
|
that forever. We want to detect which version of the assembler is in
|
|
use, and output actual <code>MVN</code>/<code>MVP</code> instructions
|
|
when producing code for newer versions of the assembler.</p>
|
|
<p>When you configure a cross-assembler, SourceGen runs the executable with
|
|
version query args, and extracts the version information from the output
|
|
stream. This is used by the generator to ensure that the output will compile.
|
|
If no assembler is configured, SourceGen will produce code optimized
|
|
for the latest version of the assembler.</p>
|
|
|
|
|
|
<h3><a name="quirks">Assembler-Specific Bugs & Quirks</a></h3>
|
|
|
|
<p>This is a list of bugs and quirky behavior in cross-assemblers that
|
|
SourceGen works around when generating code.</p>
|
|
<p>Every assembler seems to have a different way of dealing with expressions.
|
|
Most of them will let you group expressions with parenthesis, but that
|
|
doesn't always help. For example, <code>PEA label >> 8 + 1</code> is
|
|
perfectly valid, but writing <code>PEA (label >> 8) + 1</code> will cause
|
|
most assemblers to assume you're trying to use an alternate (and non-existent)
|
|
form of <code>PEA</code> with indirect addressing, causing the assembler
|
|
to halt with an error message. The code generator needs
|
|
to understand expression syntax and operator precedence to generate correct
|
|
code, but also needs to know how to handle the corner cases.</p>
|
|
|
|
|
|
<h3><a name="64tass">64tass</a></h3>
|
|
|
|
<p>Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625
|
|
<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>[Fixed in v1.55.2176]
|
|
Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
|
|
the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
|
|
<li>[Fixed in v1.55.2176] WDM is not supported.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>The underscore character ('_') is allowed as a character in labels,
|
|
but when used as the first character in a label it indicates the
|
|
label is local. If you create labels with leading underscores that
|
|
are not local, the labels must be altered to start with some other
|
|
character, and made unique.</li>
|
|
<li>Labels starting with two underscores are "reserved". Trying to
|
|
use them causes an error.</li>
|
|
<li>By default, 64tass sets the first two bytes of the output file to
|
|
the load address. The <code>--nostart</code> flag is used to
|
|
suppress this.</li>
|
|
<li>By default, 64tass is case-insensitive, but SourceGen treats labels
|
|
as case-sensitive. The <code>--case-sensitive</code> flag must be passed
|
|
to the assembler.</li>
|
|
<li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
|
|
and operands must be lower-case. Most of the SourceGen options that
|
|
cause things to appear in upper case must be disabled.</li>
|
|
<li>For 65816, selecting the bank byte is done with the grave accent
|
|
character ('`') rather than the caret ('^'). (There's a note in the
|
|
docs to the effect that they plan to move to carets.)</li>
|
|
<li>Instructions whose argument is formed by combining with the
|
|
65816 Program Bank Register (16-bit JMP/JSR) must be specified
|
|
as 24-bit values for code that lives outside bank 0. This is
|
|
true for both symbols and raw hex (e.g. <code>JSR $1234</code>
|
|
is invalid outside bank 0). Attempting to JSR to a label in bank
|
|
0 from outside bank 0 causes an error, even though it is technically
|
|
a 16-bit operand.</li>
|
|
<li>The arguments to COP and BRK require immediate-mode syntax
|
|
(<code>COP #$03</code> rather than <code>COP $03</code>).
|
|
<li>For historical reasons, the default behavior of the assembler is to
|
|
assume that the source file is PETSCII, and the desired encoding for
|
|
strings is also PETSCII. No character conversion is done, so anybody
|
|
assembling ASCII files will get ASCII strings (which works out pretty
|
|
well if you're assembling code for a non-Commodore target). However,
|
|
the documentation says you're required to pass the "--ascii" flag when
|
|
the input is ASCII/UTF-8, so to build files that want ASCII operands
|
|
an explicit character encoding definition must be provided.</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="acme">ACME</a></h3>
|
|
|
|
<p>Tested versions: v0.96.4, v0.97
|
|
<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
|
|
outside bank zero cannot be assembled. SourceGen currently deals with
|
|
this by outputting the entire file as a hex dump.</li>
|
|
<li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
|
|
<li>BRK and WDM are not allowed to have operands.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>The assembler shares some traits with one-pass assemblers. In
|
|
particular, if you forward-reference a zero-page label, the reference
|
|
generates a 16-bit absolute address instead of an 8-bit zero-page
|
|
address. Unlike other one-pass assemblers, the width is "sticky",
|
|
and backward references appearing later in the file also use absolute
|
|
addressing even though the proper width is known at that point. This is
|
|
worked around by using explicit "force zero page" annotations on
|
|
all references to zero-page labels.</li>
|
|
<li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
|
|
<code>ASR</code> instead.</li>
|
|
<li>Does not allow the accumulator to be specified explicitly as an
|
|
operand, e.g. you can't write <code>LSR A</code>.</li>
|
|
<li>[Fixed in v0.97.]
|
|
Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
|
|
before 8-bit operands.</li>
|
|
<li>Officially, the preferred file extension for ACME source code is ".a",
|
|
but this is already used on UNIX systems for static libraries (which
|
|
means shell filename completion tends to ignore them). Since ".S" is
|
|
pretty universally recognized as assembly source, code generated by
|
|
SourceGen for ACME also uses ".S".</li>
|
|
<li>Version 0.97 started interpreting '\' in strings as an escape
|
|
character, to allow C-style escapes like "\n". This requires escaping
|
|
all occurrences of '\' in data strings as "\\". Compiling an older
|
|
source file with a newer version of ACME may fail unless you pass
|
|
a backward-compatibility command-line argument.</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="cc65">cc65</a></h3>
|
|
|
|
<p>Tested versions: v2.17, v2.18
|
|
<a href="https://cc65.github.io/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
|
<li>BRK can only be given an argument in 65816 mode.</li>
|
|
<li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
|
|
<li>[Fixed in v2.18] <code>BRK <arg></code> is assembled to opcode
|
|
$05 rather than $00.</li>
|
|
<li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>Operator precedence is unusual. Consider <code>label >> 8 - 16</code>.
|
|
cc65 puts shift higher than subtraction, whereas languages like C
|
|
and assemblers like 64tass do it the other way around. So cc65
|
|
regards the expression as <code>(label >> 8) - 16</code>, while the
|
|
more common interpretation would be <code>label >> (8 - 16)</code>.
|
|
(This is actually somewhat convenient, since none of the expressions
|
|
SourceGen currently generates require parenthesis.)</li>
|
|
<li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic AXS. All
|
|
other opcodes match up with the "unintended opcodes" document.</li>
|
|
<li>ca65 is implemented as a single-pass assembler, so label widths
|
|
can't always be known in time. For example, if you use some zero-page
|
|
labels, but they're defined via <code>.ORG $0000</code> after the point
|
|
where the labels are used, the assembler will already have generated them
|
|
as absolute values. Width disambiguation must be applied to operands
|
|
that wouldn't be ambiguous to a multi-pass assembler.</li>
|
|
<li>Assignment of constants and variables (<code>=</code> and
|
|
<code>.set</code>) ends local label scope, so the label localizer
|
|
has to take variable assignment into account.</li>
|
|
<li>The assembler is geared toward generating relocatable code with
|
|
multiple segments (it is, after all, an assembler for a C compiler).
|
|
A linker configuration script is expected to be provided for anything
|
|
complex. SourceGen generates a custom config file for each project.</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="merlin32">Merlin 32</a></h3>
|
|
|
|
<p>Tested Versions: v1.0
|
|
<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
|
|
<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
|
|
</p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
|
<li>For some failures, an exit code of zero is returned.</li>
|
|
<li>Immediate operands with a comma (e.g. <code>LDA #','</code>)
|
|
or curly braces (e.g. <code>LDA #'{'</code>) cause an error.</li>
|
|
<li>Some DP indexed store instructions cause errors if the label isn't
|
|
unambiguously DP (e.g. <code>STX $00,X</code> vs.
|
|
<code>STX $0000,X</code>). This isn't a problem with project/platform
|
|
symbols, which are output as two-digit hex values when possible, but
|
|
causes failures when direct page locations are included in the project
|
|
and given labels.</li>
|
|
<li>The check for 64KiB overflow appears to happen before instructions
|
|
that might be absolute or direct page are resolved and reduced in size.
|
|
This makes it unlikely that a full 64KiB bank of code can be
|
|
assembled.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>Operator precedence is unusual. Expressions are generally processed
|
|
from left to right. The byte-selection operators have a lower
|
|
precedence than all of the others, and so are always processed last.</li>
|
|
<li>The byte selection operators ('<', '>', '^') are actually
|
|
word-selection operators, yielding 16-bit values when wide registers
|
|
are enabled on the 65816.</li>
|
|
<li>Values loaded into registers are implicitly mod 256 or 65536. There
|
|
is no need to explicitly mask an expression.</li>
|
|
<li>The assembler tracks register widths when it sees SEP/REP instructions,
|
|
but doesn't attempt to track the emulation flag. So if you issue a
|
|
<code>REP #$20</code>
|
|
while in emulation mode, the assembler will incorrectly assume long
|
|
registers. Ideally it would be possible to configure that off, but
|
|
there's no way to do that, so instead we occasionally generate
|
|
additional width directives.</li>
|
|
<li>Non-unique local labels should cause an error, but don't.</li>
|
|
<li>No undocumented opcodes are supported, nor are the Rockwell
|
|
65C02 instructions.</li>
|
|
</ul>
|
|
|
|
|
|
|
|
<h2><a name="export-source">Exporting Source Code</a></h2>
|
|
<p>The "export" function takes what you see in the code list in the app
|
|
and converts it to text or HTML. The options you've set in the app
|
|
settings, such as capitalization, text delimiters, pseudo-opcode names,
|
|
operand expression style, and display of cycle counts are all taken into
|
|
account. The file generated is not expected to work with an actual
|
|
assembler.</p>
|
|
<p>The text output is similar to what you'd get by copying lines to the
|
|
clipboard and pasting them into a text file, except that you have greater
|
|
control over which columns are included. The HTML version is augmented
|
|
with links and (optionally) images.</p>
|
|
|
|
<p>Use File > Export to open the export dialog. You have several
|
|
options:</p>
|
|
<ul>
|
|
<li><b>Include only selected lines</b>. This allows you to choose between
|
|
exporting all or part of a file. If no lines are selected, the entire
|
|
file will exported. This setting does <b>not</b> affect link generation
|
|
for HTML output, so you may have some dead internal links if you don't
|
|
export the entire file.</li>
|
|
<li><b>Include notes</b>. Notes are normally excluded from generated
|
|
sources. Check this to include them.</li>
|
|
<li><b>Show <Column></b>. The leftmost five columns are optional,
|
|
and will not appear in the output unless the appropriate option is
|
|
checked.</li>
|
|
<li><b>Column widths</b>. These determine the minimum widths of the
|
|
rightmost four columns. These are not hard limits: if the contents
|
|
of the column are too wide, the next column will start farther over.
|
|
The widths are not used at all for CSV output.</li>
|
|
<li><b>Text vs. CSV</b>. For text generation, you can choose between
|
|
plain text and Comma-Separated Value format. The latter is useful
|
|
for importing source code into another application, such as a
|
|
spreadsheet.</li>
|
|
<li><b>Generate image files</b>. When exporting to HTML, selecting this
|
|
will cause GIF images to be generated for visualizations.</li>
|
|
<li><b>Overwrite CSS file</b>. Some aspects of the HTML output's format
|
|
are defined by a file called "SGStyle.css", which may be shared between
|
|
multiple HTML files and customized. The file is copied out
|
|
of the RuntimeData directory without modification. It will be
|
|
created if it doesn't exist, but will not be overwritten unless this
|
|
box is checked. The setting is <b>not</b> sticky, and will revert
|
|
to unchecked. (Think of this as a proactive alternative to "are you
|
|
sure you wish to overwrite SGStyle.css?")</li>
|
|
</ul>
|
|
<p>Once you've picked your options, click either "Generate HTML" or
|
|
"Generate Text", then select an output file name from the standard file
|
|
dialog. Any additional files generated, such as graphics for HTML pages,
|
|
will be written to the same directory.</p>
|
|
|
|
<p>All output uses UTF-8 encoding. Filenames of HTML files will have '#'
|
|
replaced with '_' to make linking easier.</p>
|
|
|
|
</div>
|
|
|
|
<div id="footer">
|
|
<p><a href="index.html">Back to index</a></p>
|
|
</div>
|
|
</body>
|
|
<!-- Copyright 2018 faddenSoft -->
|
|
</html>
|