<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="main.css" rel="stylesheet" type="text/css" />
<title>Code Generation &amp; Assembly - 6502bench SourceGen</title>
</head>

<body>
<div id="content">
<h1>6502bench SourceGen: Code Generation &amp; Assembly</h1>
<p><a href="index.html">Back to index</a></p>

<p>SourceGen can generate an assembly source file that, when fed into
the target assembler, will recreate the original data file exactly.
Every assembler is different, so support must be added to SourceGen
for each.</p>
<p>The generation / assembly dialog can be opened with File &gt; Assemble.</p>
<p>If you want to show code to others, perhaps by adding a page to
your web site, you can "export" the formatted code as text or HTML.
This is explained in more detail <a href="#export-source">below</a>.</p>


<h2><a name="generate">Generating Source Code</a></h2>

<p>Cross assemblers tend to generate additional files, either compiler
intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
generators may produce multiple source files, perhaps a link script or
symbol definition header to go with the assembly source. To avoid
spreading files across the filesystem, SourceGen does all of its work
in the same directory where the project lives. Before you can generate
code, you have to have assigned your project a directory. This is why
you can't assemble a project until you've saved it for the first time.</p>

<p>The Generate and Assemble dialog has a drop-down list near the top
that lets you pick which assembler to target. The name of the assembler
will be shown with the detected version number. If the assembler
executable isn't configured, "[latest version]" will be shown instead
of a version number.</p>
<p>The Settings button will take you directly to the assembler configuration
tab in the application settings dialog.</p>
<p>Hit the Generate button to generate the source code into a file on disk.
The file will use the project name, with the <code>.dis65</code> extension
replaced by <code>_&lt;assembler&gt;.S</code>.</p>
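<p>As a rough illustration of the naming rule, the sketch below derives the generated file name from a project path. The function name and the assembler identifier strings (such as "64tass") are hypothetical. SourceGen's own identifiers may differ.</p>

```python
from pathlib import Path


def generated_source_name(project_path: str, assembler_id: str) -> str:
    """Return the generated source path for a SourceGen project file.

    The ".dis65" extension is replaced by "_" + assembler_id + ".S", and the
    result stays in the project's directory.  assembler_id (e.g. "64tass")
    is a hypothetical identifier; SourceGen's internal names may differ.
    """
    path = Path(project_path)
    if path.suffix != ".dis65":
        raise ValueError("not a .dis65 project file: " + project_path)
    return str(path.with_name(path.stem + "_" + assembler_id + ".S"))
```

<p>For example, on a POSIX system a project saved as <code>/projects/game.dis65</code> would yield <code>/projects/game_64tass.S</code>. That file sits in the same directory as the project.</p>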
<p>The first 64KiB of each generated file will be shown in the preview
window. If multiple files were generated, you can use the "preview file"
drop-down to select between them. Line numbers are prepended to each
line to make it easier to track down errors.</p>
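<p>The preview behavior can be approximated in a few lines. This is only a sketch. It assumes the 64KiB limit counts characters, and the "N: " numbering format is made up rather than copied from SourceGen.</p>

```python
PREVIEW_LIMIT = 64 * 1024  # show at most the first 64KiB of a file


def preview_text(source: str, limit: int = PREVIEW_LIMIT) -> str:
    """Truncate a generated file and prepend a line number to each line.

    Assumptions: the limit counts characters, not encoded bytes, and the
    "N: " number format is illustrative only.
    """
    lines = source[:limit].splitlines()
    width = len(str(len(lines)))  # right-align numbers to a common width
    return "\n".join(
        str(num).rjust(width) + ": " + text
        for num, text in enumerate(lines, start=1)
    )
```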


<h3><a name="localizer">Label Localizer</a></h3>
<p>The label localizer is an optional feature that automatically converts
some labels to an assembler-specific less-than-global label format. Local
labels may be reusable (e.g. using "]LOOP" for multiple consecutive
loops is easier to understand than giving each one a unique label) or
reduce the size of a generated link table. There are usually restrictions
on local labels, e.g. references to them may not be allowed to cross a
global label definition, which the localizer factors in automatically.</p>
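<p>One way to honor the "no crossing a global label" restriction is a fixed-point pass that keeps promoting labels to global until nothing changes. The sketch below is a hypothetical model, not SourceGen's localizer. It reduces the file to a flat list of label definitions and references, and the caller supplies the labels that must stay global.</p>

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical flat model of a listing: ("def", name) marks where a label is
# defined, ("ref", name) marks an operand that refers to it.
Event = Tuple[str, str]


def local_labels(events: List[Event], forced_global: Iterable[str]) -> Set[str]:
    """Return the labels that can be local without any reference crossing
    a global label definition."""
    def_pos: Dict[str, int] = {}
    refs: Dict[str, List[int]] = {}
    for pos, (kind, name) in enumerate(events):
        if kind == "def":
            def_pos[name] = pos
        else:
            refs.setdefault(name, []).append(pos)

    global_labels = set(forced_global)
    changed = True
    while changed:  # promoting one label can invalidate another
        changed = False
        for name, dpos in def_pos.items():
            if name in global_labels:
                continue
            for rpos in refs.get(name, []):
                lo, hi = sorted((dpos, rpos))
                # A global definition strictly between the two ends blocks it.
                if any(other in global_labels and opos in range(lo + 1, hi)
                       for other, opos in def_pos.items()):
                    global_labels.add(name)
                    changed = True
                    break
    return set(def_pos) - global_labels
```

<p>The loop repeats because promoting a label can break a neighbor's eligibility. The newly global definition may now sit between some other label and its references.</p>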


<h3><a name="reserved-labels">Reserved Label Names</a></h3>
<p>Some label names aren't allowed. For example, 64tass reserves the
use of labels that begin with two underscores. Most assemblers will
also prevent you from using opcode mnemonics as labels (which means
you can't assemble <code>jmp jmp jmp</code>).</p>
<p>If a label doesn't appear to be legal, the generated code will use
a suitable replacement (e.g. <code>jmp_1 jmp jmp_1</code>).</p>
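<p>The renaming rule for mnemonics can be sketched as follows. The mnemonic set is a small illustrative subset. The <code>_N</code> suffix scheme is inferred from the <code>jmp_1</code> example above rather than taken from SourceGen's code.</p>

```python
from typing import Set

# Illustrative subset of 6502 mnemonics; a real generator would use the
# full opcode table for the selected CPU.
MNEMONICS = {"adc", "and", "jmp", "jsr", "lda", "rts", "sta"}


def legal_label(label: str, taken: Set[str]) -> str:
    """Return the label to emit, renaming it if it clashes with a mnemonic.

    Clashing labels get an "_N" suffix, using the smallest N that is not
    already in use; the chosen name is recorded in taken.
    """
    if label.lower() in MNEMONICS:
        suffix = 1
        while f"{label}_{suffix}" in taken:
            suffix += 1
        label = f"{label}_{suffix}"
    taken.add(label)
    return label
```

<p>The <code>taken</code> set should hold every label in the project, so the replacement cannot collide with an existing name. The same mapping must be applied to the label definition and to every operand that references it.</p>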


<h3><a name="platform-features">Platform-Specific Features</a></h3>
<p>SourceGen needs to be able to generate sources for any system
and any assembler, so it generally avoids platform-specific features.
One exception to that is C64 PRG files.</p>
<p>PRG files start with a 16-bit value that tells the OS where the
rest of the file should be loaded. The value is not usually part of
the source code, but instead is generated by the assembler, based on
the address of the first byte output. If SourceGen detects that
a file is PRG, the source generators for some assemblers will suppress
the first 2 bytes, and instead pass appropriate metadata (such as
an additional command-line option) to the assembler.</p>
<p>A file is treated as a PRG if:</p>
<ul>
<li>it is between 3 and 65536 bytes long (inclusive)</li>
<li>the format at offset +000000 is a 16-bit numeric data item
(not executable code, not two 8-bit values, not the first part
of a 24-bit value, etc.)</li>
<li>there is an ORG directive at +000002</li>
<li>the 16-bit value at +000000 is equal to the address of the
byte at +000002</li>
<li>there is no label at offset +000000 (explicit or auto-generated)</li>
</ul>
<p>The definition is sufficiently narrow to avoid most false positives.
If a file is being treated as PRG and you'd rather it weren't, you
can add a label or reformat the bytes. This feature is currently only
enabled for 64tass.</p>
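<p>The five PRG rules map directly onto a predicate. The <code>ProjectView</code> type and its format tags are hypothetical stand-ins for SourceGen's project data. The sketch also assumes the load address is stored little-endian, the 6502's native byte order.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectView:
    """Hypothetical summary of the project state the PRG rules inspect."""
    data: bytes              # contents of the original file
    format_at_0: str         # made-up format tag, e.g. "numeric16" or "code"
    org_at_2: Optional[int]  # address from an ORG directive at +000002, if any
    label_at_0: bool         # True if a label (explicit or auto) is at +000000


def is_prg(p: ProjectView) -> bool:
    """Apply the five PRG rules."""
    if len(p.data) not in range(3, 65537):  # 3 to 65536 bytes, inclusive
        return False
    if p.format_at_0 != "numeric16":        # one 16-bit numeric item at +0
        return False
    if p.org_at_2 is None:                  # an ORG must start at +2
        return False
    load_addr = int.from_bytes(p.data[0:2], "little")
    if load_addr != p.org_at_2:             # header must match that address
        return False
    return not p.label_at_0                 # no label may sit at +0
```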


<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>

<p>After generating sources, if you have a cross-assembler executable
configured, you can run it by clicking the "Run Assembler" button. The
command-line output will be displayed, with stdout and stderr separated.
(I'd prefer them to be interleaved, but that's not what the system
provides.)</p>

<p>The output will show the assembler's exit code, which will be zero
on success (note: sometimes they lie). If the assembly appears to have
succeeded, SourceGen will then compare the assembler's output to the
original file and report any differences.</p>
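<p>The comparison step can be sketched as a byte-by-byte scan. The report format here is invented for illustration. Only the idea of checking the assembler's output against the original file comes from the description above.</p>

```python
from typing import Optional


def first_difference(expected: bytes, actual: bytes) -> Optional[str]:
    """Describe the first mismatch between the original file and the
    assembler's output, or return None if they are identical.
    The message wording is illustrative only."""
    for offset, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return (f"mismatch at +{offset:06x}: "
                    f"expected ${want:02x}, got ${got:02x}")
    if len(expected) != len(actual):
        return f"length mismatch: expected {len(expected)}, got {len(actual)}"
    return None
```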
<p>Failures here may be due to bugs in the cross-assembler or in
SourceGen. However, SourceGen can generally work around assembler bugs,
so any failure is an opportunity for improvement.</p>


<h2><a name="supported">Supported Assemblers</a></h2>

<p>SourceGen currently supports the following cross-assemblers:</p>
<ul>
<li><a href="#64tass">64tass</a></li>
<li><a href="#acme">ACME</a></li>
<li><a href="#cc65">cc65</a></li>
<li><a href="#merlin32">Merlin 32</a></li>
</ul>

<h3><a name="version">Version-Specific Code Generation</a></h3>

<p>Code generation must be tailored to the specific version of the
assembler. This is most easily understood with an example.</p>
<p>If the code has a statement like <code>MVN #$01,#$02</code>, the
assembler is expected to output <code>54 02 01</code>, with the arguments
reversed. cc65 v2.17 got it backward; the behavior was fixed in v2.18. The
bug means we can't generate the same <code>MVN</code>/<code>MVP</code>
instructions for both versions of the assembler.</p>
<p>Having version-dependent source code is a bad idea. If we generated
reversed operands (<code>MVN #$02,#$01</code>), we'd get the correct
output with v2.17, but the wrong output for v2.18. Unambiguous code can
be generated for all versions of the assembler by just outputting raw hex
bytes, but that's ugly and annoying, so we don't want to be stuck doing
that forever. We want to detect which version of the assembler is in
use, and output actual <code>MVN</code>/<code>MVP</code> instructions
when producing code for newer versions of the assembler.</p>
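<p>The operand-order problem can be illustrated with a short sketch. <code>block_move_bytes</code> and <code>cc65_block_move</code> are hypothetical helpers. The first shows that the object code stores the destination bank before the source bank. The second emits a real instruction for cc65 v2.18 and later, and falls back to <code>.byte</code> data for older versions. The opcodes ($54 for <code>MVN</code>, $44 for <code>MVP</code>) are the standard 65816 values.</p>

```python
from typing import Tuple

OPCODES = {"MVN": 0x54, "MVP": 0x44}  # standard 65816 opcodes


def block_move_bytes(mnemonic: str, src_bank: int, dst_bank: int) -> bytes:
    """Object code: opcode, then destination bank, then source bank,
    i.e. the reverse of the order written in the source."""
    return bytes([OPCODES[mnemonic], dst_bank, src_bank])


def cc65_block_move(mnemonic: str, src_bank: int, dst_bank: int,
                    version: Tuple[int, ...]) -> str:
    """Hypothetical emitter: use raw bytes for cc65 versions before 2.18,
    which reverse the operands, and a real instruction otherwise."""
    if version >= (2, 18):
        return f"{mnemonic} #${src_bank:02x},#${dst_bank:02x}"
    raw = block_move_bytes(mnemonic, src_bank, dst_bank)
    return ".byte " + ",".join(f"${b:02x}" for b in raw)
```

<p>For the example in the text, <code>cc65_block_move("MVN", 1, 2, (2, 18))</code> returns <code>MVN #$01,#$02</code>. With <code>(2, 17)</code> it returns <code>.byte $54,$02,$01</code>. Either way, the assembled bytes are the same.</p>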
<p>When you configure a cross-assembler, SourceGen runs the executable with
version query args, and extracts the version information from the output
stream. This is used by the generator to ensure that the output will compile.
If no assembler is configured, SourceGen will produce code optimized
for the latest version of the assembler.</p>
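<p>Version detection might look something like this. The version strings used in the tests are made up, and each assembler formats its banner differently, so a real parser would probably need a pattern per assembler. The cutoff in the hypothetical <code>needs_sha_workaround</code> comes from the 64tass bug list. Treating an unknown version as the latest release follows the rule stated above.</p>

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# First dotted number in the output, e.g. "1.55.2176" or "0.97".
VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(output: str) -> Optional[Version]:
    """Pull a (major, minor, build) tuple out of an assembler's version text."""
    m = VERSION_RE.search(output)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))


def needs_sha_workaround(version: Optional[Version]) -> bool:
    """Hypothetical check: 64tass before v1.55.2176 rejects SHA (ZP),Y."""
    if version is None:  # unknown version: target the latest release
        return False
    return not version >= (1, 55, 2176)
```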


<h3><a name="quirks">Assembler-Specific Bugs &amp; Quirks</a></h3>

<p>This is a list of bugs and quirky behavior in cross-assemblers that
SourceGen works around when generating code.</p>
<p>Every assembler seems to have a different way of dealing with expressions.
Most of them will let you group expressions with parentheses, but that
doesn't always help. For example, <code>PEA label &gt;&gt; 8 + 1</code> is
perfectly valid, but writing <code>PEA (label &gt;&gt; 8) + 1</code> will cause
most assemblers to assume you're trying to use an alternate (and non-existent)
form of <code>PEA</code> with indirect addressing, causing the assembler
to halt with an error message. The code generator needs
to understand expression syntax and operator precedence to generate correct
code, but also needs to know how to handle the corner cases.</p>


<h3><a name="64tass">64tass</a></h3>

<p>Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625
<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>

<p>Bugs:</p>
<ul>
<li>[Fixed in v1.55.2176]
Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
<li>[Fixed in v1.55.2176] WDM is not supported.</li>
</ul>

<p>Quirks:</p>
<ul>
<li>The underscore character ('_') is allowed as a character in labels,
but when used as the first character in a label it indicates the
label is local. If you create labels with leading underscores that
are not local, the labels must be altered to start with some other
character, and made unique.</li>
<li>Labels starting with two underscores are "reserved". Trying to
use them causes an error.</li>
<li>By default, 64tass sets the first two bytes of the output file to
the load address. The <code>--nostart</code> flag is used to
suppress this.</li>
<li>By default, 64tass is case-insensitive, but SourceGen treats labels
as case-sensitive. The <code>--case-sensitive</code> flag must be passed
to the assembler.</li>
<li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
and operands must be lower-case. Most of the SourceGen options that
cause things to appear in upper case must be disabled.</li>
<li>For 65816, selecting the bank byte is done with the grave accent
character ('`') rather than the caret ('^'). (There's a note in the
docs to the effect that they plan to move to carets.)</li>
<li>Instructions whose argument is formed by combining with the
65816 Program Bank Register (16-bit JMP/JSR) must be specified
as 24-bit values for code that lives outside bank 0. This is
true for both symbols and raw hex (e.g. <code>JSR $1234</code>
is invalid outside bank 0). Attempting to JSR to a label in bank
0 from outside bank 0 causes an error, even though it is technically
a 16-bit operand.</li>
<li>The arguments to COP and BRK require immediate-mode syntax
(<code>COP #$03</code> rather than <code>COP $03</code>).</li>
<li>For historical reasons, the default behavior of the assembler is to
assume that the source file is PETSCII, and the desired encoding for
strings is also PETSCII. No character conversion is done, so anybody
assembling ASCII files will get ASCII strings (which works out pretty
well if you're assembling code for a non-Commodore target). However,
the documentation says you're required to pass the <code>--ascii</code>
flag when the input is ASCII/UTF-8, so files that want ASCII operands
must provide an explicit character encoding definition.</li>
</ul>
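<p>The flags described above could be combined into a 64tass command line like this. The sketch uses the flags named in this section plus 64tass's standard <code>-o</code> output option. Dropping <code>--nostart</code> for PRG files, so that 64tass keeps its default load-address header, is one plausible way to handle the PRG case. It is not a description of SourceGen's exact arguments.</p>

```python
from typing import List


def tass64_args(source: str, output: str, is_prg: bool) -> List[str]:
    """Build a 64tass command line (a sketch; SourceGen's may differ).

    --case-sensitive: SourceGen treats labels as case-sensitive.
    --nostart: suppress the 2-byte load address, except for PRG files,
    where the generated source omits those bytes and 64tass's default
    header is assumed to recreate them.
    """
    args = ["64tass", "--case-sensitive"]
    if not is_prg:
        args.append("--nostart")
    args += ["-o", output, source]
    return args
```

<p>The resulting list can be passed to Python's <code>subprocess.run(args, capture_output=True, text=True)</code>. That call returns stdout and stderr as separate strings, much like the separated output shown by the Run Assembler button.</p>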


<h3><a name="acme">ACME</a></h3>

<p>Tested versions: v0.96.4, v0.97
<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>

<p>Bugs:</p>
<ul>
<li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
outside bank zero cannot be assembled. SourceGen currently deals with
this by outputting the entire file as a hex dump.</li>
<li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
<li>BRK and WDM are not allowed to have operands.</li>
</ul>
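<p>A hex-dump fallback amounts to emitting every byte as data. This sketch sets the program counter with <code>* =</code> and uses ACME's <code>!byte</code> directive. The 16-bytes-per-line layout and the start address parameter are arbitrary choices, not SourceGen's actual output. Because ACME's pseudo PC is 16 bits, the caller would have to pick an address that fits.</p>

```python
from typing import List


def acme_hex_dump(data: bytes, start_addr: int, per_line: int = 16) -> List[str]:
    """Render a whole file as ACME data lines: set the program counter,
    then emit every byte with !byte.  The layout is arbitrary, and
    start_addr must fit in 16 bits."""
    lines = [f"* = ${start_addr:04x}"]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    !byte " + ",".join(f"${b:02x}" for b in chunk))
    return lines
```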

<p>Quirks:</p>
<ul>
<li>The assembler shares some traits with one-pass assemblers. In
particular, if you forward-reference a zero-page label, the reference
generates a 16-bit absolute address instead of an 8-bit zero-page
address. Unlike other one-pass assemblers, the width is "sticky",
and backward references appearing later in the file also use absolute
addressing even though the proper width is known at that point. This is
worked around by using explicit "force zero page" annotations on
all references to zero-page labels.</li>
<li>Undocumented opcode <code>ALR</code> ($4b) uses the mnemonic
<code>ASR</code> instead.</li>
<li>Does not allow the accumulator to be specified explicitly as an
operand, e.g. you can't write <code>LSR A</code>.</li>
<li>[Fixed in v0.97]
Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
before 8-bit operands.</li>
<li>Officially, the preferred file extension for ACME source code is ".a",
but this is already used on UNIX systems for static libraries (which
means shell filename completion tends to ignore them). Since ".S" is
pretty universally recognized as assembly source, code generated by
SourceGen for ACME also uses ".S".</li>
<li>Version 0.97 started interpreting '\' in strings as an escape
character, to allow C-style escapes like "\n". This requires escaping
all occurrences of '\' in data strings as "\\". Compiling an older
source file with a newer version of ACME may fail unless you pass
a backward-compatibility command-line argument.</li>
</ul>
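<p>The backslash rule for ACME 0.97 and later can be sketched as a small formatting helper that emits a <code>!text</code> line. The function name is hypothetical. It handles only backslashes, as described above, so a string containing a double quote would need separate treatment.</p>

```python
from typing import Tuple


def acme_string_operand(text: str, version: Tuple[int, ...]) -> str:
    """Quote a string for ACME's !text directive (hypothetical helper).

    ACME 0.97 and later treat a backslash as an escape character, so
    every backslash is doubled; older versions take it literally.
    """
    if version >= (0, 97):
        text = text.replace("\\", "\\\\")
    return '!text "' + text + '"'
```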


<h3><a name="cc65">cc65</a></h3>

<p>Tested versions: v2.17, v2.18
<a href="https://cc65.github.io/">[web site]</a></p>

<p>Bugs:</p>
<ul>
<li>PC-relative branches don't wrap around at bank boundaries.</li>
<li>BRK can only be given an argument in 65816 mode.</li>
<li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
<li>[Fixed in v2.18] <code>BRK &lt;arg&gt;</code> is assembled to opcode
$05 rather than $00.</li>
<li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
</ul>

<p>Quirks:</p>
<ul>
<li>Operator precedence is unusual. Consider <code>label &gt;&gt; 8 - 16</code>.
cc65 puts shift higher than subtraction, whereas languages like C
and assemblers like 64tass do it the other way around. So cc65
regards the expression as <code>(label &gt;&gt; 8) - 16</code>, while the
more common interpretation would be <code>label &gt;&gt; (8 - 16)</code>.
(This is actually somewhat convenient, since none of the expressions
SourceGen currently generates require parentheses.)</li>
<li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic <code>AXS</code>.
All other opcodes match up with the "unintended opcodes" document.</li>
<li>ca65 is implemented as a single-pass assembler, so label widths
can't always be known in time. For example, if you use some zero-page
labels, but they're defined via <code>.ORG $0000</code> after the point
where the labels are used, the assembler will already have generated them
as absolute values. Width disambiguation must be applied to operands
that wouldn't be ambiguous to a multi-pass assembler.</li>
<li>Assignment of constants and variables (<code>=</code> and
<code>.set</code>) ends local label scope, so the label localizer
has to take variable assignment into account.</li>
<li>The assembler is geared toward generating relocatable code with
multiple segments (it is, after all, an assembler for a C compiler).
A linker configuration script is expected to be provided for anything
complex. SourceGen generates a custom config file for each project.</li>
</ul>
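<p>To see how the same tokens group under the two conventions, here is a small precedence-climbing parser that fully parenthesizes an expression. The numeric precedence values are only relative rankings chosen for this example. They are not ca65's or 64tass's actual tables.</p>

```python
from typing import Dict, List


def parenthesize(tokens: List[str], prec: Dict[str, int]) -> str:
    """Fully parenthesize "operand op operand op ..." by precedence climbing.

    Higher numbers bind tighter; equal precedence groups left to right.
    """
    pos = 0

    def parse(min_prec: int) -> str:
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos != len(tokens) and prec[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(prec[op] + 1)  # +1 makes the operator left-assoc
            left = f"({left} {op} {right})"
        return left

    return parse(0)


TOKENS = ["label", ">>", "8", "-", "16"]
CC65_STYLE = {">>": 2, "-": 1}  # shift binds tighter than subtraction
C_STYLE = {">>": 1, "-": 2}     # subtraction binds tighter (C, 64tass)
```

<p>With these tables, <code>parenthesize(TOKENS, CC65_STYLE)</code> gives <code>((label &gt;&gt; 8) - 16)</code> and <code>parenthesize(TOKENS, C_STYLE)</code> gives <code>(label &gt;&gt; (8 - 16))</code>. The two results match the two readings described above.</p>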


<h3><a name="merlin32">Merlin 32</a></h3>

<p>Tested versions: v1.0
<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
</p>

<p>Bugs:</p>
<ul>
<li>PC-relative branches don't wrap around at bank boundaries.</li>
<li>For some failures, an exit code of zero is returned.</li>
<li>Immediate operands with a comma (e.g. <code>LDA #','</code>)
or curly braces (e.g. <code>LDA #'{'</code>) cause an error.</li>
<li>Some DP indexed store instructions cause errors if the label isn't
unambiguously DP (e.g. <code>STX $00,Y</code> vs.
<code>STX $0000,Y</code>). This isn't a problem with project/platform
symbols, which are output as two-digit hex values when possible, but
causes failures when direct page locations are included in the project
and given labels.</li>
<li>The check for 64KiB overflow appears to happen before instructions
that might be absolute or direct page are resolved and reduced in size.
This makes it unlikely that a full 64KiB bank of code can be
assembled.</li>
</ul>

<p>Quirks:</p>
<ul>
<li>Operator precedence is unusual. Expressions are generally processed
from left to right. The byte-selection operators have a lower
precedence than all of the others, and so are always processed last.</li>
<li>The byte-selection operators ('&lt;', '&gt;', '^') are actually
word-selection operators, yielding 16-bit values when wide registers
are enabled on the 65816.</li>
<li>Values loaded into registers are implicitly mod 256 or 65536. There
is no need to explicitly mask an expression.</li>
<li>The assembler tracks register widths when it sees SEP/REP instructions,
but doesn't attempt to track the emulation flag. So if you issue a
<code>REP #$20</code>
while in emulation mode, the assembler will incorrectly assume long
registers. Ideally it would be possible to turn that off, but
there's no way to do that, so instead we occasionally generate
additional width directives.</li>
<li>Non-unique local labels should cause an error, but don't.</li>
<li>No undocumented opcodes are supported, nor are the Rockwell
65C02 instructions.</li>
</ul>
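<p>The Merlin 32 selection and truncation rules can be modeled as a right shift followed by a modulus. In this sketch the selector names "low", "high" and "bank" stand for '&lt;', '&gt;' and '^'. The function models the behavior described above and is not Merlin's implementation.</p>

```python
# Right-shift amount for each selector: low byte, high byte, bank byte.
SHIFTS = {"low": 0, "high": 8, "bank": 16}


def merlin_select(selector: str, value: int, wide: bool) -> int:
    """Model of Merlin 32's selection operators: shift right, then truncate
    to 16 bits when the register is wide or 8 bits when it is narrow."""
    modulus = 65536 if wide else 256
    return (value >> SHIFTS[selector]) % modulus
```

<p>For a label at $123456, the high selector yields $1234 with 16-bit registers but $34 with 8-bit registers. This is why an explicit mask is never needed.</p>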


<h2><a name="export-source">Exporting Source Code</a></h2>
<p>The "export" function takes what you see in the code list in the app
and converts it to text or HTML. The options you've set in the app
settings, such as capitalization, text delimiters, pseudo-opcode names,
operand expression style, and display of cycle counts, are all taken into
account. The file generated is not expected to work with an actual
assembler.</p>
<p>The text output is similar to what you'd get by copying lines to the
clipboard and pasting them into a text file, except that you have greater
control over which columns are included. The HTML version is augmented
with links and (optionally) images.</p>

<p>Use File &gt; Export to open the export dialog. You have several
options:</p>
<ul>
<li><b>Include only selected lines</b>. This allows you to choose between
exporting all or part of a file. If no lines are selected, the entire
file will be exported. This setting does <b>not</b> affect link generation
for HTML output, so you may have some dead internal links if you don't
export the entire file.</li>
<li><b>Include notes</b>. Notes are normally excluded from generated
sources. Check this to include them.</li>
<li><b>Show &lt;Column&gt;</b>. The leftmost five columns are optional,
and will not appear in the output unless the appropriate option is
checked.</li>
<li><b>Column widths</b>. These determine the minimum widths of the
rightmost four columns. These are not hard limits: if the contents
of the column are too wide, the next column will start farther over.
The widths are not used at all for CSV output.</li>
<li><b>Text vs. CSV</b>. For text generation, you can choose between
plain text and Comma-Separated Value format. The latter is useful
for importing source code into another application, such as a
spreadsheet.</li>
<li><b>Generate image files</b>. When exporting to HTML, selecting this
will cause GIF images to be generated for visualizations.</li>
<li><b>Overwrite CSS file</b>. Some aspects of the HTML output's format
are defined by a file called "SGStyle.css", which may be shared between
multiple HTML files and customized. The file is copied out
of the RuntimeData directory without modification. It will be
created if it doesn't exist, but will not be overwritten unless this
box is checked. The setting is <b>not</b> sticky, and will revert
to unchecked. (Think of this as a proactive alternative to "are you
sure you wish to overwrite SGStyle.css?")</li>
</ul>
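<p>The column-width rule and the CSV option can be sketched as two small formatters. The helper names are hypothetical. Always leaving at least one space after a field is an assumption made here so overlong columns don't run together; the real exporter's spacing may differ.</p>

```python
import csv
import io
from typing import List


def format_row(fields: List[str], min_widths: List[int]) -> str:
    """Pad each field to its minimum width.  Widths are minimums, not
    limits: an overlong field pushes the next column farther over."""
    # Assumption: always leave at least one space after a field.
    padded = [(text + " ").ljust(width) for text, width in zip(fields, min_widths)]
    return "".join(padded).rstrip()


def format_csv_row(fields: List[str]) -> str:
    """CSV output ignores column widths; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()
```

<p>In CSV mode, a field such as <code>#','</code> is quoted automatically. This is one reason CSV suits import into a spreadsheet better than fixed-width text.</p>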
<p>Once you've picked your options, click either "Generate HTML" or
"Generate Text", then select an output file name from the standard file
dialog. Any additional files generated, such as graphics for HTML pages,
will be written to the same directory.</p>

<p>All output uses UTF-8 encoding. Filenames of HTML files will have '#'
replaced with '_' to make linking easier.</p>

</div>

<div id="footer">
<p><a href="index.html">Back to index</a></p>
</div>
</body>
<!-- Copyright 2018 faddenSoft -->
</html>