mirror of
https://github.com/fadden/6502bench.git
synced 2024-11-26 06:49:19 +00:00
149e763821
The documentation for 64tass says you're required to pass "--ascii" when the source file is ASCII (as opposed to PETSCII). We were ignoring this, but it turns out that everything works a bit better if we don't. So we now pass "--ascii" on the command line, and add a two-line character encoding definition to every file that is generated with ASCII as the default encoding. The sg_petscii and sg_screen encodings go away, as PETSCII is now the default, and we can use the built-in "screen" encoding.
310 lines
14 KiB
HTML
310 lines
14 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
|
|
<head>
|
|
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
|
|
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
|
<link href="main.css" rel="stylesheet" type="text/css" />
|
|
<title>Code Generation & Assembly - 6502bench SourceGen</title>
|
|
</head>
|
|
|
|
<body>
|
|
<div id="content">
|
|
<h1>6502bench SourceGen: Code Generation & Assembly</h1>
|
|
<p><a href="index.html">Back to index</a></p>
|
|
|
|
<p>SourceGen can generate an assembly source file that, when fed into
|
|
the target assembler, will recreate the original data file exactly.
|
|
Every assembler is different, so support must be added to SourceGen
|
|
for each.</p>
|
|
<p>The generation / assembly dialog can be opened with File > Assemble.</p>
|
|
|
|
|
|
<h2><a name="supported">Supported Assemblers</a></h2>
|
|
|
|
<p>SourceGen currently supports the following cross-assemblers:</p>
|
|
<ul>
|
|
<li><a href="#64tass">64tass</a></li>
|
|
<li><a href="#acme">ACME</a></li>
|
|
<li><a href="#cc65">cc65</a></li>
|
|
<li><a href="#merlin32">Merlin 32</a></li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="version">Version-Specific Code Generation</a></h3>
|
|
|
|
<p>Code generation must be tailored to the specific version of the
|
|
assembler. This is most easily understood with an example.</p>
|
|
<p>If you write <code>MVN $01,$02</code>, the assembler is expected to output
|
|
<code>54 02 01</code>, with the arguments reversed. cc65 v2.17 doesn't
|
|
do that; this is a bug that was fixed in a later version. So if you're
|
|
generating code for v2.17, you want to create source code with the
|
|
arguments the wrong way around.</p>
|
|
<p>Having version-dependent source code is a bad idea, so SourceGen
|
|
just outputs raw hex bytes for MVN/MVP instructions. This yields the
|
|
correct code for all versions of the assembler, but is ugly and
|
|
annoying. So we want to output actual MVN/MVP instructions when producing
|
|
code for newer versions of the assembler.</p>
|
|
<p>When you configure a cross-assembler, SourceGen runs the executable with
|
|
version query args, and extracts the version information from the output
|
|
stream. This is used by the generator to ensure that the output will compile.
|
|
If no assembler is configured, SourceGen will produce code optimized
|
|
for the latest version of the assembler.</p>
|
|
|
|
<h2><a name="generate">Generating Source Code</a></h2>
|
|
|
|
<p>Cross assemblers tend to generate additional files, either compiler
|
|
intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some
|
|
generators may produce multiple source files, perhaps a link script or
|
|
symbol definition header to go with the assembly source. To avoid
|
|
spreading files across the filesystem, SourceGen does all of its work
|
|
in the same directory where the project lives. Before you can generate
|
|
code, you have to have assigned your project a directory. This is why
|
|
you can't assemble a project until you've saved it for the first time.</p>
|
|
|
|
<p>The Generate and Assemble dialog has a drop-down list near the top
|
|
that lets you pick which assembler to target. The name of the assembler
|
|
will be shown with the detected version number. If the assembler
|
|
executable isn't configured, "[latest version]" will be shown instead
|
|
of a version number.</p>
|
|
<p>The Settings button will take you directly to the assembler configuration
|
|
tab in the application settings dialog.</p>
|
|
<p>Hit the Generate button to generate the source code into a file on disk.
|
|
The file will use the project name, with the ".dis65" replaced by
|
|
"_<assembler>.S".</p>
|
|
<p>The first 64KiB of each generated file will be shown in the preview
|
|
window. If multiple files were generated, you can use the "preview file"
|
|
drop-down to select between them. Line numbers are
|
|
prepended to each line to make it easier to track down errors.</p>
|
|
|
|
|
|
|
|
<h3><a name="localizer">Label Localizer</a></h3>
|
|
<p>The label localizer is an optional feature that automatically converts
|
|
some labels to an assembler-specific less-than-global label format. Local
|
|
labels may be reusable (e.g. using "]LOOP" for multiple consecutive
|
|
loops is easier to understand than giving each one a unique label) or
|
|
reduce the size of a generated link table. There are usually restrictions
|
|
on local labels, e.g. references to them may not be allowed to cross a
|
|
global label definition, which the localizer factors in automatically.</p>
|
|
<p>The localizer is somewhat experimental at this time, and can be
|
|
disabled from the
|
|
<a href="settings.html#app-settings">application settings</a>.</p>
|
|
|
|
|
|
<h2><a name="assemble">Cross-Assembling Generated Code</a></h2>
|
|
|
|
<p>After generating sources, if you have a cross-assembler executable
|
|
configured, you can run it by clicking the "Run Assembler" button. The
|
|
command-line output will be displayed, with stdout and stderr separated.
|
|
(I'd prefer them to be interleaved, but that's not what the system
|
|
provides.)</p>
|
|
|
|
<p>The output will show the assembler's exit code, which will be zero
|
|
on success (note: sometimes they lie.) If it appeared to succeed,
|
|
SourceGen will then compare the assembler's output to the original file,
|
|
and report any differences.</p>
|
|
<p>Failures here may be due to bugs in the cross-assembler or in
|
|
SourceGen. However, SourceGen can generally work around assembler bugs,
|
|
so any failure is an opportunity for improvement.</p>
|
|
|
|
|
|
<h2><a name="quirks">Assembler-Specific Bugs & Quirks</a></h2>
|
|
|
|
<p>This is a list of bugs and quirky behavior in cross-assemblers that
|
|
SourceGen works around when generating code.</p>
|
|
<p>Every assembler seems to have a different way of dealing with expressions.
|
|
Most of them will let you group expressions with parenthesis, but that
|
|
doesn't always help. For example, <code>PEA label >> 8 + 1</code> is
|
|
perfectly valid, but writing <code>PEA (label >> 8) + 1</code> will cause
|
|
most assemblers to assume you're trying to use an alternate (and non-existent)
|
|
form of <code>PEA</code> with indirect addressing, causing the assembler
|
|
to halt with an error message. The code generator needs
|
|
to understand expression syntax and operator precedence to generate correct
|
|
code, but also needs to know how to handle the corner cases.</p>
|
|
|
|
|
|
<h3><a name="64tass">64tass</a></h3>
|
|
|
|
<p>Code is generated for 64tass v1.53.1515 or later.
|
|
<a href="https://sourceforge.net/projects/tass64/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>Undocumented opcode <code>SHA (ZP),Y</code> ($93) is not supported;
|
|
the assembler appears to be expecting <code>SHA ABS,X</code> instead.</li>
|
|
<li>WDM is not supported.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>The underscore character ('_') is allowed as a character in labels,
|
|
but when used as the first character in a label it indicates the
|
|
label is local. If you create labels with leading underscores that
|
|
are not local, the labels must be altered to start with some other
|
|
character, and made unique.</li>
|
|
<li>Labels starting with two underscores are "reserved". Trying to
|
|
use them causes an error.</li>
|
|
<li>By default, 64tass sets the first two bytes of the output file to
|
|
the load address. The <code>--nostart</code> flag is used to
|
|
suppress this.</li>
|
|
<li>By default, 64tass is case-insensitive, but SourceGen treats labels
|
|
as case-sensitive. The <code>--case-sensitive</code> must be passed to
|
|
the assembler.</li>
|
|
<li>If you set the <code>--case-sensitive</code> flag, <b>all</b> opcodes
|
|
and operands must be lower-case. Most of the SourceGen options that
|
|
cause things to appear in upper case must be disabled.</li>
|
|
<li>For 65816, selecting the bank byte is done with the back-quote ('`')
|
|
rather than the caret ('^'). (There's a note in the docs to the effect
|
|
that they plan to move to carets.)</li>
|
|
<li>The arguments to COP and BRK require immediate-mode syntax
|
|
(<code>COP #$03</code> rather than <code>COP $03</code>).
|
|
<li>For historical reasons, the default behavior of the assembler is to
|
|
assume that the source file is PETSCII, and the desired encoding for
|
|
strings is also PETSCII. No character conversion is done, so anybody
|
|
assembling ASCII files will get ASCII strings (which works out pretty
|
|
well if you're assembling code for a non-Commodore target). However,
|
|
the documentation says you're required to pass the "--ascii" flag when
|
|
the input is ASCII/UTF-8, so to build files that want ASCII operands
|
|
an explicit character encoding definition must be provided.</li>
|
|
</ul>
|
|
|
|
<p>Notes:</p>
|
|
<ul>
|
|
<li>The "default text encoding" project property is used by SourceGen
|
|
to set the text encoding for the entire source file. For non-ASCII
|
|
projects, a small encoding table is generated at the top of the output.
|
|
This works for C64 PETSCII and C64 screen codes, but not for high
|
|
ASCII. This is done without passing "--ascii" on the command line.</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="acme">ACME</a></h3>
|
|
|
|
<p>Code is generated for ACME v0.96.4 or later.
|
|
<a href="https://sourceforge.net/projects/acme-crossass/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>The "pseudo PC" is only 16 bits, so any 65816 code targeted to run
|
|
outside bank zero cannot be assembled. SourceGen currently deals with
|
|
this by outputting the entire file as a hex dump.</li>
|
|
<li>Undocumented opcode $AB (<code>LAX #imm</code>) generates an error.</li>
|
|
<li>WDM is not allowed to have an operand.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>The assembler shares some traits with one-pass assemblers. In
|
|
particular, if you forward-reference a zero-page label, the reference
|
|
generates a 16-bit absolute address instead of an 8-bit zero-page
|
|
address. Unlike other one-pass assemblers, the width is "sticky",
|
|
and backward references appearing later in the file also use absolute
|
|
addressing even though the proper width is known at that point. This is
|
|
worked around by using explicit "force zero page" annotations on
|
|
all references to zero-page labels.</li>
|
|
<li>Undocumented opcode <code>ALR</code> ($4b) uses mnemonic
|
|
<code>ASR</code> instead.</li>
|
|
<li>Does not allow the accumulator to be specified explicitly as an
|
|
operand, e.g. you can't write <code>LSR A</code>.</li>
|
|
<li>Syntax for <code>MVN</code>/<code>MVP</code> doesn't allow '#'
|
|
before 8-bit operands.</li>
|
|
<li>Officially, the preferred file extension for ACME source code is ".a",
|
|
but this is already used on UNIX systems for static libraries (which
|
|
means shell filename completion tends to ignore them). Since ".S" is
|
|
pretty universally recognized as assembly source, code generated by
|
|
SourceGen for ACME also uses ".S".</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="cc65">cc65</a></h3>
|
|
|
|
<p>Code is generated for cc65 v2.17 or v2.18.
|
|
<a href="https://cc65.github.io/">[web site]</a></p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
|
<li>[Fixed in v2.18] The arguments to <code>MVN</code>/<code>MVP</code> are reversed.</li>
|
|
<li>[Fixed in v2.18] <code>BRK <arg></code> is assembled to opcode
|
|
$05 rather than $00.</li>
|
|
<li>[Fixed in v2.18] <code>WDM</code> is not supported.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>Operator precedence is unusual. Consider <code>label >> 8 - 16</code>.
|
|
cc65 puts shift higher than subtraction, whereas languages like C
|
|
and assemblers like 64tass do it the other way around. So cc65
|
|
regards the expression as <code>(label >> 8) - 16</code>, while the
|
|
more common interpretation would be <code>label >> (8 - 16)</code>.
|
|
(This is actually somewhat convenient, since none of the expressions
|
|
SourceGen currently generates require parenthesis.)</li>
|
|
<li>Undocumented opcode <code>SBX</code> ($cb) uses the mnemonic AXS. All
|
|
other opcodes match up with the "unintended opcodes" document.</li>
|
|
<li>ca65 is implemented as a single-pass assembler, so label widths
|
|
can't always be known in time. For example, if you use some zero-page
|
|
labels, but they're defined via <code>.ORG $0000</code> after the point
|
|
where the labels are used, the assembler will already have generated them
|
|
as absolute values. Width disambiguation must be applied to operands
|
|
that wouldn't be ambiguous to a multi-pass assembler.</li>
|
|
<li>The assembler is geared toward generating relocatable code with
|
|
multiple segments (it is, after all, an assembler for a C compiler).
|
|
A linker configuration script is expected to be provided for anything
|
|
complex. SourceGen generates a custom config file for each project.</li>
|
|
</ul>
|
|
|
|
|
|
<h3><a name="merlin32">Merlin 32</a></h3>
|
|
|
|
<p>Code is generated for Merlin 32 v1.0.
|
|
<a href="https://www.brutaldeluxe.fr/products/crossdevtools/merlin/">[web site]</a>
|
|
<a href="https://github.com/apple2accumulator/merlin32/issues">[bug tracker]</a>
|
|
</p>
|
|
|
|
<p>Bugs:</p>
|
|
<ul>
|
|
<li>PC relative branches don't wrap around at bank boundaries.</li>
|
|
<li>For some failures, an exit code of zero is returned.</li>
|
|
<li>Some DP indexed store instructions cause errors if the label isn't
|
|
unambiguously DP (e.g. <code>STX $00,X</code> vs.
|
|
<code>STX $0000,X</code>). This isn't a problem with project/platform
|
|
symbols, which are output as two-digit hex values when possible, but
|
|
causes failures when direct page locations are included in the project
|
|
and given labels.</li>
|
|
<li>The check for 64KiB overflow appears to happen before instructions
|
|
that might be absolute or direct page are resolved and reduced in size.
|
|
This makes it unlikely that a full 64KiB bank of code can be
|
|
assembled.</li>
|
|
</ul>
|
|
|
|
<p>Quirks:</p>
|
|
<ul>
|
|
<li>Operator precedence is unusual. Expressions are generally processed
|
|
from left to right. The byte-selection operators have a lower
|
|
precedence than all of the others, and so are always processed last.</li>
|
|
<li>The byte selection operators ('<', '>', '^') are actually
|
|
word-selection operators, yielding 16-bit values when wide registers
|
|
are enabled on the 65816.</li>
|
|
<li>Values loaded into registers are implicitly mod 256 or 65536. There
|
|
is no need to explicitly mask an expression.</li>
|
|
<li>The assembler tracks register widths when it sees SEP/REP instructions,
|
|
but doesn't attempt to track the emulation flag. So if you issue a
|
|
<code>REP #$20</code>
|
|
while in emulation mode, the assembler will incorrectly assume long
|
|
registers. Ideally it would be possible to configure that off, but
|
|
there's no way to do that, so instead we occasionally generate
|
|
additional width directives.</li>
|
|
<li>Non-unique local labels should cause an error, but don't.</li>
|
|
<li>No undocumented opcodes are supported.</li>
|
|
</ul>
|
|
|
|
</div>
|
|
|
|
<div id="footer">
|
|
<p><a href="index.html">Back to index</a></p>
|
|
</div>
|
|
</body>
|
|
<!-- Copyright 2018 faddenSoft -->
|
|
</html>
|