We generate operand adjustments automatically, and output small
values as decimal (e.g. LDA FOO-1) and large values as hex (e.g.
LDA FOO+$1000). This change adds a setting for the threshold
between small and large.
The default setting is 8. The previous threshold was 255, which
felt a bit high.
(issue #176)
SourceGen puts a lot of effort into connecting address operands to
internal offsets and external symbols. Sometimes this is undesirable.
For example, a BIT or conditional branch instruction is sometimes used
as a dummy when there are multiple entry points that vary only in a
status flag or register value. The address in the BIT or branch is
either irrelevant or never actually referenced, so auto-generating a
label for it is annoying.
Having a way to tell the disassembler to disregard an address operand,
so that it won't generate an auto-label or be included in the list of
cross-references, provides a simple solution to this.
The operand's file offset is determined during the code analysis
pass. Setting it to -1 makes the operand look like an external address.
If we disable the project/platform symbol lookup for that instruction
as well, we can avoid label generation and cross-references.
We also need to prevent the 65816 data bank fixup code from restoring
it, and the relocation data code from doing its thing.
The flag is implemented as a new "misc flags" table, which holds a
set of bit flags for every file offset. The "disregard operand address"
flag is set on the opcode byte of instructions. The flags are expected
to be used infrequently, so they're currently output as a list of
(offset, integer) pairs.
The flag is set in the Edit Instruction Operand dialog, which has a
new checkbox. The checkbox is disabled for immediate operands.
(issue #174)
This allows signed decimal operands to be formatted as such, e.g.
"LDA #$FE" becomes "LDA #-2". This can be applied to immediate
operands and to numeric data pseudo-ops.
Not all assemblers support this in all situations. The asm generators
will output unsigned decimal operands if so.
The 20020- and 20022-operand-formats tests have been updated.
Most assemblers treat '<' and '>' as byte-select operators. Merlin 32
treats them as shift operators, making '<' effectively a no-op. A
mask is applied based on the length implied by the opcode or pseudo-op,
e.g. "DFB <FOO" will mask off the high bits. This is problematic for
instructions like "LDA <FOO", where the choice between absolute and DP
addressing depends on the value of "<FOO". If FOO is >= $100, the
lack of masking will cause it be treated as absolute. There is
currently no other mechanism to force the use of a direct-page opcode.
The only recourse is to append "&$ff" to the operand, so the
assembler knows that it can be handled as a DP op. The code generator
has been updated to add "&$ff" where needed, based on the label's
value. Unfortunately the assembler doesn't accept this for certain
specific addressing modes; this appears to be a bug.
The relevant test cases (2003x-labels-and-symbols) have been updated
to exercise these situations. The current test projects side-step the
failing assembler behavior by using a DP label instead. These should
be changed to correctly exercise the full behavior, with the code
generator outputting the instructions as raw hex values to work
around bugs.
There are some deliberately broken things (like a duplicate label) in
the 20030 case that were copied into the 20032 case when it was
created. For ease of editing these have been removed from 20032.
(issue #170)
Fixed "HiAscii" macro used for cc65 output. The macro name was being
forced to upper case at the point of use when the settings were
configured to generate upper-case pseudo-op names.
Also, added upper-casing config info to comment in HTML output.
Merlin32 v1.2 requires DSK and TYP directives. We now add these if
version 1.2 is detected. (Earlier versions of Merlin32 accept them,
so at some point we may just want to output them always.)
The version-parsing code has been updated to deal with pre-release
version strings.
By default, implicit acc operands are shown, e.g. "LSR A" rather
than just "LSR". I like showing operands for instructions that
have multiple address modes.
Not everyone agrees, so now it's a setting. They're shown by default,
but enabling the option will strip them on-screen, in generated
assembly, and in the instruction chart.
They are always omitted for ACME output, which doesn't allow them.
(issue #162)
This was used to prefix boxed lines with a comment delimiter, but
in practice we just need to do this whenever the box character
doesn't match the full-line comment delimiter.
This has no effect on generated assembly output, but the on-screen
(and exported HTML) form will now reflect the current settings.
While it's okay to use ';', the classic Merlin editor will treat it
as an end-of-line comment and shove the entire thing off to the right
side of the screen.
This adds a configuration item to the app settings, with a default
value of ';'.
It was defined as a struct with exposed fields. Now it's a class
with simple properties. Default values are set explicitly, and
the contents are copied with a copy constructor instead of using
the struct member assignment.
The only functional change should be that the DelimiterSet members
are now properly cloned. The bulk of the changes are just refactoring
renames for the property names.
This also adds comments for all the properties.
This adds a new data format option, "binary include", that takes a
filename operand. When assembly sources are generated, the section
of file is replaced with an appropriate pseudo-op, and binary files
are generated that hold the file contents. This is a convenient way
to remove large binary blobs, such as music or sound samples, that
aren't useful to have in text form in the sources.
Partial pathnames are allowed, so you can output a sound blob to
"sounds/blather.bin". For safety reasons, we don't allow the files
to be created above the project directory, and existing files will
only be overwritten if they have a matching length (so you don't
accidentally stomp on your project file).
The files are not currently shown in the GenAsm dialog, which lets
you see a preview of the generated sources. The hex dump tool
can do this for the (presumably rare) situations where it's useful.
A new regression test, 20300-binary-include, has been added. The
pseudo-op name can be overridden on-screen in the settings.
We don't currently do anything new for text/HTML exports. It might
be useful to generate an optional appendix with a hex dump of the
excised sections.
(issue #144)
Added an address-to-offset test in the GeneratePlatformSymbolRefs()
method, which sets the operand symbols for anything that lands outside
the scope of the file. Because the region isolation code prevented
symbols from being associated with the operands in the initial code
scan, those operands were being examined here. Without the additional
test, the inappropriate label associations were getting a second chance.
Added "[!in]" and "[!out]" to the comment field of .addrs lines. This
is only for the on-screen display and text exports, not asm gen.
Bumped the project file CONTENT_VERSION.
Added a regression test (20290-region-isolation).
The test turned up an existing problem: pre-labels are emitted by the
asm generators on their own line, but the code that puts excessively
long labels on a separate line wasn't taking that into account. This
has been fixed. No changes to existing regression tests, which didn't
happen to use long labels.
We currently have two options for assembly code output, selected by
a checkbox in the application settings: always put labels on the same
lines as the instruction or data operand, or split the labels onto
their own line if they were wider than the label text field.
This change adds a third option, which puts labels on their own line
whenever possible. Assemblers don't generally allow this for variable
assignment pseudo-ops like "foo = $1000", but it's accepted for most
other situations. This is a cosmetic change to the output, and will
not affect the generated code.
The old true/false app setting will be disregarded. "Split if too
long" will be used by default.
Added test 20280-label-placement to exercise the "split whenever
allowed" behavior.
The "export" function has a similar option that has not been updated
(for no particular reason other than laziness).
Also, simplified the app settings GetEnum / SetEnum calls, which
can infer the enumerated type from the arguments. This should not
impact behavior.
It's useful for extension scripts to be able to get the file offset
of symbols in non-addressable regions. One example of this is CHR
ROM data for an NES cartridge. However, we were getting the offset
by doing an address-to-offset mapping on the plugin side, which by
definition doesn't work for non-addressable memory.
So we now add the offset to PlSymbol objects for user labels and
address region pre-labels. The NES visualizer has been updated to
use the new field.
Also, fixed a bogus complaint about bank overruns for non-addressable
regions.
The ACME assembler gets upset if you use "not" as a label. We now
avoid doing so, using a generalized implementation of the opcode
mnemonic rename code. (Issue #112.)
Renamed a label to "not" in the 20081-label-localizer test.
This allows regions that hold variable storage to be marked as data
that is initialized by the program before it is used. Previously
the choices were to treat it as bulk data (initialized) or junk
(totally unused), neither of which are correct.
This is functionally equivalent to "junk" as far as source code
generation is concerned (though it doesn't have to be).
For the code/data/junk counter, uninitialized data is counted as
junk, because it technically does not need to be part of the binary.
Added support for "relative" address regions to the Merlin 32 and cc65
code generators. These generate "flat" address directives, and so
were a little more complicated.
Suppressed generation of relative operands for non-addressable regions.
Also, tweaked the 20250-nested-regions test to include a negative
relative region offset.
Modified "jump to" code to understand address range start/end lines.
If there are multiple starts or ends at the same offset, we jump to
the first one in the set, which is suboptimal but simpler to do.
Simplified the API, embedding GoToMode in the Location object (which
is where it really needs to be, to make fwd/back work right).
Updated HTML export to grey out addresses in NON_ADDR sections.
Changed default pseudo-op strings for address regions to ".addrs" and
".adrend", after trying a bunch of things that were worse. Added
definitions for region-end pseudo-ops to Merlin32 and cc65 for display
on screen.
Added regression test 20260 for address region pre-labels.
Fixed handling of leading underscores in platform/project symbols.
These need to be escaped in 64tass output. Updated regression test
20170-external-symbols to check it.
Implemented address region pre-labels. These are useful if the code is
relocating a block from address A to address B, because the code that
does the copying refers to both the "before" address and the "after"
address. Previously you'd give the block the "after" address and the
"before" would just appears as hex, because it's effectively an
external address.
Pre-labels are shown on screen with their address, but no other fields.
Showing the address makes it easy to see the label's value, which isn't
always obvious right before a .arstart. The labels are suppressed if the
address value evaluates to non-addressable.
This defines a new type of symbol, which is external and always global
in scope. Pre-labels affect label localization and must go through
the usual remapping to handle clashes with opcode mnemonics and the
use of leading underscores. Cross-references are computed, but are
associated with the file offset rather than the label line itself.
Added a new filter to the Symbols window ("PreL").
Implemented label input and checking in the address editor. Generally
added highlighting of relevant error labels.
Implemented "is relative" flag. This only affects source code
generation, replacing ".arstart <addr>" with ".arstart *+<value>".
Only output by 64tass and ACME generators.
Added a bold-text summary to radio buttons in address region edit
dialog. This makes it much easier to see what you're doing. Added
a warning to the label edit dialog when a label is being placed in
a non-addressable region.
Modified double-click behavior for .arstart/.arend to jump to the
other end when the opcode is clicked on. This matches the behavior
of instructions with address operands.
Reordered Actions menu, putting "edit operand" at the top.
Fixed AddressMap entry collision testing.
Fixed PRG issue with multiple address regions at offset +000002.
Added regression tests. Most of the complicated stuff with regions
is tested by unit tests inside AddressMap, but we still need to
exercise nested region code generation.
Added support for non-addressable regions, which are useful for things
like file headers stripped out by the system loader, or chunks that
get loaded into non-addressable graphics RAM. Regions are specified
with the "NA" address value. The code list displays the address field
greyed out, starting from zero (which is kind of handy if you want to
know the relative offset within the region).
Putting labels in non-addressable regions doesn't make sense, but
symbol resolution is complicated enough that we really only have two
options: ignore the labels entirely, or allow them but warn of their
presence. The problem isn't so much the label, which you could
legitimately want to access from an extension script, but rather the
references to them from code or data. So we keep the label and add a
warning to the Messages list when we see a reference.
Moved NON_ADDR constants to Address class. AddressMap now has a copy.
This is awkward because Asm65 and CommonUtil don't share.
Updated the asm code generators to understand NON_ADDR, and reworked
the API so that Merlin and cc65 output is correct for nested regions.
Address region changes are now noted in the anattribs array, which
makes certain operations faster than checking the address map. It
also fixes a failure to recognize mid-instruction region changes in
the code analyzer.
Tweaked handling of synthetic regions, which are non-addressable areas
generated by the linear address map traversal to fill in any "holes".
The address region editor now treats attempts to edit them as
creation of a new region.
Updated project file format to save the new map entries.
Tweaked appearance of .arend directives to show the .arstart address
in the operand field. This makes it easier to match them up on screen.
Also, add a synthetic comment on auto-generated .arstart entries.
Added .arstart/.arend to the things that respond to Jump to Operand
(Ctrl+J). Selecting one jumps to the other end. (Well, it jumps
to the code nearest the other, which will do for now.)
Added a menu item to display a text rendering of the address map.
Helpful when things get complicated.
Modified the linear map iterator to return .arend items with the offset
of the last byte in the region, rather than the first byte of the
following region. While the "exclusive end" approach is pretty
common, it caused problems when updating the line list, because it
meant that the .arend directives were outside the range of offsets
being updated (and, for directives at the end of the file, outside
the file itself). This was painful to deal with for partial updates.
Changing this required some relatively subtle changes and annoyed some
of the debug assertions, such as the one where all Line items have
offsets that match the start of a line, but it's the cleaner approach.
Split ".org" into ".arstart" and ".arend" (address range start/end).
Address range ends are now shown in the code list view, and the
pseudo-op can be edited in app settings. Address range starts are
now shown after notes and long comments, rather than before, which
brings the on-screen display in sync with generated code.
Reworked the address range editor UI to include the new features.
The implementation is fully broken.
More changes to the AddressMap API, putting the resolved region length
into a separate ActualLength field. Added FindRegion(). Renamed
some things.
Code generation changed slightly: the blank line before a region-end
line now comes after it, and ACME's "} ;!pseudopc" is now just "}".
This required minor updates to some of the regression test results.
AddressMap API reshuffle. Added "pre-label" to class and API. Split
AddressMapEntry into two parts to make it clear when FLOATING_LEN
has been resolved.
Updated display line list generator to use in-line linear map
traversal. Previous approach was to walk through the list of regions
in a second pass, inserting .ORG directives, but that was awkward
and is no longer needed.
This is the first step toward changing the address region map from a
linear list to a hierarchy. See issue #107 for the plan.
The AddressMap class has been rewritten to support the new approach.
The rest of the project has been updated to conform to the new API,
but feature-wise is unchanged. While the map class supports
nested regions with explicit lengths, the rest of the application
still assumes a series of non-overlapping regions with "floating"
lengths.
The Set Address dialog is currently non-functional.
All of the output for cc65 changed because generation of segment
comments has been removed. Some of the output for ACME changed as
well, because we no longer follow "* = addr" with a redundant
pseudopc statement. ACME and 65tass have similar approaches to
placing things in memory, and so now have similar implementations.
If a DCI string ended with a string delimiter or non-ASCII character
(e.g. a PETSCII char with no ASCII equivalent), the code generator
output the last byte as a hex value. This caused an error because it
was outputting the raw hex value, with the high bit already set, which
the assembler did not expect.
This change corrects the behavior for code generation and on-screen
display, and adds a few samples to the regression test suite.
(see issue #102)
On the 65816, if you say "JSR foo" from bank $12, but "foo" is an
address in bank 0, most assemblers will conclude that you're forming
a 16-bit argument with a 16-bit address and assemble happily. 64tass
halts with an error. Up until v1.55 or so, you could fake it out
by supplying a large offset.
This no longer works. The preferred way to say "no really I mean to
do this" is to append ",k" to the operand. We now do that as needed.
I didn't want to define a new ExpressionMode for 64tass just to
support an operand modifier that should probably never actually get
generated (you can't call across banks with JSR!), so this is
implemented with a quirk and an op flag.
64tass v1.56.2625 is now the default.
(issue #104)
The DCI string format uses character values where the high bit of the
last byte differs from the rest of the string. Usually all the high
bits are clear except on the last byte, but SourceGen generally allows
either polarity.
This gets a little uncertain with single-character strings, because
SourceGen can't auto-detect DCI very effectively. A series of bytes
with the high bit set could be a single high-ASCII string or a series
of single-byte DCI strings.
The motivation for allowing them is C64 PETSCII. While ASCII allows
"high ASCII" as an escape hatch, PETSCII doesn't have that option, so
there's no way to mark the data as a character or a string. We still
want to do a bit of screening, but if the user specifies a non-ASCII
character set and the selected bytes have their high bits set, we
want to just treat the whole set as 1-byte DCI.
Some minor adjustments were needed for a couple of validity checks
that expected longer strings.
This adds some short DCI strings in different character sets to the
char-encoding regression tests.
(for issue #102)
64tass wants to place its output into a 64KB region of memory,
starting at the address "*" is set to, and continuing without
wrapping around the end of the bank. Some files aren't meant to be
handled that way, so we need to generate the output differently.
If the file's output fits nicely, it's considered "loadable", and
is generated in the usual way. If it doesn't, it's treated as
"streamable", and the initial "* = addr" directive is omitted
(leaving "*" at zero), and we go straight to ".logical" directives.
65816 code with an initial address outside bank 0 is treated as
"streamable" whether or not the contents fit nicely in the designated
64K area. This caused a minor change to a few of the 65816 tests.
A new test, 20240-large-overlay, exercises "streamable" by creating
a file with eight overlapping 8KB segments that load at $8000.
While the file as a whole fits in 64KB, it wouldn't if loaded at
the desired start address.
Also, updated the regression test harness to report assembler
failure independently of overall test failure. This makes it easier
to confirm that (say) ACME v0.96.4 still works with the code we
generate, even though it doesn't match the expected output (which
was generated for v0.97).
(problem was raised in issue #98)
The initial implementation was testing the byte value rather than
the converted value, so backslashes were getting through in high
ASCII strings. PETSCII and C64 screen codes don't really have a
backslash so it's not really an issue there.
The new implementation handles high ASCII correctly. The various
201n0-char-encoding-x regression tests have been updated to verify
this.
The code generator outputs an optional comment specifying which
version of which assembler the code was generated for. This was
handled inconsistently and, for the most part, incorrectly. We now
report the correct version.
Two things changed: (1) string literals can now hold backslash
escapes like "\n"; (2) MVN/MVP operands can now be prefixed with '#'.
The former was a breaking change because any string with "\" must
be changed to "\\". This is now handled by the string operand
formatter.
Also, improved test harness output. Show the assembler versions at
the end, and include assembler failure messages in the collected
output.
We append an assembler identifier to generated code. For Merlin 32,
this was "_Merlin32". All of the other assemblers use a lower-case
string, which makes Merlin look a little weird, so it has been
changed to "_merlin32".
Windows filesystems are generally case-insensitive, so this won't
likely affect anything.
A few tweaks:
- Test now requires an ORG on offset +000002, not just a correct
address.
- Suppress on-screen display of the initial ORG directive when
a PRG file is detected. Subtle, but helpful.
- In new project setup, fix initial address for PRG projects that
load at $0000.
- In new project setup, add a "load address" comment to the first line.
Also, fix some out-of-date documentation.
(issue #90)
C64 PRG files are pretty common. Their salient feature is that they
start with a 16-bit value that is used as the load address. The
value is commonly generated by the assembler itself, rather than
explicitly added to the source file.
Not all assemblers know what a PRG file is, and some of them handle
it in ways that are difficult to guarantee in SourceGen. ACME adds
the 16-bit header when the output file name ends in ".prg", cc65
uses a modified config file, 64tass uses a different command-line
option, and Merlin 32 has no idea what they are.
This change adds PRG file detection and handling to the 64tass code
generator. Doing so required making a few changes to the gen/asm
interfaces, because we now need to have the generator pass additional
flags to the assembler, and sometimes we need code generation to
start somewhere other than offset zero. Overall the changes were
pretty minor.
The 20042-address-changes test needed a 6502-only variant. A new test
(20040-address-changes) has been added and given a PRG header. As
part of this change the 65816 variant was changed to use addresses
in bank 2, which uncovered a code generation bug that this change
also fixes.
The 64tass --long-address flag doesn't appear to be necessary for
files <= 65536 bytes long, so we no longer emit it for those.
(issue #90)
Modified the asm source generators and on-screen display to show the
DP arg for BBR/BBS as hex. The instructions are otherwise treated
as relative branches, e.g. the DP arg doesn't get factored into the
cross-reference table.
ACME/cc65 put the bit number in the mnemonic, 64tass wants it to be
in the first argument, and Merlin32 wants nothing to do with any of
this because it's incompatible with the 65816.
Added an "all ops" test for W65C02.
Long operands, such as strings and bulk data, can span multiple lines.
SourceGen wraps them at 64 characters, which is fine for assembly
output but occasionally annoying on screen: if the operand column is
wide enough to show the entire value, the comment column is pushed
pretty far to the right.
This change makes the width configurable, as 32/48/64 characters,
with a pop-up in app settings.
The assemblers are all wired to 64 characters, though we could make
this configurable as well with an assembler-specific setting.
Some things have moved around a bit in app settings. The Asm Config
tab now comes last. Having it sandwiched in the middle of tabs that
altered the on-screen display didn't make much sense. The Display
Format is now explicitly for opcodes and operands, and is split into
two columns. The left column is managed by the "quick set" feature,
the right column is independent.
Added a "fake" assembler pseudo-op for DBR changes. Display entries
in line list.
Added entry to double-click handler so that you can double-click on
a PLB instruction operand to open the data bank editor.
The decision of how to handle indeterminate M/X flag values is made in
StatusFlags. This provides consistent behavior throughout the app.
This was being done for M/X but not for E.
This change also renames the M/X tests, prefixing them with "Is" to
emphasize that they are boolean rather than tri-state.
There should be no change in behavior from this.
Code generated for 64tass was incorrect for JSR/JMP to a location
outside the file bounds. A test added to 20052-branches-and-banks
revealed an issue with cc65 generation as well.
Two basic problems:
(1) cc65, being a one-pass assembler, can't tell if a forward-referenced
label is 16-bit or 24-bit. If the operand is potentially ambiguous,
such as "LDA label", we need to add an operand width disambiguator.
(The existing tests managed to only do backward references.)
(2) 64tass wants the labels on JMP/JSR absolute operands to have 24-bit
values that match the current program bank. This is the opposite of
cc65, which requires 16-bit values. We need to distinguish PBR vs.
DBR instructions (i.e. "LDA abs" vs. "JMP abs") and handle them
differently when formatting for "Common".
Merlin32 doesn't care, and ACME doesn't work at all, so neither of
those needed updating.
The 20052-branches-and-banks test was expanded to cover the problematic
cases.