It's useful for extension scripts to be able to get the file offset
of symbols in non-addressable regions. One example of this is CHR
ROM data for an NES cartridge. However, we were getting the offset
by doing an address-to-offset mapping on the plugin side, which by
definition doesn't work for non-addressable memory.
So we now add the offset to PlSymbol objects for user labels and
address region pre-labels. The NES visualizer has been updated to
use the new field.
Also, fixed a bogus complaint about bank overruns for non-addressable
regions.
Added support for "relative" address regions to the Merlin 32 and cc65
code generators. These generate "flat" address directives, and so
were a little more complicated.
Suppressed generation of relative operands for non-addressable regions.
Also, tweaked the 20250-nested-regions test to include a negative
relative region offset.
Implemented "is relative" flag. This only affects source code
generation, replacing ".arstart <addr>" with ".arstart *+<value>".
Only output by 64tass and ACME generators.
Added a bold-text summary to radio buttons in address region edit
dialog. This makes it much easier to see what you're doing. Added
a warning to the label edit dialog when a label is being placed in
a non-addressable region.
Modified double-click behavior for .arstart/.arend to jump to the
other end when the opcode is clicked on. This matches the behavior
of instructions with address operands.
Reordered Actions menu, putting "edit operand" at the top.
Fixed AddressMap entry collision testing.
Fixed PRG issue with multiple address regions at offset +000002.
Added regression tests. Most of the complicated stuff with regions
is tested by unit tests inside AddressMap, but we still need to
exercise nested region code generation.
Added support for non-addressable regions, which are useful for things
like file headers stripped out by the system loader, or chunks that
get loaded into non-addressable graphics RAM. Regions are specified
with the "NA" address value. The code list displays the address field
greyed out, starting from zero (which is kind of handy if you want to
know the relative offset within the region).
Putting labels in non-addressable regions doesn't make sense, but
symbol resolution is complicated enough that we really only have two
options: ignore the labels entirely, or allow them but warn of their
presence. The problem isn't so much the label, which you could
legitimately want to access from an extension script, but rather the
references to them from code or data. So we keep the label and add a
warning to the Messages list when we see a reference.
Moved NON_ADDR constants to Address class. AddressMap now has a copy.
This is awkward because Asm65 and CommonUtil don't share.
Updated the asm code generators to understand NON_ADDR, and reworked
the API so that Merlin and cc65 output is correct for nested regions.
Address region changes are now noted in the anattribs array, which
makes certain operations faster than checking the address map. It
also fixes a failure to recognize mid-instruction region changes in
the code analyzer.
Tweaked handling of synthetic regions, which are non-addressable areas
generated by the linear address map traversal to fill in any "holes".
The address region editor now treats attempts to edit them as
creation of a new region.
Updated project file format to save the new map entries.
Tweaked appearance of .arend directives to show the .arstart address
in the operand field. This makes it easier to match them up on screen.
Also, add a synthetic comment on auto-generated .arstart entries.
Added .arstart/.arend to the things that respond to Jump to Operand
(Ctrl+J). Selecting one jumps to the other end. (Well, it jumps
to the code nearest the other, which will do for now.)
Added a menu item to display a text rendering of the address map.
Helpful when things get complicated.
Modified the linear map iterator to return .arend items with the offset
of the last byte in the region, rather than the first byte of the
following region. While the "exclusive end" approach is pretty
common, it caused problems when updating the line list, because it
meant that the .arend directives were outside the range of offsets
being updated (and, for directives at the end of the file, outside
the file itself). This was painful to deal with for partial updates.
Changing this required some relatively subtle changes and annoyed some
of the debug assertions, such as the one where all Line items have
offsets that match the start of a line, but it's the cleaner approach.
Split ".org" into ".arstart" and ".arend" (address range start/end).
Address range ends are now shown in the code list view, and the
pseudo-op can be edited in app settings. Address range starts are
now shown after notes and long comments, rather than before, which
brings the on-screen display in sync with generated code.
Reworked the address range editor UI to include the new features.
The implementation is fully broken.
More changes to the AddressMap API, putting the resolved region length
into a separate ActualLength field. Added FindRegion(). Renamed
some things.
Code generation changed slightly: the blank line before a region-end
line now comes after it, and ACME's "} ;!pseudopc" is now just "}".
This required minor updates to some of the regression test results.
AddressMap API reshuffle. Added "pre-label" to class and API. Split
AddressMapEntry into two parts to make it clear when FLOATING_LEN
has been resolved.
Updated display line list generator to use in-line linear map
traversal. Previous approach was to walk through the list of regions
in a second pass, inserting .ORG directives, but that was awkward
and is no longer needed.
This is the first step toward changing the address region map from a
linear list to a hierarchy. See issue #107 for the plan.
The AddressMap class has been rewritten to support the new approach.
The rest of the project has been updated to conform to the new API,
but feature-wise is unchanged. While the map class supports
nested regions with explicit lengths, the rest of the application
still assumes a series of non-overlapping regions with "floating"
lengths.
The Set Address dialog is currently non-functional.
All of the output for cc65 changed because generation of segment
comments has been removed. Some of the output for ACME changed as
well, because we no longer follow "* = addr" with a redundant
pseudopc statement. ACME and 65tass have similar approaches to
placing things in memory, and so now have similar implementations.
On the 65816, if you say "JSR foo" from bank $12, but "foo" is an
address in bank 0, most assemblers will conclude that you're forming
a 16-bit argument with a 16-bit address and assemble happily. 64tass
halts with an error. Up until v1.55 or so, you could fake it out
by supplying a large offset.
This no longer works. The preferred way to say "no really I mean to
do this" is to append ",k" to the operand. We now do that as needed.
I didn't want to define a new ExpressionMode for 64tass just to
support an operand modifier that should probably never actually get
generated (you can't call across banks with JSR!), so this is
implemented with a quirk and an op flag.
64tass v1.56.2625 is now the default.
(issue #104)
64tass wants to place its output into a 64KB region of memory,
starting at the address "*" is set to, and continuing without
wrapping around the end of the bank. Some files aren't meant to be
handled that way, so we need to generate the output differently.
If the file's output fits nicely, it's considered "loadable", and
is generated in the usual way. If it doesn't, it's treated as
"streamable", and the initial "* = addr" directive is omitted
(leaving "*" at zero), and we go straight to ".logical" directives.
65816 code with an initial address outside bank 0 is treated as
"streamable" whether or not the contents fit nicely in the designated
64K area. This caused a minor change to a few of the 65816 tests.
A new test, 20240-large-overlay, exercises "streamable" by creating
a file with eight overlapping 8KB segments that load at $8000.
While the file as a whole fits in 64KB, it wouldn't if loaded at
the desired start address.
Also, updated the regression test harness to report assembler
failure independently of overall test failure. This makes it easier
to confirm that (say) ACME v0.96.4 still works with the code we
generate, even though it doesn't match the expected output (which
was generated for v0.97).
(problem was raised in issue #98)
A few tweaks:
- Test now requires an ORG on offset +000002, not just a correct
address.
- Suppress on-screen display of the initial ORG directive when
a PRG file is detected. Subtle, but helpful.
- In new project setup, fix initial address for PRG projects that
load at $0000.
- In new project setup, add a "load address" comment to the first line.
Also, fix some out-of-date documentation.
(issue #90)
C64 PRG files are pretty common. Their salient feature is that they
start with a 16-bit value that is used as the load address. The
value is commonly generated by the assembler itself, rather than
explicitly added to the source file.
Not all assemblers know what a PRG file is, and some of them handle
it in ways that are difficult to guarantee in SourceGen. ACME adds
the 16-bit header when the output file name ends in ".prg", cc65
uses a modified config file, 64tass uses a different command-line
option, and Merlin 32 has no idea what they are.
This change adds PRG file detection and handling to the 64tass code
generator. Doing so required making a few changes to the gen/asm
interfaces, because we now need to have the generator pass additional
flags to the assembler, and sometimes we need code generation to
start somewhere other than offset zero. Overall the changes were
pretty minor.
The 20042-address-changes test needed a 6502-only variant. A new test
(20040-address-changes) has been added and given a PRG header. As
part of this change the 65816 variant was changed to use addresses
in bank 2, which uncovered a code generation bug that this change
also fixes.
The 64tass --long-address flag doesn't appear to be necessary for
files <= 65536 bytes long, so we no longer emit it for those.
(issue #90)
Modified the asm source generators and on-screen display to show the
DP arg for BBR/BBS as hex. The instructions are otherwise treated
as relative branches, e.g. the DP arg doesn't get factored into the
cross-reference table.
ACME/cc65 put the bit number in the mnemonic, 64tass wants it to be
in the first argument, and Merlin32 wants nothing to do with any of
this because it's incompatible with the 65816.
Added an "all ops" test for W65C02.
Long operands, such as strings and bulk data, can span multiple lines.
SourceGen wraps them at 64 characters, which is fine for assembly
output but occasionally annoying on screen: if the operand column is
wide enough to show the entire value, the comment column is pushed
pretty far to the right.
This change makes the width configurable, as 32/48/64 characters,
with a pop-up in app settings.
The assemblers are all wired to 64 characters, though we could make
this configurable as well with an assembler-specific setting.
Some things have moved around a bit in app settings. The Asm Config
tab now comes last. Having it sandwiched in the middle of tabs that
altered the on-screen display didn't make much sense. The Display
Format is now explicitly for opcodes and operands, and is split into
two columns. The left column is managed by the "quick set" feature,
the right column is independent.
The decision of how to handle indeterminate M/X flag values is made in
StatusFlags. This provides consistent behavior throughout the app.
This was being done for M/X but not for E.
This change also renames the M/X tests, prefixing them with "Is" to
emphasize that they are boolean rather than tri-state.
There should be no change in behavior from this.
Code generated for 64tass was incorrect for JSR/JMP to a location
outside the file bounds. A test added to 20052-branches-and-banks
revealed an issue with cc65 generation as well.
Two basic problems:
(1) cc65, being a one-pass assembler, can't tell if a forward-referenced
label is 16-bit or 24-bit. If the operand is potentially ambiguous,
such as "LDA label", we need to add an operand width disambiguator.
(The existing tests managed to only do backward references.)
(2) 64tass wants the labels on JMP/JSR absolute operands to have 24-bit
values that match the current program bank. This is the opposite of
cc65, which requires 16-bit values. We need to distinguish PBR vs.
DBR instructions (i.e. "LDA abs" vs. "JMP abs") and handle them
differently when formatting for "Common".
Merlin32 doesn't care, and ACME doesn't work at all, so neither of
those needed updating.
The 20052-branches-and-banks test was expanded to cover the problematic
cases.
We're doing this for user labels but not for project/platform
symbols. So if you have a constant named "BCC" you can't assemble
your code with certain assemblers. Now we rename it automatically.
Added a quick test to 2007-labels-and-symbols. (No change to ACME,
which barfs on the test.)
The "is the .junk alignment directive correct" was returning true
for subtype=None (not aligned), which caused execution to go down
the wrong path and irritate an assert.
Correct handling of local variables. We now correctly uniquify them
with regard to non-unique labels. Because local vars can effectively
have global scope we mostly want to treat them as global, but they're
uniquified relative to other globals very late in the process, so we
can't just throw them in the symbol table and be done. Fortunately
local variables exist in a separate namespace, so we just need to
uniquify the variables relative to the post-localization symbol table.
In other words, we take the symbol table, apply the label map, and
rename any variable that clashes.
This also fixes an older problem where we weren't masking the
leading '_' on variable labels when generating 64tass output.
The code list now makes non-unique labels obvious, but you can't tell
the difference between unique global and unique local. What's more,
the default type value in Edit Label is now adjusted to Global for
unique locals that were auto-generated. To make it a bit easier to
figure out what's what, the Info panel now has a "label type" line
that reports the type.
The 2023-non-unique-labels test had some additional tests added to
exercise conflicts with local variables. The 2019-local-variables
test output changed slightly because the de-duplicated variable
naming convention was simplified.
- Renamed "strip label prefix/suffix" to "omit label prefix/suffix".
- Changed a Merlin operand workaround so it doesn't apply to code
that is explicitly not in bank zero.
- Changed {addr}/{const} annotations on project/platform symbol
equates so they line up a little better on screen and in exported
sources.
Continue development of non-unique labels. The actual labels are
still unique, because we append a uniquifier tag, which gets added
and removed behind the scenes. We're currently using the six-digit
hex file offset because this is only used for internal address
symbols.
The label editor and most of the formatters have been updated. We
can't yet assemble code that includes non-unique labels, but older
stuff hasn't been broken.
This removes the "disable label localization" property, since that's
fundamentally incompatible with what we're doing, and adds a non-
unique label prefix setting so you can put '@' or ':' in front of
your should-be-local labels.
Also, fixed a field name typo.
This adds the concept of label annotations. The primary driver of
the feature is the desire to note that sometimes you know what a
thing is, but sometimes you're just taking an educated guess.
Instead of writing "high_score_maybe", you can now write "high_score?",
which is more compact and consistent. The annotations are stripped
off when generating source code, making them similar to Notes.
I also created a "Generated" annotation for the labels that are
synthesized by the address table formatter, but don't modify the
label for them, because there's not much need to remind the user
that "T1234" was generated by algorithm.
This also lays some of the groundwork for non-unique labels.
While adding a message log entry for failing alignment directives,
I noticed that the assembler source generator's test for valid
alignment was allowing some bad alignment values through.
I'm holding off on reporting the message to the log because not all
format changes cause a data-reanalysis, which means the log entry
doesn't always appear and disappear when it should. If we decide
this is an important message we can add a scan for "softer" errors.
In the assembler output, add a blank line between the constants
and addresses in the long list of equates.
The earlier change that corrected the BIT instruction caused test
2009-branches-and-banks to fail, because it was relying on the idea
that BIT made the carry flag indeterminate. Changing a BCC to a
BVS restored the desired behavior.
Sometimes there's a bunch of junk in the binary that isn't used for
anything. Often it's there to make things line up at the start of
a page boundary.
This adds a ".junk" directive that tells the disassembler that it
can safely disregard the contents of a region. If the region ends
on a power-of-two boundary, an alignment value can be specified.
The assembly source generators will output an alignment directive
when possible, a .fill directive when appropriate, and a .dense
directive when all else fails. Because we're required to regenerate
the original data file, it's not always possible to avoid generating
a hex dump.
Most assemblers end local label scope when a global label is
encountered. cc65 takes this one step further by ending local label
scope when constants or variables are defined. So, if we have a
variable table with a nonzero number of entries, we want to create
a fake global label at that point to end the scope.
Merlin 32 won't let you write " LDA #',' ". For some reason the
comma causes an error. IGenerator now has a "tweak operand format"
interface that lets us fix that.
If a line has a comment with a cycle count and nothing else, it was
getting an extra space or two on the end.
Also, added a few end-of-line comments to the 2020 test to show how
they interact with the cycle counts.
Unlike 64tass and Merlin, which allow you to redefine symbols, ACME
uses "zones" that provide scope for local variables. This means
that, at the point of a local variable table definition, we have to
start a new zone and output the full set of active symbols, not just
the newly-defined ones. (If you set the "clear previous" flag in
the LvTable there's no difference.)
We could do a bit better by only outputting the symbols that are
actually used within the zone, similar to what we do for global
project/platform symbols, but that's a bunch of work for questionable
benefit.
After thrashing around a bit, I had to choose between making the
uniquifier more complicated, or making de-duplication a separate
step. Since I don't really expect duplicates to be a thing, I went
with the latter.
Updated the regression test.
Variables are now handled properly end-to-end, except for label
uniquification. So cc65 and ACME can't yet handle a file that
redefines a local variable.
This required a bunch of plumbing, but I think it came out okay.
Previously, we used the default character encoding from the project
properties to determine how strings and character constants in the
entire source file should be encoded. Now we switch between
encodings as needed. The default character encoding is no longer
relevant.
High ASCII is now an actual encoding, rather than acting like ASCII
that sometimes doesn't work. Because we can do high ASCII character
operands with "| $80", we don't output a .enc to switch from ASCII
to high ASCII unless we need to generate a string. (If we're
already in high ASCII mode, the "| $80" isn't required but won't
hurt anything.)
We now do a scan up front to see if ASCII or high ASCII is needed,
and only output the .cdefs for the encodings that are actually used.
The only gap in the matrix is high ASCII DCI strings -- the ".shift"
pseudo-op rejects text if the string doesn't start with the high
bit clear.
The previous code output a character in single-quotes if it was
standard ASCII, double-quotes if high ASCII, or hex if it was neither
of those. If a flag was set, high ASCII would also be output as
hex.
The new system takes the character value and an encoding identifier.
The identifier selects the character converter and delimiter
pattern, and puts the two together to generate the operand.
While doing this I realized that I could trivially support high
ASCII character arguments in all assemblers by setting the delimiter
pattern to "'#' | $80".
In FormatDescriptor, I had previously renamed the "Ascii" sub-type
"LowAscii" so it wouldn't be confused, but I dislike filling the
project file with "LowAscii" when "Ascii" is more accurate and less
confusing. So I switched it back, and we now check the project
file version number when deciding what to do with an ASCII item.
The CharEncoding tests/converters were also renamed.
Moved the default delimiter patterns to the string table.
Widened the delimiter pattern input fields slightly. Added a read-
only TextBox with assorted non-typewriter quotes and things so
people have something to copy text from.
During a discussion with the cc65 developers, I became convinced that
generating "MVN $01,$02" is wrong, and "MVN #$01,#$02" is correct.
64tass, cc65, and Merlin 32 all accept this syntax; only ACME does
not. Operands without a leading '#' should be treated as 24-bit
values, and have the bank byte extracted.
This change updates the on-screen display and assembled output to
include the '#'. The ACME generator uses a Quirk to suppress the
hash mark. (It doesn't currently accept values larger than 8 bits,
so there's no ambiguity.)
If you double-click on the opcode of "JSR label", the code view
selection jumps to the label. This now works for partial operands,
e.g. "LDA #<label".
Some changes to the find-label-offset code affected the cc65 "is it
a forward reference to a direct-page label" logic. The regression
test now correctly identifies an instruction that refers to itself
as not being a forward reference.
The cc65 assembler runs in a single pass, which means forward
address references default to 16 bits. For zero-page references
we have to add an explicit width disambiguator. (This is an
unusual situation that only occurs if you have a zero-page .ORG
in the file after code that references it.)
With this change, 2014-label-dp passes, and no other regression
tests were affected.
(issue #40)
The 2014-label-dp test now passes. Prior regression tests are
unaffected.
Also, renamed an IGenerator interface to more accurately reflect
its role.
(issue #37)
To avoid confusing the assembler, expressions with a leading
parenthesis like "(foo & $ffff) + 1" are prefixed with a "0+". This
is not necessary if the operand begins with a '#'.
(issue #16)
Gave cc65 its own expression generator, as the precedence table seems
atypical if not unique. Configured 64tass to use the "simple"
expression mode.
Added some operations on a 32-bit constant to 2007-labels-and-symbols
to exercise the current worst-case expression (shift + AND + add).
Tweaked the Merlin expression generator to handle it.
(issue #16)
Most tests pass, but 2007-labels-and-symbols fails because the
expressions recognized by 64tass don't match up with either of the
other assemblers.
This is currently using a workaround for the local label syntax.
64tass uses '_' as the prefix, which is unfortunate since SourceGen
explicitly allowed underscores in labels. (So does 64tass for that
matter, but it treats labels specially when the '_' comes first.)
We will need to rename any non-local user labels that start with '_'.
(issue #16)
Changed the "quick config" buttons for the asm config and pseudo-op
tabs into a drop-list and "set" button. The default values for
each assembler are now defined in the Asm*.cs file, rather than in
the settings code.