Long operands, such as strings and bulk data, can span multiple lines.
SourceGen wraps them at 64 characters, which is fine for assembly
output but occasionally annoying on screen: if the operand column is
wide enough to show the entire value, the comment column is pushed
pretty far to the right.
This change makes the width configurable, as 32/48/64 characters,
with a pop-up in app settings.
The assemblers are all wired to 64 characters, though we could make
this configurable as well with an assembler-specific setting.
Some things have moved around a bit in app settings. The Asm Config
tab now comes last. Having it sandwiched in the middle of tabs that
altered the on-screen display didn't make much sense. The Display
Format is now explicitly for opcodes and operands, and is split into
two columns. The left column is managed by the "quick set" feature,
the right column is independent.
The decision of how to handle indeterminate M/X flag values is made in
StatusFlags. This provides consistent behavior throughout the app.
This was being done for M/X but not for E.
This change also renames the M/X tests, prefixing them with "Is" to
emphasize that they are boolean rather than tri-state.
There should be no change in behavior from this.
Code generated for 64tass was incorrect for JSR/JMP to a location
outside the file bounds. A test added to 20052-branches-and-banks
revealed an issue with cc65 generation as well.
Two basic problems:
(1) cc65, being a one-pass assembler, can't tell if a forward-referenced
label is 16-bit or 24-bit. If the operand is potentially ambiguous,
such as "LDA label", we need to add an operand width disambiguator.
(The existing tests managed to only do backward references.)
(2) 64tass wants the labels on JMP/JSR absolute operands to have 24-bit
values that match the current program bank. This is the opposite of
cc65, which requires 16-bit values. We need to distinguish PBR vs.
DBR instructions (i.e. "LDA abs" vs. "JMP abs") and handle them
differently when formatting for "Common".
Merlin32 doesn't care, and ACME doesn't work at all, so neither of
those needed updating.
The 20052-branches-and-banks test was expanded to cover the problematic
cases.
We're doing this for user labels but not for project/platform
symbols. So if you have a constant named "BCC" you can't assemble
your code with certain assemblers. Now we rename it automatically.
Added a quick test to 2007-labels-and-symbols. (No change to ACME,
which barfs on the test.)
The "is the .junk alignment directive correct" was returning true
for subtype=None (not aligned), which caused execution to go down
the wrong path and irritate an assert.
Correct handling of local variables. We now correctly uniquify them
with regard to non-unique labels. Because local vars can effectively
have global scope we mostly want to treat them as global, but they're
uniquified relative to other globals very late in the process, so we
can't just throw them in the symbol table and be done. Fortunately
local variables exist in a separate namespace, so we just need to
uniquify the variables relative to the post-localization symbol table.
In other words, we take the symbol table, apply the label map, and
rename any variable that clashes.
This also fixes an older problem where we weren't masking the
leading '_' on variable labels when generating 64tass output.
The code list now makes non-unique labels obvious, but you can't tell
the difference between unique global and unique local. What's more,
the default type value in Edit Label is now adjusted to Global for
unique locals that were auto-generated. To make it a bit easier to
figure out what's what, the Info panel now has a "label type" line
that reports the type.
The 2023-non-unique-labels test had some additional tests added to
exercise conflicts with local variables. The 2019-local-variables
test output changed slightly because the de-duplicated variable
naming convention was simplified.
- Renamed "strip label prefix/suffix" to "omit label prefix/suffix".
- Changed a Merlin operand workaround so it doesn't apply to code
that is explicitly not in bank zero.
- Changed {addr}/{const} annotations on project/platform symbol
equates so they line up a little better on screen and in exported
sources.
Continue development of non-unique labels. The actual labels are
still unique, because we append a uniquifier tag, which gets added
and removed behind the scenes. We're currently using the six-digit
hex file offset because this is only used for internal address
symbols.
The label editor and most of the formatters have been updated. We
can't yet assemble code that includes non-unique labels, but older
stuff hasn't been broken.
This removes the "disable label localization" property, since that's
fundamentally incompatible with what we're doing, and adds a non-
unique label prefix setting so you can put '@' or ':' in front of
your should-be-local labels.
Also, fixed a field name typo.
This adds the concept of label annotations. The primary driver of
the feature is the desire to note that sometimes you know what a
thing is, but sometimes you're just taking an educated guess.
Instead of writing "high_score_maybe", you can now write "high_score?",
which is more compact and consistent. The annotations are stripped
off when generating source code, making them similar to Notes.
I also created a "Generated" annotation for the labels that are
synthesized by the address table formatter, but don't modify the
label for them, because there's not much need to remind the user
that "T1234" was generated by algorithm.
This also lays some of the groundwork for non-unique labels.
While adding a message log entry for failing alignment directives,
I noticed that the assembler source generator's test for valid
alignment was allowing some bad alignment values through.
I'm holding off on reporting the message to the log because not all
format changes cause a data-reanalysis, which means the log entry
doesn't always appear and disappear when it should. If we decide
this is an important message we can add a scan for "softer" errors.
In the assembler output, add a blank line between the constants
and addresses in the long list of equates.
The earlier change that corrected the BIT instruction caused test
2009-branches-and-banks to fail, because it was relying on the idea
that BIT made the carry flag indeterminate. Changing a BCC to a
BVS restored the desired behavior.
Sometimes there's a bunch of junk in the binary that isn't used for
anything. Often it's there to make things line up at the start of
a page boundary.
This adds a ".junk" directive that tells the disassembler that it
can safely disregard the contents of a region. If the region ends
on a power-of-two boundary, an alignment value can be specified.
The assembly source generators will output an alignment directive
when possible, a .fill directive when appropriate, and a .dense
directive when all else fails. Because we're required to regenerate
the original data file, it's not always possible to avoid generating
a hex dump.
Most assemblers end local label scope when a global label is
encountered. cc65 takes this one step further by ending local label
scope when constants or variables are defined. So, if we have a
variable table with a nonzero number of entries, we want to create
a fake global label at that point to end the scope.
Merlin 32 won't let you write " LDA #',' ". For some reason the
comma causes an error. IGenerator now has a "tweak operand format"
interface that lets us fix that.
If a line has a comment with a cycle count and nothing else, it was
getting an extra space or two on the end.
Also, added a few end-of-line comments to the 2020 test to show how
they interact with the cycle counts.
Unlike 64tass and Merlin, which allow you to redefine symbols, ACME
uses "zones" that provide scope for local variables. This means
that, at the point of a local variable table definition, we have to
start a new zone and output the full set of active symbols, not just
the newly-defined ones. (If you set the "clear previous" flag in
the LvTable there's no difference.)
We could do a bit better by only outputting the symbols that are
actually used within the zone, similar to what we do for global
project/platform symbols, but that's a bunch of work for questionable
benefit.
After thrashing around a bit, I had to choose between making the
uniquifier more complicated, or making de-duplication a separate
step. Since I don't really expect duplicates to be a thing, I went
with the latter.
Updated the regression test.
Variables are now handled properly end-to-end, except for label
uniquification. So cc65 and ACME can't yet handle a file that
redefines a local variable.
This required a bunch of plumbing, but I think it came out okay.
Previously, we used the default character encoding from the project
properties to determine how strings and character constants in the
entire source file should be encoded. Now we switch between
encodings as needed. The default character encoding is no longer
relevant.
High ASCII is now an actual encoding, rather than acting like ASCII
that sometimes doesn't work. Because we can do high ASCII character
operands with "| $80", we don't output a .enc to switch from ASCII
to high ASCII unless we need to generate a string. (If we're
already in high ASCII mode, the "| $80" isn't required but won't
hurt anything.)
We now do a scan up front to see if ASCII or high ASCII is needed,
and only output the .cdefs for the encodings that are actually used.
The only gap in the matrix is high ASCII DCI strings -- the ".shift"
pseudo-op rejects text if the string doesn't start with the high
bit clear.
The previous code output a character in single-quotes if it was
standard ASCII, double-quotes if high ASCII, or hex if it was neither
of those. If a flag was set, high ASCII would also be output as
hex.
The new system takes the character value and an encoding identifier.
The identifier selects the character converter and delimiter
pattern, and puts the two together to generate the operand.
While doing this I realized that I could trivially support high
ASCII character arguments in all assemblers by setting the delimiter
pattern to "'#' | $80".
In FormatDescriptor, I had previously renamed the "Ascii" sub-type
"LowAscii" so it wouldn't be confused, but I dislike filling the
project file with "LowAscii" when "Ascii" is more accurate and less
confusing. So I switched it back, and we now check the project
file version number when deciding what to do with an ASCII item.
The CharEncoding tests/converters were also renamed.
Moved the default delimiter patterns to the string table.
Widened the delimiter pattern input fields slightly. Added a read-
only TextBox with assorted non-typewriter quotes and things so
people have something to copy text from.
During a discussion with the cc65 developers, I became convinced that
generating "MVN $01,$02" is wrong, and "MVN #$01,#$02" is correct.
64tass, cc65, and Merlin 32 all accept this syntax; only ACME does
not. Operands without a leading '#' should be treated as 24-bit
values, and have the bank byte extracted.
This change updates the on-screen display and assembled output to
include the '#'. The ACME generator uses a Quirk to suppress the
hash mark. (It doesn't currently accept values larger than 8 bits,
so there's no ambiguity.)
If you double-click on the opcode of "JSR label", the code view
selection jumps to the label. This now works for partial operands,
e.g. "LDA #<label".
Some changes to the find-label-offset code affected the cc65 "is it
a forward reference to a direct-page label" logic. The regression
test now correctly identifies an instruction that refers to itself
as not being a forward reference.
The cc65 assembler runs in a single pass, which means forward
address references default to 16 bits. For zero-page references
we have to add an explicit width disambiguator. (This is an
unusual situation that only occurs if you have a zero-page .ORG
in the file after code that references it.)
With this change, 2014-label-dp passes, and no other regression
tests were affected.
(issue #40)
The 2014-label-dp test now passes. Prior regression tests are
unaffected.
Also, renamed an IGenerator interface to more accurately reflect
its role.
(issue #37)
To avoid confusing the assembler, expressions with a leading
parenthesis like "(foo & $ffff) + 1" are prefixed with a "0+". This
is not necessary if the operand begins with a '#'.
(issue #16)
Gave cc65 its own expression generator, as the precedence table seems
atypical if not unique. Configured 64tass to use the "simple"
expression mode.
Added some operations on a 32-bit constant to 2007-labels-and-symbols
to exercise the current worst-case expression (shift + AND + add).
Tweaked the Merlin expression generator to handle it.
(issue #16)
Most tests pass, but 2007-labels-and-symbols fails because the
expressions recognized by 64tass don't match up with either of the
other assemblers.
This is currently using a workaround for the local label syntax.
64tass uses '_' as the prefix, which is unfortunate since SourceGen
explicitly allowed underscores in labels. (So does 64tass for that
matter, but it treats labels specially when the '_' comes first.)
We will need to rename any non-local user labels that start with '_'.
(issue #16)
Changed the "quick config" buttons for the asm config and pseudo-op
tabs into a drop-list and "set" button. The default values for
each assembler are now defined in the Asm*.cs file, rather than in
the settings code.