Added support for non-addressable regions, which are useful for things
like file headers stripped out by the system loader, or chunks that
get loaded into non-addressable graphics RAM. Regions are specified
with the "NA" address value. The code list displays the address field
greyed out, starting from zero (which is kind of handy if you want to
know the relative offset within the region).
Putting labels in non-addressable regions doesn't make sense, but
symbol resolution is complicated enough that we really only have two
options: ignore the labels entirely, or allow them but warn of their
presence. The problem isn't so much the label, which you could
legitimately want to access from an extension script, but rather the
references to them from code or data. So we keep the label and add a
warning to the Messages list when we see a reference.
Moved NON_ADDR constants to Address class. AddressMap now has a copy.
This is awkward because Asm65 and CommonUtil don't share.
Updated the asm code generators to understand NON_ADDR, and reworked
the API so that Merlin and cc65 output is correct for nested regions.
Address region changes are now noted in the anattribs array, which
makes certain operations faster than checking the address map. It
also fixes a failure to recognize mid-instruction region changes in
the code analyzer.
Tweaked handling of synthetic regions, which are non-addressable areas
generated by the linear address map traversal to fill in any "holes".
The address region editor now treats attempts to edit them as
creation of a new region.
This is the first step toward changing the address region map from a
linear list to a hierarchy. See issue #107 for the plan.
The AddressMap class has been rewritten to support the new approach.
The rest of the project has been updated to conform to the new API,
but feature-wise is unchanged. While the map class supports
nested regions with explicit lengths, the rest of the application
still assumes a series of non-overlapping regions with "floating"
lengths.
The Set Address dialog is currently non-functional.
All of the output for cc65 changed because generation of segment
comments has been removed. Some of the output for ACME changed as
well, because we no longer follow "* = addr" with a redundant
pseudopc statement. ACME and 65tass have similar approaches to
placing things in memory, and so now have similar implementations.
The data operand editor determines low vs. high ASCII formatting by
examining the first byte of string data. Unfortunately the test was
broken, and for strings with a 1- or 2-byte length, was testing the
length byte instead of the character data. This is now fixed.
This also changes the way empty strings are handled. Before, they
were allowed but not counted, so you couldn't create an empty string
by itself, but could do it if it were part of a larger group. This
was unnecessarily restrictive. Empty L1/L2/null-term strings are now
allowed.
This means that a buffer full of $00 can be formatted as a big pile
of empty strings, which seems a bit ridiculous but there's no good
reason to obstruct it.
(issue #110)
The DCI string format uses character values where the high bit of the
last byte differs from the rest of the string. Usually all the high
bits are clear except on the last byte, but SourceGen generally allows
either polarity.
This gets a little uncertain with single-character strings, because
SourceGen can't auto-detect DCI very effectively. A series of bytes
with the high bit set could be a single high-ASCII string or a series
of single-byte DCI strings.
The motivation for allowing them is C64 PETSCII. While ASCII allows
"high ASCII" as an escape hatch, PETSCII doesn't have that option, so
there's no way to mark the data as a character or a string. We still
want to do a bit of screening, but if the user specifies a non-ASCII
character set and the selected bytes have their high bits set, we
want to just treat the whole set as 1-byte DCI.
Some minor adjustments were needed for a couple of validity checks
that expected longer strings.
This adds some short DCI strings in different character sets to the
char-encoding regression tests.
(for issue #102)
When formatting one or more strings with the Edit Data Operand dialog,
the code must determine which options to present. If the selected
bytes appear to represent one or more null-terminated strings, that
option is enabled in the UI.
The "format recognizers" enforce some strict rules, e.g. null-
terminated strings must end in $00, and also try to confirm that the
data looks like a printable string. The algorithm rejects strings
with "illegal" characters in them. This is simpler on some systems
than others. For example, C64 PETSCII defines quite a few control
characters in ways that make them useful for embedding in printable
strings.
The "recognizers" are only used by the operand edit feature, not as
part of an automated string detector, so there's no real upside in
overriding the user's desire to form a string with arbitrary bytes.
This removes the quick rejection from the four recognizers (null-term,
len8, len16, dci). It does not alter the high-level code, which
still insists on a certain percentage of the string being printable;
that may be worth revisiting as well.
(issue #100)
When we have relocation data available, the code currently skips the
process of matching an address with a label for a PEA instruction when
the instruction in question doesn't have reloc data. This does a
great job of separating code that pushes parts of addresses from code
that pushes constants.
This change expands the behavior to exclude instructions with 16-bit
address operands that use the Data Bank Register, e.g. "LDA abs"
and "LDA abs,X". This is particularly useful for code that accesses
structured data using the operand as the structure offset, e.g.
"LDX addr" / "LDA $0000,X"
The 20212-reloc-data test has been updated to check the behavior.
If code accesses the high/low parts of a 32-bit address value with
no label, it auto-generates labels for addr+2 and addr. The reloc
handler was replacing the unformatted bytes with a single multi-byte
format, hiding the label at addr+2.
The easy fix is to have the reloc data handler skip the entry. This
is less useful than other approaches, but much simpler.
Added a test to 20212-reloc-data.
The Visual Studio performance profiler showed the FormatDescriptor
equality test being called quite a lot. The test was vs. null, so
a simple change from "==" to "is" improved performance dramatically.
Fixing the underlying issue with a better data structure is still
important, but this provided a big boost with little effort.
The test wasn't correctly excluding instructions, so it was possible
to create a situation where a two-byte data item had an instruction
starting in the second byte.
We also weren't checking the length of the instruction to ensure that
it was wider than the reloc data. This could get weird for an
immediate constant when the M/X flags are wrong. When in doubt, don't
overwrite.
This test exercises the relocation data feature. The test file is
generated from a multi-segment OMF file that was hex-edited to have
specific attributes (see 20212-reloc-data-lnk.S for instructions).
The test also serves as a way to exercise the OMF converter.
Also, implement the Bank Relative flag.
Works well for things like jump tables. Seeing a bunch of these
scattered in a chunk of data is a decent signal that it's actually
code.
In a bold move, we now exclude PEA operands from auto-label gen when
they don't have relocation data. This is very useful for things
like Int2Hex for which constants are typically pushed with PEA.
Reworked the "use reloc data" setting so it defaults to false and is
explicitly set to true when converting OMF. This provides a minor
optimization since we now check the boolean and skip doing a lookup
in an empty table.
This was a relatively lightweight change to confirm the usefulness
of relocation data. The results were very positive.
The relatively superficial integration of the data into the data
analysis process causes some problems, e.g. the cross-reference table
entries show an offset because the code analyzer's computed operand
offset doesn't match the value of the label. The feature should be
considered experimental
The feature can be enabled or disabled with a project property. The
results were sufficiently useful and non-annoying to make the setting
enabled by default.
These were being overlooked because they didn't actually cause
anything to happen (a no-op .ORG sets the address to what it would
already have been). The assembly source generator works in a way
that causes them to be skipped, so everybody was happy.
This seemed like the sort of thing that was likely to cause problems
down the road, however, so we now split regions correctly when a
no-op .ORG is encountered. This affects the uncategorized data
analyzer and selection grouping.
This changed the behavior of the 2004-numeric-types test, which was
visibly weird in the UI but generated correct output.
Added the 2024-ui-edge-cases test to provide a place to exercise
edge cases when testing the UI by hand. It has some value for the
automated regression test, so it's included there.
Also, changed the AddressMapEntry objects to be immutable. This
is handy when passing lists of them around.
If you have a single line selected, Set Address adds a .ORG directive
that changes the addresses of all following data, until the next .ORG
directive is reached. Sometimes code will relocate part of itself,
and it's useful to be able to set the address at the end of the block
to what it would have been before the .ORG change.
If you have multiple lines selected, we now add the second .ORG to
the offset that follows the last selected line.
Also, fixed a bug in the Symbol value updater that wasn't handling
non-unique labels correctly.
Various changes:
- Generally treat visualization sets like long comments and notes
when it comes to defining data region boundaries. (We were doing
this for selections; now we're also doing it for format-as-word
and in the data analyzer when scanning for strings/fill.)
- Clear the visualization cache when the address map is altered.
This is necessary for visualizers that dereference addresses.
- Read the Apple II screen image from a series of addresses rather
than a series of offsets. This allows it to work when the image
is contiguous in memory but split into chunks in the file.
- Put 1 pixel of padding around the images in the main code list,
so they don't blend into the background.
- Remember the last visualizer used, so we can re-use it the next
time the user selects "new".
- Move min-size hack from Loaded to ContentRendered, as it apparently
spoils CenterOwner placement.
The code that found a nearby data target for an instruction operand
was searching backward but not forward. We now take one step
forward, so that "LDA TABLE-1,Y" fills in automatically.
This altered 2008-address-changes, which had just this situation.
It didn't alter 2010-target-adjustment, but the existing tests were
insufficient and have been improved.
If we have a bug, or somebody edits the project file manually, we
can end up with a very wrong string, such as a null-terminated
string that isn't, or a DCI string that has a mix of high and low
ASCII from start to finish. We now check all incoming strings for
validity, and discard any that fail the test. The verification
code is shared with the extension script inline data formatter.
Also, added a comment to an F8-ROM symbol I stumbled over.
Implement multi-byte project/platform symbols by filling out a table
of addresses. Each symbol is "painted" into the table, replacing
an existing entry if the new entry has higher priority. This allows
us to handle overlapping entries, giving boosted priority to platform
symbols that are defined in .sym65 files loaded later.
The bounds on project/platform symbols are now rigidly defined. If
the "nearby" feature is enabled, references to SYM-1 will be picked
up, but we won't go hunting for SYM+1 unless the symbol is at least
two bytes wide.
The cost of adding a symbol to the symbol table is about the same,
but we don't have a quick way to remove a symbol.
Previously, if two platform symbols had the same value, the symbol
with the alphabetically lowest label would win. Now, the symbol
defined in the most-recently-loaded file wins. (If you define two
symbols with the same value in the same file, it's still resolved
alphabetically.) This allows the user to pick the winner by
arranging the load order of the platform symbol files.
Platform symbols now keep a reference to the file ident of the
symbol file that defined them, so we can show the symbols's source
in the Info panel.
These changes altered the behavior of test 2008-address-changes,
which includes some tests on external addresses that are close to
labeled internal addresses. The previous behavior essentially
treated user labels as being 3 bytes wide and extending outside the
file bounds, which was mildly convenient on occasion but felt a
little skanky. (We could do with a way to define external symbols
relative to internal symbols, for things like the source address of
code that gets relocated.)
Also, re-enabled some unit tests.
Also, added a bit of identifying stuff to CrashLog.txt.
Rearrange the UI elements, and convert the code-behind to a more
XAML-style form. The basic stuff works, but the old "shortcut"
system is still in the process of being replaced.
The code that checked to see if a data target was inside a data
operand wasn't going all the way back to the start of the file.
It was also failing to stop when it should, wasting time.
The anattrib validation method has code that avoids a false-positive
on certain complex embedded instruction arrangements. This was also
preventing it from seeing a transition from a data area to the
middle of an instruction (caused by issue #45).
Fixed a minor bug in GenerateLineList that would cause a blank line
to disappear under certain circumstances. Harmless, but odd.
Added a width property to DefSymbol.
Updated comments.
I didn't think it made sense, but I found something that used it,
so apparently it's a thing. This updates the operand editor to
let you choose PETSCII+DCI, and updates the assemblers to handle
it correctly (really just 64tass, since the others either don't
have a DCI directive or don't deal with PETSCII at all).
Changed the char-encoding sample from "bad dcI" to "pet dcI", and
updated the documentation.
Both dialogs got a couple extra radio buttons for selection of
single character operands. The data operand editor got a combo box
that lets you specify how it scans for viable strings.
Various string scanning methods were made more generic. This got a
little strange with auto-detection of low/high ASCII, but that was
mostly a matter of keeping the previous code around as a special
case.
Made C64 Screen Code DCI strings a thing that works.
The previous functions just grabbed 62 characters and slapped quotes
on the ends, but that doesn't work if we want to show strings with
embedded control characters. This change replaces the simple
formatter with the one used to generate assembly source code. This
increases the cost of refreshing the display list, so a cache will
need to be added in a future change.
Converters for C64 PETSCII and C64 Screen Code have been defined.
The results of changing the auto-scan encoding can now be viewed.
The string operand formatter was using a single delimiter, but for
the on-screen version we want open-quote and close-quote, and might
want to identify some encodings with a prefix. The formatter now
takes a class that defines the various parts. (It might be worth
replacing the delimiter patterns recently added for single-character
operands with this, so we don't have two mechanisms for very nearly
the same thing.)
While working on this change I remembered why there were two kinds
of "reverse" in the old Merlin 32 string operand generator: what you
want for assembly code is different from what you want on screen.
The ReverseMode enum has been resurrected.
The code that searches for character strings in uncategorized data
now recognizes the C64 encodings when selected in the project
properties.
The new code avoids some redundant comparisons when runs of
printable characters are found. I suspect the new implementation
loses on overall performance because we're now calling through
delegates instead of testing characters directly, but I haven't
tested for that.
The previous code output a character in single-quotes if it was
standard ASCII, double-quotes if high ASCII, or hex if it was neither
of those. If a flag was set, high ASCII would also be output as
hex.
The new system takes the character value and an encoding identifier.
The identifier selects the character converter and delimiter
pattern, and puts the two together to generate the operand.
While doing this I realized that I could trivially support high
ASCII character arguments in all assemblers by setting the delimiter
pattern to "'#' | $80".
In FormatDescriptor, I had previously renamed the "Ascii" sub-type
"LowAscii" so it wouldn't be confused, but I dislike filling the
project file with "LowAscii" when "Ascii" is more accurate and less
confusing. So I switched it back, and we now check the project
file version number when deciding what to do with an ASCII item.
The CharEncoding tests/converters were also renamed.
Moved the default delimiter patterns to the string table.
Widened the delimiter pattern input fields slightly. Added a read-
only TextBox with assorted non-typewriter quotes and things so
people have something to copy text from.
We've been treating ASCII strings and instruction/data operands as
ambiguous, resolving low vs. high when generating output for the
display or assembler. This change splits it into two separate
formats, simplifying output generation.
The UI will continue to treat low/high ASCII as as single thing,
selecting the format appropriately based on the data. There's no
reason to have two radio buttons that are never both enabled.
The data operand string functions need some additional work, but
that overlaps substantially with the upcoming PETSCII changes, so
for now all strings set by the data operand editor are low ASCII.
The file format has changed again, but since there hasn't been a
release since the previous change, I'm leaving the file format
at v2. Code has been added to resolve the ASCII mode when loading
a v1 project file.
This removes some complexity from the assembly code generators.
We used to use type="String", with the sub-type indicating whether
the string was null-terminated, prefixed with a length, or whatever.
This didn't leave much room for specifying a character encoding,
which is orthogonal to the sub-type.
What we actually want is to have the type specify the string type,
and then have the sub-type determine the character encoding. These
sub-types can also be used with the Numeric type to specify the
encoding of character operands.
This change updates the enum definitions and the various bits of
code that use them, but does not add any code for working with
non-ASCII character encodings.
The project file version number was incremented to 2, since the new
FormatDescriptor serialization is mildly incompatible with the old.
(Won't explode, but it'll post a complaint and ignore the stuff
it doesn't recognize.)
While I was at it, I finished removing DciReverse. It's still part
of the 2005-string-types regression test, which currently fails
because the generated source doesn't match.
SourceGen creates "auto" labels when it finds a reference to an
address that doesn't have a label associated with it. The label for
address $1234 would be "L1234". This change allows the project to
specify alternative label naming conventions, annotating them with
information from the cross-reference data. For example, a subroutine
entry point (i.e. the target of a JSR) would be "S_1234". (The
underscore was added to avoid confusion when an annotation letter
is the same as a hex digit.)
Also, tweaked the way the preferred clipboard line format is stored
in the settings file (was an integer, now an enumeration string).
In the cross-reference table we now indicate whether the reference
source is doing a read, write, read-modify-write, branch, subroutine
call, is just referencing the address, or is part of the data.
If you double-click on the opcode of "JSR label", the code view
selection jumps to the label. This now works for partial operands,
e.g. "LDA #<label".
Some changes to the find-label-offset code affected the cc65 "is it
a forward reference to a direct-page label" logic. The regression
test now correctly identifies an instruction that refers to itself
as not being a forward reference.
When you edit the operand of an instruction that targets an in-file
address, you're given the opportunity to specify a shortcut that
applies the symbol to the instruction's target address in addition
to or instead of defining a weak symbol reference on the instruction
being edited.
This didn't work right for operands with adjustments, e.g. the store
instructions in self-modifying code. It put the label at the
unadjusted offset, which does nothing useful.
We now correctly back up to the start of the instruction or multi-
byte data area.
Allows specification of table data in various ways, for 16-bit and
24-bit addresses. Shows a preview so you can see if the addresses
look about right. Adds permanent labels at target offsets if none
are present. Optionally sets code hints.
Works beautifully on the A2-Amper-fdraw example, but needs some
additional testing, documentation, etc. Dialog is more complicated
that I would have liked, mostly because of 65816 support, but I
think it'll do.
(issue #10)