1
0
mirror of https://github.com/fadden/6502bench.git synced 2024-11-04 15:05:03 +00:00
Commit Graph

40 Commits

Author SHA1 Message Date
Andy McFadden
e6c5c7f8df ORG rework, part 6
Added support for non-addressable regions, which are useful for things
like file headers stripped out by the system loader, or chunks that
get loaded into non-addressable graphics RAM.  Regions are specified
with the "NA" address value.  The code list displays the address field
greyed out, starting from zero (which is kind of handy if you want to
know the relative offset within the region).

Putting labels in non-addressable regions doesn't make sense, but
symbol resolution is complicated enough that we really only have two
options: ignore the labels entirely, or allow them but warn of their
presence.  The problem isn't so much the label, which you could
legitimately want to access from an extension script, but rather the
references to them from code or data.  So we keep the label and add a
warning to the Messages list when we see a reference.

Moved NON_ADDR constants to Address class.  AddressMap now has a copy.
This is awkward because Asm65 and CommonUtil don't share.

Updated the asm code generators to understand NON_ADDR, and reworked
the API so that Merlin and cc65 output is correct for nested regions.

Address region changes are now noted in the anattribs array, which
makes certain operations faster than checking the address map.  It
also fixes a failure to recognize mid-instruction region changes in
the code analyzer.

Tweaked handling of synthetic regions, which are non-addressable areas
generated by the linear address map traversal to fill in any "holes".
The address region editor now treats attempts to edit them as
creation of a new region.
2021-09-30 21:11:26 -07:00
Andy McFadden
39b7b20144 ORG rework, part 1
This is the first step toward changing the address region map from a
linear list to a hierarchy.  See issue #107 for the plan.

The AddressMap class has been rewritten to support the new approach.
The rest of the project has been updated to conform to the new API,
but feature-wise is unchanged.  While the map class supports
nested regions with explicit lengths, the rest of the application
still assumes a series of non-overlapping regions with "floating"
lengths.

The Set Address dialog is currently non-functional.

All of the output for cc65 changed because generation of segment
comments has been removed.  Some of the output for ACME changed as
well, because we no longer follow "* = addr" with a redundant
pseudopc statement.  ACME and 65tass have similar approaches to
placing things in memory, and so now have similar implementations.
2021-09-16 17:02:19 -07:00
Andy McFadden
0dfa2326dd Fix L1/L2 ASCII string editing
The data operand editor determines low vs. high ASCII formatting by
examining the first byte of string data.  Unfortunately the test was
broken, and for strings with a 1- or 2-byte length, was testing the
length byte instead of the character data.  This is now fixed.

This also changes the way empty strings are handled.  Before, they
were allowed but not counted, so you couldn't create an empty string
by itself, but could do it if it were part of a larger group.  This
was unnecessarily restrictive.  Empty L1/L2/null-term strings are now
allowed.

This means that a buffer full of $00 can be formatted as a big pile
of empty strings, which seems a bit ridiculous but there's no good
reason to obstruct it.

(issue #110)
2021-09-12 09:46:55 -07:00
Andy McFadden
ec2ad529c8 Remove a couple of faulty assertions
One asserted unnecessarily, one should have been an if/then.  Both
were concerned with instruction operands being formatted with
type "address".
2021-08-11 16:25:24 -07:00
Andy McFadden
3368182e14 Allow single-character DCI strings
The DCI string format uses character values where the high bit of the
last byte differs from the rest of the string.  Usually all the high
bits are clear except on the last byte, but SourceGen generally allows
either polarity.

This gets a little uncertain with single-character strings, because
SourceGen can't auto-detect DCI very effectively.  A series of bytes
with the high bit set could be a single high-ASCII string or a series
of single-byte DCI strings.

The motivation for allowing them is C64 PETSCII.  While ASCII allows
"high ASCII" as an escape hatch, PETSCII doesn't have that option, so
there's no way to mark the data as a character or a string.  We still
want to do a bit of screening, but if the user specifies a non-ASCII
character set and the selected bytes have their high bits set, we
want to just treat the whole set as 1-byte DCI.

Some minor adjustments were needed for a couple of validity checks
that expected longer strings.

This adds some short DCI strings in different character sets to the
char-encoding regression tests.

(for issue #102)
2021-08-08 15:38:39 -07:00
Andy McFadden
d65ab59461 Don't reject strings with "invalid" characters
When formatting one or more strings with the Edit Data Operand dialog,
the code must determine which options to present.  If the selected
bytes appear to represent one or more null-terminated strings, that
option is enabled in the UI.

The "format recognizers" enforce some strict rules, e.g. null-
terminated strings must end in $00, and also try to confirm that the
data looks like a printable string.  The algorithm rejects strings
with "illegal" characters in them.  This is simpler on some systems
than others.  For example, C64 PETSCII defines quite a few control
characters in ways that make them useful for embedding in printable
strings.

The "recognizers" are only used by the operand edit feature, not as
part of an automated string detector, so there's no real upside in
overriding the user's desire to form a string with arbitrary bytes.

This removes the quick rejection from the four recognizers (null-term,
len8, len16, dci).  It does not alter the high-level code, which
still insists on a certain percentage of the string being printable;
that may be worth revisiting as well.

(issue #100)
2021-08-01 17:50:32 -07:00
Andy McFadden
cc6ebaffc5 Update relocation data handling
When we have relocation data available, the code currently skips the
process of matching an address with a label for a PEA instruction when
the instruction in question doesn't have reloc data.  This does a
great job of separating code that pushes parts of addresses from code
that pushes constants.

This change expands the behavior to exclude instructions with 16-bit
address operands that use the Data Bank Register, e.g. "LDA abs"
and "LDA abs,X".  This is particularly useful for code that accesses
structured data using the operand as the structure offset, e.g.
"LDX addr" / "LDA $0000,X"

The 20212-reloc-data test has been updated to check the behavior.
2020-07-10 17:41:38 -07:00
Andy McFadden
6ce2cc0b58 Fix label-trampling bug in reloc data handler
If code accesses the high/low parts of a 32-bit address value with
no label, it auto-generates labels for addr+2 and addr.  The reloc
handler was replacing the unformatted bytes with a single multi-byte
format, hiding the label at addr+2.

The easy fix is to have the reloc data handler skip the entry.  This
is less useful than other approaches, but much simpler.

Added a test to 20212-reloc-data.
2020-07-10 13:56:07 -07:00
Andy McFadden
44522dc2f2 Performance tweak
The Visual Studio performance profiler showed the FormatDescriptor
equality test being called quite a lot.  The test was vs. null, so
a simple change from "==" to "is" improved performance dramatically.

Fixing the underlying issue with a better data structure is still
important, but this provided a big boost with little effort.
2020-07-07 12:09:00 -07:00
Andy McFadden
f4fe3af050 Fix application of reloc info in data areas
The test wasn't correctly excluding instructions, so it was possible
to create a situation where a two-byte data item had an instruction
starting in the second byte.

We also weren't checking the length of the instruction to ensure that
it was wider than the reloc data.  This could get weird for an
immediate constant when the M/X flags are wrong.  When in doubt, don't
overwrite.
2020-07-07 11:48:51 -07:00
Andy McFadden
4e70edc90c Add 20212-reloc-data test
This test exercises the relocation data feature.  The test file is
generated from a multi-segment OMF file that was hex-edited to have
specific attributes (see 20212-reloc-data-lnk.S for instructions).
The test also serves as a way to exercise the OMF converter.

Also, implement the Bank Relative flag.
2020-07-05 17:17:44 -07:00
Andy McFadden
0fa77cba75 Apply relocation data to unformatted data
Works well for things like jump tables.  Seeing a bunch of these
scattered in a chunk of data is a decent signal that it's actually
code.

In a bold move, we now exclude PEA operands from auto-label gen when
they don't have relocation data.  This is very useful for things
like Int2Hex for which constants are typically pushed with PEA.

Reworked the "use reloc data" setting so it defaults to false and is
explicitly set to true when converting OMF.  This provides a minor
optimization since we now check the boolean and skip doing a lookup
in an empty table.
2020-07-03 22:03:50 -07:00
Andy McFadden
d58b747571 Use relocation data to format instruction operands
This was a relatively lightweight change to confirm the usefulness
of relocation data.  The results were very positive.

The relatively superficial integration of the data into the data
analysis process causes some problems, e.g. the cross-reference table
entries show an offset because the code analyzer's computed operand
offset doesn't match the value of the label.  The feature should be
considered experimental

The feature can be enabled or disabled with a project property.  The
results were sufficiently useful and non-annoying to make the setting
enabled by default.
2020-07-03 17:58:41 -07:00
Andy McFadden
0bbb307d4e Correct handling of no-op .ORG statements
These were being overlooked because they didn't actually cause
anything to happen (a no-op .ORG sets the address to what it would
already have been).  The assembly source generator works in a way
that causes them to be skipped, so everybody was happy.

This seemed like the sort of thing that was likely to cause problems
down the road, however, so we now split regions correctly when a
no-op .ORG is encountered.  This affects the uncategorized data
analyzer and selection grouping.

This changed the behavior of the 2004-numeric-types test, which was
visibly weird in the UI but generated correct output.

Added the 2024-ui-edge-cases test to provide a place to exercise
edge cases when testing the UI by hand.  It has some value for the
automated regression test, so it's included there.

Also, changed the AddressMapEntry objects to be immutable.  This
is handy when passing lists of them around.
2020-02-28 14:49:18 -08:00
Andy McFadden
091955b9c2 Allow setting the start/end address for a block
If you have a single line selected, Set Address adds a .ORG directive
that changes the addresses of all following data, until the next .ORG
directive is reached.  Sometimes code will relocate part of itself,
and it's useful to be able to set the address at the end of the block
to what it would have been before the .ORG change.

If you have multiple lines selected, we now add the second .ORG to
the offset that follows the last selected line.

Also, fixed a bug in the Symbol value updater that wasn't handling
non-unique labels correctly.
2019-12-25 18:17:50 -08:00
Andy McFadden
1cdb31de32 Visualizer improvements
Various changes:
- Generally treat visualization sets like long comments and notes
  when it comes to defining data region boundaries.  (We were doing
  this for selections; now we're also doing it for format-as-word
  and in the data analyzer when scanning for strings/fill.)
- Clear the visualization cache when the address map is altered.
  This is necessary for visualizers that dereference addresses.
- Read the Apple II screen image from a series of addresses rather
  than a series of offsets.  This allows it to work when the image
  is contiguous in memory but split into chunks in the file.
- Put 1 pixel of padding around the images in the main code list,
  so they don't blend into the background.
- Remember the last visualizer used, so we can re-use it the next
  time the user selects "new".
- Move min-size hack from Loaded to ContentRendered, as it apparently
  spoils CenterOwner placement.
2019-12-06 15:05:49 -08:00
Andy McFadden
51081c5db0 Tweak "nearby" label finder
The code that found a nearby data target for an instruction operand
was searching backward but not forward.  We now take one step
forward, so that "LDA TABLE-1,Y" fills in automatically.

This altered 2008-address-changes, which had just this situation.
It didn't alter 2010-target-adjustment, but the existing tests were
insufficient and have been improved.
2019-10-29 18:12:22 -07:00
Andy McFadden
8c87ce3004 Check formatted string structure at load time
If we have a bug, or somebody edits the project file manually, we
can end up with a very wrong string, such as a null-terminated
string that isn't, or a DCI string that has a mix of high and low
ASCII from start to finish.  We now check all incoming strings for
validity, and discard any that fail the test.  The verification
code is shared with the extension script inline data formatter.

Also, added a comment to an F8-ROM symbol I stumbled over.
2019-10-06 17:07:07 -07:00
Andy McFadden
0d9814d993 Allow explicit widths in project/platform symbols, part 3
Implement multi-byte project/platform symbols by filling out a table
of addresses.  Each symbol is "painted" into the table, replacing
an existing entry if the new entry has higher priority.  This allows
us to handle overlapping entries, giving boosted priority to platform
symbols that are defined in .sym65 files loaded later.

The bounds on project/platform symbols are now rigidly defined.  If
the "nearby" feature is enabled, references to SYM-1 will be picked
up, but we won't go hunting for SYM+1 unless the symbol is at least
two bytes wide.

The cost of adding a symbol to the symbol table is about the same,
but we don't have a quick way to remove a symbol.

Previously, if two platform symbols had the same value, the symbol
with the alphabetically lowest label would win.  Now, the symbol
defined in the most-recently-loaded file wins.  (If you define two
symbols with the same value in the same file, it's still resolved
alphabetically.)  This allows the user to pick the winner by
arranging the load order of the platform symbol files.

Platform symbols now keep a reference to the file ident of the
symbol file that defined them, so we can show the symbols's source
in the Info panel.

These changes altered the behavior of test 2008-address-changes,
which includes some tests on external addresses that are close to
labeled internal addresses.  The previous behavior essentially
treated user labels as being 3 bytes wide and extending outside the
file bounds, which was mildly convenient on occasion but felt a
little skanky.  (We could do with a way to define external symbols
relative to internal symbols, for things like the source address of
code that gets relocated.)

Also, re-enabled some unit tests.

Also, added a bit of identifying stuff to CrashLog.txt.
2019-10-02 16:50:15 -07:00
Andy McFadden
4d9d5e2ecf Instruction operand editor rework, part 3
Implemented editing of labels and project symbols.

Also, cleaned up the local variable edit code.
2019-09-08 16:41:54 -07:00
Andy McFadden
2633720c82 Instruction operand editor rework, part 1
Rearrange the UI elements, and convert the code-behind to a more
XAML-style form.  The basic stuff works, but the old "shortcut"
system is still in the process of being replaced.
2019-09-07 13:39:22 -07:00
Andy McFadden
ee6e5d7fb6 Fix a couple of obscure bugs
The code that checked to see if a data target was inside a data
operand wasn't going all the way back to the start of the file.
It was also failing to stop when it should, wasting time.

The anattrib validation method has code that avoids a false-positive
on certain complex embedded instruction arrangements.  This was also
preventing it from seeing a transition from a data area to the
middle of an instruction (caused by issue #45).
2019-09-04 17:48:55 -07:00
Andy McFadden
5dcdbe3f3a Various tweaks
Fixed a minor bug in GenerateLineList that would cause a blank line
to disappear under certain circumstances.  Harmless, but odd.

Added a width property to DefSymbol.

Updated comments.
2019-08-24 17:35:26 -07:00
Andy McFadden
38d3adbb08 PETSCII does DCI
I didn't think it made sense, but I found something that used it,
so apparently it's a thing.  This updates the operand editor to
let you choose PETSCII+DCI, and updates the assemblers to handle
it correctly (really just 64tass, since the others either don't
have a DCI directive or don't deal with PETSCII at all).

Changed the char-encoding sample from "bad dcI" to "pet dcI", and
updated the documentation.
2019-08-20 17:55:12 -07:00
Andy McFadden
7bbe5692bd Add C64 encodings to instruction and data operand editors
Both dialogs got a couple extra radio buttons for selection of
single character operands.  The data operand editor got a combo box
that lets you specify how it scans for viable strings.

Various string scanning methods were made more generic.  This got a
little strange with auto-detection of low/high ASCII, but that was
mostly a matter of keeping the previous code around as a special
case.

Made C64 Screen Code DCI strings a thing that works.
2019-08-15 17:53:12 -07:00
Andy McFadden
5889f45737 Replace on-screen string operand formatting
The previous functions just grabbed 62 characters and slapped quotes
on the ends, but that doesn't work if we want to show strings with
embedded control characters.  This change replaces the simple
formatter with the one used to generate assembly source code.  This
increases the cost of refreshing the display list, so a cache will
need to be added in a future change.

Converters for C64 PETSCII and C64 Screen Code have been defined.
The results of changing the auto-scan encoding can now be viewed.

The string operand formatter was using a single delimiter, but for
the on-screen version we want open-quote and close-quote, and might
want to identify some encodings with a prefix.  The formatter now
takes a class that defines the various parts.  (It might be worth
replacing the delimiter patterns recently added for single-character
operands with this, so we don't have two mechanisms for very nearly
the same thing.)

While working on this change I remembered why there were two kinds
of "reverse" in the old Merlin 32 string operand generator: what you
want for assembly code is different from what you want on screen.
The ReverseMode enum has been resurrected.
2019-08-13 17:52:58 -07:00
Andy McFadden
f3c28406a5 Add multiple encoding support to uncategorized data analyzer
The code that searches for character strings in uncategorized data
now recognizes the C64 encodings when selected in the project
properties.

The new code avoids some redundant comparisons when runs of
printable characters are found.  I suspect the new implementation
loses on overall performance because we're now calling through
delegates instead of testing characters directly, but I haven't
tested for that.
2019-08-13 14:08:27 -07:00
Andy McFadden
f33cd7d8a6 Replace character operand output method
The previous code output a character in single-quotes if it was
standard ASCII, double-quotes if high ASCII, or hex if it was neither
of those.  If a flag was set, high ASCII would also be output as
hex.

The new system takes the character value and an encoding identifier.
The identifier selects the character converter and delimiter
pattern, and puts the two together to generate the operand.

While doing this I realized that I could trivially support high
ASCII character arguments in all assemblers by setting the delimiter
pattern to "'#' | $80".

In FormatDescriptor, I had previously renamed the "Ascii" sub-type
"LowAscii" so it wouldn't be confused, but I dislike filling the
project file with "LowAscii" when "Ascii" is more accurate and less
confusing.  So I switched it back, and we now check the project
file version number when deciding what to do with an ASCII item.
The CharEncoding tests/converters were also renamed.

Moved the default delimiter patterns to the string table.

Widened the delimiter pattern input fields slightly.  Added a read-
only TextBox with assorted non-typewriter quotes and things so
people have something to copy text from.
2019-08-11 22:11:00 -07:00
Andy McFadden
975b62db6b Treat low and high ASCII as two distinct formats
We've been treating ASCII strings and instruction/data operands as
ambiguous, resolving low vs. high when generating output for the
display or assembler.  This change splits it into two separate
formats, simplifying output generation.

The UI will continue to treat low/high ASCII as as single thing,
selecting the format appropriately based on the data.  There's no
reason to have two radio buttons that are never both enabled.

The data operand string functions need some additional work, but
that overlaps substantially with the upcoming PETSCII changes, so
for now all strings set by the data operand editor are low ASCII.

The file format has changed again, but since there hasn't been a
release since the previous change, I'm leaving the file format
at v2.  Code has been added to resolve the ASCII mode when loading
a v1 project file.

This removes some complexity from the assembly code generators.
2019-08-10 14:59:24 -07:00
Andy McFadden
0d0854bda7 Change the way string formats are defined
We used to use type="String", with the sub-type indicating whether
the string was null-terminated, prefixed with a length, or whatever.
This didn't leave much room for specifying a character encoding,
which is orthogonal to the sub-type.

What we actually want is to have the type specify the string type,
and then have the sub-type determine the character encoding.  These
sub-types can also be used with the Numeric type to specify the
encoding of character operands.

This change updates the enum definitions and the various bits of
code that use them, but does not add any code for working with
non-ASCII character encodings.

The project file version number was incremented to 2, since the new
FormatDescriptor serialization is mildly incompatible with the old.
(Won't explode, but it'll post a complaint and ignore the stuff
it doesn't recognize.)

While I was at it, I finished removing DciReverse.  It's still part
of the 2005-string-types regression test, which currently fails
because the generated source doesn't match.
2019-08-07 16:19:13 -07:00
Andy McFadden
c64f72d147 Move WPF code from SourceGenWPF to SourceGen 2019-07-20 13:28:37 -07:00
Andy McFadden
e3906e021b Move WinForms code to SourceGenWF 2019-07-20 13:02:54 -07:00
Andy McFadden
823aa072fb Update comments 2019-04-29 13:07:52 -07:00
Andy McFadden
8d0ce87ec7 Experiment on uncategorized data analysis
Tried something to speed it up.  Didn't help.  Cleaned up the code
a bit though.
2019-04-18 15:58:43 -07:00
Andy McFadden
97a372a884 Add selectable auto-label styles
SourceGen creates "auto" labels when it finds a reference to an
address that doesn't have a label associated with it.  The label for
address $1234 would be "L1234".  This change allows the project to
specify alternative label naming conventions, annotating them with
information from the cross-reference data.  For example, a subroutine
entry point (i.e. the target of a JSR) would be "S_1234".  (The
underscore was added to avoid confusion when an annotation letter
is the same as a hex digit.)

Also, tweaked the way the preferred clipboard line format is stored
in the settings file (was an integer, now an enumeration string).
2019-04-15 15:14:04 -07:00
Andy McFadden
47b1363738 Add more detail to cross references
In the cross-reference table we now indicate whether the reference
source is doing a read, write, read-modify-write, branch, subroutine
call, is just referencing the address, or is part of the data.
2019-04-11 16:23:02 -07:00
Andy McFadden
2f74fce80b Expand set of things that work with double-click on opcode
If you double-click on the opcode of "JSR label", the code view
selection jumps to the label.  This now works for partial operands,
e.g. "LDA #<label".

Some changes to the find-label-offset code affected the cc65 "is it
a forward reference to a direct-page label" logic.  The regression
test now correctly identifies an instruction that refers to itself
as not being a forward reference.
2018-11-03 15:03:25 -07:00
Andy McFadden
b97f7ca3d8 Fix add-label shortcut for adjusted operands
When you edit the operand of an instruction that targets an in-file
address, you're given the opportunity to specify a shortcut that
applies the symbol to the instruction's target address in addition
to or instead of defining a weak symbol reference on the instruction
being edited.

This didn't work right for operands with adjustments, e.g. the store
instructions in self-modifying code.  It put the label at the
unadjusted offset, which does nothing useful.

We now correctly back up to the start of the instruction or multi-
byte data area.
2018-10-11 16:48:55 -07:00
Andy McFadden
f4e4ac842d First cut of split-address table formatter
Allows specification of table data in various ways, for 16-bit and
24-bit addresses.  Shows a preview so you can see if the addresses
look about right.  Adds permanent labels at target offsets if none
are present.  Optionally sets code hints.

Works beautifully on the A2-Amper-fdraw example, but needs some
additional testing, documentation, etc.  Dialog is more complicated
that I would have liked, mostly because of 65816 support, but I
think it'll do.

(issue #10)
2018-10-06 18:05:31 -07:00
Andy McFadden
2c6212404d Initial file commit 2018-09-28 10:05:11 -07:00