From 908a1c99007859f88ba8c3c74adc4eb72c3819de Mon Sep 17 00:00:00 2001
From: Andy McFadden
Analysis of the file data is a complex multi-step process. Some
-changes to the project, such as adding a code entry point hint or
+changes to the project, such as adding a code start point or
changing the CPU selection, require a full re-analysis of instructions and data. Other changes, such as adding or removing a label, don't affect the code tracing and only require a re-analysis of the data areas.
@@ -279,7 +279,7 @@ out wrong more often than right.
Sometimes the indirect jump targets are coming from a table of addresses in the file. If so, these can be formatted as addresses,
-and then the target locations hinted as code entry points.
+and then the target locations tagged as code entry points.
The 65816 adds an additional twist: 16-bit data access instructions use the data bank register ("B") to determine which bank to load from. SourceGen can't determine what the value is, so it currently assumes
@@ -304,7 +304,7 @@ branch will be ignored. If an extension script tries to flag bytes as inline data that have already been executed, the script will be ignored. This can lead to a race condition in the analyzer if an extension script is doing the wrong thing. (The race doesn't exist
-with inline data hints specified by the user, because those are applied
+with inline data tags specified by the user, because those are applied
before code analysis starts.)
diff --git a/SourceGen/RuntimeData/Help/index.html b/SourceGen/RuntimeData/Help/index.html
index abd0775..7eb63c0 100644
--- a/SourceGen/RuntimeData/Help/index.html
+++ b/SourceGen/RuntimeData/Help/index.html
@@ -30,7 +30,7 @@ and 65816 code. The official web site is
The code tracing has to start somewhere, so SourceGen uses "code entry
-point hints" to identify places where execution may begin. By default,
-a hint is placed at the start of the file. From there, the tracing process
-walks through the code, pursuing all branches. In many cases, if you
-mark all external entry points, SourceGen will automatically find all
-executable code and separate it from variable storage and data areas.
+The code tracing has to start somewhere, so SourceGen uses "code start
+points" to identify places where execution may begin. By default,
+the first byte of the file is tagged as a start point. From there, the
+tracing process walks through the code, pursuing all branches. In many
+cases, if you tag all external entry points, SourceGen will automatically
+find all executable code and separate it from variable storage and
+data areas.
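Reviewer note: the tracing behavior described above can be pictured with a small worklist loop. This is only an illustrative sketch, not SourceGen's implementation. The `OPS` table, the `trace` function, and the `IMAGE` bytes are all hypothetical, and the table covers just a handful of 6502 opcodes.

```python
# Hypothetical, heavily reduced opcode table: opcode -> (length, kind).
# This is an assumption for illustration, not SourceGen's real table.
OPS = {
    0x18: (1, "next"),    # CLC
    0x60: (1, "stop"),    # RTS
    0xA9: (2, "next"),    # LDA #imm
    0x4C: (3, "jump"),    # JMP abs
    0x20: (3, "call"),    # JSR abs
    0xD0: (2, "branch"),  # BNE rel
}


def trace(data, org, start_offsets=(0,)):
    """Return the offsets at which a reachable instruction begins."""
    found = set()
    work = list(start_offsets)
    while work:
        off = work.pop()
        while 0 <= off < len(data) and off not in found:
            entry = OPS.get(data[off])
            if entry is None or off + entry[0] > len(data):
                break  # unknown opcode or truncated operand: abandon this path
            length, kind = entry
            found.add(off)
            if kind in ("jump", "call"):
                # Absolute target, little-endian, converted to a file offset.
                work.append((data[off + 1] | (data[off + 2] << 8)) - org)
            elif kind == "branch":
                disp = data[off + 1]
                work.append(off + 2 + (disp - 256 if disp >= 0x80 else disp))
            if kind in ("jump", "stop"):
                break  # no fall-through after JMP or RTS
            off += length
    return found


ORG = 0x1000
IMAGE = bytes([
    0x4C, 0x09, 0x10,  # $1000 JMP $1009
    0x4C, 0x0B, 0x10,  # $1003 JMP $100B
    0x4C, 0x0D, 0x10,  # $1006 JMP $100D
    0x18, 0x60,        # $1009 CLC / RTS
    0x18, 0x60,        # $100B CLC / RTS
    0x18, 0x60,        # $100D CLC / RTS
])

default_only = trace(IMAGE, ORG)          # start point on the first byte only
with_tags = trace(IMAGE, ORG, (0, 3, 6))  # extra start points on the vectors
```

With only the default start point, the second and third jump vectors and the routines they lead to are never visited, which is the situation the help text describes. Passing the extra start offsets makes them reachable.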
As noted earlier, tracking the processor status flags can make the analysis more accurate. Identifying situations where a branch instruction
@@ -239,13 +240,13 @@ is launched, but may be manually deleted without harm.
advanced topics section.
-Sometimes SourceGen can't automatically find the start or end of an instruction stream, or gets confused by inline data. These situations
-can be resolved by adding an appropriate hint.
+can be resolved by adding analyzer tags.
-Code entry point hints tell the analyzer to add the offset
+Code start point tags tell the analyzer to add the offset
to the list of instruction start points. Suppose you've got a code library that begins with jump vectors, like this:
@@ -272,7 +273,7 @@ L1009 CLC
SourceGen doesn't see any code that jumps to $1003 or $1006, so it assumes those are data. Further, the functions at those addresses may also be considered data unless some bit of code reachable from L1009
-calls into them. If you add a code hint to $1003 and $1006,
+calls into them. If you tag $1003 and $1006 as code start points,
you'll get better results:
.ORG $1000
@@ -282,9 +283,9 @@ you'll get better results:
L1009 CLC
-Be careful that you only add hints to the instruction opcode. If
-you applied hints to the full range of bytes from $1003 to $1008, you would
-end up with this:
+Be careful that you only tag the instruction opcode byte. If
+you tagged each and every byte from $1003 to $1008, you would
+end up with a mess:
.ORG $1000
JMP L1009
@@ -297,15 +298,15 @@ L1009 CLC
The exact set of instructions shown depends on your CPU configuration. The problem is that the bytes in the middle of the instruction have
-been marked as entry points, and SourceGen is treating them as
+been tagged as start points, so SourceGen is treating them as
embedded instructions. $EF and $12 aren't valid 6502 opcodes, so
-they're being ignored, but $10 is BPL and $30 is BMI. Because hinting
+they're being ignored, but $10 is BPL and $30 is BMI. Because tagging
multiple consecutive bytes is rarely useful, SourceGen only applies code
-hints to the first byte in a selected line.
+start tags to the first byte in a selected line.
-Data hints tell the analyzer when it should stop. For example,
-suppose address $ff00 is known to always be nonzero, and the code uses
-that fact to get a branch-always on the 6502:
+Code stop point tags tell the analyzer when it should stop. For
+example, suppose address $ff00 is known to always be nonzero, and the code
+uses that fact to get a branch-always on the 6502:
.ORG $1000
LDA $ff00
@@ -313,17 +314,18 @@ that fact to get a branch-always on the 6502:
BRK $11
-By placing a data hint on the BRK, you're telling the analyzer that
-it should stop the current path of execution. (Note that this example
-would actually be better solved by setting a status flag override on
-the BNE that sets Z=0, so the code tracer will know it's a branch-always
-and do the right thing.) It's only necessary to place a hint on the
-very first (opcode) byte. Placing a data hint in the middle of what
-SourceGen believes to be instruction will have no effect.
-As with code hints, only the first byte in each selected line will
-be hinted.
+By tagging the BRK as a code stop point, you're telling the analyzer that
+it should stop trying to execute code when it reaches that point. (Note
+that this example would actually be better solved by setting a status flag
+override on the BNE that sets Z=0, so the code tracer will know it's a
+branch-always and just do the right thing.) As with code start points,
+code stop points should only be placed on the opcode byte. Placing a
+code stop point in the middle of what SourceGen believes to be instruction
+will have no effect.
+As with code start points, only the first byte in each selected line will
+be tagged.
-Inline data hints identify bytes as being part of the
+Inline data tags identify bytes as being part of the
instruction stream, but not instructions. A simple example of this is the ProDOS 8 call interface on the Apple II, which looks like this:
@@ -333,11 +335,13 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:
BCS BAD
-The three bytes following the
-JSR $bf00 should be hinted
-as inline data, so that the code analyzer skips them and continues the
-analysis at the BCS. Because you need to hint every byte
-of inline data, all bytes in a selected line will receive hints.
If code branches into a region that is marked as inline data, the
+The three bytes following the
+JSR $bf00 should be tagged
+as inline data, so that the code analyzer skips over them and continues the
+analysis at the BCS instruction. You can think of these as
+"code skip" tags, but they're different from stop/start points, because
+every byte of inline data must be tagged. When
+applying the tag, all bytes in a selected line will be modified.
If code branches into a region that is tagged as inline data, the
+branch will be ignored.
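Reviewer note: the "code skip" reading of inline data tags can be shown with a few lines of Python. This is a sketch under assumptions, not the analyzer's actual logic. `skip_inline` and `branch_allowed` are hypothetical helper names, and the offsets assume the JSR sits at offset 0 of the fragment.

```python
def skip_inline(offset, inline_offsets):
    """Advance past consecutive offsets tagged as inline data."""
    while offset in inline_offsets:
        offset += 1
    return offset


def branch_allowed(target, inline_offsets):
    """Per the help text, a branch into inline data is ignored."""
    return target not in inline_offsets


# JSR $bf00 occupies offsets 0-2; the three bytes after it are inline data.
after_jsr = 3
fully_tagged = {3, 4, 5}
partly_tagged = {3}

resume_full = skip_inline(after_jsr, fully_tagged)      # lands on the BCS
resume_partial = skip_inline(after_jsr, partly_tagged)  # lands inside the data
```

The second call shows why every byte has to be tagged: with only the first byte covered, analysis would resume in the middle of the parameter bytes instead of at the BCS.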
@@ -775,8 +779,8 @@ address of the byte before the target instruction, so we actually push $1006.
The disassembler won't know that offset $1007 is code because nothing
-appears to reference it. After adding a code hint at $1007, the project
-looks like this:
+appears to reference it. After tagging $1007 as a code start point, the
+project looks like this:
LDA #$10
PHA
diff --git a/SourceGen/RuntimeData/Help/mainwin.html b/SourceGen/RuntimeData/Help/mainwin.html
index f91919c..8244e31 100644
--- a/SourceGen/RuntimeData/Help/mainwin.html
+++ b/SourceGen/RuntimeData/Help/mainwin.html
@@ -112,11 +112,13 @@ assembler directive. field to open the Edit Status Flag Override dialog.
Sometimes a change will invalidate an earlier change. For example,
-suppose you hint an area as data, and format it as a string.
-Later on you hint it as code. You now have a block of code with a
-string format record sitting in the middle of it. SourceGen tries very
-hard not to throw away anything you've done, but it will ignore anything
-invalid.
+suppose you add a code stop point, and format the data that follows
+as a string. Later on you change it to a code start point. You now have
+a block of executable code with a string format record sitting in the
+middle of it. SourceGen tries very hard not to throw away anything
+you've done, but it will ignore anything invalid.
If a problem like this is encountered, an entry is added to a list
-of messages displayed at the bottom of the window. Each entry identifies
+of messages displayed at the bottom of the main window. Each entry identifies
the nature of the problem, the severity of the problem, the offset where it occurred, and what was done to resolve it. The problem categories include:
@@ -359,11 +362,12 @@ include:
The "context" column will provide additional detail about the problem.
-In most cases, the offending item will be ignored.
+The "context" column will provide additional detail about the problem,
+and the "resolution" column will indicate how it's being handled. In most
+cases, the offending item will be ignored.
Double-clicking on an entry will jump to that offset.
The message list will not appear if there are no messages. You can hide the list by clicking on the "Hide" button to the left of the messages.
@@ -415,17 +419,20 @@ convenient. You can use Alt+Left/Right Arrow, or Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)
-To add code entry or data hints, select the desired offsets and
-use Actions > Hint As Code Entry Point or Hint As Data. Because code
-hints mean "the code analyzer should start here", and data hints mean
-"the code analyzer should stop here", there is rarely any reason to hint
-multiple consecutive bytes. For this reason, only the first byte on each
-selected line will be hinted.
-For inline data, you need to hint every byte, so every byte in every
-selected line is hinted when you select Hint As Inline Data. Similarly,
-the Remove Hints menu item will remove hints from every byte.
+(Note: These were referred to as code/data "hints" in older
+versions of SourceGen.)
+
+To set code start or stop points, select the desired offsets and
+use Actions > Tag Address As Code Start Point (or Stop Point). Because
+these indicate a transition between code and data regions, there is rarely
+any need to tag multiple consecutive bytes.
+For this reason, only the first byte on each selected line will be tagged.
+
+For inline data, you need to cover the entire range, so every byte in every
+selected line is tagged when you select Tag Bytes As Inline Data. Similarly,
+the Remove Analyzer Tags menu item will remove tags from every byte.
If you're having a hard time selecting just the right bytes because the instructions are caught up in a multi-byte data item, such as an
@@ -433,7 +440,7 @@ auto-detected character string, you can disable uncategorized data analysis (the thing that creates the .STR and .FILL ops for you). You can do this from the project properties editor,
-or simply by hitting Ctrl+D. Hit that, apply the hint, then hit it
+or simply by hitting Ctrl+D. Hit that, tag the byte or bytes, then hit it
again to re-enable the string & fill analyzer.
Another approach is to use the "Toggle Single-Byte Format" menu item to "flatten" the item.
@@ -466,7 +473,7 @@ values are to be pushed onto the stack for an RTS call.
While the .dd2 case is easy to format with the data operand editor, formatting addresses whose components are split into multiple tables can be tedious. Even in the easy case, you may want to create labels and set
-code hints for each item.
+code start points for each item.
The Address Table Formatter helps you associate symbols with the addresses in the table. It works for simple and "split" tables.
@@ -503,9 +510,10 @@ is split into sections. selection", which will just use whichever part you didn't specify for the low and high bytes. If the table holds 16-bit addresses, you can use the "Constant" field to specify the data bank.
-It should be mentioned that SourceGen does not record the fact that the
-data in question is part of a table. The formatting, labels, and code hints
-are applied as if you entered them all individually by hand. The formatter
-is just significantly more convenient. It also does everything as a single
-undoable action, so if it comes out looking wrong, just hit "undo" and
-try something else.
+data in question is part of a table. The formatting, labels, and code
+start point tags are applied as if you entered them all individually by
+hand. The formatter is just significantly more convenient. It also
+does everything as a single undoable action, so if it comes out looking
+wrong, just hit "undo" and try something else.
The entry flags determine the initial value for the processor status flag register. Code that is unreachable internally (requiring a code
-entry point hint) will use this value. This is chiefly of value for
-65816 code, where the initial value of the M/X/E flags is significant.
+start point tag) will use this value. This is chiefly of value for
+65816 code, where the initial value of the M/X/E flags has a significant
+impact on how instructions are disassembled.
If "analyze uncategorized data" is checked, SourceGen will attempt to identify character strings and regions that are filled with a repeated
diff --git a/SourceGen/RuntimeData/Help/tutorials.html b/SourceGen/RuntimeData/Help/tutorials.html
index e1e4f92..acf888d 100644
--- a/SourceGen/RuntimeData/Help/tutorials.html
+++ b/SourceGen/RuntimeData/Help/tutorials.html
@@ -392,28 +392,29 @@ to the Examples directory, then from the Tutorial directory select "Tutorial2".
The first thing you'll notice is that we immediately ran into a BRK,
which is a pretty reliable sign that we're not in a code section. The
-generic profile puts a code entry point hint on the first byte, but that's
+generic profile puts a code start point tag on the first byte, but that's
wrong here. This particular file begins with 00 20, which
could be a load address (some C64 binaries look like this). So let's start
with that assumption.
Click on the first line of code at address $1000, and select
-Actions > Remove Hints. The $20 got absorbed into a string. The string
-is making it hard to manipulate the next few bytes, so let's fix that by
-selecting Edit > Toggle Data Scan (Ctrl+D). This turns off the feature
-that looks for strings and .FILL regions, so now each uncategorized byte is
-on its own line.
+Actions > Remove Analyzer Tags. The $20 got absorbed into a string. The
+string is making it hard to manipulate the next few bytes, so let's fix
+that by selecting Edit > Toggle Data Scan (Ctrl+D). This turns off
+the feature that looks for strings and .FILL regions, so now each
+uncategorized byte is on its own line.
You could select the first two lines and use Actions > Edit Operand to format them as a 16-bit little-endian hex value, but there's a shortcut:
-select only the first line of code, then Actions > Format As Word (Ctrl+W). It
-automatically grabbed the following byte and combined them. Since we believe
-$2000 is the load address for everything that follows, click on the line
-with address $1002, select Actions > Set Address, and enter "2000". With
-that line still selected, use Actions > Hint As Code Entry Point
-(Ctrl+H then Ctrl+C) to identify it as code.
+select only the first line of code, then Actions > Format As Word (Ctrl+W).
+It automatically grabbed the following byte and combined them. Since we
+believe $2000 is the load address for everything that follows, click on
+the line with address $1002, select Actions > Set Address, and
+enter "2000". With that line still selected, use
+Actions > Tag Address As Code Start Point (Ctrl+H then Ctrl+C) to
+identify it as code.
That looks better, but it's branching off the bottom of the screen (unless you have a really tall screen or small fonts) because of all the
-intervening data. Use Edit > Toggle Data Scan to turn the string
-finder back on.
+intervening data. Use Edit > Toggle Data Scan to turn the
+string-finder back on.
There are four strings starting at address $2004, each of which is followed by $00. These look like null-terminated strings, so let's make
@@ -525,9 +526,10 @@ point past the inline data before returning (technically, it points at the very last byte of the inline data, because RTS jumps to address + 1).
To format the data, we first need to tell SourceGen that there's data in line with the code. Select the line at address $206E, then
-shift-click the line at address $2077. Use Actions > Hint as Inline Data.
+shift-click the line at address $2077. Use
+Actions > Tag Bytes As Inline Data.
The data turns to single-byte values, and we now see the code
-continuing at address $2078. We can format the data as string by
+continuing at address $2078. We can format the data as a string by
using Actions > Edit Operand, setting the Character Encoding to "Low or High ASCII", and choosing "null-terminated strings".
@@ -612,8 +614,8 @@ change. Double-click the initial .ORG statement, and change it from $2000 to $1d60. We can now see that $1d70 starts right after this initial chunk of code.
-Select the line with address $1d70, then Actions > Hint As Code
-Entry Point.
+Select the line with address $1d70, then
+Actions > Tag Address As Code Start Point.
More code appears, but not much -- if you scroll down you'll see that most of the file is still data. The code at $1d70 searches through a table at $1d88 for a match with the contents of the accumulator. If it finds a match,
@@ -625,7 +627,8 @@ must be the desired address minus one.
The first byte in the first address table at $1d97 (which has the auto-label L1D97) is $b4. The first byte in the second table is $1d. So the first address we want is $1db4 + 1 = $1db5.
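Reviewer note: the arithmetic in the paragraph above (low byte from one table, high byte from the other, plus one because RTS resumes at the pulled address + 1) can be checked with a short sketch. `rts_table_targets` is a hypothetical helper, and only the first table entry quoted in the tutorial text is used.

```python
def rts_table_targets(low_bytes, high_bytes):
    """Combine split low/high tables into the addresses RTS will jump to.

    Each table entry holds (target - 1), so add one and wrap to 16 bits.
    """
    return [
        (((hi << 8) | lo) + 1) & 0xFFFF
        for lo, hi in zip(low_bytes, high_bytes)
    ]


# First entry only: low byte $b4 from the first table, high byte $1d
# from the second table.
first_targets = rts_table_targets([0xB4], [0x1D])
```

The single result is $1db5, matching the address the tutorial asks you to select next.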
-Select the line at $1db5, and use Actions > Hint As Code Entry Point.
+Select the line at $1db5, and use
+Actions > Tag Address As Code Start Point.
More code appears, but again it's only a few lines. Let's dress this one
up a bit. Set a label on the code at $1db5 called "FUNC". At $1d97, edit
the data item (double-click on "$b4"), click "Single bytes", then type "FUNC"
@@ -654,7 +657,7 @@ checkbox. As soon as you do, the first line of the Generated Addresses
list should show the symbol "FUNC". The rest of the addresses will look like
(+) T1DD0. The "(+)" means that a label was not found at
that location, so a label will be generated automatically.
Down near the bottom, check the "add code entry hint if needed" checkbox.
+Down near the bottom, check the "tag targets as code start points" checkbox.
Because we saw the table contents being pushed onto the stack for RTS, we know that they're all code entry points.
Click "OK". The table of address bytes at $1d97 should now all be
@@ -664,7 +667,7 @@ last dozen bytes at the end of the file. (If this isn't the case, use Edit > Undo, then work through the steps again.)
The formatter did the same steps you went through earlier -- set a label, apply the label to the low and high bytes in the table, add a
-code entry point hint -- but did several of them at once.
+code start point tag -- but did several of them at once.
We don't want to save this project, so select File > Close. When SourceGen asks for confirmation, click Discard & Continue.
diff --git a/SourceGen/WpfGui/FormatAddressTable.xaml b/SourceGen/WpfGui/FormatAddressTable.xaml
index ab05d53..212ebe6 100644
--- a/SourceGen/WpfGui/FormatAddressTable.xaml
+++ b/SourceGen/WpfGui/FormatAddressTable.xaml
@@ -115,7 +115,7 @@ limitations under the License.