diff --git a/SourceGen/RuntimeData/Help/advanced.html b/SourceGen/RuntimeData/Help/advanced.html index 96675f5..6f67fcf 100644 --- a/SourceGen/RuntimeData/Help/advanced.html +++ b/SourceGen/RuntimeData/Help/advanced.html @@ -28,8 +28,8 @@ as project symbols into the other projects.
symbol-import step in every interested project. This step must be repeated whenever the labels are updated.A different but related problem is typified by arcade ROM sets, -where files are split apart because each file must be flashed to a -separate chip. All files are expected to be present in memory at +where files are split apart because each file must be burned into a +separate PROM. All files are expected to be present in memory at once, so there's no reason to treat them as separate projects. Currently, the best way to deal with this is to concatenate the files into a single file, and operate on that.
@@ -60,7 +60,7 @@ L1103_0 LDA #$22Both sections start at $1100, and have branches to $1103. The branch
in the first section resolves to the label in the first version of
that address chunk, while the branch in the second section resolves to
-the label in the second chunk. When branches are outside the current
+the label in the second chunk. When branches originate outside the current
address chunk, the first chunk that includes that address is used, as
it is with the JMP $1000
at the start of the file.
DisasmProject.cs
):
The Anattrib array tracks most of the state from here on. If we're
doing a partial re-analysis, this step will just clone a copy of the
Anattrib array that was made at this point in a previous run. (This
- step is described in more detail below.)DisasmProject.cs
):
data, and connects instruction and data operands to target offsets.
The "nearby label" stuff is handled here. All of the results are
stored in the Anattribs array. (This step is described in more
- detail below.)
+ detail below.)
Every offset in the file is marked as an instruction byte, data byte, or +inline data byte. Some offsets are also marked as the start of an instruction +or data area. The start offsets may have a format descriptor associated +with them.
+Format descriptors have a format (like "numeric" or "string"), a +sub-format (like "hexadecimal" or "null-terminated"), and a length. For +an instruction operand the length is redundant, but for a data operand it +determines the width of the numeric value or length of the string. For +this reason, instructions do not need a format descriptor, but all +data items do.
+Symbolic references are format descriptors with a symbol attached. +The symbol reference also specifies low/high/bank.
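As a rough illustration of the description above, a format descriptor can be pictured as a small immutable record. This is only a sketch: the class names, field names, and the "symbol" sub-format string are hypothetical and are not taken from the SourceGen sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolRef:
    """Hypothetical symbolic reference: a label plus a part selector."""
    label: str
    part: str  # "low", "high", or "bank"


@dataclass(frozen=True)
class FormatDescriptor:
    """Hypothetical descriptor: format, sub-format, length, optional symbol."""
    fmt: str                      # e.g. "numeric" or "string"
    sub_fmt: str                  # e.g. "hexadecimal" or "null-terminated"
    length: int                   # number of bytes covered
    symbol: Optional[SymbolRef] = None

    @property
    def is_symbolic(self) -> bool:
        # A symbolic reference is a descriptor with a symbol attached.
        return self.symbol is not None


word = FormatDescriptor("numeric", "hexadecimal", 2)
jmp_operand = FormatDescriptor("numeric", "symbol", 3, SymbolRef("L1003", "low"))
```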
+Every offset marked as a start point gets its own line in the on-screen +display list. Embedded instructions are identified internally by +looking for instruction-start offsets inside instructions.
+ +The Anattrib array holds the post-analysis state for every offset, +including comments and formatting, but any changes you make in the +editors are applied to the data structures that are saved in the project +file. After a change is made, a full or partial re-analysis is done to +fill out the Anattribs.
+Consider a simple example:
++ .ORG $1000 + JMP L1003 +L1003 NOP ++ +
We haven't formatted anything yet. The data analyzer sees that the +JMP operand is inside the file, and has no label, so it creates an +auto-label at offset +000003 and a format descriptor with a symbolic +operand reference to "L1003" at +000000.
+Now we edit the label, changing L1003 to "FOO". This goes into the +project's "user label" list. The analyzer is +run, and applies the new "user label" to the Anattrib array. The +data analyzer finds the numeric reference in the JMP operand, and finds +a label at the target address, so it creates a symbolic operand reference +to "FOO". When the display list is generated, the symbol "FOO" appears +in both places.
+Even though the JMP operand changed from "L1003" to "FOO", the only +change actually written to the project file is the label edit. The +contents of the Anattrib array are disposable, so it can be used to +add labels and "fix up" numeric references. Generated labels and +format descriptors are never added to the project file.
+ +If the JMP operand were edited, a format descriptor would be added +to the user-specified descriptor list. During the analysis pass it would +be added to the Anattrib array at offset +000000.
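The label example can be sketched as a pure function that rebuilds the disposable analysis results from the saved user data every time. The function name and the dictionary layout are assumptions made for illustration; the real analyzer tracks far more state.

```python
def analyze(jump_targets, user_labels):
    """Rebuild labels and operand references from scratch.

    jump_targets: {source offset: (target offset, target address)}
    user_labels:  {offset: name}, the only part that would be saved
    Returns (labels, operand_refs); both are disposable.
    """
    labels = dict(user_labels)
    operand_refs = {}
    for src, (tgt_offset, tgt_addr) in jump_targets.items():
        if tgt_offset not in labels:
            # No user label at the target: generate an auto-label.
            labels[tgt_offset] = f"L{tgt_addr:04X}"
        operand_refs[src] = labels[tgt_offset]
    return labels, operand_refs


targets = {0: (3, 0x1003)}          # JMP at +000000 targets +000003
user_labels = {}
labels_before, refs_before = analyze(targets, user_labels)

user_labels[3] = "FOO"              # the only change that gets saved
labels_after, refs_after = analyze(targets, user_labels)
```

Because the results are regenerated on each pass, the rename shows up at both the label and the operand without either generated item being stored.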
+ + +The analysis pass always considers the current state of the user +data structures. Whether you're adding a label or removing one, the +code runs through the same set of steps. The advantage of this approach +is that the act of doing a thing, undoing a thing, and redoing a thing +are all handled the same way.
+None of the editors modify the project data structures directly. All +changes are added to a change set, which is processed by a single function. +The change sets are kept in the undo/redo buffer indefinitely. After +the changes are made, the Anattrib array and other data structures are +regenerated.
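A minimal sketch of this arrangement follows, assuming each change records the value at one offset before and after the edit. The class and method names are hypothetical, and the re-analysis step is reduced to a comment.

```python
class Project:
    """Toy project holding user labels plus undo/redo stacks of change sets."""

    def __init__(self):
        self.user_labels = {}
        self.undo_stack = []
        self.redo_stack = []

    def _apply(self, change_set, reverse=False):
        # Single function through which every change flows.
        changes = reversed(change_set) if reverse else change_set
        for offset, before, after in changes:
            value = before if reverse else after
            if value is None:
                self.user_labels.pop(offset, None)
            else:
                self.user_labels[offset] = value
        # A real tool would regenerate its analysis results here.

    def do(self, change_set):
        self._apply(change_set)
        self.undo_stack.append(change_set)
        self.redo_stack.clear()

    def undo(self):
        change_set = self.undo_stack.pop()
        self._apply(change_set, reverse=True)
        self.redo_stack.append(change_set)

    def redo(self):
        change_set = self.redo_stack.pop()
        self._apply(change_set)
        self.undo_stack.append(change_set)


project = Project()
project.do([(3, None, "FOO")])      # add a label at offset 3
```

Doing, undoing, and redoing all go through the same `_apply` path, which is the property the text describes.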
+ +Data format editing can create some tricky situations. For example, +suppose you have 8 bytes that have been formatted as two 32-bit words: + +
+1000: 68690074 .dd4 $74006968 +1004: 65737400 .dd4 $00747365 ++ +You realize these are null-terminated strings, select both words, and +reformat them: + +
+1000: 686900 .zstr "hi" +1003: 74657374+ .zstr "test" ++ +Seems simple enough. Under the hood, SourceGen created three changes: +
Each entry in the change set has "before" and "after" states for the +format descriptor at a specific offset. Only the state for the affected +offsets is included -- the program doesn't take a complete state snapshot +(even with the RAM on a modern system that would add up quickly). When +undoing a change, before and after are simply reversed.
+ +The code tracer walks through the instructions, examining them to @@ -81,8 +178,9 @@ for every instruction:
Examples:LDA
, STA
, AND
,
NOP
.
RTS
, BRK
.
+ determined from the file data (unless you're disassembling the
+ system ROM around the BRK vector).
+ Examples: RTS
, BRK
.
JMP
, BRA
, BRL
.
Branch targets are added to a list. When the current run of instructions -is exhausted (i.e. a "don't continue" instruction is reached), the next -target is pulled off of the list.
+is exhausted (i.e. a "don't continue" or "branch always" instruction is +reached), the next target is pulled off of the list.The state of the processor status flags is recorded for every instruction. When execution proceeds to the next instruction or branches @@ -116,18 +214,19 @@ of status flags, the analyzer stops pursuing that path.
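The worklist behavior can be sketched with a toy instruction table. This is an illustration only: the table layout and the kind names ("continue", "branch_cond", "branch_always", "stop") are assumptions, and status flag tracking is left out entirely.

```python
def trace(instrs, entry_points):
    """Return the set of offsets reached as instruction starts.

    instrs maps offset -> (length, kind, target offset or None).
    """
    visited = set()
    pending = list(entry_points)
    while pending:
        offset = pending.pop()
        # Follow the current run of instructions until it is exhausted.
        while offset in instrs and offset not in visited:
            visited.add(offset)
            length, kind, target = instrs[offset]
            if target is not None:
                pending.append(target)      # remember the branch target
            if kind in ("stop", "branch_always"):
                break                       # do not fall through
            offset += length
    return visited


toy_program = {
    0: (3, "branch_always", 6),   # like JMP
    3: (1, "continue", None),     # never reached, so it stays data
    6: (2, "branch_cond", 10),    # like BNE
    8: (1, "stop", None),         # like RTS
    10: (1, "stop", None),
}
reached = trace(toy_program, [0])
```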
when examining 65816 code, but it's possible for the status flag values to be indeterminate. In such a situation, short registers are assumed. Similarly, if the carry flag is unknown when anXCE
is
-performed, we assume a transition to emulation mode.
+performed, we assume a transition to emulation mode (E=1).
-There are three ways to set a definite value in a status flags:
+There are three ways in which code can set a flag to a definite value:
SEC
or
+ SEC
or
CLD
.LDA #$00
sets Z=1 and N=0.
- ORA #$80
sets Z=0 and N=1.LDA #$00
sets Z=1
+ and N=0. ORA #$80
sets Z=0 and N=1.BCC
instruction,
we know that the carry will be clear at the branch target address, and
set at the following instruction. The instruction doesn't affect the
- value of the flag, but we know what the value is at either address.Self-modifying code can spoil any of these, possibly requiring a status flag override to get correct disassembly.
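The three sources of definite flag values can be sketched as small functions over a flag dictionary, with None standing for "indeterminate". The function names and the dictionary representation are assumptions for illustration.

```python
def after_sec(flags):
    """Explicit set: SEC forces the carry flag to 1."""
    return dict(flags, C=1)


def after_lda_imm8(flags, value):
    """Immediate load of a known 8-bit value fixes Z and N."""
    return dict(flags, Z=int(value == 0), N=(value >> 7) & 1)


def after_bcc(flags):
    """BCC does not change C, but each successor knows its value.

    Returns (flags at the branch target, flags at the next instruction).
    """
    return dict(flags, C=0), dict(flags, C=1)


start = {"C": None, "Z": None, "N": None}
loaded = after_lda_imm8(start, 0x00)
taken, not_taken = after_bcc(start)
```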
@@ -145,7 +244,7 @@ code doesCLC
/PHP
, followed a bit later by the
flag around. Flagging the carry bit as indeterminate with a status flag
override on the instruction following the PLP fixes things.)
-Some other things that the code analyzer can't handle:
+Some other things that the code analyzer can't recognize automatically:
Extension scripts can mark data that follows a JSR or JSL as inline +data, or change the format of nearby data or instructions. The first +time a JSR/JSL instruction is encountered, all loaded extension scripts +are offered a chance to act.
+ +The first script that applies a format wins. Attempts to re-format +instructions or data will fail. This rule ensures that anything explicitly +formatted by the user will not be overridden by a script.
+ +If code jumps into a region that is marked as inline data, the +branch will be ignored. If an extension script tries to flag bytes +as inline data that have already been executed, the script will be +ignored. This can lead to a race condition in the analyzer if +an extension script is doing the wrong thing. (The race doesn't exist +with inline data hints specified by the user, because those are applied +before code analysis starts.)
+ +The data analyzer performs two tasks. It matches operands with offsets, and it analyzes uncategorized data. Either or both of @@ -171,17 +290,17 @@ these can be disabled from the
The data target analyzer examines every instruction and data operand to see if it's referring to an offset within the data file. If the -target is within the file, and has a label, a weak symbolic reference -to that label is added to the Anattrib array. If the target doesn't -have a label, the analyzer will either use a nearby label, or generate -a unique label and use that.
+target is within the file, and has a label, a format descriptor with a +weak symbolic reference to that label is added to the Anattrib array. If +the target doesn't have a label, the analyzer will either use a nearby +label, or generate a unique label and use that.While most of the "nearby label" logic can be disabled, targets that land in the middle of an instruction are always adjusted backward to the instruction start. This is necessary because labels are only visible if they're associated with the first (opcode) byte of an instruction.
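The backward adjustment for mid-instruction targets can be sketched with a sorted list of instruction starts. The function name and data layout are hypothetical; this only shows the lookup, not the rest of the "nearby label" logic.

```python
import bisect


def adjust_to_instruction_start(target, starts, lengths):
    """Move a target that lands inside an instruction back to its first byte.

    starts:  sorted list of instruction start offsets
    lengths: {start offset: instruction length in bytes}
    Targets that are not inside any instruction are returned unchanged.
    """
    index = bisect.bisect_right(starts, target) - 1
    if index >= 0:
        start = starts[index]
        if target < start + lengths[start]:
            return start
    return target


starts = [0, 3, 5]
lengths = {0: 3, 3: 2, 5: 1}
adjusted = adjust_to_instruction_start(4, starts, lengths)
```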
The uncategorized data analyzer tries to find ASCII strings and -opportunities to use the ".FILL" instruction. It breaks the file into +opportunities to use the ".FILL" operation. It breaks the file into pieces, where contiguous regions hold nothing but data, are not split across a ".ORG" directive, are not interrupted by data, and do not contain anything that the user has chosen to format. Each region is diff --git a/SourceGen/RuntimeData/Help/codegen.html b/SourceGen/RuntimeData/Help/codegen.html index d9dc9a0..ae67e83 100644 --- a/SourceGen/RuntimeData/Help/codegen.html +++ b/SourceGen/RuntimeData/Help/codegen.html @@ -15,8 +15,8 @@
SourceGen can generate an assembly source file that, when fed into the target assembler, will recreate the original data file exactly. -Every assembler is different, so code must be written especially for -each.
+Every assembler is different, so support must be added to SourceGen +for each.
The generation / assembly dialog can be opened with File > Assemble.
@@ -37,7 +37,7 @@ assembler. This is most easily understood with an example.54 02 01
, with the arguments reversed. cc65 v2.17 doesn't
do that; this is a bug that was fixed in a later version. So if you're
generating code for v2.17, you want to create source code with the
-arguments the other way around.
+arguments the wrong way around.
Having version-dependent source code is a bad idea, so SourceGen just outputs raw hex bytes for MVN/MVP instructions. This yields the correct code for all versions of the assembler, but is ugly and @@ -56,7 +56,7 @@ intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some generators may produce multiple source files, perhaps a link script or symbol definition header to go with the assembly source. To avoid spreading files across the filesystem, SourceGen does all of its work -in the same directory where the project lives. So before you can generate +in the same directory where the project lives. Before you can generate code, you have to have given your project a name by saving it.
The Generate and Assemble dialog has a drop-down list near the top @@ -98,12 +98,12 @@ command-line output will be displayed, with stdout and stderr separated. provides.)
The output will show the assembler's exit code, which will be zero -on success (note: sometimes they lie.) If it did, SourceGen will then -compare the assembler's output to the original file, and report any -differences.
+on success (note: sometimes they lie.) If it appeared to succeed, +SourceGen will then compare the assembler's output to the original file, +and report any differences.Failures here may be due to bugs in the cross-assembler or in SourceGen. However, SourceGen can generally work around assembler bugs, -so any failure here is an opportunity for improvement.
+so any failure is an opportunity for improvement. diff --git a/SourceGen/RuntimeData/Help/editors.html b/SourceGen/RuntimeData/Help/editors.html index 84ae567..9749b68 100644 --- a/SourceGen/RuntimeData/Help/editors.html +++ b/SourceGen/RuntimeData/Help/editors.html @@ -16,13 +16,13 @@This adds a target address directive (".ORG") to the current offset. -If you leave the field blank, the directive will be removed.
+If you leave the text field blank, the directive will be removed.Addresses are always interpreted as hexadecimal. You can prefix -it with a '$', but that's not necessary.
-24-bit addresses may be written with a bank separator, e.g. "12/3456" +it with a '$', but that's not required. +24-bit addresses may be written with a bank separator, e.g. "12/3456" would resolve to address $123456.
-There will always be an address directive at the start of the list. +
There will always be an address directive at the start of the file. Attempts to remove it will be ignored.
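The address syntax described above is simple enough to sketch as a parser. The function name is hypothetical, a blank field is mapped to None to represent "remove the directive", and no range checking is attempted.

```python
def parse_address(text):
    """Parse a hex address with optional '$' prefix and bank separator."""
    s = text.strip()
    if not s:
        return None                     # blank field: remove the directive
    if s.startswith("$"):
        s = s[1:]
    if "/" in s:
        bank, _, addr = s.partition("/")
        return (int(bank, 16) << 16) | int(addr, 16)
    return int(s, 16)                   # always interpreted as hexadecimal


banked = parse_address("12/3456")
```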
@@ -34,14 +34,15 @@ that instruction. You can override the value of individual flags.The 65816 emulation bit, which is not part of the processor status register, may also be set in the editor.
The M, X, and E flags will not be editable unless your CPU configuration -is set to a 16-bit CPU.
+is set to 65816.Sets or clears a label at the selected offset. The label must have -the proper form, and not have the same name as another symbol.
+the proper form, and not have the same name as another symbol. If +you edit an auto-generated label you will be required to change the name.The label may be marked as local, global, or global and exported. -Local labels may be generated in the assembler output in a +Local labels may be modified by the assembly code generator to have a more convenient form, such as a local loop identifier. Global labels are always output as-is. Exported labels are added to a table that may be imported by other projects.
@@ -51,16 +52,17 @@ be imported by other projects.Operands can be displayed in a variety of numeric formats, or as a symbol. The ASCII character format is only available for operands whose value falls into the range of low- or high-ASCII characters.
-Symbols may be used in their entirety, or offset by a byte or two. +
Symbols may be used in their entirety, or shifted and masked. The low / high / bank selector determines which byte is used as the low byte. For 16-bit operands, this acts as a shift rather than a byte -select.
+select. If the symbol is wider than the operand field, a mask will be +applied automatically.A few shortcuts are provided when specifying a symbol. As noted in the introductory sections, operand symbols are weak references. If the symbol hasn't been defined as a label yet, the operand will be formatted as hex, which is probably not what you want.
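The low / high / bank selection can be read as a shift followed by a mask sized to the operand. The sketch below is one interpretation of that description, with a hypothetical function name, and is not taken from the SourceGen sources.

```python
def select_part(value, part, operand_width):
    """Shift the symbol value by the selected part, then mask to the operand.

    part: "low", "high", or "bank"; operand_width is in bytes.
    """
    shift = {"low": 0, "high": 8, "bank": 16}[part]
    mask = (1 << (8 * operand_width)) - 1
    return (value >> shift) & mask


symbol_value = 0x123456
high_byte = select_part(symbol_value, "high", 1)
```

With a one-byte operand the selector picks a byte; with a two-byte operand the same selector acts as a shift, and the mask trims a symbol that is wider than the operand field.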
-The default behavior is to just set the operand's symbol.
+The default behavior is just to set the operand's symbol.
For operands that target an offset inside the file, if the target address does not yet have a label, and the symbol doesn't exist, you may set the symbol as the label on the target address as well. You can do @@ -84,24 +86,35 @@ future release.)
This dialog offers a variety of choices, and can be used to apply a -format to a range of offsets. If the range crosses a visual boundary, +format to a range of offsets. You must select all of the bytes you want +to format. For example, to format two bytes as a 16-bit word, you must +select both bytes in the editor. (If you click on the first item, then +Shift+double-click on the operand field of the last item, you can do +this very quickly.) The selection does not need to be contiguous: you +can use Control+click to select scattered items. +
If the range is discontiguous, or crosses a visual boundary such as a change in address, a user-specified label, or a long comment -or note, the region will be split. The top of the dialog indicates how -many bytes have been selected, and how many regions they have been -divided into.
+or note, the selection will be split into smaller regions. A message at the +top of the dialog indicates how many bytes have been selected, and how +many regions they have been divided into.(End-of-line comments do not split a region, and will disappear if they end up inside a multi-byte data item.)
The "Simple Data" items behave the same as their equivalents in the Edit Operand dialog. However, because the width is not determined by -an instruction opcode, you will need to specify how wide each item is, -and the byte order.
-Suppose you find a table of 16-bit addresses in the code. Click on +an instruction opcode, and multiple items can be selected, you will need +to specify how wide each item is and what its byte order is. For data +you also have the option of setting the format to "Address", which marks +the selected bytes as a numeric reference.
+ +Consider a simple example: suppose you find a table of 16-bit +addresses in the code. Click on the first byte, shift-click the last byte, then select the Edit Data menu item. The number of bytes selected should be even. Select -"16-bit words, little-endian", then to the right "Address". When you -click OK, the selected data will be formatted as a series of 16-bit -address values.
+"16-bit words, little-endian", then over to the right click on +"Address". When you click OK, the selected data will be formatted as a +series of 16-bit address values. If the addresses can be resolved inside +the data file, each address will be assigned a label.The "Bulk Data" items can represent large chunks of data compactly. The "fill" option is only available if all selected bytes have the @@ -161,8 +174,8 @@ want to limit the overall length if you're hoping to create 80-column output. Some retro assemblers may have hard line length limitations, which could result in the comment being truncated in generated sources.
A semicolon (';') is placed at the start of the line. If an assembler -has different conventions, a different character may be used. You don't -need to include a delimiter in the comment field.
+has different conventions, a different delimiter character may be used. You +don't need to include a semicolon in the comment field.Comments on platform symbols are read from the platform symbol file, and cannot be edited from within SourceGen. Comments on project symbols are @@ -176,11 +189,11 @@ will be word-wrapped at a line width of your choosing. They're always drawn with a fixed-width font, so you can create ASCII-art diagrams. Comment delimiters are added automatically at the start of each line.
For a true retro look you can "box" the comment with asterisks. You -can create a fill-width row of asterisks by putting a '*' on a line by +can create a full-width row of asterisks by putting a '*' on a line by itself. (Assembly source generators are allowed to use a character other than '*' for the output, e.g. they might use a full set of box outline characters, though that's somewhat against the spirit of -the thing.)
+the thing. Regardless, a solo '*' results in a line.)The bottom window will update automatically as you type, showing what the output is expected to look like. The actual assembler source output will depend on features of the target assembler, such as comment @@ -226,7 +239,7 @@ the same way when used in a .EQ directive.
the .EQ directive.Symbols marked as "address" will be applied automatically when an operand references an address outside the scope of the data file. Symbols -marked as "constant" will not, though you can still specify it manually.
+marked as "constant" will not, though you can still specify them manually. diff --git a/SourceGen/RuntimeData/Help/end-notes.html b/SourceGen/RuntimeData/Help/end-notes.html index 7de5c8c..8763855 100644 --- a/SourceGen/RuntimeData/Help/end-notes.html +++ b/SourceGen/RuntimeData/Help/end-notes.html @@ -19,10 +19,10 @@ school in the late 1980s, I read Don Lancaster's Enhancing Your Apple II, Vol. 1 (available for download here). This included a very detailed methodology for disassembling 6502 software. -I decided to give it a try, so I dumped a monitor listing of the -operating system from an SSI game ("RDOS") to paper with my Epson -RX-80 -- tractor feed paper was helpful for this sort of thing -- and -set to work. +I wanted to give it a try, so I generated a monitor listing of an +operating system (called "RDOS") that SSI used on their games, and +printed it out on my Epson RX-80 -- tractor feed paper was helpful for +this sort of thing -- then set to work.Lancaster's methodology involved highlighting different types of instructions with different colors, making notes, and adding labels. @@ -44,14 +44,17 @@ like a modern IDE, because I didn't just want it to translate machine code into readable form. I wanted it to help me with the process of understanding the code, by providing cross-reference tables and symbol lists and giving me a place to scribble notes to myself while I worked. -Especially the note-scribbling.
+I especially wanted the note-scribbling, because learning how something +works is usually an iterative process, where the function of a chunk of +code gradually reveals itself over time.In 2002, while writing the 6502/65816 disassembler for CiderPress, I ran into the same problems I had with the original Apple II monitor: it blundered through data sections and got lost briefly when a new code -section started. This made it annoying to use for even small binaries. I +section started. You had to pick long or short registers for the entire +disassembly, which made 65816 code something of a disaster. I jotted down some notes on what I thought the core features of a good -6502 diassembler should be, then went back to work on other features. It +6502 disassembler should be, then moved on to work on other features. It was another 15 years before I picked up the idea again.
More recently, I disassembled some code by dumping it to a text diff --git a/SourceGen/RuntimeData/Help/index.html b/SourceGen/RuntimeData/Help/index.html index e125f93..491e5f1 100644 --- a/SourceGen/RuntimeData/Help/index.html +++ b/SourceGen/RuntimeData/Help/index.html @@ -54,6 +54,7 @@ and 65816 code. The official web site is
The computer I built in 2014 has a 4GHz CPU and 8GB of RAM. -We should put that to good use.
+I figured we should put that kind of power to good use.The second purpose is to facilitate sharing and collaboration. Most disassemblers generate output for a specific assembler, or in a way that's @@ -49,12 +49,13 @@ capabilities within SourceGen are sufficiently flexible. If you need to generate assembly source and tweak it a bunch to express the intent of the original code, then passing a SourceGen project around won't work. This sort of thing is a bit outside the bounds of what a typical -disassembler does, so it remains to be seen whether this succeeds at -what it's trying to do, and also whether what it's trying to do is actually -something that people want.
+disassembler does, so it remains to be seen whether SourceGen succeeds at +what it's trying to do, and also whether what it's trying to do is +something that people actually want. -You can get started by watching the demo video and playing with the -tutorials.
+You can get started by watching the +demo video and playing with the +tutorials.
Code usually arrives in a big binary blob. Some of it will be +instructions, some of it will be data, some will be empty space used +for variable storage. Part of the challenge of disassembly is +identifying which parts of the file contain which.
+Much of the code you'll find for the 6502 was written by humans, rather than generated by a compiler, which means it won't conform to a -specific set of conventions. However, most programmers will use -subroutines, and will often intersperse code with bits of data storage -for variables. The variable data storage is referred to as a "stash". +standard set of conventions. However, most programmers will use +subroutines, which can be identified and analyzed in isolation. Subroutines +are often interspersed with variable storage, referred to as a "stash". Variables may be single-byte or multi-byte, the latter typically in little-endian byte order.
-Data that is principally read-only can take many forms. Among the -more common forms are graphics and ASCII string data. The former is -generally difficult to recognize automatically, but strings can often be -identified. Address tables, which are a collection of addresses to -other things, are also fairly common. When used as jump tables, they -might actually refer to the address before the actual instruction, because -of the way the RTS (Return to Subroutine) instruction works.
+Much of the data in a typical program is read-only, often in the +form of graphics or ASCII string data. Graphics can be difficult +to recognize automatically, but strings can be identified with a +reasonable degree of confidence. Address tables, which are a collection +of addresses to other things, are also fairly common.
A simple disassembler would start at the top of the file and just start converting bytes to instructions. Unfortunately there's no reliable @@ -127,14 +131,17 @@ by the program bank register and the data bank register, respectively. The disassembler can't generally know the contents of the data bank register, which makes life a bit more interesting.
-The 6502 has an 8-bit processor status register with a bunch of flags -in it. One use of certain flags is to determine whether a -conditional branch is taken or not. -Two flags that are only present on the 65816 (M and X) are especially -interesting, because they determine whether the accumulator and index -registers are 8 or 16 bits wide. This determines the width of immediate-mode -instructions, so if you don't know what's in the processor status -register it's hard to correctly disassemble the instruction stream.
+The 6502 has an 8-bit processor status register ("P") with a bunch of flags
+in it. Some of the flags determine whether a conditional branch is taken
+or not, which is important because some branches appear to be conditional
+but actually are always or never taken in practice. The disassembler needs
+to be able to figure this out so that it doesn't try to disassemble the
+bytes that follow an always-taken branch.
+A more significant concern is the M and X flags found on the 65802/65816,
+which determine the width of the registers and of immediate load
+instructions. If you don't know what state the flags are in, you can't
+know whether LDA #value
is two bytes or three, and the
+disassembly of the instruction stream will come out wrong.
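To make the width problem concrete, here is a small sketch that decodes the same three bytes under both settings of the M flag. The function name is hypothetical; an unknown flag is treated as short, matching the "short registers are assumed" rule stated elsewhere in this help text.

```python
LDA_IMMEDIATE = 0xA9    # opcode for LDA #value


def decode_lda_immediate(data, offset, m_flag):
    """Return (instruction length, operand value).

    m_flag: 1 = 8-bit accumulator, 0 = 16-bit, None = unknown (assume short).
    """
    if data[offset] != LDA_IMMEDIATE:
        raise ValueError("not an LDA immediate instruction")
    length = 3 if m_flag == 0 else 2
    operand = int.from_bytes(data[offset + 1:offset + length], "little")
    return length, operand


code = bytes([0xA9, 0x00, 0xEA])    # $EA is also the NOP opcode
short_form = decode_lda_immediate(code, 0, m_flag=1)
long_form = decode_lda_immediate(code, 0, m_flag=0)
```

With M=1 the third byte is left over to be decoded as a NOP; with M=0 it is swallowed into the operand, so everything after it shifts.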
The code tracing has to start somewhere, so SourceGen uses "code entry point hints" to identify places where execution may begin. By default, -one is placed at the start of the file. From there, the tracing process +a hint is placed at the start of the file. From there, the tracing process walks through the code, pursuing all branches. In many cases, if you -mark all code entry points, SourceGen will automatically find all +mark all external entry points, SourceGen will automatically find all executable code and separate it from variable storage and data areas.
As noted earlier, tracking the processor status flags can make the @@ -155,7 +162,7 @@ analysis more accurate. Identifying situations where a branch instruction is always or never taken avoids mis-categorizing a data region as code. On the 65816, it's absolutely crucial to track the M/X flags, since those affect the width of instructions. SourceGen tracks the value of the -processor flags at every instruction, blending sets together when +processor flags at every instruction, blending sets of flags together when multiple paths of execution converge.
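Blending can be sketched as a per-flag merge in which a flag stays definite only when both incoming paths agree. The representation (a dictionary with None for "indeterminate") and the function name are assumptions for illustration.

```python
def merge_flags(a, b):
    """Combine flag sets from two paths that converge on one instruction."""
    return {name: (a[name] if a[name] == b[name] else None) for name in a}


path_one = {"C": 1, "M": 1, "X": 0}
path_two = {"C": 0, "M": 1, "X": None}
merged = merge_flags(path_one, path_two)
```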
Once instructions and data have been separated, the instruction operands @@ -172,23 +179,16 @@ by an equate directive.
Extension scripts are C# source files that are compiled and -executed by SourceGen. They can be added to a project from the RuntimeData -directory or the directory the project file lives in.
-In v1.0, scripts are only called to examine JSR/JSL instructions. -They can format nearby bytes as inline data, or apply symbols to -operands.
- -If code jumps into a region that is marked as inline data, the -branch will be ignored. If an extension script tries to flag bytes -as inline data that have already been executed, the script will be -ignored. This can lead to a race condition in the analyzer if -an extension script is doing the wrong thing. (The race doesn't exist -with inline data hints specified by the user, because those are applied -before code analysis starts.)
+executed by SourceGen. They can be added to a project from SourceGen's +runtime data directory, or can live in the directory next to the project +file. +In the current implementation, scripts are only called to examine +JSR/JSL instructions. They can format nearby bytes as inline data, or +apply symbols to operands.
To reduce the chances of a script causing problems, all scripts are executed in a sandbox with severely restricted access. Notably, nothing -in the script can access files, except to read those in the PluginDll +in the sandbox can access files, except to read files from the PluginDll directory.
The PluginDll directory lives next to the SourceGen executable, and contains all of the compiled script DLLs, as well as two pre-built @@ -199,10 +199,9 @@ is launched, but may be manually deleted without harm.
Sometimes SourceGen can't automatically find the start or end of a -code area. Maybe there's inline data after a JSR that didn't get -recognized by an extension scripts. These situations can be resolved -by adding an appropriate hint.
+Sometimes SourceGen can't automatically find the start or end of an +instruction stream, or gets confused by inline data. These situations +can be resolved by adding an appropriate hint.
Code entry point hints tell the analyzer to add the offset to the list of instruction start points. Suppose you've got a code @@ -247,9 +246,9 @@ end up with this:
.ORG $1000 JMP L1009 - JMP ⏩ L10ef - BPL ⏩ L1053 - JMP ⏩ L1230 + JMP ⏩ L10ef + BPL ⏩ L1053 + JMP ⏩ L1230 BMI L101b L1009 CLC@@ -276,7 +275,7 @@ would actually be better solved by setting a status flag override on the BNE that sets Z=0, so the code tracer will know it's a branch-always and do the right thing.) It's only necessary to place a hint on the very first (opcode) byte. Placing a data hint in the middle of what -SourceGen believes is an instruction will have no effect. +SourceGen believes to be an instruction will have no effect.
Inline data hints identify bytes as being part of the instruction stream, but not instructions. A simple example of this @@ -285,11 +284,13 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:
JSR $bf00 .DD1 $function .DD2 $address + BCS BAD -The three bytes following a JSR to $bf00 should be skipped over by -the code analyzer. In this case, all three bytes must be hinted.
-If code jumps into a region that is marked as inline data, the +
The three bytes following the JSR $bf00
should be hinted
+as inline data, so that the code analyzer skips them and continues the
+analysis at the BCS
.
If code branches into a region that is marked as inline data, the branch will be ignored.
@@ -303,9 +304,9 @@ of the work being disassembled. (This will vary by region. Also, note that the mere act of disassembling a piece of software may be illegal in some cases.) -To avoid mix-ups, the data file's length and CRC are stored in the -project file. SourceGen will refuse to open a project if the data file's -length and CRC don't match.
+To avoid mix-ups where the wrong data file is used, the file's length +and CRC are stored in the project file. SourceGen will refuse to open a +project if the data file's length and CRC don't match.
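The length-and-CRC check described above can be sketched in a few lines of Python. This is only an illustration, not SourceGen's actual code: the function name `data_file_matches` is hypothetical, and the sketch assumes a standard CRC-32 as computed by `zlib.crc32`, which the text does not specify.

```python
import zlib


def data_file_matches(data: bytes, expected_length: int, expected_crc: int) -> bool:
    """Return True if the data has the recorded length and CRC.

    Assumption: the stored CRC is a standard CRC-32 (zlib polynomial).
    """
    # Compare the cheap property first; a length mismatch is conclusive.
    if len(data) != expected_length:
        return False
    # Mask to 32 bits so the comparison is against an unsigned value.
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc
```

A caller would read the data file into memory, compare it against the two values recorded in the project, and refuse to continue if the function returns False.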
Most of the data in the project file is associated with a file offset. When you create a comment, you aren't associating it with line 53, you're @@ -317,14 +318,20 @@ convention, file offsets are always shown as a six-digit hexadecimal value with a leading '+', e.g. "+0012ab". This makes it easy to distinguish between an address and an offset.
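As a small illustration of the offset convention, the following Python sketch formats a file offset as six hexadecimal digits with a leading '+'. The helper names are hypothetical, and the address format (a '$' prefix with four lower-case hex digits) is an assumption based on the examples used elsewhere in this manual, not a statement of what SourceGen does internally.

```python
def format_offset(offset: int) -> str:
    """Format a file offset as '+' followed by six hex digits."""
    return f"+{offset:06x}"


def format_address(address: int) -> str:
    """Format an address with a '$' prefix (assumed style, 4 digits minimum)."""
    return f"${address:04x}"


# The same numeric value reads differently as an offset and as an address.
print(format_offset(0x12AB), format_address(0x12AB))
```

The distinct prefixes are what keep the two kinds of value from being confused when they appear side by side.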
+Instruction and data operands can be formatted in various ways. The +formatting choice is associated with the first offset of the item. For +instructions the number of bytes in the operand is determined by the opcode +(and, on the 65816, the M/X status flags). For data items the length +can be a single byte or an entire file. Operand formats are not allowed +to overlap.
+When an instruction or data operand references an address, we call it a numeric reference. When the target address has a label, and the operand uses that symbol, we call that a symbolic reference. SourceGen tries to establish symbolic references whenever possible, so that the generated assembly source doesn't refer to hard-coded -locations within the program.
-Data operands can also be numeric references. From the Edit Data -dialog, select the "Address" format.
+locations within the program. Labels are generated automatically for +the targets of numeric references.As your understanding of the disassembled code develops, you will want to add comments explaining it. SourceGen projects have three kinds of @@ -339,32 +346,38 @@ comments:
are a way for you to leave notes to yourself, perhaps "don't forget to figure this out" or "this is the cool part". -Each offset can have one of each.
+Every file offset can have one of each.
Labels and comments may disappear if you associate them with a file offset that is in the middle of a multi-byte instruction or data item. For example, suppose you put a long comment at offset +000010, and then mark a 50-byte region starting at offset +000008 as an ASCII string. The comment won't be deleted, but won't be displayed either. The same thing -happens to labels.
+can happen to labels. SourceGen will try to prevent this from happening +by splitting formatted data into sub-regions at label boundaries.A symbol has two parts, a label and a value. The value may be an -address or a numeric constant. Symbols can be defined in different ways, -and applied in different ways.
+A symbol has two parts, a label and a value. The label is a short +ASCII string; the value may be an 8-to-24-bit address or a numeric +constant. Symbols can be defined in different ways, and applied in +different ways.
-The label format is restricted:
+The label syntax is restricted to a format that should be compatible +with most assemblers:
Label comparisons are case-sensitive, as is customary for programming +languages.
-Platform symbols are defined in platform symbol files, which
-have a ".sym65" filename extension. Several come with SourceGen and
-live in the RuntimeData
directory. You can also create your
+
Platform symbols are defined in platform symbol files. These
+are named with a ".sym65" extension, and have a fairly straightforward
+name/value syntax. Several files for popular platforms come with SourceGen
+and live in the RuntimeData
directory. You can also create your
own, but they have to live in the same directory as the project file.
Platform symbols can be addresses or constants. If an instruction @@ -384,7 +397,7 @@ creating two symbols with the same name. If two symbols have the same value, the one whose label comes first alphabetically is used.
Project symbols always have precedence over platform symbols, allowing -you to redefine symbols within a project. (You can "block" a platform +you to redefine symbols within a project. (You can "hide" a platform symbol by creating a project symbol with the same name and an unused value, such as $ffffffff.)
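The precedence rules can be sketched roughly as follows. This is a simplified model under stated assumptions, not SourceGen's implementation: symbols are plain label-to-value dictionaries, the function names are hypothetical, "alphabetically" is taken to mean Python's default string ordering, and project symbols override platform symbols by label only.

```python
from typing import Dict, Optional


def merge_symbols(platform: Dict[str, int], project: Dict[str, int]) -> Dict[str, int]:
    """Combine symbol sets; a project symbol replaces a platform symbol
    that has the same label."""
    merged = dict(platform)
    merged.update(project)
    return merged


def label_for_value(symbols: Dict[str, int], value: int) -> Optional[str]:
    """Pick the label for a value; ties go to the label that sorts first."""
    matches = sorted(label for label, v in symbols.items() if v == value)
    return matches[0] if matches else None


platform = {"COUT": 0xFDED, "CROUT": 0xFD8E}
# Hide the platform symbol COUT by redefining it with an unused value.
project = {"COUT": 0xFFFFFFFF}
symbols = merge_symbols(platform, project)
print(label_for_value(symbols, 0xFDED))  # no symbol matches any more
```

With the project definition in place, an operand that references $fded no longer picks up the platform label, which is the "hide" behavior described above.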
@@ -400,8 +413,8 @@ instructions or data offsets that are the target of operands. They're formed by appending the hexadecimal address to the letter "L", with additional characters added if some other symbol has already defined that label. Auto labels are only added where they are needed. Because -auto labels may be redefined at any time, the editor will try to prevent -you from using them in operands. +auto labels may be redefined or disappear, the editor will try to prevent +you from referring to them when editing operands.Operands may use parts of symbols. For example, if you have a label
MYSTRING, you can write:
The format editor allows you to choose which part of the symbol's -value to use. If the value doesn't match exactly, and adjustment will +value to use. If the value doesn't match exactly, an adjustment will be applied.
This happened because you added a weak reference to "FOO" in the operand, -but the label doesn't exist. The operand is formatted as hex. This also -means that there's no longer a need for an auto label on the NOP instruction, -so SourceGen removed that as well.
+but the label doesn't exist. The operand is formatted as hex. Because +there's no longer a reference to L1003, SourceGen removed the auto-label +as well.If you set the label "FOO" on the NOP instruction, you'll see what you probably wanted:
@@ -518,7 +531,9 @@ and jumps to it with the RTS instruction. However, RTS requires the address of the byte before the target instruction, so we actually push $1006. -After adding a code hint at $1007, the project looks like this:
+The disassembler won't know that offset $1007 is code because nothing +appears to reference it. After adding a code hint at $1007, the project +looks like this:
LDA #$10 PHA diff --git a/SourceGen/RuntimeData/Help/mainwin.html b/SourceGen/RuntimeData/Help/mainwin.html index b302985..3184f37 100644 --- a/SourceGen/RuntimeData/Help/mainwin.html +++ b/SourceGen/RuntimeData/Help/mainwin.html @@ -31,7 +31,7 @@ incomplete. The maximum size for a data file is currently 1 MiB.The first time you save the project (with File > Save), you will be prompted for the project name. It's best to use the data file's name -with ".dis65" added. This will be configured automatically. The data +with ".dis65" added, so this will be set as the default. The data file's name is not stored in the project file, so if you pick a different name, or save the project in a different directory, you will have to select the data file manually whenever you open the project.
@@ -58,7 +58,7 @@ to cancel the loading of the project.The locations of the last few projects you've worked with are saved in the application settings. You can access them from File > Recent Projects. If no project is open, links to the two -most-recently opened projects will be available.
+most-recently-opened projects will be available.Working With a Project
@@ -70,7 +70,7 @@ most-recently opened projects will be available.
Most of the action takes place in the center code list.
@@ -94,10 +94,12 @@ assembler directive. correspond to the instruction or data. To see the full dump of a longer item, such as an ASCII string, double-click on the field to open the - Hex Dump Viewer. (Note this is - a floating window, so you can keep it open while you work.)The undo buffer is not saved as part of the project, so closing and -reopening the project resets the buffer.
+The undo history is not saved as part of the project. Closing a project +clears the buffer.
Use Edit > Go To to jump to an offset, address, or label. Remember that offsets and addresses are always hexadecimal, and offsets start -with a '+'.
+with a '+'. If you have a label that is also a valid hexadecimal +address, like "FEED", the label takes precedence. To jump to the address +write "$FEED" instead.When you jump around, by double-clicking on an opcode or an entry in one of the side windows, the currently-selected line is added to @@ -291,6 +296,17 @@ entirely from the project properties editor. +
The "Toggle Single-Byte Format" feature provides a quick way to +change a range of bytes to single bytes +or back to their default format. It's equivalent to opening the Edit +Data Format dialog and selecting "Single bytes" displayed as hex, or +selecting "Default".
+This can be handy if the default format for a range of bytes is a +string, but you want to see it as bytes or set a label in the middle.
+ +When you use Edit > Copy, all lines selected in the code list are @@ -298,14 +314,16 @@ copied to the system clipboard. This can be a convenient way to post code snippets into forum postings or documentation. The text is copied from the data shown on screen, so your chosen capitalization and pseudo-ops will appear in the copy.
-A copy of all of the fields is also written to the clipboard, in -CSV format. If you open a program like Excel, you can use Paste Special -to put the data into individual cells.
+Long comments are included, but notes are not.
+By default, the label, opcode, operand, and comment fields are included. +From the +app settings dialog you can select +a different format, "Disassembly", which also includes the address and byte +columns.
-By default, the label, opcode, operand, and comment fields are included -in the text form. From the -app settings you can select -a different format that also includes the address and byte columns.
+A copy of all of the fields is also written to the clipboard in CSV +format. If you have a spreadsheet like Excel, you can use Paste Special +to put the data into individual cells.
diff --git a/SourceGen/RuntimeData/Help/settings.html b/SourceGen/RuntimeData/Help/settings.html index 1f6f21f..06b8d09 100644 --- a/SourceGen/RuntimeData/Help/settings.html +++ b/SourceGen/RuntimeData/Help/settings.html @@ -21,15 +21,15 @@ project properties.Application settings are stored in a file called "SourceGen-settings" in the SourceGen installation directory. If the file is missing or corrupted, some default settings will be used. These settings are local -to your system, and include everything from window sizes to whether you -prefer hexadecimal values to be shown in upper case. None of them +to your system, and include everything from window sizes to whether or not +you prefer hexadecimal values to be shown in upper case. None of them affect the way the project analyzes code and data, though they may affect the way generated assembly sources look.
Project properties are stored in each individual .dis65 project file. They specify which CPU to use, which extension scripts to load, and a variety of other things that directly impact how SourceGen processes -the project. Because of the way it impacts the project, all changes to +the project. Because of the potential impact, all changes to the project properties are made through the undo/redo buffer.
@@ -50,7 +50,7 @@ hide columns from the code list. The buttons may be more convenient though.You can select a different font for the code list. Make it as large -or small as you want. Monospace fonts like Courier or Consolas are +or small as you want. Mono-space fonts like Courier or Consolas are recommended.
You can choose to display different parts of the display in upper or @@ -147,8 +147,8 @@ you later hit Cancel, but the changes are not applied immediately.
The choice of CPU determines the set of available instructions, as well as cycle costs and register widths. There are many variations -on the 6502, but from the perspective of a disassembler only three -matter: +on the 6502, but from the perspective of a disassembler most can be +treated as one of these three:
The Rockwell R65C02 features an expanded instruction set that is -compatible with the WDC 65C02 but incompatible with the 65816. It's -not currently supported by SourceGen.
+The Rockwell R65C02, Hudson Soft HuC6280, and Commodore CSG 4510 / 65CE02 +have instruction sets that expand on the 6502/65C02, but aren't compatible +with the 65816. These are not yet supported by SourceGen.
If "enable undocumented instructions" is checked, some additional opcodes are recognized on the 6502 and 65C02. These instructions are @@ -198,14 +198,18 @@ create two symbols with the same label.
The Import button allows you to import symbols from another project. Only labels that have been tagged as global and exported will be imported. Existing symbols with identical labels will be replaced, so it's okay to -run the importer multiple times.
+run the importer multiple times. Labels that aren't found will not be +removed, so you can safely import from multiple projects, but will need +to manually delete any symbols that are no longer being exported.From here, you can add and remove platform symbol files, or change the order in which they are loaded. See the symbols section for an -explanation of how platform symbols work.
+explanation of how platform symbols work. +See "README.md" in the RuntimeData directory for a description of the +file syntax.Platform symbol files must live in the RuntimeData directory that comes with SourceGen, or in the directory where the project file lives. This @@ -222,7 +226,9 @@ you will receive a warning.
From here, you can add and remove extension script files. See the extension scripts section for -an explanation of how extension scripts work.
+an overview of how extension scripts work. +There's a more detailed document in the RuntimeData directory +("ExtensionScripts.md").Extension script files must live in the RuntimeData directory that comes diff --git a/SourceGen/RuntimeData/Help/tools.html b/SourceGen/RuntimeData/Help/tools.html index 09b153e..cc3b136 100644 --- a/SourceGen/RuntimeData/Help/tools.html +++ b/SourceGen/RuntimeData/Help/tools.html @@ -46,7 +46,7 @@ pasting in some situations.
If "always on top" is checked, the window will stay above all other windows that don't also declare that they should always be on top. By -default this box is checked for the project dump, and not checked for +default this box is checked when displaying project data, and not checked for external files.
diff --git a/SourceGen/RuntimeData/Help/tutorials.html b/SourceGen/RuntimeData/Help/tutorials.html index bf52f5d..9dec157 100644 --- a/SourceGen/RuntimeData/Help/tutorials.html +++ b/SourceGen/RuntimeData/Help/tutorials.html @@ -70,15 +70,18 @@ these distracting, collapse the column.Click on the fourth line down, which has address 1002. The line has a label, "L1002", and is performing an indexed load from L1017. Both of these labels were automatically generated, and are named for the -address they appear. When you clicked on the line, a few things happened:
+address at which they appear. When you clicked on the line, a few +things happened:Click some other lines, such as address $100B and $1014. Note how the @@ -91,17 +94,17 @@ the operand itself opens a format editor; more on that later.)
References window. Note the selection jumps to L1002. You can immediately jump to any reference.At the top of the Symbols window on the right side of the screen is a -row of buttons. Make sure "Auto" is highlighted. You should see three +row of buttons. Make sure "Auto" is selected. You should see three labels in the window (L1002, L1014, L1017). Double-click on L1014. The selection jumps to the appropriate line.
Select Edit > Find. Type "hello", and hit Enter. The selection will move to address $100E, which is a string that says "hello!". You can use Edit > Find Next to try to find the next occurrence (there isn't one). You -can search for text that appears in the rightmost columns (label, opcode, +can search for any text that appears in the rightmost columns (label, opcode, operand, comment).
Select Edit > Go To. You can enter a label, address, or file offset. -Enter "100b" to jump the selection to $100B.
+Enter "100b" to set the selection to $100B. Near the top-left of the SourceGen window is a set of toolbar icons. Click the left-arrow, and watch the selection move. Click it again. Then @@ -118,21 +121,21 @@ something like "6502bench SourceGen vX.Y.Z". There are three ways to open the comment editor:
Most things in the code list will respond to a double-click. Double-clicking on addresses, flags, labels, operands, and comments will open editors for those things. Double-clicking on a value in the "bytes" column will open a floating hex dump viewer. This is usually the most -convenient way to edit something.
+convenient way to edit something: point and click.Double-click the comment to open the editor. Type some words into the upper window, and note that a formatted version appears in the bottom window. Experiment with the maximum line width and "render in box" settings to see what they do. You can hit Enter to create line breaks, or let SourceGen wrap lines for you. When you're done, click OK. (Or -hit Ctrl-Enter.
+hit Ctrl+Enter.)When the dialog closes, you'll see your new comment in place at the top of the file. If you typed enough words, your comment will span multiple lines. You can select the comment by selecting any line in it.
@@ -151,15 +154,17 @@ differences:It's time to do something with the code. It's copying the instructions -from $1017 to $2000, then jumping to $2000, so it looks like it's -relocating the code before executing it. We want to do the same thing -to our disassembled code, so select the line at address $1017 and then +
It's time to do something with the code. If you look at what the code +does you'll see that it's copying several dozen bytes from $1017 +to $2000, then jumping to $2000. It appears to be relocating the next +part of the code before +executing it. We want to let the disassembler know what's going on, so +select the line at address $1017 and then Edit > Edit Address. (Or double-click the "1017" in the addr column.) In the Edit Address dialog, type "2000", and hit Enter.
@@ -178,8 +183,8 @@ so you'll be forgiven if you reduce the offset column width to zero.)On the line at address $2000, select Actions > Edit Label, or double-click on the label "L2000". Change the label to "MAIN", and hit Enter. The label changes on that line, and on the two lines that refer -to address $2000. (If you're not sure what refers to line $2000, check -the References window.)
+to address $2000. (If you're not sure what refers to line $2000, select +it and check the References window.)On that same line, select Actions > Edit Comment. Type a short comment, and hit Enter. Your comment appears in the "comment" column.
@@ -215,12 +220,12 @@ Actions > Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are case-sensitive, so it needs to match the operand at $2005 exactly.) You'll see the new label appear, and the operand at line $2005 will use it.There's an easier way. Use Edit > Undo twice, to get back to the place -where line $2005 is using "L2009" as it's operand. Select that line and +where line $2005 is using "L2009" as its operand. Select that line and Actions > Edit Operand. Enter "IS_OK", then select "Create label at target address instead". Hit "OK".
You should now see that both the operand at $2005 and the label at $2009 have changed to IS_OK, accomplishing what we wanted to do in a -single step. (There's actually a sutble difference compared to the two-step +single step. (There's actually a subtle difference compared to the two-step process: the operand at $2005 is still a numeric reference. It was automatically changed to match IS_OK in the same way that the references to MAIN were when we renamed "L2000" earlier. If you actually do want the @@ -248,7 +253,7 @@ label to "STR1". Move up a bit and select address $2030, then scroll to the bottom and shift-click address $2070. Select Actions > Edit Data Format. At the top it should now say, "65 bytes selected in 2 groups". There are two groups because the presence of a label split the data into -two separate regions. Selected "mixed ASCII and non-ASCII", then click +two separate regions. Select "mixed ASCII and non-ASCII", then click "OK".
We now have two ".STR" lines, one for "string zero ", one with the STR1 label and the rest of the string data. This is okay, but it's not @@ -260,8 +265,8 @@ a single ".STR" line at the bottom, split across two lines with a '+'.
but that appears to be incorrect, so let's format it as individual bytes instead. There's an easy way to do that: use Actions > Toggle Single-Byte Format (or hit Ctrl+B). -The data starting at $2025 appears to be 16-bit addresses into the -table of strings, so let's format them appropriately.
+The data starting at $2025 appears to be 16-bit addresses that point +into the table of strings, so let's format them appropriately.
Select the line at $2025, then shift-click the line at $202E. Select Actions > Edit Data Format. If you selected the correct set of bytes, the top should say, "10 bytes selected". Click the @@ -277,7 +282,7 @@ on their own line, so each string is now in a separate ".STR" statement.
You can generate asssembly source code from the disassembled data. +
You can generate assembly source code from the disassembled data. Select File > Assembler (or hit Ctrl+Shift+A) to open the generation and assembly dialog.
Pick your favorite assembler from the drop list at the top right,