diff --git a/SourceGen/RuntimeData/Help/advanced.html b/SourceGen/RuntimeData/Help/advanced.html
index 96675f5..6f67fcf 100644
--- a/SourceGen/RuntimeData/Help/advanced.html
+++ b/SourceGen/RuntimeData/Help/advanced.html
@@ -28,8 +28,8 @@ as project symbols into the other projects.

symbol-import step in every interested project. This step must be repeated whenever the labels are updated.

A different but related problem is typified by arcade ROM sets, -where files are split apart because each file must be flashed to a -separate chip. All files are expected to be present in memory at +where files are split apart because each file must be burned into a +separate PROM. All files are expected to be present in memory at once, so there's no reason to treat them as separate projects. Currently, the best way to deal with this is to concatenate the files into a single file, and operate on that.

@@ -60,7 +60,7 @@ L1103_0 LDA #$22

Both sections start at $1100, and have branches to $1103. The branch in the first section resolves to the label in the first version of that address chunk, while the branch in the second section resolves to -the label in the second chunk. When branches are outside the current +the label in the second chunk. When branches originate outside the current address chunk, the first chunk that includes that address is used, as it is with the JMP $1000 at the start of the file.

@@ -96,7 +96,7 @@ not help you debug 6502 projects.

multi-line comment (long comment, note). Useful for confirming that the width limitation is being obeyed. These are added exactly as shown, without comment delimiters, into generated assembly output, - which doesn't work out well. + which doesn't work out well if you run the assembler.
  • Use Keep-Alive Hack. If set, a "ping" is sent to the extension script sandbox every 60 seconds. This seems to be required to avoid an infrequently-encountered Windows bug. (See code for notes and diff --git a/SourceGen/RuntimeData/Help/analysis.html b/SourceGen/RuntimeData/Help/analysis.html index bac1532..eef65a3 100644 --- a/SourceGen/RuntimeData/Help/analysis.html +++ b/SourceGen/RuntimeData/Help/analysis.html @@ -43,7 +43,7 @@ method in DisasmProject.cs):

    The Anattrib array tracks most of the state from here on. If we're doing a partial re-analysis, this step will just clone a copy of the Anattrib array that was made at this point in a previous run. (This - step is described in more detail below.)
  • + step is described in more detail below.)
  • Apply user-specified labels to Anattribs.
  • Apply user-specified format descriptors. These are the instruction and data operand formats.
  • @@ -51,14 +51,14 @@ method in DisasmProject.cs):

    data, and connects instruction and data operands to target offsets. The "nearby label" stuff is handled here. All of the results are stored in the Anattribs array. (This step is described in more - detail below.) + detail below.)
  • Remove hidden labels from the symbol table. These are user-specified labels that have been placed on offsets that are in the middle of an instruction or multi-byte data item. They can't be referenced, so we want to pull them out of the symbol table. (Remember, symbolic operands use "weak references", so a missing symbol just means the operand is shown as a hex value.)
  • -
  • Resolve references to platform and project external symbols> +
  • Resolve references to platform and project external symbols. This sets the operand symbol in Anattrib, and adds the symbol to the list that is displayed in .EQ directives.
  • Generate cross-reference lists. This is done for file data and @@ -71,6 +71,103 @@ by walking through the annotated file data. Most of the actual strings aren't rendered until they're needed.

    +

    Automatic Formatting

    + +

    Every offset in the file is marked as an instruction byte, data byte, or +inline data byte. Some offsets are also marked as the start of an instruction +or data area. The start offsets may have a format descriptor associated +with them.

    +

    Format descriptors have a format (like "numeric" or "string"), a +sub-format (like "hexadecimal" or "null-terminated"), and a length. For +an instruction operand the length is redundant, but for a data operand it +determines the width of the numeric value or length of the string. For +this reason, instructions do not need a format descriptor, but all +data items do.

    +

    Symbolic references are format descriptors with a symbol attached. +The symbol reference also specifies low/high/bank.

    +

    Every offset marked as a start point gets its own line in the on-screen +display list. Embedded instructions are identified internally by +looking for instruction-start offsets inside instructions.

    + +

    The Anattrib array holds the post-analysis state for every offset, +including comments and formatting, but any changes you make in the +editors are applied to the data structures that are saved in the project +file. After a change is made, a full or partial re-analysis is done to +fill out the Anattribs.

    +

    Consider a simple example:

    +
    +         .ORG    $1000
    +         JMP     L1003
    +L1003    NOP
    +
    + +

    We haven't formatted anything yet. The data analyzer sees that the +JMP operand is inside the file, and has no label, so it creates an +auto-label at offset +000003 and a format descriptor with a symbolic +operand reference to "L1003" at +000000.

    +

    Now we edit the label, changing L1003 to "FOO". This goes into the +project's "user label" list. The analyzer is +run, and applies the new "user label" to the Anattrib array. The +data analyzer finds the numeric reference in the JMP operand, and finds +a label at the target address, so it creates a symbolic operand reference +to "FOO". When the display list is generated, the symbol "FOO" appears +in both places.

    +

    Even though the JMP operand changed from "L1003" to "FOO", the only +change actually written to the project file is the label edit. The +contents of the Anattrib array are disposable, so it can be used to +add labels and "fix up" numeric references. Generated labels and +format descriptors are never added to the project file.

    + +

    If the JMP operand were edited, a format descriptor would be added +to the user-specified descriptor list. During the analysis pass it would +be added to the Anattrib array at offset +000000.

    + + +

    Interaction With Undo/Redo

    + +

    The analysis pass always considers the current state of the user +data structures. Whether you're adding a label or removing one, the +code runs through the same set of steps. The advantage of this approach +is that the act of doing a thing, undoing a thing, and redoing a thing +are all handled the same way.

    +

    None of the editors modify the project data structures directly. All +changes are added to a change set, which is processed by a single function. +The change sets are kept in the undo/redo buffer indefinitely. After +the changes are made, the Anattrib array and other data structures are +regenerated.

    + +

    Data format editing can create some tricky situations. For example, +suppose you have 8 bytes that have been formatted as two 32-bit words: + +

    +1000: 68690074           .dd4    $74006968
    +1004: 65737400           .dd4    $00747365
    +
    + +You realize these are null-terminated strings, select both words, and +reformat them: + +
    +1000: 686900             .zstr   "hi"
    +1003: 74657374+          .zstr   "test"
    +
    + +Seems simple enough. Under the hood, SourceGen created three changes: +
      +
    1. At offset +000000, replace the current format descriptor (4-byte + numeric) with a 3-byte null-terminated string descriptor.
    2. +
    3. At offset +000003, add a new 5-byte null-terminated string + descriptor.
    4. +
    5. At offset +000004, remove the 4-byte numeric descriptor.
    6. +
    + +

    Each entry in the change set has "before" and "after" states for the +format descriptor at a specific offset. Only the state for the affected +offsets is included -- the program doesn't take a complete state snapshot +(even with the RAM on a modern system that would add up quickly). When +undoing a change, before and after are simply reversed.
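
    The before/after scheme described above can be sketched roughly as follows. This is an illustrative Python model, not SourceGen's actual C# code: the `Change` class, the `(kind, length)` descriptor tuples, and the function names are all hypothetical. It replays the three changes from the two-word example and then undoes them by swapping before and after.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a format descriptor: (kind, length), or None for "no descriptor".
Descriptor = Optional[Tuple[str, int]]


@dataclass(frozen=True)
class Change:
    offset: int
    before: Descriptor
    after: Descriptor

    def reversed(self) -> "Change":
        # Undoing a change is the same change with before and after swapped.
        return Change(self.offset, self.after, self.before)


def apply_changes(formats: Dict[int, Descriptor], changes: List[Change]) -> None:
    """Apply every change in a set through a single code path."""
    for ch in changes:
        if formats.get(ch.offset) != ch.before:
            raise ValueError(f"unexpected state at +{ch.offset:06x}")
        if ch.after is None:
            formats.pop(ch.offset, None)
        else:
            formats[ch.offset] = ch.after


def undo_changes(formats: Dict[int, Descriptor], changes: List[Change]) -> None:
    """Undo by applying the reversed changes in reverse order."""
    apply_changes(formats, [ch.reversed() for ch in reversed(changes)])


# Two 4-byte numeric items, reformatted as two null-terminated strings.
formats = {0x000000: ("numeric", 4), 0x000004: ("numeric", 4)}
change_set = [
    Change(0x000000, ("numeric", 4), ("zstr", 3)),
    Change(0x000003, None, ("zstr", 5)),
    Change(0x000004, ("numeric", 4), None),
]
apply_changes(formats, change_set)
after_edit = dict(formats)
undo_changes(formats, change_set)
```

    Only the three affected offsets are recorded, so the cost of an undo entry scales with the size of the edit rather than the size of the file.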

    + +

    Code Analysis

    The code tracer walks through the instructions, examining them to @@ -81,8 +178,9 @@ for every instruction:

    Examples: LDA, STA, AND, NOP.
  • Don't continue. The next instruction to be executed can't be - determined from the file data, unless you're disassembling the - system ROM. Examples: RTS, BRK. + determined from the file data (unless you're disassembling the + system ROM around the BRK vector). + Examples: RTS, BRK.
  • Branch always. The operand specifies the next instruction address. Examples: JMP, BRA, BRL.
  • Branch sometimes. Execution may continue at the operand address, @@ -96,8 +194,8 @@ for every instruction:

    Branch targets are added to a list. When the current run of instructions -is exhausted (i.e. a "don't continue" instruction is reached), the next -target is pulled off of the list.

    +is exhausted (i.e. a "don't continue" or "branch always" instruction is +reached), the next target is pulled off of the list.
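
    The work-list behavior described here can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the real tracer: it knows only four opcodes (NOP, RTS, JMP absolute, BNE), ignores status flags entirely, and the `trace` function and its table are hypothetical names.

```python
# Tiny subset of 6502 opcodes: opcode -> (mnemonic, length in bytes, flow type).
OPCODES = {
    0xEA: ("NOP", 1, "continue"),
    0x60: ("RTS", 1, "stop"),        # "don't continue"
    0x4C: ("JMP", 3, "always"),      # "branch always"
    0xD0: ("BNE", 2, "sometimes"),   # "branch sometimes"
}


def trace(data: bytes, org: int, entry_offsets):
    """Return the sorted offsets of instruction starts reachable from the entry points."""
    pending = list(entry_offsets)    # branch targets waiting to be examined
    starts = set()
    while pending:
        offset = pending.pop()
        # Follow one run of instructions until it is exhausted.
        while 0 <= offset < len(data) and offset not in starts:
            op = data[offset]
            if op not in OPCODES:
                break                # unknown opcode: give up on this run
            _, length, flow = OPCODES[op]
            if offset + length > len(data):
                break                # operand runs off the end of the file
            starts.add(offset)
            if flow == "always":
                target = data[offset + 1] | (data[offset + 2] << 8)
                pending.append(target - org)
                break
            if flow == "sometimes":
                disp = data[offset + 1]
                if disp >= 0x80:
                    disp -= 0x100    # sign-extend the 8-bit displacement
                pending.append(offset + 2 + disp)
            if flow == "stop":
                break
            offset += length
    return sorted(starts)


ORG = 0x1000
data = bytes([
    0x4C, 0x04, 0x10,   # +000000 JMP $1004
    0xFF,               # +000003 never reached, so it stays data
    0xEA, 0xEA,         # +000004 NOP / NOP
    0xD0, 0x01,         # +000006 BNE $1009
    0x60,               # +000008 RTS
    0xEA,               # +000009 NOP
    0x60,               # +00000a RTS
])
reached = trace(data, ORG, [0])
```

    The byte at +000003 is never visited because the JMP ends the run and nothing branches to it, which is how code ends up separated from data.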

    The state of the processor status flags is recorded for every instruction. When execution proceeds to the next instruction or branches @@ -116,18 +214,19 @@ of status flags, the analyzer stops pursuing that path.

    when examining 65816 code, but it's possible for the status flag values to be indeterminate. In such a situation, short registers are assumed. Similarly, if the carry flag is unknown when an XCE is -performed, we assume a transition to emulation mode.

    +performed, we assume a transition to emulation mode (E=1).

    -

    There are three ways to set a definite value in a status flags:

    +

    There are three ways in which code can set a flag to a definite value:

      -
    1. By specific instructions, like SEC or +
    2. By explicit instructions, like SEC or CLD.
    3. -
    4. By immediate instructions. LDA #$00 sets Z=1 and N=0. - ORA #$80 sets Z=0 and N=1.
    5. +
    6. By immediate-operand instructions. LDA #$00 sets Z=1 + and N=0. ORA #$80 sets Z=0 and N=1.
    7. By inference. For example, if we see a BCC instruction, we know that the carry will be clear at the branch target address, and set at the following instruction. The instruction doesn't affect the - value of the flag, but we know what the value is at either address.
    8. + value of the flag, but we know what the value will be at both + addresses.

    Self-modifying code can spoil any of these, possibly requiring a status flag override to get correct disassembly.
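
    The three sources of definite flag values can be modeled along these lines. This is a hedged Python sketch, not the analyzer's implementation: flags are kept in a plain dictionary where `None` means indeterminate, and all function names are made up for the illustration.

```python
from typing import Dict, Optional, Tuple

# 1 = set, 0 = clear, None = indeterminate.
Flags = Dict[str, Optional[int]]


def explicit(flags: Flags, mnemonic: str) -> Flags:
    """Instructions that set or clear one flag outright."""
    effects = {"SEC": ("C", 1), "CLC": ("C", 0), "SED": ("D", 1), "CLD": ("D", 0)}
    name, value = effects[mnemonic]
    return {**flags, name: value}


def lda_immediate(flags: Flags, value: int) -> Flags:
    """LDA #value (8-bit): Z and N follow directly from the constant."""
    value &= 0xFF
    return {**flags, "Z": int(value == 0), "N": value >> 7}


def ora_immediate(flags: Flags, value: int) -> Flags:
    """ORA #value (8-bit): only what the constant forces is known."""
    value &= 0xFF
    return {
        **flags,
        "Z": 0 if value != 0 else None,      # a nonzero constant makes the result nonzero
        "N": 1 if value & 0x80 else None,    # bit 7 forced on by the constant
    }


def bcc(flags: Flags) -> Tuple[Flags, Flags]:
    """BCC changes nothing, but tells us C at both successors.

    Returns (flags at the branch target, flags at the following instruction).
    """
    return {**flags, "C": 0}, {**flags, "C": 1}


unknown: Flags = {"C": None, "Z": None, "N": None, "D": None}
at_target, at_next = bcc(unknown)
```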

    @@ -145,7 +244,7 @@ code does CLC/PHP, followed a bit later by the flag around. Flagging the carry bit as indeterminate with a status flag override on the instruction following the PLP fixes things.)

    -

    Some other things that the code analyzer can't handle:

    +

    Some other things that the code analyzer can't recognize automatically:

    @@ -108,11 +109,6 @@ and 65816 code. The official web site is
  • ASCII Chart
  • -
  • Tutorials
  • - -
  • Advanced Topics
  • diff --git a/SourceGen/RuntimeData/Help/intro.html b/SourceGen/RuntimeData/Help/intro.html index 4727b71..3074e19 100644 --- a/SourceGen/RuntimeData/Help/intro.html +++ b/SourceGen/RuntimeData/Help/intro.html @@ -28,7 +28,7 @@ navigate the code while trying to figure out what it does. A disassembler should help you understand the code, not just dump the instructions to a text file.

    The computer I built in 2014 has a 4GHz CPU and 8GB of RAM. -We should put that to good use.

    +I figured we should put that kind of power to good use.

    The second purpose is to facilitate sharing and collaboration. Most disassemblers generate output for a specific assembler, or in a way that's @@ -49,12 +49,13 @@ capabilities within SourceGen are sufficiently flexible. If you need to generate assembly source and tweak it a bunch to express the intent of the original code, then passing a SourceGen project around won't work. This sort of thing is a bit outside the bounds of what a typical -disassembler does, so it remains to be seen whether this succeeds at -what it's trying to do, and also whether what it's trying to do is actually -something that people want.

    +disassembler does, so it remains to be seen whether SourceGen succeeds at +what it's trying to do, and also whether what it's trying to do is +something that people actually want.

    -

    You can get started by watching the demo video and playing with the -tutorials.

    +

    You can get started by watching the +demo video and playing with the +tutorials.

    Fundamental Concepts

    @@ -63,7 +64,7 @@ tutorials.

    rest of the documentation assumes you've read and understood this. It will be helpful if you already understand something about the 6502 instruction set and assembly-language programming, but disassembling other programs is -actually a pretty good way to learn assembly.

    +actually a pretty good way to learn how to code in assembly.

    About 6502 Code

    @@ -71,21 +72,24 @@ actually a pretty good way to learn assembly.

    the 6502 CPU or any of its derivatives, including but not limited to the 65C02 and 65816". So let's talk about 6502 code.

    +

    Code usually arrives in a big binary blob. Some of it will be +instructions, some of it will be data, some will be empty space used +for variable storage. Part of the challenge of disassembly is +identifying which parts of the file contain which.

    +

    Much of the code you'll find for the 6502 was written by humans, rather than generated by a compiler, which means it won't conform to a -specific set of conventions. However, most programmers will use -subroutines, and will often intersperse code with bits of data storage -for variables. The variable data storage is referred to as a "stash". +standard set of conventions. However, most programmers will use +subroutines, which can be identified and analyzed in isolation. Subroutines +are often interspersed with variable storage, referred to as a "stash". Variables may be single-byte or multi-byte, the latter typically in little-endian byte order.

    -

    Data that is principally read-only can take many forms. Among the -more common forms are graphics and ASCII string data. The former is -generally difficult to recognize automatically, but strings can often be -identified. Address tables, which are a collection of addresses to -other things, are also fairly common. When used as jump tables, they -might actually refer to the address before the actual instruction, because -of the way the RTS (Return to Subroutine) instruction works.

    +

    Much of the data in a typical program is read-only, often in the +form of graphics or ASCII string data. Graphics can be difficult +to recognize automatically, but strings can be identified with a +reasonable degree of confidence. Address tables, which are a collection +of addresses to other things, are also fairly common.

    A simple disassembler would start at the top of the file and just start converting bytes to instructions. Unfortunately there's no reliable @@ -127,14 +131,17 @@ by the program bank register and the data bank register, respectively. The disassembler can't generally know the contents of the data bank register, which makes life a bit more interesting.

    -

    The 6502 has an 8-bit processor status register with a bunch of flags -in it. One use of certain flags is to determine whether a -conditional branch is taken or not. -Two flags that are only present on the 65816 (M and X) are especially -interesting, because they determine whether the accumulator and index -registers are 8 or 16 bits wide. This determines the width of immediate-mode -instructions, so if you don't know what's in the processor status -register it's hard to correctly disassemble the instruction stream.

    +

    The 6502 has an 8-bit processor status register ("P") with a bunch of flags +in it. Some of the flags determine whether a conditional branch is taken +or not, which is important because some branches appear to be conditional +but actually are always or never taken in practice. The disassembler needs +to be able to figure this out so that it doesn't try to disassemble the +bytes that follow an always-taken branch. +A more significant concern is the M and X flags found on the 65802/65816, +which determine the width of the registers and of immediate load +instructions. If you don't know what state the flags are in, you can't +know whether LDA #value is two bytes or three, and the +disassembly of the instruction stream will come out wrong.
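
    To make the width problem concrete, here is a small Python sketch that decodes the same three bytes under both settings of the M flag. The function name is hypothetical; the opcode ($A9 for LDA immediate) and the little-endian operand order are standard for the 65816.

```python
def decode_lda_immediate(data: bytes, offset: int, m_flag: int):
    """Decode LDA #value at offset; return (instruction length, operand value).

    m_flag == 1 means an 8-bit accumulator, m_flag == 0 means 16-bit.
    """
    if data[offset] != 0xA9:
        raise ValueError("not an LDA immediate opcode")
    if m_flag == 0:
        # 16-bit accumulator: two operand bytes, low byte first.
        return 3, data[offset + 1] | (data[offset + 2] << 8)
    return 2, data[offset + 1]


blob = bytes([0xA9, 0x00, 0xEA])
short_form = decode_lda_immediate(blob, 0, m_flag=1)   # LDA #$00, then NOP
long_form = decode_lda_immediate(blob, 0, m_flag=0)    # LDA #$ea00
```

    With M=1 the third byte is a separate NOP instruction; with M=0 it is swallowed as the high byte of the operand, and every instruction after it is decoded from the wrong starting point.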

    How SourceGen Works

    @@ -145,9 +152,9 @@ only its effect on the flow of execution matters.

    The code tracing has to start somewhere, so SourceGen uses "code entry point hints" to identify places where execution may begin. By default, -one is placed at the start of the file. From there, the tracing process +a hint is placed at the start of the file. From there, the tracing process walks through the code, pursuing all branches. In many cases, if you -mark all code entry points, SourceGen will automatically find all +mark all external entry points, SourceGen will automatically find all executable code and separate it from variable storage and data areas.

    As noted earlier, tracking the processor status flags can make the @@ -155,7 +162,7 @@ analysis more accurate. Identifying situations where a branch instruction is always or never taken avoids mis-categorizing a data region as code. On the 65816, it's absolutely crucial to track the M/X flags, since those affect the width of instructions. SourceGen tracks the value of the -processor flags at every instruction, blending sets together when +processor flags at every instruction, blending sets of flags together when multiple paths of execution converge.

    Once instructions and data have been separated, the instruction operands @@ -172,23 +179,16 @@ by an equate directive.

    Extension Scripts

    Extension scripts are C# source files that are compiled and -executed by SourceGen. They can be added to a project from the RuntimeData -directory or the directory the project file lives in.

    -

    In v1.0, scripts are only called to examine JSR/JSL instructions. -They can format nearby bytes as inline data, or apply symbols to -operands.

    - -

    If code jumps into a region that is marked as inline data, the -branch will be ignored. If an extension script tries to flag bytes -as inline data that have already been executed, the script will be -ignored. This can lead to a race condition in the analyzer if -an extension script is doing the wrong thing. (The race doesn't exist -with inline data hints specified by the user, because those are applied -before code analysis starts.)

    +executed by SourceGen. They can be added to a project from SourceGen's +runtime data directory, or can live in the directory next to the project +file.

    +

    In the current implementation, scripts are only called to examine +JSR/JSL instructions. They can format nearby bytes as inline data, or +apply symbols to operands.

    To reduce the chances of a script causing problems, all scripts are executed in a sandbox with severely restricted access. Notably, nothing -in the script can access files, except to read those in the PluginDll +in the sandbox can access files, except to read files from the PluginDll directory.

    The PluginDll directory lives next to the SourceGen executable, and contains all of the compiled script DLLs, as well as two pre-built @@ -199,10 +199,9 @@ is launched, but may be manually deleted without harm.

    Analyzer Hints

    -

    Sometimes SourceGen can't automatically find the start or end of a -code area. Maybe there's inline data after a JSR that didn't get -recognized by an extension scripts. These situations can be resolved -by adding an appropriate hint.

    +

    Sometimes SourceGen can't automatically find the start or end of an +instruction stream, or gets confused by inline data. These situations +can be resolved by adding an appropriate hint.

    Code entry point hints tell the analyzer to add the offset to the list of instruction start points. Suppose you've got a code @@ -247,9 +246,9 @@ end up with this:

              .ORG    $1000
              JMP     L1009
    -         JMP ⏩   L10ef
    -         BPL ⏩   L1053
    -         JMP ⏩   L1230
    +         JMP ⏩  L10ef
    +         BPL ⏩  L1053
    +         JMP ⏩  L1230
              BMI     L101b
     L1009    CLC
     
    @@ -276,7 +275,7 @@ would actually be better solved by setting a status flag override on the BNE that sets Z=0, so the code tracer will know it's a branch-always and do the right thing.) It's only necessary to place a hint on the very first (opcode) byte. Placing a data hint in the middle of what -SourceGen believes is an instruction will have no effect.

    +SourceGen believes to be an instruction will have no effect.

    Inline data hints identify bytes as being part of the instruction stream, but not instructions. A simple example of this @@ -285,11 +284,13 @@ is the ProDOS 8 call interface on the Apple II, which looks like this:

    JSR $bf00 .DD1 $function .DD2 $address + BCS BAD -

    The three bytes following a JSR to $bf00 should be skipped over by -the code analyzer. In this case, all three bytes must be hinted.

    -

    If code jumps into a region that is marked as inline data, the +

    The three bytes following the JSR $bf00 should be hinted +as inline data, so that the code analyzer skips them and continues the +analysis at the BCS.

    +

    If code branches into a region that is marked as inline data, the branch will be ignored.

    @@ -303,9 +304,9 @@ of the work being disassembled. (This will vary by region. Also, note that the mere act of disassembling a piece of software may be illegal in some cases.)

    -

    To avoid mix-ups, the data file's length and CRC are stored in the -project file. SourceGen will refuse to open a project if the data file's -length and CRC don't match.

    +

    To avoid mix-ups where the wrong data file is used, the file's length +and CRC are stored in the project file. SourceGen will refuse to open a +project if the data file's length and CRC don't match.
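
    A check of this kind might look like the following Python sketch. The text does not say which CRC variant SourceGen uses; the sketch assumes CRC-32 as computed by `zlib.crc32`, and the function name is hypothetical.

```python
import zlib


def data_file_matches(data: bytes, expected_length: int, expected_crc: int) -> bool:
    """Return True if the data file has the length and CRC recorded in the project.

    Assumption: the recorded CRC is a standard CRC-32 (zlib.crc32).
    """
    if len(data) != expected_length:
        return False                 # cheap test first
    return zlib.crc32(data) & 0xFFFFFFFF == expected_crc


original = bytes([0x4C, 0x04, 0x10, 0xEA, 0x60])
stored_length = len(original)
stored_crc = zlib.crc32(original) & 0xFFFFFFFF

same_file = data_file_matches(original, stored_length, stored_crc)
wrong_file = data_file_matches(bytes([0x4C, 0x04, 0x10, 0xEA, 0x00]), stored_length, stored_crc)
```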

    Most of the data in the project file is associated with a file offset. When you create a comment, you aren't associating it with line 53, you're @@ -317,14 +318,20 @@ convention, file offsets are always shown as a six-digit hexadecimal value with a leading '+', e.g. "+0012ab". This makes it easy to distinguish between an address and an offset.
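
    The offset notation is simple enough to express in a couple of lines. A minimal Python sketch, with hypothetical helper names:

```python
def format_offset(offset: int) -> str:
    """Render a file offset as '+' followed by six lower-case hex digits."""
    return f"+{offset:06x}"


def parse_offset(text: str) -> int:
    """Parse the '+xxxxxx' form back into an integer offset."""
    if not text.startswith("+"):
        raise ValueError("offsets start with '+'")
    return int(text[1:], 16)


shown = format_offset(0x12AB)
```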

    +

    Instruction and data operands can be formatted in various ways. The +formatting choice is associated with the first offset of the item. For +instructions the number of bytes in the operand is determined by the opcode +(and, on the 65816, the M/X status flags). For data items the length +can be a single byte or an entire file. Operand formats are not allowed +to overlap.

    +

    When an instruction or data operand references an address, we call it a numeric reference. When the target address has a label, and the operand uses that symbol, we call that a symbolic reference. SourceGen tries to establish symbolic references whenever possible, so that the generated assembly source doesn't refer to hard-coded -locations within the program.

    -

    Data operands can also be numeric references. From the Edit Data -dialog, select the "Address" format.

    +locations within the program. Labels are generated automatically for +the targets of numeric references.

    As your understanding of the disassembled code develops, you will want to add comments explaining it. SourceGen projects have three kinds of @@ -339,32 +346,38 @@ comments:

    are a way for you to leave notes to yourself, perhaps "don't forget to figure this out" or "this is the cool part". -

    Each offset can have one of each.

    +

    Every file offset can have one of each.

    Labels and comments may disappear if you associate them with a file offset that is in the middle of a multi-byte instruction or data item. For example, suppose you put a long comment at offset +000010, and then mark a 50-byte region starting at offset +000008 as an ASCII string. The comment won't be deleted, but won't be displayed either. The same thing -happens to labels.

    +can happen to labels. SourceGen will try to prevent this from happening +by splitting formatted data into sub-regions at label boundaries.

    All About Symbols

    -

    A symbol has two parts, a label and a value. The value may be an -address or a numeric constant. Symbols can be defined in different ways, -and applied in different ways.

    +

    A symbol has two parts, a label and a value. The label is a short +ASCII string; the value may be an 8-to-24-bit address or a numeric +constant. Symbols can be defined in different ways, and applied in +different ways.

    -

    The label format is restricted:

    +

    The label syntax is restricted to a format that should be compatible +with most assemblers:

    +

    Label comparisons are case-sensitive, as is customary for programming +languages.

    -

    Platform symbols are defined in platform symbol files, which -have a ".sym65" filename extension. Several come with SourceGen and -live in the RuntimeData directory. You can also create your +

    Platform symbols are defined in platform symbol files. These +are named with a ".sym65" extension, and have a fairly straightforward +name/value syntax. Several files for popular platforms come with SourceGen +and live in the RuntimeData directory. You can also create your own, but they have to live in the same directory as the project file.

    Platform symbols can be addresses or constants. If an instruction @@ -384,7 +397,7 @@ creating two symbols with the same name. If two symbols have the same value, the one whose label comes first alphabetically is used.

    Project symbols always have precedence over platform symbols, allowing -you to redefine symbols within a project. (You can "block" a platform +you to redefine symbols within a project. (You can "hide" a platform symbol by creating a project symbol with the same name and an unused value, such as $ffffffff.)

    @@ -400,8 +413,8 @@ instructions or data offsets that are the target of operands. They're formed by appending the hexadecimal address to the letter "L", with additional characters added if some other symbol has already defined that label. Auto labels are only added where they are needed. Because -auto labels may be redefined at any time, the editor will try to prevent -you from using them in operands.

    +auto labels may be redefined or disappear, the editor will try to prevent +you from referring to them when editing operands.

    Operands may use parts of symbols. For example, if you have a label MYSTRING, you can write:

    @@ -414,7 +427,7 @@ MYSTRING .STR "hello"

    The format editor allows you to choose which part of the symbol's -value to use. If the value doesn't match exactly, and adjustment will +value to use. If the value doesn't match exactly, an adjustment will be applied.

    Weak References

    @@ -451,9 +464,9 @@ results are probably not what you want:

    This happened because you added a weak reference to "FOO" in the operand, -but the label doesn't exist. The operand is formatted as hex. This also -means that there's no longer a need for an auto label on the NOP instruction, -so SourceGen removed that as well.

    +but the label doesn't exist. The operand is formatted as hex. Because +there's no longer a reference to L1003, SourceGen removed the auto-label +as well.

    If you set the label "FOO" on the NOP instruction, you'll see what you probably wanted:

    @@ -518,7 +531,9 @@ and jumps to it with the RTS instruction. However, RTS requires the address of the byte before the target instruction, so we actually push $1006.

    -

    After adding a code hint at $1007, the project looks like this:

    +

    The disassembler won't know that address $1007 is code because nothing +appears to reference it. After adding a code hint at $1007, the project +looks like this:

              LDA     #$10
              PHA
    diff --git a/SourceGen/RuntimeData/Help/mainwin.html b/SourceGen/RuntimeData/Help/mainwin.html
    index b302985..3184f37 100644
    --- a/SourceGen/RuntimeData/Help/mainwin.html
    +++ b/SourceGen/RuntimeData/Help/mainwin.html
    @@ -31,7 +31,7 @@ incomplete.  The maximum size for a data file is currently 1 MiB.

    The first time you save the project (with File > Save), you will be prompted for the project name. It's best to use the data file's name -with ".dis65" added. This will be configured automatically. The data +with ".dis65" added, so this will be set as the default. The data file's name is not stored in the project file, so if you pick a different name, or save the project in a different directory, you will have to select the data file manually whenever you open the project.

    @@ -58,7 +58,7 @@ to cancel the loading of the project.

    The locations of the last few projects you've worked with are saved in the application settings. You can access them from File > Recent Projects. If no project is open, links to the two -most-recently opened projects will be available.

    +most-recently-opened projects will be available.

    Working With a Project

    @@ -70,7 +70,7 @@ most-recently opened projects will be available.

  • Top left: cross-reference list.
  • Bottom left: notes list.
  • Top right: symbols list. -
  • Bottom right: line info. +
  • Bottom right: info on selected line.

    Most of the action takes place in the center code list.

    @@ -94,10 +94,12 @@ assembler directive.

    correspond to the instruction or data. To see the full dump of a longer item, such as an ASCII string, double-click on the field to open the - Hex Dump Viewer. (Note this is - a floating window, so you can keep it open while you work.)
  • + Hex Dump Viewer. This is + a floating window, so you can keep it open while you work. + Double-clicking in the bytes column in other rows will update + the window position and selection.
  • Flags. This shows the state of the status flags as they - were before the instruction was executed. Double-click on this + are before the instruction is executed. Double-click on this field to open the Edit Status Flag Override dialog.
  • Attributes. Some instructions and data items have @@ -115,8 +117,8 @@ assembler directive.

    If an instruction is embedded inside this one, a ⏩ symbol will appear. If you double-click this field for an instruction or data item - whose operand refers to an address in the file, the view will jump to - that location.
  • + whose operand refers to an address in the file, the selection will + jump to that location.
  • Operand. The instruction or data operand. Data operands may span a large number of bytes. Double-click on this field to open the @@ -177,7 +179,7 @@ enabled will depend on what you have selected in the main window.

    when a single equate directive, generated from a project symbol, is selected.
  • -
  • Hinting (Hint As Code Entry Point, Hint As +
  • Hinting (Hint As Code Entry Point, Hint As Data Start, Hint As Inline Data, Remove Hints). Enabled when one or more code and data lines are selected. Remove Hints is only enabled when at least one line has hints.
  • @@ -187,7 +189,8 @@ enabled will depend on what you have selected in the main window.

  • Delete Note / Long Comment. Deletes the selected note or long comment. Enabled when a single note or long comment is selected.
  • Show Hex Dump. Opens the hex dump - viewer with the current selection highlighted. Always enabled.
  • + viewer, with the current selection highlighted. Always enabled. If + nothing is selected, the viewer will open at the top of the file. @@ -199,8 +202,8 @@ change with Edit > Redo, Ctrl+Y, or Ctrl+Shift+Z.

    are added to the undo/redo buffer. This has no fixed size limit, so no matter how much you change, you can always undo back to the point where the project was opened.

    -

    The undo buffer is not saved as part of the project, so closing and -reopening the project resets the buffer.

    +

    The undo history is not saved as part of the project. Closing a project +clears the buffer.

    References Window

    @@ -264,7 +267,9 @@ Use Edit > Find Next to find the next match.

    Use Edit > Go To to jump to an offset, address, or label. Remember that offsets and addresses are always hexadecimal, and offsets start -with a '+'.

    +with a '+'. If you have a label that is also a valid hexadecimal +address, like "FEED", the label takes precedence. To jump to the address, +write "$FEED" instead.
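    The precedence rule for Go To can be sketched as follows. This is a minimal illustration in Python, not SourceGen's actual parser: the function name resolve_goto, the labels dictionary, and the returned tuples are all hypothetical, and the real implementation may handle additional input forms.

```python
def resolve_goto(text, labels):
    """Hypothetical resolver for a Go To entry (not SourceGen's real code).

    labels maps case-sensitive label strings to addresses.
    Returns a (kind, value) tuple.
    """
    text = text.strip()
    # A matching label wins, even if the text is also valid hex ("FEED").
    if text in labels:
        return ("label", labels[text])
    # Offsets start with '+' and are always hexadecimal.
    if text.startswith("+"):
        return ("offset", int(text[1:], 16))
    # A leading '$' forces the text to be read as an address.
    if text.startswith("$"):
        return ("address", int(text[1:], 16))
    # Anything else is treated as a hexadecimal address.
    return ("address", int(text, 16))
```

    With a project that defines a label "FEED", entering "FEED" resolves to the label, while "$FEED" resolves to address $FEED.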

    When you jump around, by double-clicking on an opcode or an entry in one of the side windows, the currently-selected line is added to @@ -291,6 +296,17 @@ entirely from the project properties editor. +

    Quick Format Toggle

    + +

    The "Toggle Single-Byte Format" feature provides a quick way to +change a range of bytes to single bytes +or back to their default format. It's equivalent to opening the Edit +Data Format dialog and selecting "Single bytes" displayed as hex, or +selecting "Default".

    +

    This can be handy if the default format for a range of bytes is a +string, but you want to see it as bytes or set a label in the middle.

    + +

    Copying to Clipboard

    When you use Edit > Copy, all lines selected in the code list are @@ -298,14 +314,16 @@ copied to the system clipboard. This can be a convenient way to post code snippets into forum postings or documentation. The text is copied from the data shown on screen, so your chosen capitalization and pseudo-ops will appear in the copy.

    -

    A copy of all of the fields is also written to the clipboard, in -CSV format. If you open a program like Excel, you can use Paste Special -to put the data into individual cells.

    +

    Long comments are included, but notes are not.

    +

    By default, the label, opcode, operand, and comment fields are included. +From the +app settings dialog you can select +a different format, "Disassembly", which also includes the address and byte +columns.

    -

    By default, the label, opcode, operand, and comment fields are included -in the text form. From the -app settings you can select -a different format that also includes the address and byte columns.

    +

    A copy of all of the fields is also written to the clipboard in CSV +format. If you have a spreadsheet like Excel, you can use Paste Special +to put the data into individual cells.

    diff --git a/SourceGen/RuntimeData/Help/settings.html b/SourceGen/RuntimeData/Help/settings.html index 1f6f21f..06b8d09 100644 --- a/SourceGen/RuntimeData/Help/settings.html +++ b/SourceGen/RuntimeData/Help/settings.html @@ -21,15 +21,15 @@ project properties.

    Application settings are stored in a file called "SourceGen-settings" in the SourceGen installation directory. If the file is missing or corrupted, some default settings will be used. These settings are local -to your system, and include everything from window sizes to whether you -prefer hexadecimal values to be shown in upper case. None of them +to your system, and include everything from window sizes to whether or not +you prefer hexadecimal values to be shown in upper case. None of them affect the way the project analyzes code and data, though they may affect the way generated assembly sources look.

    Project properties are stored in each individual .dis65 project file. They specify which CPU to use, which extension scripts to load, and a variety of other things that directly impact how SourceGen processes -the project. Because of the way it impacts the project, all changes to +the project. Because of the potential impact, all changes to the project properties are made through the undo/redo buffer.

    @@ -50,7 +50,7 @@ hide columns from the code list. The buttons may be more convenient though.

    You can select a different font for the code list. Make it as large -or small as you want. Monospace fonts like Courier or Consolas are +or small as you want. Mono-space fonts like Courier or Consolas are recommended.

    You can choose to display different parts of the display in upper or @@ -147,8 +147,8 @@ you later hit Cancel, but the changes are not applied immediately.

    The choice of CPU determines the set of available instructions, as well as cycle costs and register widths. There are many variations -on the 6502, but from the perspective of a disassembler only three -matter: +on the 6502, but from the perspective of a disassembler most can be +treated as one of these three:

    1. MOS 6502. The original 8-bit instruction set.
    2. WDC W65C02S. Expanded the instruction set and smoothed @@ -156,9 +156,9 @@ matter:
    3. WDC W65C816S. Expanded instruction set, 24-bit address space, and 16-bit registers.
    -

    The Rockwell R65C02 features an expanded instruction set that is -compatible with the WDC 65C02 but incompatible with the 65816. It's -not currently supported by SourceGen.

    +

    The Rockwell R65C02, Hudson Soft HuC6280, and Commodore CSG 4510 / 65CE02 +have instruction sets that expand on the 6502/65C02, but aren't compatible +with the 65816. These are not yet supported by SourceGen.

    If "enable undocumented instructions" is checked, some additional opcodes are recognized on the 6502 and 65C02. These instructions are @@ -198,14 +198,18 @@ create two symbols with the same label.

    The Import button allows you to import symbols from another project. Only labels that have been tagged as global and exported will be imported. Existing symbols with identical labels will be replaced, so it's okay to -run the importer multiple times.

    +run the importer multiple times. Labels that aren't found will not be +removed, so you can safely import from multiple projects, but will need +to manually delete any symbols that are no longer being exported.

    Symbol Files

    From here, you can add and remove platform symbol files, or change the order in which they are loaded. See the symbols section for an -explanation of how platform symbols work.

    +explanation of how platform symbols work. +See "README.md" in the RuntimeData directory for a description of the +file syntax.

    Platform symbol files must live in the RuntimeData directory that comes with SourceGen, or in the directory where the project file lives. This @@ -222,7 +226,9 @@ you will receive a warning.

    Extension Scripts

    From here, you can add and remove extension script files. See the extension scripts section for -an explanation of how extension scripts work.

    +an overview of how extension scripts work. +There's a more detailed document in the RuntimeData directory +("ExtensionScripts.md").

    Extension script files must live in the RuntimeData directory that comes diff --git a/SourceGen/RuntimeData/Help/tools.html b/SourceGen/RuntimeData/Help/tools.html index 09b153e..cc3b136 100644 --- a/SourceGen/RuntimeData/Help/tools.html +++ b/SourceGen/RuntimeData/Help/tools.html @@ -46,7 +46,7 @@ pasting in some situations.

    If "always on top" is checked, the window will stay above all other windows that don't also declare that they should always be on top. By -default this box is checked for the project dump, and not checked for +default this box is checked when displaying project data, and not checked for external files.

    diff --git a/SourceGen/RuntimeData/Help/tutorials.html b/SourceGen/RuntimeData/Help/tutorials.html index bf52f5d..9dec157 100644 --- a/SourceGen/RuntimeData/Help/tutorials.html +++ b/SourceGen/RuntimeData/Help/tutorials.html @@ -70,15 +70,18 @@ these distracting, collapse the column.

    Click on the fourth line down, which has address 1002. The line has a label, "L1002", and is performing an indexed load from L1017. Both of these labels were automatically generated, and are named for the -address they appear. When you clicked on the line, a few things happened:

    +address at which they appear. When you clicked on the line, a few +things happened:

    Click some other lines, such as address $100B and $1014. Note how the @@ -91,17 +94,17 @@ the operand itself opens a format editor; more on that later.)

    References window. Note the selection jumps to L1002. You can immediately jump to any reference.

    At the top of the Symbols window on the right side of the screen is a -row of buttons. Make sure "Auto" is highlighted. You should see three +row of buttons. Make sure "Auto" is selected. You should see three labels in the window (L1002, L1014, L1017). Double-click on L1014. The selection jumps to the appropriate line.

    Select Edit > Find. Type "hello", and hit Enter. The selection will move to address $100E, which is a string that says "hello!". You can use Edit > Find Next to try to find the next occurrence (there isn't one). You -can search for text that appears in the rightmost columns (label, opcode, +can search for any text that appears in the rightmost columns (label, opcode, operand, comment).

    Select Edit > Go To. You can enter a label, address, or file offset. -Enter "100b" to jump the selection to $100B.

    +Enter "100b" to set the selection to $100B.

    Near the top-left of the SourceGen window is a set of toolbar icons. Click the left-arrow, and watch the selection move. Click it again. Then @@ -118,21 +121,21 @@ something like "6502bench SourceGen vX.Y.Z". There are three ways to open the comment editor:

    1. Select Actions > Edit Long Comment from the menu bar.
    2. -
    3. Right click, and select Actions > Edit Long Comment from the - pop-up menu. (The menus area exactly the same.)
    4. +
    5. Right click, and select Edit Long Comment from the + pop-up menu. (This menu is exactly the same as the Actions menu.)
    6. Double-click the comment

    Most things in the code list will respond to a double-click. Double-clicking on addresses, flags, labels, operands, and comments will open editors for those things. Double-clicking on a value in the "bytes" column will open a floating hex dump viewer. This is usually the most -convenient way to edit something.

    +convenient way to edit something: point and click.

    Double-click the comment to open the editor. Type some words into the upper window, and note that a formatted version appears in the bottom window. Experiment with the maximum line width and "render in box" settings to see what they do. You can hit Enter to create line breaks, or let SourceGen wrap lines for you. When you're done, click OK. (Or -hit Ctrl-Enter.

    +hit Ctrl+Enter.)

    When the dialog closes, you'll see your new comment in place at the top of the file. If you typed enough words, your comment will span multiple lines. You can select the comment by selecting any line in it.

    @@ -151,15 +154,17 @@ differences:

    1. You can't pick their line width, but you can pick their color.
    2. They don't appear in generated assembly sources, making them - useful for leaving notes to yourself.
    3. + useful for leaving notes to yourself as you work.
    4. They're listed in the Notes window. Double-clicking them jumps the selection to the note, making them useful as bookmarks.
    -

    It's time to do something with the code. It's copying the instructions -from $1017 to $2000, then jumping to $2000, so it looks like it's -relocating the code before executing it. We want to do the same thing -to our disassembled code, so select the line at address $1017 and then +

    It's time to do something with the code. If you look at what the code +does you'll see that it's copying several dozen bytes from $1017 +to $2000, then jumping to $2000. It appears to be relocating the next +part of the code before +executing it. We want to let the disassembler know what's going on, so +select the line at address $1017 and then Edit > Edit Address. (Or double-click the "1017" in the addr column.) In the Edit Address dialog, type "2000", and hit Enter.

    @@ -178,8 +183,8 @@ so you'll be forgiven if you reduce the offset column width to zero.)

    On the line at address $2000, select Actions > Edit Label, or double-click on the label "L2000". Change the label to "MAIN", and hit Enter. The label changes on that line, and on the two lines that refer -to address $2000. (If you're not sure what refers to line $2000, check -the References window.)

    +to address $2000. (If you're not sure what refers to line $2000, select +it and check the References window.)

    On that same line, select Actions > Edit Comment. Type a short comment, and hit Enter. Your comment appears in the "comment" column.

    @@ -215,12 +220,12 @@ Actions > Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are case-sensitive, so it needs to match the operand at $2005 exactly.) You'll see the new label appear, and the operand at line $2005 will use it.

    There's an easier way. Use Edit > Undo twice, to get back to the place -where line $2005 is using "L2009" as it's operand. Select that line and +where line $2005 is using "L2009" as its operand. Select that line and Actions > Edit Operand. Enter "IS_OK", then select "Create label at target address instead". Hit "OK".

    You should now see that both the operand at $2005 and the label at $2009 have changed to IS_OK, accomplishing what we wanted to do in a -single step. (There's actually a sutble difference compared to the two-step +single step. (There's actually a subtle difference compared to the two-step process: the operand at $2005 is still a numeric reference. It was automatically changed to match IS_OK in the same way that the references to MAIN were when we renamed "L2000" earlier. If you actually do want the @@ -248,7 +253,7 @@ label to "STR1". Move up a bit and select address $2030, then scroll to the bottom and shift-click address $2070. Select Actions > Edit Data Format. At the top it should now say, "65 bytes selected in 2 groups". There are two groups because the presence of a label split the data into -two separate regions. Selected "mixed ASCII and non-ASCII", then click +two separate regions. Select "mixed ASCII and non-ASCII", then click "OK".

    We now have two ".STR" lines, one for "string zero ", one with the STR1 label and the rest of the string data. This is okay, but it's not @@ -260,8 +265,8 @@ a single ".STR" line at the bottom, split across two lines with a '+'.

    but that appears to be incorrect, so let's format it as individual bytes instead. There's an easy way to do that: use Actions > Toggle Single-Byte Format (or hit Ctrl+B).

    -

    The data starting at $2025 appears to be 16-bit addresses into the -table of strings, so let's format them appropriately.

    +

    The data starting at $2025 appears to be 16-bit addresses that point +into the table of strings, so let's format them appropriately.

    Select the line at $2025, then shift-click the line at $202E. Select Actions > Edit Data Format. If you selected the correct set of bytes, the top should say, "10 bytes selected". Click the @@ -277,7 +282,7 @@ on their own line, so each string is now in a separate ".STR" statement.

    Generating Assembly Code

    -

    You can generate asssembly source code from the disassembled data. +

    You can generate assembly source code from the disassembled data. Select File > Assembler (or hit Ctrl+Shift+A) to open the generation and assembly dialog.

    Pick your favorite assembler from the drop list at the top right,