6502bench SourceGen: Using SourceGen
Starting a New Project
Select File > New, or if no project is open, click "Start new project". This opens the Create New Project window.
Start by selecting your target system from the tree on the left. The panel on the right will show the CPU that will be selected, as well as the symbol files and extension scripts that will be loaded by default. All of these may be overridden later from the project properties. (If the description in the panel on the right says "[placeholder]", it means that the system doesn't yet have a set of symbols defined for it.)
Next, click the "Select File..." button and pick the file you wish to disassemble. The dialog will update with the pathname and some notes about the file's size. If all looks good, click "OK" to create the project.
NOTE: Support for very large 65816 programs is incomplete. Data files are limited to a maximum size of 1 MiB.
The first time you save the project (with File > Save), you will be prompted for the project name. It's best to use the data file's name with ".dis65" added, so this will be set as the default. The data file's name is not stored in the project file, so if you pick a different name, or save the project in a different directory, you will have to select the data file manually whenever you open the project.
Opening an Existing Project
Select File > Open, or if no project is open, click "Open existing project". Select the .dis65 project file from the standard file dialog.
SourceGen will try to open a data file with the project's name, minus the ".dis65". If it can't find a file with that name, or if there's something wrong with it (e.g. the CRC doesn't match), you will be given the opportunity to specify the location of the data file to use.
If non-fatal problems with the file are detected, a warning will be shown. If it's something simple, like a missing .sym65 or extension script file, you'll be notified. If it's something more complicated, e.g. the project has a comment on an offset that doesn't exist, you will be warned that the problematic data has been deleted, and will be lost if the project is saved. By default, such a project will be opened in read-only mode, though you can override this in the dialog. You will also be given the opportunity to simply cancel loading the project.
The locations of the last few projects you've worked with are saved in the application settings. You can access them from File > Recent Projects. If no project is open, links to the two most-recently-opened projects will be available.
Working With a Project
The main project window is divided into five areas:
- Center: the code list. If no project is open, this will instead have buttons to open a new or existing project.
- Top left: cross-reference list.
- Bottom left: notes list.
- Top right: symbols list.
- Bottom right: info on selected line.
Most actions are performed in the center code list. All of the sub-windows can be resized. The window sizes and column widths are saved in the application settings file.
A toolbar near the top of the screen has some shortcut buttons. If you hover your mouse over them, a tooltip with an explanation will appear.
Code List
The code list provides a view of the code being disassembled. Each line may be an instruction, data item, long comment, note, or assembler directive.
The list is divided into columns:
- Offset. The offset within the file where the instruction or data item starts. Throughout the UI, file offsets are shown as six-digit hex values with a leading '+'.
- Address. The address where the assembled code will execute. For 8-bit CPUs this is shown as a 4-digit hex number; for 16-bit CPUs the bank is shown as well. Double-click on this field to open the Edit Address dialog.
- Bytes. Shows up to four bytes from the data file that correspond to the instruction or data. To see the full dump of a longer item, such as an ASCII string, double-click on the field to open the Hex Dump Viewer. This is a floating window, so you can keep it open while you work. Double-clicking in the bytes column while the window is open will update the viewer's position and selection.
- Flags. This shows the state of the status flags as they are before the instruction is executed. Double-click on this field to open the Edit Status Flag Override dialog.
- Attributes. Some instructions and data items have interesting attributes. '@' indicates an entry point, 'T' means one or more bytes have an analyzer tag (code start/stop/skip), '#' means execution will not continue to the following instruction, '>' is shown for branch targets, and '!' appears when a conditional branch is never taken. (This column is rarely useful and can be hidden.)
- Label. If a label has been defined for this offset, by the user or generated automatically, it will appear here. Also, full-line items like long comments and notes will start in this field. Double-click on this field to open the Edit Label dialog.
- Opcode. The instruction or pseudo-opcode mnemonic. If an instruction is embedded inside this one, a ▼ symbol will appear. If you double-click this field for an instruction or data item whose operand refers to an address in the file, the selection will jump to that location. If the operand is a local variable, the selection will jump to the point where the variable was defined.
- Operand. The instruction or data operand. Data operands may span a large number of bytes. Double-click on this field to open the Edit Instruction Operand or Edit Data Operand dialog, as appropriate. (Note you can shift-double-click on data items to edit multiple lines.)
- Comment. End-of-line comment, generally shown with a ';' prefix. If enabled, cycle counts will appear here. Double-click on this field to open the Edit Comment dialog.
Double-clicking anywhere on a line with a note or long comment will open the Edit Note or Edit Long Comment dialog, respectively.
The code list is a standard Windows list view. You can left-click to select an item, ctrl-left-click to toggle individual items on and off, and shift-left-click to select a range. You can select all lines with Edit > Select All. Resize columns by left-clicking on the divider in the header and dragging it.
Selecting any part of a multi-line item, such as a long comment or character string, effectively selects the entire item.
Right-clicking opens a menu. The contents are the same as those in the Actions menu item in the menu bar. The set of options that are enabled will depend on what you have selected in the main window.
- Edit Operand. Opens the Edit Instruction Operand or Edit Data Operand window, depending on what's selected. Enabled when a single instruction line is selected, or when one or more data lines are selected.
- Edit Label. Sets the label at that offset. Enabled when a single instruction or data line is selected.
- Edit Comment. Sets the comment at that offset. Enabled when a single instruction or data line is selected.
- Edit Long Comment. Sets the long comment at that offset. Enabled when a single instruction or data line, or an existing long comment, is selected.
- Edit Note. Sets the note at that offset. Enabled when a single instruction or data line, or an existing note, is selected.
- Define Address Region. Sets the assembly address at the selected offset. Can be used to set a start point with a floating end, or to specify a region with a fixed end point (useful for code that gets relocated). Enabled when the first line selected is code, data, or an address start directive.
- Override Status Flags. Changes the status flags at that offset. Enabled when a single instruction line is selected.
- Edit Project Symbol. Sets the name, value, and comment of the project symbol. Enabled when a single equate directive, generated from a project symbol, is selected.
- Create Local Variable Table. Creates a new local variable table.
- Edit Prior Local Variable Table. Modify or delete entries in the most recently defined local variable table.
- Create/Edit Visualization Set. Create a new visualization set or edit an existing set.
- Analyzer Tags (Tag Address As Code Start Point, Tag Address As Code Stop Point, Tag Bytes As Inline Data, Remove Analyzer Tags). Enabled when one or more code and data lines are selected. Remove Analyzer Tags is only enabled when at least one line has tags. The keyboard shortcuts are two-key combinations.
- Format Address Table. Formats a series of bytes as parts of a table of addresses.
- Toggle Single-Byte Format. Toggles a range of lines between default format and single-byte format. Enabled when one or more data lines are selected.
- Format As Word. Formats two bytes as a 16-bit little-endian word.
- Remove Formatting. Reverts instruction and data operand formats to the default. Clears embedded labels.
- Delete Note / Long Comment. Deletes the selected note or long comment. Enabled when a single note or long comment is selected.
- Show Hex Dump. Opens the hex dump viewer, with the current selection highlighted. Always enabled. If nothing is selected, the viewer will open at the top of the file.
Undo & Redo
You can undo a change with Edit > Undo, or Ctrl+Z. You can redo a change with Edit > Redo, Ctrl+Y, or Ctrl+Shift+Z.
All changes to the project, including changes to the project properties, are added to the undo/redo buffer. This has no fixed size limit, so no matter how much you change, you can always undo back to the point where the project was opened.
The undo history is not saved as part of the project. Closing a project clears it.
References Window
When a single instruction or data line is selected in the main window, all references to that offset will be shown in the References window. For each reference, the file offset, address, and some details about the type of reference will be shown.
The reference type indicates whether the origin is an instruction or data operand, and provides an indication of the nature of the reference:
- call - subroutine call (e.g. JSR addr, JSL addr)
- branch - conditional or unconditional branch (e.g. JMP addr, BCC addr)
- read - read from memory (e.g. LDA addr, BIT addr)
- write - write to memory (e.g. STA addr)
- rmw - read-modify-write (e.g. LSR addr, TSB addr)
- ref - reference to address by instruction (e.g. LDA #<addr, PEA addr)
- data - reference to address by data (e.g. .DD2 addr)
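To illustrate how the categories above relate to instructions, here is a minimal Python sketch that picks a reference type from a mnemonic and its operand text. The mnemonic sets and the function name ref_type are assumptions made for this example. The sets are deliberately incomplete (indirect jumps, for instance, are not treated specially), and this is not SourceGen's own code.

```python
# Hypothetical sketch: choose a reference type from mnemonic and operand text.
# The mnemonic sets are illustrative and incomplete; not SourceGen's code.
CALL = {"JSR", "JSL"}
BRANCH = {"JMP", "JML", "BRA", "BRL", "BCC", "BCS", "BEQ", "BNE",
          "BMI", "BPL", "BVC", "BVS"}
READ = {"LDA", "LDX", "LDY", "BIT", "CMP", "CPX", "CPY",
        "ADC", "SBC", "AND", "ORA", "EOR"}
WRITE = {"STA", "STX", "STY", "STZ"}
RMW = {"ASL", "LSR", "ROL", "ROR", "INC", "DEC", "TSB", "TRB"}
PUSH_ADDR = {"PEA"}  # pushes the operand value instead of accessing memory


def ref_type(mnemonic: str, operand: str) -> str:
    """Return call/branch/read/write/rmw/ref/data for one operand reference."""
    m = mnemonic.upper()
    if m.startswith("."):
        return "data"  # data directive, e.g. .DD2 addr
    if operand.startswith("#") or m in PUSH_ADDR:
        return "ref"   # address used as a value, e.g. LDA #<addr
    for kind, names in (("call", CALL), ("branch", BRANCH), ("read", READ),
                        ("write", WRITE), ("rmw", RMW)):
        if m in names:
            return kind
    raise ValueError(f"mnemonic not covered by this sketch: {mnemonic}")


for op, arg in [("JSR", "addr"), ("BCC", "addr"), ("LDA", "#<addr"),
                ("STA", "addr"), ("TSB", "addr"), (".DD2", "addr")]:
    print(op, arg, "->", ref_type(op, arg))
```

Note that the immediate-operand test runs before the mnemonic lookup, so LDA #<addr is reported as "ref" rather than "read".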
References from instructions that use indexed addressing (e.g. LDA addr,Y) will also show "idx" to indicate that the instruction is using the location as a base address.
References from instructions that treat the address as a pointer (e.g. LDA (dp),Y) will show "ptr". This makes it easy to distinguish the locations that are reading or writing through the pointer from those that are reading or writing the pointer itself.
This will be prefixed with "Sym" or "Oth" to indicate whether or not the reference used the label at the current address. To understand this, consider that addresses can be referenced in different ways. For example:
         LDA   DATA0
         LDX   DATA0+1
         RTS
DATA0    .DD1  $80
DATA1    .DD2  $90
Both DATA0 and DATA1 are accessed, but both operands used DATA0. When the DATA0 line is selected in the code list, the References window will show the LDA and LDX instructions, because both instructions referenced it. When DATA1 is selected, the References window will show the LDX, because that instruction accessed DATA1's location even though it didn't use the symbol. To make the difference clear, the lines in the References window will show either "Sym" (to indicate that the symbol at the selected line was referenced) or "Oth" (to indicate that some other symbol, or no symbol, was used).
When an equate directive (generated for platform and project symbols) or local variable assignment is selected, the References window will show all references to that symbol. Unlike in-file references, only the uses of that symbol are shown. So if you have both a project symbol and a local variable for address $30, they will show disjoint sets of references. Furthermore, if you explicitly format an instruction operand as hex, e.g. LDA $30, it will not appear in either set because it's not a symbolic reference.
The cross-reference data is used to generate the set of equate directives at the top of the listing. If nothing references a platform or project symbol, an equate directive will not be generated for it.
Double-clicking on a reference moves the code list selection to that reference, and adds the previous selection to the navigation stack.
Notes Window
When you add a note, it will also be added to this window. Double-clicking on a note will jump directly to it, and add the previous selection to the navigation stack. This makes notes useful as bookmarks.
Symbols Window
All known symbols are shown here. The filter buttons allow you to screen out symbols you're not interested in, such as platform symbols or constants.
Clicking on one of the column headers will sort the list on that field. Click a second time to reverse the sort direction.
Double-clicking on an auto or user label will jump to that label, and add the previous selection to the navigation stack. This can be a handy way to move around the file, jumping from label to label.
The "type" column uses a two-letter code to identify the symbol's type and scope. The first letter is one of A (auto), U (user), P (platform), J (project), R (pre-label), or V (variable). The second letter is one of N (non-unique local), L (local), G (global), X (exported), E (external), or C (constant).
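The two-letter type code can be decoded mechanically. This Python sketch maps each letter to the meaning listed above; the function name decode_symbol_type is hypothetical and is only meant as a quick reference for reading the column.

```python
# Hypothetical decoder for the Symbols window "type" column.
SOURCE = {"A": "auto", "U": "user", "P": "platform", "J": "project",
          "R": "pre-label", "V": "variable"}
SCOPE = {"N": "non-unique local", "L": "local", "G": "global",
         "X": "exported", "E": "external", "C": "constant"}


def decode_symbol_type(code: str) -> tuple:
    """Split a code like "UG" into (source, scope), e.g. ("user", "global")."""
    if len(code) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    try:
        return SOURCE[code[0].upper()], SCOPE[code[1].upper()]
    except KeyError:
        raise ValueError(f"unrecognized symbol type code: {code!r}") from None


print(decode_symbol_type("UG"))  # ('user', 'global')
print(decode_symbol_type("PE"))  # ('platform', 'external')
```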
Info Window
Some additional information about the currently-selected line is shown, such as the formatting applied to the operand. If the operand has a default format, any automatically-generated format will be noted. For an instruction, a summary is shown that includes the cycle count, flags affected, and a brief description of what the instruction does. The latter can be especially handy for undocumented instructions.
Messages Window
Sometimes a change will invalidate an earlier change. For example, suppose you add a code stop point, and format the data that follows as a string. Later on you change it to a code start point. You now have a block of executable code with a string format record sitting in the middle of it. SourceGen tries very hard not to throw away anything you've done, but it will ignore anything invalid.
If a problem like this is encountered, an entry is added to a list of messages displayed at the bottom of the main window. Each entry identifies the nature of the problem, the severity of the problem, the offset where it occurred, and what was done to resolve it. The problem categories include:
- Hidden label: a label placed on code or data is now stuck in the middle of a multi-byte instruction or data item.
- Hidden local variable table: a local variable table has been placed in the middle of a multi-byte item.
- Hidden visualization: a visualization set has been placed in the middle of a multi-byte item.
- Unresolved weak ref: a reference to a non-existent symbol was found.
- Non-addressable label reference: a code or data operand has a reference to a label defined in a non-addressable section of the file. (The generated code will likely fail to assemble.)
- Invalid offset or length: the offset or length in a format object had an invalid value.
- Invalid descriptor: the format descriptor is inappropriate, e.g. formatting an instruction as a string.
- Bank overrun: the generated code would run past address $ffff. The handling of this in generated code is assembler-dependent.
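The bank-overrun condition can be written as a simple check. This sketch assumes an item described by its starting address (possibly 24-bit) and its length in bytes, and interprets "past address $ffff" as running beyond the last address of the 64 KiB bank the item starts in. The function name overruns_bank is hypothetical; this is not SourceGen's code.

```python
def overruns_bank(address: int, length: int) -> bool:
    """Hypothetical check: True if `length` bytes starting at `address` would
    run past the last address ($xxFFFF) of the 64 KiB bank they start in."""
    if length < 1:
        raise ValueError("length must be at least 1")
    offset_in_bank = address & 0xFFFF  # position within the current bank
    return offset_in_bank + length > 0x10000


# A 3-byte instruction at $FFFE would need bytes at $FFFE, $FFFF, and $10000.
print(overruns_bank(0xFFFE, 3))  # True
```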
The "context" column will provide additional detail about the problem, and the "resolution" column will indicate how it's being handled. In most cases, the offending item will be ignored.
Double-clicking on an entry will jump to that offset.
The message list will not appear if there are no messages. You can hide the list by clicking on the "Hide" button to the left of the messages. Un-hide the list by clicking on the "N messages" button at the bottom-right corner of the application window.
Navigation
The simplest way to move through the code list is with the scroll wheel on your mouse, or by left-clicking and dragging the scroll bar. You can also use PgUp/PgDn and the arrow keys.
Use Navigate > Find to search for text. This performs a case-insensitive text search on the label, opcode, operand, and comment fields. Use Navigate > Find Next to find the next match, and Navigate > Find Previous to find the previous match. Note "next" is always downward, and "previous" is always upward, regardless of the direction of the initial search chosen in the Find dialog.
Use Navigate > Go To to jump to an offset, address, or label. Remember that offsets and addresses are always hexadecimal, and offsets start with a '+'. If you have a label that is also a valid hexadecimal address, like "FEED", the label takes precedence. To jump to the address, write "$FEED" instead. If you enter a non-unique label, the selection will jump to the nearest instance.
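The precedence rules for Go To input can be summarized as a small parser. This Python sketch is an approximation under stated assumptions: the function name parse_goto is hypothetical, labels are matched exactly (case-sensitive), and bank-qualified 65816 address syntax and the nearest-instance rule for non-unique labels are not modeled.

```python
import re

HEX_DIGITS = re.compile(r"[0-9A-Fa-f]+")


def parse_goto(text: str, labels: set) -> tuple:
    """Classify input as ("offset", n), ("label", name), or ("address", n)."""
    t = text.strip()
    if t.startswith("+"):  # file offsets always start with '+'
        if HEX_DIGITS.fullmatch(t[1:]):
            return ("offset", int(t[1:], 16))
        raise ValueError(f"bad offset: {text!r}")
    if t in labels:  # a label such as "FEED" wins over the hex value
        return ("label", t)
    digits = t[1:] if t.startswith("$") else t  # "$FEED" forces an address
    if HEX_DIGITS.fullmatch(digits):
        return ("address", int(digits, 16))
    raise ValueError(f"not an offset, address, or known label: {text!r}")


print(parse_goto("FEED", {"FEED"}))   # ('label', 'FEED')
print(parse_goto("$FEED", {"FEED"}))  # ('address', 65261)
```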
If an instruction or data line has an operand that references an address in the file, you can navigate to the operand's location with Navigate > Jump to Operand. You can also do this by double-clicking in the opcode column.
When you edit something, lines throughout the listing can change. This is different from a source code editor, where editing a line just changes that line. So that you can see the effects of your changes, the undo/redo commands try to keep the listing in the same position. To go to the place where the last change (i.e. the change that will be undone by the next Undo operation) was made, use Navigate > Go to Last Change, which jumps to the first offset associated with the most recent change. If the last change was to the project properties, it will jump to the first offset in the file.
When you jump around, e.g. by double-clicking on an opcode or an entry in one of the side windows, the previously-selected line is added to a navigation stack. You can use Navigate > Nav Forward and Navigate > Nav Backward to move forward and backward through the stack. (The curly arrows on the left side of the toolbar may be more convenient. You can use Alt+Left/Right Arrow, or Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)
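The navigation stack behaves much like a web browser's back/forward history. The Python sketch below models it with two lists of offsets. It assumes browser-style semantics, where making a new jump discards any forward history; this manual doesn't say whether SourceGen does that, so treat that detail, and the class name NavStack, as assumptions.

```python
class NavStack:
    """Hypothetical back/forward history of selected offsets."""

    def __init__(self, start: int) -> None:
        self.current = start
        self._back = []
        self._fwd = []

    def jump(self, target: int) -> None:
        self._back.append(self.current)  # remember where we came from
        self._fwd.clear()  # assumption: a new jump drops forward history
        self.current = target

    def back(self) -> bool:
        if not self._back:
            return False
        self._fwd.append(self.current)
        self.current = self._back.pop()
        return True

    def forward(self) -> bool:
        if not self._fwd:
            return False
        self._back.append(self.current)
        self.current = self._fwd.pop()
        return True


nav = NavStack(0x10)
nav.jump(0x200)
nav.jump(0x300)
nav.back()
print(hex(nav.current))  # 0x200
```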
Adding and Removing Analyzer Tags
(Note: These were referred to as code/data "hints" in older versions of SourceGen.)
To set code start or stop points, select the desired offsets and use Actions > Tag Address As Code Start Point (or Stop Point). Because these indicate a transition between code and data regions, there is rarely any need to tag multiple consecutive bytes. For this reason, only the first byte on each selected line will be tagged.
For inline data, you need to cover the entire range, so every byte in every selected line is tagged when you select Tag Bytes As Inline Data. Similarly, the Remove Analyzer Tags menu item will remove tags from every byte.
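The difference between the two tagging rules can be shown by computing which file offsets receive a tag. In this Python sketch, each selected line is described by a hypothetical (offset, length) pair, and the function name bytes_to_tag is an assumption; the same "every byte" rule also describes which bytes Remove Analyzer Tags clears.

```python
def bytes_to_tag(lines: list, inline_data: bool) -> list:
    """Return the file offsets that would be tagged for the selected lines.

    lines: (offset, length) pairs, one per selected line (hypothetical model).
    inline_data=False models code start/stop: only each line's first byte.
    inline_data=True models Tag Bytes As Inline Data: every byte in each line.
    """
    tagged = []
    for offset, length in lines:
        if inline_data:
            tagged.extend(range(offset, offset + length))
        else:
            tagged.append(offset)
    return tagged


selection = [(0x10, 3), (0x13, 2)]  # a 3-byte line followed by a 2-byte line
print([hex(o) for o in bytes_to_tag(selection, inline_data=False)])
print([hex(o) for o in bytes_to_tag(selection, inline_data=True)])
```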
If you're having a hard time selecting just the right bytes because the instructions are caught up in a multi-byte data item, such as an auto-detected character string, you can disable uncategorized data analysis (the thing that creates the .STR and .FILL ops for you). You can do this from the project properties editor, or simply by hitting Ctrl+D. Hit that, tag the byte or bytes, then hit it again to re-enable the string & fill analyzer.
Another approach is to use the "Toggle Single-Byte Format" menu item to "flatten" the item.
Format Address Table
Tables of addresses are fairly common. Sometimes you'll find them as a series of 16-bit words, like this:
jmptab   .dd2  func1
         .dd2  func2
         .dd2  func3
That layout is typical of 16-bit software, but 8-bit software often splits the high and low bytes into separate arrays, like this:
jmptabl  .dd1  <func1
         .dd1  <func2
         .dd1  <func3
jmptabh  .dd1  >func1
         .dd1  >func2
         .dd1  >func3
Sometimes the tables contain address - 1, because the values are meant to be pushed onto the stack and "returned" to with an RTS instruction, which adds 1 to the address it pulls from the stack.
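To see why split tables look the way they do, this Python sketch builds the low-byte and high-byte tables from a list of 16-bit addresses, optionally storing address - 1 for RTS dispatch. The function name split_table and the addresses standing in for func1 through func3 are hypothetical.

```python
def split_table(addresses: list, rts_adjust: bool = False) -> tuple:
    """Return (low_table, high_table) for a list of 16-bit addresses.

    With rts_adjust, each entry holds address - 1, because RTS adds 1 to the
    address it pulls from the stack.
    """
    values = [((a - 1) if rts_adjust else a) & 0xFFFF for a in addresses]
    low = bytes(v & 0xFF for v in values)  # what ".dd1 <func" emits
    high = bytes(v >> 8 for v in values)   # what ".dd1 >func" emits
    return low, high


# Hypothetical addresses for func1, func2, func3.
low, high = split_table([0x1234, 0x2000, 0x30FF])
print(low.hex(" "), "|", high.hex(" "))  # 34 00 ff | 12 20 30
```

With rts_adjust=True, the entry for $2000 becomes $1FFF, so both its low and high bytes change. This is one reason it's easier to let the formatter handle the adjustment than to fix up the values by hand.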
While the .dd2 case is easy to format with the data operand editor, formatting addresses whose components are split into multiple tables can be tedious. Even in the easy case, you may want to create labels and set code start points for each item.
The Address Table Formatter helps you associate symbols with the addresses in the table. It works for simple and "split" tables.
To use it, start by selecting the entire table. In the examples above, you would select all 6 bytes. The number of bytes in each part of a split table must be equal: here, it's 3 low bytes, followed by 3 high bytes. If the number of bytes selected can't be evenly divided by the number of parts -- two parts for 16-bit data, three parts for 24-bit data -- the formatter will report an error.
With the data selected, open the format dialog with Actions > Format Address Table. The rather complicated dialog is split into sections.
- Address Characteristics: select whether the table has 16-bit addresses or 24-bit addresses. (24-bit addresses are disabled if you don't have the CPU set to 65816.) If the table is split into individual sub-tables for low bytes and high bytes, check the "Parts are split across sub-tables" box. If the address parts are being pushed on the stack for an RTS/RTL, check the "Adjusted for RTS/RTL" box to adjust them by 1.
- Low Byte Source: indicate which part of the table or word holds the low bytes. For common little-endian words, the low bytes come first. In the split-table example above, the low bytes came first, followed by the high bytes, so you would select "first part of selection". If they were stored the other way around, you would click "second part" instead.
- High Byte Source: indicate which part of the table or word holds the high bytes. For a 16-bit address this will be the part you didn't pick for the low bytes. Sometimes, if all addresses land on the same 256-byte page, the high byte will be a constant in the code, and only the low bytes will be stored in a table. If that's the case, select "Constant", and enter the high byte in the text box. (Decimal, hex, and binary are accepted.)
- Bank Byte Source: for 24-bit addresses, you can select "Nth part of selection", which will just use whichever part you didn't specify for the low and high bytes. If the table holds 16-bit addresses, you can use the "Constant" field to specify the data bank.
- Options: if the table holds the addresses of executable code, check the "Tag targets as code start points" box. If the target address hasn't been identified by the code analyzer through some other execution path, it will be tagged as a code start point.
- Generated Addresses: this shows the full list of addresses that are generated with the current set of parameters. Each address is shown with a file offset and a symbol. If the address can't be mapped within the file, the offset is shown as dashes instead. If the address can be mapped, and it already has a user-specified label, the label will be shown. If no label was found, the table will show "(+)", indicating that a permanent label will be added at the target offset. If everything is set up correctly, and the addresses fall entirely within the program, you shouldn't see any entries with dashed offsets here.
For a 16-bit address, you have three choices: low byte first, high byte first, or low byte only with a constant high byte. For a 24-bit address the set of possibilities expands, but is essentially the same: pick the order in which things appear, using fixed constants if desired.
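The 16-bit choices can be sketched as a single decoding function. This is a simplified model under stated assumptions, not the formatter's implementation: decode_table and its parameters are hypothetical, parts are numbered 0 (first part of the selection) and 1 (second part), and 24-bit tables are left out. The split flag corresponds to the "Parts are split across sub-tables" box, and rts_adjust to "Adjusted for RTS/RTL".

```python
from typing import List, Optional


def decode_table(data: bytes, low_part: int, high_part: Optional[int] = None,
                 high_const: Optional[int] = None, split: bool = True,
                 rts_adjust: bool = False) -> List[int]:
    """Rebuild 16-bit addresses from a selected table (hypothetical model).

    high_part=None means every high byte is the constant high_const.
    split=True: all low bytes, then all high bytes (separate sub-tables).
    split=False: parts interleaved per entry, e.g. little-endian words.
    """
    if high_part is None and high_const is None:
        raise ValueError("need a high-byte part or a constant high byte")
    nparts = 1 if high_part is None else 2
    used = [low_part] if high_part is None else [low_part, high_part]
    if sorted(used) != list(range(nparts)):
        raise ValueError("each part must be used exactly once")
    if len(data) % nparts:
        raise ValueError("selection can't be divided evenly into parts")
    count = len(data) // nparts

    def part_byte(part: int, i: int) -> int:
        return data[part * count + i] if split else data[i * nparts + part]

    addrs = []
    for i in range(count):
        lo = part_byte(low_part, i)
        hi = part_byte(high_part, i) if high_part is not None else high_const
        addr = (hi << 8) | lo
        if rts_adjust:
            addr = (addr + 1) & 0xFFFF  # table held address - 1
        addrs.append(addr)
    return addrs


# Split table: three low bytes, then three high bytes.
table = bytes([0x34, 0x00, 0xFF, 0x12, 0x20, 0x30])
print([hex(a) for a in decode_table(table, low_part=0, high_part=1)])
```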
A message at the top of the dialog shows how many bytes are selected. It also tells you how many groups there are, but unlike the data operand formatter, the address table formatter doesn't care about group boundaries. For this reason, tables do not have to be contiguous in memory: the low bytes and high bytes could be on separate 256-byte pages. You just need to have all of the data selected.
It should be mentioned that SourceGen does not record the fact that the data in question is part of a table. The formatting, labels, and code start point tags are applied as if you entered them all individually by hand. The formatter is just significantly more convenient. It also does everything as a single undoable action, so if it comes out looking wrong, just hit "undo" and try something else.
Toggle Single-Byte Format
The "Toggle Single-Byte Format" feature provides a quick way to change a range of bytes to single bytes or back to their default format. It's equivalent to opening the Edit Data Operand dialog and selecting "Single bytes" displayed as hex, or selecting "Default".
This can be handy if the default format for a range of bytes is a string, but you want to see it as bytes or set a label in the middle.
Format As Word
This is a quick way to format pairs of bytes as 16-bit words. It's equivalent to opening the Edit Data Operand dialog and selecting "16-bit words, little-endian", displayed as hex.
To avoid some confusing situations, it only works on sets of single-byte values. This means, for example, that you can't select a 10-byte string and have it turn into five 16-bit words. You can select as many bytes as you want, but they must come in pairs. (Remember that you can turn off auto-generation of strings and .FILLs with Toggle Data Scan.)
As a special case, if you select a single byte, the following byte will also be selected. This won't work if the following byte is part of a multi-byte data item, is the start of a new region (see Edit Data Operand for a definition of what splits a region), or is the last byte in the file.
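The pairing that Format As Word performs amounts to reading consecutive byte pairs as little-endian values. This Python sketch shows the arithmetic and the even-count requirement; the function name bytes_to_words is hypothetical, and the special case of extending a single selected byte to the following byte is not modeled.

```python
def bytes_to_words(data: bytes) -> list:
    """Combine byte pairs into 16-bit little-endian words (low byte first)."""
    if len(data) % 2:
        raise ValueError("Format As Word needs an even number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


print([hex(w) for w in bytes_to_words(bytes([0x34, 0x12, 0xCD, 0xAB]))])
# ['0x1234', '0xabcd']
```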
Remove Formatting
Removes instruction and data operand formatting from the selected lines. This removes the visible formatting as well as any formatting instructions that got embedded inside multi-byte data items. (You will be notified of such things in the message list.)
This will also remove any labels that are embedded in multi-byte items, without removing visible labels.
Toggle Data Scan
This menu item is in the Edit menu, and acts as a shortcut to opening the Project Properties editor, and clicking on the "Analyze Uncategorized Data" checkbox. When enabled, SourceGen will look for character strings and regions of identical bytes, and generate .STR and .FILL directives. When disabled, uncategorized data is presented as one byte per line, which can be handy if you're trying to get at a byte in the middle of a string.
As with all other project property changes, this is an undoable action.
Copying to Clipboard
When you use Edit > Copy, all lines selected in the code list are copied to the system clipboard. This can be a convenient way to paste code snippets into forum posts or documentation. The text is copied from the data shown on screen, so your chosen capitalization and pseudo-ops will appear in the copy.
Long comments are included, but notes are not.
By default, only the label, opcode, operand, and comment fields are included. From the app settings dialog you can select alternative formats that include additional columns.
A copy of all of the fields is also written to the clipboard in CSV format. If you have a spreadsheet like Excel, you can use Paste Special to put the data into individual cells.