SourceGen: Using SourceGen

Starting a New Project

Select File > New, or if no project is open, you can click the Start new project button. This opens the Create New Project window.

Start by selecting your target system from the tree on the left. The panel on the right will show the CPU that will be selected, as well as the symbol files and extension scripts that will be loaded by default. All of these may be overridden later from the project properties. (If the description in the panel on the right says "[placeholder]", it means that the system doesn't yet have a set of symbols defined for it.)

Next, click the Select File... button. Pick the file you wish to disassemble. The dialog will update with the pathname and some notes about the file's size. If all looks good, click OK to create the project.

NOTE: Support for very large 65816 programs is incomplete. The maximum size for a data file is limited to 1 MiB.

The first time you save the project (with File > Save), you will be prompted for the project name. It's best to use the data file's name with ".dis65" added, so this will be set as the default. The data file's name is not stored in the project file, so if you pick a different name, or save the project in a different directory, you will have to select the data file manually whenever you open the project.

Opening an Existing Project

Select File > Open, or if no project is open, you can click the Open existing project button. Select the .dis65 project file from the standard file dialog.

SourceGen will try to open a data file with the project's name, minus the ".dis65". If it can't find a file with that name, or if there's something wrong with it (e.g. the CRC doesn't match), you will be given the opportunity to specify the location of the data file to use.
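The naming convention can be sketched in a few lines of Python. This is only an illustration of the rule described above, not SourceGen's own code; the function names are made up for the example.

```python
from pathlib import Path

PROJECT_EXT = ".dis65"

def default_project_path(data_path: Path) -> Path:
    """Default save name: the data file's name with ".dis65" added."""
    return data_path.with_name(data_path.name + PROJECT_EXT)

def expected_data_path(project_path: Path) -> Path:
    """Data file looked for on open: the project's name minus ".dis65"."""
    name = project_path.name
    if not name.endswith(PROJECT_EXT):
        raise ValueError(f"not a {PROJECT_EXT} project file: {name}")
    return project_path.with_name(name[: -len(PROJECT_EXT)])
```

If the project is saved under the default name in the same directory as the data file, the second function maps back to the original data file, which is why no manual selection is needed in that case.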

If non-fatal problems with the file are detected, a warning will be shown. If it's something simple, like a missing .sym65 or extension script file, you'll be notified. If it's something more complicated, e.g. the project has a comment on an offset that doesn't exist, you will be warned that the problematic data has been deleted, and will be lost if the project is saved. By default, such a project will be opened in read-only mode, though you can override this in the dialog. You will also be given the opportunity to simply cancel loading the project.

The locations of the last few projects you've worked with are saved in the application settings. You can access them from File > Recent Projects. If no project is open, buttons that open the two most-recently-opened projects will be available.

Working With a Project

The main project window is divided into five areas:

  1. Center: the code list. If no project is open, this will instead have buttons to open a new or existing project.
  2. Top left: cross-reference list.
  3. Bottom left: notes list.
  4. Top right: symbols list.
  5. Bottom right: info on selected line.

Most actions are performed in the center code list. All of the sub-windows can be resized. The window sizes and column widths are saved in the application settings file.

A toolbar near the top of the screen has some shortcut buttons. If you hover your mouse over them, a tooltip with an explanation will appear.

A status bar at the bottom displays a summary of the amount of code, data, and uninitialized data (variable storage or junk) found in the program. These values are updated as you work.

Code List

The code list provides a view of the code being disassembled. Each line may be an instruction, data item, long comment, note, or assembler directive.

The list is divided into columns:

Double-clicking anywhere on a line with a note or long comment will open the Edit Note or Edit Long Comment dialog, respectively.

The code list is a standard Windows list view. You can left-click to select an item, ctrl-left-click to toggle individual items on and off, and shift-left-click to select a range. You can select all lines with Edit > Select All. Resize columns by left-clicking on the divider in the header and dragging it.

Selecting any part of a multi-line item, such as a long comment or character string, effectively selects the entire item.

Right-clicking opens a menu. The contents are the same as those in the Actions menu item in the menu bar. The set of options that are enabled will depend on what you have selected in the main window.

Undo & Redo

You can undo a change with Edit > Undo, or Ctrl+Z. You can redo a change with Edit > Redo, Ctrl+Y, or Ctrl+Shift+Z.

All changes to the project, including changes to the project properties, are added to the undo/redo buffer. This has no fixed size limit, so no matter how much you change, you can always undo back to the point where the project was opened.

The undo history is not saved as part of the project. Closing a project clears it.

References Window

When a single instruction or data line is selected in the main window, all references to that offset will be shown in the References window. For each reference, the file offset, address, and some details about the type of reference will be shown.

The reference type indicates whether the origin is an instruction or data operand, and provides an indication of the nature of the reference:

References from instructions that use indexed addressing (e.g. LDA addr,Y) will also show "idx" to indicate that the instruction is using the location as a base address.

References from instructions that treat the address as a pointer (e.g. LDA (dp),Y) will show "ptr". This makes it easy to distinguish the locations that are reading or writing through the pointer from those that are reading or writing the pointer itself.

The reference type will be prefixed with "Sym" or "Oth" to indicate whether or not the reference used the label at the current address. To understand this, consider that addresses can be referenced in different ways. For example:

         LDA     DATA0
         LDX     DATA0+1
         RTS
DATA0    .DD1    $80
DATA1    .DD1    $90

Both DATA0 and DATA1 are accessed, but both operands used DATA0. When the DATA0 line is selected in the code list, the references window will show the LDA and LDX instructions, because both instructions referenced it. When DATA1 is selected, the references window will show the LDX, because that instruction accessed DATA1's location even though it didn't use the symbol. To make the difference clear, the lines in the references window will either show "Sym" (to indicate that the symbol at the selected line was referenced) or "Oth" (to indicate that some other symbol, or no symbol, was used).
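The "Sym" / "Oth" distinction can be modeled with a small Python sketch. The addresses are an assumption (the snippet is taken to start at $1000, which puts DATA0 at $1007 and DATA1 at $1008), and the data structures are invented for the example; this is not how SourceGen stores cross-references internally.

```python
from typing import NamedTuple

class Ref(NamedTuple):
    opcode: str
    symbol: str   # label used in the operand
    adjust: int   # value added to the label, e.g. +1

# Assumed layout: code starts at $1000, so DATA0=$1007 and DATA1=$1008.
LABELS = {"DATA0": 0x1007, "DATA1": 0x1008}
REFS = [Ref("LDA", "DATA0", 0), Ref("LDX", "DATA0", 1)]

def references_to(selected: str) -> list[tuple[str, str]]:
    """Return (opcode, "Sym" or "Oth") for each reference to the selected line.

    A reference is listed if it uses the line's label or lands on its address.
    """
    target = LABELS[selected]
    found = []
    for ref in REFS:
        uses_symbol = ref.symbol == selected
        hits_address = LABELS[ref.symbol] + ref.adjust == target
        if uses_symbol or hits_address:
            found.append((ref.opcode, "Sym" if uses_symbol else "Oth"))
    return found
```

Selecting DATA0 yields both instructions marked "Sym"; selecting DATA1 yields only the LDX, marked "Oth".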

When an equate directive (generated for platform and project symbols) or local variable assignment is selected, the References window will show all references to that symbol. Unlike in-file references, only the uses of that symbol are shown. So if you have both a project symbol and a local variable for address $30, they will show disjoint sets of references. Furthermore, if you explicitly format an instruction operand as hex, e.g. LDA $30, it will not appear in either set because it's not a symbolic reference.

The cross-reference data is used to generate the set of equate directives at the top of the listing. If nothing references a platform or project symbol, an equate directive will not be generated for it.

Double-clicking on a reference moves the code list selection to that reference, and adds the previous selection to the navigation stack.

Notes Window

When you add a note, it will also be added to this window. Double-clicking on a note will jump directly to it, and add the previous selection to the navigation stack. This makes notes useful as bookmarks.

Symbols Window

All known symbols are shown here. The filter buttons allow you to screen out symbols you're not interested in, such as platform symbols or constants. The filters are:

Clicking on one of the column headers will sort the list on that field. Click a second time to reverse the sort direction.

Double-clicking on an auto or user label will jump to that label, and add the previous selection to the navigation stack. This can be a handy way to move around the file, jumping from label to label.

The "type" column uses a two-letter code to identify the symbol's type and scope. The first letter is one of A (auto), U (user), P (platform), J (project), R (pre-label), or V (variable). The second letter is one of N (non-unique local), L (local), G (global), X (exported), E (external), or C (constant).
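As a reading aid, the two-letter codes can be expanded with a simple lookup. The tables below just restate the letters listed above; the function name is made up for the example.

```python
SOURCE = {
    "A": "auto", "U": "user", "P": "platform",
    "J": "project", "R": "pre-label", "V": "variable",
}
SCOPE = {
    "N": "non-unique local", "L": "local", "G": "global",
    "X": "exported", "E": "external", "C": "constant",
}

def describe_type(code: str) -> str:
    """Expand a two-letter type code such as "UG" into words."""
    if len(code) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    return f"{SOURCE[code[0]]}, {SCOPE[code[1]]}"
```

For example, "UG" expands to "user, global" and "PE" to "platform, external".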

Info Window

Some additional information about the currently-selected line is shown, such as the formatting applied to the operand. If the operand has a default format, any automatically-generated format will be noted. For an instruction, a summary is shown that includes the cycle count, flags affected, and a brief description of what the instruction does. The description can be especially handy for undocumented instructions.

If multiple lines are selected, the number of selected lines and the number of bytes spanned by the selection will be shown.

Messages Window

Sometimes a change will invalidate an earlier change. For example, suppose you add a code stop point, and format the data that follows as a string. Later on you change it to a code start point. You now have a block of executable code with a string format record sitting in the middle of it. SourceGen tries very hard not to throw away anything you've done, but it will ignore anything invalid.

If a problem like this is encountered, an entry is added to a list of messages displayed at the bottom of the main window. Each entry identifies the nature of the problem, the severity of the problem, the offset where it occurred, and what was done to resolve it. The problem categories include:

The "context" column will provide additional detail about the problem, and the "resolution" column will indicate how it's being handled. In most cases, the offending item will be ignored.

Double-clicking on an entry will jump to that offset.

The message list will not appear if there are no messages. You can hide the list by clicking on the Hide button to the left of the messages. Un-hide the list by clicking on the N messages button at the bottom-right corner of the application window.

Navigation

The simplest way to move through the code list is with the scroll wheel on your mouse, or by left-clicking and dragging the scroll bar. You can also use PgUp/PgDn and the arrow keys.

Use Navigate > Find to search for text. This performs a case-insensitive text search on the label, opcode, operand, and comment fields. Use Navigate > Find Next to find the next match, and Navigate > Find Previous to find the previous match. Note "next" is always downward, and "previous" is always upward, regardless of the direction of the initial search chosen in the Find dialog.

Use Navigate > Go To to jump to an offset, address, or label. Remember that offsets and addresses are always hexadecimal, and offsets start with a '+'. If you have a label that is also a valid hexadecimal address, like "FEED", the label takes precedence. To jump to the address write "$FEED" instead. If you enter a non-unique label, the selection will jump to the nearest instance.
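The precedence rules for Go To input can be summarized with a rough Python sketch. It only models the rules stated above; exact-case label matching is an assumption, the "nearest instance" handling for non-unique labels is not modeled, and the function is not SourceGen's parser.

```python
import re

HEX = re.compile(r"[0-9A-Fa-f]+")

def parse_goto(text: str, labels: set[str]) -> tuple[str, object]:
    """Classify Go To input as an offset, address, or label."""
    text = text.strip()
    if text[:1] in ("+", "$"):
        digits = text[1:]
        if not HEX.fullmatch(digits):
            raise ValueError(f"bad hex value: {text!r}")
        kind = "offset" if text[0] == "+" else "address"
        return (kind, int(digits, 16))
    if text in labels:
        # A label wins over a bare hex address (assumes exact-case match).
        return ("label", text)
    if HEX.fullmatch(text):
        return ("address", int(text, 16))
    raise ValueError(f"not an offset, address, or known label: {text!r}")
```

With a label named FEED defined, "FEED" resolves to the label and "$FEED" to the address; without the label, "FEED" is treated as an address.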

If an instruction or data line has an operand that references an address in the file, you can navigate to the operand's location with Navigate > Jump to Operand. You can also do this by double-clicking in the opcode column.

When you edit something, lines throughout the listing can change. This is different from a source code editor, where editing a line just changes that line. To let you watch the effects of a change, the undo/redo commands try to keep the listing in the same position. To go to the place where the last change was made (i.e. the change that will be undone by the next Undo operation), use Navigate > Go to Last Change, which jumps to the first offset associated with that change. If the last change was to the project properties, it will jump to the first offset in the file.

When you jump around, e.g. by double-clicking on an opcode or an entry in one of the side windows, the previously-selected line is added to a navigation stack. You can use Navigate > Nav Forward and Navigate > Nav Backward to move forward and backward through the stack. (The curly arrows on the left side of the toolbar may be more convenient. You can use Alt+LeftArrow / Alt+RightArrow, or Ctrl+- / Ctrl+Shift+-, as keyboard shortcuts.)
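A navigation stack of this kind behaves much like the back/forward buttons in a web browser. The sketch below is a generic model written for illustration; in particular, discarding the forward history on a new jump is an assumption borrowed from browsers, not something this manual states.

```python
class NavStack:
    """Minimal back/forward history of code list positions."""

    def __init__(self) -> None:
        self._back: list[int] = []
        self._forward: list[int] = []

    def jump(self, current: int, target: int) -> int:
        """Record the current position, then move to the target."""
        self._back.append(current)
        self._forward.clear()  # assumption: a new jump drops forward history
        return target

    def backward(self, current: int) -> int:
        if not self._back:
            return current
        self._forward.append(current)
        return self._back.pop()

    def forward(self, current: int) -> int:
        if not self._forward:
            return current
        self._back.append(current)
        return self._forward.pop()
```

Each method returns the new position, so a caller simply keeps assigning the result to its current selection.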

Adding and Removing Analyzer Tags

(Note: These were referred to as code/data "hints" in older versions of SourceGen.)

To set code start or stop points, select the desired offsets and use Actions > Tag Address As Code Start Point (or Stop Point). Because these indicate a transition between code and data regions, there is rarely any need to tag multiple consecutive bytes. For this reason, only the first byte on each selected line will be tagged.

For inline data that follows a JSR/JSL/BRK, you need to cover the entire range, so every byte in every selected line is tagged when you select Tag Bytes As Inline Data. Similarly, the Remove Analyzer Tags menu item will remove tags from every byte.
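The difference between the two tagging behaviors can be shown with a short Python sketch. Each selected line is represented as a hypothetical (offset, length) pair; the function names are invented for the example.

```python
def start_stop_tag_offsets(lines: list[tuple[int, int]]) -> list[int]:
    """Code start/stop points: only the first byte of each selected line."""
    return [offset for offset, _length in lines]

def inline_data_tag_offsets(lines: list[tuple[int, int]]) -> list[int]:
    """Inline data (and tag removal): every byte of every selected line."""
    return [
        byte_offset
        for offset, length in lines
        for byte_offset in range(offset, offset + length)
    ]
```

For a selection holding a 3-byte line at +000010 and a 2-byte line at +000013, the first function tags two offsets and the second tags all five.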

Tip: while code start points and inline data tagging are both very important, code stop points are rarely useful. The code analyzer is pretty good at separating code from data. If you find yourself using stop points frequently, you're probably Doing It Wrong.

If you're having a hard time selecting just the right bytes because the instructions are caught up in a multi-byte data item, such as an auto-detected character string, you can disable uncategorized data analysis (the thing that creates the .STR and .FILL ops for you). You can do this from the project properties editor, or simply by hitting Ctrl+D. Hit that, tag the byte or bytes, then hit it again to re-enable the string & fill analyzer.

Another approach is to use the Toggle Single-Byte Format menu item to "flatten" the item, explicitly formatting everything as individual hex bytes.

Format Address Table

Tables of addresses are fairly common. Sometimes you'll find them as a series of 16-bit words, like this:

jmptab   .dd2    func1
         .dd2    func2
         .dd2    func3

While that's fairly common in 16-bit software, 8-bit software often splits the high and low bytes into separate arrays, like this:

jmptabl  .dd1    <func1
         .dd1    <func2
         .dd1    <func3
jmptabh  .dd1    >func1
         .dd1    >func2
         .dd1    >func3

Sometimes the tables contain address - 1, because the values are to be pushed onto the stack for an RTS call.

While the .dd2 case is easy to format with the data operand editor, formatting addresses whose components are split into multiple tables can be tedious. Even in the easy case, you may want to create labels and set code start points for each item.

The Address Table Formatter helps you associate symbols with the addresses in the table. It works for simple and "split" tables.

To use it, start by selecting the entire table. In the examples above, you would select all 6 bytes. The number of bytes in each part of a split table must be equal: here, it's 3 low bytes, followed by 3 high bytes. If the number of bytes selected can't be evenly divided by the number of parts -- two parts for 16-bit data, three parts for 24-bit data -- the formatter will report an error.
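To make the layout concrete, here is a small Python sketch that reassembles 16-bit addresses from a split table. It covers only the two-part case, and the function and its parameters are invented for the example rather than taken from SourceGen. The adjust parameter models tables that hold address - 1 for use with RTS.

```python
def decode_split_table(data: bytes, low_first: bool = True,
                       adjust: int = 0) -> list[int]:
    """Rebuild 16-bit addresses from a two-part split table.

    The selection must hold all bytes of one part followed by all bytes
    of the other. Pass adjust=1 for tables that store address - 1.
    """
    if len(data) % 2 != 0:
        raise ValueError("selection can't be evenly divided into two parts")
    count = len(data) // 2
    first, second = data[:count], data[count:]
    lows, highs = (first, second) if low_first else (second, first)
    return [(((hi << 8) | lo) + adjust) & 0xFFFF
            for lo, hi in zip(lows, highs)]
```

With three entries, the six selected bytes are split into three low bytes and three high bytes, matching the jmptabl / jmptabh example above.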

With the data selected, open the format dialog with Actions > Format Address Table. The rather complicated dialog is split into sections.

For a 16-bit address, you have three choices: low byte first, high byte first, or low byte only with a constant high byte. For a 24-bit address the set of possibilities expands, but is essentially the same: pick the order in which things appear, using fixed constants if desired.

A message at the top of the screen shows how many bytes are selected. It also tells you how many groups there are, but unlike the data operand formatter, the split-address table formatter doesn't care about group boundaries. For this reason, tables do not have to be contiguous in memory. The low bytes and high bytes could be on separate 256-byte pages. You just need to have all of the data selected.

It should be mentioned that SourceGen does not record the fact that the data in question is part of a table. The formatting, labels, and code start point tags are applied as if you entered them all individually by hand. The formatter is just significantly more convenient. It also does everything as a single undoable action, so if it comes out looking wrong, just hit "undo" and try something else.

Toggle Single-Byte Format

The Toggle Single-Byte Format feature provides a quick way to change a range of bytes to single bytes or back to their default format. It's equivalent to opening the Edit Data Operand dialog and selecting Single bytes displayed as hex, or selecting Default.

This can be handy if the default format for a range of bytes is a string, but you want to see it as bytes or set a label in the middle.

Format As Word

This is a quick way to format pairs of bytes as 16-bit words. It's equivalent to opening the Edit Data Operand dialog and selecting "16-bit words, little-endian", displayed as hex.

To avoid some confusing situations, it only works on sets of single-byte values. This means, for example, that you can't select a 10-byte string and have it turn into five 16-bit words. You can select as many bytes as you want, but they must come in pairs. (Remember that you can turn off auto-generation of strings and .FILLs with Toggle Data Scan.)

As a special case, if you select a single byte, the following byte will also be selected. This won't work if the following byte is part of a multi-byte data item, is the start of a new region (see Edit Data Operand for a definition of what splits a region), or is the last byte in the file.
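The pairing rule amounts to the following Python sketch, which turns an even-length run of single bytes into little-endian 16-bit values. The output text only imitates the listing style used in this manual; the function itself is hypothetical.

```python
def format_as_words(data: bytes) -> list[str]:
    """Render pairs of bytes as little-endian 16-bit words in hex."""
    if len(data) % 2 != 0:
        raise ValueError("bytes must come in pairs")
    lines = []
    for i in range(0, len(data), 2):
        value = data[i] | (data[i + 1] << 8)  # low byte comes first
        lines.append(f".dd2    ${value:04x}")
    return lines
```

For the bytes 34 12 cd ab this produces the words $1234 and $abcd.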

Remove Formatting

Removes instruction and data operand formatting from the selected lines. This removes the visible formatting as well as any formatting instructions that got embedded inside multi-byte data items. (You will be notified of such things in the message list.)

This will also remove any labels that are embedded in multi-byte items, without removing visible labels.

Toggle Data Scan

This menu item is in the Edit menu, and acts as a shortcut to opening the Project Properties editor, and clicking on the Analyze Uncategorized Data checkbox. When enabled, SourceGen will look for character strings and regions of identical bytes, and generate .STR and .FILL directives. When disabled, uncategorized data is presented as one byte per line, which can be handy if you're trying to get at a byte in the middle of a string.
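To give a feel for what the scan looks for, the sketch below finds regions of identical bytes that could be shown as a single .FILL line. It is a simplified illustration, not SourceGen's analyzer: the min_run threshold is a made-up parameter, and string detection is left out entirely.

```python
from itertools import groupby

def find_fill_runs(data: bytes, min_run: int = 8) -> list[tuple[int, int, int]]:
    """Return (offset, length, value) for each run of identical bytes.

    min_run is a hypothetical threshold chosen for this example.
    """
    runs = []
    offset = 0
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((offset, length, value))
        offset += length
    return runs
```

Bytes that don't fall into a long enough run stay as individual items, which is how everything is presented when the scan is disabled.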

As with all other project property changes, this is an undoable action.

Copying to Clipboard

When you use Edit > Copy, all lines selected in the code list are copied to the system clipboard. This can be a convenient way to post code snippets into forum postings or documentation. The text is copied from the data shown on screen, so your chosen capitalization and pseudo-ops will appear in the copy.

Long comments are included, but notes are not.

By default, only the label, opcode, operand, and comment fields are included. From the app settings dialog you can select alternative formats that include additional columns.

A copy of all of the fields is also written to the clipboard in CSV format. If you have a spreadsheet like Excel, you can use Paste Special to put the data into individual cells.