6502bench SourceGen: Tutorials

The tutorials introduce SourceGen and cover some of the basic features. They skim lightly over some important concepts, like the difference between numeric and symbolic references, so reading the manual is recommended.

Tutorial #1: Basic Features

Start by launching SourceGen. The initial screen has a large center area with some buttons, and some mostly-empty windows on the sides. The buttons are shortcuts for menu items in the File menu.

Create the project

Click "Start new project".

The New Project window has three parts. The top-left window has a tree of known platforms, arranged by manufacturer. The top-right window provides some details on whichever platform is selected. The bottom window will have some information about the data file, once we choose one.

Scroll down in the list, and select "Generic 6502". Then click "Select File...", navigate to the SourceGen installation directory, open the Examples folder, then open the "Tutorial" folder. Select the file named "Tutorial1", and click "Open".

The filename now appears in the bottom window, along with an indication of the file's size.

Click "OK" to create the project.

Getting Around

The first thing we'll do is save the project. Some features create or load files from the directory where the project lives, so we want to establish that.

Select File > Save, which will bring up a standard save-file dialog. Make sure you're still in the Examples/Tutorial folder. The default project file name is "Tutorial1.dis65", which is what we want, so just click "Save".

The display is divided into rows, one per line of disassembled code or data. This is a standard Windows "list view", so you can select a row by left-clicking anywhere in it. Use Ctrl+Click to toggle the selection on individual lines, and Shift+Click to select a range of lines. You can move the selection around with the up/down arrow keys and PgUp/PgDn. Scroll the window with the mouse wheel or by grabbing the scroll bar.

Each row is divided into nine columns. You can adjust the column widths by clicking and dragging the column dividers in the header. The columns on the right side of the screen are similar to what you'd find in assembly source code: label, opcode, operand, comment. The columns on the left are what you'd find in a disassembly (file offset, address, raw bytes), plus some information about processor status flags and line attributes that may or may not be useful to you. If you find any of these distracting, collapse the column.

Click on the fourth line down, which has address $1002. The line has a label, "L1002", and is performing an indexed load from L1017. Both of these labels were automatically generated, and are named for the address at which they appear. When you clicked on the line, the highlights in the code list and the contents of the side windows changed to reflect the selection.

Click some other lines, such as address $100B and $1014. Note how the highlights and contents of other windows change.

Click on L1002 again, then double-click on the opcode ("LDA"). The selection jumps to L1017. When an operand references an in-file address, double-clicking on the opcode will take you to it. (Double-clicking on the operand itself opens a format editor; more on that later.)

With L1017 highlighted, double-click on the line that appears in the References window. Note the selection jumps to L1002. You can immediately jump to any reference.

At the top of the Symbols window on the right side of the screen is a row of buttons. Make sure "Auto" and "Addr" are selected. You should see three labels in the window (L1002, L1014, L1017). Double-click on L1014. The selection jumps to the appropriate line.

Select Edit > Find. Type "hello", and hit Enter. The selection will move to address $100E, which is a string that says "hello!". You can use Edit > Find Next to try to find the next occurrence (there isn't one). You can search for any text that appears in the rightmost columns (label, opcode, operand, comment).

Select Edit > Go To. You can enter a label, address, or file offset. Enter "100b" to set the selection to $100B.

Near the top-left of the SourceGen window is a set of toolbar icons. Click the curly left-pointing arrow, and watch the selection move. Click it again. Then click the curly right-arrow a couple of times. Whenever you jump around in the file by using the Go To feature, or by double-clicking on opcodes or lines in the side windows, the locations are added to a navigation history. The arrows let you move forward and backward through it.

Editing

Click the very first line of the file, which is a comment that says something like "6502bench SourceGen vX.Y.Z". There are three ways to open the comment editor:

  1. Select Actions > Edit Long Comment from the menu bar.
  2. Right click, and select Edit Long Comment from the pop-up menu. (This menu is exactly the same as the Actions menu.)
  3. Double-click the comment.

Most things in the code list will respond to a double-click. Double-clicking on addresses, flags, labels, operands, and comments will open editors for those things. Double-clicking on a value in the "bytes" column will open a floating hex dump viewer. This is usually the most convenient way to edit something: point and click.

Double-click the comment to open the editor. Type some words into the upper window, and note that a formatted version appears in the bottom window. Experiment with the maximum line width and "render in box" settings to see what they do. You can hit Enter to create line breaks, or let SourceGen wrap lines for you. When you're done, click "OK". (Or hit Ctrl+Enter.)

When the dialog closes, you'll see your new comment in place at the top of the file. If you typed enough words, your comment will span multiple lines. You can select the comment by selecting any line in it.

Click on the comment, then shift-click on L1014. Right-click, and look at the menu. Nearly all of the menu items are disabled. Most editors are only enabled when a single instance of a relevant item is selected, so for example Edit Long Comment won't be enabled if you have an instruction selected.

Let's add a note. Click on $100E (the line with "hello!"), then select Actions > Edit Note. Type a few words, pick a color, and click "OK" (or hit Ctrl+Enter). Your note appears in the code, and also in the window on the bottom left. Notes are like long comments, with three key differences:

  1. You can't pick their line width, but you can pick their color.
  2. They don't appear in generated assembly sources, making them useful for leaving notes to yourself as you work.
  3. They're listed in the Notes window. Double-clicking them jumps the selection to the note, making them useful as bookmarks.

It's time to do something with the code. If you look at what the code does you'll see that it's copying several dozen bytes from $1017 to $2000, then jumping to $2000. It appears to be relocating the next part of the code before executing it. We want to let the disassembler know what's going on, so select the line at address $1017 and then Edit > Set Address. (Or double-click the "1017" in the Addr column.) In the Set Address dialog, type "2000", and hit Enter.

Note the way the code list has changed. When you changed the address, the "JMP $2000" at L1014 found a home inside the bounds of the file, so the code tracer was able to find the instructions there.

From the menu, select Edit > Undo. Notice how everything reverts to the way it was. Now, select Edit > Redo. You can undo any change you make to the project. (The undo history is not saved in the project file, though, so when you exit the program the history is lost.)

Notice that, while the address column has changed, the offset column has not. File offsets never change, which is why they're shown here and in the References and Notes windows. (They can, however, be distracting, so you'll be forgiven if you reduce the offset column width to zero.)
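The relationship between the two columns can be sketched in a few lines of Python. This is only an illustration, not SourceGen's implementation: the address_of helper and the region lists are hypothetical, and the sketch assumes the file was originally loaded at $1000, so that address $1017 sits at file offset $17.

```python
# Each region maps a starting file offset to a starting address.
# Offsets are fixed; only the addresses change when Set Address is used.
def address_of(offset, regions):
    """Return the address for a file offset, given (start_offset, start_address) pairs."""
    start_offset, start_address = max(r for r in regions if r[0] <= offset)
    return start_address + (offset - start_offset)

before = [(0x00, 0x1000)]                  # assumed: whole file loaded at $1000
after = [(0x00, 0x1000), (0x17, 0x2000)]   # after Set Address at offset $17

print(hex(address_of(0x17, before)))  # 0x1017
print(hex(address_of(0x17, after)))   # 0x2000
print(hex(address_of(0x16, after)))   # 0x1016
```

The offset $17 appears in both lookups; only the address it maps to differs.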

On the line at address $2000, select Actions > Edit Label, or double-click on the label "L2000". Change the label to "MAIN", and hit Enter. The label changes on that line, and on the two lines that refer to address $2000. (If you're not sure what refers to address $2000, select that line and check the References window.)

On that same line, select Actions > Edit Comment. Type a short comment, and hit Enter. Your comment appears in the "comment" column.

Editing Instruction Operands

The operand in the LDA instruction at line $2000 refers to an address ($3000) that isn't part of the file. We want to create an equate directive to give it a name. With the line at $2000 selected, use Actions > Edit Operand, or double-click on "$3000". Select the "Symbol" radio button, then type "INPUT" in the text box. Click "OK".

Disappointed? Nothing seems to have happened. The problem is that we updated the operand to reference a symbol that doesn't exist. Open the operand editor again, but this time click on "Create Project Symbol" at the bottom left. Enter "INPUT" in the Label field, and click "OK", then click "OK" in the operand editor.

That's better. If you scroll up to the top of the project, you'll see that there's now a ".EQ" line for the symbol.

Operands that refer to in-file locations behave similarly. Select the line two down, at address $2005, and Actions > Edit Operand. Enter the symbol "IS_OK". (Note you don't actually have to click Symbol first -- if you just start typing as soon as the dialog opens, it'll select Symbol for you automatically.) Click "OK".

As before, nothing appears to have happened, but if you were watching carefully you would have noticed that the label at $2009 ("L2009") has disappeared. This happened because the code at $2005 used to have a numeric reference to $2009, and SourceGen automatically created a label. However, you changed the code at $2005 to have a symbolic reference to a symbol called "IS_OK", and there were no other numeric references to $2009, so the auto-label was no longer needed. Because IS_OK doesn't exist, the operand at $2005 is just formatted as a hexadecimal value.
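A rough model of why the auto-label came and went, written as a Python sketch. The auto_labels function is hypothetical and much simpler than whatever SourceGen does internally; it only assumes that an auto-label exists for each address that is the target of a numeric reference and has no user-defined label.

```python
def auto_labels(numeric_refs, user_labels):
    """Map each numerically-referenced address without a user label to an auto-label."""
    return {
        addr: f"L{addr:04X}"
        for addr in sorted(set(numeric_refs))
        if addr not in user_labels
    }

# Before the edit: the branch at $2005 holds a numeric reference to $2009.
print(list(auto_labels([0x2009], {}).values()))  # ['L2009']

# After the edit: the operand is a symbolic reference to "IS_OK",
# so no numeric reference to $2009 remains.
print(list(auto_labels([], {}).values()))        # []
```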

Let's fix this. Select the line at address $2009, then Actions > Edit Label. Enter "IS_OK", and hit Enter. (NOTE: labels are case-sensitive, so it needs to match the operand at $2005 exactly.) You'll see the new label appear, and the operand at line $2005 will use it.

There's an easier way. Double-click on the "BCC" opcode at address $2005. This moves the selection to $2009. Double-click on the label field, and enter "IS_OK". Hit "OK".

You should now see that both the operand at $2005 and the label at $2009 have changed to IS_OK, accomplishing what we wanted to do in a single step. The key difference is that we haven't explicitly set a format for the BCC operand -- we just defined a label, and SourceGen used it automatically.

We could do the exact same thing by using Edit Operand on the BCC line, clicking the "Create Label" button, and typing "IS_OK". Sometimes one approach is more convenient than the other.

Editing Data Operands

There's some string and numeric data down at the bottom of the file. The final string appears to be multiple strings stuck together. (You may need to increase the width of the Operand column to see the whole thing.) Notice that the opcode for the very last line is '+', which means it's a continuation of the previous line. Long data items can span multiple lines, split every 64 characters (including delimiters), but they are still single items: selecting any part selects the whole.

Select the last line in the file, then Actions > Edit Operand. You'll notice that this dialog is much different from the one you got when editing the operand of an instruction. At the top it will say "65 bytes selected". You can format this as a single 65-byte string, as 65 individual items, or various things in between. For now, select "Single bytes", and then on the right, select "ASCII (low or high) character". Click "OK".

Each character is now on its own line. The selection still spans the same set of addresses.

Select address $203D on its own, then Actions > Edit Label. Set the label to "STR1". Move up a bit and select address $2030, then scroll to the bottom and shift-click address $2070. Select Actions > Edit Operand. At the top it should now say, "65 bytes selected in 2 groups". There are two groups because the presence of a label split the data into two separate regions. From the "Character encoding" pop-up select "Low or High ASCII" encoding, select the "mixed character and non-character" string type, then click "OK".

We now have two ".STR" lines, one for "string zero ", and one with the STR1 label and the rest of the string data. This is okay, but it's not really what we want. The code at $200B appears to be loading a 16-bit address from data at $2025, so we want to use that if we can.

Select Edit > Undo three times. You should be back to the state where there's a single ".STR" line at the bottom of the file, split across two lines with a '+'.

Select the line at $2026. This is currently formatted as a string, but that appears to be incorrect, so let's format it as individual bytes instead. There's an easy way to do that: use Actions > Toggle Single-Byte Format (or hit Ctrl+B).

The data starting at $2025 appears to be 16-bit addresses that point into the table of strings, so let's format them appropriately.

Double-click the operand column on line $2025 ("$30") to open the operand data format editor. Because you only have one byte selected, most of the options are disabled. This won't do what we want, so click "Cancel".

Select the line at $2025, then shift-click the line at $202E. Right-click and select Edit Operand. If you selected the correct set of bytes, the top line in the dialog should now say, "10 bytes selected". Because 10 is a multiple of two, the 16-bit formats are enabled. It's not a multiple of 3 or 4, so the 24-bit and 32-bit options are not enabled. Click the "16-bit words, little-endian" radio button, then over to the right, click the "Address" radio button. Click "OK".

We just told SourceGen that those 10 bytes are actually five 16-bit numeric references. SourceGen determined that the addresses are contained in the file, and created labels for each of them. Labels only work if they're on their own line, so the long string was automatically split into five separate ".STR" statements.
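The two ideas in this step, which widths are offered for a selection and how little-endian words are read, can be sketched in Python. The function names are hypothetical, and the sample bytes are made up for illustration apart from the leading $30 shown in the tutorial.

```python
def enabled_widths(byte_count):
    """Item widths (in bytes) that divide the selection evenly."""
    return [width for width in (1, 2, 3, 4) if byte_count % width == 0]

def words_le(data):
    """Read consecutive 16-bit little-endian words: low byte first, then high byte."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

# Hypothetical table contents; only the first byte ($30) comes from the tutorial.
table = bytes([0x30, 0x20, 0x3D, 0x20, 0x4A, 0x20, 0x57, 0x20, 0x64, 0x20])

print(enabled_widths(len(table)))              # [1, 2]
print([f"${w:04X}" for w in words_le(table)])  # ['$2030', '$203D', '$204A', '$2057', '$2064']
```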

Use File > Save (or hit Ctrl+S) to save your work.

Generating Assembly Code

You can generate assembly source code from the disassembled data. Select File > Assembler (or hit Ctrl+Shift+A) to open the source generation and assembly dialog.

Pick your favorite assembler from the drop list at the top right, then click "Generate". An assembly source file will be generated in the directory where your project file lives, named after a combination of the project name and the assembler name. A preview of the generated source appears in the top window. (It's a "preview" because it has line numbers added and is cut off after a certain limit.)

If you have a cross-assembler installed and configured, you can run it by clicking "Run Assembler". The output from the assembler will appear in the lower window, along with an indication of whether the assembled file matches the original. (Barring bugs in SourceGen or the assembler, it should always match exactly.)
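If you ever want to repeat that comparison by hand, a byte-for-byte check is easy to write. The Python sketch below is not how SourceGen performs the check; first_mismatch is a hypothetical helper, and you would feed it the contents of the original data file and of whatever output file your assembler produced.

```python
def first_mismatch(original, assembled):
    """Return the offset of the first differing byte, or None if the inputs are identical."""
    for offset, (a, b) in enumerate(zip(original, assembled)):
        if a != b:
            return offset
    if len(original) != len(assembled):
        # One input is a prefix of the other; they diverge where the shorter one ends.
        return min(len(original), len(assembled))
    return None

print(first_mismatch(b"\xa9\x00\x60", b"\xa9\x00\x60"))  # None
print(first_mismatch(b"\xa9\x00\x60", b"\xa9\x01\x60"))  # 1
```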

Click "Close" to close the window.

End of Part One

At this point you know enough to work with a SourceGen project. Continue on to the next tutorial to learn more.

Tutorial #2: Advanced Features

This tutorial will walk you through some of the fancier things SourceGen can do. We assume you've already finished the Basic Features tutorial.

Split-Address Table Formatting

Start a new project. Select the Apple //e platform, click Select File and navigate to the Examples directory. In A2-Amper-fdraw, select AMPERFDRAW#061d60. Click "OK" to create the project.

Not a lot to see here -- just half a dozen lines of loads and stores. This particular program interfaces with Applesoft BASIC, so we can make it a bit more meaningful by loading an additional platform symbol file. Select Edit > Project Properties, then the Symbol Files tab. Click Add Symbol Files. The file browser starts in the RuntimeData directory. In the Apple folder, select Applesoft.sym65, and click Open. Click "OK" to close the project properties window.

The STA instructions now reference AMPERV, which is noted as a call vector. We can see the code setting up a jump (opcode $4c) to $1d70. As it happens, the start address of the code is $1d60 -- the last four digits of the filename -- so let's make that change. Double-click the initial .ORG statement, and change it from $2000 to $1d60. We can now see that $1d70 starts right after this initial chunk of code.

Select the line with address $1d70, then Actions > Hint As Code Entry Point. More code appears, but not much -- if you scroll down you'll see that most of the file is still data. The code at $1d70 searches through a table at $1d88 for a match with the contents of the accumulator. If it finds a match, it loads bytes from tables at $1da6 and $1d97, pushes them on the stack, and then JMPs away. This code is pushing a return address onto the stack. When the code at CHRGET returns, it'll return to that address. Because RTS adds one to the address it pulls from the stack, the address pushed must be the target address minus one.

The first byte in the first address table at $1d97 (which has the auto-label L1D97) is $b4. The first byte in the second table is $1d. So the first address we want is $1db4 + 1 = $1db5.
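The same arithmetic as a small Python sketch. The function name rts_target is hypothetical; the byte values are the ones quoted above.

```python
def rts_target(low, high):
    """Address that RTS continues at when (high, low) was pushed: the pushed value plus one."""
    return (((high << 8) | low) + 1) & 0xFFFF

print(f"${rts_target(0xB4, 0x1D):04X}")  # $1DB5
```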

Select the line at $1db5, and use Actions > Hint As Code Entry Point. More code appears, but again it's only a few lines. Let's dress this one up a bit. Set a label on the code at $1db5 called "FUNC". At $1d97, edit the data item (double-click on "$b4"), click "Single bytes", then type "FUNC" (note the text field gets focus immediately, and the radio button automatically switches to "symbolic reference" when you start typing). Click "OK". The operand at $1d97 should now say <FUNC-1. Repeat the process at $1da6, this time clicking the "High" part radio button below the symbol entry text box, to make the operand there say >FUNC. (If it says <FUNC-152, you forgot to select the High part.)
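The adjustments shown in those operands are plain byte arithmetic, which the following Python sketch reproduces. The part_operand helper is hypothetical and only mimics the operand text quoted above; it is not SourceGen's formatter.

```python
FUNC = 0x1DB5  # address of the label set above

def part_operand(symbol, address, byte_value, part):
    """Format a byte as the low or high part of a symbol, plus whatever adjustment is left over."""
    part_value = address & 0xFF if part == "low" else address >> 8
    prefix = "<" if part == "low" else ">"
    adjust = byte_value - part_value
    suffix = f"{adjust:+d}" if adjust else ""
    return f"{prefix}{symbol}{suffix}"

print(part_operand("FUNC", FUNC, 0xB4, "low"))   # <FUNC-1
print(part_operand("FUNC", FUNC, 0x1D, "high"))  # >FUNC
print(part_operand("FUNC", FUNC, 0x1D, "low"))   # <FUNC-152
```

The last line shows where the -152 comes from: the byte $1d is compared against the low part of FUNC ($b5), and $1d - $b5 is -152.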

We've now changed the first entry in the table to symbolic references. You could repeat these steps for the remaining items, but there's a faster way. Click on the line at address $1d97, then shift-click the line at address $1da9 (which should be .FILL 12,$1e). Select Actions > Format Split-Address Table.

The message at the top should indicate that there are 30 bytes selected. In Address Characteristics, click the "adjusted for RTS/RTL" checkbox. As soon as you do, the first line of the Generated Addresses list should show the symbol "FUNC". The rest of the addresses will look like (+) T1DD0. The "(+)" means that a label was not found at that location, so a label will be generated automatically.

Down near the bottom, check the "add code entry hint if needed" checkbox. Because we saw the table contents being pushed onto the stack for RTS, we know that they're all code entry points.

Click "OK". The table of address bytes at $1d97 should now all be references to symbols -- 15 low parts followed by 15 high parts. If you scroll down, you should see nothing but instructions until you get to the last dozen bytes at the end of the file. (If this isn't the case, use Edit > Undo, then work through the steps again.)

The formatter did the same steps you went through earlier -- set a label, apply the label to the low and high bytes in the table, add a code entry point hint -- but did several of them at once.
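As a rough picture of the address computation, here is a Python sketch that decodes a split table laid out as all low bytes followed by all high bytes. The function is hypothetical, and the three-entry table is made up for brevity; only its first entry ($b4, $1d) comes from the tutorial.

```python
def split_table_targets(table, rts_adjust=True):
    """Pair up low bytes (first half) with high bytes (second half) and return the target addresses."""
    half = len(table) // 2
    lows, highs = table[:half], table[half:]
    bump = 1 if rts_adjust else 0  # RTS continues at the pushed address plus one
    return [(((hi << 8) | lo) + bump) & 0xFFFF for lo, hi in zip(lows, highs)]

# Hypothetical 3-entry table: lows $b4 $cf $10, then highs $1d $1d $1e.
table = bytes([0xB4, 0xCF, 0x10, 0x1D, 0x1D, 0x1E])

print([f"${t:04X}" for t in split_table_targets(table)])
# ['$1DB5', '$1DD0', '$1E11']
```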

We don't want to save this project, so select File > Close. When SourceGen asks for confirmation, click Discard & Continue.

Going Deeper

Start a new project. Select "Generic 6502". For the data file, navigate to the Examples directory, then from the Tutorial directory select "Tutorial2".

The first thing you'll notice is that we immediately ran into a BRK, which is a pretty reliable sign that we're not in a code section. The generic profile puts a code entry point hint on the first byte, but that's wrong here. This particular file begins with 00 20, which could be a load address (some C64 binaries look like this). So let's start with that assumption.

Click on the first line of code at address $1000, and select Actions > Remove Hints. The $20 got absorbed into a string. The string is making it hard to manipulate the next few bytes, so let's fix that by selecting Edit > Toggle Data Scan (Ctrl+D). This turns off the feature that looks for strings and .FILL regions, so now each uncategorized byte is on its own line.

You could select the first two lines and use Actions > Edit Operand to format them as a 16-bit little-endian hex value, but there's a shortcut: select only the first line of code, then Edit > Format As Word (Ctrl+W). It automatically grabbed the following byte and combined them. Since we believe $2000 is the load address for everything that follows, click on the line with address $1002, select Actions > Set Address, and enter "2000". With that line still selected, use Actions > Hint As Code Entry Point (Ctrl+H then Ctrl+C) to identify it as code.
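The assumption being tested here, that the first two bytes are a little-endian load address for the rest of the file, looks like this as a Python sketch. The helper name and the payload bytes are hypothetical; only the leading 00 20 comes from the tutorial.

```python
def split_load_address(data):
    """Treat the first two bytes as a little-endian load address; return (address, payload)."""
    load_address = data[0] | (data[1] << 8)
    return load_address, data[2:]

image = bytes([0x00, 0x20, 0xEA, 0xEA])  # 00 20 header, then two made-up payload bytes

address, payload = split_load_address(image)
print(f"${address:04X}", len(payload))  # $2000 2
```

The payload starts at file offset 2, which is why the address is set on the third byte of the file rather than the first.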

That looks better, but it's branching off the bottom of the screen (unless you have a really tall screen or small fonts) because of all the intervening data. Use Edit > Toggle Data Scan to turn the string finder back on.

There are four strings starting at address $2004, each of which is followed by $00. These look like null-terminated strings, so let's make it official. But first, let's do it wrong. Click on the line with address $2004 to select it. Hold the shift key down, then double-click on the operand field of the line with address $2031 (i.e. double-click on the words "last string").

The Edit Data Operand dialog opens, but the null-terminated strings option is not available. This is because we didn't include the null byte on the last string. To be recognized as one of the "special" string types, every selected string must match the expected pattern.

Cancel out of the dialog. Hold the shift key down, and double-click on the operand on line $203c ($00). You should see "Null-terminated strings (4)" as an available option now (make sure the Character Encoding pop-up is set to "Low or High ASCII"). Click on that, then click "OK". The strings are now shown as .ZSTR operands.
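The "every selected string must match" rule can be pictured with a short Python sketch. This is a simplified stand-in for SourceGen's check, not its actual code: count_null_terminated is a hypothetical name, and the sample strings other than "last string" are invented.

```python
def count_null_terminated(selection):
    """Return how many null-terminated strings the selection holds, or None if it doesn't fit the pattern."""
    if not selection or selection[-1] != 0x00:
        return None  # the final string is missing its terminator
    return len(selection[:-1].split(b"\x00"))

without_terminator = b"one\x00two\x00last string"
with_terminator = without_terminator + b"\x00"

print(count_null_terminated(without_terminator))  # None
print(count_null_terminated(with_terminator))     # 3
```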

It's wise to save your work periodically. Use File > Save to create a project file for Tutorial2.

Pointers and Parts

Let's move on to the code at $203d. It starts by storing a couple of values into direct page address $02/03. This appears to be setting up a pointer to $2063, which is a data area inside the file. So let's make it official.

Select the line at address $2063, and use Actions > Edit Label to give it the label "XDATA". Now edit the operand on line $203d, and set it to the symbol "XDATA", with the part "low". Edit the operand on line $2041, and set it to "XDATA" with the part "high". (Note the symbol text box gets focus immediately, so you can start typing the symbol name as soon as the dialog opens; you don't need to click around first.) If all went well, the operands should now read LDA #<XDATA and LDA #>XDATA.
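What the two loads and stores accomplish can be sketched in Python. The bytearray is just a stand-in for the zero page, and the comments describe the effect of the code rather than quoting it.

```python
memory = bytearray(256)        # stand-in for the zero page
XDATA = 0x2063

memory[0x02] = XDATA & 0xFF    # low part, #<XDATA, stored at $02
memory[0x03] = XDATA >> 8      # high part, #>XDATA, stored at $03

pointer = memory[0x02] | (memory[0x03] << 8)
print(f"${memory[0x02]:02X} ${memory[0x03]:02X} -> ${pointer:04X}")  # $63 $20 -> $2063
```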

Let's give the pointer a name. Select line $203d, and use Actions > Edit Local Variable Table to create an empty table. Click "New Symbol" on the right side. Set the Label field to "PTR1", the Value field to $02, and the width to 2 (it's a 2-byte pointer). Leave the Address button selected. Click "OK" to create the entry, and then "OK" to update the table.

There's now a ".var" statement (similar to a .equ) above line $203d, and the stores to $02/$03 have changed to "PTR1" and "PTR1+1".

Double-click on the JSR on line $2045 to jump to L209A. This just loads a value from $3000 into the accumulator and returns, so not much to see here. Hit the back-arrow in the toolbar to jump back to the JSR.

The next bit of code masks the accumulator so it holds a value between 0 and 3, then doubles it and uses it as an index into PTR1. We know PTR1 points to XDATA, which looks like it has some 16-bit addresses. The values loaded are stored in two more zero-page locations, $04-05.
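In Python terms, the lookup amounts to the sketch below. The function name is hypothetical, and the table contents are illustrative: the tutorial gives $2004 and $2031 as string addresses, while the two middle entries are invented.

```python
def lookup(accumulator, table):
    """Mask to 0-3, double to get a byte index, then read a 16-bit little-endian value."""
    index = (accumulator & 0x03) * 2
    return table[index] | (table[index + 1] << 8)

# Four little-endian addresses; the middle two are made up.
xdata = bytes([0x04, 0x20, 0x10, 0x20, 0x20, 0x20, 0x31, 0x20])

print(f"${lookup(0x00, xdata):04X}")  # $2004
print(f"${lookup(0xFF, xdata):04X}")  # $2031
```

Whatever value comes back from $3000, the mask guarantees that the index stays inside the eight-byte table.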

Let's make these a pointer as well. Double-click the operand on line $204e ("$04"), and click "Create Local Variable". Set the Label to "PTR2" and the width to 2. Click "OK" to create the symbol, then "OK" to close the operand editor, which should still be set to Default -- we didn't actually edit the operand, we just used the operand edit dialog as a convenient way to create a local variable table entry. All accesses to $04/$05 now use PTR2, and there's a new entry in the local variable table we created earlier.

The next bit of code copies bytes from PTR2 to $0400, stopping when it hits a zero byte. Looks like this is copying null-terminated strings. This confirms our idea that XDATA holds 16-bit addresses, so let's format it. Select lines $2063 to $2066, and Actions > Edit Operand. It should say "8 bytes selected" at the top. Select "16-bit words, little-endian", and then from the Display As box, select "Address". Click "OK". XDATA should now be four .dd2 16-bit addresses. If you scroll up, you'll see that the .ZSTR strings near the top now have labels that match the operands in XDATA.

Now that we know what XDATA holds, let's rename it. Change the label to STRADDR. The symbol parts in the operands at $203d and $2041 update automatically.

Let's pause briefly to look at the cycle-count feature. Use Edit > Settings to open the app settings panel, then select the Asm Config tab. Click the "Show cycle counts" checkbox, then click "OK".

Every line with an instruction now has a cycle count on it. The cycle counts are adjusted for everything SourceGen can figure out. For example, the BEQ on line $205a shows "2+" cycles, meaning that it takes at least two cycles but might take more. That's because conditional branches take an extra cycle if the branch is taken. The BNE on line $2061 shows 3 cycles, because we know that the branch is always taken. (If you want to see why, look at the value of the 'Z' flag in the "flags" column. Lower-case 'z' means the zero-flag is clear.)
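The branch timing rule behind those numbers is standard 6502 behavior: two cycles if the branch is not taken, three if it is taken, and one more if the taken branch crosses a page boundary. A Python sketch follows; cycle_comment is a hypothetical helper that only imitates the "2+" notation described above and ignores page crossing.

```python
def branch_cycles(taken, crosses_page=False):
    """Cycle count for a 6502 conditional branch."""
    if not taken:
        return 2
    return 4 if crosses_page else 3

def cycle_comment(taken):
    """taken is True, False, or None when the flag state is unknown."""
    if taken is None:
        return "2+"  # at least two cycles, possibly more
    return str(branch_cycles(taken))

print(cycle_comment(None))   # 2+
print(cycle_comment(True))   # 3
print(cycle_comment(False))  # 2
```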

The cycle-count comments are included in assembled output as well. If you add an end-of-line comment, it appears after the cycle count.

Hit Ctrl+S to save your project. Make that a habit.

Odds & Ends

The rest of the code isn't really intended to do anything useful. It just exists to illustrate some odd situations.

Look at the code starting at $206b. It ends with a BRK at $2074, which as noted earlier is a bad sign. If you look two lines above the BRK, you'll see that it's loading the accumulator with zero, then doing a BNE, which should never be taken (note the cycle count for the BNE is 2). The trick is in the two lines before that, which use self-modifying code to change the LDA immediate operand from $00 to $ff. The BNE is actually a branch-always.

We can fix this by correcting the status flags. Select line $2072, and then Actions > Override Status Flags. This lets us specify what the flags should be before the instruction is executed. For each flag, we can override the default behavior and specify that the flag is clear (0), set (1), or indeterminate (could be 0 or 1). In this case, we know that the self-modified code will be loading a non-zero value, so in the "Z" column click on the button in the "Zero" row. Click "OK". The BNE is now an always-taken branch, and the code list rearranges itself appropriately (and the cycle count is now 3).

Continuing on, the code at $2079 touches a few consecutive locations. Edit the label on line $2074, setting it to "STUFF". Notice how the references to $2074 through $2077 have changed from auto-generated labels to references to STUFF. For some projects this may be undesirable. Use Edit > Project Properties, then in the Analysis Parameters box un-check "Seek nearby targets", and click "OK". You'll notice that the references to $2075 and later have switched back to auto labels. If you scroll up, you'll see that the references to PTR1+1 and PTR2+1 were not affected, because local variables use explicit widths rather than the "nearby" logic.
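The effect of the setting can be approximated with a Python sketch. SourceGen's real matching rules are more involved and are not reproduced here; format_reference, its reach parameter, and the "+N" output format (modeled on "PTR1+1") are all assumptions made for illustration.

```python
def format_reference(address, user_labels, seek_nearby=True, reach=3):
    """Pick operand text: an exact label, a nearby label plus offset, or an auto-label."""
    if address in user_labels:
        return user_labels[address]
    if seek_nearby:
        # Look a few bytes back for a user label (reach is a made-up limit).
        for distance in range(1, reach + 1):
            label = user_labels.get(address - distance)
            if label is not None:
                return f"{label}+{distance}"
    return f"L{address:04X}"

labels = {0x2074: "STUFF"}
print([format_reference(a, labels) for a in range(0x2074, 0x2078)])
# ['STUFF', 'STUFF+1', 'STUFF+2', 'STUFF+3']
print([format_reference(a, labels, seek_nearby=False) for a in range(0x2074, 0x2078)])
# ['STUFF', 'L2075', 'L2076', 'L2077']
```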

The nearby-target behavior is generally desirable, because it lets you avoid explicitly labeling every part of a multi-byte data item. For now, use Edit > Undo to switch it back on.

The code at $2085 looks a bit strange. LDX, then a BIT with a weird symbol, then another LDX. If you look at the "bytes" column, you'll notice that the three-byte BIT instruction has only one byte on its line. The trick here is that the LDX #$01 is embedded inside the BIT instruction. When the code runs through here, X is set to $00, then the BIT instruction sets some flags, then the STA runs. Several lines down there's a BNE to $2088, which is in the middle of the BIT instruction. It loads X with $01, then also continues to the STA.
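Tracing the same bytes from two different entry points makes the overlap visible. The Python sketch below uses the real opcodes for LDX immediate ($A2), BIT absolute ($2C), and STA absolute ($8D), but the STA's addressing mode and operand are assumptions, since the tutorial doesn't spell them out.

```python
CODE_START = 0x2085
# LDX #$00 / BIT $01A2 / STA $3000 (the STA operand is hypothetical)
CODE = bytes([0xA2, 0x00, 0x2C, 0xA2, 0x01, 0x8D, 0x00, 0x30])

# opcode -> (mnemonic, instruction length in bytes)
OPCODES = {0xA2: ("LDX", 2), 0x2C: ("BIT", 3), 0x8D: ("STA", 3)}

def trace(entry):
    """List (address, mnemonic) pairs for instructions executed in sequence from entry."""
    steps = []
    pc = entry
    while pc < CODE_START + len(CODE):
        mnemonic, length = OPCODES[CODE[pc - CODE_START]]
        steps.append((f"${pc:04X}", mnemonic))
        pc += length
    return steps

print(trace(0x2085))  # [('$2085', 'LDX'), ('$2087', 'BIT'), ('$208A', 'STA')]
print(trace(0x2088))  # [('$2088', 'LDX'), ('$208A', 'STA')]
```

Entering at $2088 reinterprets the BIT instruction's two operand bytes as LDX #$01, and both paths meet again at the STA.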

Embedded instructions are unusual but not unheard-of. When you see the extra symbol in the opcode field, you need to look closely at what's going on.

Go Forth

That's it for the tutorials. There's significantly more detail on all aspects of SourceGen in the manual.

While you can do some fancy things, nothing you do will alter the data file. The assembled output will always match the original. So don't be afraid to play around.

If you want to work on something large over a long period, save your progress by putting the .dis65 project into a source code control system like git. Project files are stored in a text format that, while not meant to be human-readable, will yield reasonable diffs.