6502 Macro Assembler in a single C++ file using the struse single file text parsing library. Supports most syntaxes. x65 was previously named Asm6502 but was renamed because Asm6502 is too generic. The name x65 has no particular meaning.
Every assembler seems to add or change its own quirks to the 6502 syntax. This implementation aims to support all of them at once as long as there is no contradiction.
To keep up with this trend x65 is adding the following features to the mix:
* Full expression evaluation everywhere values are used: [Expressions](#expressions)
* C style scoping within '{' and '}': [Scopes](#scopes)
* Reassignment of labels. This means there is no error if you declare the same label twice, but on the other hand you can do things like label = label + 2.
* [Local labels](#labels) can be defined in a number of ways, such as leading period (.label) or leading at-sign (@label) or terminating dollar sign (label$).
* [Directives](#directives) support both with and without leading period.
* Labels don't need to end with colon, but they can.
* No indentation required for instructions, meaning that labels can't be mnemonics, macros or directives.
* As far as achievable, support the syntax of other 6502 assemblers (Merlin syntax now requires a command line argument, and -endm adds support for sources using macro/endmacro and repeat/endrepeat combos rather than scopes).
In summary, if you are familiar with any 6502 assembler syntax you should feel at home with x65. If you're familiar with C programming expressions you should be familiar with '{', '}' scoping and complex expressions.
There are no hard limits on binary size so if the address exceeds $ffff it will just wrap around to $0000. I'm not sure about the best way to handle that or if it really is a problem.
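As a rough illustration of the wrap-around described above (a Python sketch, not code from x65; the function name is made up for this example), masking the address to 16 bits gives the same effect:

```python
def wrap_address(address: int) -> int:
    """Keep only the low 16 bits, so $10000 becomes $0000."""
    return address & 0xFFFF


print(hex(wrap_address(0xFFFF + 1)))  # one past $ffff
print(hex(wrap_address(0x10005)))     # five bytes further
```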
There is a sublime package for coding/building in Sublime Text 3 in the *sublime* subfolder.
The command line options specify the source file, the destination file and what type of file to generate, such as c64, Apple II DOS 3.3, binary or an x65 specific object file. You can also generate a disassembly listing with inline source code or dump the available set of opcodes as a source file. The command line can also set labels for conditional assembly to allow for distinguishing debug builds from shippable builds.
To manage more complex projects, linking multiple assembled object files is desirable, so x65 builds object files that can be included in a final linking step.
Simply build code with or without a fixed address and the -obj filename.x65 command line argument, then use INCOBJ filename.x65 in a final linking source. The linking source can be assigned a fixed address for most targets or exported as a relocatable executable for Apple II GS.
### Relocatable executable
This output format is for Apple II GS OS executables. It requires 65816 instructions to handle the larger memory, and the entry point for code needs to be implemented correctly. Using the -mrg option merges all sections together so that 16 bit addressing is safe; otherwise different code or data segments could be loaded in different banks and 3 byte referencing is required. An important note is that I have not been significantly exposed to Apple II GS or 65816, so this feature has only been checked for correctness as far as possible without actually building a running piece of code.
Labels come in two flavors: **Addresses** (PC based) or **Values** (Evaluated from an expression). An address label is simply placed somewhere in code and a value label is followed by '**=**' and an expression. All labels are rewritable so it is fine to do things like NumInstance = NumInstance+1. Value assignments can be prefixed with '.const' or '.label' but are not required to be prefixed by anything. The CONST keyword should cause an error if the label is modified in the same source file.
*Local labels* exist in between *global labels* and get discarded whenever a new global label is added. The syntax for a local label is one of: prefix with period, at-sign or exclamation mark, or suffix with $, as in: **.local** or **!local** or **@local** or **local$**. Both value labels and address labels can be local labels.
```
Function: ; global label
ldx #32
.local_label ; local label
dex
bpl .local_label
rts
Next_Function: ; next global label, the local label above is now erased.
rts
```
### <a name="directives">Directives
Directives are assembler commands that control the code generation but do not generate code by themselves. Some assemblers prefix directives with a period (.org instead of org) so a leading period is accepted but not required for directives.
Start a section with a fixed address. Note that source files with fixed address sections can be exported to object files and will be placed at their location in the final binary output when loaded with **INCOBJ**.
Starts a relative code section. Relative sections require a name and sections that share the same name will be linked sequentially. The labels will be evaluated at link time.
The primary purpose of relative sections (sections that are not assembled at a fixed address) is to generate object files (.x65) that can be referenced from a linking source file by using **INCOBJ** and assigned an address at that point using the **LINK** directive. Object files can mix and match relative and fixed address sections and only the relative sections need to be linked using the **LINK** directive.
Sections named DirectPage_Stack and of a BSS type (default) determine the size of the direct page + stack for the executable. If multiple sections match this rule the size will be the sum of all the sections with this name.
Zeropage sections will be linked to a fixed address (default at the highest direct page addresses) prior to exporting the relocatable code. Zeropage sections in x65 are intended to allocate ranges of the zero page / direct page, which is a bit confusing with OMF, which has the concept of the direct page + stack segment.
Used in files assembled to object files to share a label globally. All labels that are not xdef'd are still processed but protected so that other objects can use the same label name without colliding. **XDEF <labelname>** must be specified before the label is defined, such as at the top of the file.
Non-xdef'd labels are kept private to the object file for the purpose of late evaluations that may refer to them, and those labels should also show up in .sym and vice files.
Include an object file for linking into this file. Object files are generated by the *-obj* command line option followed by a filename ("file.x65"). Any linked segments will be linked, and multiple linked files can be generated by using the [**EXPORT**](#export) directive.
There is currently object file support (use the -obj <filename> argument to generate one). The recommended file extension for object files is .x65. In order to access symbols from object file code use **XDEF** <labelname> prior to declaring a label within the object.
Note that the assembler will link all segments in a reasonable order (first code segments from current file, then code from other files, then data, then BSS segments), so using the **LINK** directive is intended to give more control but is not necessary for the linking process. **INCOBJ** is necessary for bringing in external objects, though, since otherwise the linker won't know how to find the segments to link.
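The default link order described above can be modeled as a stable sort. This is only a Python sketch of the stated ordering, not the linker's actual code; the tuple layout `(name, kind, from_current_file)` and the names are assumptions made for the example.

```python
# Hypothetical section records: (name, kind, from_current_file)
KIND_ORDER = {"code": 0, "data": 1, "bss": 2}


def link_order(sections):
    """Sort sections: code (current file first), then data, then BSS."""
    def key(section):
        name, kind, from_current_file = section
        # Only code sections distinguish between current and other files
        other_file_code = 1 if (kind == "code" and not from_current_file) else 0
        return (KIND_ORDER[kind], other_file_code)
    # sorted() is stable, so sections of equal rank keep their input order
    return sorted(sections, key=key)


sections = [
    ("buffers", "bss", True),
    ("library", "code", False),
    ("tables", "data", False),
    ("main", "code", True),
]
print([name for name, _, _ in link_order(sections)])
```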
For c64 .prg files this prefixes the binary file with this address.
<a name="export">**EXPORT**
Allows saving multiple binary files (prg, a2b, bin, etc.) from a single source file build.
```
section gamecode_level1
export _level1
```
will export the section "gamecode_level1" to (output_file)_level1.prg while other sections would be grouped together into (output_file).prg. This allows a single linking source to combine multiple loads overlapping the same memory area ending up in separate files.
<a name="align">**ALIGN**
```
align $100
```
Add bytes of 0 up to the next address divisible by the alignment. If the section is a fixed address (using an ORG directive) align will be applied at the location it was specified, but if the section is relative (using the SECTION directive) the alignment will apply to the start of the section.
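The number of zero bytes that align adds can be worked out with a little modular arithmetic. The Python sketch below is an illustration only; it assumes that an address which is already aligned gets no padding, which the description above does not state explicitly.

```python
def align_padding(address: int, alignment: int) -> int:
    """Number of zero bytes needed to reach the next aligned address."""
    # Assumption: an already aligned address needs no padding
    return (-address) % alignment


address = 0x1234
padding = align_padding(address, 0x100)
print(padding, hex(address + padding))
```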
When eval is encountered on a line, print out "EVAL (\<line#\>) \<message\>: \<expression\> = \<evaluated expression\>" to stdout. This can be useful to see the size of things or for debugging expressions.
<a name="bytes">**BYTES**
Adds the comma separated values on the current line to the assembled output, for example
```
RandomBytes:
bytes NumRandomBytes
{
bytes 13,1,7,19,32
NumRandomBytes = * - !
}
```
**byte** or **dc.b** are also recognized.
<a name="words">**WORDS**
Adds comma separated 16 bit values similar to how **BYTES** works. **word** or **dc.w** are also recognized.
Copies the string in quotes on the same line. The plan is to do a petscii conversion step. Use the modifier 'petscii' or 'petscii_shifted' to convert alphabetic characters to range.
Example:
```
text petscii_shifted "This might work"
```
<a name="include">**INCLUDE**
Include another source file. This should also work with .sym files to import labels from another build. The plan is for x65 to export .sym files as well.
Example:
```
include "wizfx.s"
```
<a name="incbin">**INCBIN**
Include binary data from a file. This inserts the binary data at the current address.
Insert multiple types of data or code at the current address. Import takes an additional parameter to determine what to do with the file data, and can accept reading in a portion of binary data.
* c64: same as **INCBIN** but skip first two bytes of file as if this was a c64 prg file
* text: include text data from another file, default is petscii, otherwise add another modifier from the **TEXT** directive
* object: same as **INCOBJ**
* symbols: same as **INCSYM**, specify list of desired symbols prior to filename.
After the filename for binary and c64 files follow comma separated values for skip data size and max load size. c64 mode will add the two extra bytes to the skip size.
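The skip and max load size values can be pictured as a slice of the file data. This Python sketch only illustrates the rule described above; the function and parameter names are hypothetical and it is not how x65 itself is implemented.

```python
def import_slice(data: bytes, mode: str = "binary", skip: int = 0, max_size=None) -> bytes:
    """Return the part of the file that would be inserted."""
    if mode == "c64":
        skip += 2  # c64 prg files start with a two byte load address
    end = None if max_size is None else skip + max_size
    return data[skip:end]


prg = bytes([0x01, 0x08, 10, 20, 30, 40])  # load address $0801, then data
print(list(import_slice(prg, "c64")))
print(list(import_slice(prg, "c64", skip=1, max_size=2)))
```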
Add a label pool for temporary address labels. This is similar to how stack frame variables are assigned in C.
A label pool is a mini stack of addresses that can be assigned as temporary labels with a scope ('{' and '}'). This can be handy for large functions trying to minimize use of zero page addresses: the function can declare a range (or set of ranges) of available zero page addresses, and labels can be assigned within a scope and be deleted on scope closure. The format of a label pool is: "pool <poolname> start-end, start-end" and labels can then be allocated from that range by '<poolname> <labelname>[.b][.w]' where .b means allocate one byte and .w means allocate two bytes. The label pools themselves are local to the scope they are defined in so you can have label pools that are only valid for a section of your code. Label pools work with any addresses, not just zero page addresses.
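To make the stack behavior concrete, here is a small Python model of a single pool range. It is a sketch under assumptions inferred from the address comments in the example that follows: allocation goes from the top of the range downward, the end address is exclusive, and a label without .b or .w takes one byte. The class and method names are made up for illustration.

```python
class LabelPool:
    """Toy model of one pool range such as 'pool zpWork $f6-$100'."""

    def __init__(self, start: int, end: int):
        self.start = start
        self.top = end    # next allocation is carved off below this address
        self.marks = []   # saved tops, one per open scope

    def open_scope(self):
        self.marks.append(self.top)

    def close_scope(self):
        # Labels allocated inside the scope are released again
        self.top = self.marks.pop()

    def alloc(self, size: int = 1) -> int:
        if self.top - size < self.start:
            raise MemoryError("label pool exhausted")
        self.top -= size
        return self.top


pool = LabelPool(0xF6, 0x100)
print(hex(pool.alloc(2)), hex(pool.alloc(2)))  # two .w labels
pool.open_scope()
print(hex(pool.alloc()))                       # one byte inside a scope
pool.close_scope()
```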
Example:
```
Function_Name: {
pool zpWork $f6-$100 ; these zero page addresses are available for temporary labels
zpWork zpTrg.w ; zpTrg will be $fe
zpWork zpSrc.w ; zpSrc will be $fc
lda #<Src ; low byte of the source address goes first
sta zpSrc
lda #>Src ; then the high byte
sta zpSrc+1
lda #<Dest
sta zpTrg
lda #>Dest
sta zpTrg+1
{
zpWork zpLen ; zpLen will be $fb
lda #Length
sta zpLen
}
nop
{
zpWork zpOff ; zpOff will be $fb (shared with previous scope zpLen)
}
rts
}
```
Hierarchical data structures (dot separated sub structures)
Structs help define complex data types. There are two basic types to define struct members, and as long as a struct is declared it can be used as a member type of another struct.
The result of a struct is that each member is an offset from the start of the data block in memory. Each substruct is referenced by separating the struct names with dots.
Note that if the -endm command line option is used (macros are not defined with curly brackets but inbetween macro and endm*) this also affects rept so the syntax for a repeat block changes to
Note that if a 4 digit hex value is used in 8 bit mode, where a 16 bit immediate mode is allowed but not currently enabled, a two byte value will be emitted:
```
lda #$0043 ; will be 16 bit regardless of accumulator mode if in 65816 mode
lda #$43 ; will be 8 bit or 16 bit depending on accumulator mode
```
A variety of directives and label rules support Merlin assembler sources. Merlin syntax is supported in x65 because of its historic relevance and the readily available, publicly released source code:
* [Pinball Construction Set source](https://github.com/billbudge/PCS_AppleII) (Bill Budge)
* [Prince of Persia source](https://github.com/jmechner/Prince-of-Persia-Apple-II) (Jordan Mechner)
To enable Merlin 8.16 syntax use the '-merlin' command line argument. Where it causes no harm, Merlin directives are supported for non-merlin mode.
*LABELS*
]label means mutable address label, also does not seem to invalidate local labels.
:label is perfectly valid, currently treating as a local variable
labels can include '?'
Merlin labels are not allowed to include '.' as period means logical or in merlin, which also means that enums and structs are not supported when assembling with merlin syntax.
*Expressions*
Merlin may not process expressions (probably left to right, parenthesis not allowed) the same as x65 but given that it wouldn't be intuitive to read the code that way, there are probably very few cases where this would be an issue.
Change processor. The first instance of XC will switch from 6502 to 65C02, the second switches from 65C02 to 65816. To return to 6502 use **XC OFF**. To go directly to 65816 **XC XC** is supported.
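The XC behavior above amounts to a small state machine. The Python sketch below is only a model of that description; what a plain XC does once 65816 is already selected is not stated, so staying on 65816 is an assumption.

```python
def apply_xc(cpu: str, argument=None) -> str:
    """Return the CPU selected after an XC directive (simplified model)."""
    order = ["6502", "65C02", "65816"]
    if argument is not None:
        arg = argument.upper()
        if arg == "OFF":
            return "6502"
        if arg == "XC":
            return "65816"
        raise ValueError("unknown XC argument: " + argument)
    # Assumption: a plain XC while already on 65816 stays on 65816
    return order[min(order.index(cpu) + 1, len(order) - 1)]


cpu = "6502"
cpu = apply_xc(cpu)  # first XC
print(cpu)
cpu = apply_xc(cpu)  # second XC
print(cpu)
```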
MX sets the immediate mode accumulator instruction size, it takes a number and uses the lowest two bits. Bit 0 applies to index registers (x, y) where 0 means 16 bits and 1 means 8 bits, bit 1 applies to the accumulator. Normally it is specified in binary using the '%' prefix.
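As an illustration of how the two MX bits map to register sizes, here is a short Python sketch. It assumes bit 1 uses the same polarity for the accumulator as bit 0 does for the index registers (0 means 16 bits, 1 means 8 bits); the function name is made up.

```python
def decode_mx(value: int):
    """Return (accumulator_bits, index_bits) for an MX operand."""
    bits = value & 0b11                       # only the lowest two bits are used
    index_bits = 8 if bits & 0b01 else 16     # bit 0: index registers x, y
    accumulator_bits = 8 if bits & 0b10 else 16  # bit 1: accumulator
    return accumulator_bits, index_bits


for operand in (0b00, 0b01, 0b10, 0b11):
    print(format(operand, "02b"), decode_mx(operand))
```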
LUP is Merlingo for loop. The lines following the LUP directive to the keyword --^ are repeated the number of times that follows LUP.
**MAC**
MAC is short for Macro. Merlin macros are defined on the lines in between MAC and \<\<\< or EOM. Macro arguments are listed on the same line as MAC and the macro identifier is the label preceding the MAC directive on the same line.
An old assembler directive that does not affect the assembler but if printed would insert a page break at that point.
**DS**
Define section, followed by a number of bytes. If number is positive insert this amount of 0 bytes, if negative, reduce the current PC.
**DUM**, **DEND**
Dummy section. This will not write any opcodes or data to the binary output but all code and data will increment the PC address up to the point of DEND.
**PUT**
A variation of **INCLUDE** that applies an oddball set of filename rules. These rules apply to **INCLUDE** as well just in case they make sense.
**USR**
In Merlin USR calls a function at a fixed address in memory, x65 safely avoids this. If there is a requirement for a user defined macro you've got the source code to do it in.
SAV causes Merlin to save the result it has generated so far, which is somewhat similar to the [EXPORT](#export) directive. If the SAV name is different than the source name the section will have a different EXPORT name appended and exported to a separate binary file.
LNK links the contents of an object file, to fit with the named section method of linking in x65 this keyword has been reworked to have a similar result, the actual linking doesn't begin until the current section is complete.
CYC starts and stops a cycle counter, x65 scoping allows for hierarchical cycle listings but the first merlin directive CYC starts the counter and the next CYC stops the counter and shows the result. This is 6502 only until data is entered for other CPUs.
Expressions contain values, such as labels or raw numbers, and operators including +, -, \*, /, & (and), | (or), ^ (eor), << (shift left), >> (shift right), similar to how expressions work in C. Parentheses are supported for managing order of operations where C style precedence needs to be overridden. In addition there are some special characters supported:
Avoid using parenthesis as the first character of the parameter of an opcode that can be relative addressed instead of an absolute address. This can be avoided by
A macro can be defined by using the directive macro and includes the lines within the following scope:
Example:
```
macro ShiftLeftA(Source) {
rol Source
rol A
}
```
The macro will be instantiated anytime the macro name is encountered:
```
lda #0
ShiftLeftA($a0)
```
The parameter field is optional for both the macro declaration and instantiation, if there is a parameter in the declaration but not in the instantiation the parameter will be removed from the macro. If there are no parameters in the declaration the parenthesis can be omitted and will be slightly more efficient to assemble, as in:
Currently macros with parameters use search and replace without checking if the parameter is a whole word, the plan is to fix this.
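The whole word issue is easy to reproduce outside the assembler. The Python sketch below is not x65's code; it contrasts a plain search and replace with a whole word match, which is one possible fix and not necessarily the one x65 will adopt. The macro body and names are invented for the example.

```python
import re


def expand_naive(body: str, param: str, arg: str) -> str:
    """Plain search and replace, as described above."""
    return body.replace(param, arg)


def expand_whole_word(body: str, param: str, arg: str) -> str:
    """Replace the parameter only where it stands alone as a word."""
    pattern = r"\b" + re.escape(param) + r"\b"
    # A function is used as the replacement so arg is inserted literally
    return re.sub(pattern, lambda match: arg, body)


body = "lda Src\nsta SrcCopy"
print(expand_naive(body, "Src", "$a0"))       # also rewrites part of SrcCopy
print(expand_whole_word(body, "Src", "$a0"))  # leaves SrcCopy alone
```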
## <a name="scopes">Scopes
Scopes are lines in between '{' and '}', including macros. The purpose of scopes is to reduce the need for local labels, and scopes nest just like C code to support function level, loop and inner loop scoping. '!' is a label that is the first address of the scope and '%' the first address after the scope.
Additionally scopes have a meaning for counting cycles when exporting a .lst file, each open scope '{' will add a new counter of CPU cycles that will accumulate until the corresponding '}' which will be shown on that line in the listing file. Use -lst as a command line option to generate a listing file.
* Marc dePeo, helping me uncover the strange and unique world of Merlin's assembler syntax (and working together with me on True Crime NY gameplay code and more)
* Che Lalic, explaining the murky bits of 65816 (and a Ninja on SNES NBA Hangtime and other projects)
* John Brooks, sharing the Rastan Apple II source code so I could test 65816 and figure out a number of issues with my initial linker, and encouraging the implementation of Apple II GS OS executable file format / OMF export (and helping out with Playstation All-Stars)
* [Brutal Deluxe](http://www.brutaldeluxe.fr) for releasing the excellent OMF Analyzer tool and the source, which was a significant help generating Apple II GS OS executables.
* The C64 demo scene for sharing a great deal of 6502 programming resources and overall inspiration.
* Jordan Mechner, sharing the Prince of Persia Apple II source code so I could test out a significant part of the assembler and the Merlin syntax mode
* Bill Budge, sharing the Pinball Construction Set Apple II source code, although at the point I tried it, all of it just assembled without any assembler issues at all.
Looking for help testing various features of the assembler. I have a large number of tests that pass without fail but there are so many ways for assemblers to break.
Primarily tested with a personal archive of sources written for Kick Assembler, DASM, TASM, XASM, etc. and passing most of Apple II Prince of Persia and Pinball Construction Set.
* Removed the concept of linking by merging sections and instead keeping the sections separate and individually assigned memory addresses so they won't overlap.
* Fixed up Merlin LNK directive to work with new linker
* dump_x65 now shows the code offset of each section into the .x65 file which can be copied and pasted into the disassembler in case the object file assembler output needs to be inspected.
* A linker export summary is shown when building binary fixed address, this shows how the linker re-arranged the sections in memory. The section addresses are also included in the .lst file even if the section didn't generate any listing information, such as included object files.
* BSS sections are handled similar to CODE and DATA sections but will not write out BSS bytes at end of binary data. This should complete the section handling necessary to build a relocatable executable.
* Replaced the fixed address linker so it doesn't merge sections but just assigns addresses. This is more similar to how a relocatable code loader would handle it. I may need to merge sections for OMF to reduce the number of code sections.
* dc.t (3 bytes) dc.l (4 bytes) for data declaration