-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

x65 Assembler
-------------

x65 is an open source 6502 series assembler that supports object files, linking, fixed address assembling and a relocatable executable.

Assemblers have existed for a long time and what they do is well documented. x65 tries to accommodate most expectations of syntax, from Kick Assembler (a Java 6502 assembler) to Merlin (an Apple II assembler).

For debugging, dump_x65 is a tool that shows the full content of x65 object files, and x65dsasm is a disassembler intended for reviewing the assembled result.

Noteworthy features:

* Code with sections, object files and linking or single file fixed address, or mix it up with fixed address sections in object files.
* Assembler listing with cycle counting for code review.
* Export multiple binaries with a single link operation.
* C style scoping within '{' and '}' with local and pool labels respecting scopes.
* Conditional assembly with if/ifdef/else etc.
* Assembler directives representing a variety of features.
* Local labels can be defined in a number of ways, such as leading period (.label) or leading at-sign (@label) or terminating dollar sign (label$).
* String Symbols system allows building user expressions and macros during assembly.
* Reassignment of symbols and labels by default.
* No indentation required for instructions, meaning that labels can't be mnemonics, macros or directives.
* Supporting the syntax of other 6502 assemblers (Merlin syntax requires a command line argument, -endm adds support for sources using macro/endmacro and repeat/endrepeat combos rather than scopes).
* Apple II GS executable output.

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Contents
--------

License
Command line arguments
CPU options
Syntax
Targets
Listing Output
Expressions
  Math expression symbols supported
  PC expression symbols supported
  Conditional operators
Conditional assembly
65816
Data
Macros
Strings
Structs and Enums
Symbols
Label Pool
Sections
Relocatable code and linking
Merlin
All Directives

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

License
-------

Created by Carl-Henrik Skårstedt on 9/23/15.

The MIT License (MIT)

Copyright (c) 2015 Carl-Henrik Skårstedt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Details, source and documentation at https://github.com/Sakrac/x65.

"struse.h" can be found at https://github.com/Sakrac/struse, only the header file is required.

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Document Updates
----------------

Nov 23 2015 - Initial pass of x65 documentation
Nov 24 2015 - More text
Nov 26 2015 - String directive and more text

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Command line arguments
----------------------

Input, output and target options are set on the command line; many of these options can also be controlled with assembler directives in the source.

    x65 source target [options]

Options include:

* -i(path) : Add include path
* -D(label)[=value] : Define a label with an optional value (otherwise defined as 1)
* -cpu=6502/65c02/65c02wdc/65816: assemble with opcodes for a different cpu
* -acc=8/16: set the accumulator mode for 65816 at start, default is 8 bits
* -xy=8/16: set the index register mode for 65816 at start, default is 8 bits
* -org = $2000 or -org = 4096: set the default start address of fixed address code
* -obj (file.x65) : generate object file for later linking
* -bin : Raw binary
* -c64 : Include load address (default)
* -a2b : Apple II Dos 3.3 Binary
* -a2p : Apple II ProDos Binary
* -a2o : Apple II GS OS executable (relocatable)
* -mrg : Force merge all sections (use with -a2o)
* -sym (file.sym) : symbol file
* -lst / -lst = (file.lst) : generate disassembly text from result (file or stdout)
* -opcodes / -opcodes = (file.s) : dump all available opcodes (file or stdout)
* -sect: display sections loaded and built
* -vice (file.vs) : export a vice symbol file
* -merlin: use Merlin syntax
* -endm : macros end with endm or endmacro instead of scoped ('{' - '}')

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

CPU options
-----------

The CPU can be defined on the command line with the -cpu= option, or as an assembler directive with the CPU directive.

The supported CPU names are:

* 6502 - basic 6502 instruction set
* 6502ill - 6502 instruction set with illegal opcodes
* 65C02 - basic 65C02 instruction set
* 65c02WDC - 65C02 instruction set with added WDC instructions
* 65816 - basic 65816 instruction set

The CPU can be changed within a source file; the CPU with the highest instruction count will be used for -lst disassembly output.

65816 has additional states that the assembler needs to be aware of, such as the accumulator and index register sizes (8 or 16 bit). These can be specified on the command line and with assembler directives like A16, A8, I16, I8 etc.

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Syntax
------

The syntax of x65 source is the result of trying to build code originally created for a variety of assemblers, including a number of open source games and old personal code. The primary syntax inspiration is from Kick Assembler, but also DASM, TASM and XASM. Most of the downloaded sample code was written for Apple II where Merlin, Orca and Lisa were referenced. Note that Merlin syntax requires the -merlin command line option.

In normal mode x65 does not care about indentation: labels can be indented and instructions can be in column 1. In this mode labels can not use the same name as any directive or instruction, and the same goes for macros, etc. Colons are optional for labels.

Comments are line based and start with either a semicolon or double forward slashes:

    ; comment
    // also a comment

Local labels are any labels starting with ., !, @ or : or ending with $. A local label will be discarded after a scope ends ( '}' ) or after a global label is declared.

    { ; open scope
        ldx #2
        dex
        beq .zero ; .zero is a local label within the current scope
        bne ! ; address of open scope ({)
    .zero
    } ; close scope

Symbols are assigned with an equal sign or the EQU keyword and can be preceded by 'CONST' to prevent changes:

    BitmapStart = $2000
    CONST ColorMap EQU $400

Symbols can be removed using the UNDEF directive:

    UNDEF BitmapStart ; BitmapStart is no longer defined

With the -merlin command line argument x65 is in Merlin syntax mode, which restricts labels to column 1 and everything else to column 2 or higher. Merlin syntax also enables a number of Merlin specific assembler directives. See the Merlin section for more information.

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Targets
-------

Most target file formats are just binary executable code with a few bytes for load address and code size, with the exception of the Apple II GS relocatable executable.

If building a fixed address target the initial address can be specified with the command line option "-org" or by using an ORG directive in the source. Multiple ORG statements are allowed in the source and the space in between will be filled with zeroes.

In order to support larger projects an intermediate (fully assembled) relocatable target format is available using the -obj command line option to generate a .x65 object file. More information about object files can be found in Sections.

Command line options for target output:

* -org = $2000: set the default start address of fixed address code, default is $1000
* -obj (file.x65): generate object file for later linking
* -bin : Raw binary
* -c64 : Include load address (default)
* -a2b : Apple II Dos 3.3 Binary (load address + file size)
* -a2p : Apple II ProDos Binary (set org to $2000 otherwise binary)
* -a2o : Apple II GS OS executable (relocatable)
* -mrg : Force merge all sections (use with -a2o)

The -mrg option will combine all segments into one to allow for 16 bit addressing to reach data in other segments, but will limit the size to fit into a 64 k bank.
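
As a minimal sketch of a fixed address source built from the rules above, the following uses two ORG directives. The label names, addresses and data are illustrative assumptions, not taken from the x65 samples.

```
    org $2000           ; start address, the -org command line option sets it instead
Start:
    lda Table
    rts

    org $2100           ; space between the rts and $2100 is filled with zeroes
Table:
    byte 1, 2, 3
```

Assuming the source is saved as example.s (a hypothetical file name), a build following the command line form "x65 source target [options]" would look like "x65 example.s example.prg -c64".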

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Listing Output
--------------

The command line -lst option will enable list output, which is a traditional way to review 6502 code. -lst=(filename) will write the list output to a file whereas -lst by itself will send the list output to stdout.

The list output will be generated after the source has been assembled. The output will use spaces instead of tabs to keep the columns consistent in different editors.

The order of lines in the list output will correspond to memory and not to the order of lines in the original code, and lines that don't generate data may be omitted.

By using scoping '{' and '}' the listing starts and stops cycle counters; each cycle counter start is marked by c>number and each stop by c<number.

Columns in the listing include:

* Cycle counter start (c>1) / end (c<1 = ...)
* Instruction (disassembled)
* Cycle Count for Instruction
* Source line that generated the data

                                         section Code
                    c>1                  Sin {
    $0000 a2 03     ldx #$03        2    ldx #3
                    c>2                  {
    $0002 b5 e8     lda $e8,x       4    lda SinP.Ang,x
    $0004 95 ec     sta $ec,x       4    sta SinP.R,x
    $0006 95 e4     sta $e4,x       4    sta SinP.W0,x
    $0008 95 f4     sta $f4,x       4    sta Mul824.A,x
    $000a 95 f0     sta $f0,x       4    sta Mul824.B,x
    $000c ca        dex             2    dex
    $000d 10 f3     bpl $0002       2+   bpl !
                    c<2 = 24 + 1         }
                                         ; x^2, copy to W1
    $000f a9 e0     lda #$e0        2    lda #SinP.W1
    $0011 20 00 00  jsr $0000       6    jsr Multiply824S_Copy
                                         ; iterate value
    $0014 a0 00     ldy #$00        2    ldy #0
                                         .SinIterate
                    c>2                  {
                                         ; W0 *= W1
    $0016 a2 03     ldx #$03        2    ldx #3
                    c>3                  {
    $0018 b5 e4     lda $e4,x       4    lda SinP.W0,x ; x^(1+2n)
    $001a 95 f4     sta $f4,x       4    sta Mul824.A,x
    $001c b5 e0     lda $e0,x       4    lda SinP.W1,x ; x^2
    $001e 95 f0     sta $f0,x       4    sta Mul824.B,x
    $0020 ca        dex             2    dex
    $0021 10 f5     bpl $0018       2+   bpl !
                    c<3 = 20 + 1         }

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Expressions
-----------

Expressions contain values, such as labels or raw numbers, and operators. The order of operations is based on C like precedence.
Internally the expression is converted to reverse polish notation to make it easier to keep track of complex expressions.

Values in expressions can be labels, symbols, strings (added as an expression within parentheses) or raw decimal, binary or hexadecimal numbers.

Math expression symbols supported:

  +   Add two numbers (a+b)
  -   Subtract one number from another (a-b)
  *   Multiply two numbers (a*b)
  /   Divide one number by another (a/b)
  &   Bitwise and of two numbers (a&b)
  |   Bitwise or of two numbers (a|b)
  ^   Bitwise exclusive or of two numbers (a^b)
  <<  Shift value left (multiply a by 2^b)
  >>  Shift value right (divide a by 2^b)
  (   Open parenthesis, override operator precedence
  )   Close parenthesis, end a parenthesis block

PC expression symbols supported:

  *   Current address (PC). This conflicts with the use of * as multiply so multiply will be interpreted only after a value or right parenthesis
  <   If less than is not followed by another '<' in an expression this evaluates to the low byte of a value (and $ff)
  >   If greater than is not followed by another '>' in an expression this evaluates to the high byte of a value (>>8)
  ^   In between two values '^' is an eor operation, as a prefix to values it extracts the bank byte (v>>16)
  !   Start of scope (use like an address label in expression)
  %   First address after scope (use like an address label in expression)
  $   Precedes hexadecimal value
  %   If immediately followed by '0' or '1' this is a binary value and not scope closure address

Conditional operators

  ==  Double equal signs yield 1 if left value is the same as the right value
  <   If in between two values, less than will yield 1 if left value is less than right value
  >   If in between two values, greater than will yield 1 if left value is greater than right value
  <=  If in between two values, less than or equal will yield 1 if left value is less than or equal to right value
  >=  If in between two values, greater than or equal will yield 1 if left value is greater than or equal to right value

Example:

    lda #(((>SCREEN_MATRIX)&$3c)*4)+8
    sta $d018

Avoid using a parenthesis as the first character of the operand of an instruction that supports indirect addressing when an absolute address is intended, since the leading parenthesis selects indirect addressing. This can be avoided in several ways:

    jmp (a+b) ; generates an indirect jump
    jmp.a (a+b) ; generates an absolute jump
    jmp +(a+b) ; generates an absolute jump
    c = (a+b)
    jmp c ; generates an absolute jump
    jmp a+b ; generates an absolute jump

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Conditional assembly
--------------------

IF / ELSE / ENDIF etc. work in a similar way to C. IF exp / ELIF exp assembles if the expression is non-zero, IFDEF symbol assembles if the symbol has been assigned. There isn't any particular restriction to what can be excluded in a non-assembling block of source.
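
A short sketch of IFDEF combined with a command line define, assuming a label named DEBUG (a hypothetical name) that is either left undefined or defined with the -D(label) option described under Command line arguments, for example -DDEBUG:

```
    ifdef DEBUG
        lda #1      ; assembled only when DEBUG has been assigned
    else
        lda #0      ; assembled when DEBUG has not been assigned
    endif
        rts
```

Only one of the two lda lines ends up in the output, so the same source can produce two variants without editing the file.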

* ELIF - conditionals, "else if" following an IF or IFDEF condition
* ELSE - conditionals, following an IF or IFDEF or ELIF condition
* ENDIF - conditionals, terminates a condition
* IF - conditionals, start a block of conditional assembly if an expression evaluates to non-zero
* IFDEF - conditionals, start a block of conditional assembly if a symbol or label exists at this point

Example:

    if 0
        this part of the source will not assemble,
        however a line can not start with a conditional
        assembler directive such as if, ifdef, else, elseif
        or endif within a block that does not assemble
        unless followed by a valid expression
    else
        ; this part of the source will assemble
        lda #0
        rts
    endif

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

65816
-----

65816 is a major expansion of 6502 and requires the assembler to be aware of what processor flags the user has set in order to select instructions. Use -cpu=65816 on the command line or CPU 65816 in source to set it.

* A16 - 65816, set accumulator immediate operators to 16 bit mode
* A8 - 65816, set accumulator immediate operators to 8 bit mode
* I16 - 65816, set index register immediate operators to 16 bit mode, same as XY16
* I8 - 65816, set index register immediate operators to 8 bit mode, same as XY8
* XY16 - 65816, set index register immediate operators to 16 bit mode, same as I16
* XY8 - 65816, set index register immediate operators to 8 bit mode, same as I8

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Data
----

Data is any part of the binary that is not generated by assembler mnemonics. Most of the directives declare specific data, except for DS which declares a repeating value.

* BYTE - data, define comma separated bytes
* BYTES - data, same as byte
* DC - data, define comma separated bytes (default), words, triples or longs (DC.B, DC.W, DC.T, DC.L)
* DS - data, define repeated value, first value is count, optional is fill value, default is in bytes (DS.B, DS.W, DS.T, DS.L)
* DV - data, same as DC but differentiated in DASM as allowing expressions
* IMPORT - data and sections, load a file and include it in the assembly based on the argument
* INCBIN - data, load a file and insert it at the current address
* INCDIR - data and control, add a directory to search for INCLUDE, INCBIN, INCOBJ or IMPORT files in
* LONG - data, define comma separated 32 bit values
* TEXT - data, insert text at the current address optionally with a filter
* WORD - data, insert comma separated 16 bit values, same as WORDS
* WORDS - data, insert comma separated 16 bit values, same as WORD

Example:

    ONE_824 = 1<<24 ; 1 as a 8.24 number

    CosInvPermute: ; 1 +
        long -(ONE_824 + 1)/(2) ; x^2 * this
        long (ONE_824 + 3*4)/(2*3*4) ; x^4 * this
        long -(ONE_824 + 3*4*5*6)/(2*3*4*5*6) ; x^6 * this
        long -(ONE_824 + 3*4*5*6*7*8)/(2*3*4*5*6*7*8) ; x^8 * this

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Macros
------

The default macro syntax is similar to a C inline function, using the directive MACRO.

    MACRO [name](parameter1, parameter2, etc.)
    {
        lda #parameter1
        sta parameter2
    }

To use the macro, write the name and specify the parameters:

    [name](1,dest)

The parentheses are optional both for the macro declaration and for the macro instantiation, so macros can be used as if they were instructions:

    MACRO neg address
    {
        sec
        lda #0
        sbc address
        sta address
    }

    MACRO nega
    {
        eor #$ff
        sec
        adc #0
    }

Now 'neg' and 'nega' can be used as if they were instructions:

    neg $7f80 ; negate byte at this hard coded address for some reason
    lda #$6c
    nega ; negate accumulator

In order to support code written for other assemblers the -endm command line option changes the syntax for macro declarations to start on the line after MACRO and end before the line starting with ENDM or ENDMACRO:

    MACRO inca
        sec
        adc #0
    ENDMACRO

Directives for macros:

* MACRO - macros, start a macro declaration

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Strings
-------

Strings are special symbols that contain text and were included in an effort to support ORCA macros. The difference between ORCA and other assemblers is that the macros can build up string symbols (along with value symbols) and combine results into a more powerful macro system. x65 now supports the same mechanism but not the same exact keywords.

Strings can be created and passed in as a value symbol in expressions or used directly as a macro (without parameters).

Strings are defined using the STRING directive followed by the string name and an equal sign followed by a string expression. Strings can include value symbols, which will be evaluated and represented by $ + the hexadecimal representation of the value.

The UNDEF directive can be used to remove String Symbols.

Example:

    STRING exp = "1 + 2 + 3"
    EVAL exp

result (output):

    EVAL(2): "exp" = "1 + 2 + 3" = $6

Example:

    STRING code_str = "lda #0\nsta $fe"
    code_str

result (code):

    lda #0
    sta $fe

Example:

    STRING concat_example = "ldx #0"
    concat_example +=

Directives for String Symbols

* STRING - declare a string symbol
* UNDEF - remove a string

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Structs and Enums
-----------------

Hierarchical data structures (dot separated sub structures)

Structs help define complex data types. There are two basic types to define struct members (byte and word, as used in the example below), and as long as a struct is declared it can be used as a member type of another struct.

The result of a struct is that each member is an offset from the start of the data block in memory. Each substruct is referenced by separating the struct names with dots. To get the size of a struct simply use the automatic 'bytes' member, as in [struct name].bytes

Example:

    struct MyStruct {
        byte count
        word pointer
    }

    struct TwoThings {
        MyStruct thing_one
        MyStruct thing_two
    }

    struct Mixed {
        word banana
        TwoThings things
    }

    Eval Mixed.bytes
    Eval Mixed.things
    Eval Mixed.things.thing_two
    Eval Mixed.things.thing_two.pointer
    Eval Mixed.things.thing_one.count

results in the output:

    EVAL(15): "Mixed.bytes" = $8
    EVAL(16): "Mixed.things" = $2
    EVAL(27): "Mixed.things.thing_two" = $5
    EVAL(28): "Mixed.things.thing_two.pointer" = $6
    EVAL(29): "Mixed.things.thing_one.count" = $2

* ENUM - structs and enums, declare enumerations like C
* STRUCT - structs and enums, declare a C-like structure of symbols separated by dots

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Symbols
-------

Symbols are assigned with an equal sign or the keyword EQU or defined as labels within code. Structs and Enums are structured symbols.

INCSYM can be used to reference symbols from previously assembled binary executables:

    INCSYM EntryPoint "Binary.sym"

EntryPoint is defined from the previously assembled code using an optional symbol file.

* INCSYM - symbols, include all or specific symbols from a .sym file
* LABEL - symbols, optional prefix to symbol assignments
* LABPOOL - symbols, a stack-like pool of addresses, same as POOL
* STRUCT - structs and enums, declare a C-like structure of symbols separated by dots
* POOL - symbols, a stack-like pool of addresses, same as LABPOOL
* CONST - symbols, declare assigned symbol as constant and if changed cause an error
* XDEF - sections, declare a label as external which can be referenced in other source files by using XREF
* XREF - sections, reference a label that has been declared as global in another file by using XDEF
* UNDEF - symbols, erase a symbol or string

-0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0--0-

Label Pool
----------

Add a label pool for temporary address labels. This is similar to how stack frame variables are assigned in C.

A label pool is a mini stack of addresses that can be assigned as temporary labels with a scope ('{' and '}'). This can be handy for large functions trying to minimize use of zero page addresses: the function can declare a range (or set of ranges) of available zero page addresses, and labels can be assigned within a scope and be deleted on scope closure.

The format of a label pool is:

    pool [pool name] start-end, start-end

Labels can then be allocated from that range by

    [pool name] [label name][.b][.w]

where .b means allocate one byte and .w means allocate two bytes. The label pools themselves are local to the scope they are defined in, so you can have label pools that are only valid for a section of your code. Label pools work with any addresses, not just zero page addresses.

Example:

```
Function_Name: {
    pool zpWork $f6-$100 ; zero page addresses for temporary labels
    zpWork zpTrg.w ; zpTrg will be $fe
    zpWork zpSrc.w ; zpSrc will be $fc

    lda #>Src
    sta zpSrc
    lda #Dest
    sta zpDst
    lda #