mirror of
https://github.com/RevCurtisP/C02.git
synced 2024-11-24 15:31:17 +00:00
1096 lines
46 KiB
Plaintext
INTRODUCTION

C02 is a simple C-syntax language designed to generate highly optimized
code for the 6502 microprocessor. The C02 specification is a highly
specific subset of the C standard with some modifications and extensions.

PURPOSE

Why create a whole new language, particularly one with severe restrictions,
when there are already full-featured C compilers available? It can be
argued that standard C is a poor fit for processors like the 6502. The C
language was designed to translate directly to machine language instructions
whenever possible. This works well on 32-bit processors, but requires either
a byte-code interpreter or the generation of complex code on a typical
8-bit processor. C02, on the other hand, has been designed to translate
directly to 6502 machine language instructions.

The C02 language and compiler were designed with two goals in mind.

The first goal is the ability to target machines with low memory: a few
kilobytes of RAM (assuming the generated object code is to be loaded into
and run from RAM), or as little as 128 bytes of RAM and 2 kilobytes of ROM
(assuming the object code is to be run from a ROM or PROM).

The compiler is agnostic with regard to system calls and library functions.
Calculations and comparisons are done with 8-bit precision. Intermediate
results, array indexing, and function calls use the 6502 internal registers.
While this results in compiled code with virtually no overhead, it severely
restricts the syntax of the language.

The second goal is to port the compiler to C02 code so that it may be
compiled by itself and run on any 6502-based machine with sufficient memory
and appropriate peripherals. This slightly restricts the implementation of
code structures.

SOURCE AND OUTPUT FILES

C02 source code files are denoted with the .c02 extension. The compiler
reads the source code file, processes it, and generates an assembly
language file with the same name as the source code file, but with
the .asm extension instead of the .c02 extension. This assembly language
file is then assembled to create the final object code file.

Note: The default implementation of the compiler creates assembly
language code formatted for the DASM assembler. The generation of the
assembly language is parameterized, so it may be easily changed to
work with other assemblers.

COMMENTS

The parser recognizes both C style and C++ style comments.

C style comments begin with /* and end at the next */. Nested C style
comments are not supported.

C++ style comments begin with // and end at the next newline. C++ style
comments may be nested inside C style comments.

DIRECTIVES

Directives are special instructions to the compiler. Depending on the
directive, it may or may not generate compiled code. A directive is
denoted by a leading # character. Unlike a statement, a directive is
not followed by a semicolon.

Note: Unlike standard C and C++, which use a preprocessor to process
directives, the C02 compiler processes directives directly.

DEFINE DIRECTIVE

The #define directive is used to define constants (see below).

INCLUDE DIRECTIVE

The #include directive causes the compiler to read and process an external
file. In most cases, #include directives will be used with libraries of
function calls, but they can also be used to modularize the code that makes
up a program.

An #include directive is followed by the file name to be included. This
file name may be surrounded with either a < and > character, or by two "
characters. In the former case, the compiler looks for the file in an
implementation specific library directory (the default being ./include),
while in the latter case, the compiler looks for the file in the current
working directory. Two file types are currently supported.

Header files are denoted by the .h02 extension. A header file is used to
provide the compiler with the information necessary to use machine
language system and/or library routines written in assembly language,
and consists of comments and declarations. The declarations in a header
file are added to the symbol table, but do not directly generate code. After
a header file has been processed, the compiler reads and processes an
assembly language file with the same name as the header file, but with
the .a02 extension instead of the .h02 extension.

The compiler does not currently generate any assembler required
pseudo-operators, such as the specification of the target processor,
or the starting address of the assembled object code. Therefore, at least
one header file, with an accompanying assembly language file, is needed
in order to successfully assemble the compiler generated code. Details
on the structure and implementation of a typical header file can be
found in the file header.txt.

Assembly language files are denoted by the .a02 extension. When the
compiler processes an assembly language file, it simply inserts the contents
of the file into the generated code.

PRAGMA DIRECTIVE

The #pragma directive is used to set various compiler options. A #pragma
directive is followed by the pragma name and possibly an option, each
separated by whitespace.

Note: The various pragma directives are specific to the cross-compiler and
may be changed or omitted in future versions of the compiler.

PRAGMA ASCII

The #pragma ascii directive instructs the compiler to convert the characters
in literal strings to a form expected by the target machine.

Options:
  #pragma ascii high    //Sets the high bit to 1 (e.g. Apple II)
  #pragma ascii invert  //Swaps upper and lower case (e.g. PETSCII)

PRAGMA ORIGIN

The #pragma origin directive sets the target address of compiled code.

Examples:
  #pragma origin $0400  //Compiled code starts at address 1024
  #pragma origin 8192   //Compiled code starts at address 8192

PRAGMA PADDING

The #pragma padding directive adds empty bytes to the end of the compiled
program. It should be used with target systems that require the object
code to be padded with a specific number of bytes.

Examples:
  #pragma padding 1    //Add one empty byte to end of code
  #pragma padding $FF  //Add 255 empty bytes to end of code

PRAGMA RAMBASE

The #pragma rambase directive sets the base address for variables in RAM
(not declared const). This is normally used when the compiled code will be
stored in ROM (such as in an EPROM or cartridge), but can be used any time
variables should be in a specific area of RAM.

Examples:
  #pragma rambase $0200  //Define Variable RAM base address for NES
  #pragma rambase 828    //Define Variable RAM in C64 Tape Buffer
  #pragma rambase 4096   //Define RAM base for Vic 20 ROM cartridge

PRAGMA VARTABLE

The #pragma vartable directive forces the variable table to be written.
It should be used before any #include directives that need to generate
code following the table.

PRAGMA WRITEBASE

The #pragma writebase directive sets the base address for writing to variables
in RAM. This is used when the target system uses different addresses for
reading and writing the same memory locations. This directive must be preceded
by a #pragma rambase directive with a non-zero argument.

Examples:
  #pragma rambase $F080    //Define Superchip RAM Read Base for Atari 2600
  #pragma writebase $F000  //Define Superchip RAM Write Base for Atari 2600

Note: Setting a RAM write base causes the compiler to generate a write offset
which is concatenated to the variable name on all assignments.

PRAGMA ZEROPAGE

The #pragma zeropage directive sets the base address for variables declared
as zeropage.

Example:
  #pragma zeropage $80  //Start zeropage variables at address 128

LITERALS

A literal represents a value between 0 and 255. Values may be written as
a number (binary, decimal, or hexadecimal) or a character literal.

A binary number consists of a % followed by eight binary digits (0 or 1).

A decimal number consists of one to three decimal digits (0 through 9).

A hexadecimal number consists of a $ followed by two hexadecimal digits
(0 through 9 or A through F).

A character literal consists of a single character surrounded by ' symbols.
A ' character may be specified by escaping it with a \.

Examples:
  %01010101  Binary Number
  123        Decimal Number
  $FF        Hexadecimal Number
  'A'        Character Literal
  '\''       Escaped Character Literal

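The literal forms above can be modelled with a short Python sketch. This is
an illustration only, not the compiler's actual parser: the function name
parse_literal and its error messages are hypothetical, and it covers only the
single-byte forms described in this section.

```python
def parse_literal(text):
    """Return the byte value (0-255) of a C02 literal (hypothetical helper)."""
    if text.startswith("%"):
        # Binary: a % followed by exactly eight binary digits
        digits = text[1:]
        if len(digits) != 8 or set(digits) - set("01"):
            raise ValueError("binary literal needs eight 0/1 digits")
        return int(digits, 2)
    if text.startswith("$"):
        # Hexadecimal: a $ followed by exactly two hexadecimal digits
        digits = text[1:]
        if len(digits) != 2:
            raise ValueError("hexadecimal literal needs two digits")
        return int(digits, 16)
    if len(text) >= 3 and text.startswith("'") and text.endswith("'"):
        # Character: one character between ' symbols; \' escapes a quote
        body = text[1:-1]
        if body == "\\'":
            body = "'"
        if len(body) != 1:
            raise ValueError("character literal needs a single character")
        return ord(body)
    if text.isascii() and text.isdigit() and len(text) <= 3:
        # Decimal: one to three decimal digits, limited to one byte
        value = int(text)
        if value > 255:
            raise ValueError("decimal literal exceeds 255")
        return value
    raise ValueError("not a recognized literal: " + text)
```

Each of the example literals listed above is accepted by this sketch, while
something like %0101010 (seven digits) is rejected.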
STRINGS

A string is a consecutive series of characters terminated by an ASCII null
character (a byte with the value 0).

A string literal is written as up to 255 printable characters, prefixed and
suffixed with " characters.

The " character and a subset of ASCII control characters can be specified
in a string literal by using escape sequences prefixed with the \ symbol:

  \b  $08  Backspace
  \e  $1B  Escape
  \f  $0C  Form Feed
  \n  $0A  Line Feed
  \r  $0D  Carriage Return
  \t  $09  Tab
  \v  $0B  Vertical Tab
  \"  $22  Double Quotation Mark
  \\  $5C  Backslash

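As a rough model of the escape table above, the following Python sketch turns
the text between the quotes of a string literal into the null-terminated byte
sequence described in this section. The names ESCAPES and encode_string are
hypothetical, and the sketch assumes plain ASCII with no #pragma ascii
conversion applied.

```python
# Escape letters from the table above, mapped to their byte values
ESCAPES = {"b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A, "r": 0x0D,
           "t": 0x09, "v": 0x0B, '"': 0x22, "\\": 0x5C}

def encode_string(body):
    """Encode the text between the " characters as null-terminated bytes."""
    out = bytearray()
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            code = next(chars, None)      # character following the backslash
            if code not in ESCAPES:
                raise ValueError("unsupported escape sequence")
            out.append(ESCAPES[code])
        else:
            out.append(ord(ch))
    if len(out) > 255:
        raise ValueError("string literal longer than 255 characters")
    out.append(0)                         # ASCII null terminator
    return bytes(out)
```

For instance, the six characters of "string" encode to seven bytes once the
terminator is appended.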
SYMBOLS

A symbol consists of an alphabetic character followed by zero to five
alphanumeric characters. Four types of symbols are supported: labels,
simple variables, variable arrays, and functions.

A label specifies a target point for a goto statement. A label is written
as a symbol suffixed by a : character.

A constant represents a literal value. A constant is written as a symbol
prefixed by the # character.

A simple variable represents a single byte of memory. A variable is written
as a symbol without a suffix.

A variable array represents a block of up to 256 contiguous bytes in
memory. An array reference is written as a symbol suffixed with a [ character,
an index, and a ] character. The lowest index of an array is 0, and the
highest index is one less than the number of bytes in the array. There is no
bounds checking on arrays: referencing an element beyond the end of the array
will access indeterminate memory locations.

A function is a subroutine that receives zero or more values as arguments and
optionally returns a value. A function is written as a symbol suffixed with
a ( character, up to three arguments separated by commas, and a ) character.

The special symbols A, X, and Y represent the 6502 registers with the same
names. Registers may only be used in specific circumstances (which are
detailed in the following text). Because various C02 statements modify
registers as they are processed, care should be taken when using them.
However, when used properly, register references can increase the efficiency
of compiled code.

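The naming rule at the start of this section (one alphabetic character
followed by zero to five alphanumeric characters) can be written as a regular
expression. The Python sketch below is only an illustration; the names SYMBOL
and is_symbol are hypothetical, and it assumes ASCII letters and digits.

```python
import re

# One letter, then zero to five letters or digits (six characters at most)
SYMBOL = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")

def is_symbol(name):
    """Return True if name fits the symbol naming rule described above."""
    return SYMBOL.fullmatch(name) is not None
```

Names used elsewhere in this document, such as putcon and strlen, fit the
rule, while a seven-character name or one starting with a digit does not.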
STATEMENTS

Statements include declarations, assignments, stand-alone function calls,
and control structures. Most statements are suffixed with ; characters,
but some may be followed with program blocks.

BLOCKS

A program block is a series of statements surrounded by the { and }
characters. They may only be used with function definitions and control
structures.

CONSTANTS

A constant is defined by using the #define directive followed by the constant
name (without the # prefix) and the literal value to be assigned to it.

Examples:
  #define TRUE $FF
  #define FALSE 0
  #define BITS %01010101
  #define ZED 'Z'

ENUMERATIONS

An enumeration is a sequential list of constants. Enumerations are used to
generate sets of related but distinct values.

An enumeration is defined using an enum statement. The enum keyword is
followed by a { character, one or more constant names separated by commas,
and a } character. A period may be used in place of a constant name, in
which case that value in the sequence will be skipped. The enum statement
is terminated with a semicolon.

Examples:
  enum {BLACK, WHITE, RED, CYAN, PURPLE, GREEN, BLUE, YELLOW};
  enum {., FIRST, SECOND, THIRD};
  enum {ZERO, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN};

Note: Values are automatically assigned to the constants in an enumeration.
The first constant in the enumeration is assigned the value 0, the second
is assigned the value 1, and so on.

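The value assignment described in the note can be sketched in Python as
follows. The function name enum_values is hypothetical, and the sketch
assumes that a period still consumes its position in the sequence, so that
the constant after it receives the next value.

```python
def enum_values(names):
    """Map each constant name to its position; a '.' entry is skipped."""
    return {name: value
            for value, name in enumerate(names)
            if name != "."}
```

Under that assumption, the second example above would assign 1 to FIRST,
2 to SECOND, and 3 to THIRD, leaving the value 0 unnamed.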
BITMASKS

Bitmasks are a list of constants with values that correspond to each bit
in a byte. Bitmasks are used to allow multiple true/false flags to be
combined into a single char variable.

A bitmask is defined using a bitmask statement. The bitmask keyword is
followed by a { character, one to eight constant names separated by commas,
and a } character. A period may be used in place of a constant name, in
which case the bit value will be skipped. The bitmask statement is
terminated with a semicolon.

Examples:
  bitmask {BLUE, GREEN, RED, BRIGHT, INVERT, BLINK, FLIP, BKGRND};
  bitmask {RD, RTS, DTR, RI, CD, ., CTS, DSR};

Note: Values are automatically assigned to the constants in a bitmask,
each of which is a sequential power of two. The first constant in the
bitmask is assigned the value 1, the second is assigned the value 2,
the third is assigned the value 4, and so on.

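The power-of-two assignment in the note can be modelled the same way. In this
Python sketch the function name bitmask_values is hypothetical; it assumes a
period skips its bit position without shifting the constants that follow.

```python
def bitmask_values(names):
    """Map each constant name to a single-bit value; a '.' entry is skipped."""
    if not 1 <= len(names) <= 8:
        raise ValueError("a bitmask holds one to eight entries")
    return {name: 1 << bit
            for bit, name in enumerate(names)
            if name != "."}
```

For the second example above this gives RD the value 1 and DSR the value 128,
with the value 32 left unnamed. Several flags can then be combined with a
bitwise OR, for example RD and CTS into the single byte 65.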
DECLARATIONS

A declaration statement consists of a type keyword (char, int, or void)
followed by one or more variable names (and optional definitions) or a single
function name and optional function block, or the struct keyword followed
by a structure name and either a definition or a variable name.

Variables may only be of type char or int, and all variable declaration
statements are suffixed with a ; character. Variables of type char may be
declared as arrays, by appending the variable name with the [ character,
the upper bound of the array (0 to 255), and the ] character.

Examples:
  char c;     //Defines 8-bit variable c
  char hi,lo; //Defines 8-bit variables hi and lo
  char r[7];  //Defines 8 byte array r
  int addr;   //Defines 16-bit variable addr
  int i, j;   //Defines 16-bit variables i and j

A function declaration consists of the function name suffixed with a (
character, followed by an optional parameter set and a ) character.

The parameter set, if specified, may be either one to three simple
char variables, a single int variable, or a char variable followed
by an int variable.

A function declaration terminated with a ; character is called a forward
declaration and does not generate any code, while one followed by a
program block creates the specified function.

Functions of type char and int explicitly return one or more values
(using a return statement), while functions of type void return no
explicit value.

The return statement causes the function to exit, after which control
returns to the statement immediately following the function call. If
the last statement before the closing } of the function body is not
a return, then an implicit return is assumed.

A return statement may be followed by a list of one to three variables
following the same rules as function arguments (see FUNCTION CALLS,
below), in which case those values are returned by the function call,
otherwise the function call will not return any explicit values.

Examples:
  void myfunc(); //Forward declaration of function myfunc
  char not(tmp) {tmp = tmp ^ $FF;}
  char max(tmp1, tmp2) {if (tmp1 > tmp2) return tmp1; else return tmp2;}
  char min(tmp1, tmp2) {if (tmp1 < tmp2) return tmp1; else return tmp2;}
  char test(b,c,d) {if (b:-) return min(c,d); else return max(c,d);}
  int swap(*,msb,lsb) {return *,lsb,msb;} //Swap bytes in integer
  int incdec(m,i) {if (m:-) i--; else i++; return i;} //inc/dec integer

Note: Like all variables, function parameters are global. They must be
declared prior to the function declaration, and retain their values after
the function call. Although functions may be called recursively, they are
not re-entrant. Allocation of variables and functions is implementation
dependent: they could be placed in any part of memory and in any order.
The default behavior is to place variables directly after the program code,
including them as part of the generated object file.

Function arguments and return values are passed through the 6502 registers.
Char type values are passed by loading A, Y, and X respectively, while int
type values are passed by loading Y with the most significant byte and
X with the least significant byte.

A return statement without explicit return values will return whatever
happens to be in the registers at that time.

STRUCTS

A struct is a special type of variable which is composed of one or more
unique variables called members. Each member may be either a simple
variable or an array. However, the total size of the struct may not
exceed 256 bytes.

Member names are local to a struct: each member within a struct must have
a unique name, but the same member name can be used in different structs
and may also have the same name as a global variable.

The struct keyword is used both for defining the members of a struct type
and for declaring struct type variables.

When defining a struct type, the struct keyword is followed by the name of
the struct type, the { character, the member definitions, and the }
character. The struct definition is terminated with a semicolon. Each
member definition is composed of a type keyword (char, int, or struct) and
one or more member names, separated with commas. If a member is an array,
the member name is suffixed with the [ character, the upper bound of the
array, and the ] character. Each member definition is terminated with a
semicolon. Any number of comments may appear before the first member, between
members, and after the last member.

Member names are limited to six alphanumeric characters, the first of which
must be alphabetic. Any names are allowed including reserved words, as well
as A, X, and Y (which in this case, do not refer to registers).

When declaring a struct variable, the struct keyword is followed by the
struct type name, and one or more variable names, separated with commas.
The struct declaration is terminated with a semicolon.

Examples:

  struct record {char name[8]; char index; int addr; char data[128];};
  struct record rec;
  struct record srcrec, dstrec;
  struct point {char x, y;};
  struct point pnt;
  struct line {struct point bgnpnt, endpnt;};

Note: Unlike simple and array variables, the members of a struct variable
may not be initialized during declaration.

MODIFIERS

A modifier is used with a declaration to override the default properties of
an object. Modifiers may currently only be used with simple variable and
array declarations, although this may be expanded in the future.

The alias modifier specifies that a variable is to be located at a specific
address. The specified address may either be a literal in the range 0 to
65535 ($0 to $FFFF) or a previously defined variable name. When using the
alias modifier, the declared variable must be followed by the = character
and the literal or variable name to be aliased to.

The aligned modifier specifies that the variable or array will start on
a page boundary. This is used to ensure that accessing an array element will
not cross a page boundary, which requires extra CPU cycles to execute.

The const modifier specifies that a variable or array should not be changed
by program code. The const modifier may be preceded by an aligned or
zeropage modifier.

A const variable declaration may include an initial value definition in
the form of an = character and literal after the variable name.

A const array is declared in one of two ways: the variable name
suffixed with a [ character, a literal specifying the upper bound of
the array, and a ] character; or a variable name followed by an = character
and string literal or series of string and/or numeric literals separated by
commas and surrounded by the { and } characters.

The zeropage modifier specifies that the variable will be defined in page
zero (addresses 0 through 255). It should be used in conjunction with the
#pragma zeropage directive.

Examples:
  alias char putcon = $F001; //Defines variable putcon with address $F001
  alias char alpha = omega;  //Defines variable alpha aliased to omega
  aligned char table[240];   //Defines 241 byte array aligned to page boundary
  const char debug = #TRUE;  //Defines variable debug initialized to constant
  const char flag = 1;       //Defines variable flag initialized to 1
  const char s = "string";   //Defines 7 byte string s initialized to "string"
  const char n = {1,2,3};    //Defines 3 byte array n containing 1, 2, and 3
  const char m = {"abc", 123}; //Defines 5 byte array containing string and byte
  const char t = {"ab", "cd"}; //Defines 6 byte array of two strings
  aligned const char fbncci = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};
  zeropage char ptr, tmp;    //Defines zero page variables

EXPRESSIONS

An expression is a series of one or more terms separated by operators.

Each term in an expression may be any of the following:
  function call (first term only)
  subscripted array element
  char type variable, struct member, constant, or literal
  byte operation
  register (A, X, or Y).

An expression may be preceded with a - character, in which case the first
term is assumed to be a literal 0.

Operators:
  +   Add the following value.
  -   Subtract the following value.
  &   Bitwise AND with the following value.
  |   Bitwise OR with the following value.
  ^   Bitwise Exclusive OR with the following value.

Arithmetic operators have no precedence. All operations are performed in
left to right order. Expressions may not contain parentheses.

Note: The character ! may be substituted for | on systems that do not
support the latter character. No escaping is necessary because a ! may
not appear anywhere a | would.

After an expression has been evaluated, the A register will contain the
result.

Note: Function calls are allowed in the first term of an expression
because upon return from the function the return value will be in the
Accumulator. However, due to the 6502 having only one Accumulator, which
is used for all operations between two bytes, there is no simple system
agnostic method for allowing function calls in subsequent terms.

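The strict left-to-right rule is easy to overlook when coming from standard C,
so the following Python sketch models it. The function name evaluate is
hypothetical, terms are given as already-resolved byte values, and the sketch
assumes each intermediate result wraps to 8 bits (carry and borrow are
ignored).

```python
def evaluate(terms, operators):
    """Apply each operator in turn to a running 8-bit result (like register A)."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "&": lambda a, b: a & b,
        "|": lambda a, b: a | b,
        "^": lambda a, b: a ^ b,
    }
    result = terms[0] & 0xFF
    for op, term in zip(operators, terms[1:]):
        # No precedence: the operator always acts on the running result
        result = ops[op](result, term) & 0xFF
    return result
```

With this model, 8 | 6 & 3 evaluates as (8 | 6) & 3, which is 2, whereas
standard C precedence would give 8 | (6 & 3), which is 10. A leading minus
sign corresponds to a first term of 0, so -1 becomes evaluate([0, 1], ["-"]).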
BYTE OPERATIONS

A byte operation allows the bytes in an integer value to be accessed
as individual character values. A byte operation consists of a byte
operator prefixed to an integer value.

Byte Operators:
  <   Get Least Significant Byte
  >   Get Most Significant Byte

The integer value may be an integer literal, an address, or an int type
variable or struct member.

Examples:
  hi = >&r; lo = <&r; //Set hi, lo to MSB, LSB of address of array r
  page = >53281;      //Set page to MSB of the integer literal 53281
  lsb = <count;       //Set lsb to low byte of integer variable count

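In arithmetic terms, the two byte operators split a 16-bit value into its
halves. A minimal Python sketch, with the hypothetical function names low_byte
and high_byte standing in for the < and > operators:

```python
def low_byte(value):
    """Model of the < operator: least significant byte of a 16-bit value."""
    return value & 0xFF

def high_byte(value):
    """Model of the > operator: most significant byte of a 16-bit value."""
    return (value >> 8) & 0xFF
```

For the literal 53281 ($D021) used in the example above, the most significant
byte is $D0 (208) and the least significant byte is $21 (33).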
CONTENTIONS

A contention is a construct which generates either a TRUE or FALSE condition,
and may be an expression, comparison, or test.

A stand-alone expression evaluates to TRUE if the result is non-zero, or
FALSE if the result is zero.

A comparison consists of an expression, a comparator, and a term (subscripted
array element, simple variable, literal, or constant).

Comparators:
  =    Evaluates to TRUE if expression is equal to term
  <    Evaluates to TRUE if expression is less than term
  <=   Evaluates to TRUE if expression is less than or equal to term
  >    Evaluates to TRUE if expression is greater than term
  >=   Evaluates to TRUE if expression is greater than or equal to term
  <>   Evaluates to TRUE if expression is not equal to term

The parser considers == equivalent to a single =. The operator <>
was chosen instead of the usual != because it simplified the parser design.

A test consists of an expression followed by a test-op.

Test-Ops:
  :+   Evaluates to TRUE if the result of the expression is positive
  :-   Evaluates to TRUE if the result of the expression is negative

A negative value is one in which the high bit is a 1 (128 to 255), while a
positive value is one in which the high bit is a 0 (0 to 127). The primary
purpose of test operators is to check the results of functions that return
a positive value upon successful completion and a negative value if an error
was encountered. They compile into smaller code than would be generated
using the equivalent comparison operators.

A contention may be preceded by a negation operator (the ! character), which
reverses the result of the entire contention. For example:
  ! expr
evaluates to TRUE if expr is zero, or FALSE if it is non-zero; while
  ! expr = term
evaluates to TRUE if expr and term are not equal, or FALSE if they are; and
  ! expr :+
evaluates to TRUE if expr is negative, or FALSE if it is positive.

Note: Contentions are compiled directly into 6502 conditional branch
instructions, which precludes their use inside expressions. Standalone
expressions and test-ops generate a single branch instruction, and
therefore result in the most efficient code. Comparisons generate a
compare instruction and one or two branch instructions (=, <, >=, and <>
generate one, while <= and > generate two). A preceding negation operator
will switch the number of branch instructions used in a comparison, but
otherwise does not change the size of the generated code.

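The stand-alone expression and test-op rules come down to checks on a single
byte. The Python sketch below models them; the function names are
hypothetical and the argument is assumed to be the 8-bit result of the
expression.

```python
def is_true(value):
    """Stand-alone expression: TRUE when the 8-bit result is non-zero."""
    return (value & 0xFF) != 0

def is_negative(value):
    """Test-op :- is TRUE when the high bit is 1 (128 to 255)."""
    return (value & 0x80) != 0

def is_positive(value):
    """Test-op :+ is TRUE when the high bit is 0 (0 to 127)."""
    return not is_negative(value)
```

Note that under these definitions 0 counts as positive, and that negating a
:+ test gives the same result as a :- test.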
CONDITIONALS

A conditional consists of one or more contentions joined with the
conjunctors "and" and "or".

If only one contention is present, the result of the conditional is the
same as the result of the contention.

If two contentions are joined with "and", then the conditional is true only
if both of the contentions are true. If either or both of the contentions
are false, then the conditional is false.

If two contentions are joined with "or", then the conditional is true if
either or both of the contentions are true. If both of the contentions are
false, then the conditional is false.

When three or more contentions are chained together, the conjunctors
are evaluated in left to right order, using short-circuit evaluation. If
the contention to the left of an "and" is false, then the entire conditional
evaluates to false, and if the contention to the left of an "or" is true,
then the entire conditional evaluates to true. In either case, no further
contentions in the conditional are evaluated.

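Because the conjunctors have no precedence, a chained conditional can give a
different answer than the same chain would in standard C. The Python sketch
below is one reading of the rules above, not the compiler's implementation:
the function name conditional is hypothetical, and each contention is passed
as a zero-argument callable so that skipped contentions are visibly never
evaluated.

```python
def conditional(contentions, conjunctors):
    """Evaluate contentions left to right, stopping as soon as a rule decides."""
    result = contentions[0]()
    for conjunctor, contention in zip(conjunctors, contentions[1:]):
        if conjunctor == "and" and not result:
            return False          # false left of "and": whole conditional false
        if conjunctor == "or" and result:
            return True           # true left of "or": whole conditional true
        result = contention()     # otherwise the next contention decides
    return result
```

Under this reading, a chain such as FALSE and TRUE or TRUE is false, because
evaluation stops at the first "and"; standard C would group it as
(FALSE && TRUE) || TRUE and yield true.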
ARRAY SUBSCRIPTS

Individual elements of an array are accessed using subscript notation.
Subscripted array elements may be used as terms in an expression, as
well as the target variable in an assignment. They are written as the
variable name suffixed with a [ character, followed by an index, and
the ] character.

When assigning to an array element, the index may be a literal, constant,
or simple variable.

When using an array element in an expression or pop statement, the index
may be any expression.

Examples:
  z = r[i];      //Store the value from element i of array r into variable z
  r[0] = z;      //Store the value of variable z into the first element of r
  z = d[15-i];   //Store the value of element 15-i of array d into variable z
  c = t[getc()]; //Get a character, translate using array t and store in c

Note: Register references may be used as array indexes within expressions,
but the contents of each register may change with each term evaluation.
Using a constant, literal, or the X or Y registers as an array index will
generate the same amount of code as a simple variable reference and leave
both the X and Y registers unchanged. Using the A register as an index will
generate one extra byte of code, while using a simple variable as index
will generate 1 to 2 extra bytes of code. In either case, the index value
will be left in the X register. When an expression is used as an index,
one extra byte of stack space is used, and an additional three bytes of
code is generated. The X register will contain the result of the expression
and the Y register will be left in an unknown state.

STRUCTS

Individual members of a struct variable are specified using the struct
variable name, a period, and the member name. If a member is an array,
its elements are accessed using the same syntax as an array variable.

A struct variable can also be treated like an array variable, with each
byte of the variable accessed as an array index.

Examples:

  i = rec.index;    //Get Struct Member
  rec.data[i] = i;  //Set Struct Member Element
  arr[i] = rec[i];  //Copy Struct Byte into Array

Note: Unlike standard C, structs may not be assigned using an equals
sign. One struct variable may be copied to another byte by byte or
through a function call.

SIZE-OF OPERATOR

The size-of operator @ generates a literal value equal to the size in bytes
of a specified variable. It is allowed anywhere a literal would be and
should be used anytime the size of an array, struct, or member is required.

The size-of operator is prefixed to the variable name or member
specification.

Examples:

  for (i=0; i<=@z; i++) z[i] = r[i]; //Copy elements from r[] to z[]
  blkput(@rec, &rec);                //Copy struct rec to next block segment
  memcpy(@rec.data, &rec.data);      //Copy member data to destination array

Note: The size-of operator is evaluated at compile time and generates two
bytes of machine language code. It is the most efficient method of specifying
a variable length.

INDEX-OF OPERATOR

The index-of operator ? generates a literal value equal to the offset in bytes
of a specified struct member. It is allowed anywhere a literal would be and
should be used any time the offset of a struct member is required.

When using the index-of operator, it is prefixed to the member specification.

Examples:

  blkmem(?rec.data, &s);  //Search block for segment where data contains s
  memcpy(?rec.data, &t);  //Copy bytes of rec up to member data into t

Note: The index-of operator is evaluated at compile time and generates two
bytes of machine language code. It is the most efficient method of specifying
the offset of a struct member.
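
The standard C counterpart of ? is the offsetof macro from <stddef.h>. The
sketch below is standard C, not C02, and mirrors the second example: it
copies the bytes of a struct that precede a given member. The record layout
and the name copy_prefix are hypothetical, and the memcpy used here is the
standard C library function, whose arguments differ from the C02 library
call shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct record {
    unsigned char index;
    unsigned char name[3];
    unsigned char data[4];
};

/* Copy the bytes of rec that come before member data into t and return
   how many bytes were copied. */
size_t copy_prefix(unsigned char *t, const struct record *rec)
{
    size_t n = offsetof(struct record, data);

    assert(t != NULL && rec != NULL);
    memcpy(t, rec, n);
    return n;
}
```

With this layout the offset of data is the combined size of index and name.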
|
||
|
||
FUNCTION CALLS

A function call may be used as a stand-alone statement, or as the first
term in an expression. A function call consists of the function name
appended with a ( character, followed by zero to three arguments separated
with commas, and a closing ) character.

The first argument of a function call may be an expression, integer,
address, or string (see below).

The second argument may be a term (subscripted array element, simple
variable, or constant), integer, address, or string.

The third argument may only be a simple variable or constant.

If the first or second argument is an integer, address, or string, then
no more arguments may be passed.

When passing the address of a variable, array, struct, or struct member
into a function, the variable specification is prefixed with the
address-of operator &. When passing a literal string, it is simply
specified as is.

Examples:
  c = getc();  //Get character from keyboard
  n = abs(b+c-d);  //Return the absolute value of result of expression
  m = min(r[i], r[j]);  //Return lesser of two array elements
  l = strlen(&s);  //Return the length of string s
  p = strchr(c, &s);  //Return position of character c in string s
  putc(getc());  //Echo typed characters to screen
  puts("Hello World");  //Write "Hello World" to screen
  memdst(&dstrec);  //Set struct variable as destination
  memcpy(140, &srcrec);  //Copy struct variable to destination struct
  puts(&rec.name);  //Write struct member to screen

Note: This particular argument passing convention has been chosen because
of the 6502's limited number of registers and stack processing instructions.
When an address is passed, the high byte is stored in the Y register and
the low byte in the X register. If a string is passed, it is turned into
an anonymous array, and its address is passed in the Y and X registers.
Otherwise, the first argument is passed in the A register, the second in
the Y register, and the third in the X register.
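
The register assignments in the note can be modeled in standard C (not C02)
to make the convention concrete. This is a sketch only: the struct regs and
the functions pass_bytes and pass_address are hypothetical names invented
for the model, not part of the compiler.

```c
#include <assert.h>

/* Model of the three 6502 registers used for argument passing. */
struct regs {
    unsigned char a, y, x;
};

/* Up to three byte arguments: first in A, second in Y, third in X. */
struct regs pass_bytes(unsigned char first, unsigned char second,
                       unsigned char third)
{
    struct regs r;

    r.a = first;
    r.y = second;
    r.x = third;
    return r;
}

/* A byte argument followed by an address: the byte goes in A, the high
   byte of the address in Y, and the low byte in X. */
struct regs pass_address(unsigned char first, unsigned int address)
{
    struct regs r;

    assert(address <= 0xFFFF);
    r.a = first;
    r.y = (unsigned char)(address >> 8);
    r.x = (unsigned char)(address & 0xFF);
    return r;
}
```

pass_address models a call such as strchr(c, &s). When the address is the
only argument, the value in A carries no meaning. Because an address fills
both Y and X, nothing is left for a further argument, which is why no more
arguments may follow an address or string.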
|
||
|
||
EXTENDED PARAMETER PASSING

To enable direct calling of machine language routines that do not match
the built-in parameter passing convention, C02 supports the non-standard
statements push, pop, and inline.

The push statement is used to push arguments onto the machine stack prior
to a function call. When using a push statement, it is followed by one or
more arguments, separated by commas, and terminated with a semicolon. An
argument may be an expression, in which case the single byte result is
pushed onto the stack, or it may be an address or string, in which case the
address is pushed onto the stack, high byte first and low byte second.

The pop statement is likewise used to pop arguments off of the machine
stack after a function call. When using a pop statement, it is followed
with one or more simple variables or subscripted array elements, separated
by commas, and terminated with a semicolon. If any of the arguments are to
be discarded, a period may be specified instead of a variable name.

The number of arguments pushed and popped may or may not be the same,
depending on how the machine language routine manipulates the stack pointer.

Examples:
  push d,r; mult(); pop p;  //multiply d times r and store in p
  push x1,y1,x2,y2; rect(); pop .,.,.,.;  //draw rectangle from x1,y1 to x2,y2
  push &s, "tail"; strcat();  //concatenate "tail" onto string s
  push x[i],y[i]; rotate(d); pop x[i],y[i];  //rotate point x[i],y[i] by d

Note: The push and pop statements could also be used to manipulate the
stack inside or separate from a function, but this should be done with
care.
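
The order in which an address lands on the stack can be modeled in standard
C (not C02). The sketch below assumes the usual 6502 behavior, where a push
stores a byte at the stack pointer and then decrements it, and a pull
increments the pointer and then reads. The names stack_page, sp, push_byte,
pop_byte, push_address, and pop_address are hypothetical, the starting value
of sp is an assumption, and the model ignores the return address that a JSR
would place on top of the pushed arguments.

```c
#include <assert.h>

unsigned char stack_page[256];   /* models the 6502 stack page */
unsigned char sp = 0xFF;         /* stack pointer (assumed starting value) */

/* Push stores at the stack pointer, then decrements it. */
void push_byte(unsigned char v)
{
    stack_page[sp--] = v;
}

/* Pull increments the stack pointer, then reads. */
unsigned char pop_byte(void)
{
    return stack_page[++sp];
}

/* push &var: high byte first, low byte second. */
void push_address(unsigned int address)
{
    assert(address <= 0xFFFF);
    push_byte((unsigned char)(address >> 8));
    push_byte((unsigned char)(address & 0xFF));
}

/* Pulling therefore yields the low byte first, then the high byte. */
unsigned int pop_address(void)
{
    unsigned int lo = pop_byte();
    unsigned int hi = pop_byte();

    return (hi << 8) | lo;
}
```

Because the stack is last in, first out, values come off in the reverse of
the order in which they were pushed.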
|
||
|
||
The inline statement is used when calling machine language routines that
expect constant byte or word values immediately following the 6502 JSR
instruction. A routine of this type will adjust the return address so that
execution resumes directly after the inline values. When using the inline
statement, it is followed by one or more arguments, separated by commas, and
terminated with a semicolon. The arguments may be constants, addresses,
or strings.

Examples:
  iprint(); inline "Hello World";  //Print "Hello World"
  irect(); inline 10,10,100,100;  //Draw rectangle from (10,10) to (100,100)

Note: If a string is specified in an inline statement, rather than creating
an anonymous string and compiling the address inline, the entire string will
be compiled directly inline.
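
How a routine such as iprint() might locate its inline data can be modeled
in standard C (not C02). The sketch relies on two facts about the 6502: JSR
pushes the address of the last byte of the JSR instruction, and RTS resumes
at the pulled address plus one. It also assumes the inline string ends with
a zero byte. The name read_inline_string and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* mem models the address space and ret is the address JSR pushed (the
   last byte of the JSR instruction). The inline string is copied into
   out, truncated if necessary. The return value is the adjusted return
   address: the address of the terminating zero, so that RTS resumes at
   the byte after it. */
unsigned int read_inline_string(const unsigned char *mem, unsigned int ret,
                                char *out, size_t out_size)
{
    unsigned int p = ret + 1;   /* first inline byte */
    size_t n = 0;

    assert(mem != NULL && out != NULL && out_size > 0);
    while (mem[p] != 0) {
        if (n + 1 < out_size)
            out[n++] = (char)mem[p];
        p++;
    }
    out[n] = '\0';
    return p;
}
```

A real routine would pull the return address from the stack, walk the data
through an indirect pointer, and push the adjusted address back before
executing RTS.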
|
||
|
||
ASSIGNMENTS

An assignment is a statement in which the result of an expression is stored
in a variable. An assignment usually consists of a simple variable or
subscripted array element, an = character, and an expression, terminated
with a ; character.

Examples:
  i = i + 1;  //Add 1 to contents of variable i
  c = getchr();  //Call function and store result in variable c
  s[i] = 0;  //Terminate string at position i
|
||
|
||
SHORTCUT-IFS

A shortcut-if is a special form of assignment consisting of a condition
and two expressions, of which one will be assigned based on the result
of the condition. A shortcut-if is written as a condition surrounded
by ( and ) characters, followed by a ? character, the expression to be
evaluated if the condition was true, a : character, and the expression to
be evaluated if the condition was false.

Example:
  result = (value1 < value2) ? value1 : value2;

Note: Shortcut-ifs may only be used with assignments. This may change in
the future.
|
||
|
||
POST-OPERATORS

A post-operator is a special form of assignment which modifies the value
of a variable. The post-operator is suffixed to the variable it modifies.

Post-Operators:
  ++ Increment variable (increase its value by 1)
  -- Decrement variable (decrease its value by 1)
  << Left shift variable
  >> Right shift variable

Post-operators may be used with either simple variables or subscripted
array elements.

Examples:
  i++;  //Increment the contents of variable i
  b[i]<<;  //Left shift the contents of element i of array b

Note: Post-operators may only be used in stand-alone statements, although
this may change in the future.
|
||
|
||
ASSIGNMENTS TO REGISTERS

Registers A, X, and Y may be assigned to using the = character. Register A
(but not X or Y) may be used with the << and >> post-operators, while
registers X and Y (but not A) may be used with the ++ and -- post-operators.
|
||
|
||
IMPLICIT ASSIGNMENTS

A statement consisting of only a simple variable is treated as an
implicit assignment of the A register to the variable in question.

This is useful on systems that use memory locations as strobe registers.

Examples:
  HMOVE;  //Move Objects (Atari VCS)
  S80VID;  //Enable 80-Column Video (Apple II)

Note: An implicit assignment generates an STA opcode with the variable
as the operand.
|
||
|
||
PLURAL ASSIGNMENTS

C02 allows a function to return up to three values by specifying multiple
variables, separated by commas, to the left of the assignment operator (=).

Each of the variables to be assigned may be either a simple variable or a
subscripted array element. Registers are not allowed in plural assignments.

Examples:
  row, col = scnpos();  //Get current screen position
  cr, mn, mx = cpmnmx(a, b);  //Compare two values, return min and max
  x[i], y[i] = rotate(x[i],y[i],d);  //Rotate x[i] and y[i] by d degrees
  x[i], y[i], z[i] = get3d(i);  //Generate 3d coordinate for index i

Note: When compiled, a plural assignment generates an STX for the third
assignment (if specified), an STY for the second assignment and an STA for
the first assignment. Using a subscripted array element for the third
assignment generates an overhead of three bytes of machine code.
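
The store order described in the note can be modeled in standard C (not
C02). This is a sketch only: struct regs, cpmnmx_model, and plural_assign
are hypothetical names, and the encoding of the comparison result in A
(1 when the first value is greater, 0 otherwise) is an assumption made so
that the example has something to return.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the three registers a function can return values in. */
struct regs {
    unsigned char a, y, x;
};

/* Hypothetical stand-in for cpmnmx(a, b): comparison result in A,
   minimum in Y, maximum in X. */
struct regs cpmnmx_model(unsigned char a, unsigned char b)
{
    struct regs r;

    r.a = (unsigned char)(a > b);
    r.y = (a < b) ? a : b;
    r.x = (a < b) ? b : a;
    return r;
}

/* Store the returned registers the way a plural assignment does: X into
   the third variable (if one was given), Y into the second, A into the
   first. */
void plural_assign(struct regs r, unsigned char *first,
                   unsigned char *second, unsigned char *third)
{
    assert(first != NULL && second != NULL);
    if (third != NULL)
        *third = r.x;
    *second = r.y;
    *first = r.a;
}
```

Passing NULL for third models a two-variable plural assignment such as
row, col = scnpos(); in which no STX is generated.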
|
||
|
||
GOTO STATEMENT

A goto statement unconditionally transfers program execution to the
specified label. When using a goto statement, it is followed by the
label name and a terminating semicolon.

Example:
  goto end;

Note: A goto statement may be executed from within a loop structure
(although a break or continue statement is preferred), but should not
normally be used to jump from inside a function to outside of it, as
this would leave the return address on the machine stack.
|
||
|
||
IF AND ELSE STATEMENTS

The if and else statements are used to conditionally execute blocks
of code.

When using the if keyword, it is followed by a conditional (surrounded by
parentheses) and the block of code to be executed if the conditional was
true.

An else statement may directly follow an if statement (with no other
executable code intervening). The else keyword is followed by the block
of code to be executed if the conditional was false.

Examples:
  if (c = 27) goto end;
  if (d) q = div(n,d); else putln("Division by 0!");
  if (r[q]<r[p]) {t=r[p]; r[p]=r[q]; r[q]=t;}
  if (!>i | <i) puts("i is zero.");
  if (>i > >j || >i = >j && <i > <j) k = i; else k = j;

Note: In order to optimize the compiled code, the if and else statements
are compiled to 6502 relative branch instructions. This limits the amount of
generated code between the if statement and the end of the if/else block
to slightly less than 127 bytes. This should be sufficient in most cases,
but larger code blocks can be accommodated using function calls or goto
statements.
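
The size limit in the note follows from the 6502 relative branch encoding,
which stores a signed 8-bit offset measured from the instruction after the
two-byte branch. The helper below is a standard C sketch (not C02) for
checking whether a target is within reach; branch_reaches is a hypothetical
name, not part of the compiler.

```c
#include <assert.h>

/* Returns 1 if a relative branch whose opcode is at address branch can
   reach target, and 0 otherwise. The offset is taken from the address
   of the next instruction, branch + 2, and must fit in a signed byte. */
int branch_reaches(unsigned int branch, unsigned int target)
{
    long offset;

    assert(branch <= 0xFFFF && target <= 0xFFFF);
    offset = (long)target - ((long)branch + 2);
    return offset >= -128 && offset <= 127;
}
```

A forward branch can therefore skip at most 127 bytes of generated code,
and any instructions the compiler itself adds inside the block count
against that total.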
|
||
|
||
SELECT, CASE, AND DEFAULT STATEMENTS

The select, case, and default statements are used to execute a specific
block of code depending on the result of an expression.

When using the select keyword, it is followed by an expression (surrounded
by parentheses) and an opening curly brace, which begins the select block.
This must then be followed by a case statement.

Each use of the case keyword is followed by one or more comma-separated
terms and a colon. If any term is equal to the select expression, then the
code immediately following the case statement is executed; otherwise,
program execution transfers to the next case or default statement.

The code between two case statements or a case and default statement is
called a case block. At the end of a case block, program execution
transfers to the end of the select block (the closing curly brace at
the end of the default block).

The last case block must be followed by a default statement. When using
the default keyword, it is followed by a colon. The code between the
default statement and the end of the select block (marked with a closing
curly brace) is called the default block and is executed if none of
the case arguments matched the select expression.

If the constant 0 is to be used as an argument to any of the case
statements, using it as the first argument of the first case statement
will produce slightly more efficient code.
|
||
|
||
Example:
  puts("You pressed ");
  select (getc()) {
    case $00: putln("Nothing");
    case $0D: putln("The Enter key");
    case ' ': putln("The space bar");
    case 'A','a': putln("The letter A");
    case ltr: putln("The character in variable 'ltr'");
    case s[2]: putln("The third character of string 's'");
    default: putln("some other key");
  }

Unlike the switch statement in C, the break statement is not needed to
exit from a case block. It may be used, however, to prematurely exit a
case block if desired.

Example:
  select (arg) {
    case foo:
      puts("fu");
      if (!bar) break;
      puts("bar");
    default: //do nothing
  }

In addition, fall through of case blocks can be duplicated using the goto
statement with a label.

Example:
  select (num) {
    case 1:
      putc('I');
      goto two;
    case 2:
    two:
      putc('I');
    default: //do nothing
  }

Note: It's possible for multiple case statement arguments to evaluate to
the same value. In this case, only the first case block matching the
select expression will be executed.
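
The matching rule in the note can be modeled in standard C (not C02).
Because case terms may be variables or array elements, the sketch compares
the select value against a list of term values in order rather than using a
C switch. The name select_case and its parameters are hypothetical, and the
model says nothing about the code the compiler actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first case term equal to value, or -1 when no
   term matches and the default block would run. */
int select_case(unsigned char value, const unsigned char *terms, int count)
{
    int i;

    assert(count >= 0 && (terms != NULL || count == 0));
    for (i = 0; i < count; i++) {
        if (terms[i] == value)
            return i;   /* later duplicates are never reached */
    }
    return -1;
}
```

With terms holding 'A', 'a', and then a variable that also happens to
contain 'A', a select value of 'A' matches at index 0 and the later term is
ignored.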
|
||
|
||
WHILE LOOPS

The while statement is used to conditionally execute code in a loop. When
using the while keyword, it is followed by a conditional (surrounded by
parentheses) and the block of code to be executed while the conditional
is true. If the conditional is false when the while statement is entered,
the code in the block will never be executed.

Alternatively, the while keyword may be followed by a pair of empty
parentheses, in which case a conditional of true is implied.

Examples:
  c = 'A'; while (c <= 'Z') {putc(c); c++;}  //Print letters A-Z
  while() if (rdkey()) break;  //Wait for a keypress

Note: While loops are compiled using 6502 JMP instructions, so the code
blocks may be arbitrarily large.
|
||
|
||
DO WHILE LOOPS

The do statement is used with the while statement to conditionally execute
code in a loop at least once. When using the do keyword, it is followed by
the block of code to be executed, the while keyword, a conditional
(surrounded by parentheses), and a terminating semicolon.

A while statement that follows a do loop must contain a conditional.
The while statement is evaluated after each iteration of the loop, and
if it is true, the code block is repeated.

Examples:
  do c = rdkey(); while (c=0);  //Wait for keypress
  do {c = getchr(); putchr(c);} while (c<>13);  //Echo line to screen
  i=0; do {i++;} while (>i <= >j && <i < <j);  //Count from 0 to j

Note: Unlike the other loop structures, do/while statements do not use
6502 JMP instructions. This optimizes the compiled code, but limits
the code inside the loop to just under 127 bytes.
|
||
|
||
FOR LOOPS

The for statement allows the initialization, evaluation, and modification
of a loop condition in one place. For statements are usually used to
execute a piece of code a specific number of times, or to iterate through
a set of values.

When using the for keyword, it is followed by a pair of parentheses
containing an initialization assignment statement (which is executed once),
a semicolon separator, a conditional (which determines if the code block
is executed), another semicolon separator, and an increment assignment
(which is executed after each iteration of the code block). This is then
followed by the block of code to be conditionally executed.

The assignments and conditional of a for loop must be populated. If an
infinite loop is desired, use a while () statement.

Examples:
  for (c='A'; c<='Z'; c++) putc(c);  //Print letters A-Z
  for (i=strlen(&s)-1;i:+;i--) putc(s[i]);  //Print string s backwards
  for (i=0;c>0;i++) {c=getc();s[i]=c;}  //Read characters into string s

Note: For loops are compiled using 6502 JMP instructions, so the code
blocks may be arbitrarily large. A for loop generates less efficient code
than a simple while loop, but will always execute the increment
assignment on a continue.
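
The difference on continue can be seen by writing the same loop both ways.
The sketch below is standard C, which behaves the same way as described
above in this respect; it is not C02, and the function names are
hypothetical.

```c
#include <assert.h>

/* Count the even values below n using a for loop. The increment runs
   even when continue skips the rest of the body. */
int count_even_for(int n)
{
    int i, count = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (i & 1)
            continue;
        count++;
    }
    return count;
}

/* The same loop as a while. The increment has to be repeated before
   the continue, or the loop would never advance past the first odd
   value. */
int count_even_while(int n)
{
    int i = 0, count = 0;

    assert(n >= 0);
    while (i < n) {
        if (i & 1) {
            i++;
            continue;
        }
        count++;
        i++;
    }
    return count;
}
```

Both functions return the same count; the for version simply keeps the
increment in one place.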
|
||
|
||
BREAK AND CONTINUE

A break statement is used to exit out of a do, for, or while loop or a
case block. The continue statement is used to jump to the beginning of
a do, for, or while loop. Neither may be used outside its corresponding
control structures.

When a break statement is encountered, program execution is transferred
to the statement immediately following the end of the block associated
with the innermost do, for, while, or case statement. When using the
break keyword, it is followed with a trailing semicolon.

When a continue statement is encountered, program execution is transferred
to the beginning of the block associated with the innermost do, for, or
while statement. In the case of a for statement, the increment assignment
is executed, followed by the conditional, and in the case of a while
statement, the conditional is evaluated. When using the continue keyword, it
is followed with a trailing semicolon.

Examples:
  do {c=rdkey(); if (c=0) continue; if (c=27) break;} while (c<>13);
  for (i=0;i<strlen(&s);i++) {if (s[i]=0) break; putchr(s[i]);}
  while() {c=rdkey();if (c=0) continue;putchr(c);if (c=13) break;}
|
||
|
||
UNIMPLEMENTED FEATURES

The #define directive allows the definition of constants but not macros.

The #if, #else, and #endif directives are not recognized at all by the
compiler. They may be added in the future.

The only types recognized by the compiler are char and int. Int values
may only be used in limited contexts. Since the 6502 is an 8-bit processor,
multi-byte types would generate over-complicated code. In addition, the
signed and unsigned keywords are unrecognized, due to the 6502's limited
signed comparison functionality.

Because of the 6502's peculiar indirect addressing modes, pointers are not
currently implemented. Limited pointer operations may be implemented using
zero page variables in the future.
|