mirror of https://github.com/RevCurtisP/C02.git synced 2024-11-24 15:31:17 +00:00
C02/doc/c02.txt
2019-05-12 23:34:47 -04:00

INTRODUCTION
C02 is a simple C-syntax language designed to generate highly optimized
code for the 6502 microprocessor. The C02 specification is a highly
specific subset of the C standard with some modifications and extensions.
PURPOSE
Why create a whole new language, particularly one with severe restrictions,
when there are already full-featured C compilers available? It can be
argued that standard C is a poor fit for processors like the 6502. The C
language was designed to translate directly to machine language instructions
whenever possible. This works well on 32-bit processors, but requires either
a byte-code interpreter or the generation of complex code on a typical
8-bit processor. C02, on the other hand, has been designed to translate
directly to 6502 machine language instructions.
The C02 language and compiler were designed with two goals in mind.
The first goal is the ability to target machines with low memory: a few
kilobytes of RAM (assuming the generated object code is to be loaded into
and run from RAM), or as little as 128 bytes of RAM and 2 kilobytes of ROM
(assuming the object code is to be run from a ROM or PROM).
The compiler is agnostic with regard to system calls and library functions.
Calculations and comparisons are done with 8 bit precision. Intermediate
results, array indexing, and function calls use the 6502 internal registers.
While this results in compiled code with virtually no overhead, it severely
restricts the syntax of the language.
The second goal is to port the compiler to C02 code so that it may be
compiled by itself and run on any 6502 based machine with sufficient memory
and appropriate peripherals. This slightly restricts the implementation of
code structures.
SOURCE AND OUTPUT FILES
C02 source code files are denoted with the .c02 extension. The compiler
reads the source code file, processes it, and generates an assembly
language file with the same name as the source code file, but with
the .asm extension instead of the .c02 extension. This assembly language
file is then assembled to create the final object code file.
Note: The default implementation of the compiler creates assembly
language code formatted for the DASM assembler. The generation of the
assembly language is parameterized, so it may be easily changed to
work with other assemblers.
COMMENTS
The parser recognizes both C style and C++ style comments.
C style comments begin with /* and end at next */. Nested C style comments
are not supported.
C++ style comments begin with // and end at the next newline. C++ style
comments may be nested inside C style comments.
DIRECTIVES
Directives are special instructions to the compiler. Depending on the
directive, it may or may not generate compiled code. A directive is
denoted by a leading # character. Unlike a statement, a directive is
not followed by a semicolon.
Note: Unlike standard C and C++, which use a preprocessor to process
directives, the C02 compiler processes directives directly.
DEFINE DIRECTIVE
The #define directive is used to define constants (see below).
INCLUDE DIRECTIVE
The #include directive causes the compiler to read and process an external
file. In most cases, #include directives will be used with libraries of
function calls, but they can also be used to modularize the code that makes
up a program.
An #include directive is followed by the file name to be included. This
file name may be surrounded by either the < and > characters or by two "
characters. In the former case, the compiler looks for the file in an
implementation specific library directory (the default being ./include),
while in the latter case, the compiler looks for the file in the current
working directory. Two file types are currently supported.
Header files are denoted by the .h02 extension. A header file is used to
provide the compiler with the information necessary to use machine
language system and/or library routines written in assembly language,
and consists of comments and declarations. The declarations in a header
file are added to the symbol table, but do not directly generate code. After
a header file has been processed, the compiler reads and processes an
assembly language file with the same name as the header file, but with
the .a02 extension instead of the .h02 extension.
The compiler does not currently generate any assembler required
pseudo-operators, such as the specification of the target processor,
or the starting address of the assembled object code. Therefore, at least
one header file, with an accompanying assembly language file is needed
in order to successfully assemble the compiler generated code. Details
on the structure and implementation of a typical header file can be
found in the file header.txt.
Assembly language files are denoted by the .a02 extension. When the
compiler processes an assembly language file, it simply inserts the contents
of the file into the generated code.
PRAGMA DIRECTIVE
The #pragma directive is used to set various compiler options. When using
a #pragma directive it is followed by the pragma name and possibly an
option, each separated by whitespace.
Note: The various pragma directives are specific to the cross-compiler and
may be changed or omitted in future versions of the compiler.
PRAGMA ASCII
The #pragma ascii directive instructs the compiler to convert the characters
in literal strings to a form expected by the target machine.
Options:
#pragma ascii high //Sets the high bit to 1 (e.g. Apple II)
#pragma ascii invert //Swaps upper and lower case (e.g. PETSCII)
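Note: The two conversions can be modelled with the short Python sketch
below. It is not taken from the compiler; the function name convert_char
is hypothetical, and leaving non-alphabetic characters unchanged in invert
mode is an assumption.

```python
def convert_char(code, mode):
    """Model of '#pragma ascii' conversion for one character code."""
    if mode == "high":
        # Set the high bit to 1 (e.g. Apple II)
        return code | 0x80
    if mode == "invert":
        # Swap upper and lower case; other characters assumed unchanged
        ch = chr(code)
        if ch.isascii() and ch.isalpha():
            return ord(ch.swapcase())
        return code
    raise ValueError("unknown mode: " + mode)
```

For example, convert_char(ord("A"), "high") yields $C1, while
convert_char(ord("a"), "invert") yields the code for "A".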
PRAGMA ORIGIN
The #pragma origin directive sets the target address of compiled code.
Examples:
#pragma origin $0400 //Compiled code starts at address 1024
#pragma origin 8192 //Compiled code starts at address 8192
PRAGMA PADDING
The #pragma padding directive adds empty bytes to the end of the compiled
program. It should be used with target systems that require the object
code to be padded with a specific number of bytes.
Examples:
#pragma padding 1 //Add one empty byte to end of code
#pragma padding $FF //Add 255 empty bytes to end of code
PRAGMA RAMBASE
The #pragma rambase directive sets the base address for variables in RAM
(not declared const). This is normally used when the compiled code will be
stored in ROM (such as in an EPROM or Cartridge), but can be used any time
variables should be in a specific area of RAM.
Examples:
#pragma rambase $0200 //Define Variable RAM base address for NES
#pragma rambase 828 //Define Variable RAM in C64 Tape Buffer
#pragma rambase 4096 //Define RAM base for Vic 20 ROM cartridge
PRAGMA VARTABLE
The #pragma vartable directive forces the variable table to be written.
It should be used before any #include directives that need to generate
code following the table.
PRAGMA WRITEBASE
The #pragma writebase directive sets the base address for writing to variables
in RAM. This is used when the target system uses different addresses for
reading and writing the same memory locations. This directive must be
preceded by a #pragma rambase directive with a non-zero argument.
Examples:
#pragma rambase $F080 //Define Superchip RAM Read Base for Atari 2600
#pragma writebase $F000 //Define Superchip RAM Write Base for Atari 2600
Note: Setting a RAM write base causes the compiler to generate a write offset
which is concatenated to the variable name on all assignments.
PRAGMA ZEROPAGE
The #pragma zeropage directive sets the base address for variables declared
as zeropage.
Example:
#pragma zeropage $80 //Start zeropage variables at address 128
LITERALS
A literal represents a value between 0 and 255. Values may be written as
a number (binary, decimal, or hexadecimal) or a character literal.
A binary number consists of a % followed by eight binary digits (0 or 1).
A decimal number consists of one to three decimal digits (0 through 9).
A hexadecimal number consists of a $ followed by two hexadecimal digits
(0 through 9 or A through F).
A character literal consists of a single character surrounded by ' symbols.
A ' character may be specified by escaping it with a \.
Examples:
%01010101 Binary Number
123 Decimal Number
$FF Hexadecimal Number
'A' Character Literal
'\'' Escaped Character Literal
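Note: The literal forms above can be modelled in a few lines of Python.
This is an illustrative sketch rather than the compiler's actual parser;
the function name parse_literal is hypothetical, and recognizing only the
\' escape inside character literals is an assumption based on the text
above.

```python
def parse_literal(text):
    """Return the 0-255 value of a C02-style literal (illustrative model)."""
    if text.startswith("%"):
        # Binary: % followed by exactly eight binary digits
        digits = text[1:]
        if len(digits) != 8 or set(digits) - set("01"):
            raise ValueError("binary literal needs eight 0/1 digits")
        return int(digits, 2)
    if text.startswith("$"):
        # Hexadecimal: $ followed by exactly two hexadecimal digits
        digits = text[1:]
        if len(digits) != 2:
            raise ValueError("hexadecimal literal needs two digits")
        return int(digits, 16)
    if len(text) >= 3 and text.startswith("'") and text.endswith("'"):
        # Character literal, with \' as the escaped quote
        body = text[1:-1]
        if body == "\\'":
            return ord("'")
        if len(body) != 1:
            raise ValueError("character literal needs one character")
        return ord(body)
    if text.isascii() and text.isdigit() and len(text) <= 3:
        # Decimal: one to three digits, limited to 0 through 255
        value = int(text)
        if value > 255:
            raise ValueError("literal out of range")
        return value
    raise ValueError("not a literal: " + text)
```

Each of the example literals listed above is accepted by this model, and
malformed input such as %0101 raises ValueError.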
STRINGS
A string is a consecutive series of characters terminated by an ASCII null
character (a byte with the value 0).
A string literal is written as up to 255 printable characters, prefixed and
suffixed with " characters.
The " character and a subset of ASCII control characters can be specified
in a string literal by using escape sequences prefixed with the \ symbol:
\b $08 Backspace
\e $1B Escape
\f $0C Form Feed
\n $0A Line Feed
\r $0D Carriage Return
\t $09 Tab
\v $0B Vertical Tab
\" $22 Double Quotation Mark
\\ $5C Backslash
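Note: As a sketch of how the escape table above maps onto bytes, the
following Python model encodes the text between the quotes and appends the
terminating null. The names ESCAPES and encode_string are hypothetical,
and rejecting unlisted escapes is an assumption.

```python
# Escape sequences from the table above: character after \ -> byte value
ESCAPES = {"b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A, "r": 0x0D,
           "t": 0x09, "v": 0x0B, '"': 0x22, "\\": 0x5C}

def encode_string(body):
    """Encode the text between the quotes as bytes plus a null terminator."""
    out = bytearray()
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            code = next(chars, None)      # character following the backslash
            if code not in ESCAPES:
                raise ValueError("unsupported escape sequence")
            out.append(ESCAPES[code])
        else:
            out.append(ord(ch))
    out.append(0)                         # ASCII null terminates the string
    return bytes(out)
```

A two-character string followed by \n therefore occupies four bytes: the
two characters, $0A, and the null terminator.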
SYMBOLS
A symbol consists of an alphabetic character followed by zero to five
alphanumeric characters. Five types of symbols are supported: labels,
constants, simple variables, variable arrays, and functions.
A label specifies a target point for a goto statement. A label is written
as a symbol suffixed by a : character.
A constant represents a literal value. A constant is written as a symbol
prefixed by the # character.
A simple variable represents a single byte of memory. A variable is written
as a symbol without a suffix.
A variable array represents a block of up to 256 contiguous bytes in
memory. An array reference is written as a symbol suffixed with a [ character,
index, and ] character. The lowest index of an array is 0, and the highest
index is one less than the number of bytes in the array. There is no bounds
checking on arrays: referencing an element beyond the end of the array will
access indeterminate memory locations.
A function is a subroutine that receives multiple values as arguments and
optionally returns a value. A function is written as a symbol suffixed with
a ( character, up to three arguments separated by commas, and a ) character.
The special symbols A, X, and Y represent the 6502 registers with the same
names. Registers may only be used in specific circumstances (which are
detailed in the following text). Various C02 statements modify registers
as they are processed, so care should be taken when using them. However, when
used properly, register references can increase the efficiency of compiled
code.
STATEMENTS
Statements include declarations, assignments, stand-alone function calls,
and control structures. Most statements are suffixed with ; characters,
but some may be followed with program blocks.
BLOCKS
A program block is a series of statements surrounded by the { and }
characters. They may only be used with function definitions and control
structures.
CONSTANTS
A constant is defined by using the #define directive followed by the constant
name (without the # prefix) and the literal value to be assigned to it.
Examples:
#define TRUE $FF
#define FALSE 0
#define BITS %01010101
#define ZED 'Z'
ENUMERATIONS
An enumeration is a sequential list of constants. Enumerations are used to
generate sets of related but distinct values.
An enumeration is defined using an enum statement. When using the enum
keyword, it is followed by a { character, one or more constant names
separated by commas, and a } character. A period may be used in place
of a constant name, in which case the sequence will be skipped. The
enum statement is terminated with a semicolon.
Examples:
enum {BLACK, WHITE, RED, CYAN, PURPLE, GREEN, BLUE, YELLOW};
enum {., FIRST, SECOND, THIRD};
enum {ZERO, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN};
Note: Values are automatically assigned to the constants in an enumeration.
The first constant in the enumeration is assigned the value 0, the second
is assigned the value 1, and so on.
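Note: The value assignment can be sketched in Python as follows. The
function name enum_values is hypothetical, and the sketch assumes that a
period consumes its position in the sequence without defining a constant.

```python
def enum_values(names):
    """Map constant names to sequential values; '.' skips a value."""
    return {name: value
            for value, name in enumerate(names)
            if name != "."}
```

Under that assumption, enum {., FIRST, SECOND, THIRD}; assigns 1 to FIRST,
2 to SECOND, and 3 to THIRD.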
BITMASKS
Bitmasks are a list of constants with values that correspond to each bit
in a byte. Bitmasks are used to allow multiple true/false flags to be
combined into a single char variable.
A bitmask is defined using a bitmask statement. When using the bitmask
keyword, it is followed by a { character, one to eight constant names
separated by commas, and a } character. A period may be used in place of a
constant name, in which case the bit value will be skipped. The bitmask
statement is terminated with a semicolon.
Examples:
bitmask {BLUE, GREEN, RED, BRIGHT, INVERT, BLINK, FLIP, BKGRND};
bitmask {RD, RTS, DTR, RI, CD, ., CTS, DSR};
Note: Values are automatically assigned to the constants in a bitmask,
each of which is a sequential power of two. The first constant in the
bitmask is assigned the value 1, the second is assigned the value 2,
the third is assigned the value 4, and so on.
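Note: A comparable Python sketch for bitmasks is shown below. The function
name bitmask_values is hypothetical, and the sketch assumes that a period
skips its bit position.

```python
def bitmask_values(names):
    """Map constant names to successive powers of two; '.' skips a bit."""
    if not 1 <= len(names) <= 8:
        raise ValueError("a bitmask holds one to eight entries")
    return {name: 1 << bit
            for bit, name in enumerate(names)
            if name != "."}
```

For the second example above, this gives RD the value 1, CD the value 16,
CTS the value 64, and DSR the value 128, with bit value 32 left unnamed.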
DECLARATIONS
A declaration statement consists of a type keyword (char, int, or void)
followed by one or more variable names (and optional definitions) or a single
function name and optional function block, or the struct keyword followed
by a structure name and either a definition or a variable name.
Variables may only be of type char or int, and all variable declaration
statements are suffixed with a ; character. Variables of type char may be
declared as arrays, by appending the variable name with the [ character,
the upper bound of the array (0 to 255), and the ] character.
Examples:
char c; //Defines 8-bit variable c
char hi,lo; //Defines 8-bit variables hi and lo
char r[7]; //Defines 8 byte array r
int addr; //Defines 16-bit variable addr
int i, j; //Defines 16-bit variables i and j
A function declaration consists of the function name suffixed with a (
character, followed by an optional parameter set and a ) character.
The parameter set, if specified, may be either one to three simple
char variables, a single int variable, or a char variable followed
by an int variable.
A function declaration terminated with a ; character is called a forward
declaration and does not generate any code, while one followed by a
program block creates the specified function.
Functions of type char and int explicitly return one or more values
(using a return statement), while functions of type void return no
explicit value.
The return statement causes the function to exit, after which control
returns to the statement immediately following the function call. If
the last statement before the closing } of the function body is not
a return, then an implicit return is assumed.
A return statement may be followed by a list of one to three variables
following the same rules as function arguments (see FUNCTION CALLS,
below), in which case those values are returned by the function call,
otherwise the function call will not return any explicit values.
Examples:
void myfunc(); //Forward declaration of function myfunc
char not(tmp) {tmp = tmp ^ $FF;}
char max(tmp1, tmp2) {if (tmp1 > tmp2) return tmp1; else return tmp2;}
char min(tmp1, tmp2) {if (tmp1 < tmp2) return tmp1; else return tmp2;}
char test(b,c,d) {if (b:-) return min(c,d); else return max(c,d);}
int swap(*,msb,lsb) {return *,lsb,msb;} //Swap bytes in integer
int incdec(m,i) {if (m:-) i--; else i++; return i;} //inc/dec integer
Note: Like all variables, function parameters are global. They must be
declared prior to the function declaration, and retain their values after
the function call. Although functions may be called recursively, they are
not re-entrant. Allocation of variables and functions is implementation
dependent; they could be placed in any part of memory and in any order.
The default behavior is to place variables directly after the program code,
including them as part of the generated object file.
Function arguments and return values are passed through the 6502 registers.
Char type values are passed by loading A, Y, and X respectively, while int
type values are passed by loading Y with the most significant byte and
X with the least significant byte.
A return statement without explicit return values will return whatever
happens to be in the registers at that time.
STRUCTS
A struct is a special type of variable which is composed of one or more
unique variables called members. Each member may be either a simple
variable or an array. However, the total size of the struct may not
exceed 256 characters.
Member names are local to a struct: each member within a struct must have
a unique name, but the same member name can be used in different structs
and may also have the same name as a global variable.
The struct keyword is used for both defining the members of a struct type
as well as declaring struct type variables.
When defining a struct type, the struct keyword is followed by the name of
the struct type, the { character, one or more member definitions, and the }
character. The struct definition is terminated with a semicolon. Each member
definition is composed of a type keyword (char, int, or struct) and one or
more member names, separated with commas. If a member is an array, the member
name is suffixed with the [ character, the upper bound of the array, and the ]
character. Each member definition is terminated with a
semicolon. Any number of comments may appear before the first member, between
members, and after the last member.
Member names are limited to six alphanumeric characters, the first of which
must be alphabetic. Any names are allowed including reserved words, as well
as A, X, and Y (which in this case, do not refer to registers).
When declaring a struct variable, the struct keyword is followed by the
struct type name, and one or more variable names, separated with commas.
The struct declaration is terminated with a semicolon.
Examples:
struct record {char name[8]; char index; int addr; char data[128];};
struct record rec;
struct record srcrec, dstrec;
struct point {char x, y;};
struct point pnt;
struct line {struct point bgnpnt, endpnt;};
Note: Unlike simple and array variables, the members of a struct variable
may not be initialized during declaration.
MODIFIERS
A modifier is used with a declaration to override the default properties of
an object. Modifiers may currently only be used with simple variable and
array declarations, although this may be expanded in the future.
The alias modifier specifies that a variable is to be located at a specific
address. The specified address may either be a literal in the range 0 to
65535 ($0 to $FFFF) or a previously defined variable name. When using the
alias modifier, the declared variable must be followed by the = character
and the literal or variable name to be aliased to.
The aligned modifier specifies that the variable or array will start on
a page boundary. This is used to ensure that accessing an array element will
not cross a page boundary, which requires extra CPU cycles to execute.
The const modifier specifies that a variable or array should not be changed
by program code. The const modifier may be preceded by an aligned or
zeropage modifier.
A const variable declaration may include an initial value definition in
the form of an = character and literal after the variable name.
A const array is declared in one of two ways: the variable name
suffixed with a [ character, a literal specifying the upper bound of
the array, and a ] character; or a variable name followed by an = character
and string literal or series of string and/or numeric literals separated by
commas and surrounded by the { and } characters.
The zeropage modifier specifies that the variable will be defined in page
zero (addresses 0 through 255). It should be used in conjunction with the
pragma zeropage directive.
Examples:
alias char putcon = $F001; //Defines variable putcon with address $F001
alias char alpha = omega; //Defines variable alpha aliased to omega
aligned char table[240]; //Defines 241 byte array aligned to page boundary
const char debug = #TRUE; //Defines variable debug initialized to constant
const char flag = 1; //Defines variable flag initialized to 1
const char s = "string"; //Defines 7 byte string s initialized to "string"
const char n = {1,2,3}; //Defines 3 byte array n containing 1, 2, and 3
const char m = {"abc", 123}; //Defines 5 byte array containing string and byte
const char t = {"ab", "cd"}; //Defines 6 byte array of two strings
aligned const char fbncci = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};
zeropage char ptr, tmp; //Defines zero page variables
EXPRESSIONS
An expression is a series of one or more terms separated by operators.
Each term in an expression may be any of the following:
function call (first term only)
subscripted array element
char type variable, struct member, constant, or literal
byte operation
register (A, X, or Y).
An expression may be preceded with a - character, in which case the first
term is assumed to be a literal 0.
Operators:
+ — Add the following value.
- — Subtract the following value.
& — Bitwise AND with the following value.
| — Bitwise OR with the following value.
^ — Bitwise Exclusive OR with the following value.
Arithmetic operators have no precedence. All operations are performed in
left to right order. Expressions may not contain parentheses.
Note: the character ! may be substituted for | on systems that do not
support the latter character. No escaping is necessary because a ! may
not appear anywhere a | would.
After an expression has been evaluated, the A register will contain the
result.
Note: Function calls are allowed in the first term of an expression
because upon return from the function the return value will be in the
Accumulator. However, due to the 6502 having only one Accumulator, which
is used for all operations between two bytes, there is no simple system
agnostic method for allowing function calls in subsequent terms.
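Note: Because operators have no precedence, an expression can give a
different result than the same text would in standard C. The Python sketch
below models strict left to right evaluation. The names OPS and evaluate
are hypothetical, and wrapping every intermediate result to 8 bits is an
assumption based on the stated 8 bit precision.

```python
# Each operator combines the running result with the following value
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

def evaluate(first, *rest):
    """Evaluate 'term op term op term ...' strictly left to right."""
    acc = first & 0xFF                     # running result (the A register)
    for op, term in zip(rest[0::2], rest[1::2]):
        acc = OPS[op](acc, term & 0xFF) & 0xFF   # keep only the low 8 bits
    return acc
```

With this model, evaluate(2, "+", 3, "&", 1) computes (2 + 3) & 1, which
is 1, whereas standard C precedence would compute 2 + (3 & 1), which is 3.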
BYTE OPERATIONS
A byte operation allows the bytes in an integer value to be accessed
as individual character values. A byte operation consists of a byte
operator prefixed to an integer value.
Byte Operators:
< - Get Least Significant Byte
> - Get Most Significant Byte
The integer value may be an integer literal, an address, or an int type
variable or struct member.
Examples:
hi = >&r; lo = <&r; //Set hi, lo to MSB, LSB of address of array R
page = >53281; //Set page to MSB of the integer literal 53281
lsb = <count; //Set lsb to low byte of integer variable count
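Note: The two byte operators correspond to simple shift and mask
operations, sketched here in Python. The function names low_byte and
high_byte are hypothetical.

```python
def low_byte(value):
    """Model of the < operator: least significant byte of a 16-bit value."""
    return value & 0xFF

def high_byte(value):
    """Model of the > operator: most significant byte of a 16-bit value."""
    return (value >> 8) & 0xFF
```

Since 53281 is $D021, high_byte(53281) is $D0 (208) and low_byte(53281)
is $21 (33).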
CONTENTIONS
A contention is a construct which generates either a TRUE or FALSE
condition, and may be an expression, comparison, or test.
A stand-alone expression evaluates to TRUE if the result is non-zero, or
FALSE if the result is zero.
A comparison consists of an expression, a comparator, and a term (subscripted
array element, simple variable, literal, or constant).
Comparators:
= — Evaluates to TRUE if expression is equal to term
< — Evaluates to TRUE if expression is less than term
<= — Evaluates to TRUE if expression is less than or equal to term
> — Evaluates to TRUE if expression is greater than term
>= — Evaluates to TRUE if expression is greater than or equal to term
<> — Evaluates to TRUE if expression is not equal to term
The parser considers == equivalent to a single =. The operator <>
was chosen instead of the usual != because it simplified the parser design.
A test consists of an expression followed by a test-op.
Test-Ops:
:+ — Evaluates to TRUE if the result of the expression is positive
:- — Evaluates to TRUE if the result of the expression is negative
A negative value is one in which the high bit is a 1 (128 — 255), while a
positive value is one in which the high bit is a 0 (0 — 127). The primary
purpose of test operators is to check the results of functions that return
a positive value upon successful completion and a negative value if an error
was encountered. They compile into smaller code than would be generated
using the equivalent comparison operators.
A contention may be preceded by the negation operator (the ! character), which
reverses the result of the entire contention. For example:
! expr
evaluates to TRUE if expr is zero, or FALSE if it is non-zero; while
! expr = term
evaluates to TRUE if expr and term are not equal, or FALSE if they are; and
! expr :+
evaluates to TRUE if expr is negative, or FALSE if it is positive.
Note: Contentions are compiled directly into 6502 conditional branch
instructions, which precludes their use inside expressions. Standalone
expressions and test-ops generate a single branch instruction, and
therefore result in the most efficient code. Comparisons generate a
compare instruction and one or two branch instructions (=, <, >=, and <>
generate one, while <= and > generate two). A preceding negation operator
will switch the number of branch instructions used in a comparison, but
otherwise does not change the size of the generated code.
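Note: The three kinds of contention and the negation operator can be
summarized with the Python model below. The names COMPARATORS and
contention are hypothetical, and treating comparisons as unsigned 8-bit
comparisons is an assumption.

```python
import operator

# Comparator symbols; == is accepted as a synonym for =
COMPARATORS = {
    "=": operator.eq, "==": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def contention(value, op=None, term=None, negate=False):
    """Return the truth value of a contention on 8-bit values."""
    value &= 0xFF
    if op is None:
        result = value != 0            # stand-alone expression
    elif op == ":+":
        result = value < 128           # high bit clear
    elif op == ":-":
        result = value >= 128          # high bit set
    else:
        result = COMPARATORS[op](value, term & 0xFF)
    # A leading ! reverses the result of the entire contention
    return (not result) if negate else result
```

For example, contention($80, ":-") is true, and contention(5, "=", 5,
negate=True) is false.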
CONDITIONALS
A conditional consists of one or more contentions joined with the
conjunctors "and" and "or".
If only one contention is present, the result of the conditional is the
same as the result of the contention.
If two contentions are joined with "and", then the conditional is true only
if both of the contentions are true. If either or both of the contentions
are false, then the conditional is false.
If two contentions are joined with "or", then the conditional is true if
either or both of the contentions are true. If both of the contentions are
false, then the conditional is false.
When three or more contentions are chained together, the conjunctors
are evaluated in left to right order, using short-circuit evaluation. If
the contention to the left of an "and" is false, then the entire conditional
evaluates to false, and if the contention to the left of an "or" is true,
then the entire conditional evaluates to true. In either case, no further
contentions in the conditional are evaluated.
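Note: The evaluation rule described above differs from standard C, where
&& binds more tightly than ||. The Python sketch below follows the rule as
written. The function name conditional is hypothetical, and each
contention is passed as a zero-argument callable so that short-circuiting
is visible.

```python
def conditional(first, *rest):
    """Evaluate contentions joined by 'and'/'or' strictly left to right.

    'first' and every second item of 'rest' are zero-argument callables;
    the items between them are the strings 'and' or 'or'.
    """
    result = first()
    for joiner, contention in zip(rest[0::2], rest[1::2]):
        if joiner == "and" and not result:
            return False               # false left of "and": stop, false
        if joiner == "or" and result:
            return True                # true left of "or": stop, true
        result = contention()          # otherwise evaluate the next one
    return result
```

Under this reading, a false contention followed by "and" makes the whole
conditional false even if a later "or" contention would be true.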
ARRAY SUBSCRIPTS
Individual elements of an array are accessed using subscript notation.
Subscripted array elements may be used as terms in an expression, as
well as the target variable in an assignment. They are written as the
variable name suffixed with a [ character, followed by an index, and
the ] character.
When assigning to an array element, the index may be a literal, constant,
or simple variable.
When using an array element in an expression or pop statement, the index
may be any expression.
Examples:
z = r[i]; //Store the value from element i of array r into variable z
r[0] = z; //Store the value of variable z into the first element of r
z = d[15-i]; //Store the value of element 15-i of array d into variable z
c = t[getc()]; //Get a character, translate using array t and store in c
Note: Register references may be used as array indexes within expressions,
but the contents of each register may change with each term evaluation.
Using a constant, literal, or the X or Y registers as an array index will
generate the same amount of code as a simple variable reference and leave
both the X and Y registers unchanged. Using the A register as an index will
generate one extra byte of code, while using a simple variable as index
will generate 1 to 2 extra bytes of code. In either case, the index value
will be left in the X register. When an expression is used as an index,
one extra byte of stack space is used, and an additional three bytes of
code is generated. The X register will contain the result of the expression
and the Y register will be left in an unknown state.
STRUCTS
Individual members of a struct variable are specified using the struct
variable name, a period, and the member name. If a member is an array,
its elements are accessed using the same syntax as an array variable.
A struct variable can also be treated like an array variable, with each
byte of the variable accessed as an array index.
Examples:
i = rec.index; //Get Struct Member
rec.data[i] = i; //Set Struct Member Element
arr[i] = rec[i]; //Copy Struct Byte into Array
Note: Unlike standard C, structs may not be assigned using an equals
sign. One struct variable may be copied to another byte by byte or
through a function call.
SIZE-OF OPERATOR
The size-of operator @ generates a literal value equal to the size in bytes
of a specified variable. It is allowed anywhere a literal would be and
should be used anytime the size of an array, struct, or member is required.
When using the size-of operator, it is prefixed to the variable name or
member specification.
Examples:
for (i=0; i<=@z; i++) z[i] = r[i]; //Copy elements from r[] to z[]
blkput(@rec, &rec); //Copy struct rec to next block segment
memcpy(@rec.data, &rec.data); //Copy member data to destination array
Note: The size-of operator is evaluated at compile time and generates two
bytes of machine language code. It is the most efficient method of specifying
a variable length.
INDEX-OF OPERATOR
The index-of operator ? generates a literal value equal to the offset in bytes
of a specified structure member. It is allowed anywhere a literal would be and
should be used anytime the offset of the member of a struct is required.
When using the index-of operator, it is prefixed to the member specification.
Examples:
blkmem(?rec.data, &s); //Search block for segment where data contains s
memcpy(?rec.data, &t); //Copy bytes of rec up to member data into t
Note: The index-of operator is evaluated at compile time and generates two
bytes of machine language code. It is the most efficient method of specifying
the offset of a struct member.
FUNCTION CALLS
A function call may be used as a stand-alone statement, or as the first
term in an expression. A function call consists of the function name
appended with a ( character, followed by zero to three arguments separated
with commas, and a closing ) character.
The first argument of a function call may be an expression, integer,
address, or string (see below).
The second argument may be a term (subscripted array element, simple
variable, or constant), integer, address, or string.
The third argument may only be a simple variable or constant.
If the first or second argument is an integer, address, or string, then
no more arguments may be passed.
When passing the address of a variable, array, struct, or struct member
into a function, the variable specification is prefixed with the
address-of operator &. When passing a literal string, it is simply
specified as is.
Examples:
c = getc(); //Get character from keyboard
n = abs(b+c-d); //Return the absolute value of result of expression
m = min(r[i], r[j]); //Return lesser of two array elements
l = strlen(&s); //Return the length of string s
p = strchr(c, &s); //Return position of character c in string s
putc(getc()); //Echo typed characters to screen
puts("Hello World"); //Write "Hello World" to screen
memdst(&dstrec); //Set struct variable as destination
memcpy(140, &srcrec); //Copy struct variable to destination struct
puts(&rec.name); //Write struct member to screen
Note: This particular argument passing convention has been chosen because
of the 6502's limited number of registers and stack processing instructions.
When an address is passed, the high byte is stored in the Y register and
the low byte in the X register. If a string is passed, it is turned into
an anonymous array, and its address is passed in the Y and X registers.
Otherwise, the first argument is passed in the A register, the second in
the Y register, and the third in the X register.
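Note: The register assignment described above can be modelled in Python as
shown below. The names Address and pass_arguments are hypothetical; byte
arguments are plain integers, and an address (or the address of an
anonymous string) is wrapped in Address. The sketch is an illustration of
the convention, not the compiler's code generator.

```python
from collections import namedtuple

# Marks an argument as a 16-bit address rather than a single byte
Address = namedtuple("Address", "value")

def pass_arguments(*args):
    """Return the register contents set up for a function call."""
    if len(args) > 3:
        raise ValueError("at most three arguments may be passed")
    regs = {}
    for position, arg in enumerate(args):
        if isinstance(arg, Address):
            # An address must be the first or second argument and the last
            if position > 1 or position != len(args) - 1:
                raise ValueError("address must be the final argument")
            regs["Y"] = (arg.value >> 8) & 0xFF   # high byte
            regs["X"] = arg.value & 0xFF          # low byte
        else:
            # Byte arguments are loaded into A, Y, and X in that order
            regs["AYX"[position]] = arg & 0xFF
    return regs
```

A call such as strchr(c, &s) corresponds to pass_arguments(c, Address(s)),
which places c in A and the two bytes of the address in Y and X.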
EXTENDED PARAMETER PASSING
To enable direct calling of machine language routines that do not match
the built-in parameter passing convention, C02 supports the non-standard
statements push, pop, and inline.
The push statement is used to push arguments onto the machine stack prior
to a function call. When using a push statement, it is followed by one or
more arguments, separated by commas, and terminated with a semicolon. An
argument may be an expression, in which case the single byte result is
pushed onto the stack, or it may be an address or string, in which case the
address is pushed onto the stack, high byte first and low byte second.
The pop statement is likewise used to pop arguments off of the machine
stack after a function call. When using a pop statement, it is followed
with one or more simple variables or subscripted array elements, separated
by commas, and terminated with a semicolon. If any of the arguments are to
be discarded, a period may be specified instead of a variable name.
The number of arguments pushed and popped may or may not be the same,
depending on how the machine language routine manipulates the stack pointer.
Examples:
push d,r; mult(); pop p; //multiply d times r and store in p
push x1,y1,x2,y2; rect(); pop .,.,.,.; //draw rectangle from x1,y1 to x2,y2
push &s, "tail"; strcat(); //concatenate "tail" onto string s
push x[i],y[i]; rotate(d); pop x[i],y[i]; //rotate point x[i],y[i] by d
Note: The push and pop statements could also be used to manipulate the
stack inside or separate from a function, but this should be done with
care.
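The byte order used by push for addresses can be illustrated with a small
C model of the 6502 stack. This is a sketch under stated assumptions: the
Stack structure and the function names are hypothetical, and the order in
which a given machine language routine consumes the bytes is up to that
routine.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t page[256]; /* the stack page ($0100-$01FF on a real 6502) */
    uint8_t sp;        /* stack pointer, an offset into the page */
} Stack;

void stack_init(Stack *s) { s->sp = 0xFF; }

/* Store at the current position, then move the pointer down. */
void push_byte(Stack *s, uint8_t value) { s->page[s->sp--] = value; }

/* Move the pointer up, then read. */
uint8_t pop_byte(Stack *s) { return s->page[++s->sp]; }

/* push &var; or push "string"; pushes the high byte first, low byte second. */
void push_address(Stack *s, uint16_t address)
{
    push_byte(s, (uint8_t)(address >> 8));
    push_byte(s, (uint8_t)(address & 0xFF));
}

/* Usage: bytes come back in reverse order, so the low byte is popped first. */
void stack_example(void)
{
    Stack s;
    stack_init(&s);
    push_address(&s, 0x1234);
    assert(pop_byte(&s) == 0x34);
    assert(pop_byte(&s) == 0x12);
}
```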
The inline statement is used when calling machine language routines that
expect constant byte or word values immediately following the 6502 JSR
instruction. A routine of this type will adjust the return address to
point directly after the last inline argument. When using the inline statement,
it is followed by one or more arguments, separated by commas, and
terminated with a semicolon. The arguments may be constants, addresses,
or strings.
Examples:
iprint(); inline "Hello World"; //Print "Hello World"
irect(); inline 10,10,100,100; //Draw rectangle from (10,10) to (100,100)
Note: If a string is specified in an inline statement, rather than creating
an anonymous string and compiling the address inline, the entire string will
be compiled directly inline.
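How a routine called with inline data might locate its arguments and fix
up the return address can be sketched in C. The model relies on the 6502
behavior that JSR pushes the address of its own last byte and RTS resumes
one byte past the address it pulls. The function name, the simulated
memory array, and the assumption that the inline string ends with a zero
byte are all illustrative and not taken from the C02 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mem      : simulated 64K address space
   ret_addr : address pushed by JSR (last byte of the JSR instruction)
   out      : buffer that receives the inline string
   Returns the adjusted return address to hand back to RTS. */
uint16_t inline_string(const uint8_t *mem, uint16_t ret_addr,
                       char *out, size_t out_size)
{
    uint16_t p = (uint16_t)(ret_addr + 1); /* first inline byte */
    size_t n = 0;

    assert(out_size > 0);
    while (mem[p] != 0) {                  /* assumed zero terminator */
        if (n + 1 < out_size)
            out[n++] = (char)mem[p];
        p++;
    }
    out[n] = '\0';
    return p; /* RTS adds 1, so execution resumes after the terminator */
}
```

With a JSR at $1000 followed by the bytes 'H', 'i', 0, the pushed address
is $1002 and the routine returns $1005, so execution continues at $1006.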
ASSIGNMENTS
An assignment is a statement in which the result of an expression is stored
in a variable. An assignment usually consists of a simple variable or
subscripted array element, an = character, and an expression, terminated
with a ; character.
Examples:
i = i + 1; //Add 1 to contents of variable i
c = getchr(); //Call function and store result in variable c
s[i] = 0; //Terminate string at position i
SHORTCUT-IFS
A shortcut-if is a special form of assignment consisting of a condition
and two expressions, of which one will be assigned based on the result
of the condition. A shortcut-if is written as a condition surrounded
by ( and ) characters, followed by a ? character, the expression to be
evaluated if the condition was true, a : character, and the expression to
be evaluated if the condition was false.
Example:
result = (value1 < value2) ? value1 : value2;
Note: Shortcut-ifs may only be used with assignments. This may change in
the future.
POST-OPERATORS
A post-operator is a special form of assignment which modifies the value
of a variable. The post-operator is suffixed to the variable it modifies.
Post-Operators:
++ Increment variable (increase its value by 1)
-- Decrement variable (decrease its value by 1)
<< Left shift variable
>> Right shift variable
Post-operators may be used with either simple variables or subscripted
array elements.
Examples:
i++; //Increment the contents of variable i
b[i]<<; //Left shift the contents of element i of array b
Note: Post-operators may only be used in stand-alone statements, although
this may change in the future.
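Since variables are 8 bits wide, the effect of the post-operators can be
modeled in C with uint8_t arithmetic. The sketch below assumes that the
result wraps within 8 bits, that bits shifted out are discarded, and that
a right shift fills the high bit with zero. The function names are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

uint8_t post_inc(uint8_t v) { return (uint8_t)(v + 1); }  /* models v++; */
uint8_t post_dec(uint8_t v) { return (uint8_t)(v - 1); }  /* models v--; */
uint8_t post_shl(uint8_t v) { return (uint8_t)(v << 1); } /* models v<<; */
uint8_t post_shr(uint8_t v) { return (uint8_t)(v >> 1); } /* models v>>; */

/* Usage: values wrap around at the 8-bit boundary. */
void post_operator_example(void)
{
    assert(post_inc(255) == 0);
    assert(post_dec(0) == 255);
    assert(post_shl(0x81) == 0x02); /* the high bit is lost */
    assert(post_shr(0x81) == 0x40); /* the low bit is lost */
}
```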
ASSIGNMENTS TO REGISTERS
Registers A, X, and Y may be assigned to using the = character. Register A
(but not X or Y) may be used with the << and >> post-operators, while
registers X and Y (but not A) may be used with the ++ and -- post-operators.
IMPLICIT ASSIGNMENTS
A statement consisting of only a simple variable is treated as an
implicit assignment of the A register to the variable in question.
This is useful on systems that use memory locations as strobe registers.
Examples:
HMOVE; //Move Objects (Atari VCS)
S80VID; //Enable 80-Column Video (Apple II)
Note: An implicit assignment generates an STA opcode with the variable
as the operand.
PLURAL ASSIGNMENTS
C02 allows a function to return up to three values by specifying multiple
variables, separated by commas, to the left of the assignment operator (=).
All three variables to be assigned may be either simple variables or
subscripted array elements. Registers are not allowed in plural assignments.
Examples:
row, col = scnpos(); //Get current screen position
cr, mn, mx = cpmnmx(a, b); //Compare two values, return min and max
x[i], y[i] = rotate(x[i],y[i],d); //Rotate x[i] and y[i] by d degrees
x[i], y[i], z[i] = get3d(i); //Generate 3d coordinate for index i
Note: When compiled, a plural assignment generates an STX for the third
assignment (if specified), an STY for the second assignment and an STA for
the first assignment. Using a subscripted array element for the third
assignment generates an overhead of three bytes of machine code.
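The register usage described in the note above can be modeled in C. In
this sketch the Regs structure and the minmax routine are hypothetical
stand-ins for a machine language function that returns values in A and Y;
only the order of the stores follows the text.

```c
#include <assert.h>
#include <stdint.h>

/* Values left in the 6502 registers when a function returns. */
typedef struct {
    uint8_t a, y, x;
} Regs;

/* Hypothetical routine: lesser value in A, greater value in Y. */
Regs minmax(uint8_t p, uint8_t q)
{
    Regs r = { 0, 0, 0 };
    if (p < q) { r.a = p; r.y = q; }
    else       { r.a = q; r.y = p; }
    return r;
}

/* Models the plural assignment "lo, hi = minmax(p, q);". */
void assign_two(uint8_t *lo, uint8_t *hi, uint8_t p, uint8_t q)
{
    Regs r = minmax(p, q);
    *hi = r.y; /* STY for the second variable */
    *lo = r.a; /* STA for the first variable */
}

void plural_example(void)
{
    uint8_t lo, hi;
    assign_two(&lo, &hi, 9, 4);
    assert(lo == 4 && hi == 9);
}
```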
GOTO STATEMENT
A goto statement unconditionally transfers program execution to the
specified label. When using a goto statement, it is followed by the
label name and a terminating semicolon.
Example:
goto end;
Note: A goto statement may be executed from within a loop structure
(although a break or continue statement is preferred), but should not
normally be used to jump from inside a function to outside of it, as
this would leave the return address on the machine stack.
IF AND ELSE STATEMENTS
The if and else statements are used to conditionally execute blocks
of code.
When using the if keyword, it is followed by a conditional (surrounded by
parentheses) and the block of code to be executed if the conditional was
true.
An else statement may directly follow an if statement (with no other
executable code intervening). The else keyword is followed by the block
of code to be executed if the conditional was false.
Examples:
if (c = 27) goto end;
if (n) q = div(n,d); else putln("Division by 0!");
if (r[q]<r[p]) {t=r[p]; r[p]=r[q]; r[q]=t;}
if (!>i | <i) puts("i is zero.");
if (>i > >j || >i = >j && <i > <j) k = i; else k = j;
Note: In order to optimize the compiled code, the if and else statements
are compiled to 6502 relative branch instructions. This limits the amount of
generated code between the if statement and the end of the if/else block
to slightly less than 127 bytes. This should be sufficient in most cases,
but larger code blocks can be accommodated using function calls or goto
statements.
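The size limit follows from the way a 6502 relative branch is encoded: a
two-byte instruction whose second byte is a signed 8-bit displacement
measured from the address of the following instruction. A C sketch of the
range check is shown below. The function name is illustrative, and the
exact code layout produced by the compiler is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a branch instruction located at branch_addr can
   reach target with a signed 8-bit displacement. */
bool branch_reaches(uint16_t branch_addr, uint16_t target)
{
    /* The displacement is relative to the instruction after the branch. */
    int32_t disp = (int32_t)target - ((int32_t)branch_addr + 2);
    return disp >= -128 && disp <= 127;
}

/* Usage: a branch at $1000 reaches forward as far as $1081. */
void branch_example(void)
{
    assert(branch_reaches(0x1000, 0x1081));
    assert(!branch_reaches(0x1000, 0x1082));
}
```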
SELECT, CASE, AND DEFAULT STATEMENTS
The select, case, and default statements are used to execute a specific
block of code depending on the result of an expression.
When using the select keyword, it is followed by an expression (surrounded
by parentheses) and an opening curly brace, which begins the select block.
This must then be followed by a case statement.
Each use of the case keyword is followed by one or more comma-separated
terms and a colon. If any of the terms is equal to the select expression,
the code immediately following the case statement is executed; otherwise,
program execution transfers to the next case or default statement.
The code between two case statements or a case and default statement is
called a case block. At the end of a case block, program execution
transfers to the end of the select block (the closing curly brace at
the end of the default block).
The last case block must be followed by a default statement. When using
the default keyword, it is followed by a colon. The code between the
default statement and the end of the select block (marked with a closing
curly-brace) is called the default block and is executed if none of
the case arguments matched the select expression.
If the constant 0 is to be used as an argument to any of the case
statements, using it as the first argument of the first case statement
will produce slightly more efficient code.
Example:
puts("You pressed ");
select (getc()) {
case $00: putln("Nothing");
case $0D: putln("The Enter key");
case ' ': putln("The space bar");
case 'A','a': putln ("The letter A");
case ltr: putln("The character in variable 'ltr'");
case s[2]: putln("The third character of string 's'");
default: putln("some other key");
}
Unlike the switch statement in C, the break statement is not needed to
exit from a case block. It may be used, however, to prematurely exit a
case block if desired.
Example:
select (arg) {
case foo:
puts("fu");
if (!bar) break;
puts("bar");
default: //do nothing
}
In addition, fall through of case blocks can be duplicated using the goto
statement with a label.
select (num) {
case 1:
putc('I');
goto two;
case 2:
two:
putc('I');
default: //do nothing
}
Note: It's possible for multiple case statement arguments to evaluate to
the same value. In this case, only the first case block matching the
select expression will be executed.
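Because case terms may be variables and only the first matching case
block runs, a select block behaves like a chain of if/else-if tests rather
than a C switch. The C sketch below models part of the key example from
this section. The function name and the use of return values in place of
putln calls are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models: select (key) { case $00: ... case $0D: ... case 'A','a': ...
                          case ltr: ... default: ... }                  */
const char *describe_key(uint8_t key, uint8_t ltr)
{
    if (key == 0x00)
        return "Nothing";
    else if (key == 0x0D)
        return "The Enter key";
    else if (key == 'A' || key == 'a')   /* several terms in one case */
        return "The letter A";
    else if (key == ltr)                 /* a variable used as a term */
        return "The character in variable 'ltr'";
    else
        return "some other key";         /* default block */
}

/* Usage: when ltr also holds 'a', the earlier case block wins. */
void select_example(void)
{
    assert(strcmp(describe_key('a', 'a'), "The letter A") == 0);
}
```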
WHILE LOOPS
The while statement is used to conditionally execute code in a loop. When
using the while keyword, it is followed by a conditional (surrounded by
parentheses) and the block of code to be executed while the conditional
is true. If the conditional is false when the while statement is entered,
the code in the block will never be executed.
Alternatively, the while keyword may be followed by a pair of empty
parentheses, in which case a conditional of true is implied.
Examples:
c = 'A' ; while (c <= 'Z') {putc(c); c++;} //Print letters A-Z
while() if (rdkey()) break; //Wait for a keypress
Note: While loops are compiled using 6502 JMP instructions, so the code
blocks may be arbitrarily large.
DO WHILE LOOPS
The do statement is used with the while statement to conditionally execute
code in a loop at least once. When using the do keyword, it is followed by
the block of code to be executed, the while keyword, a conditional
(surrounded by parentheses), and a terminating semicolon.
A while statement that follows a do loop must contain a conditional.
The while statement is evaluated after each iteration of the loop, and
if it is true, the code block is repeated.
Examples:
do c = rdkey(); while (c=0); //Wait for keypress
do {c = getchr(); putchr(c);} while (c<>13); //Echo line to screen
i=0; do {i++;} while (>i <= >j && <i < <j); //Count from 0 to J
Note: Unlike the other loop structures, do/while statements do not use
6502 JMP instructions. This optimizes the compiled code, but limits
the code inside the loop to just under 127 bytes.
FOR LOOPS
The for statement allows the initialization, evaluation, and modification
of a loop condition in one place. For statements are usually used to
execute a piece of code a specific number of times, or to iterate through
a set of values.
When using the for keyword, it is followed by a pair of parentheses
containing an initialization assignment statement (which is executed once),
a semicolon separator, a conditional (which determines if the code block
is executed), another semicolon separator, and an increment assignment
(which is executed after each iteration of the code block). This is then
followed by the block of code to be conditionally executed.
The assignments and conditional of a for loop must be populated. If an
infinite loop is desired, use a while () statement.
Examples:
for (c='A'; c<='Z'; c++) putc(c); //Print letters A-Z
for (i=strlen(s)-1;i:+;i--) putc(s[i]); //Print string s backwards
for (i=0;c>0;i++) {c=getc();s[i]=c;} //Read characters into string s
Note: For loops are compiled using 6502 JMP instructions, so the code
blocks may be arbitrarily large. A for loop generates less efficient code
than a simple while loop, but will always execute the increment
assignment on a continue.
BREAK AND CONTINUE
A break statement is used to exit out of a do, for, or while loop or a
case block. The continue statement is used to jump to the beginning of
a do, for, or while loop. Neither may be used outside its corresponding
control structures.
When a break statement is encountered, program execution is transferred
to the statement immediately following the end of the block associated
with the innermost do, for, while, or case statement. When using the
break keyword, it is followed with a trailing semicolon.
When a continue statement is encountered, program execution is transferred
to the beginning of the block associated with the innermost do, for, or
while statement. In the case of a for statement, the increment assignment
is executed, followed by the conditional, and in the case of a while
statement, the conditional is executed. When using the continue keyword, it
is followed with a trailing semicolon.
Examples:
do {c=rdkey(); if (c=0) continue; if (c=27) break;} while (c<>13);
for (i=0;i<strlen(s);i++) {if (s[i]=0) break; putchr(s[i]);}
while() {c=rdkey();if (c=0) continue;putchr(c);if (c=13) break;}
UNIMPLEMENTED FEATURES
The #define directive allows the definition of constants but not macros.
The #if, #else, and #endif directives are not recognized at all by the
compiler. They may be added in the future.
The only types recognized by the compiler are char and int. Int values
may only be used in limited contexts. Since the 6502 is an 8-bit processor,
multi-byte types would generate over-complicated code. In addition, the
signed and unsigned keywords are unrecognized, due to the 6502's limited
signed comparison functionality.
Because of the 6502's peculiar indirect addressing modes, pointers are not
currently implemented. Limited pointer operations may be implemented using
zero page variables in the future.