C02/doc/design.txt

C02 Design Considerations

Variable Storage
================

Zero Page
---------

The 6502 has a Zero Page addressing mode. This differs from the absolute
addressing mode in that the address operand is only one byte instead of
two and the instructions take fewer cycles to execute.

Also, some systems have very limited RAM, requiring all variables to be
stored in zero page. One example is the Atari 2600, which maps the 128
bytes of RAM in the RIOT chip to the range $80-$FF and mirrors it at
$180-$1FF (for the machine stack).

To allow the use of zero page variables, the compiler recognizes the
"zeropage" declaration modifier and the "#pragma zeropage" directive,
the latter of which specifies a base address for allocating zero page
variables.

I may add a -Z command line option to allow the specification of the
zero page variable base address.

ROM Based Code
--------------

For compiled programs that will reside in ROM, such as an EPROM or a
cartridge, variables will need to reside in a memory area separate from
the program.

The compiler normally allocates all variables directly after the generated
code, const variables first and regular variables afterward. In addition,
the regular variables are all allocated as zero bytes in the assembled
object code, which can unnecessarily inflate the size of the generated
binary file.

The compiler allows the location of non-const variables to be specified
using the "#pragma rambase" directive.

I may add a -R command line option to allow the specification of the RAM
variable base address.

Read/Write Variables
--------------------

Some systems used different memory addresses for reading and writing to
RAM. One example is Atari 2600 cartridges with RAM. The SARA Superchip
contained 128 bytes of RAM, which were written at addresses $F000 through
$F07F and read from addresses $F080 through $F0FF, while CBS RAM+ had 256
bytes which were written at $F000 through $F0FF and read from $F100
through $F1FF.

To allow for this, the compiler recognizes an additional directive,
"#pragma writebase". In this case the "#pragma rambase" directive
will specify the base address for reads. When a write base address is
specified, the compiler will allocate all variables using the read base
address and generate an offset, which will be included in all variable
assignments. Thus the code

    #pragma rambase $F100
    #pragma writebase $F000
    char b,c;
    c = b;

would generate the assembly code

        LDA B
        STA C-256
    B   EQU $F100
    C   EQU $F101

I may add a -W command line option to allow the specification of the write
base address.
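
The offset calculation described above can be sketched in Python. This
is only an illustration: the function name write_operand and its
parameters are hypothetical and are not taken from the compiler source.

```python
def write_operand(name, rambase, writebase):
    """Build the assembler operand used when storing to a variable.

    Variables are allocated at the read base, so a store has to be
    redirected by the distance between the write base and the read base.
    """
    offset = writebase - rambase  # negative when the write base is lower
    if offset == 0:
        return name
    sign = "+" if offset > 0 else "-"
    return f"{name}{sign}{abs(offset)}"


# The example from the text: rambase $F100, writebase $F000
operand = write_operand("C", 0xF100, 0xF000)  # "C-256"
```

With the SARA Superchip layout (reads at $F080, writes at $F000) the
same function would produce an operand such as "B-128".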

Variable Types
==============

Pointers
--------

The 6502 differs from nearly all other microprocessors by not having
a 16 bit index register, instead using indirect indexed mode in
conjunction with zero page.

On other processors, it would be trivial to dereference a pointer
stored anywhere in memory. For example, on the 6800:

    LDX P       ;N = *P
    LDAA 0,X

There is no direct equivalent on the 6502, but by using indirect
indexed mode, any zero page variable pair can be used as a pointer:

    LDY #0      ;N = *P
    LDA (P),Y

Additionally, accessing elements of a dereferenced zero page pointer
is just as trivial:

    LDY I       ;N = *P[I]
    LDA (P),Y

    expr code   ;N = *P[expr]
    TAY
    LDA (P),Y

However, trying to use a 16-bit value stored outside of zero-page
would take extra code and require using a zero page byte pair:

    LDY P       ;N = *P
    STY ZP
    LDY P+1
    STY ZP+1
    LDY #0
    LDA (ZP),Y

This violates two principles of the C02 design philosophy: that
each token correspond to one or two machine instructions and
that the compiler is completely agnostic of the system configuration.

Therefore, if pointers are implemented, they will have to be in
zero page.
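
For readers less familiar with indirect indexed mode, the address
calculation behind LDA (P),Y can be modelled in a few lines of Python.
This is a sketch, not part of the compiler; the function names and the
flat 64K bytearray used as memory are my own assumptions.

```python
def indirect_indexed(mem, zp, y):
    """Effective address of an (zp),Y operand.

    The 16-bit pointer is read from zero page locations zp and zp+1
    (low byte first), then the Y register is added to it.
    """
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]  # the pointer fetch wraps within zero page
    return (((hi << 8) | lo) + y) & 0xFFFF


def load_via_pointer(mem, zp, y=0):
    """Model of LDY #y / LDA (zp),Y."""
    return mem[indirect_indexed(mem, zp, y)]


mem = bytearray(0x10000)
mem[0x80], mem[0x81] = 0x34, 0x12   # P = $1234
mem[0x1234 + 5] = 0x99
value = load_via_pointer(mem, 0x80, 5)  # reads the byte at $1239
```

The pointer itself has to live in zero page because the instruction
only has room for a one byte operand.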

Implementation:

Declaration of a pointer should use two bytes of zero page storage,
the address of which would be taken from the free zero page space
specified by the #pragma zeropage directive.

    P   EQU $80 ;char *p
    Q   EQU $82 ;char *q
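
The allocation above could work along the following lines. This is a
rough Python sketch under my own assumptions: the class and method names
are hypothetical, and the base address $80 simply matches the example.

```python
class ZeroPageAllocator:
    """Hands out consecutive zero page addresses from a base address."""

    def __init__(self, base):
        self.next_free = base

    def declare(self, name, size):
        """Reserve size bytes and return the matching EQU line."""
        if self.next_free + size > 0x100:
            raise MemoryError("out of zero page space")
        address = self.next_free
        self.next_free += size
        return f"{name:<4}EQU ${address:02X}"


zp = ZeroPageAllocator(0x80)          # as set by #pragma zeropage
lines = [zp.declare("P", 2), zp.declare("Q", 2)]
```

Each pointer takes two bytes, so the second declaration lands at $82.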

Any pointers declared in a header file would be added to the variable
table but not allocated, so would be defined in the accompanying
assembly file:

    char *dst;  //stddef.h02
    char dstlo,dsthi;

    DST     EQU $30 ;stddef.a02
    DSTLO   EQU $30
    DSTHI   EQU $31

Since they are 16-bit values, raw pointers cannot be used directly
in expressions, but could be used in standalone assignments:

    LDX Q       ;p = q
    STX P
    LDY Q+1     ;X is used for LSB and Y as MSB to match
    STY P+1     ;the address passing convention for functions

Likewise if a 16-bit integer variable type was added to C02,
then one could be assigned to the other using the exact same
assembly code.

A dereferenced pointer can be used anywhere an array reference
is allowed and would be subject to the same restrictions.

A raw pointer used as an argument to a function will be passed
using the same convention as an address:

    LDY P+1     ;func(p)
    LDX P
    JSR FUNC

    JSR FUNC    ;p = func()
    STY P+1
    STX P

Since pointers are in zero page, the address-of operator on a pointer
will generate a char value:

    LDA #P      ;&p

This will allow passing of pointers for use inside of functions
using indexed mode:

    LDA #P      ;inc(&p)
    JSR FUNC

    TAX         ;void inc()
    INC $00,X
    RTS

The memio module makes extensive use of pointer addresses as
function arguments.

Structs
-------

For the declarations

    STRUCT RECORD { CHAR NAME[8]; CHAR INDEX; CHAR DATA[128]; };
    STRUCT RECORD REC;

references to the members of the struct REC will generate the code:

    XXX REC+$09     ;REC.INDEX

    LDX I           ;REC.DATA[I]
    XXX REC+$0A,X

Using the address-of operator on a struct member generates assembly
code with parenthetical expressions, which are not recognized by all
assemblers:

    LDY #>(REC+$0A)  ;FUNC(&REC.DATA)
    LDX #<(REC+$0A)
    JSR FUNC

The compiler could optimize the generation of code for references
to the first member of a struct, producing

    XXX REC     ;REC.NAME

instead of

    XXX REC+$00 ;REC.NAME

but the machine code produced by the assembler should be identical
in either case.


Expression Evaluation
=====================

Array Indexes
-------------

Array indexing normally uses the X register and indexed addressing mode:

    LDX I       ;R[I]
    XXX R,X

If the index is a constant or literal, absolute addressing is used:

    LDA R+1     ;R[1]
    XXX S+0     ;S[0]

Specifying a register as the index also uses indexed addressing mode:

    TAX         ;R[A]
    XXX R,X
    XXX R,Y     ;R[Y]
    XXX R,X     ;R[X]

Allowing for an expression as the index in the first term of an
expression uses only one extra byte of code and two machine cycles:

    expr code   ;R[expr]
    TAX
    LDA R,X

while in any other term it uses an extra three bytes of code
and nine machine cycles:

    PHA         ;R[expr]
    expr code   ;code to evaluate the expression
    TAX
    PLA
    XXX R,X

compared to the extra four to six bytes and six to eight machine cycles
used by the equivalent C02 code required to achieve the same result:

    expr code   ;Z = expr
    STA Z
    LDX Z       ;R[Z]
    XXX R,X
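
The byte and cycle counts in this comparison can be tallied with a short
Python sketch. The figures in the table are the standard NMOS 6502 sizes
and timings for these instructions; the helper itself is only an
illustration and the names are my own.

```python
# (bytes, cycles) for the NMOS 6502 instructions involved
COST = {
    "PHA": (1, 3),
    "PLA": (1, 4),
    "TAX": (1, 2),
    "STA zp": (2, 3),
    "STA abs": (3, 4),
    "LDX zp": (2, 3),
    "LDX abs": (3, 4),
}


def total(instructions):
    """Sum the size and timing of a list of instructions."""
    size = sum(COST[i][0] for i in instructions)
    cycles = sum(COST[i][1] for i in instructions)
    return size, cycles


first_term = total(["TAX"])                 # (1, 2)
later_term = total(["PHA", "TAX", "PLA"])   # (3, 9)
c02_zero_page = total(["STA zp", "LDX zp"])     # (4, 6)
c02_absolute = total(["STA abs", "LDX abs"])    # (6, 8)
```

The last two lines cover the temporary variable Z being in zero page or
in regular memory, which is where the two ranges come from.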

Function Calls
--------------

A function call in the first term of an expression requires no additional
processing by the compiler, since the accumulator holds the return
value upon completion of the function call:

    JSR FUNC    ;R = func()
    STA R

Allowing a function call in subsequent terms, however, requires
extensive stack manipulation:


whereas the equivalent C02 code generates much simpler machine code:


Shift Operators
---------------

In standard C, the shift operators use the number of bits to shift as
the operand. Since the 6502 shift and rotate instructions only shift
one bit at a time, this would require the use of an index register and
six to seven bytes of code:

            expr code    ;expr
            LDY B        ;>> B
    .LOOP   LSR
            DEY
            BNE .LOOP

whereas a library function would require five bytes of code:

    SHIFTR: LSR         ;A=Value to Shift
            DEY         ;Y=Bits to Shift
            BNE SHIFTR
            RTS

and each function call would use five to six bytes of code:

    expr code   ;shiftr(expr, B)
    LDY B
    JSR SHIFTR
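
To show how the SHIFTR routine behaves, here is a small Python model of
it. This is my own sketch for illustration; it simply mirrors the four
instructions on 8-bit values.

```python
def shiftr(a, y):
    """Model of SHIFTR: LSR / DEY / BNE SHIFTR / RTS.

    a is the value to shift and y is the number of bits to shift,
    both treated as 8-bit register contents.
    """
    while True:
        a = (a >> 1) & 0xFF   # LSR
        y = (y - 1) & 0xFF    # DEY
        if y == 0:            # BNE is not taken once Y reaches zero
            return a          # RTS


result = shiftr(0x80, 3)      # 0x10
```

Because the decrement comes before the test, a shift count of zero
wraps around to 255 and the value is shifted 256 times, leaving zero.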

Following the philosophy that an operator should correspond to a single
machine instruction, in C02 shifting is implemented as post-operators
using the various available addressing modes:

    ASL S       ;S<<

    LDX I       ;T[I]>>
    LSR T,X

    ASL         ;A<<


Post-Operators and Pre-Operators
--------------------------------

Parsing for post-operators in a standalone expression is trivial, since
in that context the relevant characters can only follow the operand.

Implementing pre-operators on standalone expressions would be redundant
since there would be no difference in the generated code.

Parsing for post-operators and/or pre-operators within an expression or
evaluation, however, would complicate the detection of operators and
comparators.

In addition, the code generated from a post-operator or pre-operator
within an expression:

    DEC I       ;R[--I]
    LDX I
    XXX R,X

    LDX I       ;R[I++]
    XXX R,X
    INC I

is identical to the code generated when using a standalone post-operator:

    DEC I       ;I--
    LDX I       ;R[I]
    XXX R,X

    LDX I       ;R[I]
    XXX R,X
    INC I       ;I++

Assignments
===========

Assignment to a variable generates an STA instruction, using either absolute
or indexed addressing mode:

    expr code   ;R = expr
    STA R

    expr code   ;R[2] = expr
    STA R+2

    expr code   ;R[I] = expr
    LDX I
    STA R,X

while assignment to an index register generates a transfer instruction:

    expr code   ;Y = expr
    TAY

    expr code   ;X = expr
    TAX

and assignment to the Accumulator is simply a syntactical convenience:

    expr code   ;A = expr

Specific to C02 is the implied assignment, which also generates an STA:

    STA S       ;S

Allowing an expression as an array index in an assignment is problematic
on an NMOS 6502 since the index expression will be evaluated prior to the
expression being assigned to the array element and neither index register
may be directly pushed or pulled from the stack.

The addition of the PLX instruction to the 65C02 allows this to be
done using only two extra bytes of code for each variable assignment:

    indexp code ;R[indexp] = expr
    PHA
    expr code
    PLX
    STA R,X

    expr code   ;R[expr], S[exps] = func()
    PHA
    exps code
    PHA
    JSR FUNC
    PLX
    STY S,X
    PLX
    STA R,X

however, this would only work with the A and Y variables of a plural
assignment.

A workaround for the NMOS 6502 would require an extra five bytes of code

    indexp code ;R[indexp] = expr
    PHA
    expr code
    TAY
    PLA
    TAX
    TYA
    STA R,X

and the use of the Y register would limit it to only the A variable
of a plural assignment, whereas the equivalent C02 code would use an
extra four to six bytes of code

    indexp code ;I = indexp;
    STA I
    expr code   ;R[I] = expr
    LDX I
    STA R,X

and works with all three variables of a plural assignment.

Conditionals
============

Conditionals are separate and distinct from expressions, due to the fact
that the comparison operators all set status flags which affect the various
branch instructions:

     CMP DATA       Normal      Inverted
    Reg < Data       BCC          BCS
    Reg = Data       BEQ          BNE
    Reg ≥ Data       BCS          BCC
    Reg ≠ Data       BNE          BEQ

The remaining operators can be implemented with CLC and SBC, but this will
change the value in the accumulator.

    CLC:SBC DATA    Normal      Inverted
    Reg ≤ Data       BCC          BCS
    Reg > Data       BCS          BCC

Or they could be implemented using multiple branch instructions. This would
leave the value of the left side expression in the accumulator, but use one
more byte of code and require an extra label.

     CMP DATA         Normal           Inverted
    Reg ≤ Data   BEQ exec:BCC exec   BNE exec:BCS exec
    Reg > Data   BEQ skip:BCS exec   BNE skip:BCC exec

When compiling a comparison, the generated code will usually, but not
always, branch when the condition is false, skipping to the end of the
block of code following the comparison. In addition, the logical not
operator will invert the comparison.

By arranging the eight standard comparisons, along with evaluation of
a term as true when non-zero, in this order:

     0   1   2   3   4   5   6   7
    !0   =   <   ≤   >   ≥   ≠   0

the comparison can be inverted with a simple exclusive-or. For this
reason, the ! operator is logical rather than bitwise and affects
the result of the comparison rather than an individual expression
or term within the expression.
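
One way to see why an exclusive-or can invert a comparison is to treat
the comparison code as three bit flags. The Python sketch below assumes
a hypothetical encoding (bit 0 for equal, bit 1 for less than, bit 2 for
greater than); the flag and function names are mine, and the two truth
test codes, 0 and 7, are left out.

```python
EQ, LT, GT = 1, 2, 4  # hypothetical bit flags making up a comparison code


def holds(code, reg, data):
    """True if the comparison described by code holds for reg and data."""
    return bool(
        (code & EQ and reg == data)
        or (code & LT and reg < data)
        or (code & GT and reg > data)
    )


def invert(code):
    """Flip all three flags, giving the logically opposite comparison."""
    return code ^ 7


less_than = LT                     # code 2
not_less_than = invert(less_than)  # code 5: equal or greater
```

Since exactly one of the three conditions is true for any pair of
values, flipping all three flags always yields the opposite result.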

Standalone Expressions
----------------------

Flags Operators
---------------

Logical Operators
-----------------

Parsing the logical operators && and || is trivial if the preceding
condition is a comparison or flag operation, since any expression
evaluation is complete before the parser encounters the initial & or |
character. For a standalone evaluation of an expression as true or
false, however, the expression evaluator will mistake the initial
character of the && or || as a bitwise operator. Differentiating
the two would require changing to a look-ahead parser.

One solution is to require parentheses around each comparison
when using logical operators, which is allowable in C syntax, but
it's just as easy, and arguably cleaner looking, to use the words
"and" and "or" instead. This is allowable in standard C by using
the #define directive to alias "and" to "&&" and "or" to "||".

The most efficient way to implement logical operators is to use shortcut
evaluations.

Under normal circumstances, where the generated code branches when
the condition is false, && can be implemented by evaluating the next
comparison only in the event of the first condition being true.

For ||, however, the following comparison will be evaluated only if the
first comparison was false.

    LDA I       ;IF (I<J
    CMP J
    BCS ENDIF
    LDA M       ;&& M<>N)
    CMP N
    BEQ ENDIF

    LDA I       ;IF (I<J
    CMP J
    BCC block
    LDA M       ;|| M<>N)
    CMP N
    BEQ ENDIF
    block

When chaining multiple && and/or || operators, the shortcut evaluation
effectively makes them right-associative.
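
The effect of chaining can be illustrated with a small Python model of
the branch logic. This is a sketch with hypothetical names; for brevity
the comparison results are passed in already computed, whereas the
generated code would skip evaluating the ones it branches past.

```python
def run_chain(results, operators):
    """Model of the branches generated for a chain of comparisons.

    results holds the outcome of each comparison in order and operators
    holds the "and"/"or" between them. Returns True if the block of
    code following the condition would be executed.
    """
    for result, op in zip(results, operators):
        if op == "and" and not result:
            return False   # branch to ENDIF
        if op == "or" and result:
            return True    # branch to block
    return results[-1]     # last comparison falls through or skips


# A and B or C with A false, B true, C true
taken = run_chain([False, True, True], ["and", "or"])  # False
```

Standard C would group this as (A && B) || C and take the block, while
the chain above behaves as A && (B || C) and skips it.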