mirror of
https://github.com/cc65/cc65.git
synced 2025-01-25 11:30:06 +00:00
731 lines
30 KiB
Plaintext
731 lines
30 KiB
Plaintext
|
<!doctype linuxdoc system>
|
||
|
|
||
|
<article>
|
||
|
<title>Defining a Custom cc65 Target
|
||
|
<author>Bruce Reidenbach
|
||
|
<date>2010-02-22
|
||
|
|
||
|
<abstract>
|
||
|
This section provides step-by-step instructions on how to use the cc65
|
||
|
toolset for a custom hardware platform (a target system not currently
|
||
|
supported by the cc65 library set).
|
||
|
</abstract>
|
||
|
|
||
|
<!-- Table of contents -->
|
||
|
|
||
|
<toc>
|
||
|
|
||
|
<!-- Begin the document -->
|
||
|
|
||
|
<sect>Overview<p>
|
||
|
|
||
|
The cc65 toolset provides a set of pre-defined libraries that allow the
|
||
|
user to target the executable image to a variety of hardware platforms.
|
||
|
In addition, the user can create a customized environment so that the
|
||
|
executable can be targeted to a custom platform. The following
|
||
|
instructions provide step-by-step instructions on how to customize the
|
||
|
toolset for a target that is not supported by the standard cc65
|
||
|
installation.
|
||
|
|
||
|
The platform used in this example is a Xilinx Field Programmable Gate
|
||
|
Array (FPGA) with an embedded 65C02 core. The processor core supports
|
||
|
the additional opcodes/addressing modes of the 65SC02, along with the
|
||
|
STP and WAI instructions. These instructions will create a set of files
|
||
|
to create a custom target, named SBC, for <bf>S</bf>ingle <bf>B</bf>oard
|
||
|
<bf>C</bf>omputer.
|
||
|
|
||
|
<sect>System Memory Map Definition<p>
|
||
|
|
||
|
The targeted system uses block RAM contained on the XILINX FPGA for the
|
||
|
system memory (both RAM and ROM). The block RAMs are available in
|
||
|
various aspect ratios, and they will be used in this system as 2K*8
|
||
|
devices. There will be two RAMs used for data storage, starting at
|
||
|
location $0000 and growing upwards. There will be one ROM (realized as
|
||
|
initialized RAM) used code storage, starting at location $FFFF and
|
||
|
growing downwards.
|
||
|
|
||
|
The cc65 toolset requires a memory configuration file to define the
|
||
|
memory that is available to the cc65 run-time environment, which is
|
||
|
defined as follows:
|
||
|
|
||
|
<tscreen><code>
|
||
|
MEMORY {
|
||
|
ZP: start = $0, size = $100, type = rw, define = yes;
|
||
|
RAM: start = $200, size = $0E00, define = yes;
|
||
|
ROM: start = $F800, size = $0800, file = %O;
|
||
|
}
|
||
|
</code></tscreen>
|
||
|
|
||
|
ZP defines the available zero page locations, which in this case starts
|
||
|
at $0 and has a length of $100. Keep in mind that certain systems may
|
||
|
require access to some zero page locations, so the starting address may
|
||
|
need to be adjusted accordingly to prevent cc65 from attempting to reuse
|
||
|
those locations. Also, at a minimum, the cc65 run-time environment uses
|
||
|
26 zero page locations, so the smallest zero page size that can be
|
||
|
specified is $1A. The usable RAM memory area begins after the 6502
|
||
|
stack storage in page 1, so it is defined as starting at location $200
|
||
|
and filling the remaining 4K of space (4096 - 2 *
|
||
|
256 = 3584 = $0E00). The 2K of ROM space begins at
|
||
|
address $F800 and goes to $FFFF (size = $0800).
|
||
|
|
||
|
Next, the memory segments within the memory devices need to be defined.
|
||
|
A standard segment definition is used, with one notable exception. The
|
||
|
interrupt and reset vector locations need to be defined at locations
|
||
|
$FFFA through $FFFF. A special segment named VECTORS is defined that
|
||
|
resides at these locations. Later, the interrupt vector map will be
|
||
|
created and placed in the VECTORS segment, and the linker will put these
|
||
|
vectors at the proper memory locations. The segment definition is:
|
||
|
|
||
|
<tscreen><code>
|
||
|
SEGMENTS {
|
||
|
ZEROPAGE: load = ZP, type = zp, define = yes;
|
||
|
DATA: load = ROM, type = rw, define = yes, run = RAM;
|
||
|
BSS: load = RAM, type = bss, define = yes;
|
||
|
HEAP: load = RAM, type = bss, optional = yes;
|
||
|
STARTUP: load = ROM, type = ro;
|
||
|
INIT: load = ROM, type = ro, optional = yes;
|
||
|
CODE: load = ROM, type = ro;
|
||
|
RODATA: load = ROM, type = ro;
|
||
|
VECTORS: load = ROM, type = ro, start = $FFFA;
|
||
|
}
|
||
|
</code></tscreen>
|
||
|
|
||
|
The meaning of each of these segments is as follows.
|
||
|
|
||
|
<p><tt> ZEROPAGE: </tt>Data in page 0, defined by ZP as starting at $0 with length $100
|
||
|
<p><tt> DATA: </tt>Initialized data that can be modified by the program, stored in RAM
|
||
|
<p><tt> BSS: </tt>Uninitialized data stored in RAM (used for variable storage)
|
||
|
<p><tt> HEAP: </tt>Uninitialized C-level heap storage in RAM, optional
|
||
|
<p><tt> STARTUP: </tt>The program initialization code, stored in ROM
|
||
|
<p><tt> INIT: </tt>The code needed to initialize the system, stored in ROM
|
||
|
<p><tt> CODE: </tt>The program code, stored in ROM
|
||
|
<p><tt> RODATA: </tt>Initialized data that cannot be modified by the program, stored in ROM
|
||
|
<p><tt> VECTORS: </tt>The interrupt vector table, stored in ROM at location $FFFA
|
||
|
|
||
|
A note about initialized data: any variables that require an initial
|
||
|
value, such as external (global) variables, require that the initial
|
||
|
values be stored in the ROM code image. However, variables stored in
|
||
|
ROM cannot change; therefore the data must be moved into variables that
|
||
|
are located in RAM. Specifying <tt>run = RAM</tt> as part of
|
||
|
the DATA segment definition will indicate that those variables will
|
||
|
require their initialization value to be copied via a call to the
|
||
|
<tt>copydata</tt> routine in the startup assembly code. In addition,
|
||
|
there are system level variables that will need to be initialized as
|
||
|
well, especially if the heap segment is used via a C-level call to
|
||
|
<tt>malloc ()</tt>.
|
||
|
|
||
|
The final section of the definition file contains the data constructors
|
||
|
and destructors used for system startup. In addition, if the heap is
|
||
|
used, the maximum C-level stack size needs to be defined in order for
|
||
|
the system to be able to reliably allocate blocks of memory. The stack
|
||
|
size selection must be greater than the maximum amount of storage
|
||
|
required to run the program, keeping in mind that the C-level subroutine
|
||
|
call stack and all local variables are stored in this stack. The
|
||
|
<tt>FEATURES</tt> section defines the required constructor/destructor
|
||
|
attributes and the <tt>SYMBOLS</tt> section defines the stack size. The
|
||
|
constructors will be run via a call to <tt>initlib</tt> in the startup
|
||
|
assembly code and the destructors will be run via an assembly language
|
||
|
call to <tt>donelib</tt> during program termination.
|
||
|
|
||
|
<tscreen><code>
|
||
|
FEATURES {
|
||
|
CONDES: segment = STARTUP,
|
||
|
type = constructor,
|
||
|
label = __CONSTRUCTOR_TABLE__,
|
||
|
count = __CONSTRUCTOR_COUNT__;
|
||
|
CONDES: segment = STARTUP,
|
||
|
type = destructor,
|
||
|
label = __DESTRUCTOR_TABLE__,
|
||
|
count = __DESTRUCTOR_COUNT__;
|
||
|
}
|
||
|
|
||
|
SYMBOLS {
|
||
|
# Define the stack size for the application
|
||
|
__STACKSIZE__: value = $0200, weak = yes;
|
||
|
}
|
||
|
</code></tscreen>
|
||
|
|
||
|
These definitions are placed in a file named "sbc.cfg"
|
||
|
and are referred to during the ld65 linker stage.
|
||
|
|
||
|
<sect>Startup Code Definition<p>
|
||
|
|
||
|
In the cc65 toolset, a startup routine must be defined that is executed
|
||
|
when the CPU is reset. This startup code is marked with the STARTUP
|
||
|
segment name, which was defined in the system configuration file as
|
||
|
being in read only memory. The standard convention used in the
|
||
|
predefined libraries is that this code is resident in the crt0 module.
|
||
|
For this custom system, all that needs to be done is to perform a little
|
||
|
bit of 6502 housekeeping, set up the C-level stack pointer, initialize
|
||
|
the memory storage, and call the C-level routine <tt>main ()</tt>.
|
||
|
The following code was used for the crt0 module, defined in the file
|
||
|
"crt0.s":
|
||
|
|
||
|
<tscreen><code>
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; crt0.s
|
||
|
; ---------------------------------------------------------------------------
|
||
|
;
|
||
|
; Startup code for cc65 (Single Board Computer version)
|
||
|
|
||
|
.export _init, _exit
|
||
|
.import _main
|
||
|
|
||
|
.export __STARTUP__ : absolute = 1 ; Mark as startup
|
||
|
.import __RAM_START__, __RAM_SIZE__ ; Linker generated
|
||
|
|
||
|
.import copydata, zerobss, initlib, donelib
|
||
|
|
||
|
.include "zeropage.inc"
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Place the startup code in a special segment
|
||
|
|
||
|
.segment "STARTUP"
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; A little light 6502 housekeeping
|
||
|
|
||
|
_init: LDX #$FF ; Initialize stack pointer to $01FF
|
||
|
TXS
|
||
|
CLD ; Clear decimal mode
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Set cc65 argument stack pointer
|
||
|
|
||
|
LDA #<(__RAM_START__ + __RAM_SIZE__)
|
||
|
STA sp
|
||
|
LDA #>(__RAM_START__ + __RAM_SIZE__)
|
||
|
STA sp+1
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Initialize memory storage
|
||
|
|
||
|
JSR zerobss ; Clear BSS segment
|
||
|
JSR copydata ; Initialize DATA segment
|
||
|
JSR initlib ; Run constructors
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Call main()
|
||
|
|
||
|
JSR _main
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Back from main (this is also the _exit entry): force a software break
|
||
|
|
||
|
_exit: JSR donelib ; Run destructors
|
||
|
BRK
|
||
|
</code></tscreen>
|
||
|
|
||
|
The following discussion explains the purpose of several important
|
||
|
assembler level directives in this file.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.export _init, _exit
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler that the symbols <tt>_init</tt> and
|
||
|
<tt>_exit</tt> are to be accessible from other modules. In this
|
||
|
example, <tt>_init</tt> is the location that the CPU should jump to when
|
||
|
reset, and <tt>_exit</tt> is the location that will be called when the
|
||
|
program is finished.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.import _main
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler to import the symbol <tt>_main</tt>
|
||
|
from another module. cc65 names all C-level routines as
|
||
|
{underscore}{name} in assembler, thus the <tt>main ()</tt> routine
|
||
|
in C is named <tt>_main</tt> in the assembler. This is how the startup
|
||
|
code will link to the C-level code.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.export __STARTUP__ : absolute = 1 ; Mark as startup
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line marks this code as startup code (code that is executed when
|
||
|
the processor is reset), which will then be automatically linked into
|
||
|
the executable code.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.import __RAM_START__, __RAM_SIZE__ ; Linker generated
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line imports the RAM starting address and RAM size constants, which
|
||
|
are used to initialize the cc65 C-level argument stack pointer.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.segment "STARTUP"
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler that the code is to be placed in the
|
||
|
STARTUP segment of memory.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
JSR zerobss ; Clear BSS segment
|
||
|
JSR copydata ; Initialize DATA segment
|
||
|
JSR initlib ; Run constructors
|
||
|
</verb></tscreen>
|
||
|
|
||
|
These three lines initialize the external (global) and system
|
||
|
variables. The first line sets the BSS segment -- the memory locations
|
||
|
used for external variables -- to 0. The second line copies the
|
||
|
initialization value stored in ROM to the RAM locations used for
|
||
|
initialized external variables. The last line runs the constructors
|
||
|
that are used to initialize the system run-time variables.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
JSR _main
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This is the actual call to the C-level <tt>main ()</tt> routine,
|
||
|
which is called after the startup code completes.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
_exit: JSR donelib ; Run destructors
|
||
|
BRK
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This is the code that will be executed when <tt>main ()</tt>
|
||
|
terminates. The first thing that must be done is run the destructors
|
||
|
via a call to <tt>donelib</tt>. Then the program can terminate. In
|
||
|
this example, the program is expected to run forever. Therefore, there
|
||
|
needs to be a way of indicating when something has gone wrong and the
|
||
|
system needs to be shut down, requiring a restart only by a hard reset.
|
||
|
The BRK instruction will be used to indicate a software fault. This is
|
||
|
advantageous because cc65 uses the BRK instruction as the fill byte in
|
||
|
the final binary code. In addition, the hardware has been designed to
|
||
|
force the data lines to $00 for all illegal memory accesses, thereby
|
||
|
also forcing a BRK instruction into the CPU.
|
||
|
|
||
|
<sect>Custom Run-Time Library Creation<p>
|
||
|
|
||
|
The next step in customizing the cc65 toolset is creating a run-time
|
||
|
library for the targeted hardware. The easiest way to do this is to
|
||
|
modify a standard library from the cc65 distribution. In this example,
|
||
|
there is no console I/O, mouse, joystick, etc. in the system, so it is
|
||
|
most appropriate to use the simplest library as the base, which is for
|
||
|
the Watara Supervision and is named "supervision.lib" in the
|
||
|
lib directory of the distribution.
|
||
|
|
||
|
The only modification required is to replace the <tt>crt0</tt> module in
|
||
|
the supervision.lib library with custom startup code. This is simply
|
||
|
done by first copying the library and giving it a new name, compiling
|
||
|
the startup code with ca65, and finally using the ar65 archiver to
|
||
|
replace the module in the new library. The steps are shown below:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
$ copy "C:\Program Files\cc65\lib\supervision.lib" sbc.lib
|
||
|
$ ca65 crt0.s
|
||
|
$ ar65 a sbc.lib crt0.o
|
||
|
</verb></tscreen>
|
||
|
|
||
|
<sect>Interrupt Service Routine Definition<p>
|
||
|
|
||
|
For this system, the CPU is put into a wait condition prior to allowing
|
||
|
interrupt processing. Therefore, the interrupt service routine is very
|
||
|
simple: return from all valid interrupts. However, as mentioned
|
||
|
before, the BRK instruction is used to indicate a software fault, which
|
||
|
will call the same interrupt service routine as the maskable interrupt
|
||
|
signal IRQ. The interrupt service routine must be able to tell the
|
||
|
difference between the two, and act appropriately.
|
||
|
|
||
|
The interrupt service routine shown below includes code to detect when a
|
||
|
BRK instruction has occurred and stops the CPU from further processing.
|
||
|
The interrupt service routine is in a file named
|
||
|
"interrupt.s".
|
||
|
|
||
|
<tscreen><code>
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; interrupt.s
|
||
|
; ---------------------------------------------------------------------------
|
||
|
;
|
||
|
; Interrupt handler.
|
||
|
;
|
||
|
; Checks for a BRK instruction and returns from all valid interrupts.
|
||
|
|
||
|
.import _stop
|
||
|
.export _irq_int, _nmi_int
|
||
|
|
||
|
.segment "CODE"
|
||
|
|
||
|
.PC02 ; Force 65C02 assembly mode
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Non-maskable interrupt (NMI) service routine
|
||
|
|
||
|
_nmi_int: RTI ; Return from all NMI interrupts
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Maskable interrupt (IRQ) service routine
|
||
|
|
||
|
_irq_int: PHX ; Save X register contents to stack
|
||
|
TSX ; Transfer stack pointer to X
|
||
|
PHA ; Save accumulator contents to stack
|
||
|
INX ; Increment X so it points to the status
|
||
|
INX ; register value saved on the stack
|
||
|
LDA $100,X ; Load status register contents
|
||
|
AND #$10 ; Isolate B status bit
|
||
|
BNE break ; If B = 1, BRK detected
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; IRQ detected, return
|
||
|
|
||
|
irq: PLA ; Restore accumulator contents
|
||
|
PLX ; Restore X register contents
|
||
|
RTI ; Return from all IRQ interrupts
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; BRK detected, stop
|
||
|
|
||
|
break: JMP _stop ; If BRK is detected, something very bad
|
||
|
; has happened, so stop running
|
||
|
</code></tscreen>
|
||
|
|
||
|
The following discussion explains the purpose of several important
|
||
|
assembler level directives in this file.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.import _stop
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler to import the symbol <tt>_stop</tt>
|
||
|
from another module. This routine will be called if a BRK instruction
|
||
|
is encountered, signaling a software fault.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.export _irq_int, _nmi_int
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler that the symbols <tt>_irq_int</tt> and
|
||
|
<tt>_nmi_int</tt> are to be accessible from other modules. In this
|
||
|
example, the address of these symbols will be placed in the interrupt
|
||
|
vector table.
|
||
|
|
||
|
<tscreen><verb>
|
||
|
.segment "CODE"
|
||
|
</verb></tscreen>
|
||
|
|
||
|
This line instructs the assembler that the code is to be placed in the
|
||
|
CODE segment of memory. Note that because there are 65C02 mnemonics in
|
||
|
the assembly code, the assembler is forced to use the 65C02 instruction
|
||
|
set with the <tt>.PC02</tt> directive.
|
||
|
|
||
|
The final step is to define the interrupt vector memory locations.
|
||
|
Recall that a segment named VECTORS was defined in the memory
|
||
|
configuration file, which started at location $FFFA. The addresses of
|
||
|
the interrupt service routines from "interrupt.s" along with
|
||
|
the address for the initialization code in crt0 are defined in a file
|
||
|
named "vectors.s". Note that these vectors will be placed in
|
||
|
memory in their proper little-endian format as:
|
||
|
|
||
|
<p><tt> $FFFA - $FFFB:</tt> NMI interrupt vector (low byte, high byte)
|
||
|
<p><tt> $FFFC - $FFFD:</tt> Reset vector (low byte, high byte)
|
||
|
<p><tt> $FFFE - $FFFF:</tt> IRQ/BRK interrupt vector (low byte, high byte)
|
||
|
|
||
|
using the <tt>.addr</tt> assembler directive. The contents of the file are:
|
||
|
|
||
|
<tscreen><code>
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; vectors.s
|
||
|
; ---------------------------------------------------------------------------
|
||
|
;
|
||
|
; Defines the interrupt vector table.
|
||
|
|
||
|
.import _init
|
||
|
.import _nmi_int, _irq_int
|
||
|
|
||
|
.segment "VECTORS"
|
||
|
|
||
|
.addr _nmi_int ; NMI vector
|
||
|
.addr _init ; Reset vector
|
||
|
.addr _irq_int ; IRQ/BRK vector
|
||
|
</code></tscreen>
|
||
|
|
||
|
The cc65 toolset will replace the address symbols defined here with the
|
||
|
actual addresses of the routines during the link process.
|
||
|
|
||
|
<sect>Adding Custom Instructions<p>
|
||
|
|
||
|
The cc65 instruction set only supports the WAI (Wait for Interrupt) and
|
||
|
STP (Stop) instructions when used with the 65816 CPU (accessed via the
|
||
|
--cpu command line option of the ca65 macro assembler). The 65C02 core
|
||
|
used in this example supports these two instructions, and in fact the
|
||
|
system benefits from the use of both the WAI and STP instructions.
|
||
|
|
||
|
In order to use the WAI instruction in this case, a C routine named
|
||
|
"wait" was created that consists of the WAI opcode followed by
|
||
|
a subroutine return. It was convenient in this example to put the IRQ
|
||
|
interrupt enable in this subroutine as well, since interrupts should
|
||
|
only be enabled when the code is in this wait condition.
|
||
|
|
||
|
For both the WAI and STP instructions, the assembler is
|
||
|
"fooled" into placing those opcodes into memory by inserting a
|
||
|
single byte of data that just happens to be the opcode for those
|
||
|
instructions. The assembly code routines are placed in a file, named
|
||
|
"wait.s", which is shown below:
|
||
|
|
||
|
<tscreen><code>
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; wait.s
|
||
|
; ---------------------------------------------------------------------------
|
||
|
;
|
||
|
; Wait for interrupt and return
|
||
|
|
||
|
.export _wait, _stop
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Wait for interrupt: Forces the assembler to emit a WAI opcode ($CB)
|
||
|
; ---------------------------------------------------------------------------
|
||
|
|
||
|
.segment "CODE"
|
||
|
|
||
|
.proc _wait: near
|
||
|
|
||
|
CLI ; Enable interrupts
|
||
|
.byte $CB ; Inserts a WAI opcode
|
||
|
RTS ; Return to caller
|
||
|
|
||
|
.endproc
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Stop: Forces the assembler to emit a STP opcode ($DB)
|
||
|
; ---------------------------------------------------------------------------
|
||
|
|
||
|
.proc _stop: near
|
||
|
|
||
|
.byte $DB ; Inserts a STP opcode
|
||
|
|
||
|
.endproc
|
||
|
</code></tscreen>
|
||
|
|
||
|
The label <tt>_wait</tt>, when exported, can be called by using the
|
||
|
<tt>wait ()</tt> subroutine call in C. The section is marked as
|
||
|
code so that it will be stored in read-only memory, and the procedure is
|
||
|
tagged for 16-bit absolute addressing via the "near"
|
||
|
modifier. Similarly, the <tt>_stop</tt> routine can be called from
|
||
|
within the C-level code via a call to <tt>stop ()</tt>. In
|
||
|
addition, the routine can be called from assembly code by calling
|
||
|
<tt>_stop</tt> (as was done in the interrupt service routine).
|
||
|
|
||
|
<sect>Hardware Drivers<p>
|
||
|
|
||
|
Oftentimes, it can be advantageous to create small application helpers
|
||
|
in assembly language to decrease codespace and increase execution speed
|
||
|
of the overall program. An example of this would be the transfer of
|
||
|
characters to a FIFO (<bf>F</bf>irst-<bf>I</bf>n,
|
||
|
<bf>F</bf>irst-<bf>O</bf>ut) storage buffer for transmission over a
|
||
|
serial port. This simple action could be performed by an assembly
|
||
|
language driver which would execute much quicker than coding it in C.
|
||
|
The following discussion outlines a method of interfacing a C program
|
||
|
with an assembly language subroutine.
|
||
|
|
||
|
The first step in creating the assembly language code for the driver is
|
||
|
to determine how to pass the C arguments to the assembly language
|
||
|
routine. The cc65 toolset allows the user to specify whether the data
|
||
|
is passed to a subroutine via the stack or by the processor registers by
|
||
|
using the <tt>__fastcall__</tt> function declaration (note that there
|
||
|
are two underscore characters in front of and two behind the
|
||
|
<tt>fastcall</tt> declaration). When <tt>__fastcall__</tt> is
|
||
|
specified, the rightmost argument in the function call is passed to the
|
||
|
subroutine using the 6502 registers instead of the stack. Note that if
|
||
|
there is only one argument in the function call, the execution overhead
|
||
|
required by the stack interface routines is completely avoided.
|
||
|
|
||
|
Without <tt>__fastcall__</tt>, the argument is loaded in the A and X
|
||
|
registers and then pushed onto the stack via a call to <tt>pushax</tt>.
|
||
|
The first thing the subroutine does is retrieve the argument from the
|
||
|
stack via a call to <tt>ldax0sp</tt>, which copies the values into the A
|
||
|
and X. When the subroutine is finished, the values on the stack must be
|
||
|
popped off and discarded via a jump to <tt>incsp2</tt>, which includes
|
||
|
the RTS subroutine return command. This is shown in the following code
|
||
|
sample.
|
||
|
|
||
|
Calling sequence:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
lda #<(L0001) ; Load A with the high order byte
|
||
|
ldx #>(L0001) ; Load X with the low order byte
|
||
|
jsr pushax ; Push A and X onto the stack
|
||
|
jsr _foo ; Call foo, i.e., foo (arg)
|
||
|
</verb></tscreen>
|
||
|
|
||
|
Subroutine code:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
_foo: jsr ldax0sp ; Retrieve A and X from the stack
|
||
|
sta ptr ; Store A in ptr
|
||
|
stx ptr+1 ; Store X in ptr+1
|
||
|
... ; (more subroutine code goes here)
|
||
|
jmp incsp2 ; Pop A and X from the stack (includes return)
|
||
|
</verb></tscreen>
|
||
|
|
||
|
If <tt>__fastcall__</tt> is specified, the argument is loaded into the A
|
||
|
and X registers as before, but the subroutine is then called
|
||
|
immediately. The subroutine does not need to retrieve the argument
|
||
|
since the value is already available in the A and X registers.
|
||
|
Furthermore, the subroutine can be terminated with an RTS statement
|
||
|
since there is no stack cleanup which needs to be performed. This is
|
||
|
shown in the following code sample.
|
||
|
|
||
|
Calling sequence:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
lda #<(L0001) ; Load A with the high order byte
|
||
|
ldx #>(L0001) ; Load X with the low order byte
|
||
|
jsr _foo ; Call foo, i.e., foo (arg)
|
||
|
</verb></tscreen>
|
||
|
|
||
|
Subroutine code:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
_foo: sta ptr ; Store A in ptr
|
||
|
stx ptr+1 ; Store X in ptr+1
|
||
|
... ; (more subroutine code goes here)
|
||
|
rts ; Return from subroutine
|
||
|
</verb></tscreen>
|
||
|
|
||
|
The hardware driver in this example writes a string of character data to
|
||
|
a hardware FIFO located at memory location $1000. Each character is
|
||
|
read and is compared to the C string termination value ($00), which will
|
||
|
terminate the loop. All other character data is written to the FIFO.
|
||
|
For convenience, a carriage return/line feed sequence is automatically
|
||
|
appended to the serial stream. The driver defines a local pointer
|
||
|
variable which is stored in the zero page memory space in order to allow
|
||
|
for retrieval of each character in the string via the indirect indexed
|
||
|
addressing mode.
|
||
|
|
||
|
The assembly language routine is stored in a file names
|
||
|
"rs232_tx.s" and is shown below:
|
||
|
|
||
|
<tscreen><code>
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; rs232_tx.s
|
||
|
; ---------------------------------------------------------------------------
|
||
|
;
|
||
|
; Write a string to the transmit UART FIFO
|
||
|
|
||
|
.export _rs232_tx
|
||
|
.exportzp _rs232_data: near
|
||
|
|
||
|
.define TX_FIFO $1000 ; Transmit FIFO memory location
|
||
|
|
||
|
.zeropage
|
||
|
|
||
|
_rs232_data: .res 2, $00 ; Reserve a local zero page pointer
|
||
|
|
||
|
.segment "CODE"
|
||
|
|
||
|
.proc _rs232_tx: near
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Store pointer to zero page memory and load first character
|
||
|
|
||
|
sta _rs232_data ; Set zero page pointer to string address
|
||
|
stx _rs232_data+1 ; (pointer passed in via the A/X registers)
|
||
|
ldy #00 ; Initialize Y to 0
|
||
|
lda (_rs232_data) ; Load first character
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Main loop: read data and store to FIFO until \0 is encountered
|
||
|
|
||
|
loop: sta TX_FIFO ; Loop: Store character in FIFO
|
||
|
iny ; Increment Y index
|
||
|
lda (_rs232_data),y ; Get next character
|
||
|
bne loop ; If character == 0, exit loop
|
||
|
|
||
|
; ---------------------------------------------------------------------------
|
||
|
; Append CR/LF to output stream and return
|
||
|
|
||
|
lda #$0D ; Store CR
|
||
|
sta TX_FIFO
|
||
|
lda #$0A ; Store LF
|
||
|
sta TX_FIFO
|
||
|
rts ; Return
|
||
|
|
||
|
.endproc
|
||
|
</code></tscreen>
|
||
|
|
||
|
<sect>Hello World! Example<p>
|
||
|
|
||
|
The following short example demonstrates programming in C using the cc65
|
||
|
toolset with a custom run-time environment. In this example, a Xilinx
|
||
|
FPGA contains a UART which is connected to a 65c02 processor with FIFO
|
||
|
(<bf>F</bf>irst-<bf>I</bf>n, <bf>F</bf>irst-<bf>O</bf>ut) storage to
|
||
|
buffer the data. The C program will wait for an interrupt generated by
|
||
|
the receive UART and then respond by transmitting the string "Hello
|
||
|
World! " every time a question mark character is received via a
|
||
|
call to the hardware driver <tt>rs232_tx ()</tt>. The driver
|
||
|
prototype uses the <tt>__fastcall__</tt> extension to indicate that the
|
||
|
driver does not use the stack. The FIFO data interface is at address
|
||
|
$1000 and is defined as the symbolic constant <tt>FIFO_DATA</tt>.
|
||
|
Writing to <tt>FIFO_DATA</tt> transfers a byte of data into the transmit
|
||
|
FIFO for subsequent transmission over the serial interface. Reading
|
||
|
from <tt>FIFO_DATA</tt> transfers a byte of previously received data out
|
||
|
of the receive FIFO. The FIFO status data is at address $1001 and is
|
||
|
defined as the symbolic constant <tt>FIFO_STATUS</tt>. For convenience,
|
||
|
the symbolic constants <tt>TX_FIFO_FULL</tt> (which isolates bit 0 of
|
||
|
the register) and <tt>RX_FIFO_EMPTY</tt> (which isolates bit 1 of the
|
||
|
register) have been defined to read the FIFO status.
|
||
|
|
||
|
The following C code is saved in the file "main.c". As this
|
||
|
example demonstrates, the run-time environment has been set up such that
|
||
|
all of the behind-the-scene work is transparent to the user.
|
||
|
|
||
|
<tscreen><code>
|
||
|
#define FIFO_DATA (*(unsigned char *) 0x1000)
|
||
|
#define FIFO_STATUS (*(unsigned char *) 0x1001)
|
||
|
|
||
|
#define TX_FIFO_FULL (FIFO_STATUS & 0x01)
|
||
|
#define RX_FIFO_EMPTY (FIFO_STATUS & 0x02)
|
||
|
|
||
|
extern void wait ();
|
||
|
extern void __fastcall__ rs232_tx (char *str);
|
||
|
|
||
|
int main () {
|
||
|
while (1) { // Run forever
|
||
|
wait (); // Wait for an RX FIFO interrupt
|
||
|
|
||
|
while (RX_FIFO_EMPTY == 0) { // While the RX FIFO is not empty
|
||
|
if (FIFO_DATA == '?') { // Does the RX character = '?'
|
||
|
rs232_tx ("Hello World!"); // Transmit "Hello World!"
|
||
|
} // Discard any other RX characters
|
||
|
}
|
||
|
}
|
||
|
|
||
|
return (0); // We should never get here!
|
||
|
}
|
||
|
</code></tscreen>
|
||
|
|
||
|
<sect>Putting It All Together<p>
|
||
|
|
||
|
The following commands will create a ROM image named "a.out"
|
||
|
that can be used as the initialization data for the Xilinx Block RAM
|
||
|
used for code storage:
|
||
|
|
||
|
<tscreen><verb>
|
||
|
$ cc65 -t none -O --cpu 65sc02 main.c
|
||
|
$ ca65 --cpu 65sc02 main.s
|
||
|
$ ca65 --cpu 65sc02 rs232_tx.s
|
||
|
$ ca65 --cpu 65sc02 interrupt.s
|
||
|
$ ca65 --cpu 65sc02 vectors.s
|
||
|
$ ca65 --cpu 65sc02 wait.s
|
||
|
$ ld65 -C sbc.cfg -m main.map interrupt.o vectors.o wait.o rs232_tx.o
|
||
|
main.o sbc.lib
|
||
|
</verb></tscreen>
|
||
|
|
||
|
During the C-level code compilation phase (<tt>cc65</tt>), assumptions
|
||
|
about the target system are disabled via the <tt>-t none</tt> command
|
||
|
line option. During the object module linker phase (<tt>ld65</tt>), the
|
||
|
target customization is enabled via inclusion of the <tt>sbc.lib</tt>
|
||
|
file and the selection of the configuration file via the <tt>-C
|
||
|
sbc.cfg</tt> command line option.
|
||
|
|
||
|
The 65C02 core used most closely matches the cc65 toolset processor
|
||
|
named 65SC02 (the 65C02 extensions without the bit manipulation
|
||
|
instructions), so all the commands specify the use of that processor via
|
||
|
the <tt>--cpu 65sc02</tt> option.
|
||
|
|
||
|
</article>
|