[C= commodore 64]
|
|
|
|
---------------------------------------------------------------------------
|
|
|
|
64doc
|
|
|
|
# $Id: 64doc,v 1.8 1994/06/03 19:50:04 jopi Exp $
|
|
#
|
|
# This file is part of Commodore 64 emulator
|
|
# and Program Development System.
|
|
#
|
|
# See README for copyright notice
|
|
#
|
|
# This file contains documentation for 6502/6510/8500/8502 instruction set.
|
|
#
|
|
#
|
|
# Written by
|
|
# John West (john@ucc.gu.uwa.edu.au)
|
|
# Marko Mäkelä (Marko.Makela@HUT.FI)
|
|
#
|
|
#
|
|
# $Log: 64doc,v $
|
|
# Revision 1.8 1994/06/03 19:50:04 jopi
|
|
# Patchlevel 2
|
|
#
|
|
# Revision 1.7 1994/04/15 13:07:04 jopi
|
|
# 65xx Register descriptions added
|
|
#
|
|
# Revision 1.6 1994/02/18 16:09:36 jopi
|
|
#
|
|
# Revision 1.5 1994/01/26 16:08:37 jopi
|
|
# X64 version 0.2 PL 1
|
|
#
|
|
# Revision 1.4 1993/11/10 01:55:34 jopi
|
|
#
|
|
# Revision 1.3 93/06/21 13:37:18 jopi
|
|
# X64 version 0.2 PL 0
|
|
#
|
|
# Revision 1.2 93/06/21 13:07:15 jopi
|
|
# *** empty log message ***
|
|
#
|
|
#
|
|
|
|
Note: To extract the uuencoded ML programs in this article most
|
|
easily you may use e.g. "uud" by Edwin Kremer,
|
|
which extracts them all at once.
|
|
|
|
Documentation for the NMOS 65xx/85xx Instruction Set
|
|
|
|
6510 Instructions by Addressing Modes
|
|
6502 Registers
|
|
6510/8502 Undocumented Commands
|
|
Register selection for load and store
|
|
Decimal mode in NMOS 6500 series
|
|
6510 features
|
|
Different CPU types
|
|
6510 Instruction Timing
|
|
How Real Programmers Acknowledge Interrupts
|
|
Memory Management
|
|
Autostart Code
|
|
Notes
|
|
References
|
|
|
|
6510 Instructions by Addressing Modes
|
|
|
|
off- ++++++++++ Positive ++++++++++  ---------- Negative ----------
set  00      20      40      60      80      a0      c0      e0      mode

+00  BRK     JSR     RTI     RTS     NOP*    LDY     CPY     CPX     Impl/immed
+01  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     (indir,x)
+02   t       t       t       t      NOP*t   LDX     NOP*t   NOP*t     ? /immed
+03  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    (indir,x)
+04  NOP*    BIT     NOP*    NOP*    STY     LDY     CPY     CPX     Zeropage
+05  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Zeropage
+06  ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     Zeropage
+07  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    Zeropage

+08  PHP     PLP     PHA     PLA     DEY     TAY     INY     INX     Implied
+09  ORA     AND     EOR     ADC     NOP*    LDA     CMP     SBC     Immediate
+0a  ASL     ROL     LSR     ROR     TXA     TAX     DEX     NOP     Accu/impl
+0b  ANC**   ANC**   ASR**   ARR**   ANE**   LXA**   SBX**   SBC*    Immediate
+0c  NOP*    BIT     JMP     JMP ()  STY     LDY     CPY     CPX     Absolute
+0d  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute
+0e  ASL     ROL     LSR     ROR     STX     LDX     DEC     INC     Absolute
+0f  SLO*    RLA*    SRE*    RRA*    SAX*    LAX*    DCP*    ISB*    Absolute

+10  BPL     BMI     BVC     BVS     BCC     BCS     BNE     BEQ     Relative
+11  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     (indir),y
+12   t       t       t       t       t       t       t       t         ?
+13  SLO*    RLA*    SRE*    RRA*    SHA**   LAX*    DCP*    ISB*    (indir),y
+14  NOP*    NOP*    NOP*    NOP*    STY     LDY     NOP*    NOP*    Zeropage,x
+15  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Zeropage,x
+16  ASL     ROL     LSR     ROR     STX  y) LDX  y) DEC     INC     Zeropage,x
+17  SLO*    RLA*    SRE*    RRA*    SAX* y) LAX* y) DCP*    ISB*    Zeropage,x

+18  CLC     SEC     CLI     SEI     TYA     CLV     CLD     SED     Implied
+19  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute,y
+1a  NOP*    NOP*    NOP*    NOP*    TXS     TSX     NOP*    NOP*    Implied
+1b  SLO*    RLA*    SRE*    RRA*    SHS**   LAS**   DCP*    ISB*    Absolute,y
+1c  NOP*    NOP*    NOP*    NOP*    SHY**   LDY     NOP*    NOP*    Absolute,x
+1d  ORA     AND     EOR     ADC     STA     LDA     CMP     SBC     Absolute,x
+1e  ASL     ROL     LSR     ROR     SHX**y) LDX  y) DEC     INC     Absolute,x
+1f  SLO*    RLA*    SRE*    RRA*    SHA**y) LAX* y) DCP*    ISB*    Absolute,x
|
|
|
|
The ROR instruction is available on MC650x microprocessors after
|
|
June, 1976.
|
|
|
|
Legend:
|
|
|
|
t Jams the machine
|
|
*t Jams very rarely
|
|
* Undocumented command
|
|
** Unusual operation
|
|
y) indexed using Y instead of X
|
|
() indirect instead of absolute
|
|
|
|
Note that the NOP instructions do have other addressing modes
|
|
than the implied addressing. The NOP instruction is just like
|
|
any other load instruction, except it does not store the
|
|
result anywhere, nor does it affect the flags.
|
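As a small illustration of how the table above is indexed, an opcode can be split into its row and column labels with two masks. This is only a sketch; the function names are made up for this example and are not taken from any emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Row label of the table: the low five bits of the opcode ($00..$1F). */
uint8_t opcode_row(uint8_t opcode)
{
    return opcode & 0x1F;
}

/* Column label of the table: the top three bits ($00, $20, ... $E0). */
uint8_t opcode_column(uint8_t opcode)
{
    return opcode & 0xE0;
}

/* Usage: $CB is SBX (row +0b, column c0); $EA is NOP (row +0a, column e0). */
void opcode_table_demo(void)
{
    assert(opcode_row(0xCB) == 0x0B && opcode_column(0xCB) == 0xC0);
    assert(opcode_row(0xEA) == 0x0A && opcode_column(0xEA) == 0xE0);
}
```

Rows that differ only in the column usually share an addressing mode, which is why the mode is listed once per row.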
|
|
|
6502 Registers
|
|
|
|
The NMOS 65xx processors are not ruined with too many registers. In
|
|
addition to that, the registers are mostly 8-bit. Here is a brief
|
|
description of each register:
|
|
|
|
PC Program Counter
|
|
|
|
This register points to the address from which the next
|
|
instruction byte (opcode or parameter) will be fetched.
|
|
Unlike other registers, this one is 16 bits in length. The
|
|
low and high 8-bit halves of the register are called PCL
|
|
and PCH, respectively.
|
|
|
|
The Program Counter may be read by pushing its value on
|
|
the stack. This can be done either by jumping to a
|
|
subroutine or by causing an interrupt.
|
|
|
|
S Stack pointer
|
|
|
|
The NMOS 65xx processors have 256 bytes of stack memory,
|
|
ranging from $0100 to $01FF. The S register is an 8-bit
|
|
offset to the stack page. In other words, whenever
|
|
anything is being pushed on the stack, it will be stored
|
|
to the address $0100+S.
|
|
|
|
The Stack pointer can be read and written by transferring
|
|
its value to or from the index register X (see below) with
|
|
the TSX and TXS instructions.
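The push and pull rules can be sketched in C as follows. The Stack type and the function names are hypothetical; the pull order (increment S, then read) is taken from the PLA/PLP timing table later in this document.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t s;          /* stack pointer: offset into page $01 */
    uint8_t page1[256]; /* memory $0100..$01FF */
} Stack;

void push(Stack *st, uint8_t value)
{
    st->page1[st->s] = value;   /* store at $0100+S */
    st->s--;                    /* then decrement; $00 wraps to $FF */
}

uint8_t pull(Stack *st)
{
    st->s++;                    /* increment first; $FF wraps to $00 */
    return st->page1[st->s];
}

/* TXS and TSX move the stack pointer to and from X. */
void txs(Stack *st, uint8_t x) { st->s = x; }
uint8_t tsx(const Stack *st)   { return st->s; }

void stack_demo(void)
{
    Stack st = { 0 };
    txs(&st, 0xFF);
    push(&st, 0x42);
    assert(st.page1[0xFF] == 0x42 && tsx(&st) == 0xFE);
    assert(pull(&st) == 0x42 && tsx(&st) == 0xFF);
}
```

Because S is only 8 bits wide, the stack silently wraps around inside page $01 instead of overflowing into other pages.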
|
|
|
|
P Processor status
|
|
|
|
This 8-bit register stores the state of the processor. The
|
|
bits in this register are called flags. Most of the flags
|
|
have something to do with arithmetic operations.
|
|
|
|
The P register can be read by pushing it on the stack
|
|
(with PHP or by causing an interrupt). If you only need to
|
|
read one flag, you can use the branch instructions.
|
|
Setting the flags is possible by pulling the P register
|
|
from stack or by using the flag set or clear instructions.
|
|
|
|
Following is a list of the flags, starting from the 8th
|
|
bit of the P register (bit 7, value $80):
|
|
|
|
N Negative flag
|
|
|
|
This flag will be set after any arithmetic operations
|
|
(when any of the registers A, X or Y is being loaded
|
|
with a value). Generally, the N flag will be copied
|
|
from the topmost bit of the register being loaded.
|
|
|
|
Note that TXS (Transfer X to S) is not an arithmetic
|
|
operation. Also note that the BIT instruction affects
|
|
the Negative flag just like arithmetic operations.
|
|
Finally, the Negative flag behaves differently in
|
|
Decimal operations (see description below).
|
|
|
|
V oVerflow flag
|
|
|
|
Like the Negative flag, this flag is intended to be
|
|
used with 8-bit signed integer numbers. The flag will
|
|
be affected by addition and subtraction, the
|
|
instructions PLP, CLV and BIT, and the hardware signal
|
|
-SO. Note that there is no SEV instruction, even though
|
|
the MOS engineers loved to use East European abbreviations,
|
|
like DDR (Deutsche Demokratische Republik vs. Data
|
|
Direction Register). (The Russian abbreviation for their
|
|
former trade association COMECON is SEV.) The -SO
|
|
(Set Overflow) signal is available on some processors,
|
|
at least the 6502, to set the V flag. This enables
|
|
response to an I/O activity in at most
|
|
three clock cycles when using a BVC instruction branching
|
|
to itself ($50 $FE).
|
|
|
|
The CLV instruction clears the V flag, and the PLP and
|
|
BIT instructions copy the flag value from the bit 6 of
|
|
the topmost stack entry or from memory.
|
|
|
|
After a binary addition or subtraction, the V flag
|
|
will be set on a sign overflow, cleared otherwise.
|
|
What is a sign overflow? For instance, if you are
|
|
trying to add 123 and 45 together, the result (168)
|
|
does not fit in an 8-bit signed integer (upper limit
|
|
127 and lower limit -128). Similarly, adding -123 to
|
|
-45 causes the overflow, just like subtracting -45
|
|
from 123 or 123 from -45 would do.
|
|
|
|
Like the N flag, the V flag will not be set as
|
|
expected in the Decimal mode. A precise description of the
operation appears later in this document.
|
|
|
|
A common misbelief is that the V flag could only be
|
|
set by arithmetic operations, not cleared.
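A minimal sketch of the binary-mode V flag rule, assuming the usual formulation that a sign overflow occurs when both operands have the same sign and the result has the other one. The function names are illustrative only, and treating SBC as an ADC of the inverted operand is a well-known identity for binary mode rather than something stated in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Binary-mode ADC: returns the 8-bit sum and reports C and V. */
uint8_t adc_binary(uint8_t a, uint8_t m, int carry_in, int *c, int *v)
{
    unsigned sum = (unsigned)a + m + (carry_in ? 1u : 0u);
    uint8_t r = (uint8_t)sum;

    *c = sum > 0xFF;
    /* Operands agree in sign (bit 7) but the result does not. */
    *v = ((~(a ^ m) & (a ^ r)) & 0x80) != 0;
    return r;
}

/* Binary-mode SBC: A + ~M + C, so Carry set means "no borrow". */
uint8_t sbc_binary(uint8_t a, uint8_t m, int carry_in, int *c, int *v)
{
    return adc_binary(a, (uint8_t)~m, carry_in, c, v);
}

/* The examples from the text: 123 + 45 and 123 - (-45) both overflow. */
void overflow_demo(void)
{
    int c, v;

    adc_binary(123, 45, 0, &c, &v);
    assert(v == 1 && c == 0);
    sbc_binary(123, (uint8_t)-45, 1, &c, &v);
    assert(v == 1);
}
```

Note that both functions clear V when no overflow happens, which is the point of the remark about the common misbelief.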
|
|
|
|
1 unused flag
|
|
|
|
As far as is currently known, this flag is always 1.
|
|
|
|
B Break flag
|
|
|
|
This flag is used to distinguish software (BRK)
|
|
interrupts from hardware interrupts (IRQ or NMI). The
|
|
B flag is always set except when the P register is
|
|
being pushed on stack when jumping to an interrupt
|
|
routine to process only a hardware interrupt.
|
|
|
|
The official NMOS 65xx documentation claims that the
|
|
BRK instruction could only cause a jump to the IRQ
|
|
vector ($FFFE). However, if an NMI interrupt occurs
|
|
while executing a BRK instruction, the processor will
|
|
jump to the NMI vector ($FFFA), and the P register
|
|
will be pushed on the stack with the B flag set.
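The following sketch shows how the stacked copy of P could be formed and tested, assuming the bit layout listed in this section (B is bit 4, value $10; the unused flag is bit 5, value $20). The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_B = 0x10, FLAG_UNUSED = 0x20 };

/* Value of P as written to the stack.
   from_software: nonzero for PHP and BRK, zero for a pure IRQ or NMI. */
uint8_t pushed_status(uint8_t p, int from_software)
{
    uint8_t pushed = (uint8_t)(p | FLAG_UNUSED);  /* unused bit reads as 1 */

    if (from_software)
        pushed |= FLAG_B;
    else
        pushed &= (uint8_t)~FLAG_B;
    return pushed;
}

/* An interrupt routine tells BRK from a hardware interrupt
   by looking at the stacked copy, not at the live P register. */
int was_brk(uint8_t stacked_p)
{
    return (stacked_p & FLAG_B) != 0;
}

void break_flag_demo(void)
{
    assert(was_brk(pushed_status(0x00, 1)));
    assert(!was_brk(pushed_status(0xFF, 0)));
}
```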
|
|
|
|
D Decimal mode flag
|
|
|
|
This flag is used to select the (Binary Coded) Decimal
|
|
mode for addition and subtraction. In most
|
|
applications, the flag is zero.
|
|
|
|
The Decimal mode has many oddities, and it operates
|
|
differently on CMOS processors. See the description
|
|
of the ADC, SBC and ARR instructions below.
|
|
|
|
I Interrupt disable flag
|
|
|
|
This flag can be used to prevent the processor from
|
|
jumping to the IRQ handler vector ($FFFE) whenever the
|
|
hardware line -IRQ is active. The flag will be
|
|
automatically set after taking an interrupt, so that
|
|
the processor would not keep jumping to the interrupt
|
|
routine if the -IRQ signal remains low for several
|
|
clock cycles.
|
|
|
|
Z Zero flag
|
|
|
|
The Zero flag will be affected in the same cases as
|
|
the Negative flag. Generally, it will be set if an
|
|
arithmetic register is being loaded with the value
|
|
zero, and cleared otherwise. The flag will behave
|
|
differently in Decimal operations.
|
|
|
|
C Carry flag
|
|
|
|
This flag is used in additions, subtractions,
|
|
comparisons and bit rotations. In additions and
|
|
subtractions, it acts as a 9th bit and lets you
|
|
chain operations to calculate with bigger than 8-bit
|
|
numbers. When subtracting, the Carry flag is the
|
|
inverse of Borrow: if a borrow occurs, the flag
|
|
will be clear, otherwise set. Comparisons are a
|
|
special case of subtraction: they assume Carry flag
|
|
set and Decimal flag clear, and do not store the
|
|
result of the subtraction anywhere.
|
|
|
|
There are four kinds of bit rotations. All of them
|
|
store the bit that is being rotated off to the Carry
|
|
flag. The left shifting instructions are ROL and ASL.
|
|
ROL copies the initial Carry flag to the lowest bit
|
|
of the byte; ASL always clears it. Similarly, the ROR
|
|
and LSR instructions shift to the right.
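The four shift and rotate operations, and the way the Carry flag chains them across bytes, can be sketched like this. The function names mirror the mnemonics but the code is only an illustration; the Carry flag is passed as an int that must be 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Each function returns the shifted byte; *c is the Carry flag (0 or 1). */
uint8_t asl(uint8_t v, int *c)
{
    *c = v >> 7;
    return (uint8_t)(v << 1);               /* bit 0 is always cleared */
}

uint8_t rol(uint8_t v, int *c)
{
    int in = *c;
    *c = v >> 7;
    return (uint8_t)((v << 1) | in);        /* old Carry enters bit 0 */
}

uint8_t lsr(uint8_t v, int *c)
{
    *c = v & 1;
    return (uint8_t)(v >> 1);               /* bit 7 is always cleared */
}

uint8_t ror(uint8_t v, int *c)
{
    int in = *c;
    *c = v & 1;
    return (uint8_t)((v >> 1) | (in << 7)); /* old Carry enters bit 7 */
}

/* Chaining: shift the 16-bit value hi:lo left by one bit. */
void shift_left_16(uint8_t *lo, uint8_t *hi, int *c)
{
    *lo = asl(*lo, c);
    *hi = rol(*hi, c);
}

void shift_demo(void)
{
    uint8_t lo = 0x80, hi = 0x01;
    int c = 0;

    shift_left_16(&lo, &hi, &c);            /* $0180 << 1 = $0300 */
    assert(lo == 0x00 && hi == 0x03 && c == 0);
}
```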
|
|
|
|
A Accumulator
|
|
|
|
The accumulator is the main register for arithmetic and
|
|
logic operations. Unlike the index registers X and Y, it
|
|
has a direct connection to the Arithmetic and Logic Unit
|
|
(ALU). This is why many operations are only available for
|
|
the accumulator, not the index registers.
|
|
|
|
X Index register X
|
|
|
|
This is the main register for addressing data with
|
|
indices. It has a special addressing mode, indexed
|
|
indirect, which lets you have a vector table on the
|
|
zero page.
|
|
|
|
Y Index register Y
|
|
|
|
The Y register has the least operations available. On the
|
|
other hand, only it has the indirect indexed addressing
|
|
mode that enables access to any memory place without
|
|
having to use self-modifying code.
|
|
|
|
6510/8502 Undocumented Commands
|
|
|
|
-- A brief explanation about what may happen while
|
|
using don't care states.
|
|
|
|
ANE $8B A = (A | #$EE) & X & #byte
|
|
same as
|
|
A = ((A & #$11 & X) | ( #$EE & X)) & #byte
|
|
|
|
In real 6510/8502 the internal parameter #$11
|
|
may occasionally be #$10, #$01 or even #$00.
|
|
This occurs when the video chip starts DMA
|
|
between the opcode fetch and the parameter fetch
|
|
of the instruction. The value probably depends
|
|
on the data that was left on the bus by the VIC-II.
|
|
|
|
LXA $AB C=Lehti: A = X = ANE
|
|
Alternate: A = X = (A & #byte)
|
|
|
|
TXA and TAX have to be responsible for these.
|
|
|
|
SHA $93,$9F Store (A & X & (ADDR_HI + 1))
|
|
SHX $9E Store (X & (ADDR_HI + 1))
|
|
SHY $9C Store (Y & (ADDR_HI + 1))
|
|
SHS $9B SHA and TXS, where X is replaced by (A & X).
|
|
|
|
Note: The value to be stored is copied also
|
|
to ADDR_HI if page boundary is crossed.
|
|
|
|
SBX $CB Carry and Decimal flags are ignored but the
|
|
Carry flag will be set in subtraction. This
|
|
is due to the CMP command, which is executed
|
|
instead of the real SBC.
|
|
|
|
ARR $6B This instruction first performs an AND
|
|
between the accumulator and the immediate
|
|
parameter, then it shifts the accumulator to
|
|
the right. However, this is not the whole
|
|
truth. See the description below.
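The two ANE formulas given above can be compared with a short C sketch. The parameter named internal stands for the unstable value that the text writes as #$11; the function names are hypothetical, and nothing here models when or why the value changes.

```c
#include <assert.h>
#include <stdint.h>

/* Expanded form: A = ((A & internal & X) | ($EE & X)) & byte.
   internal is normally $11 but may be $10, $01 or $00. */
uint8_t ane(uint8_t a, uint8_t x, uint8_t imm, uint8_t internal)
{
    return (uint8_t)(((a & internal & x) | (0xEE & x)) & imm);
}

/* Short form: A = (A | $EE) & X & byte. */
uint8_t ane_simple(uint8_t a, uint8_t x, uint8_t imm)
{
    return (uint8_t)((a | 0xEE) & x & imm);
}

void ane_demo(void)
{
    /* With internal = $11 both forms agree. */
    assert(ane(0x01, 0xFF, 0xFF, 0x11) == ane_simple(0x01, 0xFF, 0xFF));
    /* With internal = $10, bit 0 of A no longer gets through. */
    assert(ane(0x01, 0xFF, 0xFF, 0x10) == 0xEE);
}
```

The forms agree for internal = $11 because $11 is the complement of $EE: the bits of A that matter are exactly those that $EE does not already force to 1.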
|
|
|
|
Many undocumented commands do not use AND between registers, the CPU
|
|
just throws the bytes to a bus simultaneously and lets the
|
|
open-collector drivers perform the AND. E.g. the command called 'SAX',
which is in the STORE section (opcodes $80...$9F), stores the result
of (A & X) this way.
|
|
|
|
More fortunate is its opposite, 'LAX' which just loads a byte
|
|
simultaneously into both A and X.
|
|
|
|
$6B ARR
|
|
|
|
This instruction seems to be a harmless combination of AND and ROR at
|
|
first sight, but it turns out that it affects the V flag and also has
|
|
a special kind of decimal mode. This is because the instruction has
|
|
inherited some properties of the ADC instruction ($69) in addition to
|
|
the ROR ($6A).
|
|
|
|
In Binary mode (D flag clear), the instruction effectively does an AND
|
|
between the accumulator and the immediate parameter, and then shifts
|
|
the accumulator to the right, copying the C flag to the 8th bit. It
|
|
sets the Negative and Zero flags just like the ROR would. The ADC code
|
|
shows up in the Carry and oVerflow flags. The C flag will be copied
|
|
from the bit 6 of the result (which doesn't seem too logical), and the
|
|
V flag is the result of an Exclusive OR operation between the bit 6
|
|
and the bit 5 of the result. This makes sense, since the V flag will
|
|
be normally set by an Exclusive OR, too.
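The Binary mode behaviour described above can be written down as a short C function. This is a sketch of the description only; the ArrResult type and the function name are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t a;      /* new accumulator value */
    int c, v, n, z; /* resulting flags, each 0 or 1 */
} ArrResult;

/* ARR with the D flag clear. */
ArrResult arr_binary(uint8_t a, uint8_t imm, int carry_in)
{
    ArrResult r;
    uint8_t t = a & imm;                         /* the AND */

    r.a = (uint8_t)((t >> 1) | (carry_in ? 0x80 : 0x00)); /* the ROR */
    r.n = (r.a & 0x80) != 0;                     /* N and Z as ROR sets them */
    r.z = (r.a == 0);
    r.c = (r.a >> 6) & 1;                        /* C from bit 6 of the result */
    r.v = ((r.a >> 6) ^ (r.a >> 5)) & 1;         /* V = bit 6 XOR bit 5 */
    return r;
}

void arr_demo(void)
{
    ArrResult r = arr_binary(0xFF, 0xFF, 1);

    assert(r.a == 0xFF && r.c == 1 && r.v == 0 && r.n == 1 && r.z == 0);
}
```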
|
|
|
|
In Decimal mode (D flag set), the ARR instruction first performs the
|
|
AND and ROR, just like in Binary mode. The N flag will be copied from
|
|
the initial C flag, and the Z flag will be set according to the ROR
|
|
result, as expected. The V flag will be set if the bit 6 of the
|
|
accumulator changed its state between the AND and the ROR, cleared
|
|
otherwise.
|
|
|
|
Now comes the funny part. If the low nybble of the AND result,
|
|
incremented by its lowest bit, is greater than 5, the low nybble in
|
|
the ROR result will be incremented by 6. The low nybble may overflow
|
|
as a consequence of this BCD fixup, but the high nybble won't be
|
|
adjusted. The high nybble will be BCD fixed in a similar way. If the
|
|
high nybble of the AND result, incremented by its lowest bit, is
|
|
greater than 5, the high nybble in the ROR result will be incremented
|
|
by 6, and the Carry flag will be set. Otherwise the C flag will be
|
|
cleared.
|
|
|
|
To help you understand this description, here is a C routine that
|
|
illustrates the ARR operation in Decimal mode:
|
|
|
|
unsigned
   A,  /* Accumulator */
   AL, /* low nybble of accumulator */
   AH, /* high nybble of accumulator */

   C,  /* Carry flag */
   Z,  /* Zero flag */
   V,  /* oVerflow flag */
   N,  /* Negative flag */

   t,  /* temporary value */
   s;  /* value to be ARRed with Accumulator */

t = A & s;                        /* Perform the AND. */

AH = t >> 4;                      /* Separate the high */
AL = t & 15;                      /* and low nybbles. */

N = C;                            /* Set the N and */
Z = !(A = (t >> 1) | (C << 7));   /* Z flags traditionally */
V = (t ^ A) & 64;                 /* and V flag in a weird way. */

if (AL + (AL & 1) > 5)            /* BCD "fixup" for low nybble. */
  A = (A & 0xF0) | ((A + 6) & 0xF);

if ((C = (AH + (AH & 1) > 5)))    /* Set the Carry flag. */
  A = (A + 0x60) & 0xFF;          /* BCD "fixup" for high nybble. */
|
|
|
|
$CB SBX X <- (A & X) - Immediate
|
|
|
|
The 'SBX' ($CB) may seem to be a very complex operation, even though it
|
|
is a combination of the subtraction of accumulator and parameter, as
|
|
in the 'CMP' instruction, and the command 'DEX'. As a result, both A
|
|
and X are connected to ALU but only the subtraction takes place. Since
|
|
the comparison logic was used, the result of subtraction should be
|
|
normally ignored, but the 'DEX' now happily stores to X the value of
|
|
(A & X) - Immediate. That is why this instruction does not have any
|
|
decimal mode, and it does not affect the V flag. Also, the Carry flag will
|
|
be ignored in the subtraction but set according to the result.
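In C, the SBX description above amounts to the following sketch. The type and function names are hypothetical. The Carry rule follows the text; setting N and Z from the result the way CMP does is an assumption based on the statement that the comparison logic is used.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t x;   /* new value of the X register */
    int c, n, z; /* resulting flags, each 0 or 1 */
} SbxResult;

/* X <- (A & X) - Immediate; incoming Carry and Decimal flags are ignored. */
SbxResult sbx(uint8_t a, uint8_t x, uint8_t imm)
{
    SbxResult r;
    uint8_t ax = a & x;

    r.x = (uint8_t)(ax - imm);
    r.c = ax >= imm;             /* set when no borrow occurs, as in CMP */
    r.n = (r.x & 0x80) != 0;     /* assumed: N and Z follow the result */
    r.z = (r.x == 0);
    return r;
}

void sbx_demo(void)
{
    SbxResult r = sbx(0xFF, 0x0F, 0x01);

    assert(r.x == 0x0E && r.c == 1 && r.n == 0 && r.z == 0);
}
```

The V flag does not appear in the result at all, matching the claim that SBX leaves it untouched.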
|
|
|
|
Proof:
|
|
|
|
begin 644 vsbx
M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```*D`H#V1*Z`_D2N@09$KJ0>%
M^QBE^VEZJ+$KH#F1*ZD`2"BI`*(`RP`(:-B@.5$K*4#P`E@`H#VQ*SAI`)$K
JD-Z@/[$K:0"1*Y#4J2X@TO\XH$&Q*VD`D2N0Q,;[$+188/_^]_:_OK>V
`
end
|
|
|
|
and
|
|
|
|
begin 644 sbx
M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI`*!-D2N@3Y$KH%&1*ZD#
MA?L8I?M*2)`#J1@LJ3B@29$K:$J0`ZGX+*G8R)$K&/BXJ?2B8\L)AOP(:(7]
MV#B@3;$KH$\Q*Z!1\2L(1?SP`0!H1?TIM]#XH$VQ*SAI`)$KD,N@3[$K:0"1
9*Y#!J2X@TO\XH%&Q*VD`D2N0L<;[$))88-#X
`
end
|
|
|
|
These test programs show if your machine is compatible with ours
|
|
regarding the opcode $CB. The first test, vsbx, proves that SBX does
|
|
not affect the V flag. The latter one, sbx, proves the rest of our
|
|
theory. The vsbx test tests 33554432 SBX combinations (16777216
|
|
different A, X and Immediate combinations, and two different V flag
|
|
states), and the sbx test doubles that amount (16777216*4 D and C flag
|
|
combinations). Both tests have run successfully on a C64 and a Vic20.
|
|
They ought to run on C16, +4 and the PET series as well. The tests
|
|
stop with BRK, if the opcode $CB does not work as expected. Successful
|
|
operation ends in RTS. As the tests are very slow, they print dots on
|
|
the screen while running so that you know that the machine has not
|
|
jammed. On computers running at 1 MHz, the first test prints
|
|
approximately one dot every four seconds and a total of 2048 dots,
|
|
whereas the second one prints half that amount, one dot every seven
|
|
seconds.
|
|
|
|
If the tests fail on your machine, please let us know your processor's
|
|
part number and revision. If possible, save the executable (after it
|
|
has stopped with BRK) under another name and send it to us so that we
|
|
know at which stage the program stopped.
|
|
|
|
The following program is a Commodore 64 executable that Marko Mäkelä
|
|
developed when trying to find out how the V flag is affected by SBX.
|
|
(It was believed that the SBX affects the flag in a weird way, and
|
|
this program shows how SBX sets the flag differently from SBC.) You
|
|
may find the subroutine at $C150 useful when researching other
|
|
undocumented instructions' flags. Run the program in a machine
|
|
language monitor, as it makes use of the BRK instruction. The result
|
|
tables will be written on pages $C2 and $C3.
|
|
|
|
begin 644 sbx-c100
M`,%XH`",#L&,$,&,$L&XJ8*B@LL7AOL(:(7\N#BM#L$M$,'M$L$(Q?OP`B@`
M:$7\\`,@4,'N#L'0U.X0P=#/SB#0[A+!T,<``````````````)BJ\!>M#L$M
L$,'=_\'0":T2P=W_PM`!8,K0Z:T.P2T0P9D`PID`!*T2P9D`PYD`!HL2N@
M3Y$KH%R1*XII>ZBQ*Z!3D2N@8)$KBFE_J+$KH%61*Z!BD2OX.+BE^^;\Q_S8
L"&B%_3BXI?OF_,?\"&A%_?`!`.;[T-_F_-#;RA"M8!@X&#CFYL;&Q\?GYP#8
`
end
|
|
|
|
6510 features
|
|
|
|
o PHP always pushes the Break (B) flag as a `1' to the stack.
|
|
Jukka Tapanimäki claimed in C=lehti issue 3/89, on page 27 that the
|
|
processor makes a logical OR between the status register's bit 4
|
|
and the bit 8 of the stack pointer register (which is always 1).
|
|
He did not give any reasons for this argument, and has refused to clarify
|
|
it afterwards. Well, this was not the only error in his article...
|
|
|
|
o Indirect addressing modes do not handle page boundary crossing at all.
|
|
When the parameter's low byte is $FF, the effective address wraps
|
|
around and the CPU fetches high byte from $xx00 instead of $xx00+$0100.
|
|
E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100,
|
|
and LDA ($FF),Y fetches the base address from $FF and $00.
|
|
|
|
o Indexed zero page addressing modes never fix the page address on
|
|
crossing the zero page boundary.
|
|
E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01.
|
|
|
|
o The processor always fetches the byte following a relative branch
|
|
instruction. If the branch is taken, the processor then reads the
|
|
opcode from the destination address. If a page boundary is crossed, it
|
|
first reads a byte from the old page from a location that is bigger
|
|
or smaller than the correct address by one page.
|
|
|
|
o If you cross a page boundary in any other indexed mode,
|
|
the processor reads an incorrect location first, a location that is
|
|
smaller by one page.
|
|
|
|
o Read-Modify-Write instructions write unmodified data, then modified
|
|
(so INC effectively does LDX loc;STX loc;INX;STX loc)
|
|
|
|
o -RDY is ignored during writes
|
|
(This is why you must wait 3 cycles before doing any DMA --
|
|
the maximum number of consecutive writes is 3, which occurs
|
|
during interrupts except -RESET.)
|
|
|
|
o Some undefined opcodes may give really unpredictable results.
|
|
|
|
o All registers except the Program Counter remain unmodified after -RESET.
|
|
(This is why you must preset D and I flags in the RESET handler.)
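The page-wrapping rules for the indirect and zero page indexed modes in the list above can be sketched as address calculations. The function names are hypothetical, and mem is assumed to be a flat 64 KiB array supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* JMP (ptr): the high byte comes from the same page when ptr ends in $FF. */
uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr)
{
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));

    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}

/* LDA (zp,X): both pointer bytes are fetched from page zero. */
uint16_t indexed_indirect_address(const uint8_t *mem, uint8_t zp, uint8_t x)
{
    uint8_t lo_ptr = (uint8_t)(zp + x);
    uint8_t hi_ptr = (uint8_t)(lo_ptr + 1);

    return (uint16_t)(mem[lo_ptr] | (mem[hi_ptr] << 8));
}

/* LDA (zp),Y: the base address is read from zp and (zp+1) & $FF. */
uint16_t indirect_indexed_address(const uint8_t *mem, uint8_t zp, uint8_t y)
{
    uint16_t base = (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));

    return (uint16_t)(base + y);
}

void page_wrap_demo(void)
{
    static uint8_t mem[0x10000];

    mem[0x01FF] = 0x34;   /* PCL for JMP ($01FF) */
    mem[0x0100] = 0x12;   /* PCH is read from here, not from $0200 */
    assert(jmp_indirect_target(mem, 0x01FF) == 0x1234);
}
```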
|
|
|
|
Different CPU types
|
|
|
|
The Rockwell data booklet 29651N52 (technical information about R65C00
|
|
microprocessors, dated October 1984), lists the following differences between
|
|
NMOS R6502 microprocessor and CMOS R65C00 family:
|
|
|
|
1. Indexed addressing across page boundary.
|
|
NMOS: Extra read of invalid address.
|
|
CMOS: Extra read of last instruction byte.
|
|
|
|
2. Execution of invalid op codes.
|
|
NMOS: Some terminate only by reset. Results are undefined.
|
|
CMOS: All are NOPs (reserved for future use).
|
|
|
|
3. Jump indirect, operand = XXFF.
|
|
NMOS: Page address does not increment.
|
|
CMOS: Page address increments and adds one additional cycle.
|
|
|
|
4. Read/modify/write instructions at effective address.
|
|
NMOS: One read and two write cycles.
|
|
CMOS: Two read and one write cycle.
|
|
|
|
5. Decimal flag.
|
|
NMOS: Indeterminate after reset.
|
|
CMOS: Initialized to binary mode (D=0) after reset and interrupts.
|
|
|
|
6. Flags after decimal operation.
|
|
NMOS: Invalid N, V and Z flags.
|
|
CMOS: Valid flag adds one additional cycle.
|
|
|
|
7. Interrupt after fetch of BRK instruction.
|
|
NMOS: Interrupt vector is loaded, BRK vector is ignored.
|
|
CMOS: BRK is executed, then interrupt is executed.
|
|
|
|
6510 Instruction Timing
|
|
|
|
The NMOS 6500 series processors always perform at least two reads
|
|
for each instruction. In addition to the operation code (opcode), they
|
|
fetch the next byte. This is quite efficient, as most instructions are
|
|
two or three bytes long.
|
|
|
|
The processors also use a sort of pipelining. If an instruction does
|
|
not store data in memory on its last cycle, the processor can fetch
|
|
the opcode of the next instruction while executing the last cycle. For
|
|
instance, the instruction EOR #$FF truly takes three cycles. On the
|
|
first cycle, the opcode $49 will be fetched. During the second cycle
|
|
the processor decodes the opcode and fetches the parameter #$FF. On
|
|
the third cycle, the processor will perform the operation and store
|
|
the result to accumulator, but simultaneously it fetches the opcode
|
|
for the next instruction. This is why the instruction effectively
|
|
takes only two cycles.
|
|
|
|
The following tables show what happens on the bus while executing
|
|
different kinds of instructions.
|
|
|
|
Interrupts
|
|
|
|
NMI and IRQ both take 7 cycles. Their timing diagram is much like
|
|
BRK's (see below). IRQ will be executed only when the I flag is
|
|
clear. IRQ and BRK both set the I flag, whereas the NMI does not
|
|
affect its state.
|
|
|
|
The processor will usually wait for the current instruction to
|
|
complete before executing the interrupt sequence. To process the
|
|
interrupt before the next instruction, the interrupt must occur
|
|
before the last cycle of the current instruction.
|
|
|
|
There is one exception to this rule: the BRK instruction. If a
|
|
hardware interrupt (NMI or IRQ) occurs before the fourth (flags
|
|
saving) cycle of BRK, the BRK instruction will be skipped, and
|
|
the processor will jump to the hardware interrupt vector. This
|
|
sequence will always take 7 cycles.
|
|
|
|
You do not completely lose the BRK interrupt: the B flag will be
|
|
set in the pushed status register if a BRK instruction gets
|
|
interrupted. When BRK and IRQ occur at the same time, this does
|
|
not cause any problems, as your program will consider it as a
|
|
BRK, and the IRQ would occur again after the processor returned
|
|
from your BRK routine, unless you cleared the interrupt source in
|
|
your BRK handler. But the simultaneous occurrence of NMI and BRK
|
|
is far more fatal. If you do not check the B flag in the NMI
|
|
routine and subtract two from the return address when needed, the
|
|
BRK instruction will be skipped.
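The check that an NMI routine needs can be sketched as a small C function operating on the values found in the interrupt stack frame. The function name is hypothetical; the B flag is bit 4 ($10) of the stacked status byte.

```c
#include <assert.h>
#include <stdint.h>

/* Address at which execution should resume after the NMI routine.
   stacked_p:  copy of P found in the interrupt stack frame
   stacked_pc: return address found in the same frame */
uint16_t nmi_resume_address(uint8_t stacked_p, uint16_t stacked_pc)
{
    if (stacked_p & 0x10)                    /* B set: the NMI hijacked a BRK */
        return (uint16_t)(stacked_pc - 2);   /* step back onto the BRK opcode */
    return stacked_pc;
}

void nmi_demo(void)
{
    /* BRK at $C000 pushes $C002; going back two bytes re-executes it. */
    assert(nmi_resume_address(0x30, 0xC002) == 0xC000);
    assert(nmi_resume_address(0x20, 0xC002) == 0xC002);
}
```

With the return address adjusted this way, the interrupted BRK is executed again after the NMI routine returns instead of being lost.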
|
|
|
|
If the NMI and IRQ interrupts overlap each other (one interrupt
|
|
occurs before fetching the interrupt vector for the other
|
|
interrupt), the processor will most probably jump to the NMI
|
|
vector in every case, and then jump to the IRQ vector after
|
|
processing the first instruction of the NMI handler. This has not
|
|
been measured yet, but the IRQ is very similar to BRK, and many
|
|
sources state that the NMI has higher priority than IRQ. However,
|
|
it might be that the processor takes the interrupt that comes
|
|
later, i.e. you could lose an NMI interrupt if an IRQ occurred within
|
|
four cycles after it.
|
|
|
|
After finishing the interrupt sequence, the processor will start
|
|
to execute the first instruction of the interrupt routine. This
|
|
proves that the processor uses a sort of pipelining: it finishes
|
|
the current instruction (or interrupt sequence) while reading the
|
|
opcode of the next instruction.
|
|
|
|
RESET does not push the program counter on the stack, and it probably
lasts 6 cycles after deactivating the signal. Like NMI, RESET
|
|
preserves all registers except PC.
|
|
|
|
Instructions accessing the stack
|
|
|
|
BRK
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away),
                increment PC
 3  $0100,S  W  push PCH on stack, decrement S
 4  $0100,S  W  push PCL on stack, decrement S
 5  $0100,S  W  push P on stack (with B flag set), decrement S
 6   $FFFE   R  fetch PCL
 7   $FFFF   R  fetch PCH
|
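The BRK table can be replayed as a bus trace with a few lines of C. This is a sketch only: the Cpu and BusCycle types and the helper names are invented for the example, memory is assumed to be a flat 64 KiB array, and the exact cycle on which the I flag becomes set is not modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t addr; char rw; } BusCycle;

typedef struct {
    uint16_t pc;
    uint8_t s, p;
    uint8_t *mem;     /* 64 KiB of memory, supplied by the caller */
} Cpu;

static void bus(BusCycle *out, int *n, uint16_t addr, char rw)
{
    out[*n].addr = addr;
    out[*n].rw = rw;
    (*n)++;
}

static void push(Cpu *cpu, BusCycle *out, int *n, uint8_t value)
{
    uint16_t addr = (uint16_t)(0x0100 | cpu->s);

    cpu->mem[addr] = value;
    bus(out, n, addr, 'W');
    cpu->s--;
}

/* Runs BRK with cpu->pc at the opcode; fills out[0..6] and returns 7. */
int brk_sequence(Cpu *cpu, BusCycle out[7])
{
    int n = 0;

    bus(out, &n, cpu->pc++, 'R');                    /* 1: fetch opcode */
    bus(out, &n, cpu->pc++, 'R');                    /* 2: read and discard */
    push(cpu, out, &n, (uint8_t)(cpu->pc >> 8));     /* 3: PCH */
    push(cpu, out, &n, (uint8_t)(cpu->pc & 0xFF));   /* 4: PCL */
    push(cpu, out, &n, (uint8_t)(cpu->p | 0x30));    /* 5: P with B set */
    cpu->p |= 0x04;                                  /* BRK sets the I flag */
    bus(out, &n, 0xFFFE, 'R');                       /* 6: fetch PCL */
    bus(out, &n, 0xFFFF, 'R');                       /* 7: fetch PCH */
    cpu->pc = (uint16_t)(cpu->mem[0xFFFE] | (cpu->mem[0xFFFF] << 8));
    assert(n == 7);
    return n;
}
```

The pushed return address is the BRK opcode address plus two, because PC is incremented on both of the first two cycles.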
|
|
|
RTI
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
 3  $0100,S  R  increment S
 4  $0100,S  R  pull P from stack, increment S
 5  $0100,S  R  pull PCL from stack, increment S
 6  $0100,S  R  pull PCH from stack
|
|
|
|
RTS
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
 3  $0100,S  R  increment S
 4  $0100,S  R  pull PCL from stack, increment S
 5  $0100,S  R  pull PCH from stack
 6    PC     R  increment PC
|
|
|
|
PHA, PHP
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
 3  $0100,S  W  push register on stack, decrement S
|
|
|
|
PLA, PLP
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
 3  $0100,S  R  increment S
 4  $0100,S  R  pull register from stack
|
|
|
|
JSR
|
|
|
|
 #  address R/W description
--- ------- --- -------------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low address byte, increment PC
 3  $0100,S  R  internal operation (predecrement S?)
 4  $0100,S  W  push PCH on stack, decrement S
 5  $0100,S  W  push PCL on stack, decrement S
 6    PC     R  copy low address byte to PCL, fetch high address
                byte to PCH
|
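Taken together, the JSR and RTS tables imply that JSR pushes the address of its own last byte and that RTS adds one to the pulled address. A minimal C sketch of that pairing follows; the Cpu type and function names are hypothetical, and memory is assumed to be a flat 64 KiB array.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t pc;
    uint8_t s;
    uint8_t *mem;     /* 64 KiB of memory, supplied by the caller */
} Cpu;

/* cpu->pc points at the JSR opcode on entry. */
void jsr(Cpu *cpu)
{
    uint8_t lo = cpu->mem[(uint16_t)(cpu->pc + 1)];       /* cycle 2 */
    uint16_t ret = (uint16_t)(cpu->pc + 2);               /* PC at high byte */

    cpu->mem[0x0100 | cpu->s--] = (uint8_t)(ret >> 8);    /* cycle 4: PCH */
    cpu->mem[0x0100 | cpu->s--] = (uint8_t)(ret & 0xFF);  /* cycle 5: PCL */
    /* cycle 6: the high byte is fetched only after the pushes */
    cpu->pc = (uint16_t)(lo | (cpu->mem[ret] << 8));
}

void rts(Cpu *cpu)
{
    uint8_t lo = cpu->mem[0x0100 | ++cpu->s];             /* pull PCL */
    uint8_t hi = cpu->mem[0x0100 | ++cpu->s];             /* pull PCH */

    cpu->pc = (uint16_t)((lo | (hi << 8)) + 1);           /* increment PC */
}

void jsr_rts_demo(void)
{
    static uint8_t mem[0x10000];
    Cpu cpu = { 0xC000, 0xFF, mem };

    mem[0xC000] = 0x20; mem[0xC001] = 0x34; mem[0xC002] = 0x12; /* JSR $1234 */
    jsr(&cpu);
    assert(cpu.pc == 0x1234 && mem[0x01FF] == 0xC0 && mem[0x01FE] == 0x02);
    rts(&cpu);
    assert(cpu.pc == 0xC003 && cpu.s == 0xFF);
}
```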
|
|
|
Accumulator or implied addressing
|
|
|
|
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
|
|
|
|
Immediate addressing
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch value, increment PC
|
|
|
|
Absolute addressing
|
|
|
|
JMP
|
|
|
|
 #  address R/W description
--- ------- --- -------------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low address byte, increment PC
 3    PC     R  copy low address byte to PCL, fetch high address
                byte to PCH
|
|
|
|
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
|
|
LAX, NOP)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low byte of address, increment PC
 3    PC     R  fetch high byte of address, increment PC
 4  address  R  read from effective address
|
|
|
|
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
|
|
SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low byte of address, increment PC
 3    PC     R  fetch high byte of address, increment PC
 4  address  R  read from effective address
 5  address  W  write the value back to effective address,
                and do the operation on it
 6  address  W  write the new value to effective address
|
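The double write of Read-Modify-Write instructions is easy to overlook, so here is a small C sketch of INC with absolute addressing that records both writes. The BusWrite type and the function name are hypothetical; mem is assumed to be a flat 64 KiB array.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t addr; uint8_t value; } BusWrite;

/* INC addr: performs cycles 4 to 6 of the table and logs the two writes.
   Returns the number of write cycles (always 2). */
int inc_memory(uint8_t *mem, uint16_t addr, BusWrite writes[2])
{
    uint8_t old = mem[addr];                  /* cycle 4: read */
    uint8_t modified = (uint8_t)(old + 1);

    mem[addr] = old;                          /* cycle 5: unmodified value */
    writes[0].addr = addr;
    writes[0].value = old;

    mem[addr] = modified;                     /* cycle 6: new value */
    writes[1].addr = addr;
    writes[1].value = modified;
    return 2;
}

void rmw_demo(void)
{
    static uint8_t mem[0x10000];
    BusWrite w[2];

    mem[0x2000] = 0x41;
    assert(inc_memory(mem, 0x2000, w) == 2);
    assert(w[0].value == 0x41 && w[1].value == 0x42 && mem[0x2000] == 0x42);
}
```

On plain RAM the first write is invisible, but an I/O register that reacts to writes would see both values.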
|
|
|
Write instructions (STA, STX, STY, SAX)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low byte of address, increment PC
 3    PC     R  fetch high byte of address, increment PC
 4  address  W  write register to effective address
|
|
|
|
Zero page addressing
|
|
|
|
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
|
|
LAX, NOP)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch address, increment PC
 3  address  R  read from effective address
|
|
|
|
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
|
|
SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch address, increment PC
 3  address  R  read from effective address
 4  address  W  write the value back to effective address,
                and do the operation on it
 5  address  W  write the new value to effective address
|
|
|
|
Write instructions (STA, STX, STY, SAX)
|
|
|
|
 #  address R/W description
--- ------- --- ------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch address, increment PC
 3  address  W  write register to effective address
|
|
|
|
Zero page indexed addressing
|
|
|
|
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
|
|
LAX, NOP)
|
|
|
|
# address R/W description
|
|
--- --------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch address, increment PC
|
|
3 address R read from address, add index register to it
|
|
4 address+I* R read from effective address
|
|
|
|
Notes: I denotes either index register (X or Y).
|
|
|
|
* The high byte of the effective address is always zero,
|
|
i.e. page boundary crossings are not handled.
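The wrap-around described in the note can be checked with a small model. The following is only a sketch in Python; the function name and the tuple layout are hypothetical and not part of any emulator. It lists the bus accesses of a zero page indexed read and keeps the effective address inside page zero.

```python
def zero_page_indexed_read(pc, operand, index):
    """Return (cycle, address, R/W) tuples for a zero page indexed read.

    pc      -- address of the opcode byte (assumed for illustration)
    operand -- zero page base address taken from the instruction
    index   -- current value of X or Y
    """
    # The carry out of the low byte is dropped, so the high byte stays zero.
    effective = (operand + index) & 0xFF
    return [
        (1, pc, "R"),                  # fetch opcode
        (2, (pc + 1) & 0xFFFF, "R"),   # fetch base address
        (3, operand, "R"),             # read from base address while adding index
        (4, effective, "R"),           # read from effective address
    ]


if __name__ == "__main__":
    for cycle, address, rw in zero_page_indexed_read(0xC000, 0xFF, 0x01):
        print(f"{cycle}  ${address:04X}  {rw}")
```

With base $FF and index 1 the last access goes to $0000, not $0100.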
|
|
|
|
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
|
|
SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
# address R/W description
|
|
--- --------- --- ---------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch address, increment PC
|
|
3 address R read from address, add index register X to it
|
|
4 address+X* R read from effective address
|
|
5 address+X* W write the value back to effective address,
|
|
and do the operation on it
|
|
6 address+X* W write the new value to effective address
|
|
|
|
Note: * The high byte of the effective address is always zero,
|
|
i.e. page boundary crossings are not handled.
|
|
|
|
Write instructions (STA, STX, STY, SAX)
|
|
|
|
# address R/W description
|
|
--- --------- --- -------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch address, increment PC
|
|
3 address R read from address, add index register to it
|
|
4 address+I* W write to effective address
|
|
|
|
Notes: I denotes either index register (X or Y).
|
|
|
|
* The high byte of the effective address is always zero,
|
|
i.e. page boundary crossings are not handled.
|
|
|
|
Absolute indexed addressing
|
|
|
|
Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
|
|
LAX, LAE, SHS, NOP)
|
|
|
|
# address R/W description
|
|
--- --------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch low byte of address, increment PC
|
|
3 PC R fetch high byte of address,
|
|
add index register to low address byte,
|
|
increment PC
|
|
4 address+I* R read from effective address,
|
|
fix the high byte of effective address
|
|
5+ address+I R re-read from effective address
|
|
|
|
Notes: I denotes either index register (X or Y).
|
|
|
|
* The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100.
|
|
|
|
+ This cycle will be executed only if the effective address
|
|
was invalid during cycle #4, i.e. page boundary was crossed.
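The conditional fifth cycle can be modelled as follows. This is a sketch only, written in Python with hypothetical names; it follows the table above and makes no claim about how any particular emulator is written.

```python
def absolute_indexed_read(pc, base, index):
    """Return (cycle, address, R/W) tuples for a read such as LDA base,X."""
    low_sum = (base & 0xFF) + index
    page_crossed = low_sum > 0xFF
    # Cycle 4 uses the high byte before the carry has been applied.
    uncorrected = (base & 0xFF00) | (low_sum & 0xFF)
    effective = (base + index) & 0xFFFF
    trace = [
        (1, pc, "R"),                  # fetch opcode
        (2, (pc + 1) & 0xFFFF, "R"),   # fetch low byte of address
        (3, (pc + 2) & 0xFFFF, "R"),   # fetch high byte, add index to low byte
        (4, uncorrected, "R"),         # read, possibly from the wrong page
    ]
    if page_crossed:
        trace.append((5, effective, "R"))   # re-read from the fixed address
    return trace
```

For base $DCFD and index $10 the model reads $DC0D in cycle 4 and $DD0D in cycle 5; without a page crossing the instruction ends after cycle 4.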
|
|
|
|
Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC,
|
|
SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
# address R/W description
|
|
--- --------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch low byte of address, increment PC
|
|
3 PC R fetch high byte of address,
|
|
add index register X to low address byte,
|
|
increment PC
|
|
4 address+X* R read from effective address,
|
|
fix the high byte of effective address
|
|
5 address+X R re-read from effective address
|
|
6 address+X W write the value back to effective address,
|
|
and do the operation on it
|
|
7 address+X W write the new value to effective address
|
|
|
|
Notes: * The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100.
|
|
|
|
Write instructions (STA, STX, STY, SHA, SHX, SHY)
|
|
|
|
# address R/W description
|
|
--- --------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch low byte of address, increment PC
|
|
3 PC R fetch high byte of address,
|
|
add index register to low address byte,
|
|
increment PC
|
|
4 address+I* R read from effective address,
|
|
fix the high byte of effective address
|
|
5 address+I W write to effective address
|
|
|
|
Notes: I denotes either index register (X or Y).
|
|
|
|
* The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100. Because
|
|
the processor cannot undo a write to an invalid
|
|
address, it always reads from the address first.
|
|
|
|
Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS)
|
|
|
|
# address R/W description
|
|
--- --------- --- ---------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch operand, increment PC
|
|
3 PC R Fetch opcode of next instruction,
|
|
If branch is taken, add operand to PCL.
|
|
Otherwise increment PC.
|
|
4+ PC* R Fetch opcode of next instruction.
|
|
Fix PCH. If it did not change, increment PC.
|
|
5! PC R Fetch opcode of next instruction,
|
|
increment PC.
|
|
|
|
Notes: The opcode fetch of the next instruction is included in
|
|
this diagram for illustration purposes. When determining
|
|
real execution times, remember to subtract the last
|
|
cycle.
|
|
|
|
* The high byte of Program Counter (PCH) may be invalid
|
|
at this time, i.e. it may be smaller or bigger by $100.
|
|
|
|
+ If branch is taken, this cycle will be executed.
|
|
|
|
! If branch occurs to different page, this cycle will be
|
|
executed.
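The branch diagram above can be turned into a small calculation. The sketch below is Python with hypothetical function names; it lists the addresses read by a branch, including the fetch of the next opcode, and derives the real cycle count by subtracting that last fetch.

```python
def branch_trace(pc, operand, taken):
    """Addresses read by a relative branch at pc, next opcode fetch included."""
    next_pc = (pc + 2) & 0xFFFF
    trace = [pc, (pc + 1) & 0xFFFF, next_pc]
    if not taken:
        return trace                      # cycle 3 already fetched the next opcode
    offset = operand - 0x100 if operand & 0x80 else operand
    target = (next_pc + offset) & 0xFFFF
    # Cycle 4: only PCL has been updated, PCH may still be off by one page.
    uncorrected = (next_pc & 0xFF00) | ((next_pc + operand) & 0xFF)
    trace.append(uncorrected)
    if uncorrected != target:
        trace.append(target)              # cycle 5: PCH has been fixed
    return trace


def branch_cycles(pc, operand, taken):
    """Execution time of the branch itself (last opcode fetch subtracted)."""
    return len(branch_trace(pc, operand, taken)) - 1
```

A branch that is not taken costs 2 cycles, a taken branch 3, and a taken branch to another page 4.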
|
|
|
|
Indexed indirect addressing
|
|
|
|
Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R read from the address, add X to it
|
|
4 pointer+X R fetch effective address low
|
|
5 pointer+X+1 R fetch effective address high
|
|
6 address R read from effective address
|
|
|
|
Note: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
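As a sketch of the note above (Python, with a hypothetical helper name and a plain byte array standing in for memory), the pointer arithmetic of indexed indirect addressing can be written so that both pointer bytes stay in page zero:

```python
def indexed_indirect_address(memory, pointer, x):
    """Effective address of an (pointer,X) access; memory is indexable by address."""
    low_location = (pointer + x) & 0xFF        # wraps inside zero page
    high_location = (pointer + x + 1) & 0xFF   # also wraps, $FF is followed by $00
    return memory[low_location] | (memory[high_location] << 8)


if __name__ == "__main__":
    ram = bytearray(0x10000)
    ram[0x00FF] = 0x34   # low byte of the effective address
    ram[0x0000] = 0x12   # high byte comes from $0000, not from $0100
    ram[0x0100] = 0x99
    print(f"${indexed_indirect_address(ram, 0xF0, 0x0F):04X}")
```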
|
|
|
|
Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R read from the address, add X to it
|
|
4 pointer+X R fetch effective address low
|
|
5 pointer+X+1 R fetch effective address high
|
|
6 address R read from effective address
|
|
7 address W write the value back to effective address,
|
|
and do the operation on it
|
|
8 address W write the new value to effective address
|
|
|
|
Note: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
|
|
|
|
Write instructions (STA, SAX)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R read from the address, add X to it
|
|
4 pointer+X R fetch effective address low
|
|
5 pointer+X+1 R fetch effective address high
|
|
6 address W write to effective address
|
|
|
|
Note: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
|
|
|
|
Indirect indexed addressing
|
|
|
|
Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R fetch effective address low
|
|
4 pointer+1 R fetch effective address high,
|
|
add Y to low byte of effective address
|
|
5 address+Y* R read from effective address,
|
|
fix high byte of effective address
|
|
6+ address+Y R read from effective address
|
|
|
|
Notes: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
|
|
|
|
* The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100.
|
|
|
|
+ This cycle will be executed only if the effective address
|
|
was invalid during cycle #5, i.e. page boundary was crossed.
|
|
|
|
Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R fetch effective address low
|
|
4 pointer+1 R fetch effective address high,
|
|
add Y to low byte of effective address
|
|
5 address+Y* R read from effective address,
|
|
fix high byte of effective address
|
|
6 address+Y R read from effective address
|
|
7 address+Y W write the value back to effective address,
|
|
and do the operation on it
|
|
8 address+Y W write the new value to effective address
|
|
|
|
Notes: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
|
|
|
|
* The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100.
|
|
|
|
Write instructions (STA, SHA)
|
|
|
|
# address R/W description
|
|
--- ----------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address, increment PC
|
|
3 pointer R fetch effective address low
|
|
4 pointer+1 R fetch effective address high,
|
|
add Y to low byte of effective address
|
|
5 address+Y* R read from effective address,
|
|
fix high byte of effective address
|
|
6 address+Y W write to effective address
|
|
|
|
Notes: The effective address is always fetched from zero page,
|
|
i.e. the zero page boundary crossing is not handled.
|
|
|
|
* The high byte of the effective address may be invalid
|
|
at this time, i.e. it may be smaller by $100.
|
|
|
|
Absolute indirect addressing (JMP)
|
|
|
|
# address R/W description
|
|
--- --------- --- ------------------------------------------
|
|
1 PC R fetch opcode, increment PC
|
|
2 PC R fetch pointer address low, increment PC
|
|
3 PC R fetch pointer address high, increment PC
|
|
4 pointer R fetch low address to latch
|
|
5 pointer+1* R fetch PCH, copy latch to PCL
|
|
|
|
Note: * The PCH will always be fetched from the same page
|
|
as PCL, i.e. page boundary crossing is not handled.
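The consequence of this note for a pointer that ends in $FF can be sketched as follows. This is Python with a hypothetical helper name and a byte array in place of real memory, assuming the behaviour is exactly as the table describes.

```python
def jmp_indirect_target(memory, pointer):
    """Target of JMP (pointer) on an NMOS 65xx, without the page carry."""
    low = memory[pointer]
    # Only the low byte of the pointer is incremented; the page stays the same.
    high_location = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    high = memory[high_location]
    return low | (high << 8)


if __name__ == "__main__":
    ram = bytearray(0x10000)
    ram[0x02FF] = 0x00
    ram[0x0300] = 0x80   # would be the high byte if the carry were handled
    ram[0x0200] = 0x40   # the byte that is actually fetched
    print(f"${jmp_indirect_target(ram, 0x02FF):04X}")
```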
|
|
|
|
How Real Programmers Acknowledge Interrupts
|
|
|
|
With RMW instructions:
|
|
|
|
; beginning of combined raster/timer interrupt routine
|
|
LSR $D019 ; clear VIC interrupts, read raster interrupt flag to C
|
|
BCS raster ; jump if VIC caused an interrupt
|
|
... ; timer interrupt routine
|
|
|
|
Operational diagram of LSR $D019:
|
|
|
|
# data address R/W
|
|
--- ---- ------- --- ---------------------------------
|
|
1 4E PC R fetch opcode
|
|
2 19 PC+1 R fetch address low
|
|
3 D0 PC+2 R fetch address high
|
|
4 xx $D019 R read memory
|
|
5 xx $D019 W write the value back, shift right
|
|
6 xx/2 $D019 W write the new value back
|
|
|
|
The 5th cycle acknowledges the interrupt by writing the same
|
|
value back. If only raster interrupts are used, the 6th cycle
|
|
has no effect on the VIC. (It might also acknowledge some
|
|
other interrupts.)
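Why the dummy write in cycle 5 is enough can be seen from a toy model. The sketch below is Python; the class and function names are hypothetical, and the register is simplified to a latch in which writing a 1 clears the corresponding flag (the unused and summary bits of the real $D019 are ignored).

```python
class InterruptLatch:
    """Toy latch register: writing 1 to a bit clears that bit."""

    def __init__(self, value):
        self.value = value & 0xFF

    def read(self):
        return self.value

    def write(self, data):
        self.value &= ~data & 0xFF


def lsr_absolute(register):
    """Perform the three data cycles of LSR on a register, return the carry."""
    old = register.read()        # cycle 4: read memory
    register.write(old)          # cycle 5: write the unmodified value back
    register.write(old >> 1)     # cycle 6: write the shifted value
    return old & 1               # bit 0 ends up in the carry flag


if __name__ == "__main__":
    latch = InterruptLatch(0x01)          # raster interrupt pending
    carry = lsr_absolute(latch)
    print(carry, f"${latch.read():02X}")
```

In this model the flag is already cleared after cycle 5, and the carry still tells the program that the raster interrupt was pending.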
|
|
|
|
With indexed addressing:
|
|
|
|
; acknowledge interrupts to both CIAs
|
|
LDX #$10
|
|
LDA $DCFD,X
|
|
|
|
Operational diagram of LDA $DCFD,X:
|
|
|
|
# data address R/W description
|
|
--- ---- ------- --- ---------------------------------
|
|
1 BD PC R fetch opcode
|
|
2 FD PC+1 R fetch address low
|
|
3 DC PC+2 R fetch address high, add X to address low
|
|
4 xx $DC0D R read from address, fix high byte of address
|
|
5 yy $DD0D R read from right address
|
|
|
|
; acknowledge interrupts to CIA 2
|
|
LDX #$10
|
|
STA $DDFD,X
|
|
|
|
Operational diagram of STA $DDFD,X:
|
|
|
|
# data address R/W description
|
|
--- ---- ------- --- ---------------------------------
|
|
1 9D PC R fetch opcode
|
|
2 FD PC+1 R fetch address low
|
|
3 DD PC+2 R fetch address high, add X to address low
|
|
4 xx $DD0D R read from address, fix high byte of address
|
|
5 ac $DE0D W write to right address
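The store variant always performs the read in cycle 4, whether or not a page boundary is crossed. A sketch of this in Python (hypothetical function name, and the opcode address $C000 chosen only for illustration) reproduces the addresses of the diagram:

```python
def absolute_indexed_write(pc, base, index):
    """Return (cycle, address, R/W) tuples for a store such as STA base,X."""
    low_sum = (base & 0xFF) + index
    uncorrected = (base & 0xFF00) | (low_sum & 0xFF)   # high byte not yet fixed
    effective = (base + index) & 0xFFFF
    return [
        (1, pc, "R"),                  # fetch opcode
        (2, (pc + 1) & 0xFFFF, "R"),   # fetch address low
        (3, (pc + 2) & 0xFFFF, "R"),   # fetch address high, add index
        (4, uncorrected, "R"),         # read that is always performed
        (5, effective, "W"),           # write to the right address
    ]


if __name__ == "__main__":
    for cycle, address, rw in absolute_indexed_write(0xC000, 0xDDFD, 0x10):
        print(f"{cycle}  ${address:04X}  {rw}")
```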
|
|
|
|
With branch instructions:
|
|
|
|
; acknowledge interrupts to CIA 2
|
|
LDA #$00 ; clear N flag
|
|
JMP $DD0A
|
|
DD0A BPL $DC9D ; branch
|
|
DC9D BRK ; return
|
|
|
|
You need the following preparations to initialize the CIA registers:
|
|
|
|
LDA #$91 ; argument of BPL
|
|
STA $DD0B
|
|
LDA #$10 ; BPL
|
|
STA $DD0A
|
|
STA $DD08 ; load the ToD values from the latches
|
|
LDA $DD0B ; freeze the ToD display
|
|
LDA #$7F
|
|
STA $DC0D ; assure that $DC0D is $00
|
|
|
|
Operational diagram of BPL $DC9D:
|
|
|
|
# data address R/W description
|
|
--- ---- ------- --- ---------------------------------
|
|
1 10 $DD0A R fetch opcode
|
|
2 91 $DD0B R fetch argument
|
|
3 xx $DD0C R fetch opcode, add argument to PCL
|
|
4 yy $DD9D R fetch opcode, fix PCH
|
|
( 5 00 $DC9D R fetch opcode )
|
|
|
|
; acknowledge interrupts to CIA 1
|
|
LSR ; clear N flag
|
|
JMP $DCFA
|
|
DCFA BPL $DD0D
|
|
DD0D BRK
|
|
|
|
; Again you need to set the ToD registers of CIA 1 and the
|
|
; Interrupt Control Register of CIA 2 first.
|
|
|
|
Operational diagram of BPL $DD0D:
|
|
|
|
# data address R/W description
|
|
--- ---- ------- --- ---------------------------------
|
|
1 10 $DCFA R fetch opcode
|
|
2 11 $DCFB R fetch argument
|
|
3 xx $DCFC R fetch opcode, add argument to PCL
|
|
4 yy $DC0D R fetch opcode, fix PCH
|
|
( 5 00 $DD0D R fetch opcode )
|
|
|
|
; acknowledge interrupts to CIA 2 automagically
|
|
; preparations
|
|
LDA #$7F
|
|
STA $DD0D ; disable all interrupt sources of CIA2
|
|
LDA $DD0E
|
|
AND #$BE ; ensure that $DD0C remains constant
|
|
STA $DD0E ; and stop the timer
|
|
LDA #$FD
|
|
STA $DD0C ; parameter of BPL
|
|
LDA #$10
|
|
STA $DD0B ; BPL
|
|
LDA #$40
|
|
STA $DD0A ; RTI/parameter of LSR
|
|
LDA #$46
|
|
STA $DD09 ; LSR
|
|
STA $DD08 ; load the ToD values from the latches
|
|
LDA $DD0B ; freeze the ToD display
|
|
LDA #$09
|
|
STA $0318
|
|
LDA #$DD
|
|
STA $0319 ; change NMI vector to $DD09
|
|
LDA #$FF ; Try changing this instruction's operand
|
|
STA $DD05 ; (see comment below).
|
|
LDA #$FF
|
|
STA $DD04 ; set interrupt frequency to 1/65536 cycles
|
|
LDA $DD0E
|
|
AND #$80
|
|
ORA #$11
|
|
LDX #$81
|
|
STX $DD0D ; enable timer interrupt
|
|
STA $DD0E ; start timer
|
|
|
|
LDA #$00 ; To see that the interrupts really occur,
|
|
STA $D011 ; use something like this and see how
|
|
LOOP DEC $D020 ; changing the byte loaded to $DD05 from
|
|
BNE LOOP ; #$FF to #$0F changes the image.
|
|
|
|
When an NMI occurs, the processor jumps to Kernal code, which jumps to
|
|
($0318), which points to the following routine:
|
|
|
|
DD09 LSR $40 ; clear N flag
|
|
BPL $DD0A ; Note: $DD0A contains RTI.
|
|
|
|
Operational diagram of BPL $DD0A:
|
|
|
|
# data address R/W description
|
|
--- ---- ------- --- ---------------------------------
|
|
1 10 $DD0B R fetch opcode
|
|
2 FD $DD0C R fetch argument
|
|
3 xx $DD0D R fetch opcode, add argument to PCL
|
|
4 40 $DD0A R fetch opcode, (fix PCH)
|
|
|
|
With RTI:
|
|
|
|
; the fastest possible interrupt handler in the 6500 family
|
|
; preparations
|
|
SEI
|
|
LDA $01 ; disable ROM and enable I/O
|
|
AND #$FD
|
|
ORA #$05
|
|
STA $01
|
|
LDA #$7F
|
|
STA $DD0D ; disable all interrupt sources of CIA 2
|
|
LDA $DD0E
|
|
AND #$BE ; ensure that $DD0C remains constant
|
|
STA $DD0E ; and stop the timer
|
|
LDA #$40
|
|
STA $DD0C ; store RTI to $DD0C
|
|
LDA #$0C
|
|
STA $FFFA
|
|
LDA #$DD
|
|
STA $FFFB ; change NMI vector to $DD0C
|
|
LDA #$FF ; Try changing this instruction's operand
|
|
STA $DD05 ; (see comment below).
|
|
LDA #$FF
|
|
STA $DD04 ; set interrupt frequency to 1/65536 cycles
|
|
LDA $DD0E
|
|
AND #$80
|
|
ORA #$11
|
|
LDX #$81
|
|
STX $DD0D ; enable timer interrupt
|
|
STA $DD0E ; start timer
|
|
|
|
LDA #$00 ; To see that the interrupts really occur,
|
|
STA $D011 ; use something like this and see how
|
|
LOOP DEC $D020 ; changing the byte loaded to $DD05 from
|
|
BNE LOOP ; #$FF to #$0F changes the image.
|
|
|
|
When an NMI occurs, the processor jumps through the NMI vector at
$FFFA (read from RAM, as the Kernal ROM is disabled) directly to the
following routine:
|
|
|
|
DD0C RTI
|
|
|
|
How on earth can this clear the interrupts? Remember, the
|
|
processor always fetches two successive bytes for each
|
|
instruction.
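The hint can be made concrete with a toy model. The following Python sketch uses hypothetical names and assumes a register that clears its flags whenever it is read, which is how the text treats the CIA interrupt control register at $DD0D.

```python
class ReadToClearRegister:
    """Toy register whose flags are cleared by any read access."""

    def __init__(self, flags=0):
        self.flags = flags

    def read(self):
        value, self.flags = self.flags, 0
        return value


def rti_opcode_fetches(pc, registers):
    """Perform the first two bus reads of an RTI located at pc.

    registers maps addresses to objects with a read() method.
    """
    touched = []
    for address in (pc, (pc + 1) & 0xFFFF):
        if address in registers:
            registers[address].read()   # the fetch itself acknowledges
        touched.append(address)
    return touched


if __name__ == "__main__":
    icr = ReadToClearRegister(0x81)
    print([f"${a:04X}" for a in rti_opcode_fetches(0xDD0C, {0xDD0D: icr})])
    print(icr.flags)
```

The byte after the opcode is fetched and discarded, but the read access still reaches $DD0D.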
|
|
|
|
A little more practical version of this is redirecting the NMI
|
|
(or IRQ) to your own routine, whose last instruction is JMP
|
|
$DD0C or JMP $DC0C. If you want to confuse more, change the 0
|
|
in the address to a hexadecimal digit different from the one
|
|
you used when writing the RTI.
|
|
|
|
Or you can combine the latter two methods:
|
|
|
|
DD09 LSR $xx ; xx is any appropriate BCD value 00-59.
|
|
BPL $DCFC
|
|
DCFC RTI
|
|
|
|
This example acknowledges interrupts to both CIAs.
|
|
|
|
If you want to confuse the examiners of your code, you can use any
|
|
of these techniques. Although these examples use no undefined opcodes,
|
|
they do not necessarily run correctly on CMOS processors. However, the
|
|
RTI example should run on 65C02 and 65C816, and the latter branch
|
|
instruction example might work as well.
|
|
|
|
The RMW instruction method has been used in some demos, others were
|
|
developed by Marko Mäkelä. His favourite is the automagical RTI
|
|
method, although it does not have any practical applications, except
|
|
for some time dependent data decryption routines for very complicated
|
|
copy protections.
|
|
|
|
Memory Management
|
|
|
|
The processor's point of view
|
|
|
|
The Commodore 64 has access to more memory than its processor can
|
|
directly handle. This is possible by banking the memory. There are
|
|
five user configurable inputs that affect the banking. Three of them
|
|
can be controlled by program, and the remaining two serve as control lines
|
|
on the memory expansion port.
|
|
|
|
The 6510 MPU has an integrated I/O port with six I/O lines. This
|
|
port is accessed through the memory locations 0 and 1. The location 0
|
|
is the Data Direction Register for the Peripheral data Register, which
|
|
is mapped to the other location. When a bit in the DDR is set, the
|
|
corresponding PR bit controls the state of a corresponding Peripheral
|
|
line as an output. When it is clear, the state of the Peripheral line
|
|
is reflected by the Peripheral register. The Peripheral lines are
|
|
numbered from 0 to 5, and they are mapped to the DDR and PR bits 0 - 5,
|
|
respectively. The 8502 processor, which is used in the Commodore 128,
|
|
has seven Peripheral lines in its I/O port. The pin P6 is connected to
|
|
the ASC/CC key (Caps lock in English versions).
|
|
|
|
The I/O lines have the following functions:
|
|
|
|
Direction Line Function
|
|
--------- ---- --------
|
|
out P5 Cassette motor control. (0 = motor spins)
|
|
in P4 Cassette sense. (0 = PLAY button depressed)
|
|
out P3 Cassette write data.
|
|
out P2 CHAREN
|
|
out P1 HIRAM
|
|
out P0 LORAM
|
|
|
|
The default value of the DDR register is $2F, so all lines except
|
|
Cassette sense are outputs. The default PR value is $37 (Datassette
|
|
motor stopped, and all three memory management lines high).
|
|
If you turn any memory management line to input, the external pull-up
|
|
resistors make it look as if it were outputting logical "1". This
|
|
is actually why the computer always switches the ROMs in upon startup:
|
|
Pulling the -RESET line low resets all Peripheral lines to inputs,
|
|
thus setting all three processor-driven memory management lines to
|
|
logical "1" level.
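The interaction of the DDR, the PR and the pull-up resistors can be sketched in a few lines. This is an illustrative Python model with hypothetical function names; the `external` parameter stands for the level that pull-ups or connected devices put on lines configured as inputs and is assumed to be all high.

```python
def port_lines(ddr, pr, external=0x3F):
    """Levels on P0-P5: outputs follow the PR, inputs follow the external level."""
    return ((pr & ddr) | (external & ~ddr)) & 0x3F


def memory_lines(ddr, pr):
    """State of the three processor-driven memory management lines."""
    lines = port_lines(ddr, pr)
    return {
        "LORAM": lines & 1,
        "HIRAM": (lines >> 1) & 1,
        "CHAREN": (lines >> 2) & 1,
    }


if __name__ == "__main__":
    print(memory_lines(0x2F, 0x37))   # default values
    print(memory_lines(0x00, 0x00))   # right after -RESET, all lines are inputs
    print(memory_lines(0x2F, 0x35))   # HIRAM driven low
```

In this model both the default values and the state after -RESET give three high lines, so the ROMs are switched in.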
|
|
|
|
The two remaining memory management lines are -EXROM and -GAME on
|
|
the cartridge port. Each line has a pull-up resistor, so the lines
|
|
are "1" by default.
|
|
|
|
Even though the memory banking has been implemented with a 82S100
|
|
Programmable _Logic_ Array, there is only one control line that seems
|
|
to behave logically at first sight, the -CHAREN line. It is mostly
|
|
used to choose between I/O address space and the character generator
|
|
ROM. The following memory map introduces the oddities of -CHAREN and
|
|
the other memory management lines. It is based on the memory maps in
|
|
the Commodore 64 Programmer's Reference Guide, pp. 263 - 267, and some
|
|
errors and inaccuracies have been corrected.
|
|
|
|
The leftmost column of the table contains addresses in hexadecimal
|
|
notation. The columns aside it introduce all possible memory
|
|
configurations. The default mode is on the left, and the absolutely
|
|
most rarely used Ultimax game console configuration is on the right.
|
|
(Has anybody ever seen any Ultimax games?) Each memory configuration
|
|
column has one or more four-digit binary numbers as a title. The bits,
|
|
from left to right, represent the state of the -LORAM, -HIRAM, -GAME
|
|
and -EXROM lines, respectively. The bits whose state does not matter
|
|
are marked with "x". For instance, when the Ultimax video game
|
|
configuration is active (the -GAME line is shorted to ground), the
|
|
-LORAM and -HIRAM lines have no effect.
|
|
|
|
        default                     001x                        Ultimax
        1111   101x   1000   011x   00x0   1110   0100   1100   xx01
10000
----------------------------------------------------------------------
 F000
        Kernal RAM    RAM    Kernal RAM    Kernal Kernal Kernal ROMH(*
 E000
----------------------------------------------------------------------
 D000   IO/C   IO/C   IO/RAM IO/C   RAM    IO/C   IO/C   IO/C   I/O
----------------------------------------------------------------------
 C000   RAM    RAM    RAM    RAM    RAM    RAM    RAM    RAM     -
----------------------------------------------------------------------
 B000
        BASIC  RAM    RAM    RAM    RAM    BASIC  ROMH   ROMH    -
 A000
----------------------------------------------------------------------
 9000
        RAM    RAM    RAM    RAM    RAM    ROML   RAM    ROML   ROML(*
 8000
----------------------------------------------------------------------
 7000

 6000
        RAM    RAM    RAM    RAM    RAM    RAM    RAM    RAM     -
 5000

 4000
----------------------------------------------------------------------
 3000

 2000   RAM    RAM    RAM    RAM    RAM    RAM    RAM    RAM     -

 1000
----------------------------------------------------------------------
 0000   RAM    RAM    RAM    RAM    RAM    RAM    RAM    RAM    RAM
----------------------------------------------------------------------
|
|
|
|
*) Internal memory does not respond to write accesses to these
|
|
areas.
|
|
|
|
Legend: Kernal E000-FFFF Kernal ROM.
|
|
|
|
IO/C D000-DFFF I/O address space or Character
|
|
generator ROM, selected by
|
|
-CHAREN. If the CHAREN bit is
|
|
clear, the character generator
|
|
ROM will be selected. If it is
|
|
set, the I/O chips are
|
|
accessible.
|
|
|
|
IO/RAM D000-DFFF I/O address space or RAM,
|
|
selected by -CHAREN. If the
|
|
CHAREN bit is clear, the
|
|
internal RAM will
be selected. If it is set, the
I/O chips are accessible.
|
|
|
|
I/O D000-DFFF I/O address space.
|
|
The -CHAREN line has no effect.
|
|
|
|
BASIC A000-BFFF BASIC ROM.
|
|
|
|
ROMH A000-BFFF or External ROM with the -ROMH line
|
|
E000-FFFF connected to its -CS line.
|
|
|
|
ROML 8000-9FFF External ROM with the -ROML line
|
|
connected to its -CS line.
|
|
|
|
RAM various ranges Commodore 64's internal RAM.
|
|
|
|
- 1000-7FFF and Open address space.
|
|
A000-CFFF The Commodore 64's memory chips
|
|
do not detect any memory accesses
|
|
to this area except the VIC-II's
|
|
DMA and memory refreshes.
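The column titles of the memory map can be matched mechanically. The sketch below is Python with hypothetical names; it encodes only the four address ranges that differ between the configurations, copied from the table, and looks up the column for a given state of the -LORAM, -HIRAM, -GAME and -EXROM lines.

```python
REGIONS = ("8000-9FFF", "A000-BFFF", "D000-DFFF", "E000-FFFF")

# Column title -> contents of the four regions, as given in the memory map.
MEMORY_MAP = {
    "1111": ("RAM", "BASIC", "IO/C", "Kernal"),
    "101x": ("RAM", "RAM", "IO/C", "RAM"),
    "1000": ("RAM", "RAM", "IO/RAM", "RAM"),
    "011x": ("RAM", "RAM", "IO/C", "Kernal"),
    "00x0": ("RAM", "RAM", "RAM", "RAM"),
    "001x": ("RAM", "RAM", "RAM", "RAM"),
    "1110": ("ROML", "BASIC", "IO/C", "Kernal"),
    "0100": ("RAM", "ROMH", "IO/C", "Kernal"),
    "1100": ("ROML", "ROMH", "IO/C", "Kernal"),
    "xx01": ("ROML", "-", "I/O", "ROMH"),
}


def memory_configuration(loram, hiram, game, exrom):
    """Return the region contents for the given line states (each 0 or 1)."""
    bits = f"{loram}{hiram}{game}{exrom}"
    for pattern, contents in MEMORY_MAP.items():
        # An "x" in the title means that the line does not matter.
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return dict(zip(REGIONS, contents))
    raise ValueError(f"no column matches {bits}")


if __name__ == "__main__":
    print(memory_configuration(1, 1, 1, 1))   # default
    print(memory_configuration(0, 1, 0, 1))   # Ultimax
```

Every one of the 16 possible line states matches a column, which is a quick consistency check of the table.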
|
|
|
|
NOTE: Whenever the processor tries to write to any ROM area
|
|
(Kernal, BASIC, CHAROM, ROML, ROMH), the data will get
|
|
"through the ROM" to the C64's internal RAM.
|
|
|
|
For this reason, you can easily copy data from ROM to RAM,
|
|
without any bank switching. But implementing external
|
|
memory expansions without DMA is very hard, as you have to
|
|
use a 256 byte window on the I/O1 or I/O2 area, like
|
|
GEORAM, or the Ultimax memory configuration, if you do not
|
|
want the data to be written both to internal and external
|
|
RAM.
|
|
|
|
However, this is not true for the Ultimax video game
|
|
configuration. In that mode, the internal RAM ignores all
|
|
memory accesses outside the area $0000-$0FFF, unless they
|
|
are performed by the VIC, and you can write to external
|
|
memory at $1000-$CFFF and $E000-$FFFF, if any, without
|
|
changing the contents of the internal RAM.
|
|
|
|
A note concerning the I/O area
|
|
|
|
The I/O area of the Commodore 64 is divided as follows:
|
|
|
|
Address range Owner
|
|
------------- -----
|
|
D000-D3FF MOS 6567/6569 VIC-II Video Interface Controller
|
|
D400-D7FF MOS 6581 SID Sound Interface Device
|
|
D800-DBFF Color RAM (only lower nybbles are connected)
|
|
DC00-DCFF MOS 6526 CIA Complex Interface Adapter #1
|
|
DD00-DDFF MOS 6526 CIA Complex Interface Adapter #2
|
|
DE00-DEFF User expansion #1 (-I/O1 on Expansion Port)
|
|
DF00-DFFF User expansion #2 (-I/O2 on Expansion Port)
|
|
|
|
As you can see, the address ranges for the chips are much larger
|
|
than required. Because of this, you can access the chips through
|
|
multiple memory areas. The VIC-II appears in its window every $40
|
|
addresses. For instance, the addresses $D040 and $D080 are both mapped
|
|
to the Sprite 0 X co-ordinate register. The SID has one register
|
|
selection line less, thus it appears at every $20 bytes. The CIA chips
|
|
have only 16 registers, so there are 16 copies of each in their memory
|
|
area.
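The mirroring follows directly from the number of register select lines. As a sketch (Python, hypothetical function name, using only the ranges and mirror sizes given above), an I/O address can be reduced to its owner and register number like this:

```python
def io_register(address):
    """Return (owner, register number) for an address in $D000-$DFFF."""
    if 0xD000 <= address <= 0xD3FF:
        return ("VIC-II", address & 0x3F)     # repeats every $40 addresses
    if 0xD400 <= address <= 0xD7FF:
        return ("SID", address & 0x1F)        # repeats every $20 addresses
    if 0xD800 <= address <= 0xDBFF:
        return ("Color RAM", address - 0xD800)
    if 0xDC00 <= address <= 0xDCFF:
        return ("CIA 1", address & 0x0F)      # 16 registers, 16 copies
    if 0xDD00 <= address <= 0xDDFF:
        return ("CIA 2", address & 0x0F)
    if 0xDE00 <= address <= 0xDEFF:
        return ("I/O1", address & 0xFF)
    if 0xDF00 <= address <= 0xDFFF:
        return ("I/O2", address & 0xFF)
    raise ValueError("address is outside the I/O area")


if __name__ == "__main__":
    for address in (0xD040, 0xD080, 0xDD9D, 0xDC9D):
        print(f"${address:04X}", io_register(address))
```

This is also why a read from $DD9D or $DC9D reaches register $0D of the respective CIA.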
|
|
|
|
However, you should not use addresses other than those specified by
|
|
Commodore. For instance, the Commodore 128 mapped its additional I/O
|
|
chips to this same memory area, and the SID responds only to the
|
|
addresses D400-D4FF, also when in C64 mode. And the Commodore 65, or
|
|
the C64DX, which unfortunately did not make its way to the market,
|
|
could narrow the memory window reserved for its CSG 4567 VIC-III.
|
|
|
|
The video chip
|
|
|
|
The MOS 6567/6569 VIC-II Video Interface Controller has access to
|
|
only 16 kilobytes at a time. To enable the VIC-II to access the whole
|
|
64 kB memory space, the main memory is divided into four banks of 16 kB
|
|
each. The lines PA0 and PA1 of the second CIA are the inverse of the
|
|
virtual VIC-II address lines VA14 and VA15, respectively. To select a
|
|
VIC-II bank other than the default, you must program the CIA lines to
|
|
output the desired bit pair. For instance, the following code selects
|
|
the memory area $4000-$7FFF (bank 1) for the video controller:
|
|
|
|
LDA $DD02 ; Data Direction Register A
|
|
ORA #$03 ; Set pins PA0 and PA1 to outputs
|
|
STA $DD02
|
|
LDA $DD00
|
|
AND #$FC ; Mask the lowmost bit pair off
|
|
ORA #$02 ; Select VIC-II bank 1 (the inverse of binary 01 is 10)
|
|
STA $DD00
|
|
|
|
Why should you set the pins to outputs? Hardware RESET resets all
|
|
I/O lines to inputs, and thanks to the CIA's internal pull-up
|
|
resistors, the inputs actually output logical high voltage level. So,
|
|
upon -RESET, the video bank 0 is selected automatically, and older
|
|
Kernals could leave it uninitialized.
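The inversion between the CIA port bits and the bank number is easy to get wrong, so here is the same arithmetic as a Python sketch. The function names are hypothetical, and the port value is passed in as a plain number instead of being read from $DD00.

```python
def select_vic_bank(port_a, bank):
    """New value for $DD00 that selects the given VIC-II bank (0-3)."""
    # PA0 and PA1 carry the inverse of VA14 and VA15.
    return (port_a & 0xFC) | (~bank & 0x03)


def vic_bank_base(port_a):
    """Start address of the 16 kB bank selected by a $DD00 value."""
    return (~port_a & 0x03) * 0x4000


if __name__ == "__main__":
    value = select_vic_bank(0xFF, 1)
    print(f"${value:02X}", f"${vic_bank_base(value):04X}")
```

With both lines pulled high, as after -RESET, the model gives bank 0.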
|
|
|
|
Note that the VIC-II always fetches its information from the
|
|
internal RAM, totally ignoring the memory configuration lines. There
|
|
is only one exception to this rule: The character generator ROM.
|
|
Unless the Ultimax mode is selected, VIC-II "sees" character generator
|
|
ROM in the memory areas 1000-1FFF and 9000-9FFF. If the Ultimax
|
|
configuration is active, the VIC-II fetches all data from the internal
|
|
RAM.
|
|
|
|
Accessing the memory places 0 and 1
|
|
|
|
Although the addresses 0 and 1 of the processor are hard-wired to
|
|
its on-chip I/O port registers, you can access the memory places 0 and
|
|
1. The video chip always reads from RAM (or character generator ROM),
|
|
so you can use it to read also from 0 and 1. Enable the bit-map screen
|
|
and set the start address of the graphics screen to 0. Now you can see
|
|
these two memory locations in the upper left corner. Alternatively,
|
|
you could set the character generator start address to 0, in which
|
|
case you would see these locations in @ characters (code 0). Or, you
|
|
can activate a sprite with start address 0. Whichever method you
|
|
choose, you can read these locations with sprite collision registers.
|
|
Define a sprite consisting of only one dot, and move it to read the 8
|
|
bits of each byte with the sprite to sprite or sprite to background
|
|
collision registers.
|
|
|
|
But how can you write to these locations? If you execute the command
|
|
POKE 53265,59, you will see that the memory place 1 changes its value
|
|
wildly. If you disable the interrupts (POKE 56333,127), it will remain
|
|
stable. How is this possible? When the processor writes to 0 or 1, it
|
|
will put the address on the address bus and set the R/-W line to indicate
|
|
a write cycle, but it does not put the data on the data bus. Thus, it
|
|
writes "random" data. Of course this data is not truly random. Actually
|
|
it is something that the video chip left on the bus on its clock half.
|
|
So, if you want to write a certain value on 0 or 1, you have to make the
|
|
video chip read that value just before the store cycle. This requires
|
|
very accurate timing, but it can be achieved even with a carefully
|
|
written BASIC program. Just wait for the video chip to be in the top or
|
|
bottom border and the beam to be in the middle of the screen (not in the
|
|
side borders). At this area, the video chip will always read the last
|
|
byte of the video bank (by default $3FFF). Now, if you store anything to
|
|
the I/O port registers 0 or 1 while the video chip is refreshing this
|
|
screen area, the contents of the memory place $3FFF will be written to
|
|
the respective memory place (0 or 1). Note that this trick does not work
|
|
reliably on all computers. You need good RF protection, as the data bus
|
|
will not be driven at all when the value remains on it.
|
|
|
|
On the C128 in its 2 MHz mode, you can write to the memory places
|
|
with an easier kludge. Just make sure that the video chip is not
|
|
performing the memory refresh (as it would slow down to 1 MHz in that
|
|
case), and use some instruction that reads from a proper memory location
|
|
before writing to 0 or 1. Indexed zero-page addressing modes are good
|
|
for it. I tested this trick with LDX#1 followed by STA $FF,X. As you
|
|
can read from the instruction timing section of this document, the
|
|
instruction first reads from $FF (the base address) and then writes to 0.
|
|
The timing can be done with a simple LDA$D012:CMP$D012:BEQ *-3 loop.
|
|
But in the C128 mode you can relocate the stack page to zero page, so
|
|
this trick is not really useful.
|
|
|
|
You can also read the memory places 0 and 1 much faster than with
|
|
sprite collisions. Just make the video chip read from 0 or 1, and
|
|
then read from non-connected address space ($DE00-$DFFF on a stock C64;
|
|
also $D700-$D7FF on C128's). Actually, you can produce a complete map
|
|
of the video timing on your computer by making a loop that reads from
|
|
open address space, pausing one frame and one cycle in between. And if
|
|
you are into copy protections, you could write a program on the open
|
|
address space. Just remember that there must be a byte on the bus for
|
|
each clock cycle.
|
|
|
|
These tricks unfortunately do not work reliably on all units. So far
|
|
I have had the opportunity to try it on three computers, two of which
|
|
were Commodore 128 DCR's (C128's housed in metal case with a 1571 floppy
|
|
disk drive, whose controller is integrated on the mother board). One
|
|
C128DCR drove some of its data bits too heavily to high state. No wonder,
|
|
since its housing consisted of some newspapers spread on the floor.
|
|
|
|
Autostart Code
|
|
|
|
Although this document concentrates on hardware, there is one thing
|
|
that you must know about the firmware to get complete control over
|
|
your computer. As the Commodore 64 always switches the ROMs on upon
|
|
-RESET, you cannot relocate the RESET vector by writing something in
|
|
RAM. Instead, you have to use the Autostart code that will be
|
|
recognized by the KERNAL ROM. If the memory places from $8004 through
|
|
$8008 contain the PETSCII string 'CBM80' (C3 C2 CD 38 30), the RESET
|
|
routine jumps to ($8000) and the default NMI handler jumps to ($8002).
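The check can be sketched in Python. The names below are hypothetical, and the memory image is a plain byte array; the sketch only mirrors the description above and is not taken from the Kernal code.

```python
AUTOSTART_SIGNATURE = bytes([0xC3, 0xC2, 0xCD, 0x38, 0x30])   # 'CBM80'


def autostart_vectors(memory):
    """Return (RESET target, NMI target) if the signature is present, else None."""
    if bytes(memory[0x8004:0x8009]) != AUTOSTART_SIGNATURE:
        return None
    reset_target = memory[0x8000] | (memory[0x8001] << 8)
    nmi_target = memory[0x8002] | (memory[0x8003] << 8)
    return reset_target, nmi_target


if __name__ == "__main__":
    image = bytearray(0x10000)
    image[0x8000:0x8004] = bytes([0x09, 0x80, 0x0C, 0x80])
    image[0x8004:0x8009] = AUTOSTART_SIGNATURE
    print(autostart_vectors(image))
```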
|
|
|
|
Some programs that load into RAM take advantage of this and don't
|
|
let the machine be reset. You don't have to modify the ROM to get
|
|
rid of this annoying behaviour. Simply ground the -EXROM line for the
|
|
beginning of the RESET sequence.
|
|
|
|
Notes
|
|
|
|
See the MCS 6500 Microcomputer Family Programming Manual for less
|
|
information.
|
|
|
|
References:
|
|
C64 Memory Maps C64 Programmer's Reference Guide, pp. 262-267
|
|
C64 Schematic Diagram
|
|
6510 Block Diagram C64 Programmer's Reference Guide, p. 404
|
|
Instruction Set C64 Programmer's Reference Guide, pp. 254-255, 416-417
|
|
C64/128 Real Programmer's Revenge Guide
|
|
C=Lehti magazine 4/87
|
|
|
|
---------------------------------------------------------------------------
|
|
Marko Mäkelä (Marko.Makela@HUT.FI)
|
|
|