lovebyte: long-form README

Vince Weaver 2023-02-10 23:33:26 -05:00
parent 1045460437
commit e597725d15
1 changed files with 331 additions and 104 deletions

@@ -1,131 +1,358 @@
tiny_hgr8
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tiny_hgr8
an 8-byte hi-res Apple II demo
8-byte hi-res Apple II demo by Deater / dSr
by Deater / dSr
Lovebyte 2023
Lovebyte 2023
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
I really wanted a hi-res 8-byte demo but that is trickier than you can think.
TLDR: I wrote an Apple II graphics
demo that's only 8 bytes of
6502 assembly language
On Apple II/6502 to enable graphics you need three bytes, either
LINK
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
I really wanted to make a hi-res 8-byte
demo but that is trickier than you
might think.
=== THE CHALLENGE ===
The Apple II has a 6502 processor in it.
To enable hi-res graphics you need three
bytes, typically a jump to the HGR
routine in the Applesoft BASIC ROM:
JSR HGR
which takes 3 bytes to jump to the ROM and enable graphics, clear the screen,
and set which PAGE is being viewed.
You can also try setting the graphics "soft switches" yourself, something like
BIT $C050
which is also 3 bytes, but to get hi-res you also need to set the hi-res
switch, which costs another 3 bytes.
The HGR routine will flip the proper
soft-switches to enable graphics mode,
enable split graphics/text mode, select
viewing the 8K of graphics info in PAGE1
and then clear the screen to black.
(The nearby HGR2 call is similar but
makes the graphics full-screen and
uses PAGE2 instead).
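As a rough sketch of why the ROM call
wins on size, here is approximately
what the mode setup would cost if done
by hand with soft-switches. The labels
are the conventional names for these
addresses, not something the demo
defines, and the screen clear is not
included:

```asm
; each BIT abs is 3 bytes: 12 total
TXTCLR = $C050  ; graphics, not text
MIXSET = $C053  ; split graphics/text
LOWSCR = $C054  ; display PAGE1
HIRES  = $C057  ; hi-res, not lo-res

        BIT TXTCLR
        BIT HIRES
        BIT MIXSET
        BIT LOWSCR
```

A single JSR HGR covers all four
switches plus the clear in 3 bytes.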
Once you set hi-res mode, you still need to draw to graphics memory.
The various ways of doing this like calling HPLOT need setup in A,X and Y
as well as the HCOLOR value so this can take a lot of bytes.
My 16-byte entries use the shapetable/XDRAW interface but even when x-or
drawing you usually still have to call HPOSN to set up some zero-page values
like GBASL/GBASH first, and you can't count on them having good values
at boot. You can try drawing directly to screen memory at $2000 or $4000,
but that usually takes 3 bytes too and if you want to draw on the full screen
(which is 8k) you need to increment two bytes of addresses. In theory it's
a byte smaller if you have a pointer in the zero page, but unfortunately
that doesn't happen by default.
Once you set hi-res mode you still
need to draw some graphics. It is
hard to do this compactly. The most
obvious way is the ROM HPLOT call,
but this depends on the A, X, and Y
registers holding the screen
co-ordinates as well as the desired
color being set up at a zero page
location.
So in theory to do hi-res it takes 3 bytes to init and at least 3 to draw,
and then finally if you want a loop that takes 2 bytes. So we're at
8 bytes and no room for demo effects like actually changing the color.
So what can we do?
When I create 16 byte demos I often
use the built-in ROM vector drawing
shapetable/XDRAW functionality which
avoids the need for color setting
because it just XORs pixels.
However you still usually need to
call the HPOSN routine to set up
the co-ordinate values in the zero
page such as GBASL/GBASH. The default
values from uninitialized RAM at boot
usually aren't useful.
One trick I used for a previous 8-byte lo-res entry is to abuse some code put into
the zero page by the Applesoft ROM (so on any Apple II from the Apple II+
onward, which is most of them). This is the CHRGET code for stepping
through BASIC programs, which is put in the zero page by the ROM on boot
so the address being loaded can be self-modified. Part of this routine
does a 16-bit increment into the self modified region, followed by 7 bytes
of code ending in a branch instruction. So if we can drop our 8 bytes
of code into this area here (starting roughly at $B1) we can get the benefits
of the increment as well as the branch, and have a few more bytes to work with.
You can try drawing directly to screen
memory at addresses $2000 (PAGE1)
or $4000 (PAGE2), but that takes
3 bytes and if you want to draw to
all 8K of the screen you need to
have a way to increment a 16-bit
pointer. If we were lucky at boot
there'd be an indirect pointer in
the zero page with a good address
for this, but alas there isn't.
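To make the size difference concrete,
here are the two store forms side by
side. This is only an illustration:
it assumes $26/$27 (GBASL/GBASH) is
the zero-page pointer, which the demo
itself never sets up.

```asm
; absolute,X: 3 bytes (9D 00 20)
; X alone only reaches $2000-$20FF
        STA $2000,X

; (zp),Y: 2 bytes (91 26), but only
; useful if $26/$27 already point
; somewhere inside screen memory
        STA ($26),Y
```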
So for this code to work we use two calls into the ROM. One to clear
the screen to full-screen hi-res.
jsr HGR2
As said before this sets the graphics modes, in this case full-screen hi-res
displaying PAGE2 ($4000). It does a linear clear of the screen to 0 (black),
but on the Apple II due to the weird way Woz designed the graphics memory
map this gives a horizontal venetian-blind effect which looks pretty neat.
So to summarize, to do hi-res graphics
it takes 3 bytes to init, at least 3
to draw a pixel, and then 2 bytes for
a loop. We're at 8-bytes already and
we haven't even done anything useful
like increment the pixel location or
change the color.
The other thing we call into is
jsr BKGND0
this is a semi-unofficial entry point into the HGR2 code, the portion
that does the screen clear. It will clear the screen with the bit-pattern
in the accumulator.
So is all hope lost?
So for this demo we just clear the screen to a random bit pattern (which
gives a variety of colors) and then immediately re-clear the screen to zero
over and over again.
=== THE CHRGET TRICK ===
You might say, doesn't that only take 6-bytes of code? Well we need to
set a random value in the accumulator. To get one, we load our code so that it
over-writes the CHRGET address being loaded. By default it is $800,
the default load address of BASIC programs. If we can point this value
to somewhere more interesting, like into ROM, it will treat the code
there as random values. The problem is when we load our demo these bytes
will be the first things executed so we have to make sure they get executed
harmlessly as no-ops. An obvious choice that points to rom would be
$EAEA, or two NOPs. We'll see in a minute though there are some
complications here.
We can use a trick I found in a
previous lo-res graphics entry
shown at Lovebyte 2022.
So if we drop a call to BKGND0 followed by a call to HGR2 and have it
followed up by the existing CHRGET BEQ instruction we have what we need,
as HGR2 always exits with Y=0 and the Zero flag set.
Try and run this though and the text screen will go weird and your program
will crash into the monitor (unhelpfully with the machine in graphics
mode so hard to tell what's going on).
We can abuse some code put into the
zero page by the Applesoft ROM at
boot (this is available on any
Apple II from the Apple II+ onward,
which is to say most of them).
The problem here is BKGND0 assumes the first page of graphics you want
to write to are in zero-page location HGR_PAGE $E6. On bootup this is likely
$00 or $FF, so the routine happily writes your color across the first 8K
of RAM which is where the zero-page, stack, and your code live.
So not good. So we need a way to skip BKGND0 the first time through
the loop.
The ROM uses this code when parsing
BASIC programs, and it is apparently
put into the zero page so the address
being loaded can be self-modified.
If we were entering the code from the keyboard it would be fine, we could
just specify the start after the BKGND0 code. However we'd like this able
to be BRUN from disk. So what can we do?
The code looks like this:
Well there's one way to sneakily skip code on 6502. This is the famous
BIT instruction. If you put the first byte of a BIT instruction in your
code, it will treat the next 2 bytes as a value to check bits on which
is (usually) harmless. So if we load our code into the middle of
the 16-bit LDA instruction in CHRGET, start on a bit instruction, it will
skip the next 2 bytes the first time through, but when the loop happens
this bit instruction will be part of the load address to LDA and so
no skipping happens the rest of the executions. This is good, as the
call to HGR2 does properly set $E6 to the graphics page we want and
BKGND0 will work properly after that.
CHRGET:
00B1-   E6 B8       INC   $B8
00B3-   D0 02       BNE   $00B7
00B5-   E6 B9       INC   $B9
00B7-   AD 05 02    LDA   $0205
00BA-   C9 3A       CMP   #$3A
00BC-   B0 0A       BCS   $00C8
00BE-   C9 20       CMP   #$20
00C0-   F0 EF       BEQ   $00B1
There is a problem though, the code the first time through eats the two
next bytes, avoiding the JSR to BKGND0. But it means the following
two bytes, the $F3F4 ($F4, $F3 in little endian) bytes get executed as
code. Will that be a problem? It turns out those are un-specified
opcodes on both 6502 and 65c02 but on both chips those apparently
are treated as NOP and so our code works. With the BIT in place
the "random" memory values are pulled initially from
$EA2C (where 2c is the bit, and EA can be arbitrary but why not use
a NOP. In theory we could alter the colors we get by moving things around).
What the code originally does is not
important, what is interesting is that
it does a 16-bit increment of the
address of the load accumulator
instruction at $B7, and there's
a convenient BEQ (branch if equal)
back to the beginning of the routine
at $C0. If we drop our code in
between these two chunks of code we
can just barely do some interesting
graphics.
The two values at the beginning are incremented in a self-modified way
by the earlier unchanged CHRGET code so we walk through ROM getting
random color patterns in the accumulator, writing them to the screen,
and quickly clearing back to black again in a venetian-blind
pattern. It actually looks lovely, much nicer than some 16-byte
demos I've done.
=== THE PLAN ===
You can try things out on your own Apple II with
the following commands from the BASIC prompt
The first thing we need to do is get
into hi-res graphics mode. As
discussed earlier doing a 3-byte
jsr HGR2
will do this. It uses soft-switches
to enable graphics, switch to hi-res,
set it to full-screen (no text), and
finally to get the graphics from
PAGE2 ($4000). It then drops into
a routine that does a linear clear of
the screen to color 0 (black). This
might seem boring, but on the Apple II
due to the weird (and clever) way Woz
designed the DRAM/video refresh
circuitry this gives a venetian-blind
effect which looks pretty neat.
This is great, but we want some pretty
pixels on the screen too. It turns
out that if we jump into the middle
of the previously mentioned routine
we can hit the screen clearing
code at a point where it is drawing
the pattern in the A register to
the screen. So if we do a
jsr BKGND0
it will fill the screen with a nice
pattern. This is an unofficial entry
point in the ROM, but for various
complex reasons involving the license
with Microsoft it turns out Apple never
updated the Applesoft BASIC ROMs despite
there being various known bugs, so the
address can be relied upon.
So now we in theory have 6 bytes of
code we can drop into the middle of
the CHRGET routine to repeatedly fill
the screen with a color and then clear
it to black, with a nice blinds effect
between them.
That's boring though, can we switch
up the colors drawn? It'd be nice
to load a random value into the
accumulator (A register) before the
call to fill the screen. The existing
code does a load from an always-
incrementing 16-bit address, let's
point it into the ROM code and that
can act as a random enough series
of bytes.
== LOAD ADDRESS CONSIDERATIONS ==
By default the load address is $800,
the default load address of BASIC
programs. We want to point it to ROM
which is at the top of the address
space. The easiest way to do this
is just have some high address bytes
at the start of the code and just load
the program so it drops into the middle
of the LDA instruction.
If we were running code by entering
it into the assembly language monitor
that would be fine, we could load
the bytes and then jump to an arbitrary
memory offset. However for the
competition we are going to load from
disk so we have to start executing
from the start of our binary. This
means these address bytes also need
to be valid code with no bad side
effects. An obvious choice would
be the no-operation NOP instruction,
which is $EA and $EAEA points nicely
into the ROM. It turns out there
are some complications with doing
this.
=== WHEREIN WE GET A BEEP AND ===
====== A TEXT SCREEN OF Ws ======
So we set our code to load in
the middle of CHRGET, calling BKGND0
first as the needed color pattern is
in A. We can't call HGR2 first as
it always will reset A to be $60.
We run this though, and you'll get
a text screen filled with characters
as it crashes to the monitor.
The problem here is BKGND0 assumes the
first page of graphics memory you want
to write to is stored in the zero-page
location HGR_PAGE ($E6). On bootup this is
likely $00 or $FF, so when you call
the routine it happily writes your
color pattern across the first 8k
of RAM which unfortunately is where the
zero-page, stack, and your code live.
Not Good.
We need a way to skip BKGND0 the first
time through the loop.
=== SKIPPING CHUNKS OF INSTRUCTIONS ===
= SURPRISINGLY YOU DO THIS A LOT WHEN =
======= WRITING 6502 ASSEMBLY ========
There's one famous way to skip ahead
on the 6502. This is to use the BIT
instruction. By putting a $2C byte
in your code it will do a BIT
(logical AND that sets the flags but
throws away the result), using the
two bytes you want to skip as its
address. This is usually harmless
(unless that address points to a
soft-switch). You can use
this trick to compactly have code
where you can jump into the middle
of the BIT instruction to execute
the two address bytes as code,
or otherwise execute the code as sort
of a 3-byte almost NOP.
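A generic sketch of the trick (not
taken from the demo; the labels and
the RESULT location are made up for
illustration):

```asm
RESULT = $06    ; hypothetical zp byte

one:    LDA #$01
        .byte $2C  ; BIT abs opcode
two:    LDA #$02   ; bytes A9 02
        STA RESULT
```

Entering at "one", the $2C turns the
A9 02 that follows into the operand of
a BIT $02A9, so A still holds $01 when
it is stored. Entering at "two", the
LDA #$02 runs normally and $02 is
stored instead.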
We can construct our code so the
entry point is a BIT instruction
that skips the first JSR, but later
loop iterations branch earlier and
instead the BIT is part of the address
to the LDA instruction and the JSR
happens as normal.
So the first time through HGR2 gets
called which usefully sets up the
HGR_PAGE value in $E6 to a good
value so the BKGND0 call works in
all future loop iterations.
=== ALMOST ON THE HOME STRETCH ===
We should be just about there, right?
There is a problem though, the first
time through the loop the BIT consumes
the next two bytes, avoiding the
JSR to BKGND0. However it means
the address of BKGND0, $F3F4,
(actually $F4, $F3 as the 6502 is
little-endian) get executed as code.
Is this a problem?
It turns out those two instructions
are invalid opcodes on both 6502
and 65c02 processors. Luckily, though,
instead of trapping like a modern
processor would the processor tries
to execute them anyway. You can
look up the side effects for these
invalid instructions online; on the
NMOS 6502 at least, you get behavior
based on the don't-care terms in the
instruction decode PLA (programmable
logic array). Happily though in
our case the instructions are close
enough to NOPs that our code will
work.
=== POINTING TO ROM ===
So with the BIT in place the last
step is to make sure we are pointing
to ROM when we load the accumulator.
If we load at address $B8 we can
have the $2C of the BIT act as the low
byte of the LDA instruction's address,
and the high byte can be anything we
want. I arbitrarily put a NOP ($EA)
there, even though it never gets
executed as code, because $EA gives a
nice "random" set of color patterns
starting at $EA2C (if you're curious,
this is in the Floating Point addition
routine).
=== FINALLY, THE LOOP ===
We can't forget we need to loop.
If we load at $B8, this stops just
short of the BEQ branch-if-equal
instruction back to the beginning.
BEQ checks the Zero flag, but luckily
the HGR2 call always ends with the
Zero flag set so this nicely turns
the BEQ into a branch-always.
=== ALL FINISHED ===
The program loads, it skips the
first color fill, inits the screen,
then loops back alternately setting
and clearing the screen based on
a color pattern from an incrementing
part of ROM, leading to a colorful
animated venetian-blind pattern.
It actually looks lovely, arguably
nicer than many of the 16-byte intros
I've done.
=== TRY IT FOR YOURSELF ===
On an Apple II (or emulator) get to
the ']' BASIC prompt and enter
these commands to run it for yourself:
CALL -151
B8: 2c ea 20 f4 f3 20 d8 f3
B8G
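For reference, here is how those 8
bytes read once they are sitting inside
CHRGET. This is a hand disassembly
pieced together from the CHRGET listing
and the addresses given in this
writeup, not assembler output. "ll hh"
stands for the self-modified address
bytes at $B8/$B9.

```asm
; first pass: enter at $B8
00B8- 2C EA 20  BIT $20EA ; eats EA,20
00BB- F4 F3     ; invalid, NOP-like
00BD- 20 D8 F3  JSR $F3D8 ; HGR2
00C0- F0 EF     BEQ $00B1 ; Z=1: taken

; later passes: enter at $B1
00B1- E6 B8     INC $B8   ; addr low
00B3- D0 02     BNE $00B7
00B5- E6 B9     INC $B9   ; addr high
00B7- AD ll hh  LDA $hhll ; ROM byte
00BA- 20 F4 F3  JSR $F3F4 ; BKGND0
00BD- 20 D8 F3  JSR $F3D8 ; HGR2
00C0- F0 EF     BEQ $00B1
```

The BEQ offset $EF is -17, and
$C2 - 17 = $B1, which is how the loop
lands back on the 16-bit increment.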
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
by Vince `deater` Weaver
http://www.deater.net/weave
11 February 2023
with apologies to 4AM for vaguely
stealing his writeup format