dos33fsprogs/demos/lovebyte2023/tinyhgr_8/README

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
               tiny_hgr8
    an 8-byte hi-res Apple II demo

             by Deater / dSr

              Lovebyte 2023
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

TLDR: I wrote an Apple II graphics
      demo that's only 8 bytes of
      6502 assembly language

      LINK

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

I really wanted to make a hi-res 8-byte
demo but that is trickier than you
might think.

=== THE CHALLENGE ===

The Apple II has a 6502 processor in it.

To enable hi-res graphics you need three
bytes, typically a jump to the HGR
routine in the Applesoft BASIC ROM:
	JSR	HGR

The HGR routine will flip the proper
soft-switches to enable graphics mode,
enable split graphics/text mode, select
viewing the 8K of graphics info in PAGE1
and then clear the screen to black.
(The nearby HGR2 call is similar but
makes the graphics full-screen and
uses PAGE2 instead).

Once you set hi-res mode you still
need to draw some graphics.  It is
hard to do this compactly.  The most
obvious way is the ROM HPLOT call,
but this depends on the A, X, and Y
registers holding the screen
co-ordinates as well as the desired
color being set up at a zero page
location.

When I create 16 byte demos I often
use the built-in ROM vector drawing
shapetable/XDRAW functionality which
avoids the need for color setting
because it just XORs pixels.
However you still usually need to
call the HPOSN routine to set up
the co-ordinate values in the zero
page such as GBASL/GBASH.  The default
values from uninitialized RAM at boot
usually aren't useful.

You can try drawing directly to screen
memory at addresses $2000 (PAGE1)
or $4000 (PAGE2), but that takes
3 bytes and if you want to draw to
all 8K of the screen you need to
have a way to increment a 16-bit
pointer.  If we were lucky at boot
there'd be an indirect pointer in
the zero page with a good address
for this, but alas there isn't.

So to summarize, to do hi-res graphics
it takes 3 bytes to init, at least 3
to draw a pixel, and then 2 bytes for
a loop.  We're at 8-bytes already and
we haven't even done anything useful
like increment the pixel location or
change the color.

So is all hope lost?

=== THE CHRGET TRICK ===

We can use a trick I found in a
previous lo-res graphics entry
shown at Lovebyte 2022.

We can abuse some code put into the
zero page by the Applesoft ROM at
boot (this is available on any 
Apple II from the Apple II+  onward,
which is to say most of them).

The ROM uses this code when parsing
BASIC programs, and it is apparently
put into the zero page so the address
being loaded can be self-modified.

The code looks like this:

CHRGET:
00B1- E6 B8      INC $B8
00B3- D0 02      BNE $00B7
00B5- E6 B9      INC $B9
00B7- AD 05 02   LDA $0205
00BA- C9 3A      CMP #$3A
00BC- B0 0A      BCS $00C8
00BE- C9 20      CMP #$20
00C0- F0 EF      BEQ 00B1

What the code originally does is not
important, what is interesting is that
it does a 16-bit increment of the
address of the load accumulator
instruction at $B7, and there's
a convenient BEQ (branch if equal)
back to the beginning of the routine
at $C0.  If we drop our code in
between these two chunks of code we
can just barely do some interesting
graphics.

=== THE PLAN ===

The first thing we need to do is get
into hi-res graphics mode.  As
discussed earlier doing a 3-byte
      jsr    HGR2
will do this.  It uses soft-switches
to enable graphics, switch to hi-res,
set it to full-screen (no text), and
finally to get the graphics from
PAGE2 ($4000).  It then drops into
a routine that does a linear clear of
the screen to color 0 (black).  This
might seem boring, but on the Apple II
due to the weird (and clever) way Woz
designed the DRAM/video refresh
circuitry this gives a venetian-blind
effect which looks pretty neat.

This is great, but we want some pretty
pixels on the screen too.  It turns
out that if we jump into the middle
of the previously mentioned routine
we can hit the screen clearing
code at a point where it is drawing
the pattern in the A register to
the screen.  So if we do a
        jsr    BKGND0
it will fill the screen with a nice
pattern.  This is an unofficial entry
point in the ROM, but for various
complex reasons involving the license
with Microsoft it turns out Apple never
updated the Applesoft BASIC ROMs despite
there being various known bugs.

So now we in theory have 6 bytes of
code we can drop into the middle of
the CHRGET routine and theory have it
repeatedly clear the screen to a color
and then clear it to black, with a
nice blinds effect between them.

That's boring though, can we switch
up the colors drawn?  It'd be nice
to load a random value into the
accumulator (A register) before the
call to fill the screen.  The existing
code does a load from an always-
incrementing 16-bit address, let's
point it into the ROM code and that
can act as a random enough series
of bytes.

== LOAD ADDRESS CONSIDERATIONS ==

By default the load address is $800, 
the default load address of BASIC
programs.  We want to point it to ROM
which is at the top of the address
space.  The easiest way to do this
is just have some high address bytes
at the start of the code and just load
the program so it drops into the middle
of the LDA instruction.

If we were running code by entering
it into the assembly language monitor
that would be fine, we could load
the bytes and then jump to an arbitrary
memory offset.  However for the
competition we are going to load from
disk so we have to start executing
from the start of our binary.  This
means these address bytes also need
to be valid code with no bad side
effects.  An obvious choice would
be the no-operation NOP instruction,
which is $EA and $EAEA points nicely
into the ROM.  It turns out there
are some complications with doing
this.

=== WHEREIN WE GET A BEEP AND  ===
====== A TEXT SCREEN OF Ws ======

So we set our code to load in
the middle of CHRGET, calling BKGND0
first as the needed color pattern is
in A.  We can't call HGR2 first as
it always will reset A to be $60.

We run this though, and you'll get
a text screen filled with characters
as it crashes to the monitor.

The problem here is BKGND0 assumes the
value of the first page of graphics
you want to is in zero-page location 
HGR_PAGE $E6.  On bootup this is 
likely $00 or $FF, so when you call
the routine it happily writes your 
color pattern across the first 8k
of RAM which unfortunately is where the 
zero-page, stack, and your code live.
Not Good.

We need a way to skip BKGND0 the first 
time through the loop.


=== SKIPPING CHUNKS OF INSTRUCTIONS ===
= SURPRISINGLY YOU DO THIS A LOT WHEN =
======= WRITING 6502 ASSEMBLY  ========

There's one famous way to skip ahead
on the 6502.  This is to use the BIT
instruction.  By putting a $2C byte
in your code it will do a BIT 
(logical AND to set bits but throw
away the result) with the address
being the two bytes after it you
want to skip.  This is usually
harmless (unless those address bits
point to a soft-switch).  You can use
this trick to compactly have code
where you can jump into the middle
of the BIT instruction to execute
the two address bytes as code,
or otherwise execute the code as sort
of a 3-byte almost NOP.

We can construct our code so the
entry point is a BIT instruction
that skips the first JSR, but later
loop iterations branch earlier and
instead the BIT is part of the address
to the LDA instruction and the JSR
happens as normal.

So the first time through HGR2 gets
called which usefully sets up the
HGR_PAGE value in $E6 to a good
value so the BKGND0 call works in
all future loop iterations.

=== ALMOST ON THE HOME STRETCH ===

We should be just about there, right?

There is a problem though, the first
time through the loop the BIT consumes
the next two bytes, avoiding the 
JSR to BKGND0.  However it means
the address of BKGND0, $F3F4,
(actually $F4, $F3 as the 6502 is
little-endian) get executed as code.
Is this a problem?

It turns out those two instructions
are invalid opcodes on both 6502
and 65c02 processors.  Luckily, though,
instead of trapping like a modern
processor would the processor tries
to execute them anyway.  You can 
look up the side effects for these
invalid instructions online, on the
NMOS 6502 at least you get behavior
based on the don't care terms in the
instruction PLA.  Happily though in
our case the instructions are close
enough to NOPs that our code will
work.

=== POINTING TO ROM ===

So with the BIT in place the last
step is to make sure we are pointing
to ROM when we load the accumulator.

If we load at address $B8 we can
have $2C of the bit as the low
byte of the LDA instruction, and
the high byte can be anything we want.
I arbitrarily put a NOP there even
though the code never gets executed
as $EA works to give a nice "random"
set of color patterns starting
at $EA2C (If you're curious, this is
in the Floating Point addition routine).

=== FINALLY, THE LOOP ===

We can't forget we need to loop.
If we load at $B8, this stops just
short of the BEQ branch-if-equal
instruction back to the beginning.
BEQ checks the Zero flag, but luckily
the HGR2 call always ends with the
Zero flag set so this nicely turns
the BEQ into a branch-always.

=== ALL FINISHED ===

The program loads, it skips the
first color fill, inits the screen,
then loops back alternately setting
and clearing the screen based on
a color pattern from an incrementing
part of ROM, leading to a colorful
animated venetian-blind pattern.

It actually looks lovely, arguably
nicer than many of the 16-byte intros
I've done.


=== TRY IT FOR YOURSELF ===

On an Apple II (or emulator) get to
the ']' BASIC prompt and enter
these commands to run it for yourself:

CALL -151
B8: 2c ea 20 f4 f3 20 d8 f3 
B8G

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

by Vince `deater` Weaver
   http://www.deater.net/weave
   11 February 2023

with apologies to 4AM for vaguely
     stealing his writeup format
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=`
			`tiny_hgr8`
			`an 8-byte hi-res Apple II demo`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`by Deater / dSr`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`Lovebyte 2023`
			`=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`TLDR: I wrote an Apple II graphics`
			`demo that's only 8 bytes of`
			`6502 assembly language`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`LINK`

			`=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=`

			`I really wanted to make a hi-res 8-byte`
			`demo but that is trickier than you`
			`might think.`

			`=== THE CHALLENGE ===`

			`The Apple II has a 6502 processor in it.`

			`To enable hi-res graphics you need three`
			`bytes, typically a jump to the HGR`
			`routine in the Applesoft BASIC ROM:`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00			`JSR HGR`
lovebyte: long-form README 2023-02-11 04:33:26 +00:00
			`The HGR routine will flip the proper`
			`soft-switches to enable graphics mode,`
			`enable split graphics/text mode, select`
			`viewing the 8K of graphics info in PAGE1`
			`and then clear the screen to black.`
			`(The nearby HGR2 call is similar but`
			`makes the graphics full-screen and`
			`uses PAGE2 instead).`

			`Once you set hi-res mode you still`
			`need to draw some graphics. It is`
			`hard to do this compactly. The most`
			`obvious way is the ROM HPLOT call,`
			`but this depends on the A, X, and Y`
			`registers holding the screen`
			`co-ordinates as well as the desired`
			`color being set up at a zero page`
			`location.`

			`When I create 16 byte demos I often`
			`use the built-in ROM vector drawing`
			`shapetable/XDRAW functionality which`
			`avoids the need for color setting`
			`because it just XORs pixels.`
			`However you still usually need to`
			`call the HPOSN routine to set up`
			`the co-ordinate values in the zero`
			`page such as GBASL/GBASH. The default`
			`values from uninitialized RAM at boot`
			`usually aren't useful.`

			`You can try drawing directly to screen`
			`memory at addresses $2000 (PAGE1)`
			`or $4000 (PAGE2), but that takes`
			`3 bytes and if you want to draw to`
			`all 8K of the screen you need to`
			`have a way to increment a 16-bit`
			`pointer. If we were lucky at boot`
			`there'd be an indirect pointer in`
			`the zero page with a good address`
			`for this, but alas there isn't.`

			`So to summarize, to do hi-res graphics`
			`it takes 3 bytes to init, at least 3`
			`to draw a pixel, and then 2 bytes for`
			`a loop. We're at 8-bytes already and`
			`we haven't even done anything useful`
			`like increment the pixel location or`
			`change the color.`

			`So is all hope lost?`

			`=== THE CHRGET TRICK ===`

			`We can use a trick I found in a`
			`previous lo-res graphics entry`
			`shown at Lovebyte 2022.`

			`We can abuse some code put into the`
			`zero page by the Applesoft ROM at`
			`boot (this is available on any`
			`Apple II from the Apple II+ onward,`
			`which is to say most of them).`

			`The ROM uses this code when parsing`
			`BASIC programs, and it is apparently`
			`put into the zero page so the address`
			`being loaded can be self-modified.`

			`The code looks like this:`

			`CHRGET:`
			`00B1- E6 B8 INC $B8`
			`00B3- D0 02 BNE $00B7`
			`00B5- E6 B9 INC $B9`
			`00B7- AD 05 02 LDA $0205`
			`00BA- C9 3A CMP #$3A`
			`00BC- B0 0A BCS $00C8`
			`00BE- C9 20 CMP #$20`
			`00C0- F0 EF BEQ 00B1`

			`What the code originally does is not`
			`important, what is interesting is that`
			`it does a 16-bit increment of the`
			`address of the load accumulator`
			`instruction at $B7, and there's`
			`a convenient BEQ (branch if equal)`
			`back to the beginning of the routine`
			`at $C0. If we drop our code in`
			`between these two chunks of code we`
			`can just barely do some interesting`
			`graphics.`

			`=== THE PLAN ===`

			`The first thing we need to do is get`
			`into hi-res graphics mode. As`
			`discussed earlier doing a 3-byte`
			`jsr HGR2`
			`will do this. It uses soft-switches`
			`to enable graphics, switch to hi-res,`
			`set it to full-screen (no text), and`
			`finally to get the graphics from`
			`PAGE2 ($4000). It then drops into`
			`a routine that does a linear clear of`
			`the screen to color 0 (black). This`
			`might seem boring, but on the Apple II`
			`due to the weird (and clever) way Woz`
			`designed the DRAM/video refresh`
			`circuitry this gives a venetian-blind`
			`effect which looks pretty neat.`

			`This is great, but we want some pretty`
			`pixels on the screen too. It turns`
			`out that if we jump into the middle`
			`of the previously mentioned routine`
			`we can hit the screen clearing`
			`code at a point where it is drawing`
			`the pattern in the A register to`
			`the screen. So if we do a`
			`jsr BKGND0`
			`it will fill the screen with a nice`
			`pattern. This is an unofficial entry`
			`point in the ROM, but for various`
			`complex reasons involving the license`
			`with Microsoft it turns out Apple never`
			`updated the Applesoft BASIC ROMs despite`
			`there being various known bugs.`

			`So now we in theory have 6 bytes of`
			`code we can drop into the middle of`
			`the CHRGET routine and theory have it`
			`repeatedly clear the screen to a color`
			`and then clear it to black, with a`
			`nice blinds effect between them.`

			`That's boring though, can we switch`
			`up the colors drawn? It'd be nice`
			`to load a random value into the`
			`accumulator (A register) before the`
			`call to fill the screen. The existing`
			`code does a load from an always-`
			`incrementing 16-bit address, let's`
			`point it into the ROM code and that`
			`can act as a random enough series`
			`of bytes.`

			`== LOAD ADDRESS CONSIDERATIONS ==`

			`By default the load address is $800,`
			`the default load address of BASIC`
			`programs. We want to point it to ROM`
			`which is at the top of the address`
			`space. The easiest way to do this`
			`is just have some high address bytes`
			`at the start of the code and just load`
			`the program so it drops into the middle`
			`of the LDA instruction.`

			`If we were running code by entering`
			`it into the assembly language monitor`
			`that would be fine, we could load`
			`the bytes and then jump to an arbitrary`
			`memory offset. However for the`
			`competition we are going to load from`
			`disk so we have to start executing`
			`from the start of our binary. This`
			`means these address bytes also need`
			`to be valid code with no bad side`
			`effects. An obvious choice would`
			`be the no-operation NOP instruction,`
			`which is $EA and $EAEA points nicely`
			`into the ROM. It turns out there`
			`are some complications with doing`
			`this.`

			`=== WHEREIN WE GET A BEEP AND ===`
			`====== A TEXT SCREEN OF Ws ======`

			`So we set our code to load in`
			`the middle of CHRGET, calling BKGND0`
			`first as the needed color pattern is`
			`in A. We can't call HGR2 first as`
			`it always will reset A to be $60.`

			`We run this though, and you'll get`
			`a text screen filled with characters`
			`as it crashes to the monitor.`

			`The problem here is BKGND0 assumes the`
			`value of the first page of graphics`
			`you want to is in zero-page location`
			`HGR_PAGE $E6. On bootup this is`
			`likely $00 or $FF, so when you call`
			`the routine it happily writes your`
			`color pattern across the first 8k`
			`of RAM which unfortunately is where the`
			`zero-page, stack, and your code live.`
			`Not Good.`

			`We need a way to skip BKGND0 the first`
			`time through the loop.`


			`=== SKIPPING CHUNKS OF INSTRUCTIONS ===`
			`= SURPRISINGLY YOU DO THIS A LOT WHEN =`
			`======= WRITING 6502 ASSEMBLY ========`

			`There's one famous way to skip ahead`
			`on the 6502. This is to use the BIT`
			`instruction. By putting a $2C byte`
			`in your code it will do a BIT`
			`(logical AND to set bits but throw`
			`away the result) with the address`
			`being the two bytes after it you`
			`want to skip. This is usually`
			`harmless (unless those address bits`
			`point to a soft-switch). You can use`
			`this trick to compactly have code`
			`where you can jump into the middle`
			`of the BIT instruction to execute`
			`the two address bytes as code,`
			`or otherwise execute the code as sort`
			`of a 3-byte almost NOP.`

			`We can construct our code so the`
			`entry point is a BIT instruction`
			`that skips the first JSR, but later`
			`loop iterations branch earlier and`
			`instead the BIT is part of the address`
			`to the LDA instruction and the JSR`
			`happens as normal.`

			`So the first time through HGR2 gets`
			`called which usefully sets up the`
			`HGR_PAGE value in $E6 to a good`
			`value so the BKGND0 call works in`
			`all future loop iterations.`

			`=== ALMOST ON THE HOME STRETCH ===`

			`We should be just about there, right?`

			`There is a problem though, the first`
			`time through the loop the BIT consumes`
			`the next two bytes, avoiding the`
			`JSR to BKGND0. However it means`
			`the address of BKGND0, $F3F4,`
			`(actually $F4, $F3 as the 6502 is`
			`little-endian) get executed as code.`
			`Is this a problem?`

			`It turns out those two instructions`
			`are invalid opcodes on both 6502`
			`and 65c02 processors. Luckily, though,`
			`instead of trapping like a modern`
			`processor would the processor tries`
			`to execute them anyway. You can`
			`look up the side effects for these`
			`invalid instructions online, on the`
			`NMOS 6502 at least you get behavior`
			`based on the don't care terms in the`
			`instruction PLA. Happily though in`
			`our case the instructions are close`
			`enough to NOPs that our code will`
			`work.`

			`=== POINTING TO ROM ===`

			`So with the BIT in place the last`
			`step is to make sure we are pointing`
			`to ROM when we load the accumulator.`

			`If we load at address $B8 we can`
			`have $2C of the bit as the low`
			`byte of the LDA instruction, and`
			`the high byte can be anything we want.`
			`I arbitrarily put a NOP there even`
			`though the code never gets executed`
			`as $EA works to give a nice "random"`
			`set of color patterns starting`
			`at $EA2C (If you're curious, this is`
			`in the Floating Point addition routine).`

			`=== FINALLY, THE LOOP ===`

			`We can't forget we need to loop.`
			`If we load at $B8, this stops just`
			`short of the BEQ branch-if-equal`
			`instruction back to the beginning.`
			`BEQ checks the Zero flag, but luckily`
			`the HGR2 call always ends with the`
			`Zero flag set so this nicely turns`
			`the BEQ into a branch-always.`

			`=== ALL FINISHED ===`

			`The program loads, it skips the`
			`first color fill, inits the screen,`
			`then loops back alternately setting`
			`and clearing the screen based on`
			`a color pattern from an incrementing`
			`part of ROM, leading to a colorful`
			`animated venetian-blind pattern.`

			`It actually looks lovely, arguably`
			`nicer than many of the 16-byte intros`
			`I've done.`


			`=== TRY IT FOR YOURSELF ===`

			`On an Apple II (or emulator) get to`
			`the ']' BASIC prompt and enter`
			`these commands to run it for yourself:`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
			`CALL -151`
			`B8: 2c ea 20 f4 f3 20 d8 f3`
			`B8G`

lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			by Vince `deater` Weaver
			`http://www.deater.net/weave`
			`11 February 2023`
lovebyte2023: add README on the 8-byte code much longer than the actual program 2023-02-10 16:31:28 +00:00
lovebyte: long-form README 2023-02-11 04:33:26 +00:00			`with apologies to 4AM for vaguely`
			`stealing his writeup format`