=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
               tiny_hgr8
    an 8-byte hi-res Apple II demo

             by Deater / dSr

              Lovebyte 2023
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

TLDR: I wrote an Apple II graphics
      demo that's only 8 bytes of
      6502 assembly language

      LINK: 
	https://youtu.be/8QYezzXC9PA

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

I really wanted to make a hi-res 8-byte
demo but that is trickier than you
might think.

=== THE CHALLENGE ===

The Apple II has a 6502 processor.

To enable hi-res graphics you need three
bytes, typically a JSR (jump to
subroutine) to the HGR routine in the
Applesoft BASIC ROM:
	JSR	HGR

The HGR routine will flip the proper
soft-switches to enable graphics mode,
enable split graphics/text mode, select
viewing the 8K of graphics info in PAGE1
and then clear the screen to black.
(The nearby HGR2 call is similar but
makes the graphics full-screen and
uses PAGE2 instead).

Once you set hi-res mode you still
need to draw some graphics.  It is
hard to do this compactly.  The most
obvious way is the ROM HPLOT call,
but this depends on the A, X, and Y
registers holding the screen
co-ordinates as well as the desired
color being set up at a zero page
location.
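
As a rough sketch of the cost, plotting
one pixel through the ROM might look
like this.  The entry point addresses
and register conventions are
assumptions taken from common Applesoft
ROM listings, so check them before
relying on this:

    HCOLOR1 = $F6F0 ; assumed: X=color
    HPLOT0  = $F457 ; assumed: A=y,
                    ;  X=x lo, Y=x hi

    ldx #3          ; 2 bytes, white
    jsr HCOLOR1     ; 3 bytes
    lda #96         ; 2 bytes, y
    ldx #140        ; 2 bytes, x low
    ldy #0          ; 2 bytes, x high
    jsr HPLOT0      ; 3 bytes

That is 14 bytes for a single pixel,
before even switching to hi-res mode.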

When I create 16-byte demos I often
use the built-in ROM vector drawing
shapetable/XDRAW functionality which
avoids the need for color setting
because it just XORs pixels.
However you still usually need to
call the HPOSN routine to set up
the co-ordinate values in the zero
page such as GBASL/GBASH.  The default
values from uninitialized RAM at boot
usually aren't useful.

You can try drawing directly to screen
memory at addresses $2000* (PAGE1)
or $4000 (PAGE2), but each store takes
3 bytes and if you want to draw to
all 8K of the screen you need to
have a way to increment a 16-bit
pointer.  If we were lucky at boot
there'd be an indirect pointer in
the zero page with a good address
for this, but alas there isn't.

So to summarize, to do hi-res graphics
it takes 3 bytes to init, at least 3
to draw a pixel, and then 2 bytes for
a loop.  We're at 8 bytes already and
we haven't even done anything useful
like increment the pixel location or
change the color.
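
Spelled out, that budget looks roughly
like the sketch below.  The $F3E2
address for HGR is an assumption from
ROM listings, and the BEQ assumes HGR
returns with the Zero flag set:

    HGR = $F3E2     ; assumed address

    jsr HGR         ; 20 E2 F3  init
loop:
    sta $2000       ; 8D 00 20  draw
    beq loop        ; F0 FB     loop

All 8 bytes are spent and it only ever
stores the same byte to $2000.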

So is all hope lost?

* note a leading $ is how you 
  traditionally indicate hexadecimal
  numbers on 6502 computers

=== THE CHRGET TRICK ===

We can use a trick I found in a
previous lo-res graphics entry
shown at Lovebyte 2022.

We can abuse some code put into the
zero page by the Applesoft ROM at
boot (this is available on any 
Apple II from the Apple II+ onward,
which is to say most of them).

Applesoft uses this code when parsing
BASIC programs, and it is apparently
put into the zero page so the address
being loaded can be self-modified.

The code looks like this:

CHRGET:
00B1- E6 B8      INC $B8
00B3- D0 02      BNE $00B7
00B5- E6 B9      INC $B9
00B7- AD 05 02   LDA $0205
00BA- C9 3A      CMP #$3A
00BC- B0 0A      BCS $00C8
00BE- C9 20      CMP #$20
00C0- F0 EF      BEQ $00B1

What the code originally does is not
important; what is interesting is that
it does a 16-bit increment of the
address used by the LDA (load
accumulator) at $B7, and there's
a convenient BEQ (branch if equal)
back to the beginning of the routine
at $C0.  If we drop our code in
between these two chunks of code we
can just barely do some interesting
graphics.
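
Read as a memory map, the listing above
breaks down as follows (the annotation
is a sketch added for illustration):

    ; $B1-$B6  16-bit INC of $B8/$B9
    ; $B7      AD, the LDA opcode
    ; $B8-$B9  LDA address (lo, hi)
    ; $BA-$BF  CMP/BCS/CMP, 6 bytes
    ;          we can overwrite
    ; $C0-$C1  F0 EF, BEQ $00B1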

=== THE PLAN ===

The first thing we need to do is get
into hi-res graphics mode.  As
discussed earlier doing a 3-byte
      jsr    HGR2
will do this.  It uses soft-switches
to enable graphics, switch to hi-res,
set it to full-screen (no text), and
finally to get the graphics from
PAGE2 ($4000).  It then drops into
a routine that does a linear clear of
the screen to color 0 (black).  This
might seem boring, but on the Apple II
due to the weird (and clever) way Woz
designed the DRAM/video refresh
circuitry this gives a venetian-blind
effect which looks pretty neat.

This is great, but we want some pretty
pixels on the screen too.  It turns
out that if we jump into the middle
of the previously mentioned routine
we can hit the screen clearing
code at a point where it is drawing
the pattern in the A register to
the screen.  So if we do a
        jsr    BKGND0
it will fill the screen with a nice
pattern.  This is an unofficial entry
point in the ROM, but it is safe to
rely on: for various complex reasons
involving the license with Microsoft,
Apple never updated the Applesoft BASIC
ROMs despite there being various known
bugs, so the address never moved.

So now we in theory have 6 bytes of
code we can drop into the middle of
the CHRGET routine and have it
repeatedly clear the screen to a color
and then clear it back to black, with a
nice blinds effect.

That's boring though, can we switch
up the colors drawn?  It'd be nice
to load a random value into the
accumulator (A register) before the
call to fill the screen.  The existing
code does a load from an always-
incrementing 16-bit address, let's
point it into the ROM code and that
can act as a random enough series
of bytes.

== LOAD ADDRESS CONSIDERATIONS ==

The CHRGET load address starts at
$800, the default load address of BASIC
programs.  We want to point it to ROM
which is at the top of the address
space.  The easiest way to do this
is just have some high address bytes
at the start of the code and just load
the program so it drops into the middle
of the LDA instruction.

If we were running code by entering
it into the assembly language monitor
that would be fine, we could load
the bytes and then jump to an arbitrary
memory offset.  However for the
competition we are going to load from
disk so we have to start executing
from the start of our binary.  This
means these address bytes also need
to be valid code with no bad side
effects.  An obvious choice would
be the no-operation NOP instruction,
which is $EA. Convenient, as $EAEA
points nicely into the ROM.  It turns
out there are some fun** complications
with doing this.

** As per 4am, no fun is actually
	guaranteed in this process

=== WHEREIN WE GET A BEEP AND  ===
====== A TEXT SCREEN OF Ws =======

So we set our code to load in
the middle of CHRGET, calling BKGND0
immediately after the LDA which
puts the needed color pattern into
the A register.  We can't call HGR2
first as it always overwrites A on its
way out (the clear loop exits with A
holding $00).
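
In monitor-listing form that first
attempt would look something like this.
The exact bytes are a reconstruction
from the description, using two NOPs
as the address:

00B7- AD EA EA   LDA $EAEA
00BA- 20 F4 F3   JSR $F3F4  ; BKGND0
00BD- 20 D8 F3   JSR $F3D8  ; HGR2
00C0- F0 EF      BEQ $00B1

Entered at $00B8, the two $EA bytes run
as NOPs and then BKGND0 is called
before HGR2 has ever run.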

Sadly, if you run this, you'll get
a text screen filled with characters
before crashing into the monitor.

The problem here is BKGND0 assumes the
value of the first page of graphics
you want to fill is in zero-page
location HGR_PAGE ($E6).  On bootup
this is likely uninitialized (it
often ends up $00 or $FF), so when
you call the routine it happily writes
your color pattern across the first 8K
of RAM, which unfortunately is where
the zero page, stack, text screen, and
your code live.
Not Good.

We need a way to skip BKGND0 the first 
time through the loop.


=== SKIPPING CHUNKS OF INSTRUCTIONS ===
= SURPRISINGLY YOU DO THIS A LOT WHEN =
======= WRITING 6502 ASSEMBLY  ========

There's one famous way to skip ahead
on the 6502.  This is to use the BIT
instruction.  By putting a $2C byte
in your code it will do a BIT
(logical AND that sets flags but throws
away the result) and it will use the
two bytes following (that you are
trying to skip) as an address.
This is usually harmless (unless those
address bytes point to a soft-switch).
You can use this trick to compactly
have code where you can jump into the
middle of the BIT instruction to
execute the two address bytes as code,
but otherwise execute the BIT as sort
of a 3-byte almost NOP.
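
A minimal sketch of the classic form of
this trick, with two entry points that
share a tail.  The label names and the
"result" address are hypothetical:

    result = $0300  ; hypothetical

one:
    lda #1          ; A9 01
    .byte $2C       ; BIT abs opcode
two:
    lda #2          ; A9 02, eaten as
                    ; the BIT address
                    ; when entered
                    ; via "one"
    sta result      ; A is 1 or 2

The single $2C byte stands in for a
two-byte branch around the LDA #2.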

We can construct our code so the
entry point is a BIT instruction
that skips the first JSR, but later
loop iterations branch back earlier,
where the BIT opcode byte is instead
read as part of the LDA instruction's
address, and the JSR happens as normal.

So the first time through the loop
BKGND0 is skipped and HGR2 gets
called first.  HGR2 usefully sets
up the HGR_PAGE value in $E6 to a good
value so the BKGND0 call works in
all future loop iterations.

=== ALMOST ON THE HOME STRETCH ===

We should be just about there, right?

There is a problem though, the first
time through the loop the BIT consumes
the next two bytes, avoiding the 
JSR to BKGND0.  However it means
the address of BKGND0, $F3F4,
(actually $F4, $F3 as the 6502 is
little-endian) gets executed as code.
Is this a problem?

It turns out those two bytes are
invalid opcodes on both 6502
and 65c02 processors.  Luckily, though,
instead of trapping like a modern
processor would the processor tries
to execute them anyway.  You can
look up the side effects for these
invalid instructions online; on the
NMOS 6502 at least you get behavior
based on the don't-care terms in the
instruction decoder's PLA (the logic
array, not the PLA opcode).  Happily
though in our case they are close
enough to NOPs that our code will
work.

=== POINTING TO ROM ===

So with the BIT in place the last
step is to make sure we are pointing
to ROM when we load the accumulator.

If we load our 8-bytes of code at
address $B8 we can have $2C of the
BIT as the low byte of the LDA
instruction address, and the high
byte can be anything we want.
I arbitrarily put a NOP there even
though the code never gets executed
as $EA works to give a nice "random"
set of color patterns starting just
past $EA2C, as CHRGET bumps the pointer
before each load (if you're curious,
this is in the middle of the ROM
Floating Point addition routine).
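
Combining this with the BIT trick, the
very first pass (entered at $00B8)
would decode roughly as below.  This
assumes, going by the usual tables of
undocumented opcodes, that $F4 acts as
a two-byte NOP on both the NMOS 6502
and the 65C02, so $F3 is swallowed as
its operand; it has not been checked
on hardware here:

00B8- 2C EA 20   BIT $20EA
00BB- F4 F3      NOP $F3,X  ; invalid
00BD- 20 D8 F3   JSR $F3D8  ; HGR2
00C0- F0 EF      BEQ $00B1

The BIT only reads from $20EA, which is
ordinary RAM, so it has no visible
effect.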

=== FINALLY, THE LOOP ===

We can't forget we need to loop.
If we load our code at $B8, the
8-bytes stop just short of the BEQ
branch-if-equal instruction back to
the beginning.  BEQ checks the Zero
flag, but luckily the HGR2 call always
ends with the Zero flag set so this
nicely turns the BEQ into a
branch-always.
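
Pulling the pieces together, the whole
program could be written like this in
ca65-style syntax.  The directives and
layout are a sketch, not necessarily
the source as shipped, but the bytes
match the ones to type in below:

    BKGND0 = $F3F4
    HGR2   = $F3D8

    .org $B8        ; just past the AD
    .byte $2C       ; entry: BIT abs
                    ; loop: addr lo
    .byte $EA       ; addr hi
    jsr BKGND0      ; skipped on entry
    jsr HGR2        ; sets $E6, Z=1
    ; existing BEQ $B1 at $C0 loops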

=== ALL FINISHED ===

The program loads, it skips the
first color fill, inits the screen,
then loops back alternately setting
and clearing the screen based on
a color pattern from an incrementing
pointer into ROM, leading to a colorful
animated venetian-blind pattern.

It actually looks lovely, arguably
nicer than many of the 16-byte intros
I've done.


=== TRY IT FOR YOURSELF ===

On an Apple II (or emulator) get to
the ']' BASIC prompt and enter
these commands to run it for yourself:

CALL -151
B8: 2C EA 20 F4 F3 20 D8 F3 
B8G

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Vince `deater` Weaver
   http://www.deater.net/weave
   11 February 2023

with apologies to 4AM for vaguely
     stealing his writeup format