diff --git a/demos/lovebyte2023/tinyhgr_8/README b/demos/lovebyte2023/tinyhgr_8/README index bd9b09e6..bc7ecbc6 100644 --- a/demos/lovebyte2023/tinyhgr_8/README +++ b/demos/lovebyte2023/tinyhgr_8/README @@ -1,131 +1,358 @@ -tiny_hgr8 +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= + tiny_hgr8 + an 8-byte hi-res Apple II demo -8-byte hi-res Apple II demo by Deater / dSr + by Deater / dSr -Lovebyte 2023 + Lovebyte 2023 +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -I really wanted a hi-res 8-byte demo but that is trickier than you can think. +TLDR: I wrote an Apple II graphics + demo that's only 8 bytes of + 6502 assembly language -On Apple II/6502 to enable graphics you need three bytes, either + LINK + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= + +I really wanted to make a hi-res 8-byte +demo but that is trickier than you +might think. + +=== THE CHALLENGE === + +The Apple II has a 6502 processor in it. + +To enable hi-res graphics you need three +bytes, typically a jump to the HGR +routine in the Applesoft BASIC ROM: JSR HGR -which takes 3 bytes to jump to the ROM and enable graphics, clear the screen, -and set which PAGE is being viewed. -You can also try setting the graphics "soft switches" yourself, something like - BIT $C050 -which is also 3-bytes, but to get hi-res you need to also set the hi-res -switch so too many bytes. +The HGR routine will flip the proper +soft-switches to enable graphics mode, +enable split graphics/text mode, select +viewing the 8K of graphics info in PAGE1 +and then clear the screen to black. +(The nearby HGR2 call is similar but +makes the graphics full-screen and +uses PAGE2 instead). -Once you set hi-res mode, you still need to draw to graphics memory. -The various ways of doing this like calling HPLOT need setup in A,X and Y -as well as the HCOLOR value so this can take a lot of bytes. 
-My 16-byte entries use the shapetable/XDRAW interface but even when x-or -drawing you usually still have to call HPOSN to set up some zero-page values -like GBASL/GBASH first, and you can't trust on them having good values -at boot. You can try drawing directly to screen memory at $2000 or $4000, -but that usually takes 3 bytes too and if you want to draw on the full screen -(which is 8k) you need to increment two bytes of addresses. In theory it's -a byte smaller if you have a pointer in the zero page, but unfortunately -that doesn't happen by default. +Once you set hi-res mode you still +need to draw some graphics. It is +hard to do this compactly. The most +obvious way is the ROM HPLOT call, +but this depends on the A, X, and Y +registers holding the screen +co-ordinates as well as the desired +color being set up at a zero page +location. -So in theory to do hi-res it takes 3 bytes to init and at least 3 to draw, -and then finally if you want a loop that takes 2 bytes. So we're at -8 bytes and no room for demo effects like actually changing the color. -So what can we do? +When I create 16 byte demos I often +use the built-in ROM vector drawing +shapetable/XDRAW functionality which +avoids the need for color setting +because it just XORs pixels. +However you still usually need to +call the HPOSN routine to set up +the co-ordinate values in the zero +page such as GBASL/GBASH. The default +values from uninitialized RAM at boot +usually aren't useful. -One trick I used for a previous 8-byte lo-res entry is abuse some code put into -the zero page by the Applesoft ROM (so on any Apple II from the Apple II+ -onward, which is most of them). This is the CHRGET code for stepping -through BASIC programs, which is put in the zero page by the ROM on boot -so the address being loaded can be self-modified. Part of this routine -does a 16-bit increment into the self modified region, followed by 7 bytes -of code ending in a branch instruction. 
So if we can drop our 8 bytes -of code into this area here (starting roughly at $B1) we can get the benefits -of the increment as well as the branch, and have a few more bytes to work with. +You can try drawing directly to screen +memory at addresses $2000 (PAGE1) +or $4000 (PAGE2), but that takes +3 bytes and if you want to draw to +all 8K of the screen you need to +have a way to increment a 16-bit +pointer. If we were lucky at boot +there'd be an indirect pointer in +the zero page with a good address +for this, but alas there isn't. -So for this code to work we use two calls into the ROM. One to clear -the screen to full-screen hi-res. - jsr HGR2 -As said before this sets the graphics modes, in this case full-screen hi-res -displaying PAGE2 ($4000). It does a linear clear of the screen to 0 (black), -but on the Apple II due to the weird way Woz designed the graphics memory -map this gives a horizontal venetian-blind effect which looks pretty neat. +So to summarize, to do hi-res graphics +it takes 3 bytes to init, at least 3 +to draw a pixel, and then 2 bytes for +a loop. We're at 8-bytes already and +we haven't even done anything useful +like increment the pixel location or +change the color. -The other thing we call into is - jsr BKGND0 -this is a semi-unofficial entry point into the HGR2 code, the portion -that does the screen clear. It will clear the screen with the bit-pattern -in the accumulator. +So is all hope lost? -So for this demo we just clear the screen to a random bit pattern (which -gives a variety of colors) and then immediate re-clear the screen to zero -over and over again. +=== THE CHRGET TRICK === -You might say, doesn't that only take 6-bytes of code? Well we need to -set a random value in the accumulator. Here we load so we over-write -the CHRGET address being loaded with some values. By default it is $800, -the default load address of BASIC programs. 
If we can point this value -to somewhere more interesting, like into ROM, it will treat the code -there as random values. The problem is when we load our demo these bytes -will be the first things executed so we have to make sure they get executed -harmlessly as no-ops. An obvious choice that points to rom would be -$EAEA, or two NOPs. We'll see in a minute though there are some -complications here. +We can use a trick I found in a +previous lo-res graphics entry +shown at Lovebyte 2022. -So if we drop a call to BKGND0 followed by a call to HGR2 and have it -followed up by the existing CHRGET BEQ instruction we have what we need, -as HGR2 always exits with Y=0 and the Zero flag set. -Try and run this though and the text screen will go weird and your program -will crash into the monitor (unhelpfully with the machine in graphics -mode so hard to tell what's going on). +We can abuse some code put into the +zero page by the Applesoft ROM at +boot (this is available on any +Apple II from the Apple II+ onward, +which is to say most of them). -The problem here is BKGND0 assumes the first page of graphics you want -to write to are in zero-page location HGR_PAGE $E6. On bootup this is likely -$00 or $FF, so the routine happily writes your color across the first 8 -pages of RAM which is where the zero-page, stack, and your code live. -So not good. So we need a way to skip BKGND0 the first time through -the loop. +The ROM uses this code when parsing +BASIC programs, and it is apparently +put into the zero page so the address +being loaded can be self-modified. -If we were entering the code from the keyboard it would be fine, we could -just specify the start after the BKGND0 code. However we'd like this able -to be BRUN from disk. So what can we do? +The code looks like this: -Well there's one way to sneakily skip code on 6502. This is the famous -BIT instruction. 
If you put the first byte of a BIT instruction in your -code, it will treat the next 2 bytes as a value to check bits on which -is (usually) harmless. So if we load our code into the middle of -the 16-bit LDA instruction in CHRGET, start on a bit instruction, it will -skip the next 2 bytes the first time through, but when the loop happens -this bit instruction will be part of the load address to LDA and so -no skipping happens the rest of the executions. This is good, as the -call to HGR2 does properly set $E6 to the graphics page we want and -BKGDN0 will work properly after that. +CHRGET: +00B1- E6 B8 INC $B8 +00B3- D0 02 BNE $00B7 +00B5- E6 B9 INC $B9 +00B7- AD 05 02 LDA $0205 +00BA- C9 3A CMP #$3A +00BC- B0 0A BCS $00C8 +00BE- C9 20 CMP #$20 +00C0- F0 EF BEQ 00B1 -There is a problem though, the code the first time through eats the two -next bytes, avoiding the JSR to BKGND0. But it means the following -two bytes, the $F4F3 ($F3, $F4 in little endian) bytes get executed as -code. Will that be a problem? It turns out those are un-specified -opcodes on both 6502 and 65c02 but on both chips those apparently -are treated as NOP and so our code works. With the BIT in place -the "random" memory values are pulled initially from -$EA2C (where 2c is the bit, and EA can be arbitrary but why not use -a NOP. In theory we could alter the colors we get by moving things around). +What the code originally does is not +important, what is interesting is that +it does a 16-bit increment of the +address of the load accumulator +instruction at $B7, and there's +a convenient BEQ (branch if equal) +back to the beginning of the routine +at $C0. If we drop our code in +between these two chunks of code we +can just barely do some interesting +graphics. 
-The two values at the beginning are incremented in a self-modified way -by the earlier unchanged CHRGET code so we walk through ROM getting -random color patterns in the accumulator, writing them to the screen, -and quickly clearning back to black again in a venetian-blind -pattern. It actually looks lovely, much nicer than some 16-byte -demos I've done. +=== THE PLAN === -You can try things out on your own Apple II with -the following commands from the BASIC prompt +The first thing we need to do is get +into hi-res graphics mode. As +discussed earlier doing a 3-byte + jsr HGR2 +will do this. It uses soft-switches +to enable graphics, switch to hi-res, +set it to full-screen (no text), and +finally to get the graphics from +PAGE2 ($4000). It then drops into +a routine that does a linear clear of +the screen to color 0 (black). This +might seem boring, but on the Apple II +due to the weird (and clever) way Woz +designed the DRAM/video refresh +circuitry this gives a venetian-blind +effect which looks pretty neat. + +This is great, but we want some pretty +pixels on the screen too. It turns +out that if we jump into the middle +of the previously mentioned routine +we can hit the screen clearing +code at a point where it is drawing +the pattern in the A register to +the screen. So if we do a + jsr BKGND0 +it will fill the screen with a nice +pattern. This is an unofficial entry +point in the ROM, but for various +complex reasons involving the license +with Microsoft it turns out Apple never +updated the Applesoft BASIC ROMs despite +there being various known bugs. + +So now we in theory have 6 bytes of +code we can drop into the middle of +the CHRGET routine and have it +repeatedly clear the screen to a color +and then clear it to black, with a +nice blinds effect between them. + +That's boring though, can we switch +up the colors drawn? It'd be nice +to load a random value into the +accumulator (A register) before the +call to fill the screen. 
The existing +code does a load from an always- +incrementing 16-bit address, let's +point it into the ROM code and that +can act as a random enough series +of bytes. + +== LOAD ADDRESS CONSIDERATIONS == + +By default the load address is $800, +the default load address of BASIC +programs. We want to point it to ROM +which is at the top of the address +space. The easiest way to do this +is just have some high address bytes +at the start of the code and just load +the program so it drops into the middle +of the LDA instruction. + +If we were running code by entering +it into the assembly language monitor +that would be fine, we could load +the bytes and then jump to an arbitrary +memory offset. However for the +competition we are going to load from +disk so we have to start executing +from the start of our binary. This +means these address bytes also need +to be valid code with no bad side +effects. An obvious choice would +be the no-operation NOP instruction, +which is $EA and $EAEA points nicely +into the ROM. It turns out there +are some complications with doing +this. + +=== WHEREIN WE GET A BEEP AND === +====== A TEXT SCREEN OF Ws ====== + +So we set our code to load in +the middle of CHRGET, calling BKGND0 +first as the needed color pattern is +in A. We can't call HGR2 first as +it always will reset A to be $60. + +We run this though, and you'll get +a text screen filled with characters +as it crashes to the monitor. + +The problem here is BKGND0 assumes the +value of the first page of graphics +you want to write to is in zero-page +location HGR_PAGE $E6. On bootup this +is likely $00 or $FF, so when you call +the routine it happily writes your +color pattern across the first 8k +of RAM which unfortunately is where the +zero-page, stack, and your code live. +Not Good. + +We need a way to skip BKGND0 the first +time through the loop. 
+ + +=== SKIPPING CHUNKS OF INSTRUCTIONS === += SURPRISINGLY YOU DO THIS A LOT WHEN = +======= WRITING 6502 ASSEMBLY ======== + +There's one famous way to skip ahead +on the 6502. This is to use the BIT +instruction. By putting a $2C byte +in your code it will do a BIT +(logical AND to set bits but throw +away the result) with the address +being the two bytes after it you +want to skip. This is usually +harmless (unless those address bits +point to a soft-switch). You can use +this trick to compactly have code +where you can jump into the middle +of the BIT instruction to execute +the two address bytes as code, +or otherwise execute the code as sort +of a 3-byte almost NOP. + +We can construct our code so the +entry point is a BIT instruction +that skips the first JSR, but later +loop iterations branch earlier and +instead the BIT is part of the address +to the LDA instruction and the JSR +happens as normal. + +So the first time through HGR2 gets +called which usefully sets up the +HGR_PAGE value in $E6 to a good +value so the BKGND0 call works in +all future loop iterations. + +=== ALMOST ON THE HOME STRETCH === + +We should be just about there, right? + +There is a problem though, the first +time through the loop the BIT consumes +the next two bytes, avoiding the +JSR to BKGND0. However it means +the address of BKGND0, $F3F4, +(actually $F4, $F3 as the 6502 is +little-endian) gets executed as code. +Is this a problem? + +It turns out those two instructions +are invalid opcodes on both 6502 +and 65c02 processors. Luckily, though, +instead of trapping like a modern +processor would the processor tries +to execute them anyway. You can +look up the side effects for these +invalid instructions online, on the +NMOS 6502 at least you get behavior +based on the don't care terms in the +instruction PLA. Happily though in +our case the instructions are close +enough to NOPs that our code will +work. 
+ +=== POINTING TO ROM === + +So with the BIT in place the last +step is to make sure we are pointing +to ROM when we load the accumulator. + +If we load at address $B8 we can +have $2C of the bit as the low +byte of the LDA instruction, and +the high byte can be anything we want. +I arbitrarily put a NOP there even +though the code never gets executed +as $EA works to give a nice "random" +set of color patterns starting +at $EA2C (If you're curious, this is +in the Floating Point addition routine). + +=== FINALLY, THE LOOP === + +We can't forget we need to loop. +If we load at $B8, this stops just +short of the BEQ branch-if-equal +instruction back to the beginning. +BEQ checks the Zero flag, but luckily +the HGR2 call always ends with the +Zero flag set so this nicely turns +the BEQ into a branch-always. + +=== ALL FINISHED === + +The program loads, it skips the +first color fill, inits the screen, +then loops back alternately setting +and clearing the screen based on +a color pattern from an incrementing +part of ROM, leading to a colorful +animated venetian-blind pattern. + +It actually looks lovely, arguably +nicer than many of the 16-byte intros +I've done. + + +=== TRY IT FOR YOURSELF === + +On an Apple II (or emulator) get to +the ']' BASIC prompt and enter +these commands to run it for yourself: CALL -151 B8: 2c ea 20 f4 f3 20 d8 f3 B8G +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- +by Vince `deater` Weaver + http://www.deater.net/weave + 11 February 2023 - - +with apologies to 4AM for vaguely + stealing his writeup format