diff --git a/mode7_demo/README.mode7_demo b/mode7_demo/README.mode7_demo index 82fa6413..be19a247 100644 --- a/mode7_demo/README.mode7_demo +++ b/mode7_demo/README.mode7_demo @@ -1,6 +1,331 @@ -Plan: - Load at $1000 - Decompress to $2000 + Challenges found writing an 8k Lores Apple II Demo +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + by DEATER (Vince Weaver, vince@deater.net) + + http://www.deater.net/weave/vmwprod/mode7_demo/ +==================================================== + 19 March 2018 + +GOAL: +~~~~~ + This started out as some SNES style mode7 pseudo-3d graphics code + I came up with while working on my TF7 game. The graphics looked + pretty cool, so I started developing a demo around it. + + The codesize ended up being roughly around 8kB, so I thought I'd + make it into an 8k demo. There aren't many out there for the Apple II. + and a Mockingboard sound card. + + The demo tries to hit the lowest common denominator for Apple II systems, + so in theory you could have run this on an Apple II in 1977 if you + were rich enough to afford 48k of RAM. The Mockingboard sound wasn't + available until 1981, but still this all predates the Commodore 64. + +USING: +~~~~~~ + Boot disk on a real system, or emulator with Mockingboard support. + + Applewin works fine (even under Wine on Linux). + MESS does too, it's harder to setup (ROMs) but the audio sounds clearer. + + If you have no emulator you can try one of the online javascript ones. + https://www.scullinsteel.com/apple2/ + + +Hardware: +~~~~~~~~~ + The Apple II has a 6502 processor running at roughly 1.023MHz. + + Early models only shipped with 4k of RAM, but later 48k, 64k, and 128k + systems were common. + + The most common disk drive was the Disk II which typically held + 140k of data (single-sided). + + The only sound available was a bit-banged speaker. No timer, + if you wanted music you had to cycle-count via the CPU. + + Later some sound cards were available. 
This demo uses the + Mockingboard which has dual AY-3-8910 sound chips. Each + chip provides 3 channels of square waves, with noise and + envelope effects available. + + GRAPHICS + ~~~~~~~~ + + The Apple II had nice graphics for its time, with this time being + around 1977. Otherwise it is quite limited. + Hardware Sprites? No + Linear framebuffer? No + User-defined charset? No + Blanking interrupts? No + Palette selection? No + Hardware scrolling? No + Hardware page flip? Yes + + The hi-res graphics mode was a complex mess of NTSC hacks by Woz. + You got 280x192 graphics, with 6 colors available. However the colors + were from NTSC artifacts and there were limitations on which colors + could be next to each other (in blocks of 3.5 pixels) as well as + fringing. Also the addresses were interleaved, so not a linear + framebuffer. Hi-res page0 is at $2000 and page1 at $4000. + Optionally 4 lines of text can be shown at the bottom of the + screen instead of graphics. + + The lo-res mode is a bit easier to use. It is 40x48 blocks + (40x40 if 4 lines of text are displayed at the bottom). + 15 colors are available, though there is fringing at the edges. + Again the addresses are interleaved. Lo-res page0 is at $400 + and page1 is at $800. + +======================================== +DETAILED STEP-BY-STEP REVIEW OF THE DEMO +======================================== + + BOOTLOADER + ~~~~~~~~~~ + A BASIC "HELLO" program loads the binary. + This just makes things auto-boot at startup, this doesn't count + towards the executable size, you could manually BRUN the 8k program + if you wanted. + + The binary is loaded at $2000 (hi-res page0) and BASIC kicks into + HIRES mode before loading so you can watch as the memory is loaded + from disk in a seemingly random pattern. + + Since this is an 8k demo, the entirety of the program is shown on + the screen (or would be if we POKEd the right address to turn off + the 4 lines of text on the bottom of the screen). 
+ + Execution starts at address $2000. + + DECOMPRESSER + ~~~~~~~~~~~~ + The binary is LZ4 encoded. The decompresser flips to HGR page 1 so + we can watch memory as the program is decompressed. + + The LZ4 decompression code was written by qkumba (Peter Ferrie). + http://pferrie.host22.com/misc/appleii.htm + + The actual program/data decompresses to around 22k starting at $4000. + It over-writes parts of DOS3.3, but since we won't be using the disk + anymore this isn't an issue. + + At the top left corner of the screen you'll see the VMW triangles logo + as it decompresses. To do this I had to put the proper bit pattern + at $4000, $4400, $4800, and $4C00. I meant to have some words too + but ran out of disk space. The bit pattern at $4000 is executable + and is run as code. + + Optimizing for code size inside of a compressed binary is a pain. + Removing instructions sometimes made the binary larger as it no longer + compressed as well. Long runs of values (such as 0 padding) are + essentially free. This was a difficult challenge. + +FADE EFFECT +~~~~~~~~~~~ + The title screen fades in from black. + + This is a software hack, with a lookup table copying from an off-screen + buffer. The Apple II doesn't have any palette support. + +TITLE SCREEN +~~~~~~~~~~~~ + Once things are decompressed, we jump to $4000. We switch to low-res + mode for the rest of the DEMO. + + A background image is loaded from disk. This is RLE encoded (probably + unnecessary when being further LZ4 encoded). + + Why not just load the program at $400 and load the graphics image for + free? Well, remember the graphics are 40x48 (shared with the text). + Really it's 40x24, with each text char mapping to 4-bits top/bottom + for color. Do the math, we have 1k reserved for this mode but 40x24 + is only 960 bytes. It turns out there are "holes" in the address range + that aren't displayed, and various pieces of hardware use these holes + as scratchpad memory. 
So if you just blindly uncompress graphics data + there you can corrupt the scratchpad. So you have to be careful + when uncompressing to skip the holes. + + The title screen has scrolling text at the bottom. This is nothing fancy, + the text is in a buffer off screen and a 40x4 chunk of RAM is copied in + every so many cycles. + + You might notice that there is tearing/jitter in the scrolling, even + though we are double-buffering the graphics. This is because there is + not a reliable cross-platform way to get the VBLANK info (especially + on older machines) so we are having some bad luck about when we flip + pages. + +MOCKINGBOARD MUSIC +~~~~~~~~~~~~~~~~~~ + I like chiptune music, especially that for AY-3-8910 based systems. + Before obtaining a Mockingboard I built a Raspberry Pi chiptune player + that is essentially the same hardware. + + Most of my sound infrastructure involves YM5 files, which are often used + by ZX Spectrum and ATARI ST users. These are usually register dumps + taken typically at 50Hz. So to play them back you just have to interrupt + 50 times a second and write the registers. + + To program the Mockingboard, each AY-3-8910 chip has 14 sound related + registers that control the 3 channels. Each AY chip has a dedicated + VIA 6522 parallel I/O chip that handles the I/O. + + Doing this quickly enough is a challenge on the Apple II. For each + register you have to do a handshake, set the register # and the value. + This can take upwards of 40 1MHz cycles per register. + + For complex chiptune files (especially those written on an ST with much + faster hardware) it's sometimes not possible to get exact playback + due to the delay. Also one AY is on the left channel and one on the right + so you have to write both if you want sound from both speakers. + + I have a whole suite of code for manipulating YM sound data, in my + vmw-meter git repository. + + The first step for getting this to work is detecting if a mockingboard is + there. 
This can be in any slot 1-7 on the Apple II, though typically + Slot 4 is standard (in this demo we only check slot 4). + + The board is initialized, and then one of the 6522 timers is set to + interrupt at 25Hz (it has to be an on-board timer as the default + Apple II has no timers). + + Why 25Hz and not 50Hz? At 50Hz with 14 registers you use 700 bytes/s. + So a 2 minute song would take 84k of RAM, much more than is available. + + For this demo I run at 25Hz, and also pack the 14 registers of the data + into 11 (there are various fields that are not packed well, we can + unpack at play time). Also I stripped out the envelope data as many + songs do not use it (so this is a lossy compression method). + + Also, we keep track of the last values written last frame and only + write out to the board if things change, which helps with the latency + a bit. + + The sound quality suffered a bit, but it's hard to fit a catchy chiptune + file in 8K. + + The song being played is a stripped down and re-arranged version of + "Electric Wave" from CC'00 by EA (Ilya Abrosimov). + + +MODE7 BACKGROUND +~~~~~~~~~~~~~~~~ + "MODE7" was a Super Nintendo (SNES) graphics mode that took a tiled + background and transformed it to look as if it was squashed out to + the horizon, giving a 3d look. The SNES did this in hardware, but + in this demo we do this in software. + + As found on Wikipedia, the transform is of the type + + [x'] = [a b]([x]-[x0])+[x0] + [y'] [c d]([y] [y0]) [y0] + + For our code, we managed to reduce things to a small number of additions + and subtractions for each pixel on the screen. Of course the 6502 can't + do floating point, so we do fixed point math. We convert as much as we + can to table lookups that are pre-calculated. We also make liberal use + of self-modifying code. 
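 As a rough illustration of the fixed point idea above (not the demo's
 6502 code), here is a minimal Python sketch of 8.8 fixed point: 8 integer
 bits and 8 fractional bits in a 16-bit word. All names (to_fixed,
 walk_scanline, etc.) are hypothetical. It shows how stepping across a
 row of pixels reduces to one 16-bit add per pixel, with the high byte
 used as the integer coordinate.

```python
# Hypothetical helpers illustrating 8.8 fixed point; not from the demo source.
FRAC_BITS = 8
ONE = 1 << FRAC_BITS      # 1.0 is stored as 256
MASK = 0xFFFF             # values wrap at 16 bits, as on the 6502


def to_fixed(value):
    """Convert a float to a 16-bit two's-complement 8.8 value."""
    return int(round(value * ONE)) & MASK


def to_float(fixed):
    """Interpret a 16-bit word as signed 8.8 and convert back to float."""
    if fixed & 0x8000:
        fixed -= 0x10000
    return fixed / ONE


def fixed_add(a, b):
    """Addition is an ordinary 16-bit integer add with wraparound."""
    return (a + b) & MASK


def walk_scanline(start, step, count):
    """Return the integer (high byte) coordinate for each of `count` pixels."""
    pos = start
    coords = []
    for _ in range(count):
        coords.append(pos >> FRAC_BITS)   # high byte = integer part
        pos = fixed_add(pos, step)        # one add per pixel, no multiply
    return coords
```

 For example, walk_scanline(to_fixed(0), to_fixed(0.5), 4) advances half
 a texel per pixel, so each source coordinate is used twice.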
+ + Despite all of this there are still some cases where we have to do a + 16bit x 16bit = 32bit multiply, something that is *really* slow on 6502, + around 700 cycles (for an 8.8 x 8.8 fixed point multiply). + + To make this faster we use a method described by Stephen Judd. + + The key to note is that (a+b)^2 = a^2+2ab+b^2 and (a-b)^2=a^2-2ab+b^2 + and if you subtract the second from the first you can simplify to: + a*b = (a+b)^2/4 - (a-b)^2/4 + This means that if you have a table of squares from 0..511 (all 8-bit a+b and a-b + will fall in this range) then you can convert a multiply into table + lookups plus a subtract. + + The downside is you will need 2kB of squares lookup tables (which can + be generated at startup). This reduces the multiply cost to the order + of 200 to 250 cycles. + + By using the fast multiply and a lot of careful optimization you can + generate a Mode7 background in 40x40 graphics mode at about 5 frames/second. + + The engine can be parameterized with different tilesets to use, which we + do to provide both a black+white checkerboard background, as well as the + island background from the TFV game. + +BOUNCING BALL ON CHECKERBOARD +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + What would a demo be without some sort of bouncing geometric shape? + + This is just done with 16 sprites. The sphere was modeled in OpenGL + from a 2000-era game-engine that I never finished. I then took screenshots + and then reduced the size/color to an appropriate value. + + The shadow is also just sprites. + + The clicking noise on bounce is just touching the speaker at $C030. + It's mostly there to give some sound effects for those playing the demo + without a mockingboard. + +TFV SPACESHIP FLYING +~~~~~~~~~~~~~~~~~~~~ + The spaceship, water splash, and shadows are all sprites. This is all + done in software, the Apple II has no sprite hardware. 
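 Looking back at the table-of-squares multiply described under MODE7
 BACKGROUND, the idea can be sketched in Python as below. This is only an
 illustration, not the demo's 6502 routine: the names SQUARES_DIV4,
 fast_mul8 and mul16 are hypothetical, and building the 16x16 product
 from four 8x8 partial products is an assumption about how such a table
 would typically be used.

```python
# Quarter-square table: floor(n*n/4) for n in 0..511, built once at startup.
SQUARES_DIV4 = [(n * n) // 4 for n in range(512)]


def fast_mul8(a, b):
    """Unsigned 8-bit x 8-bit multiply using two lookups and one subtract.

    a+b and a-b always have the same parity, so the fractions dropped by
    the floor division cancel and the result is exact.
    """
    return SQUARES_DIV4[a + b] - SQUARES_DIV4[abs(a - b)]


def mul16(x, y):
    """Unsigned 16x16 -> 32-bit multiply from four 8x8 partial products."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    return (fast_mul8(xl, yl)
            + ((fast_mul8(xl, yh) + fast_mul8(xh, yl)) << 8)
            + (fast_mul8(xh, yh) << 16))
```

 For an 8.8 x 8.8 fixed point product you would keep the middle 16 bits
 of the 32-bit result, i.e. (mul16(x, y) >> 8) & 0xFFFF.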
+ + This is the TFV game engine flying-spaceship code, with the keyboard + routines replaced to read from memory instead (sort of like a script + of what to do when). + +STARFIELD +~~~~~~~~~ + The starfield is your typical starfield code. Only 16 stars are modeled. + It re-uses the fast-multiply code from the mode7 graphics. + + Random number generation is not fast on the 6502, so we cheat. + Originally we had a 256-byte blob of "random" values generated earlier. + + This wasted space, so now instead we just treat the executable code + at $5000 as if it were a block of random numbers. This was arbitrarily + chosen, I tried different areas of memory until I got one where the + stars seemed to move in a pleasing pattern. + + A simple state machine controls if the stars move or not, whether the + background is cleared or not (the streak effect) and what color the + background is (for the blue flash). + + The ship moving to the distance is just done with different sized sprites. + +RASTERBARS/CREDITS +~~~~~~~~~~~~~~~~~~ + + The credits happen with the starfield continuing to run. + + The text is written in the bottom 4 lines of the screen. Some inverse-mode + space characters are used to try to make it look like graphics are surrounding + the text. It's actually possible with careful cycle counting to switch + modes fast enough to have actual mixed graphics/text (See the FrenchTouch + demos) but I was too lazy to attempt that here. + + The rasterbar effect isn't really rasterbars, it's just a rainbow assortment + of lines being drawn with a SINEWAVE lookup table. + + It's the same rasterbar code from my chiptune player demo. I ended up + optimizing it a lot via inlining and a few other ways because it turned + out just drawing a horizontal line can take a very long time. + + The rotating text is just taking the output string and rapidly rotating the + character values through the ASCII table. 
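 A minimal Python sketch of the rotating-text idea, as an illustration
 only. The function name rotate_text, the $20-$5F character range, and
 the countdown used to settle on the final string are all assumptions,
 not taken from the demo source.

```python
def rotate_text(text, offset, low=0x20, high=0x5F):
    """Shift each character `offset` places through the low..high range.

    Characters outside the range are left alone. The range wraps, so
    rotating by the full span returns the original text.
    """
    span = high - low + 1
    out = []
    for ch in text:
        code = ord(ch)
        if low <= code <= high:
            code = low + (code - low + offset) % span
        out.append(chr(code))
    return "".join(out)


def settle_frames(text, steps):
    """Yield scrambled versions of `text`, ending on the real string."""
    for offset in range(steps, -1, -1):
        yield rotate_text(text, offset)
```

 Printing each string from settle_frames("DEATER", 3) in turn gives the
 effect of the characters spinning into place.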
+ + The annoying clicking noise is the same speaker effect caused by hitting + $C030. + + Choosing who to thank ended up being extremely critical to fitting in 8kB, + as unique text strings do not compress well. I'm also still not satisfied + with how the centering looks. + + Memory Map ========== @@ -35,51 +360,3 @@ Memory Map |zero pg | 0.25 ------- $0000 -============================================= -Getting the VMW logo to appear on page2 HGR -============================================== - -; Need to have lines at - ; $4000 AA,AD,D5,AC,95 - ; $4400 A8,D5,95,35,85 1k - ; $4800 A0,55,26,55,81 2k - ; $4C00 00,00,00,00,00 3k - - -MAIN: 0000 - 013A = 0x13A = 314 -.include "deater.scrolltext" 13DF - 1577 = 0x198 = 408 -.include "a2.scrolltext" 1577 - 1695 = 0x11E = 286 - ============= - 1008 - -.include "starfield_demo.s" 1695 - 19Ac = 0x317 = 791 -.include "rasterbars.s" 19AC - 1A9E = 0xF2 = 242 - ============= - 1033 - -.include "../asm_routines/gr_fast_clear.s" 01B6 - 02A0 = 0xEA = 234 -.include "credits.s" 1A9E - 1CEA = 0x257 = 599 -.include "interrupt_handler.s" 1CEA - 1DE3 = 0xD9 = 217 - =================== - 1050 -3D (61) too many, want 173 - - -.include "../asm_routines/gr_unrle.s" 013A - 01B6 - -.include "../asm_routines/gr_hlin.s" 02A0 - 02FD -.include "../asm_routines/gr_setpage.s" 02FD - 0311 -.include "../asm_routines/pageflip.s" 0311 - 032B -.include "../asm_routines/gr_fade.s" 032B - 0459 -.include "../asm_routines/gr_copy.s" 0459 - 0491 -.include "../asm_routines/gr_scroll.s" 0491 - 0565 = 0xC5 = 197 -.include "../asm_routines/gr_offsets.s" 0565 - 0595 -.include "../asm_routines/gr_plot.s" 0595 - 05C7 -.include "../asm_routines/text_print.s" 05C7 - 060F - -.include "../asm_routines/mockingboard_a.s" 060F - 06BC = 0xAD = 173 - -.include "mode7.s" 06BC - 1201 = 0xB43 = 2883 - -.include "mode7_demo_backgrounds.inc" 1201 - 13DF = 0x1DE = 478 -