Challenges found writing an 8k Lores Apple II Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   by DEATER (Vince Weaver, vince@deater.net)
   http://www.deater.net/weave/vmwprod/mode7_demo/
====================================================
19 March 2018

GOAL:
~~~~~

This started out as some SNES style mode7 pseudo-3d graphics code I
came up with while working on my TFV game.  The graphics looked pretty
cool, so I started developing a demo around it.

The code size ended up being roughly 8kB, so I thought I'd make it into
an 8k demo.  There aren't many out there for the Apple II.
It targets a 48k Apple II and a Mockingboard sound card.

The demo tries to hit the lowest common denominator for Apple II
systems, so in theory you could have run this on an Apple II in 1977
if you were rich enough to afford 48k of RAM.  The Mockingboard sound
card wasn't available until 1981, but still this all predates the
Commodore 64.

USING:
~~~~~~

Boot the disk on a real system, or on an emulator with Mockingboard
support.

AppleWin works fine (even under Wine on Linux).  MESS works too; it's
harder to set up (ROMs) but the audio sounds clearer.

If you have no emulator you can try one of the online javascript ones.
	https://www.scullinsteel.com/apple2/

Hardware:
~~~~~~~~~

The Apple II has a 6502 processor running at roughly 1.023MHz.

Early models only shipped with 4k of RAM, but later 48k, 64k, and 128k
systems were common.  The most common disk drive was the Disk II, which
typically held 140k of data (single-sided).

The only sound available was a bit-banged speaker.  There was no timer;
if you wanted music you had to cycle-count on the CPU.

Later some sound cards were available.  This demo uses the Mockingboard,
which has dual AY-3-8910 sound chips.  Each chip provides 3 channels of
square waves, with noise and envelope effects available.

GRAPHICS
~~~~~~~~

The Apple II had nice graphics for its time (around 1977), but by any
later standard it is quite limited.

	Hardware Sprites?       No
	Linear framebuffer?     No
	User-defined charset?   No
	Blanking interrupts?    No
	Palette selection?      No
	Hardware scrolling?     No
	Hardware page flip?     Yes

The hi-res graphics mode was a complex mess of NTSC hacks by Woz.
You got 280x192 graphics, with 6 colors available.  However the colors
came from NTSC artifacts, and there were limitations on which colors
could be next to each other (in blocks of 3.5 pixels) as well as
fringing.  Also the addresses were interleaved, so it is not a linear
framebuffer.  Hi-res page0 is at $2000 and page1 at $4000.
Optionally 4 lines of text can be shown at the bottom of the screen
instead of graphics.

The lo-res mode is a bit easier to use.  It is 40x48 blocks (40x40 if
4 lines of text are displayed at the bottom).  15 colors are available,
though there is fringing at the edges.  Again the addresses are
interleaved.  Lo-res page0 is at $400 and page1 is at $800.

========================================
DETAILED STEP-BY-STEP REVIEW OF THE DEMO
========================================

BOOTLOADER
~~~~~~~~~~

A BASIC "HELLO" program loads the binary.  This just makes things
auto-boot at startup.  It doesn't count towards the executable size,
as you could manually BRUN the 8k program if you wanted.

The binary is loaded at $2000 (hi-res page0) and BASIC kicks into
HIRES mode before loading, so you can watch as the memory is loaded
from disk in a seemingly random pattern.  Since this is an 8k demo,
the entirety of the program is shown on the screen (or would be if we
POKEd the right address to turn off the 4 lines of text at the bottom
of the screen).

Execution starts at address $2000.

DECOMPRESSOR
~~~~~~~~~~~~

The binary is LZ4 encoded.  The decompressor flips to HGR page 1 so we
can watch memory as the program is decompressed.

The LZ4 decompression code was written by qkumba (Peter Ferrie).
	http://pferrie.host22.com/misc/appleii.htm

The actual program/data decompresses to around 22k starting at $4000.
It over-writes parts of DOS3.3, but since we won't be using the disk
anymore this isn't an issue.
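
As a rough illustration of what an LZ4 decompressor has to do, here is a
minimal Python sketch of decoding a single raw LZ4 block.  This is not
qkumba's 6502 routine.  The function name lz4_block_decompress is
hypothetical, and the sketch assumes a bare block with no frame header,
which may not match how the demo's binary is actually packaged.

```python
def lz4_block_decompress(src):
    """Decode one raw LZ4 block (assumed: no frame header) into bytes."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1

        # High nibble: literal count, extended by extra bytes when it is 15.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len

        # The final sequence carries literals only, so the input ends here.
        if i >= len(src):
            break

        # Two-byte little-endian distance back into the output so far.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("bad match offset")

        # Low nibble: match length minus 4, extended the same way.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4

        # Copy byte by byte: a match may overlap the bytes being produced.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


# Hand-built block: 4 literals "abcd", an 8-byte match at distance 4,
# then a final literal-only sequence holding "!".
demo_block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00, 0x10]) + b"!"
demo_output = lz4_block_decompress(demo_block)
```

The overlapping copy is why long runs of one value are nearly free: a
single literal followed by a long match at distance 1 expands into an
arbitrarily long run.
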
At the top left corner of the screen you'll see the VMW triangles logo
as it decompresses.  To do this I had to put the proper bit pattern at
$4000, $4400, $4800, and $4C00.  I meant to have some words too but ran
out of disk space.  The bit pattern at $4000 is executable and is run
as code.

Optimizing for code size inside of a compressed binary is a pain.
Removing instructions sometimes made the binary larger, as it no longer
compressed as well.  Long runs of values (such as 0 padding) are
essentially free.  This was a difficult challenge.

FADE EFFECT
~~~~~~~~~~~

The title screen fades in from black.  This is a software hack, with a
lookup table copying from an off-screen buffer.  The Apple II doesn't
have any palette support.

TITLE SCREEN
~~~~~~~~~~~~

Once things are decompressed, we jump to $4000.  We switch to lo-res
mode for the rest of the demo.

A background image is loaded from disk.  This is RLE encoded (probably
unnecessary when being further LZ4 encoded).

Why not just load the program at $400 and load the graphics image for
free?  Well, remember the graphics are 40x48 (shared with the text).
Really it's 40x24, with each text char mapping to 4 bits top/bottom for
color.  Do the math: we have 1k reserved for this mode but 40x24 is
only 960 bytes.  It turns out there are "holes" in the address range
that aren't displayed, and various pieces of hardware use these holes
as scratchpad memory.  If you blindly uncompress graphics data there
you can corrupt the scratchpad, so you have to be careful to skip the
holes when uncompressing.

The title screen has scrolling text at the bottom.  This is nothing
fancy: the text is in a buffer off screen and a 40x4 chunk of RAM is
copied in every so many cycles.

You might notice that there is tearing/jitter in the scrolling, even
though we are double-buffering the graphics.
This is because there is not a reliable cross-platform way to get the
VBLANK info (especially on older machines), so we have some bad luck
about when we flip pages.

MOCKINGBOARD MUSIC
~~~~~~~~~~~~~~~~~~

I like chiptune music, especially that for AY-3-8910 based systems.
Before obtaining a Mockingboard I built a Raspberry Pi chiptune player
that is essentially the same hardware.

Most of my sound infrastructure involves YM5 files, which are often
used by ZX Spectrum and ATARI ST users.  These are register dumps,
typically taken at 50Hz.  So to play them back you just have to
interrupt 50 times a second and write the registers.

To program the Mockingboard, each AY-3-8910 chip has 14 sound related
registers that control the 3 channels.  Each AY chip has a dedicated
6522 VIA parallel I/O chip that handles the I/O.

Doing this quickly enough is a challenge on the Apple II.  For each
register you have to do a handshake, then set the register # and the
value.  This can take upwards of 40 1MHz cycles per register.  For
complex chiptune files (especially those written on an ST with much
faster hardware) it's sometimes not possible to get exact playback due
to the delay.  Also one AY is on the left channel and one on the right,
so you have to write both if you want sound from both speakers.

I have a whole suite of code for manipulating YM sound data in my
vmw-meter git repository.

The first step for getting this to work is detecting if a Mockingboard
is there.  It can be in any slot 1-7 on the Apple II, though Slot 4 is
standard (in this demo we only check slot 4).

The board is initialized, and then one of the 6522 timers is set to
interrupt at 25Hz (it has to be an on-board timer as the default
Apple II has no timers).  Why 25Hz and not 50Hz?  At 50Hz with 14
registers you use 700 bytes/s.  So a 2 minute song would take 84k of
RAM, much more than is available.
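
The storage arithmetic can be checked with a few lines of Python.  The
helper name song_bytes is hypothetical, and the sketch assumes one byte
per register per frame with no further compression.  The second call
uses the 25Hz, 11-byte frame that the demo settles on.

```python
def song_bytes(frame_rate_hz, bytes_per_frame, seconds):
    """Uncompressed size of a register-dump song, assuming 1 byte per register."""
    return frame_rate_hz * bytes_per_frame * seconds


# A 2 minute song as a plain 50Hz dump of all 14 registers.
full_dump = song_bytes(50, 14, 120)

# The same length at 25Hz with each frame packed into 11 bytes.
packed_dump = song_bytes(25, 11, 120)
```

Here full_dump works out to 84000 bytes and packed_dump to 33000 bytes,
before any LZ4 compression.
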
For this demo I run at 25Hz, and also pack the 14 registers of data
into 11 bytes (various fields are not packed well, and we can unpack
them at play time).  I also stripped out the envelope data, as many
songs do not use it (so this is a lossy compression method).  Finally,
we keep track of the values written last frame and only write to the
board if things change, which helps with the latency a bit.

The sound quality suffered a bit, but it's hard to fit a catchy
chiptune file in 8K.

The song being played is a stripped down and re-arranged version of
"Electric Wave" from CC'00 by EA (Ilya Abrosimov).

MODE7 BACKGROUND
~~~~~~~~~~~~~~~~

"MODE7" was a Super Nintendo (SNES) graphics mode that took a tiled
background and transformed it to look as if it was squashed out to the
horizon, giving a 3d look.  The SNES did this in hardware, but in this
demo we do it in software.

As found on Wikipedia, the transform is of the type

	[x'] = [a b] ([x] - [x0]) + [x0]
	[y']   [c d] ([y]   [y0])   [y0]

For our code, we managed to reduce things to a small number of
additions and subtractions for each pixel on the screen.  Of course the
6502 can't do floating point, so we do fixed point math.  We convert as
much as we can to table lookups that are pre-calculated.  We also make
liberal use of self-modifying code.

Despite all of this there are still some cases where we have to do a
16bit x 16bit = 32bit multiply, something that is *really* slow on the
6502: around 700 cycles (for an 8.8 x 8.8 fixed point multiply).

To make this faster we use a method described by Stephen Judd.

The key is to note that
	(a+b)^2 = a^2+2ab+b^2
and
	(a-b)^2 = a^2-2ab+b^2
and if you subtract the second from the first you can simplify to:

	        (a+b)^2   (a-b)^2
	a*b  =  ------- - -------
	           4         4

This means that if you have a table of squares from 0..511 (all 8-bit
a+b and a-b will fall in this range) then you can convert a multiply
into two table lookups plus a subtract.  The downside is you will need
2kB of squares lookup tables (which can be generated at startup).
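
The table-lookup multiply can be sketched in Python as follows.  This
is only an illustration of the identity, not the demo's 6502 code.  The
names QSQ, mul8, mul16 and fixmul_8_8 are hypothetical, and storing
floor(n*n/4) in a single table is an assumption about the layout.  The
result is still exact because a+b and a-b always have the same parity,
so the two dropped fractions cancel.

```python
# Quarter-square table: QSQ[n] = floor(n*n / 4) for n in 0..510.
QSQ = [(n * n) // 4 for n in range(511)]


def mul8(a, b):
    """Multiply two unsigned 8-bit values with two lookups and a subtract."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return QSQ[a + b] - QSQ[abs(a - b)]


def mul16(x, y):
    """Unsigned 16x16 -> 32-bit multiply from four 8x8 partial products."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    return (mul8(xl, yl)
            + ((mul8(xl, yh) + mul8(xh, yl)) << 8)
            + (mul8(xh, yh) << 16))


def fixmul_8_8(x, y):
    """Multiply two unsigned 8.8 fixed point values, truncating back to 8.8."""
    return (mul16(x, y) >> 8) & 0xFFFF


# 1.5 * 2.0 in 8.8 fixed point: 0x0180 * 0x0200 gives 0x0300 (3.0).
example = fixmul_8_8(0x0180, 0x0200)
```

A 16-bit multiply needs four 8-bit partial products plus the adds to
combine them, which is where most of the remaining cycles go.
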
The table-based multiply reduces the cost to the order of 200 to 250
cycles.

By using the fast multiply and a lot of careful optimization you can
generate a Mode7 background in 40x40 graphics mode at about 5
frames/second.

The engine can be parameterized with different tilesets, which we use
to provide both a black+white checkerboard background and the island
background from the TFV game.

BOUNCING BALL ON CHECKERBOARD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What would a demo be without some sort of bouncing geometric shape?

This is just done with 16 sprites.  The sphere was modeled in OpenGL
from a 2000-era game-engine that I never finished.  I took screenshots
and then reduced the size/color to an appropriate value.

The shadow is also just sprites.

The clicking noise on bounce is just touching the speaker at $C030.
It's mostly there to give some sound effects for those playing the demo
without a Mockingboard.

TFV SPACESHIP FLYING
~~~~~~~~~~~~~~~~~~~~

The spaceship, water splash, and shadows are all sprites.  This is all
done in software, as the Apple II has no sprite hardware.

This is the TFV game engine flying-spaceship code, with the keyboard
routines replaced to read from memory instead (sort of like a script of
what to do when).

STARFIELD
~~~~~~~~~

The starfield is your typical starfield code.  Only 16 stars are
modeled.  It re-uses the fast-multiply code from the mode7 graphics.

Random number generation is not fast on the 6502, so we cheat.
Originally we had a 256-byte blob of "random" values generated earlier.
This wasted space, so now we instead treat the executable code at $5000
as if it were a block of random numbers.  The address was arbitrarily
chosen: I tried different areas of memory until I got one where the
stars seemed to move in a pleasing pattern.

A simple state machine controls whether the stars move, whether the
background is cleared (the streak effect), and what color the
background is (for the blue flash).
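
The "code as random numbers" trick can be sketched in Python like this.
Everything here is illustrative.  The class CodeBytesRandom, the helper
make_stars, the 256-byte window and the star coordinate ranges are all
assumptions, and the memory array is a made-up stand-in for the
decompressed program image.

```python
# Stand-in for the 64k address space; the fill pattern is arbitrary.
memory = bytearray((i * 37 + 11) & 0xFF for i in range(0x10000))


class CodeBytesRandom:
    """Serve 'random' bytes by walking a window of already-present memory."""

    def __init__(self, mem, base=0x5000, window=256):
        self.mem = mem
        self.base = base
        self.window = window
        self.index = 0

    def next_byte(self):
        value = self.mem[self.base + self.index]
        # Wrap around, much like an 8-bit index register would.
        self.index = (self.index + 1) % self.window
        return value


def make_stars(rng, count=16):
    """Build a list of stars with signed x/y and a non-zero depth z."""
    stars = []
    for _ in range(count):
        x = rng.next_byte() - 128          # assumed range: -128..127
        y = rng.next_byte() - 128
        z = (rng.next_byte() & 0x3F) + 1   # assumed range: 1..64
        stars.append({"x": x, "y": y, "z": z})
    return stars


stars = make_stars(CodeBytesRandom(memory))
```

Because the bytes come from a fixed region, the sequence is the same on
every run, which is why changing the base address changes how the stars
appear to move.
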
The ship moving into the distance is just done with different sized
sprites.

RASTERBARS/CREDITS
~~~~~~~~~~~~~~~~~~

The credits happen with the starfield continuing to run.

The text is written in the bottom 4 lines of the screen.  Some
inverse-mode space characters are used to try to make it look like
graphics are surrounding the text.  It's actually possible with careful
cycle counting to switch modes fast enough to have actual mixed
graphics/text (see the FrenchTouch demos) but I was too lazy to attempt
that here.

The rasterbar effect isn't really rasterbars, it's just a rainbow
assortment of lines being drawn with a sine wave lookup table.  It's
the same rasterbar code from my chiptune player demo.  I ended up
optimizing it a lot via inlining and a few other ways, because it
turned out just drawing a horizontal line can take a very long time.

The rotating text is just taking the output string and rapidly rotating
the character values through the ASCII table.  The annoying clicking
noise is the same speaker effect caused by hitting $C030.

Choosing who to thank ended up being extremely critical to fitting in
8kB, as unique text strings do not compress well.  I'm also still not
satisfied with how the centering looks.

Memory Map
==========

(not to scale)

 -------- $ffff
| ROM/IO |
 -------- $c000
|        | 32k decompress
 -------- $4000
| load   | 8k
 -------- $2000
| free   |
 -------- $1c00
| Scroll |
| Data   |
 -------- $1800
|Multiply|
| Tables |
 -------- $1000
|GR pg 2 | 1k
|-------- $0c00
|GR pg 1 | 1k
|-------- $0800
|GR pg 0 | 1k
 -------- $0400
|        | 0.5k
 -------- $0200
| stack  | 0.25k
 -------- $0100
|zero pg | 0.25k
 -------- $0000
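
The interleaved lo-res layout and the screen "holes" mentioned in the
TITLE SCREEN section can be sketched in Python.  The function names are
hypothetical and this is only a model of the addressing, not code from
the demo.  Each 128-byte block holds three 40-byte text rows, leaving 8
undisplayed bytes at the end of the block.

```python
def lores_row_base(row, page_base=0x400):
    """Address of the first byte of text row 0..23 on a lo-res page."""
    return page_base + 0x80 * (row % 8) + 0x28 * (row // 8)


def lores_plot(mem, x, y, color, page_base=0x400):
    """Set block (x 0..39, y 0..47) to a 4-bit color."""
    addr = lores_row_base(y // 2, page_base) + x
    if y & 1:
        # Odd y is the bottom block of the byte: high nibble.
        mem[addr] = (mem[addr] & 0x0F) | (color << 4)
    else:
        # Even y is the top block of the byte: low nibble.
        mem[addr] = (mem[addr] & 0xF0) | color


def screen_holes(page_base=0x400):
    """List the addresses in the 1k page that are never displayed."""
    visible = {lores_row_base(r, page_base) + x
               for r in range(24) for x in range(40)}
    return [a for a in range(page_base, page_base + 0x400)
            if a not in visible]


holes = screen_holes()
```

With this layout holes contains 64 addresses (1024 minus 960), starting
at $478, and a loader has to step over them when filling a page.
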