Challenges found writing an 8k Lores Apple II Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
by DEATER (Vince Weaver, vince@deater.net)
http://www.deater.net/weave/vmwprod/mode7_demo/
====================================================
19 March 2018
GOAL:
~~~~~
This started out as some SNES style mode7 pseudo-3d graphics code
I came up with while working on my TFV game. The graphics looked
pretty cool, so I started developing a demo around it.
The code size ended up being roughly 8kB, so I thought I'd
make it into an 8k demo. There aren't many out there for the Apple II.
The demo targets an Apple II with 48k of RAM and a Mockingboard sound card.
The demo tries to hit the lowest common denominator for Apple II systems,
so in theory you could have run this on an Apple II in 1977 if you
were rich enough to afford 48k of RAM. The Mockingboard sound wasn't
available until 1981, but still this all predates the Commodore 64.
USING:
~~~~~~
Boot the disk on a real system, or on an emulator with Mockingboard support.
AppleWin works fine (even under Wine on Linux).
MESS does too; it's harder to set up (ROMs) but the audio sounds clearer.
If you have no emulator you can try one of the online JavaScript ones.
https://www.scullinsteel.com/apple2/
Hardware:
~~~~~~~~~
The Apple II has a 6502 processor running at roughly 1.023MHz.
Early models only shipped with 4k of RAM, but later 48k, 64k, and 128k
systems were common.
The most common disk drive was the Disk II which typically held
140k of data (single-sided).
The only sound available was a bit-banged speaker. There was no timer, so
if you wanted music you had to cycle-count via the CPU.
Later some sound cards were available. This demo uses the
Mockingboard which has dual AY-3-8910 sound chips. Each
chip provides 3 channels of square waves, with noise and
envelope effects available.
GRAPHICS
~~~~~~~~
The Apple II had nice graphics for its time, but that time was
around 1977. By later standards it is quite limited.
	Hardware Sprites?      No
	Linear framebuffer?    No
	User-defined charset?  No
	Blanking interrupts?   No
	Palette selection?     No
	Hardware scrolling?    No
	Hardware page flip?    Yes
The hi-res graphics mode was a complex mess of NTSC hacks by Woz.
You got 280x192 graphics, with 6 colors available. However the colors
were from NTSC artifacts and there were limitations on which colors
could be next to each other (in blocks of 3.5 pixels) as well as
fringing. Also the addresses were interleaved, so not a linear
framebuffer. Hi-res page0 is at $2000 and page1 at $4000.
Optionally 4 lines of text can be shown at the bottom of the
screen instead of graphics.
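To make the interleaving concrete, here is a minimal Python sketch of
the usual hi-res row address calculation. The function name and the
`page_base` parameter are made up for illustration; only the address
pattern itself comes from the hardware layout.

```python
def hires_row_address(y, page_base=0x2000):
    """Return the address of the first byte of hi-res scanline y (0..191)."""
    if not 0 <= y < 192:
        raise ValueError("y must be in 0..191")
    return (page_base
            + (y & 7) * 0x400          # line within a group of 8 lines
            + ((y >> 3) & 7) * 0x80    # group within a third of the screen
            + (y >> 6) * 0x28)         # which third of the screen

# The first four scanlines of the page at $4000 are 1k apart.
for y in range(4):
    print(f"line {y}: ${hires_row_address(y, 0x4000):04X}")
```

Each scanline is 40 bytes wide (7 pixels per byte), so consecutive
lines on screen are nowhere near each other in memory.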
The lo-res mode is a bit easier to use. It is 40x48 blocks
(40x40 if 4 lines of text are displayed at the bottom).
15 colors are available, though there is fringing at the edges.
Again the addresses are interleaved. Lo-res page0 is at $400
and page1 is at $800.
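As a rough illustration, plotting a lo-res block could look like the
Python sketch below. Two blocks share each byte (the upper block in the
low nibble, the lower block in the high nibble). The helper names and
the `memory` bytearray are assumptions for the example, not the demo's
actual routines.

```python
def lores_address(x, y, page_base=0x400):
    """Address of the byte holding lo-res block (x, y), x in 0..39, y in 0..47."""
    row = y >> 1                       # two blocks share one text row
    return page_base + (row & 7) * 0x80 + (row >> 3) * 0x28 + x

def lores_plot(memory, x, y, color, page_base=0x400):
    """Set block (x, y) to a 4-bit color without touching its neighbour."""
    addr = lores_address(x, y, page_base)
    if y & 1:
        # odd rows live in the high nibble
        memory[addr] = (memory[addr] & 0x0F) | ((color & 0x0F) << 4)
    else:
        # even rows live in the low nibble
        memory[addr] = (memory[addr] & 0xF0) | (color & 0x0F)

memory = bytearray(0x1000)
lores_plot(memory, 0, 0, 0x5)
lores_plot(memory, 0, 1, 0xA)
print(hex(memory[0x400]))
```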
========================================
DETAILED STEP-BY-STEP REVIEW OF THE DEMO
========================================
BOOTLOADER
~~~~~~~~~~
A BASIC "HELLO" program loads the binary.
This just makes the disk auto-boot at startup. It doesn't count
towards the executable size, as you could manually BRUN the 8k program
if you wanted.
The binary is loaded at $2000 (hi-res page0) and BASIC kicks into
HIRES mode before loading so you can watch as the memory is loaded
from disk in a seemingly random pattern.
Since this is an 8k demo, the entirety of the program is shown on
the screen (or would be if we POKEd the right address to turn off
the 4 lines of text on the bottom of the screen).
Execution starts at address $2000
DECOMPRESSOR
~~~~~~~~~~~~
The binary is LZ4 encoded. The decompressor flips to HGR page 1 so
we can watch memory as the program is decompressed.
The LZ4 decompression code was written by qkumba (Peter Ferrie).
http://pferrie.host22.com/misc/appleii.htm
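For reference, decoding the LZ4 block format is simple enough to
sketch in a few lines of Python. This is not qkumba's 6502 routine,
and it assumes a raw LZ4 block with no frame header (the demo's exact
container is not described here). The function name is made up.

```python
def lz4_block_decompress(src):
    """Decode one raw LZ4 block (no frame header) into bytes."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1
        # High nibble: literal count, extended by extra bytes when it is 15.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= len(src):
            break                      # the last sequence holds literals only
        # Two-byte little-endian distance back into the output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0:
            raise ValueError("invalid zero offset")
        # Low nibble: match length minus 4, extended the same way.
        match_len = (token & 0x0F) + 4
        if (token & 0x0F) == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        # Copy byte by byte so overlapping matches repeat the recent output.
        for _ in range(match_len):
            out.append(out[-offset])
    return bytes(out)

# 255 zero bytes followed by "A", packed into 7 bytes.
packed = bytes([0x1F, 0x00, 0x01, 0x00, 0xEB, 0x10, 0x41])
print(len(lz4_block_decompress(packed)))
```

The example input shows why long runs of one value cost almost
nothing: a single literal plus one overlapping match covers the run.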
The actual program/data decompresses to around 22k starting at $4000.
It over-writes parts of DOS3.3, but since we won't be using the disk
anymore this isn't an issue.
At the top left corner of the screen you'll see the VMW triangles logo
as it decompresses. To do this I had to put the proper bit pattern
at $4000, $4400, $4800, and $4C00. I meant to have some words too
but ran out of disk space. The bit pattern at $4000 is executable
and is run as code.
Optimizing for code size inside of a compressed binary is a pain.
Removing instructions sometimes made the binary larger as it no longer
compressed as well. Long runs of values (such as 0 padding) are
essentially free. This was a difficult challenge.
FADE EFFECT
~~~~~~~~~~~
The title screen fades in from black.
This is a software hack, with a lookup table copying from an off-screen
buffer. The Apple II doesn't have any palette support.
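A lookup-table fade can be sketched like this in Python. The `DARKER`
table below is invented for the example (the demo's real table is not
listed here); the point is only that each 4-bit color maps to a darker
one and both nibbles of every byte go through the table.

```python
# Hypothetical "one step darker" mapping for the 16 lo-res colors.
# These values are made up for illustration; every chain ends at 0 (black).
DARKER = [0, 0, 0, 2, 0, 0, 2, 6, 0, 8, 5, 9, 4, 12, 6, 10]

def darken_byte(value, steps):
    """Apply the table `steps` times to both 4-bit blocks in a byte."""
    lo, hi = value & 0x0F, value >> 4
    for _ in range(steps):
        lo, hi = DARKER[lo], DARKER[hi]
    return (hi << 4) | lo

def fade_in_frames(image, max_steps=3):
    """Yield copies of the off-screen image from fully dark to original."""
    for steps in range(max_steps, -1, -1):
        yield bytes(darken_byte(b, steps) for b in image)

for frame in fade_in_frames(bytes([0xF7, 0x1D])):
    print(frame.hex())
```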
TITLE SCREEN
~~~~~~~~~~~~
Once things are decompressed, we jump to $4000. We switch to lo-res
mode for the rest of the demo.
A background image is loaded from disk. This is RLE encoded (probably
unnecessary when being further LZ4 encoded).
Why not just load the program at $400 and load the graphics image for
free? Well, remember the graphics are 40x48 (shared with the text).
Really it's 40x24, with each text char mapping to 4-bits top/bottom
for color. Do the math, we have 1k reserved for this mode but 40x24
is only 960 bytes. It turns out there are "holes" in the address range
that aren't displayed, and various pieces of hardware use these holes
as scratchpad memory. So if you just blindly uncompress graphics data
there you can corrupt the scratchpad. So you have to be careful
when uncompressing to skip the holes.
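Here is a small Python sketch of the hole-skipping idea. Each 128-byte
chunk of the page holds three 40-byte rows (120 bytes) followed by 8
undisplayed bytes, which accounts for the 64 bytes missing from 1k.
The function names are made up, and the copy assumes the image is
stored in memory order, which may not match the demo's layout.

```python
def is_screen_hole(addr, page_base=0x400):
    """True if addr is one of the 8 undisplayed bytes ending each 128-byte chunk."""
    return (addr - page_base) % 0x80 >= 0x78

def copy_skipping_holes(memory, image, page_base=0x400):
    """Copy 960 image bytes into the 1k page, leaving the 64 hole bytes alone."""
    src = 0
    for addr in range(page_base, page_base + 0x400):
        if not is_screen_hole(addr, page_base):
            memory[addr] = image[src]
            src += 1
    return src                          # number of bytes consumed

memory = bytearray(0x800)
memory[0x478] = 0x99                    # pretend a card keeps scratch data here
used = copy_skipping_holes(memory, bytes([0x11]) * 960)
print(used, hex(memory[0x478]))
```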
The title screen has scrolling text at the bottom. This is nothing fancy,
the text is in a buffer off screen and a 40x4 chunk of RAM is copied in
every so many cycles.
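The scroller boils down to copying a sliding window out of a wider
buffer. A minimal Python sketch follows; the wrap-around at the end of
the buffer is an assumption, and the names are hypothetical.

```python
def scroll_window(buffer_rows, offset, width=40):
    """Return the visible width-column slice of each row, wrapping at the end."""
    window = []
    for row in buffer_rows:
        start = offset % len(row)
        # Doubling the row lets a plain slice wrap (needs len(row) >= width).
        window.append((row + row)[start:start + width])
    return window

# Four identical 60-column rows standing in for the off-screen text buffer.
row = "".join(chr(65 + i % 26) for i in range(60))
for line in scroll_window([row] * 4, offset=59):
    print(line)
```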
You might notice that there is tearing/jitter in the scrolling, even
though we are double-buffering the graphics. This is because there is
no reliable cross-platform way to detect VBLANK (especially
on older machines), so we sometimes flip pages at an unlucky point
in the frame.
MOCKINGBOARD MUSIC
~~~~~~~~~~~~~~~~~~
I like chiptune music, especially that for AY-3-8910 based systems.
Before obtaining a Mockingboard I built a Raspberry Pi chiptune player
that is essentially the same hardware.
Most of my sound infrastructure involves YM5 files, which are often used
by ZX Spectrum and Atari ST users. These are usually register dumps,
typically taken at 50Hz. So to play them back you just have to interrupt
50 times a second and write the registers.
To program the Mockingboard, each AY-3-8910 chip has 14 sound related
registers that control the 3 channels. Each AY chip has a dedicated
VIA 6522 parallel I/O chip that handles the I/O.
Doing this quickly enough is a challenge on the Apple II. For each
register you have to do a handshake, set the register # and the value.
This can take upwards of 40 1MHz cycles per register.
For complex chiptune files (especially those written on an ST with much
faster hardware) it's sometimes not possible to get exact playback
due to the delay. Also one AY is on the left channel and one on the right
so you have to write both if you want sound from both speakers.
I have a whole suite of code for manipulating YM sound data, in my
vmw-meter git repository.
The first step for getting this to work is detecting if a Mockingboard is
there. This can be in any slot 1-7 on the Apple II, though typically
Slot 4 is standard (in this demo we only check slot 4).
The board is initialized, and then one of the 6522 timers is set to
interrupt at 25Hz (it has to be an on-board timer as the default
Apple II has no timers).
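As a back-of-the-envelope check, the timer count for a given interrupt
rate can be worked out like this in Python. It uses the rough 1.023MHz
clock figure from the Hardware section and ignores the few cycles of
reload overhead in the 6522, so treat the result as approximate. The
function names are made up.

```python
CPU_HZ = 1_023_000   # "roughly 1.023MHz", taken from the Hardware section

def timer_count(rate_hz, cpu_hz=CPU_HZ):
    """CPU cycles between interrupts; must fit the 16-bit 6522 timer."""
    cycles = round(cpu_hz / rate_hz)
    if cycles > 0xFFFF:
        raise ValueError("rate too slow for a 16-bit timer")
    return cycles

def split_latch(count):
    """Split a 16-bit count into (low byte, high byte) for the timer latches."""
    return count & 0xFF, count >> 8

for rate in (25, 50):
    lo, hi = split_latch(timer_count(rate))
    print(f"{rate}Hz: {timer_count(rate)} cycles, low=${lo:02X} high=${hi:02X}")
```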
Why 25Hz and not 50Hz? At 50Hz with 14 registers you use 700 bytes/s.
So a 2 minute song would take 84k of RAM, much more than is available.
For this demo I run at 25Hz, and also pack the 14 registers of the data
into 11 (there are various fields that are not packed well, we can
unpack at play time). Also I stripped out the envelope data as many
songs do not use it (so this is a lossy compression method).
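The size arithmetic above is easy to replay in Python. The numbers are
raw register data before any LZ4 compression, and the function name is
just for this example.

```python
def song_bytes(rate_hz, bytes_per_frame, seconds):
    """Raw register-dump size for a song, before any compression."""
    return rate_hz * bytes_per_frame * seconds

full = song_bytes(50, 14, 120)     # 50Hz, all 14 registers, 2 minutes
packed = song_bytes(25, 11, 120)   # 25Hz, 11 packed bytes, 2 minutes
print(full, packed)                # 84000 33000
```

Halving the frame rate and packing the registers cuts the raw data to
well under half, though a full 2 minute song would still not fit in 8k
without further trimming and compression.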
Also, we keep track of the values written in the previous frame and only
write a register out to the board if its value changed, which helps with
the latency a bit.
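The only-write-what-changed idea can be sketched in Python as below.
The frame layout (a plain list of register values) and the function
name are assumptions; on the real hardware each returned pair would
turn into one handshake with the 6522.

```python
def changed_registers(previous, current):
    """Return (register, value) pairs that differ from the previous frame."""
    return [(reg, new)
            for reg, (old, new) in enumerate(zip(previous, current))
            if old != new]

last_frame = [0x10, 0x01, 0x00, 0x00, 0x0F]
this_frame = [0x12, 0x01, 0x00, 0x00, 0x0C]
print(changed_registers(last_frame, this_frame))   # [(0, 18), (4, 12)]
```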
The sound quality suffered a bit, but it's hard to fit a catchy chiptune
file in 8K.
The song being played is a stripped down and re-arranged version of
"Electric Wave" from CC'00 by EA (Ilya Abrosimov).
MODE7 BACKGROUND
~~~~~~~~~~~~~~~~
"MODE7" was a Super Nintendo (SNES) graphics mode that took a tiled
background and transformed it to look as if it was squashed out to
the horizon, giving a 3d look. The SNES did this in hardware, but
in this demo we do this in software.
As found on Wikipedia, the transform is of the type
	[x'] = [a b] ([x] - [x0]) + [x0]
	[y']   [c d] ([y]   [y0])   [y0]
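Written out per coordinate, the transform above is just two
multiply-adds around the origin (x0, y0). A floating-point Python
sketch follows (the demo itself uses fixed point, and the function
name is made up).

```python
def mode7_transform(x, y, a, b, c, d, x0, y0):
    """Apply the 2x2 matrix [a b; c d] about the origin (x0, y0)."""
    dx, dy = x - x0, y - y0
    return (a * dx + b * dy + x0,
            c * dx + d * dy + y0)

# A 90 degree rotation about (1, 1) moves (2, 1) to (1, 2).
print(mode7_transform(2, 1, 0, -1, 1, 0, 1, 1))
```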
For our code, we managed to reduce things to a small number of additions
and subtractions for each pixel on the screen. Of course the 6502 can't
do floating point, so we do fixed point math. We convert as much as we
can to table lookups that are pre-calculated. We also make liberal use
of self-modifying code.
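To show what "a few additions per pixel" can look like, here is a
generic Python sketch of stepping across one scanline in 8.8 fixed
point. This is not the demo's actual inner loop; the names, the square
tile map, and the wrap-around are all assumptions.

```python
FIXED_ONE = 1 << 8                     # 8.8 fixed point: 256 means 1.0

def to_fixed(value):
    """Convert a float to 8.8 fixed point."""
    return int(round(value * FIXED_ONE))

def sample_scanline(tile_map, start_x, start_y, step_x, step_y, width=40):
    """Walk across one scanline using only adds, shifts, and lookups."""
    size = len(tile_map)               # square map assumed
    x, y = start_x, start_y
    pixels = []
    for _ in range(width):
        # The integer part (top 8 bits) selects the tile; wrap at the edges.
        pixels.append(tile_map[(y >> 8) % size][(x >> 8) % size])
        x += step_x                    # one add per axis per pixel
        y += step_y
    return pixels

checkerboard = [[(col ^ row) & 1 for col in range(8)] for row in range(8)]
print(sample_scanline(checkerboard, 0, 0, to_fixed(0.5), 0, width=8))
```

The expensive work (finding the start point and the per-pixel step for
each scanline) happens once per line, which is where the multiplies
and lookup tables come in.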
Despite all of this there are still some cases where we have to do a
16bit x 16bit = 32bit multiply, something that is *really* slow on 6502,
around 700 cycles (for a 8.8 x 8.8 fixed point multiply).
To make this faster we use a method described by Stephen Judd.
The key is to note that (a+b)^2 = a^2+2ab+b^2 and (a-b)^2 = a^2-2ab+b^2
and if you subtract the second from the first you are left with 4ab, so:
	       (a+b)^2   (a-b)^2
	a*b =  ------- - -------
	          4         4
This means that if you have a table of squares from 0..511 (all 8-bit a+b
and a-b will fall in this range) then you can convert a multiply into two
table lookups plus a subtract.
The downside is you will need 2kB of squares lookup tables (which can
be generated at startup). This reduces the multiply cost to the order
of 200 to 250 cycles.
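The table trick is easy to check in Python. This sketch handles
unsigned values only and builds the 16-bit multiply from four 8-bit
partial products; the demo's real routine (table layout, signed
handling) may differ, and the names here are made up.

```python
# floor(n*n/4) for n in 0..511. The floor is harmless because a+b and
# a-b always have the same parity, so the two rounding errors cancel.
QUARTER_SQUARES = [(n * n) // 4 for n in range(512)]

def mul8(a, b):
    """8-bit x 8-bit unsigned multiply: two table lookups and a subtract."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]

def mul16(x, y):
    """16-bit x 16-bit = 32-bit multiply from four 8-bit partial products."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    return ((mul8(xh, yh) << 16)
            + ((mul8(xh, yl) + mul8(xl, yh)) << 8)
            + mul8(xl, yl))

def mul_fixed_8_8(x, y):
    """Unsigned 8.8 fixed-point multiply: keep the middle 16 bits."""
    return (mul16(x, y) >> 8) & 0xFFFF

print(hex(mul_fixed_8_8(0x0180, 0x0200)))   # 1.5 * 2.0 = 3.0 -> 0x300
```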
By using the fast multiply and a lot of careful optimization you can
generate a Mode7 background in 40x40 graphics mode at about 5 frames/second.
The engine can be parameterized with different tilesets to use, which we
do to provide both a black+white checkerboard background, as well as the
island background from the TFV game.
BOUNCING BALL ON CHECKERBOARD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What would a demo be without some sort of bouncing geometric shape?
This is just done with 16 sprites. The sphere was modeled in OpenGL
from a 2000-era game-engine that I never finished. I then took screenshots
and then reduced the size/color to an appropriate value.
The shadow is also just sprites.
The clicking noise on bounce is just touching the speaker at $C030.
It's mostly there to give some sound effects for those playing the demo
without a Mockingboard.
TFV SPACESHIP FLYING
~~~~~~~~~~~~~~~~~~~~
The spaceship, water splash, and shadows are all sprites. This is all
done in software, the Apple II has no sprite hardware.
This is the TFV game engine flying-spaceship code, with the keyboard
routines replaced to read from memory instead (sort of like a script
of what to do when).
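Replacing the keyboard with a script can be as simple as the Python
sketch below. The (key, frame count) format is invented for the
example; the demo's real script encoding is not described here.

```python
def scripted_keys(script):
    """Yield one 'keypress' per frame from (key, frame_count) pairs."""
    for key, frames in script:
        for _ in range(frames):
            yield key

# Hypothetical script: hold W for 2 frames, idle for 1, then press D once.
flight_plan = [("W", 2), (None, 1), ("D", 1)]
print(list(scripted_keys(flight_plan)))
```

The game loop then pulls its next "key" from this generator instead of
polling the keyboard, so the rest of the engine runs unchanged.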
STARFIELD
~~~~~~~~~
The starfield is your typical starfield code. Only 16 stars are modeled.
It re-uses the fast-multiply code from the mode7 graphics.
Random number generation is not fast on the 6502, so we cheat.
Originally we had a 256-byte blob of "random" values generated earlier.
This wasted space, so now instead we just treat the executable code
at $5000 as if it were a block of random numbers. This was arbitrarily
chosen, I tried different areas of memory until I got one where the
stars seemed to move in a pleasing pattern.
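The "code as random numbers" trick amounts to walking a window of
memory and wrapping around. A Python sketch follows, with a fake
memory image standing in for the real executable; the 256-byte window
size is an assumption carried over from the original table.

```python
def make_random_source(memory, start, length=256):
    """Return a function that hands out successive bytes from a memory window."""
    table = memory[start:start + length]
    index = 0

    def next_random():
        nonlocal index
        value = table[index]
        index = (index + 1) % len(table)   # wrap at the end of the window
        return value

    return next_random

# Fake 24k memory image; on the real machine this would be program code.
fake_memory = bytes((i * 7 + 3) & 0xFF for i in range(0x6000))
next_random = make_random_source(fake_memory, 0x5000)
print([next_random() for _ in range(4)])
```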
A simple state machine controls if the stars move or not, whether the
background is cleared or not (the streak effect) and what color the
background is (for the blue flash).
The ship moving to the distance is just done with different sized sprites.
RASTERBARS/CREDITS
~~~~~~~~~~~~~~~~~~
The credits happen with the starfield continuing to run.
The text is written in the bottom 4 lines of the screen. Some inverse-mode
space characters are used to try to make it look like graphics are surrounding
the text. It's actually possible with careful cycle counting to switch
modes fast enough to have actual mixed graphics/text (See the FrenchTouch
demos) but I was too lazy to attempt that here.
The rasterbar effect isn't really rasterbars, it's just a rainbow assortment
of lines being drawn with a SINEWAVE lookup table.
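A sine-table driven set of lines can be sketched like this in Python.
The table size, amplitude, bar count, and spacing are all invented for
the example; on the 6502 the table would be precomputed rather than
built with a math library.

```python
import math

# 64-entry sine table scaled to +/-15 rows (sizes are assumptions).
SINE = [round(math.sin(2 * math.pi * i / 64) * 15) for i in range(64)]

def bar_rows(frame, bars=6, center=20, spacing=4):
    """Screen row for each colored line on a given frame."""
    return [center + SINE[(frame + i * spacing) % 64] for i in range(bars)]

for frame in (0, 8, 16):
    print(frame, bar_rows(frame))
```

Each line reads the table at a slightly different phase, so the group
of lines appears to ripple up and down as the frame counter advances.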
It's the same rasterbar code from my chiptune player demo. I ended up
optimizing it a lot via inlining and a few other ways because it turned
out just drawing a horizontal line can take a very long time.
The rotating text is just taking the output string and rapidly rotating the
character values through the ASCII table.
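One way to get this kind of effect is sketched below in Python: every
character starts offset from its final value and steps back by one per
frame until the text settles. The settling behavior and the 64
character range ($20-$5F) are assumptions, not a description of the
demo's exact routine.

```python
def rotate_char(ch, amount, low=0x20, span=0x40):
    """Shift a character by `amount`, wrapping inside a 64-character range."""
    return chr(low + (ord(ch) - low + amount) % span)

def rotating_frames(text, steps):
    """Yield the text on each frame, ending on the unrotated string."""
    for remaining in range(steps, -1, -1):
        yield "".join(rotate_char(c, remaining) for c in text)

for frame in rotating_frames("HI", 2):
    print(frame)
```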
The annoying clicking noise is the same speaker effect caused by hitting
$C030.
Choosing who to thank ended up being extremely critical to fitting in 8kB,
as unique text strings do not compress well. I'm also still not satisfied
with how the centering looks.
Memory Map
==========
(not to scale)
	 -------- $ffff
	| ROM/IO |
	 -------- $c000
	|        | 32k decompress
	 -------- $4000
	|  load  | 8k
	 -------- $2000
	|  free  |
	 -------- $1c00
	| Scroll |
	|  Data  |
	 -------- $1800
	|Multiply|
	| Tables |
	 -------- $1000
	|GR pg 2 | 1k
	 -------- $0c00
	|GR pg 1 | 1k
	 -------- $0800
	|GR pg 0 | 1k
	 -------- $0400
	|        | 0.5k
	 -------- $0200
	| stack  | 0.25k
	 -------- $0100
	|zero pg | 0.25k
	 -------- $0000