Challenges found writing an 8k Lores Apple II Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
by DEATER (Vince Weaver, vince@deater.net)
http://www.deater.net/weave/vmwprod/mode7_demo/
====================================================
19 March 2018

GOAL:
~~~~~
This started out as some SNES style mode7 pseudo-3d graphics code
I came up with while working on my TFV game. The graphics looked
pretty cool, so I started developing a demo around it.

The code size ended up being roughly 8kB, so I thought I'd
make it into an 8k demo. There aren't many out there for the Apple II.
It is written for an Apple II with 48k of RAM
and a Mockingboard sound card.

The demo tries to hit the lowest common denominator for Apple II systems,
so in theory you could have run this on an Apple II in 1977 if you
were rich enough to afford 48k of RAM. The Mockingboard sound wasn't
available until 1981, but still this all predates the Commodore 64.

USING:
~~~~~~
Boot the disk on a real system, or in an emulator with Mockingboard support.

Applewin works fine (even under Wine on Linux).
MESS does too; it's harder to set up (ROMs) but the audio sounds clearer.

If you have no emulator you can try one of the online javascript ones.
https://www.scullinsteel.com/apple2/

Hardware:
~~~~~~~~~
The Apple II has a 6502 processor running at roughly 1.023MHz.

Early models shipped with as little as 4k of RAM, but later 48k, 64k,
and 128k systems were common.

The most common disk drive was the Disk II which typically held
140k of data (single-sided).

The only sound available was a bit-banged speaker. There was no timer,
so if you wanted music you had to cycle-count via the CPU.

Later some sound cards were available. This demo uses the
Mockingboard which has dual AY-3-8910 sound chips. Each
chip provides 3 channels of square waves, with noise and
envelope effects available.

GRAPHICS
~~~~~~~~

The Apple II had nice graphics for its time, with this time being
around 1977. Otherwise it is quite limited.
    Hardware Sprites?      No
    Linear framebuffer?    No
    User-defined charset?  No
    Blanking interrupts?   No
    Palette selection?     No
    Hardware scrolling?    No
    Hardware page flip?    Yes

The hi-res graphics mode was a complex mess of NTSC hacks by Woz.
You got 280x192 graphics, with 6 colors available. However the colors
were from NTSC artifacts and there were limitations on which colors
could be next to each other (in blocks of 3.5 pixels) as well as
fringing. Also the addresses were interleaved, so not a linear
framebuffer. Hi-res page0 is at $2000 and page1 at $4000.
Optionally 4 lines of text can be shown at the bottom of the
screen instead of graphics.

The lo-res mode is a bit easier to use. It is 40x48 blocks
(40x40 if 4 lines of text are displayed at the bottom).
15 colors are available, though there is fringing at the edges.
Again the addresses are interleaved. Lo-res page0 is at $400
and page1 is at $800.

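To make the lo-res interleaving concrete, here is a minimal sketch in
Python (not the demo's 6502 code). It uses the standard Apple II text/lo-res
row layout; the helper names row_base and plot are made up for this example.

```python
# Sketch of lo-res addressing. Helper names are hypothetical.
LORES_PAGE0 = 0x400  # lo-res page0 shares its memory with the text page


def row_base(text_row, page=LORES_PAGE0):
    """Address of the first byte of a 40-byte text row (0..23)."""
    return page + 0x80 * (text_row % 8) + 0x28 * (text_row // 8)


def plot(mem, x, y, color, page=LORES_PAGE0):
    """Set lo-res block (x, y), x in 0..39 and y in 0..47, to a 4-bit color."""
    addr = row_base(y // 2, page) + x
    if y % 2 == 0:
        # even rows are the top block: stored in the low nibble
        mem[addr] = (mem[addr] & 0xF0) | (color & 0x0F)
    else:
        # odd rows are the bottom block: stored in the high nibble
        mem[addr] = (mem[addr] & 0x0F) | ((color & 0x0F) << 4)


mem = bytearray(0x10000)   # stand-in for the 64k address space
plot(mem, 0, 0, 0xF)       # top-left block
plot(mem, 0, 1, 0x1)       # the block just below it, same byte
```

Consecutive rows on screen are $80 bytes apart for eight rows, then the
pattern wraps back with a $28 offset, which is why a plain linear copy
does not work.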
========================================
DETAILED STEP-BY-STEP REVIEW OF THE DEMO
========================================

BOOTLOADER
~~~~~~~~~~
A BASIC "HELLO" program loads the binary.
This just makes things auto-boot at startup. It doesn't count
towards the executable size; you could manually BRUN the 8k program
if you wanted.

The binary is loaded at $2000 (hi-res page0) and BASIC kicks into
HIRES mode before loading so you can watch as the memory is loaded
from disk in a seemingly random pattern.

Since this is an 8k demo, the entirety of the program is shown on
the screen (or would be if we POKEd the right address to turn off
the 4 lines of text on the bottom of the screen).

Execution starts at address $2000.

DECOMPRESSOR
~~~~~~~~~~~~
The binary is LZ4 encoded. The decompressor flips to HGR page 1 so
we can watch memory as the program is decompressed.

The LZ4 decompression code was written by qkumba (Peter Ferrie).
http://pferrie.host22.com/misc/appleii.htm

The actual program/data decompresses to around 22k starting at $4000.
It over-writes parts of DOS3.3, but since we won't be using the disk
anymore this isn't an issue.

At the top left corner of the screen you'll see the VMW triangles logo
as it decompresses. To do this I had to put the proper bit pattern
at $4000, $4400, $4800, and $4C00. I meant to have some words too
but ran out of disk space. The bit pattern at $4000 is executable
and is run as code.

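Those four addresses follow from the interleaved hi-res layout. As a
small sketch (Python, using the standard hi-res row formula; the function
name is made up here), the first byte of each screen row can be computed
like this:

```python
def hires_row_address(y, page_base=0x4000):
    """First byte of hi-res screen row y (0..191) for the page at page_base."""
    return (page_base
            + 0x400 * (y % 8)          # row within a group of 8
            + 0x80 * ((y // 8) % 8)    # group within a block of 64 rows
            + 0x28 * (y // 64))        # which third of the screen


# the top four rows of the screen, where the logo shows up
top_rows = [hires_row_address(y) for y in range(4)]
```

Here top_rows works out to $4000, $4400, $4800 and $4C00, so the logo
bytes have to be spread 1k apart in the decompressed image. The same
formula explains why a linear load from disk appears to fill the screen
in a scattered pattern.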
Optimizing for code size inside of a compressed binary is a pain.
Removing instructions sometimes made the binary larger as it no longer
compressed as well. Long runs of values (such as 0 padding) are
essentially free. This was a difficult challenge.

FADE EFFECT
~~~~~~~~~~~
The title screen fades in from black.

This is a software hack, with a lookup table copying from an off-screen
buffer. The Apple II doesn't have any palette support.

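A rough sketch of the idea in Python follows. The table contents are
invented for illustration (the text does not give the demo's actual fade
table), and make_fade_lut and fade_copy are hypothetical names. Each
lo-res byte holds two 4-bit colors, so both nibbles go through the table
as the off-screen buffer is copied to the visible page.

```python
def make_fade_lut(steps=4):
    """Build steps+1 tables; table[0] is all black, the last passes colors through.

    The threshold rule used here is made up for the example.
    """
    lut = []
    for step in range(steps + 1):
        lut.append([c if c * steps <= 15 * step else 0 for c in range(16)])
    return lut


def fade_copy(offscreen, table):
    """Return the bytes to display: each nibble of each byte mapped via table."""
    return bytes((table[b >> 4] << 4) | table[b & 0x0F] for b in offscreen)


lut = make_fade_lut(4)
frames = [fade_copy(b"\x1f\xa5", table) for table in lut]  # black ... full color
```

Running the copy once per fade step with successive tables gives the
fade-in without any palette hardware.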
TITLE SCREEN
~~~~~~~~~~~~
Once things are decompressed, we jump to $4000. We switch to low-res
mode for the rest of the demo.

A background image is loaded. It is stored RLE encoded (probably
unnecessary when it is then further LZ4 encoded).

Why not just load the program at $400 and load the graphics image for
free? Well, remember the graphics are 40x48 (shared with the text).
Really it's 40x24, with each text char mapping to 4-bits top/bottom
for color. Do the math: we have 1k reserved for this mode but 40x24
is only 960 bytes. It turns out there are "holes" in the address range
that aren't displayed, and various pieces of hardware use these holes
as scratchpad memory. So if you just blindly uncompress graphics data
there you can corrupt the scratchpad. So you have to be careful
when uncompressing to skip the holes.

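The 64 missing bytes (1024 - 960) can be listed directly. This is a
sketch in Python using the standard text-page row layout; the function
names are made up for the example, and the copy routine only illustrates
"skip the holes", it is not the demo's decompression code.

```python
def row_base(text_row, page=0x400):
    """Address of the first byte of text row 0..23."""
    return page + 0x80 * (text_row % 8) + 0x28 * (text_row // 8)


def screen_holes(page=0x400):
    """Addresses inside the 1k page that no displayed row covers."""
    visible = set()
    for row in range(24):
        base = row_base(row, page)
        visible.update(range(base, base + 40))
    return [a for a in range(page, page + 0x400) if a not in visible]


def copy_skipping_holes(mem, image, page=0x400):
    """Copy a 960-byte image (24 rows of 40 bytes) without touching the holes."""
    for row in range(24):
        base = row_base(row, page)
        mem[base:base + 40] = image[row * 40:(row + 1) * 40]


holes = screen_holes()
mem = bytearray([0xEE]) * 0x10000          # 0xEE marks untouched memory
copy_skipping_holes(mem, bytes([0x11]) * 960)
```

The holes come out as eight runs of eight bytes, the first at $478-$47F.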
The title screen has scrolling text at the bottom. This is nothing fancy:
the text is in a buffer off screen and a 40x4 chunk of RAM is copied in
every so many cycles.

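As a sketch of that copy (Python, with a made-up buffer layout of four
long rows; the demo's real buffer format is not described here):

```python
def scroll_window(text_rows, offset, width=40):
    """Return the width-column slice of each off-screen row starting at offset."""
    return [row[offset:offset + width] for row in text_rows]


# hypothetical off-screen buffer: 4 rows, each wider than the 40-column screen
offscreen = ["0123456789" * 10 for _ in range(4)]
visible = scroll_window(offscreen, 3)   # what would be copied in at scroll offset 3
```

Advancing offset by one each time the copy runs scrolls the text one
column to the left.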
You might notice that there is tearing/jitter in the scrolling, even
though we are double-buffering the graphics. This is because there is
not a reliable cross-platform way to get the VBLANK info (especially
on older machines) so we are having some bad luck about when we flip
pages.

MOCKINGBOARD MUSIC
~~~~~~~~~~~~~~~~~~
I like chiptune music, especially that for AY-3-8910 based systems.
Before obtaining a Mockingboard I built a Raspberry Pi chiptune player
that is essentially the same hardware.

Most of my sound infrastructure involves YM5 files, which are often used
by ZX Spectrum and ATARI ST users. These are usually register dumps
taken typically at 50Hz. So to play them back you just have to interrupt
50 times a second and write the registers.

To program the Mockingboard, each AY-3-8910 chip has 14 sound related
registers that control the 3 channels. Each AY chip has a dedicated
VIA 6522 parallel I/O chip that handles the I/O.

Doing this quickly enough is a challenge on the Apple II. For each
register you have to do a handshake, set the register # and the value.
This can take upwards of 40 1MHz cycles per register.

For complex chiptune files (especially those written on an ST with much
faster hardware) it's sometimes not possible to get exact playback
due to the delay. Also one AY is on the left channel and one on the right
so you have to write both if you want sound from both speakers.

I have a whole suite of code for manipulating YM sound data, in my
vmw-meter git repository.

The first step for getting this to work is detecting if a Mockingboard is
there. This can be in any slot 1-7 on the Apple II, though typically
Slot 4 is standard (in this demo we only check slot 4).

The board is initialized, and then one of the 6522 timers is set to
interrupt at 25Hz (it has to be an on-board timer as the default
Apple II has no timers).

Why 25Hz and not 50Hz? At 50Hz with 14 registers you use 700 bytes/s.
So a 2 minute song would take 84k of RAM, much more than is available.

For this demo I run at 25Hz, and also pack the 14 registers of the data
into 11 (there are various fields that are not packed well, we can
unpack at play time). Also I stripped out the envelope data as many
songs do not use it (so this is a lossy compression method).

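The arithmetic behind those numbers, as a small Python check. The 1.023MHz
clock is the rounded figure quoted earlier, so the timer count below is
approximate; song_bytes is just a helper name for this example.

```python
def song_bytes(frames_per_second, bytes_per_frame, seconds):
    """Size of an uncompressed register dump."""
    return frames_per_second * bytes_per_frame * seconds


raw_rate = 50 * 14                    # bytes per second at 50Hz, 14 registers
raw = song_bytes(50, 14, 120)         # 2 minute song, unpacked
packed = song_bytes(25, 11, 120)      # 25Hz with 11 bytes per frame

CPU_HZ = 1_023_000                    # roughly 1.023MHz
cycles_per_tick = CPU_HZ // 25        # CPU cycles between 25Hz interrupts
fits_16_bit_timer = cycles_per_tick <= 0xFFFF   # 6522 timers are 16 bits wide
```

So the raw dump is 84000 bytes for two minutes, while 25Hz with 11 bytes
per frame is 33000 bytes before any further compression. At 25Hz the
interval is about 40920 cycles, which still fits in a 16-bit 6522 timer.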
Also, we keep track of the values written in the previous frame and only
write out to the board if things change, which helps with the latency
a bit.

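A minimal sketch of that change-detection in Python. The write_register
callback stands in for the real handshake with the 6522/AY chip, and all
names here are made up for the example.

```python
def write_changed(frame, last, write_register):
    """Write only registers whose value differs from the previous frame.

    frame: list of register values for this tick
    last:  list holding what was last written (updated in place)
    Returns the number of register writes actually performed.
    """
    writes = 0
    for reg, value in enumerate(frame):
        if last[reg] != value:
            write_register(reg, value)   # the slow part on real hardware
            last[reg] = value
            writes += 1
    return writes


log = []
last = [None] * 14                       # nothing written yet
frame1 = [0] * 14
frame2 = list(frame1)
frame2[8] = 15                           # only one register changes
first = write_changed(frame1, last, lambda r, v: log.append((r, v)))
second = write_changed(frame2, last, lambda r, v: log.append((r, v)))
```

With each write costing upwards of 40 cycles, skipping unchanged
registers saves the most on frames where a note is simply held.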
The sound quality suffered a bit, but it's hard to fit a catchy chiptune
file in 8K.

The song being played is a stripped down and re-arranged version of
"Electric Wave" from CC'00 by EA (Ilya Abrosimov).

MODE7 BACKGROUND
~~~~~~~~~~~~~~~~
"MODE7" was a Super Nintendo (SNES) graphics mode that took a tiled
background and transformed it to look as if it was squashed out to
the horizon, giving a 3d look. The SNES did this in hardware, but
in this demo we do this in software.

As found on Wikipedia, the transform is of the type

    [x'] = [a b] ([x] - [x0]) + [x0]
    [y']   [c d] ([y]   [y0])   [y0]

For our code, we managed to reduce things to a small number of additions
and subtractions for each pixel on the screen. Of course the 6502 can't
do floating point, so we do fixed point math. We convert as much as we
can to table lookups that are pre-calculated. We also make liberal use
of self-modifying code.

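To illustrate how the transform collapses to additions, here is a sketch
in Python using 8.8 fixed point. It is not the demo's actual routine
(which is 6502 assembly and also handles perspective); it only shows that
stepping one pixel to the right adds the constants a and c to two running
accumulators. All names are made up for this example.

```python
FRAC_BITS = 8  # 8.8 fixed point: 8 integer bits, 8 fraction bits


def to_fixed(value):
    """Convert a float to 8.8 fixed point."""
    return int(round(value * (1 << FRAC_BITS)))


def transform_direct(x, y, m, x0, y0):
    """Apply the matrix to one pixel with two multiplies per coordinate."""
    a, b, c, d = m
    xp = ((a * (x - x0) + b * (y - y0)) >> FRAC_BITS) + x0
    yp = ((c * (x - x0) + d * (y - y0)) >> FRAC_BITS) + y0
    return xp, yp


def transform_row(y, width, m, x0, y0):
    """Same result for a whole row, using only additions inside the loop."""
    a, b, c, d = m
    # set up once per row (this is where the multiplies are)
    acc_x = a * (0 - x0) + b * (y - y0) + (x0 << FRAC_BITS)
    acc_y = c * (0 - x0) + d * (y - y0) + (y0 << FRAC_BITS)
    out = []
    for _ in range(width):
        out.append((acc_x >> FRAC_BITS, acc_y >> FRAC_BITS))
        acc_x += a      # moving one pixel right adds a to x'
        acc_y += c      # ... and c to y'
    return out


m = (to_fixed(0.75), to_fixed(-0.5), to_fixed(0.5), to_fixed(0.75))
row = transform_row(10, 40, m, 20, 20)
```

The per-row setup still needs multiplies, which is where a fast multiply
routine matters.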
Despite all of this there are still some cases where we have to do a
16bit x 16bit = 32bit multiply, something that is *really* slow on 6502,
around 700 cycles (for a 8.8 x 8.8 fixed point multiply).

To make this faster we use a method described by Stephen Judd.

The key to note is that (a+b)^2 = a^2+2ab+b^2 and (a-b)^2 = a^2-2ab+b^2
and if you subtract the second from the first you get 4ab, which
simplifies to:

            (a+b)^2     (a-b)^2
    a*b  =  -------  -  -------
               4           4

This means that if you have a table of squares from 0..511 (all 8-bit
a+b and a-b will fall in this range, as only the magnitude of a-b
matters) then you can convert a multiply into table lookups plus a
subtract.

The downside is you will need 2kB of squares lookup tables (which can
be generated at startup). This reduces the multiply cost to the order
of 200 to 250 cycles.

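A minimal sketch of the technique in Python rather than 6502 assembly.
It covers unsigned operands only, and building the 16-bit multiply from
four 8-bit partial products is one straightforward way to combine them,
not necessarily how the demo arranges it. The names are made up here.

```python
# floor(n^2 / 4) for n = 0..511; the largest entry used (n = 510) fits in 16 bits
SQUARES = [(n * n) // 4 for n in range(512)]


def mul8(a, b):
    """Unsigned 8-bit x 8-bit multiply using two lookups and one subtract.

    a+b and a-b always have the same parity, so the two floor divisions
    drop the same remainder and the result is exact.
    """
    return SQUARES[a + b] - SQUARES[abs(a - b)]


def mul16(a, b):
    """Unsigned 16 x 16 = 32 bit multiply built from four 8-bit multiplies."""
    ah, al = a >> 8, a & 0xFF
    bh, bl = b >> 8, b & 0xFF
    return ((mul8(ah, bh) << 16)
            + ((mul8(ah, bl) + mul8(al, bh)) << 8)
            + mul8(al, bl))


product = mul16(0x0180, 0x0240)   # 1.5 * 2.25 in 8.8 fixed point, result is 16.16
```

For 8.8 fixed point the 32-bit result is in 16.16 format, so shifting it
right by 8 gives the 8.8 answer (here 1.5 * 2.25 = 3.375).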
By using the fast multiply and a lot of careful optimization you can
generate a Mode7 background in 40x40 graphics mode at about 5 frames/second.

The engine can be parameterized with different tilesets to use, which we
do to provide both a black+white checkerboard background, as well as the
island background from the TFV game.

BOUNCING BALL ON CHECKERBOARD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What would a demo be without some sort of bouncing geometric shape?

This is just done with 16 sprites. The sphere was modeled in OpenGL
using a 2000-era game engine that I never finished. I then took
screenshots and reduced the size/color to an appropriate value.

The shadow is also just sprites.

The clicking noise on bounce is just touching the speaker at $C030.
It's mostly there to give some sound effects for those playing the demo
without a Mockingboard.

TFV SPACESHIP FLYING
~~~~~~~~~~~~~~~~~~~~
The spaceship, water splash, and shadows are all sprites. This is all
done in software, the Apple II has no sprite hardware.

This is the TFV game engine flying-spaceship code, with the keyboard
routines replaced to read from memory instead (sort of like a script
of what to do when).

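One way to picture the scripted input, as a Python sketch. The script
format (a list of frame counts and keys) is invented for this example;
the text does not say how the demo encodes its script.

```python
def scripted_keys(script):
    """Yield one 'keypress' per frame from (frame_count, key) pairs.

    A key of None means no key is pressed for those frames.
    """
    for frames, key in script:
        for _ in range(frames):
            yield key


# hypothetical script: hold W for 2 frames, coast for 1, then hold D for 3
script = [(2, "W"), (1, None), (3, "D")]
keys = list(scripted_keys(script))
```

The game loop then calls the scripted source where it used to poll the
keyboard, so the rest of the engine runs unchanged.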
STARFIELD
~~~~~~~~~
The starfield is your typical starfield code. Only 16 stars are modeled.
It re-uses the fast-multiply code from the mode7 graphics.

Random number generation is not fast on the 6502, so we cheat.
Originally we had a 256-byte blob of "random" values generated earlier.

This wasted space, so now instead we just treat the executable code
at $5000 as if it were a block of random numbers. This was arbitrarily
chosen, I tried different areas of memory until I got one where the
stars seemed to move in a pleasing pattern.

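A sketch of the trick in Python. The memory contents below are a stand-in
(the real bytes at $5000 are the demo's own code), and the way each star
consumes three bytes is an assumption made for the example.

```python
def make_stars(memory, start, count=16):
    """Initialise stars by reading 'random' bytes from a block of memory."""
    stars = []
    for i in range(count):
        x = memory[start + 3 * i] - 128              # -128..127
        y = memory[start + 3 * i + 1] - 128          # -128..127
        z = (memory[start + 3 * i + 2] & 0x3F) + 1   # depth 1..64, never zero
        stars.append((x, y, z))
    return stars


memory = bytearray(0x10000)
# stand-in for program code at $5000: any non-uniform bytes will do
memory[0x5000:0x5100] = bytes((i * 73 + 41) & 0xFF for i in range(256))
stars = make_stars(memory, 0x5000)
```

Nothing is generated at run time: the table costs no extra bytes because
it is memory the program needs anyway, and the result is repeatable on
every run.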
A simple state machine controls if the stars move or not, whether the
background is cleared or not (the streak effect) and what color the
background is (for the blue flash).

The ship moving to the distance is just done with different sized sprites.

RASTERBARS/CREDITS
~~~~~~~~~~~~~~~~~~

The credits happen with the starfield continuing to run.

The text is written in the bottom 4 lines of the screen. Some inverse-mode
space characters are used to try to make it look like graphics are surrounding
the text. It's actually possible with careful cycle counting to switch
modes fast enough to have actual mixed graphics/text (See the FrenchTouch
demos) but I was too lazy to attempt that here.

The rasterbar effect isn't really rasterbars, it's just a rainbow assortment
of lines being drawn with a SINEWAVE lookup table.

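A sketch of the sine-table idea in Python. The table size, amplitude and
bar spacing are made-up values for illustration; only the general
approach (look up each line's row in a precomputed sine table) comes from
the description above.

```python
import math


def make_sine_table(size=256, amplitude=15, center=16):
    """Precompute row positions so no trig is needed while drawing."""
    return [center + round(amplitude * math.sin(2 * math.pi * i / size))
            for i in range(size)]


def bar_rows(frame, table, bars=8, spacing=8):
    """Row for each colored line this frame; each bar lags the previous one."""
    return [table[(frame + i * spacing) % len(table)] for i in range(bars)]


table = make_sine_table()
rows = bar_rows(0, table)     # where to draw each horizontal line on frame 0
```

Each frame the lines are redrawn at the rows the table gives, with a
different color per line, which makes them appear to weave up and down.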
It's the same rasterbar code from my chiptune player demo. I ended up
optimizing it a lot via inlining and a few other ways because it turned
out just drawing a horizontal line can take a very long time.

The rotating text is just taking the output string and rapidly rotating the
character values through the ASCII table.

The annoying clicking noise is the same speaker effect caused by hitting
$C030.

Choosing who to thank ended up being extremely critical to fitting in 8kB,
as unique text strings do not compress well. I'm also still not satisfied
with how the centering looks.

Memory Map
==========

(not to scale)

 -------- $ffff
| ROM/IO |
 -------- $c000
|        | 32k decompress
 -------- $4000
| load   | 8k
 -------- $2000
| free   |
 -------- $1c00
| Scroll |
| Data   |
 -------- $1800
|Multiply|
| Tables |
 -------- $1000
|GR pg 2 | 1k
 -------- $0c00
|GR pg 1 | 1k
 -------- $0800
|GR pg 0 | 1k
 -------- $0400
|        | 0.5k
 -------- $0200
| stack  | 0.25k
 -------- $0100
|zero pg | 0.25k
 -------- $0000