2021-01-07 18:57:41 -05:00


40x40 lo-res rotozoomer for Apple II
by Vince "deater" Weaver vince _at_ deater.net

Working on this as it's part of a cutscene in my TFV game.

Theory:
~~~~~~~
In a rotozoomer you scan across the screen (in our case in
Apple lo-res, 40x40) and for each pixel do a mapping to find
out what color to draw it.

In this case you have a texture, and to find what point on the
texture maps to the screen co-ordinates you do a transform to
rotate and scale the co-ordinates. This usually involves some
multiplies and some sin/cos calls.
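
As a rough sketch of that mapping (not code from the demo), the helper
below rotates and scales one screen co-ordinate about the center at 20,20.
The name screen_to_texture is made up for illustration, and ca and sa are
assumed to already hold cos(theta)*scale and sin(theta)*scale, computed
by the caller.

```c
/* Hypothetical helper: map screen pixel (sx,sy) to a texture position.
   ca = cos(theta)*scale and sa = sin(theta)*scale are supplied by the
   caller; (20,20) is assumed to be the center of screen and texture. */
static void screen_to_texture(int sx, int sy, double ca, double sa,
                              double *tx, double *ty)
{
    double dx = sx - 20;   /* offset from the center of rotation */
    double dy = sy - 20;

    /* rotate and scale the offset, then move back to the center */
    *tx =  dx * ca + dy * sa + 20;
    *ty = -dx * sa + dy * ca + 20;
}
```

Done this way, every pixel costs four multiplies, which is far too slow
for a 6502 to do 1600 times per frame.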

Optimization:
~~~~~~~~~~~~~
This effect is often done on 8-bit computers; the trick is to
take as much work as possible out of the inner loop.

For our case, each cycle we save in the innermost loop saves
1600 cycles per frame (40x40).

The first optimization is to note that the transform is basically
a set of straight lines plotted across the texture. So you can
calculate the slope of these lines once at the beginning (using
sin/cos), then calculate all the points using simple add instructions.

The code in C looks something like this. Some extra transformation
is done to have the center of rotation be the center of the screen
at 20,20.
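
    /* per-pixel steps through the texture, from the angle and zoom */
    ca = cos(theta)*scale;
    sa = sin(theta)*scale;

    /* offsets that put the center of rotation at screen 20,20 */
    cca = -20*ca;
    csa = -20*sa;

    /* row-dependent terms, advanced by ca/sa once per row */
    yca=cca+ycenter;
    ysa=csa+xcenter;

    for(yy=0;yy<40;yy++) {

        /* texture position for the leftmost pixel of this row */
        xp=cca+ysa;
        yp=yca-csa;

        for(xx=0;xx<40;xx++) {

            /* outside the 40x40 texture: draw color 0 */
            if ((xp<0) || (xp>39)) color=0;
            else if ((yp<0) || (yp>39)) color=0;
            else {
                color=scrn_page(xp,yp,PAGE2);
            }

            color_equals(color);
            plot(xx,yy);

            /* step along the line: adds only, no multiplies */
            xp=xp+ca;
            yp=yp-sa;

        }
        yca+=ca;
        ysa+=sa;
    }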
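
The listing does not say how xp, yp, ca and sa are stored. As a sketch,
assuming signed 8.8 fixed point (the format named in the Apple II notes
below) and assuming the integer part is taken by rounding down, the
add-only inner loop could look like this. The names fix88, fix88_floor
and walk_row are invented for the example.

```c
#include <stdint.h>

/* Assumed format: signed 8.8, high byte integer, low byte fraction. */
typedef int16_t fix88;

/* Integer part of an 8.8 value, rounded toward negative infinity
   (what taking the high byte of a two's complement value gives). */
static int fix88_floor(fix88 v)
{
    int i = v / 256;            /* C division truncates toward zero */
    if (v % 256 < 0) i--;       /* so correct negative fractions    */
    return i;
}

/* Walk one 40-pixel screen row. Starting from (xp,yp), step by
   (ca,-sa) per pixel as in the listing, recording the integer
   texture co-ordinates. Only additions are used inside the loop. */
static void walk_row(fix88 xp, fix88 yp, fix88 ca, fix88 sa,
                     int tx[40], int ty[40])
{
    for (int xx = 0; xx < 40; xx++) {
        tx[xx] = fix88_floor(xp);
        ty[xx] = fix88_floor(yp);
        xp = (fix88)(xp + ca);
        yp = (fix88)(yp - sa);
    }
}
```

For example, with ca = 128 (0.5 in 8.8) and sa = 0 the row covers texture
columns 0 through 19, with each texel drawn twice, which is a 2x zoom.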
Apple II/6502 optimizations
~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ We use an optimized multiply routine (using subtractions of
    squares) to do 8.8 fixed point signed multiply.
+ We use lookup tables for sin() [and save space by using an
    offset into the sin() table for cos()].
+ We use 8.8 fixed point values for math, even though that's
    a bit slow on an 8-bit processor like the 6502.
+ Apple II screen-read/pixel plotting is a pain as memory is not
    linear and has holes in it. We use lookup tables
    to calculate the address for each line.
+ Apple II lo-res mode groups pairs of lines together into the
    top/bottom nibbles of a byte. So typically to draw
    an arbitrary pixel you have to read the old value, mask
    off top or bottom, then OR in the new value.
    Our code avoids this: since we are drawing the entire
    screen destructively we don't have to save old values
    when drawing the bytes.
+ In addition, we unroll the Y loop by one, which allows us to
    have custom code for odd/even rows, which in turn allows
    optimizing away a lot of conditional code.
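
The "subtractions of squares" multiply rests on the identity
a*b = ((a+b)^2 - (a-b)^2)/4, which turns a multiply into two table
lookups and a subtract. Below is a sketch of an 8x8-bit unsigned core
in C. It is not the demo's 6502 routine, the names are made up, and how
the signed 8.8 multiply is built on top of such a core is left out.

```c
#include <stdint.h>

/* qsq[n] = floor(n*n/4) for n = 0..510 (the largest possible a+b).
   510*510/4 = 65025, so every entry fits in 16 bits. */
static uint16_t qsq[511];

/* Fill the table once at startup. */
static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qsq[n] = (uint16_t)((n * n) / 4);
}

/* Multiply two unsigned bytes using lookups and one subtraction.
   a+b and |a-b| always have the same parity, so the two floor()
   roundings cancel and the result is exact. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;
    unsigned diff = (a > b) ? (unsigned)(a - b) : (unsigned)(b - a);
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

After calling init_quarter_squares() once, mul8x8(200, 13) gives 2600
without using a multiply instruction at lookup time.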
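
For the screen layout items, here is a sketch of what a row-address
table would hold and how two pixels share one byte. It assumes the
standard lo-res layout: page 1 at $400, 24 byte rows of 40 bytes
interleaved in groups of eight, with the upper pixel of each pair in
the low nibble. The function names are invented for the example.

```c
#include <stdint.h>

#define LORES_PAGE1 0x0400   /* base address of lo-res page 1 */

/* Address of the first byte of byte-row `row` (0..23). Rows are
   interleaved: 8 rows spaced $80 apart, in 3 groups offset by $28.
   A lookup table would simply store these values. */
static uint16_t lores_row_address(uint16_t page_base, int row)
{
    return (uint16_t)(page_base + 0x80 * (row % 8) + 0x28 * (row / 8));
}

/* Pixel row y (0..39 here) lives in byte-row y/2. */
static uint16_t lores_pixel_row_address(uint16_t page_base, int y)
{
    return lores_row_address(page_base, y / 2);
}

/* Build the byte for two vertically stacked pixels: the even
   (upper) row goes in the low nibble, the odd (lower) row in the
   high nibble. Writing this whole byte needs no read/mask/OR. */
static uint8_t lores_pack(uint8_t top_color, uint8_t bottom_color)
{
    return (uint8_t)((top_color & 0x0F) | ((bottom_color & 0x0F) << 4));
}
```

Because a full-screen redraw produces both nibbles of every byte anyway,
each byte can be written once per pair of rows.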

Notes on making it even faster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are other lo-res rotozoomer implementations.
They are faster too, but mostly because they don't usually do full
40x40 resolution.

+ If we used a smaller texture (rather than 40x40) things would
    be much faster. Other demos use 20x20, which would be
    blockier but also 4x faster.
+ If we wrapped the texture at the edges (instead of filling with
    a solid color out of bounds) we could save at least 20
    cycles, which would improve the frame rate.