40x40 lo-res rotozoomer for Apple II by Vince "deater" Weaver vince _at_ deater.net Working on this as it's part of a cutscene in my TFV game. Theory: ~~~~~~~ In a rotozoomer you scan across the screen (in our case in Apple lo-res, 40x40) and for each pixel do a mapping to find out what color to draw it. In this case you have a texture, and to find what point on the texture maps to the screen co-ordinates you do a transform to rotate and scale the co-ordinates. This usually involves some multiplies and some sin/cos calls. Optimization: ~~~~~~~~~~~~~ This effect is often done on 8-bit computers, the trick is to take as much work as possible out of the inner loop. For our case, each cycle we save in the innermost loop saves 1600 cycles total (40x40). The first optimization is to note that the transform is basically a set of straight lines plotted across the texture. So you can calculate the slope of this at the beginning (using sin/cos), then calculate all the points using simple add instructions. The code in C looks something like this. Some extra transformation is done to have the center of rotation be the center of the screen at 20,20. ca = cos(theta)*scale; sa = sin(theta)*scale; cca = -20*ca; csa = -20*sa; yca=cca+ycenter; ysa=csa+xcenter; for(yy=0;yy<40;yy++) { xp=cca+ysa; yp=yca-csa; for(xx=0;xx<40;xx++) { if ((xp<0) || (xp>39)) color=0; else if ((yp<0) || (yp>39)) color=0; else { color=scrn_page(xp,yp,PAGE2); } color_equals(color); plot(xx,yy); xp=xp+ca; yp=yp-sa; } yca+=ca; ysa+=sa; } Apple II/6502 optimizations ~~~~~~~~~~~~~~~~~~~~~~~~~~~ + We use an optimized multiply routine (using subtractions of squares) to do 8.8 fixed point signed multiply + We use lookup tables for sin() [and save space by using an offset into the sin() table for cos()] + We use 8.8 fixed point values for math, even though that's a bit slow on an 8-bit processor like the 6502 + Apple II screen-read/pixel plotting is a pain as memory is not linear and has holes in it. We use lookup tables to calculate the address for each line + Apple II lores mode lines are grouped together into the top/bottom nibbles of a byte. So typically to draw an arbitrary pixel you have to read the old value, mask off top or bottom, then OR in the new value. Our code avoids this... since we are drawing the entire screen destructively we don't have to save old values when drawing the bytes. + In addition, we unroll the Y loop by one which allows us to have custom code for odd/even rows which allow optimizing away a lot of conditional code Notes on making it even faster ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are other lo-res rotozoomer implementations. They are faster too, but because they don't usually do full 40x40 resolution. + If we used a smaller texture (rather than 40x40) things would be much faster. Other demos use 20x20 which would be blockier but also 4x faster + If we wrapped the texture at the edges (instead of filling with a solid color out of bounds) we could save at least 20 cycles, which would improve the frame rate.