From 65cf945146333dbe4289dd86a5c3f7b0f4212424 Mon Sep 17 00:00:00 2001
From: Vince Weaver
Date: Wed, 25 Apr 2018 00:53:30 -0400
Subject: [PATCH] doc: another pass through mode7 doc

---
 mode7_demo/docs/mode7_demo.tex | 172 ++++++++++++++++++---------------
 1 file changed, 93 insertions(+), 79 deletions(-)

diff --git a/mode7_demo/docs/mode7_demo.tex b/mode7_demo/docs/mode7_demo.tex
index 3970d25c..9c54ebad 100644
--- a/mode7_demo/docs/mode7_demo.tex
+++ b/mode7_demo/docs/mode7_demo.tex
@@ -434,75 +434,81 @@ them in software.
 % [x'] = [a b]([x]-[x0])+[x0]
 % [y'] [c d]([y] [y0]) [y0]
+% http://www.helixsoft.nl/articles/circle/sincos.htm
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Our algorithm is based on code by Martijn van Iersel.
-It iterates through each y line on the screen and calculates based on
-the camera location: height ({\em spacez}), x and y coordinates
-({\em cx} and {\em cy}) and the {\em angle}.
+It iterates through each horizontal line on the screen and calculates the color
+to output based on the camera height ({\em spacez}) and {\em angle} as well
+as the current x and y coordinates ({\em cx} and {\em cy}).
+
+First the distance {\em d} is calculated based on fixed scale and
+distance-to-horizon factors.
+Instead of a costly division we use a pre-generated lookup table for this.
+ \[d = \frac{z \times yscale}{y+horizon}\]
-First calculate the distance
- d = (z*yscale)/(y+horizon)
 Then calculate the horizontal scale (distance between points on
-this line)
- h = d/xscale
-Then calculate delta x and delta y values
- dx = -sin(angle)*h
- dy = cos(angle)*h
-It then calculates the starting offset of the left side of the line in
-the tile lookup:
- tilex = cx + (d*cos(angle) - (width/2) * dx;
- tiley = cy + (d*sin(angle) - (width/2) * dy;
-Now iterate the inner loop, where we lookup the tile color for each pixel
-on the horizontal line.
- putpixel (x, y, tilelookup(tilex,tiley)
- tilex += dx;
- tiley += dy;
+this line):
+ \[h = \frac{d}{xscale}\]
+Then calculate delta x and delta y values between each block on the line.
+We use a pre-computed sine/cosine lookup table.
+ \[dx = -sin(angle) \times h\]
+ \[dy = cos(angle) \times h\]
+The leftmost position in the tile lookup is calculated:
+ \[tilex = cx + (d*cos(angle) - (width/2) * dx\]
+ \[tiley = cy + (d*sin(angle) - (width/2) * dy\]
+Then an inner loop happens that adds dx and dy as we lookup the color
+from the tilemap (just a wrap-around array lookup) for each block
+on the line.
+ \[color = tilelookup(tilex,tiley)\]
+ \[plot (x, y) \]
+ \[tilex += dx, tiley+= dy\]
-{\bf Optimizations}
-
-We managed to take this algorithm and speed it up in the following ways:
- \begin{itemize}
- \item blah
- \end{itemize}
-
- For our code, we managed to reduce things to a small number of additions
- and subtractions for each pixel on the screen. Of course the 6502 can't
- do floating point, so we do fixed point math. We convert as much as we
- can to table lookups that are pre-calculated. We also make liberal use
- of self-modifying code.
+\noindent
+{\bf Optimizations:}
+The 6502 processor cannot do floating point, so all of our routines used
+8.8 fixed point math.
+We eliminated all of the division, and converted as much as possible
+to use lookup tables (which involved limiting the heights and angles a bit).
+We also saved some cycles here and there by using self-modifying code,
+most notably hard-coding the height (z) value and modifying the code
+if this is changed.
+The code started out only capable of roughly 4.9fps in 40x20 resolution
+and in the end we improved this to 5.7fps in 40x40 resolution.
+Care was taken to optimize the innermost loop, as every cycle saved there
+results in 1280 cycles saved overall.
+\noindent
 {\bf Fast Multiply:}
+One of the biggest bottlenecks in the mode7 code was the multiply.
+Even our optimized algorithm calls for at least seven
+16bit x 16bit = 32bit multiplies, something that is {\em really} slow on
+the 6502.
+A typical implementation takes around 700 cycles
+for a 8.8 x 8.8 fixed point multiply.
- Despite all of this there are still some cases where we have to do a
- 16bit x 16bit = 32bit multiply, something that is *really* slow on 6502,
- around 700 cycles (for a 8.8 x 8.8 fixed point multiply).
+We improved this by using the fast multiply algorithm
+described by Stephen Judd.
- To make this faster we use a method described by Stephen Judd.
+This works by noting that
+ \[(a+b)^{2} = a^{2}+2ab+b^{2}\]
+and
+ \[(a-b)^{2}=a^{2}-2ab+b^{2}\]
+If you subtract these you can simplify to
+ \[a\times b =\frac{(a+b)^{2}}{4} - \frac{(a-b)^2}{4}\]
- The key to note is that $(a+b)^{2} = a^{2}+2ab+b^{2}$
- and $(a-b)^{2}=a^{2}-2ab+b^{2}$
- and if you add them you can simplify to:
- $a\times b =\frac{(a+b)^{2}}{4} - \frac{(a-b)^2}{4}$
+For 8-bit values if you create a table of squares from 0 to 511
+(all 8-bit a+b and a-b fall in this range) then you can convert a multiply
+into two table lookups plus a subtract.
+This does have the downside of requiring 2kB of square lookup tables
+(which can be generated at startup) but it reduces the multiply
+cost to the order of 250 cycles or so.
- This is you have a table of squares from 0..511 (all 8-bit a+b and a-b
- will fall in this range) then you can convert a multiply into a table
- lookup plus a subtract.
-
- The downsize is you will need 2kB of squares lookup tables (which can
- be generated at startup). This reduces the multiply cost to the order
- of 200 to 250 cycles.
-
- By using the fast multiply and a lot of careful optimization you can
- generate a Mode7 background in 40x40 graphics mode at about 5 frames/second.
-
- The engine can be parameterized with different tilesets to use, which we
- do to provide both a black+white checkerboard background, as well as the
- island background from the TFV game.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{BOUNCING BALL ON CHECKERBOARD}
+\subsection{BALL ON CHECKERBOARD}
 The first Mode7 scene transpires on an infinite checkerboard.
 A demo would be incomplete without some sort of bouncing geometric solid,
@@ -512,65 +518,73 @@ a 20 year old OpenGL game engine.
 Screenshots were taken then reduced to the proper size and color limitations.
 The shadows are also just sprites.
+Note that the Apple II has no dedicated sprite hardware, so these
+are drawn completely in software.
 The clicking noise on bounce is generated by accessing the speaker port
 at address {\tt \$C030}.
 This gives some sound for those viewing the demo without the benefit of
 a Mockingboard.
-
 \subsection{TFV SPACESHIP FLYING}
 This next scene has a spaceship flying over an island.
+The Mode7 graphics code is generic enough that only one copy of the code
+is needed to generate both the checkerboard and island scenes.
 The spaceship, water splash, and shadows are all sprites.
-They are all drawn in software as the Apple II has no sprite hardware.
 The path the ship takes is pre-recorded; this is adapted from the
 Talbot Fantasy~7 game engine with the keyboard code replaced by a hard-coded
 script of actions to take.
 \subsection{STARFIELD}
-The spaceship takes to the stars.
-This is typical starfield code.
-Only 16 stars are modeled, and the movement code re-uses the
-same fast-multiply routine described previously.
+The spaceship now takes to the stars.
+This is typical starfield code, where on each iteration the x and y
+values are changed by
+ \[dx=\frac{x}{z}, dy=\frac{y}{z}\]
+In order to get a good frame rate and not clutter the lo-res screen
+only 16 stars are modeled.
+To avoid having to divide, the reciprocal of all possible z values
+are stored in a table, and the fast-multiply routine described
+previously is used.
-The star positions require random number generation, but this is not
-fast on the 6502.
+The star positions require random number generation, but there is no
+easy way to quickly get random data on the Apple II.
 Originally we had a 256-byte blob of pre-generated ``random'' values
 included in the code.
 This wasted space, so now instead we just use our code at address
 at \$5000 as if it were a block of random numbers.
 This was arbitrarily chosen, and it is not as random as it could be
-as seen when the ship enters hyperspace the lower right quadrant has fewer
-starts than one could desire.
+as seen when the ship enters hyperspace and the lower-right quadrant
+is distressingly star-free.
+
 A simple state machine controls star speed, ship movement, hyperspace,
 background color (for the blue flash) and the eventual sequence of sprites
 as the ship vanishes into the distance.
 \subsection{RASTERBARS/CREDITS}
-Once the ship has departed, it is time for the credits as the stars
-continue to run.
+Once the ship has departed, it is time to run the credits as the stars
+continue to fly by.
-The text is written to the bottom 4 lines of the screen and appears
-to be surrounded by low-res graphics blocks.
-Mixed graphics/text would generally not be possible on the Apple II, although
+The text is written to the bottom four lines of the screen, seemingly
+surrounded by graphics blocks.
+Mixed graphics/text is generally not be possible on the Apple II, although
 with careful cycle counting and mode switching groups such as FrenchTouch
 have achieved this effect.
-I was lazy and instead used inverse-mode space characters which appear the same
-as white graphics blocks.
+What we see in this demo is the use of inverse-mode (inverted color)
+space characters which appear the same as white graphics blocks.
-The rasterbar effect is not really rasterbars, it's just a colorful assortment
+The rasterbar effect is not really rasterbars, just a colorful assortment
 of horizontal lines drawn at a location determined with a sine lookup table.
-Horizontal lines can take a surprising amount of time to draw, so this
-was optimized using inlining and a few other methods.
+Horizontal lines can take a surprising amount of time to draw, but these
+were optimized using inlining and a few other tricks.
-The rotating text is done by just rapidly rotating the output string through
-the ASCII table, with the clicking effect again by hitting the speaker
-at address \$C030.
-The list of people to thank ended up being extremely critical to fitting in 8kB,
-as unique text strings do not compress well.
+The spinning text is done by just rapidly rotating the output string through
+the ASCII table, with the clicking effect again generated
+by hitting the speaker at address {\tt \$C030}.
+The list of people to thank ended up being the primary limitation to
+fitting in 8kB, as unique text strings do not compress well.
 I apologize to everyone whose moniker got compressed beyond recognition,
 and I am still not totally happy with the centering of the text.
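
The table-of-squares multiply described in the Fast Multiply paragraph of this patch can be sketched in Python as follows. This is only an illustration of the identity, not the demo's 6502 code. The names `SQ4`, `fast_mul8` and `fast_mul16` are hypothetical. Building the 16-bit product from four 8-bit products is also an assumption: it is one common approach, and the document does not say how its 16x16 multiply is assembled. Only unsigned values are handled.

```python
# Quarter-square table: SQ4[n] = floor(n*n / 4) for n = 0..511.
# (Hypothetical name; the document only says "a table of squares from 0 to 511".)
SQ4 = [(n * n) // 4 for n in range(512)]


def fast_mul8(a, b):
    """Multiply two unsigned 8-bit values using two lookups and one subtract.

    Uses a*b = (a+b)^2/4 - (a-b)^2/4.  a+b and a-b always have the same
    parity, so the fractions dropped by the floor division cancel exactly.
    """
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be unsigned 8-bit values")
    return SQ4[a + b] - SQ4[abs(a - b)]


def fast_mul16(x, y):
    """Unsigned 16x16 -> 32 bit multiply built from four 8x8 products.

    Assumption: schoolbook decomposition into high and low bytes.
    """
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    return ((fast_mul8(xh, yh) << 16)
            + ((fast_mul8(xh, yl) + fast_mul8(xl, yh)) << 8)
            + fast_mul8(xl, yl))
```

For 8.8 fixed point operands, `fast_mul16(0x0180, 0x0200)` (1.5 times 2.0) gives `0x00030000`, which is 3.0 in 16.16 format. Shifting right by 8 bits brings it back to 8.8.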