mirror of
https://github.com/deater/dos33fsprogs.git
synced 2024-07-13 22:29:14 +00:00
doc: another pass through mode7 doc
parent
016bfcac63
commit
65cf945146
@ -434,75 +434,81 @@ them in software.
|
|||||||
% [x'] = [a b]([x]-[x0])+[x0]
|
% [x'] = [a b]([x]-[x0])+[x0]
|
||||||
% [y'] [c d]([y] [y0]) [y0]
|
% [y'] [c d]([y] [y0]) [y0]
|
||||||
|
|
||||||
|
% http://www.helixsoft.nl/articles/circle/sincos.htm
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
Our algorithm is based on code by Martijn van Iersel.
|
Our algorithm is based on code by Martijn van Iersel.
|
||||||
It iterates through each y line on the screen and calculates based on
|
It iterates through each horizontal line on the screen and calculates the color
|
||||||
the camera location: height ({\em spacez}), x and y coordinates
|
to output based on the camera height ({\em spacez}) and {\em angle} as well
|
||||||
({\em cx} and {\em cy}) and the {\em angle}.
|
as the current x and y coordinates ({\em cx} and {\em cy}).
|
||||||
|
|
||||||
|
First the distance {\em d} is calculated based on fixed scale and
|
||||||
|
distance-to-horizon factors.
|
||||||
|
Instead of a costly division we use a pre-generated lookup table for this.
|
||||||
|
\[d = \frac{z \times yscale}{y+horizon}\]
|
||||||
|
|
||||||
First calculate the distance
|
|
||||||
d = (z*yscale)/(y+horizon)
|
|
||||||
Then calculate the horizontal scale (distance between points on
|
Then calculate the horizontal scale (distance between points on
|
||||||
this line)
|
this line):
|
||||||
h = d/xscale
|
\[h = \frac{d}{xscale}\]
|
||||||
Then calculate delta x and delta y values
|
Then calculate delta x and delta y values between each block on the line.
|
||||||
dx = -sin(angle)*h
|
We use a pre-computed sine/cosine lookup table.
|
||||||
dy = cos(angle)*h
|
\[dx = -\sin(angle) \times h\]
|
||||||
It then calculates the starting offset of the left side of the line in
|
\[dy = \cos(angle) \times h\]
|
||||||
the tile lookup:
|
The leftmost position in the tile lookup is calculated:
|
||||||
tilex = cx + (d*cos(angle) - (width/2) * dx;
|
\[tilex = cx + (d \times \cos(angle)) - (width/2) \times dx\]
|
||||||
tiley = cy + (d*sin(angle) - (width/2) * dy;
|
\[tiley = cy + (d \times \sin(angle)) - (width/2) \times dy\]
|
||||||
Now iterate the inner loop, where we lookup the tile color for each pixel
|
Then an inner loop happens that adds dx and dy as we lookup the color
|
||||||
on the horizontal line.
|
from the tilemap (just a wrap-around array lookup) for each block
|
||||||
putpixel (x, y, tilelookup(tilex,tiley)
|
on the line.
|
||||||
tilex += dx;
|
\[color = tilelookup(tilex,tiley)\]
|
||||||
tiley += dy;
|
\[plot (x, y) \]
|
||||||
|
\[tilex += dx, tiley+= dy\]
|
||||||
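The per-line steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses floating point and the standard math library where the 6502 code uses fixed point and lookup tables, and the 8x8 checkerboard tilemap, the default {\em horizon} and scale values, and the function name {\tt mode7\_line} are all hypothetical.

```python
import math

# Hypothetical 8x8 checkerboard tilemap; the real demo's tile data differs.
TILE_W, TILE_H = 8, 8
TILES = [[(tx + ty) & 1 for tx in range(TILE_W)] for ty in range(TILE_H)]


def tilelookup(tilex, tiley):
    """Wrap-around array lookup into the tilemap."""
    return TILES[math.floor(tiley) % TILE_H][math.floor(tilex) % TILE_W]


def mode7_line(y, cx, cy, angle, spacez,
               width=40, horizon=0.0, xscale=1.0, yscale=1.0):
    """Return the colors for one horizontal line (requires y + horizon != 0)."""
    # Distance to this line, then the horizontal scale along it.
    d = (spacez * yscale) / (y + horizon)
    h = d / xscale
    # Step between neighbouring blocks on the line.
    dx = -math.sin(angle) * h
    dy = math.cos(angle) * h
    # Leftmost position in the tile lookup.
    tilex = cx + d * math.cos(angle) - (width / 2) * dx
    tiley = cy + d * math.sin(angle) - (width / 2) * dy
    colors = []
    for _x in range(width):
        colors.append(tilelookup(tilex, tiley))  # the plot step would go here
        tilex += dx
        tiley += dy
    return colors
```

Calling {\tt mode7\_line} once per screen line, from the horizon downward, yields a full frame; note that the inner loop contains only a lookup and two additions.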
|
|
||||||
{\bf Optimizations}
|
\noindent
|
||||||
|
{\bf Optimizations:}
|
||||||
We managed to take this algorithm and speed it up in the following ways:
|
The 6502 processor cannot do floating point, so all of our routines use
|
||||||
\begin{itemize}
|
8.8 fixed point math.
|
||||||
\item blah
|
We eliminated all of the division, and converted as much as possible
|
||||||
\end{itemize}
|
to use lookup tables (which involved limiting the heights and angles a bit).
|
||||||
|
We also saved some cycles here and there by using self-modifying code,
|
||||||
For our code, we managed to reduce things to a small number of additions
|
most notably hard-coding the height (z) value and modifying the code
|
||||||
and subtractions for each pixel on the screen. Of course the 6502 can't
|
if this is changed.
|
||||||
do floating point, so we do fixed point math. We convert as much as we
|
The code started out only capable of roughly 4.9fps in 40x20 resolution
|
||||||
can to table lookups that are pre-calculated. We also make liberal use
|
and in the end we improved this to 5.7fps in 40x40 resolution.
|
||||||
of self-modifying code.
|
Care was taken to optimize the innermost loop, as every cycle saved there
|
||||||
|
results in 1280 cycles saved overall.
|
||||||
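As a rough illustration of what 8.8 fixed point means, a value is stored as a 16-bit integer holding the real value times 256 (8 integer bits, 8 fractional bits). The helper names below are hypothetical, and Python integers stand in for the 16-bit register pairs the 6502 code would use.

```python
ONE = 1 << 8  # 1.0 in 8.8 fixed point


def to_fixed(value):
    """Convert a real number to 8.8 fixed point (scaled integer)."""
    return int(round(value * ONE))


def from_fixed(fixed):
    """Convert an 8.8 fixed point value back to a float."""
    return fixed / ONE


def fixed_add(a, b):
    """Addition needs no adjustment: the scale factors already match."""
    return a + b


def fixed_mul(a, b):
    """The raw product carries 16 fractional bits, so drop the low 8."""
    return (a * b) >> 8
```

Addition and subtraction are plain integer operations, which is why reducing the inner loop to adds is such a win; only the multiply needs the extra shift (and, on the 6502, a full 16-bit by 16-bit product).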
|
|
||||||
|
\noindent
|
||||||
{\bf Fast Multiply:}
|
{\bf Fast Multiply:}
|
||||||
|
One of the biggest bottlenecks in the mode7 code was the multiply.
|
||||||
|
Even our optimized algorithm calls for at least seven
|
||||||
|
16bit x 16bit = 32bit multiplies, something that is {\em really} slow on
|
||||||
|
the 6502.
|
||||||
|
A typical implementation takes around 700 cycles
|
||||||
|
for an 8.8 x 8.8 fixed point multiply.
|
||||||
|
|
||||||
Despite all of this there are still some cases where we have to do a
|
We improved this by using the fast multiply algorithm
|
||||||
16bit x 16bit = 32bit multiply, something that is *really* slow on 6502,
|
described by Stephen Judd.
|
||||||
around 700 cycles (for a 8.8 x 8.8 fixed point multiply).
|
|
||||||
|
|
||||||
To make this faster we use a method described by Stephen Judd.
|
This works by noting that
|
||||||
|
\[(a+b)^{2} = a^{2}+2ab+b^{2}\]
|
||||||
|
and
|
||||||
|
\[(a-b)^{2}=a^{2}-2ab+b^{2}\]
|
||||||
|
If you subtract these you can simplify to
|
||||||
|
\[a\times b =\frac{(a+b)^{2}}{4} - \frac{(a-b)^2}{4}\]
|
||||||
|
|
||||||
The key to note is that $(a+b)^{2} = a^{2}+2ab+b^{2}$
|
For 8-bit values if you create a table of squares from 0 to 511
|
||||||
and $(a-b)^{2}=a^{2}-2ab+b^{2}$
|
(all 8-bit a+b and a-b fall in this range) then you can convert a multiply
|
||||||
and if you add them you can simplify to:
|
into two table lookups plus a subtract.
|
||||||
$a\times b =\frac{(a+b)^{2}}{4} - \frac{(a-b)^2}{4}$
|
This does have the downside of requiring 2kB of square lookup tables
|
||||||
|
(which can be generated at startup) but it reduces the multiply
|
||||||
|
cost to the order of 250 cycles or so.
|
||||||
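A minimal sketch of the table-of-squares idea follows. It is not the demo's 6502 routine: the table layout, the function names, and the way the 16-bit product is assembled from four 8-bit partial products are assumptions made for illustration, and only unsigned values are handled.

```python
# Table of floor(n*n/4) for n = 0..511, covering every 8-bit a+b and |a-b|.
SQUARES = [(n * n) // 4 for n in range(512)]


def fast_mul8(a, b):
    """Unsigned 8-bit x 8-bit multiply: two table lookups and a subtract.

    a+b and a-b always have the same parity, so the rounding lost by
    the floor division cancels and the result is exact.
    """
    return SQUARES[a + b] - SQUARES[abs(a - b)]


def fast_mul16(x, y):
    """Unsigned 16-bit x 16-bit = 32-bit multiply from four 8-bit products."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    return (fast_mul8(xl, yl)
            + ((fast_mul8(xl, yh) + fast_mul8(xh, yl)) << 8)
            + (fast_mul8(xh, yh) << 16))
```

On real hardware the table would typically be split into separate low-byte and high-byte arrays so each lookup is a single indexed load.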
|
|
||||||
This is you have a table of squares from 0..511 (all 8-bit a+b and a-b
|
|
||||||
will fall in this range) then you can convert a multiply into a table
|
|
||||||
lookup plus a subtract.
|
|
||||||
|
|
||||||
The downsize is you will need 2kB of squares lookup tables (which can
|
|
||||||
be generated at startup). This reduces the multiply cost to the order
|
|
||||||
of 200 to 250 cycles.
|
|
||||||
|
|
||||||
By using the fast multiply and a lot of careful optimization you can
|
|
||||||
generate a Mode7 background in 40x40 graphics mode at about 5 frames/second.
|
|
||||||
|
|
||||||
The engine can be parameterized with different tilesets to use, which we
|
|
||||||
do to provide both a black+white checkerboard background, as well as the
|
|
||||||
island background from the TFV game.
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
|
||||||
\subsection{BOUNCING BALL ON CHECKERBOARD}
|
\subsection{BALL ON CHECKERBOARD}
|
||||||
|
|
||||||
The first Mode7 scene transpires on an infinite checkerboard.
|
The first Mode7 scene transpires on an infinite checkerboard.
|
||||||
A demo would be incomplete without some sort of bouncing geometric solid,
|
A demo would be incomplete without some sort of bouncing geometric solid,
|
||||||
@ -512,65 +518,73 @@ a 20 year old OpenGL game engine.
|
|||||||
Screenshots were taken then reduced to the proper size and color
|
Screenshots were taken then reduced to the proper size and color
|
||||||
limitations.
|
limitations.
|
||||||
The shadows are also just sprites.
|
The shadows are also just sprites.
|
||||||
|
Note that the Apple II has no dedicated sprite hardware, so these
|
||||||
|
are drawn completely in software.
|
||||||
|
|
||||||
The clicking noise on bounce is generated by accessing the speaker port
|
The clicking noise on bounce is generated by accessing the speaker port
|
||||||
at address {\tt \$C030}.
|
at address {\tt \$C030}.
|
||||||
This gives some sound for those viewing the demo without the benefit
|
This gives some sound for those viewing the demo without the benefit
|
||||||
of a Mockingboard.
|
of a Mockingboard.
|
||||||
|
|
||||||
|
|
||||||
\subsection{TFV SPACESHIP FLYING}
|
\subsection{TFV SPACESHIP FLYING}
|
||||||
|
|
||||||
This next scene has a spaceship flying over an island.
|
This next scene has a spaceship flying over an island.
|
||||||
|
The Mode7 graphics code is generic enough that only one copy of the code
|
||||||
|
is needed to generate both the checkerboard and island scenes.
|
||||||
The spaceship, water splash, and shadows are all sprites.
|
The spaceship, water splash, and shadows are all sprites.
|
||||||
They are all drawn in software as the Apple II has no sprite hardware.
|
|
||||||
The path the ship takes is pre-recorded; this is adapted from the
|
The path the ship takes is pre-recorded; this is adapted from the
|
||||||
Talbot Fantasy~7 game engine with the keyboard code replaced by a hard-coded
|
Talbot Fantasy~7 game engine with the keyboard code replaced by a hard-coded
|
||||||
script of actions to take.
|
script of actions to take.
|
||||||
|
|
||||||
\subsection{STARFIELD}
|
\subsection{STARFIELD}
|
||||||
|
|
||||||
The spaceship takes to the stars.
|
The spaceship now takes to the stars.
|
||||||
This is typical starfield code.
|
This is typical starfield code, where on each iteration the x and y
|
||||||
Only 16 stars are modeled, and the movement code re-uses the
|
values are changed by
|
||||||
same fast-multiply routine described previously.
|
\[dx=\frac{x}{z}, dy=\frac{y}{z}\]
|
||||||
|
In order to get a good frame rate and not clutter the lo-res screen
|
||||||
|
only 16 stars are modeled.
|
||||||
|
To avoid having to divide, the reciprocals of all possible z values
|
||||||
|
are stored in a table, and the fast-multiply routine described
|
||||||
|
previously is used.
|
||||||
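The reciprocal-table trick can be sketched as below. The range of z values (1 to 63), the 8 fractional bits, and the function name are hypothetical choices for illustration; on the 6502 the two products would go through the fast-multiply routine rather than a native multiply.

```python
FRAC_BITS = 8
Z_LIMIT = 64  # hypothetical: z takes values 1..63

# RECIPROCAL[z] holds 1/z scaled by 2**FRAC_BITS; entry 0 is unused.
RECIPROCAL = [0] + [(1 << FRAC_BITS) // z for z in range(1, Z_LIMIT)]


def star_delta(x, y, z):
    """Return (dx, dy) = (x/z, y/z) using a table lookup plus multiplies."""
    r = RECIPROCAL[z]
    return (x * r) >> FRAC_BITS, (y * r) >> FRAC_BITS
```

Stars with a small z (near the viewer) get large deltas and stars with a large z barely move, which is what produces the sense of depth.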
|
|
||||||
The star positions require random number generation, but this is not
|
The star positions require random number generation, but there is no
|
||||||
fast on the 6502.
|
easy way to quickly get random data on the Apple II.
|
||||||
Originally we had a 256-byte blob of pre-generated ``random'' values
|
Originally we had a 256-byte blob of pre-generated ``random'' values
|
||||||
included in the code.
|
included in the code.
|
||||||
This wasted space, so now instead we just use our code at address
|
This wasted space, so now instead we just use our code at address
|
||||||
at \$5000 as if it were a block of random numbers.
|
\$5000 as if it were a block of random numbers.
|
||||||
This was arbitrarily chosen, and it is not as random as it could be
|
This was arbitrarily chosen, and it is not as random as it could be
|
||||||
as seen when the ship enters hyperspace the lower right quadrant has fewer
|
as seen when the ship enters hyperspace and the lower-right quadrant
|
||||||
starts than one could desire.
|
is distressingly star-free.
|
||||||
|
|
||||||
A simple state machine controls star speed, ship movement, hyperspace,
|
A simple state machine controls star speed, ship movement, hyperspace,
|
||||||
background color (for the blue flash) and the eventual sequence of sprites
|
background color (for the blue flash) and the eventual sequence of sprites
|
||||||
as the ship vanishes into the distance.
|
as the ship vanishes into the distance.
|
||||||
|
|
||||||
\subsection{RASTERBARS/CREDITS}
|
\subsection{RASTERBARS/CREDITS}
|
||||||
|
|
||||||
Once the ship has departed, it is time for the credits as the stars
|
Once the ship has departed, it is time to run the credits as the stars
|
||||||
continue to run.
|
continue to fly by.
|
||||||
|
|
||||||
The text is written to the bottom 4 lines of the screen and appears
|
The text is written to the bottom four lines of the screen, seemingly
|
||||||
to be surrounded by low-res graphics blocks.
|
surrounded by graphics blocks.
|
||||||
Mixed graphics/text would generally not be possible on the Apple II, although
|
Mixed graphics/text is generally not possible on the Apple II, although
|
||||||
with careful cycle counting and mode switching groups such as FrenchTouch
|
with careful cycle counting and mode switching, groups such as FrenchTouch
|
||||||
have achieved this effect.
|
have achieved this effect.
|
||||||
I was lazy and instead used inverse-mode space characters which appear the same
|
What we see in this demo is the use of inverse-mode (inverted color)
|
||||||
as white graphics blocks.
|
space characters which appear the same as white graphics blocks.
|
||||||
|
|
||||||
The rasterbar effect is not really rasterbars, it's just a colorful assortment
|
The rasterbar effect is not really rasterbars, just a colorful assortment
|
||||||
of horizontal lines drawn at a location determined with a sine lookup table.
|
of horizontal lines drawn at a location determined with a sine lookup table.
|
||||||
Horizontal lines can take a surprising amount of time to draw, so this
|
Horizontal lines can take a surprising amount of time to draw, but these
|
||||||
was optimized using inlining and a few other methods.
|
were optimized using inlining and a few other tricks.
|
||||||
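Positioning a bar with a sine lookup table might look like the following sketch. The table size, amplitude, and center row are made-up values, and the table is built here with the math library, whereas the demo would ship or generate a fixed table.

```python
import math

TABLE_SIZE = 64   # hypothetical number of table entries per full cycle
AMPLITUDE = 15    # hypothetical swing, in screen rows

# Pre-computed sine table, scaled to integer row offsets.
SINE = [round(math.sin(2 * math.pi * i / TABLE_SIZE) * AMPLITUDE)
        for i in range(TABLE_SIZE)]


def bar_row(frame, center=20):
    """Row at which to draw a bar on the given frame."""
    return center + SINE[frame % TABLE_SIZE]
```

Giving each bar a different phase offset into the same table makes the bars weave past each other without any per-frame trigonometry.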
|
|
||||||
The rotating text is done by just rapidly rotating the output string through
|
The spinning text is done by just rapidly rotating the output string through
|
||||||
the ASCII table, with the clicking effect again by hitting the speaker
|
the ASCII table, with the clicking effect again generated
|
||||||
at address \$C030.
|
by hitting the speaker at address {\tt \$C030}.
|
||||||
The list of people to thank ended up being extremely critical to fitting in 8kB,
|
The list of people to thank ended up being the primary limitation to
|
||||||
as unique text strings do not compress well.
|
fitting in 8kB, as unique text strings do not compress well.
|
||||||
I apologize to everyone whose moniker got compressed beyond recognition,
|
I apologize to everyone whose moniker got compressed beyond recognition,
|
||||||
and I am still not totally happy with the centering of the text.
|
and I am still not totally happy with the centering of the text.
|
||||||
|
|
||||||
|