% dos33fsprogs/mode7_demo/docs/dram_notes.tex, 2018-05-13 00:52:09 -04:00

\documentclass{article}
\usepackage{graphicx}
\usepackage{colortbl}
\usepackage{multirow}
\newcommand*\rot{\rotatebox{90}}
\definecolor{color0}{rgb}{0.000,0.000,0.000} % Black
\definecolor{color1}{rgb}{0.890,0.118,0.376} % Magenta
\definecolor{color2}{rgb}{0.376,0.306,0.741} % Dark Blue
\definecolor{color3}{rgb}{1.000,0.267,0.992} % Purple
\definecolor{color4}{rgb}{0.000,0.638,0.376} % Dark Green
\definecolor{color5}{rgb}{0.612,0.612,0.612} % Grey 1
\definecolor{color6}{rgb}{0.078,0.812,0.992} % Medium Blue
\definecolor{color7}{rgb}{0.816,0.765,1.000} % Light Blue
\definecolor{color8}{rgb}{0.376,0.447,0.012} % Brown
\definecolor{color9}{rgb}{1.000,0.416,0.235} % Orange
\definecolor{color10}{rgb}{0.616,0.616,0.616} % Grey 2
\definecolor{color11}{rgb}{1.000,0.627,0.816} % Pink
\definecolor{color12}{rgb}{0.078,0.961,0.235} % Bright Green
\definecolor{color13}{rgb}{0.816,0.867,0.553} % Yellow
\definecolor{color14}{rgb}{0.447,1.000,0.816} % Aqua
\definecolor{color15}{rgb}{1.000,1.000,1.000} % White
\begin{document}
\begin{center}
\begin{large}
{\bf Notes on the Apple II Lores Memory Map}
\end{large}
{\bf Or: Why is the memory map so weird}
\end{center}
The Apple II is very much
a TV-typewriter video-terminal that happens to have a 6502
processor attached to give the display something to do.
(This makes it similar to the SoC in a Raspberry Pi, which is
a large GPU with a small helper ARM processor tacked onto the side.)
The Apple II video display is so central that it even affects the
CPU timings.
A CPU clock cycle normally lasts 978ns, but every 65th cycle
is stretched to 1117ns to keep the video output in sync with the colorburst.
This is why the 6502 runs at the somewhat unusual average speed of 1.020484MHz.
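As a rough check of that figure, assume the usual 14.31818MHz NTSC master
oscillator (a value not given in these notes), with a normal CPU cycle
taking 14 master ticks and the stretched cycle taking 16:
\[
\frac{14}{14.31818\,\mathrm{MHz}} \approx 978\,\mathrm{ns}, \qquad
\frac{16}{14.31818\,\mathrm{MHz}} \approx 1117\,\mathrm{ns}
\]
\[
f_{\mathrm{avg}} = \frac{65}{64 \times 14 + 16} \times 14.31818\,\mathrm{MHz}
= \frac{65}{912} \times 14.31818\,\mathrm{MHz} \approx 1.020484\,\mathrm{MHz}
\]
Under that assumption, 65 CPU cycles span 912 master ticks, which is
where the odd-looking average comes from.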

Text mode and low-resolution graphics share the same 1k region of memory,
addresses {\tt \$400} through {\tt \$7FF}, for Page1.
A straightforward setup would have a linear memory map where
location (0,0) would map to address {\tt \$400}, location (39,0) would map
to {\tt \$427}, and location (0,1) would be at {\tt \$428}.
That would make too much sense.

For low-res, the first complication is what each memory byte represents.
In text mode this is the ASCII value you wish to display, or-ed with
{\tt \$80} so the high bit is set.
Leaving the high bit clear does weird things like enable inverse
(black-on-white) or flashing characters.
Setting address {\tt \$400} to {\tt \$C1}
would put an 'A' (ASCII {\tt \$41})
in the upper left corner of the screen.
In low-res graphics mode the byte is split into two 4-bit nibbles that are
interpreted as two blocks, one above the other: the low nibble is the
top block and the high nibble is the bottom block.
In this case the {\tt \$C1} would be a color 1 (magenta) block on top
and a color 12 (bright green) block on the bottom.
The colors are NTSC artifact colors, formed by sending the raw bit
pattern to the screen with the color burst enabled.
You can try this out yourself from BASIC by running
{\tt TEXT:HOME:POKE 1024,193} to see the text result, and
{\tt GR:POKE 1024,193} to see the graphics result.
That is not too bad so far.
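To spell out the arithmetic behind that example ({\tt 1024} is {\tt \$400}
and {\tt 193} is {\tt \$C1}; the labels ``top'' and ``bottom'' are
used here only for illustration):
\[
193 = 12 \times 16 + 1, \qquad
\mathrm{top} = 193 \bmod 16 = 1, \qquad
\mathrm{bottom} = \lfloor 193/16 \rfloor = 12
\]
so the low nibble selects the color of the top block and the high nibble
the color of the bottom block.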

The next complication is packing the 40 columns of characters into
video memory.
Sadly 40 is not a power of two, so any packing is going to
be inefficient somehow with respect to addressing bits.
The compromise is to pack three 40-byte rows into 128 bytes,
wasting 8 bytes (the ``screen holes'').
This still might not be that weird, but then the address interleaving
comes into play.
Note that text row 0 starts at {\tt \$400}, but text row 1 starts at
{\tt \$480} (a difference of 128), not {\tt \$428} (a difference of 40)
as you might expect.
Address {\tt \$428} actually corresponds to text row 8 (low-res rows 16 and 17).
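The layout can be summarized in one formula.
As a sketch, write $r$ for the text row (0 to 23) and $x$ for the
column (0 to 39); both symbols are introduced here for illustration:
\[
\mathrm{addr}(x, r) = \mbox{\tt \$400} + 128\,(r \bmod 8)
+ 40\,\lfloor r/8 \rfloor + x
\]
For example $\mathrm{addr}(0,1) = \mbox{\tt \$480}$,
$\mathrm{addr}(0,8) = \mbox{\tt \$428}$ and
$\mathrm{addr}(0,16) = \mbox{\tt \$450}$.
In low-res mode text row $r$ holds block rows $2r$ and $2r+1$, so the
byte at {\tt \$428} covers low-res rows 16 and 17.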
The reason for this craziness, as with most oddities on the Apple II,
turns out to be Steve Wozniak being especially clever.

Early home computers often used static RAM (SRAM).
SRAM is easy to use: you just hook up the CPU address and read/write lines
to the memory chips and read and write bytes as needed.
The Apple II instead uses dynamic RAM (DRAM), where each bit is stored in
a capacitor whose value will leak away to zero unless you
refresh it periodically.
Why would you use memory that did that?
SRAM uses 6 transistors to store a bit, while DRAM uses only 1
(plus the capacitor).
So in theory you can fit 6 times the RAM in the same space, leading
to much lower cost and much better density.
To avoid losing the contents though, you must regularly refresh.
This involves reading each memory value out faster
than it leaks away.
DRAM reads are destructive,
so a read operation always reads out, recharges, then writes back
the original value.
Because of this you can avoid explicitly refreshing DRAM with a dedicated
circuit if you can guarantee you perform a read of each memory row
in the required timeframe.
Many systems could not do this, so there was separate
hardware to conduct the refresh.
Often this hardware would take over the memory bus and halt the CPU
while it was happening, slowing down the whole system.
This is true of the original IBM PC;
if you ever look at cycle-level optimization on the PC
you will notice the coders have to take into account pauses caused by
memory refresh.
(The refresh tended to be conservative, so some coders
chose to live dangerously and make refresh happen less often to increase
performance.)
% Wozniak's article in Byte magazine, May 1977 (Volume 2, Number 5)
% Gayler: The Apple II Circuit description
% 15-bit video address, 6 horiz 9 vert, increments, repeating 60Hz
% vert has 262 values, horiz has 65 (40 chars+25 horiz blank)
% value is loaded from proper place, and latched, 7 bits written out?
% Crazy, normally the 6502 runs at 978ns, but every 65th cycle
% it is extended to 1117ns to keep the video output in sync
% Which is why the average CPU freq of apple II is 1.020484MHz
% 192 dots vertical. 70 blanking
% Understanding the Apple II by Sather
% interleaving, but also to not leave excessive holes in map
% In interview in Sather book Woz says could have had contiguous
% memory with 2 more chips.

Steve Wozniak realized that he could avoid stopping the CPU for refresh.
The 6502 clock has two phases:
during the first phase the processor is busy
with internal work and the memory bus is idle.
The CPU only accesses memory in the second phase.
The Apple II uses the idle phase to step through the video memory
range and update the display.
To refresh the 16k (model 4116) DRAM chips you need to read each of the 128
rows (each 128 bits wide) at least once every 2ms.
By carefully selecting the way that the CPU address lines map to
the RAS/CAS lines into the DRAM you can have the video scanning
circuitry walk through each row of the DRAMs fast enough to
conduct the refresh for free.
This works beautifully, but as a side effect you end up with the Apple II's
weird interleaved memory maps.
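A back-of-the-envelope check, using only the figures quoted above and
assuming one video fetch per 978ns CPU cycle and 65 cycles per scan line,
shows how much slack the scanner has:
\[
\frac{2\,\mathrm{ms}}{978\,\mathrm{ns}} \approx 2045 \ \mathrm{fetches},
\qquad
\frac{2045}{65} \approx 31 \ \mathrm{scan\ lines}
\]
So the address mapping only has to ensure that any window of about
31 scan lines touches all 128 DRAM rows.
This estimate says nothing about the actual wiring; it only shows
the size of the window the scanner has to cover.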
%
% 654 3210
%0x400 00 000 1000 000 0000
%0x480 00 000 1001 000 0000
%0x500 00 000 1010 000 0000
%0x580 00 000 1011 000 0000
%0x600 00 000 1100 000 0000
%0x680 00 000 1101 000 0000
%0x700 00 000 1110 000 0000
%0x780 00 000 1111 000 0000
%0x428 00 000 1000 010 1000
%0x4a8 00 000 1001 010 1000
%0x528 00 000 1010 010 1000
%0x5a8 00 000 1011 010 1000
%0x628 00 000 1100 010 1000
%0x6a8 00 000 1101 010 1000
%0x728 00 000 1110 010 1000
%0x7a8 00 000 1111 010 1000
%0x450,0x4d0,0x550,0x5d0,0x650,0x6d0,0x750,0x7d0,
%
%127 values needed
%
%0000 0000 0000 0000 = $0000
%...
%0011 1111 1000 0000 = $3f80

Wozniak said in a later interview that in retrospect he could have
gotten a linear video memory map at the expense of two more chips
on the circuit board.
Apparently when designing the Apple II he thought most people would use BASIC,
which hides the memory map, and did not realize the interleaving would
be such a pain for assembly coders.
%So this is the reason for the ugly memory map.

This is why low-level text and low-res graphics routines
%and text code often
%It is also why Apple II graphics code must
can be complex, using lookup tables and
read/shift/mask operations just to do simple plot operations.
Fully generic routines have to handle all the corner cases, which is why
the Mode7 demo cheats and the sprite drawing code only works
at even row offsets (as this makes the code smaller and simpler).
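As an illustration of the read/mask work involved (a sketch, not the
demo's actual code), plotting color $c$ (0 to 15) at low-res row $y$
replaces only one nibble of the existing byte $b$ at that position:
\[
b' = \left\{ \begin{array}{ll}
(b \;\mathrm{AND}\; \mbox{\tt \$F0}) \;\mathrm{OR}\; c &
\mbox{if $y$ is even (top block)} \\
(b \;\mathrm{AND}\; \mbox{\tt \$0F}) \;\mathrm{OR}\; 16c &
\mbox{if $y$ is odd (bottom block)}
\end{array} \right.
\]
For instance, plotting color 12 at an odd row over a byte holding
{\tt \$01} gives
$(\mbox{\tt \$01} \;\mathrm{AND}\; \mbox{\tt \$0F}) \;\mathrm{OR}\;
\mbox{\tt \$C0} = \mbox{\tt \$C1}$.
Presumably a routine that only draws at even row offsets can write
whole bytes and skip this masking.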

While this seems needlessly complicated, the hi-res graphics mode
is even worse than the mess described above.
\input{table.tex}
\end{document}