From 0b239efeebe757de25f1918e870ab1988888e940 Mon Sep 17 00:00:00 2001
From: Vince Weaver <vince@deater.net>
Date: Tue, 8 May 2018 01:50:35 -0400
Subject: [PATCH] doc: update the notes

---
 mode7_demo/docs/dram_notes.tex | 208 ++++++++++++++++-----------------
 1 file changed, 102 insertions(+), 106 deletions(-)

diff --git a/mode7_demo/docs/dram_notes.tex b/mode7_demo/docs/dram_notes.tex
index 758b7f00..6d08eb79 100644
--- a/mode7_demo/docs/dram_notes.tex
+++ b/mode7_demo/docs/dram_notes.tex
@@ -25,91 +25,85 @@
 
 \begin{document}
 
-Sort of similar to how the Raspberry Pi started out as a GPU with a small
-RAM processor tacked to the side, the Apple II design is very much
-a TV-typewriter style video display that just happens to have a 6502
+\begin{center}
+\begin{large}
+{\bf Notes on the Apple II Lores Memory Map}
+\end{large}
+
+{\bf Or: Why is the memory map so weird?}
+\end{center}
+The SoC in a Raspberry Pi is actually a large GPU with a small
+helper ARM processor tacked onto the side.
+In a similar fashion, the Apple II is very much
+a TV-typewriter video terminal that happens to have a 6502
 processor attached to give the display something to do.
-
-Wozniak has said in an interview in retrospect he could have
-gotten a lineary video memory map at the expense of two more chips.
-Apparently at design time he had thought most people would use BASIC
-on the machine, and since BASIC already supported things did not realize
-what a hassle the map would be to assembly coders.
-
-Note any time use text or plot/line need a lookup table.
-My sprite code cheats and only supports drawing at an even rows because
-it makes the code smaller, fater, and simpler
-
-
-
-
+The video display is key to many things. In fact, a CPU clock cycle
+usually lasts 978ns, but every 65th cycle
+is extended to 1117ns to keep the video output in sync with the colorburst.
+This is why the 6502 runs at the odd average speed of 1.020484MHz.
 
 Page 1 of The Apple II low-resolution graphics and text display share
 the same 1k region of memory, from addresses {\tt \$400} to  {\tt \$800}.
 In an easy-to-use setup you would have a linear memory map where
 location (0,0) would map to address {\tt \$400}, location (39,0) would map
 to {\tt \$427}, and location (1,0) would be at {\tt \$428}.
-This is not how it works though.
+That would make too much sense.
 
 First, each memory location holds an 8-bit value. 
-In text mode this is just the ASCII value you want to print 
-(confusingly with the high bit set for plain text, the low-bit clear
-does weird things like enable inverse (black-on-white) or flashing
-modes).
-So setting address {\tt \$400} to {\tt \$C1}
+In text mode this is just the ASCII value you want to print,
+although, confusingly, the high bit must be set for plain text.
+Leaving the high bit clear does weird things like enable inverse 
+(black-on-white) or flashing characters.
+Setting address {\tt \$400} to {\tt \$C1}
 would put an 'A' (ASCII {\tt \$41})
 in the upper left corner of the screen.
 In low-res graphics mode the nibbles are used, so the {\tt \$C1} would
-be intepreted as putting two blocks, one above each other, in the upper
+be interpreted as putting two blocks, one above each other, in the upper
 left.
-The top one would be color 1 (red) and the bottom color 12 (light green).
+The top block would be color 1 (red) and the bottom color 12 (light green).
 The colors are NTSC artifact colors, caused by outputting the raw bit
 pattern out to the screen with the color burst enabled.
 You can try this out yourself from BASIC by running 
 {\tt TEXT:HOME:POKE 1024,193} to see the text result, and
-{\tt GR:POKE 1024,193} to see the graphcis result.
+{\tt GR:POKE 1024,193} to see the graphics result.
 
 The next part that complicates things is the weird interleavings of
 the addresses.
-Note that Line 2 starts at {\tt \$480}, not {\tt \$428} as expected.
+Note that Line 2 starts at {\tt \$480}, not {\tt \$428} as you might expect.
 {\tt \$428} actually corresponds to line 16.
 
 The reason for this craziness turns out to Steve Wozniak being especially 
 clever, and finding a way to get DRAM refresh essentially for free.
-
 Early home computers often used static RAM (SRAM).
-SRAM is easy to use, you just hook up the address and read/write lines,
-then just read and write to memory.
-It was very fast too.
-So why use dynamic RAM (DRAM)?
-Well SRAM uses 6 transistors per bit.
-DRAM only uses 1 transistor (plus a capacitor to store the value).
-So you can in theory fit 6 times the RAM in the same space, leading
-to much cheaper costs and much better density.
-What are the downsides of DRAM?
-Typically it was slower.
-Also that capacitor leaks away your memory.
-Given enough time (on the order of seconds or so) and all of your RAM
-will leak down to zero.
-To combat that, you have to refresh your memory.
-This essentially involves reading each memory value out faster
-than it leaks away (due to the design of DRAM, reads are destructive,
-so a read operation always reads out, recharges, then writes back
-the value).
+SRAM is easy to use: you just hook up the CPU address and read/write lines
+to the memory chips and read and write bytes as needed.
 
-The act of refresh can be slow.
-On many systems there was separate hardware to do the refresh, and
-it often took over the memory bus to do this and your CPU would have
-to pause while it was happening.
-This is true of the original IBM PC.
-If you ever look at cycle-level optimization on this platform you will
-notice the coders have to take into account pauses caused by
+The Apple II uses dynamic RAM (DRAM), where each bit is stored in a capacitor
+whose value will leak away to zero unless you refresh it periodically.
+Why would you use memory that does that?
+Well, SRAM uses 6 transistors to store a bit, while DRAM uses only 1 (plus the capacitor).
+So in theory you can fit 6 times the RAM in the same space, leading
+to much cheaper costs and much better density.
+
+Refreshing the DRAM involves regularly reading each memory value out faster
+than it leaks away.
+Due to the design of DRAM, reads are destructive,
+so a read operation always reads out, recharges, then writes back
+the value.
+
+Refreshing can be slow.
+On many systems there was separate hardware to conduct the refresh, and
+often this hardware would take over the memory bus and halt the CPU
+while it was happening.
+This is true of the original IBM PC;
+if you ever look at cycle-level optimization on the PC
+you will notice the coders have to take into account pauses caused by
 memory refresh (the refresh tended to be conservative so some coders
-would live dangerously and make refresh happen less often to increase
+chose to live dangerously and make refresh happen less often to increase
 performance).
 
 % Wozniak's article in Byte magazine, May 1977 (Volume 2, Number 5) 
-% Gayler: The Apple II Circuit discription
+% Gayler: The Apple II Circuit description
 %  15-bit video address, 6 horiz 9 vert, increments, repeating 60Hz
 %  vert has 262 values, horiz has 65 (40 chars+25 horiz blank)
 % value is loaded from proper place, and latched, 7 bits written out?
@@ -117,63 +111,65 @@ performance).
 %   it is extended to 1117ns to keep the video output in sync
 %  Which is why the average CPU freq of apple II is 1.020484MHz
 % 192 dots vertical.  70 blanking
-% Undstanding the Apple II by Sather
-%   interleaving, but also to not leave execessive holes in map
+% Understanding the Apple II by Sather
+%   interleaving, but also to not leave excessive holes in map
 % In interview in Sather book Woz says could have had contiguous
 %   memory with 2 more chips.
 
-Could have had contiguous memory with two more chips?
-
-1MHz 6502 cpu clock two phases.  During first phase processor is busy
-with internal work and the memory bus is idle.  So during this
-time the video circuitry reads the memory and updates the display.
-
-4116, refresh every two milliseconds. 2ms=500Hz
-
-each column has 128 bytes
-
-RA0-RA2 are the only ones that matter for correctness
-
-RA0 = V0
-RA1 = H2
-RA2 = H0
-
-           654 3210
-0x400	00 000 1000 000 0000
-0x480	00 000 1001 000 0000
-0x500	00 000 1010 000 0000
-0x580	00 000 1011 000 0000
-0x600	00 000 1100 000 0000
-0x680	00 000 1101 000 0000
-0x700	00 000 1110 000 0000
-0x780	00 000 1111 000 0000
-0x428	00 000 1000 010 1000
-0x4a8	00 000 1001 010 1000
-0x528	00 000 1010 010 1000
-0x5a8	00 000 1011 010 1000
-0x628	00 000 1100 010 1000
-0x6a8	00 000 1101 010 1000
-0x728	00 000 1110 010 1000
-0x7a8	00 000 1111 010 1000
-0x450,0x4d0,0x550,0x5d0,0x650,0x6d0,0x750,0x7d0,
-
-127 values needed
-
-0000 0000 0000 0000 = $0000	
-...
-0011 1111 1000 0000 = $3f80
-
-
-
-
-
-
-
-
-
+Steve Wozniak realized that he could avoid stopping the CPU for refresh.
+The 6502 clock has two phases.
+During the first phase the processor is busy
+with internal work and the memory bus is idle.
+On the Apple II the video circuitry uses this idle time to step through
+the video memory and update the display.
+To refresh the 16k (4116) DRAM chips you need to read each 128-wide
+row at least once every 2ms.
+By carefully selecting the way that the CPU address lines map to
+the RAS/CAS lines into the DRAM, you can have the video scanning
+circuitry walk through each row of the DRAMs fast enough to
+conduct the refresh for free, at the expense of having weird
+interleaved video memory mappings.
 
+%
+%          654 3210
+%0x400	00 000 1000 000 0000
+%0x480	00 000 1001 000 0000
+%0x500	00 000 1010 000 0000
+%0x580	00 000 1011 000 0000
+%0x600	00 000 1100 000 0000
+%0x680	00 000 1101 000 0000
+%0x700	00 000 1110 000 0000
+%0x780	00 000 1111 000 0000
+%0x428	00 000 1000 010 1000
+%0x4a8	00 000 1001 010 1000
+%0x528	00 000 1010 010 1000
+%0x5a8	00 000 1011 010 1000
+%0x628	00 000 1100 010 1000
+%0x6a8	00 000 1101 010 1000
+%0x728	00 000 1110 010 1000
+%0x7a8	00 000 1111 010 1000
+%0x450,0x4d0,0x550,0x5d0,0x650,0x6d0,0x750,0x7d0,
+%
+%127 values needed
+%
+%0000 0000 0000 0000 = $0000	
+%...
+%0011 1111 1000 0000 = $3f80
 
+Wozniak said in a later interview that in retrospect he could have
+gotten a linear video memory map at the expense of two more chips
+on the circuit board.
+Apparently when designing the Apple II he thought most people would use BASIC,
+which hid the memory map, and did not realize the interleaving would
+be such a pain for assembly coders.
 
+So this is the reason for the ugly memory map.
+It is also why Apple II graphics code often uses lookup tables and
+read/shift/mask operations just to do a simple plot operation.
+It is also why my demo code cheats and the sprite code only works
+at even row offsets.
+And it may seem hard to believe, but the hi-res drawing routines
+are even more complicated.
 
 \input{table.tex}