Commit work-in-progress of BBU FPGA in Verilog.

This commit is contained in:
Andrew Makousky 2020-10-25 09:00:40 -05:00
parent bb049216e4
commit f9200e43ac
2 changed files with 790 additions and 4 deletions

View File

@ -96,7 +96,7 @@ a huge number of pins, its purpose can be summarized as follows.
configurations.
128K (double undocumented), 256K (double undocumented), 512K
(undocumented), 1 MB, 2 MB, 4 MB.
(undocumented), 1MB, 2MB, 4MB.
* Refresh the DRAM by periodically reading some arbitrary memory from
every available row. Unlike the Apple II, the contiguous
@ -169,7 +169,7 @@ only simple, single-pin interfaces.
to the address lines on the RAM SIMMs, outputs.
* `RA9` is only controlled when the `MBRAM` input is TRUE, i.e. +5V.
This indicates that 1 MB RAM SIMMs are being used. Otherwise, it is
This indicates that 1MB RAM SIMMs are being used. Otherwise, it is
kept at zero and high memory addresses are marked as bus errors.
256K RAM SIMMs are used in this case.
@ -179,10 +179,10 @@ only simple, single-pin interfaces.
`CAS1H` are not driven.
* If both `MBRAM` and `ROW2` are TRUE, i.e. +5V, it is also possible
for the BBU to detect a 2.5 MB RAM configuration and adjust bus
for the BBU to detect a 2.5MB RAM configuration and adjust bus
errors flagging accordingly.
* `RDO0` - `RDO15` are bidirectional data signals, they are the
* `RDQ0` - `RDQ15` are bidirectional data signals, they are the
primary means by which single-pin I/O devices and the like are
mapped into the address space that can be directly accessed by the
CPU, in conjunction with the address inputs. Namely, the BBU reads

786
hardware/fpga/bbu/bbu.v Normal file
View File

@ -0,0 +1,786 @@
/* Synthesizable Verilog hardware description for a drop-in
replacement of the Apple Custom Silicon Bob Bailey Unit (BBU), an
address controller for the Macintosh SE and similar computers.
Written in 2020 by Andrew Makousky
Public Domain Dedication:
To the extent possible under law, the author(s) have dedicated all
copyright and related and neighboring rights to this software to
the public domain worldwide. This software is distributed without
any warranty.
You should have received a copy of the CC0 Public Domain Dedication
along with this software. If not, see
<http://creativecommons.org/publicdomain/zero/1.0/>.
*/
/* Top-level module for the BBU.
TODO Abbreviations legend, here since they are not noted elsewhere:
RA = RAM Address
RDQ = RAM Data Value
PMCYC = Processor Memory Cycle
PINOUT:
RA0 79
RA1 78
RA2 76
RA3 73
RA4 71
RA5 70
RA6 68
RA8 67
RA7 66
RA9 65
*CAS1L 16
*CAS0L 15
RAM R/*W 14
*RAS 20
*CAS1H 19
*CAS0H 18
RDQ0 69
RDQ1 72
RDQ2 74
RDQ3 75
RDQ4 77
RDQ5 80
RDQ6 83
RDQ7 2
RDQ8 3
RDQ9 4
RDQ10 5
RDQ11 6
RDQ12 7
RDQ13 8
RDQ14 9
RDQ15 10
*EN245 12
*DTACK 38
R/*W 47
*IPL1 30
*LDS 33
*VPA 36
*C8M 37
VCC 22
VCC 64
VCC 42
VCC 84
MBRAM 17
GND 1
GND 21
GND 43
GND 63
ROW2 13
*EXTDTK 11
A23 23
A22 24
A21 25
A20 26
A19 27
A17 28
A9 29
*PMCYC 81
C2M 82
*RES 59
C16MRSF2 44
C3.7M 40
*ROMEN 39
*SCCRD 46
PWM 49
SCSIDRQ 55
*IWM 48
*SCCEN 45
*SCSI 57
*DACK 56
SNDRES 50
VIA.CS1 58
VIDPG2 53
*EAREN 52
*AS 41
*BERR 34
SND 51
*VSYNC 61
*IOW 54
*HSYNC 60
*VIAIRQ 32
VIDOUT 62
*IPL0 31
*UDS 35
Undocumented but assumed to exist:
64KRAM ???
Note that *IOW controls both *SCSI.IOW and *SCC.WR.
*/
module bbu_master_ctrl
// Essential sequential logic RESET and clock signals
(n_res, c16m, c8m, c3_7m, c2m,
// RAM configuration pins
row2, mbram, s64kram,
// MC68000 signals
a9, a17, a19, a20, a21, a22, a23,
r_n_w, n_as, n_uds, n_lds, n_dtack,
n_ipl0, n_ipl1, n_berr,
n_vpa,
// DRAM signals
ra0, ra1, ra2, ra3, ra4, ra5, ra6, ra8, ra7, ra9,
n_cas1l, n_cas0l, ram_r_n_w, n_ras, n_cas1h, n_cas0h,
rdq0, rdq1, rdq2, rdq3, rdq4, rdq5, rdq6, rdq7,
rdq8, rdq9, rdq10, rdq11, rdq12, rdq13, rdq14, rdq15,
n_en245, n_pmcyc,
// ROM and memory overlay signals
n_romen,
// VIA signals
via_cs1, // TODO VERIFY: indeed active high?
n_viairq,
// Video signals
vidpg2, vidout, n_hsync, n_vsync,
// Sound and disk speed signals
sndres, snd, pwm,
// IWM signals
n_iwm,
// SCC signals
n_sccen, n_sccrd, n_iow,
// SCSI signals
n_scsi, scsidrq, n_dack,
// PDS signals
n_extdtk, n_earen,
);
// TODO FIXME: Signals missing-in-action: OVERLAY, SNDPG2. It's
// possible that the Macintosh SE uses a magic address access to
// obviate the need for consuming a VIA pin for OVERLAY, but I have
// no idea what this address would be. Has support for the second
// sound page been dropped in the Macintosh SE?
// Essential sequential logic RESET and clock signals
input wire n_res; // *RESET signal
input wire c16m; // 15.667200 MHz master clock input
output reg c8m; // 7.8336 MHz clock output
output reg c3_7m; // 3.672 MHz clock output
output reg c2m; // 1.9584 MHz clock output
// RAM configuration pins
input wire row2; // 1/2 rows of RAM SIMMs jumper
input wire mbram; // 256K/1MB RAM SIMMs jumper
input wire s64kram; // DOUBLY UNDOCUMENTED 64K RAM SIMMs jumper
// MC68000 signals
input wire a9, a17, a19, a20, a21, a22, a23;
input wire r_n_w, n_as, n_uds, n_lds;
output reg n_dtack, n_ipl0, n_ipl1, n_berr;
input wire n_vpa;
// DRAM signals
inout wire ra0, ra1, ra2, ra3, ra4, ra5, ra6, ra8;
output reg ra7, ra9;
output reg n_cas1l, n_cas0l, ram_r_n_w, n_ras, n_cas1h, n_cas0h;
inout wire rdq0, rdq1, rdq2, rdq3, rdq4, rdq5, rdq6, rdq7,
rdq8, rdq9, rdq10, rdq11, rdq12, rdq13, rdq14, rdq15;
output reg n_en245, n_pmcyc;
// ROM and memory overlay signals
output reg n_romen;
// VIA signals
output reg via_cs1;
input wire n_viairq;
// Video signals
input wire vidpg2; // VIDPG2 signal
output reg vidout; // VIDOUT signal
output reg n_hsync; // *HSYNC signal
output reg n_vsync; // *VSYNC signal
// Sound and disk speed signals
input wire sndres;
output reg snd, pwm;
// IWM signals
output reg n_iwm;
// SCC signals
output reg n_sccen, n_sccrd, n_iow;
// SCSI signals
output reg n_scsi;
input wire scsidrq;
output reg n_dack;
// PDS signals
input wire n_extdtk;
output reg n_earen; // ??? Purpose unknown.
// Note tristate inout ... 'bz for high impedance. 8'bz for wide.
///////////////////////////////////////////////////////////
// 15.6672 / 3.6720 = 9792/2295 = (51*2^6*3)/(51*3^2*5)
// = (2^6)/(3*5) = 64/15
// So, here's how we implement the frequency divider to generate
// the 3.672 MHz clock. Initialize to 64 - 15 = 49, and keep
// subtracting 15 until we reach zero or less. Then, add back 49,
// and toggle the C3.7M output.
// TODO FIXME: The complex frequency divider will not work
// correctly: Since the base frequency is not high enough, there
// will be terrible aliasing artifacts. Divide by 4 is a bit too
// fast, divide by 5 is a bit too slow, but that's the best we can
// do without PLL clock frequency multiplication.
//
// PLL clock frequency synthesis inside the BBU is conceivable to
// believe, however, considering that the the GLUE chip in the
// Macintosh SE/30 doubles the 16 MHz crystal input to 32 MHz.
// Easiest method, we want the least common multiple between these
// two frequencies. So, back to where we started.
//
// 15.6672 / 3.6720 = 16/16 * 9792/2295 = (51*2^10*3)/(51*2^4*3^2*5)
// LCM: 51*2^10*3^2*5 = 2350080 -> 235.0080 MHz
// 235.0080 / 15.6672 = 15
// 235.0080 / 3.6720 = 64
//
// So, this is how we synthesize the perfect 3.6720 MHz clock
// signal. Multiply the source frequency of 15.6672 MHz by 15 via
// a PLL to get an intermediate clock frequency of 235.0080 MHz,
// then divide by 64 to get the target 3.6720 MHz clock signal.
// Yes, we could really just use divide-by-four (3.9168 MHz) if
// going a tad bit faster wasn't an issue.
//
// How about this, 16 / (64 * 16/15) ~= 16/68. Multiply by 16,
// divide by 68. PLL = 250.6752 MHz, result = 3.6864 MHz. I guess
// that's a lot better. Alternatively, PLL = 250 MHz, result =
// 3.6765 MHz. Even better.
// We use shift registers or 1-bit inverters for high performance,
// minimal cycle overhead.
reg c16m_div2_cntr; // C16M / 2 counter
reg [5:0] c16m_div4_cntr; // Complex C16M -> C3_7M divider counter
reg [7:0] c16m_div8_cntr; // C16M / 8 counter
// Table of total supported RAM sizes, used by the RAM row refresh
// circuitry. Note that although hardware only has 23 address
// lines, and only 21 address lines are ever used for RAM, we
// define these registers as 24 address lines solely for Verilog
// source code readability.
wire [23:0] ramsz;
reg [23:0] ramsz_128k;
reg [23:0] ramsz_256k;
reg [23:0] ramsz_512k;
reg [23:0] ramsz_1m;
reg [23:0] ramsz_2m;
reg [23:0] ramsz_2_5m;
reg [23:0] ramsz_4m;
// C16M pixel clock (0.064 us per pixel).
// 512 horizontal draw pixels, 192 horizontal blanking pixels.
// 342 scan lines, 28 scan lines vertical blanking.
// 60.15 Hz vertical scan rate.
// (512 + 192) * (342 + 28) = 260480 pixel clock ticks per frame.
// Total screen buffer size = 10944 words. High-order bit of each
// 16-bit word is the leftmost pixel, low-order bit is the
// rightmost pixel. Words in ascending order move from left to
// right in the scan line, first scan line is topmost and then
// moves downward.
// The main and alternate screen buffer memory addresses are
// calculated by subtracting a constant from the installed RAM
// size. Deltas: main -0x5900, alt. -0xd900.
// Computed values for reference:
// 128K: main 0x1a700 alt. 0x12700.
// 256K: main 0x3a700, alt 0x32700.
// 512K: main 0x7a700, alt. 0x72700.
// 1MB: main 0xfa700, alt 0xf2700.
// 2MB: main 0x1fa700, alt 0x1f2700.
// 2.5MB: main 0x27a700, alt 0x272700.
// 4MB: main 0x3fa700, alt 0x3f2700.
// Please note: If we don't list the configuration in the table,
// it's not supported by the BBU. The BBU is a gate array, not a
// microcontroller!
// *HSYNC and *VSYNC counters are negative during blanking.
reg [15:0] vidout_sreg; // VIDOUT shift register
reg [4:0] vidout_cntr; // VIDOUT remaining counter
reg [9:0] vid_hsync_cntr; // *HSYNC counter
reg [8:0] vid_vsync_cntr; // *VSYNC counter
wire [23:0] vid_main_addr; // Address of main video buffer
wire [23:0] vid_alt_addr; // address of alternate video buffer
// Table of video memory base addresses.
// TODO FIXME: These should all be hard-wired constant registers.
reg [23:0] vid_main_addr_128k; reg [23:0] vid_alt_addr_128k;
reg [23:0] vid_main_addr_256k; reg [23:0] vid_alt_addr_256k;
reg [23:0] vid_main_addr_512k; reg [23:0] vid_alt_addr_512k;
reg [23:0] vid_main_addr_1m; reg [23:0] vid_alt_addr_1m;
reg [23:0] vid_main_addr_2m; reg [23:0] vid_alt_addr_2m;
reg [23:0] vid_main_addr_2_5m; reg [23:0] vid_alt_addr_2_5m;
reg [23:0] vid_main_addr_4m; reg [23:0] vid_alt_addr_4m;
// Sound and disk speed buffers are scanned 370 words per video
// frame, and the size of both buffers together is 370 words. Or,
// 260480 pixel clock ticks / 370 = 704 pixel clock ticks per word.
// In a single scan line, (512 + 192) / 704 = 704 / 704 = exactly 1
// word is read. The sound byte is the most significant byte, the
// disk speed byte is the least significant byte. Both the sound
// sample and disk speed represent a PCM amplitude value, this is
// used to generate a PDM waveform that can be processed by a
// low-pass filter to generate the analog signal.
// Well, at least in concept... Inside Macintosh claims that only a
// single pulse is generated, so this is not quite your typical PDM
// audio circuit. Nevertheless, the sample rate is 22.2555 kHz, so
// it's not too bad overall for generating lo-fi audio. But, good
// point to ponder, this is an area of improvement where a
// different algorithm can generate better audio quality.
// The main and alternate sound and disk speed buffer addresses are
// calculated by subtracting a constant from the installed RAM
// size. Deltas: main -0x0300, alt. -0x5f00.
// Computed values for reference:
// 128K: main 0x1fd00 alt. 0x1a100.
// 256K: main 0x3fd00, alt. 0x3a100.
// 512K: main 0x7fd00, alt. 0x7a100.
// 1MB: main 0xffd00, alt. 0xfa100.
// 2MB: main 0x1ffd00, alt. 0x1fa100.
// 2.5MB: main 0x27fd00, alt. 0x27a100.
// 4MB: main 0x3ffd00, alt. 0x3fa100.
// Please note: If we don't list the configuration in the table,
// it's not supported by the BBU. The BBU is a gate array, not a
// microcontroller!
reg [15:0] snddsk_reg; // PCM sound sample and disk speed register
wire [23:0] snddsk_main_addr; // Address of main sound/disk buffer
wire [23:0] snddsk_alt_addr; // address of alternate sound/disk buffer
// Table of sound and disk speed memory base addresses.
// TODO FIXME: These should all be hard-wired constant registers.
reg [23:0] snddsk_main_addr_128k; reg [23:0] snddsk_alt_addr_128k;
reg [23:0] snddsk_main_addr_256k; reg [23:0] snddsk_alt_addr_256k;
reg [23:0] snddsk_main_addr_512k; reg [23:0] snddsk_alt_addr_512k;
reg [23:0] snddsk_main_addr_1m; reg [23:0] snddsk_alt_addr_1m;
reg [23:0] snddsk_main_addr_2m; reg [23:0] snddsk_alt_addr_2m;
reg [23:0] snddsk_main_addr_2_5m; reg [23:0] snddsk_alt_addr_2_5m;
reg [23:0] snddsk_main_addr_4m; reg [23:0] snddsk_alt_addr_4m;
// We must be careful that the sound circuitry does not attempt to
// access RAM at the same time as the video circuitry. Because the
// phases are coherent, we can simply align the sound and disk
// speed RAM fetch to be at a constant offset relative to the video
// RAM fetch.
// PLEASE NOTE: We must carefully time our RAM accesses since they
// have delays and we don't want the screen bits shift register
// buffer to run empty before we have the next word available from
// RAM. Our ideal is that the next word is available from RAM just
// as we are shifting out the last pixel, so that we can use a
// non-blocking assign and the new first pixel will be available
// right at the start of the next pixel clock cycle. Otherwise,
// less ideal but easier to program would be to use two 16-bit
// buffers as a FIFO.
// SCC access notes: Even byte accesses are a read, odd byte
// accesses are a write. Namely: `*LDS` == 0 == write, `*UDS` == 0
// == read. Remember, it's big endian. What about the separate
// address regions? Well, I say just ignore those, it's there for
// a convenient convention, but it's not the officially documented
// hardware protocol.
// VIA support: Simply handle chip select, and issue an MC68000
// interrupt priority zero if we receive an interrupt signal from
// the VIA.
// SCSI support: Handle chip select, and handle DMA.
// NOTE: For all peripherals, we must set `*DTACK` from the BBU
// upon successful access condition and time durations because it
// is not set by the device itself.
//////////////////////////////////////////////////
// Pure combinatorial logic is defined first.
// TODO FIXME: We need a way to detect the 2.5MB RAM configuration
// and set the memory addresses accordingly. The BBU could do its
// own memory-test in this configuration to set a bit indicating
// that there is 2.5MB of RAM installed rather than 4MB.
assign ramsz
= (s64kram) ? // 64K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
ramsz_128k
: // 2 rows of RAM SIMMs
ramsz_256k
: (~mbram) ? // 256K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
ramsz_512k
: // 2 rows of RAM SIMMs
ramsz_1m
: // 1MB RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
ramsz_2m
: // 2 rows of RAM SIMMs
ramsz_4m
;
assign vid_main_addr
= (s64kram) ? // 64K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_main_addr_128k
: // 2 rows of RAM SIMMs
vid_main_addr_256k
: (~mbram) ? // 256K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_main_addr_512k
: // 2 rows of RAM SIMMs
vid_main_addr_1m
: // 1MB RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_main_addr_2m
: // 2 rows of RAM SIMMs
vid_main_addr_4m
;
assign vid_alt_addr
= (s64kram) ? // 64K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_alt_addr_128k
: // 2 rows of RAM SIMMs
vid_alt_addr_256k
: (~mbram) ? // 256K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_alt_addr_512k
: // 2 rows of RAM SIMMs
vid_alt_addr_1m
: // 1MB RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
vid_alt_addr_2m
: // 2 rows of RAM SIMMs
vid_alt_addr_4m
;
assign snddsk_main_addr
= (s64kram) ? // 64K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_main_addr_128k
: // 2 rows of RAM SIMMs
snddsk_main_addr_256k
: (~mbram) ? // 256K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_main_addr_512k
: // 2 rows of RAM SIMMs
snddsk_main_addr_1m
: // 1MB RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_main_addr_2m
: // 2 rows of RAM SIMMs
snddsk_main_addr_4m
;
assign snddsk_alt_addr
= (s64kram) ? // 64K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_alt_addr_128k
: // 2 rows of RAM SIMMs
snddsk_alt_addr_256k
: (~mbram) ? // 256K RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_alt_addr_512k
: // 2 rows of RAM SIMMs
snddsk_alt_addr_1m
: // 1MB RAM SIMMs
(~row2) ? // 1 row of RAM SIMMs
snddsk_alt_addr_2m
: // 2 rows of RAM SIMMs
snddsk_alt_addr_4m
;
// The remainder of definitions are for sequential logic.
always @(negedge n_res) begin
// TODO FIXME: Initialize all hard-wired constant registers on
// RESET. This should not be necessary. We could simply define
// constants and assign those instead, assuming the compiler
// understands the intent.
ramsz_128k <= 'h20000;
ramsz_256k <= 'h40000;
ramsz_512k <= 'h80000;
ramsz_1m <= 'h100000;
ramsz_2m <= 'h200000;
ramsz_2_5m <= 'h280000;
ramsz_4m <= 'h400000;
vid_main_addr_128k <= 'h1a700; vid_alt_addr_128k <= 'h12700;
vid_main_addr_256k <= 'h3a700; vid_alt_addr_256k <= 'h32700;
vid_main_addr_512k <= 'h7a700; vid_alt_addr_512k <= 'h72700;
vid_main_addr_1m <= 'hfa700; vid_alt_addr_1m <= 'hf2700;
vid_main_addr_2m <= 'h1fa700; vid_alt_addr_2m <= 'h1f2700;
vid_main_addr_2_5m <= 'h27a700; vid_alt_addr_2_5m <= 'h272700;
vid_main_addr_4m <= 'h3fa700; vid_alt_addr_4m <= 'h3f2700;
snddsk_main_addr_128k <= 'h1fd00; snddsk_alt_addr_128k <= 'h1a100;
snddsk_main_addr_256k <= 'h3fd00; snddsk_alt_addr_256k <= 'h3a100;
snddsk_main_addr_512k <= 'h7fd00; snddsk_alt_addr_512k <= 'h7a100;
snddsk_main_addr_1m <= 'hffd00; snddsk_alt_addr_1m <= 'hfa100;
snddsk_main_addr_2m <= 'h1ffd00; snddsk_alt_addr_2m <= 'h1fa100;
snddsk_main_addr_2_5m <= 'h27fd00; snddsk_alt_addr_2_5m <= 'h27a100;
snddsk_main_addr_4m <= 'h3ffd00; snddsk_alt_addr_4m <= 'h3fa100;
// Initialize all output registers on RESET.
c8m <= 0;
c3_7m <= 0;
c2m <= 0;
n_dtack <= 1; n_ipl0 <= 1; n_ipl1 <= 1; n_berr <= 1;
ra7 <= 0; ra9 <= 0;
n_cas1l <= 1; n_cas0l <= 1;
ram_r_n_w <= 0; n_ras <= 1;
n_cas1h <= 1; n_cas0h <= 1;
n_en245 <= 1;
n_pmcyc <= 1;
vidout <= 0; n_hsync <= 1; n_vsync <= 1;
snd <= 0; pwm <= 0;
n_iwm <= 1; n_sccen <= 1; n_sccrd <= 1; n_iow <= 1;
n_scsi <= 1; n_dack <= 1;
n_earen <= 1;
// Initialize all internal registers on RESET.
c16m_div2_cntr <= 0;
c16m_div4_cntr <= 1;
c16m_div8_cntr <= 1;
vidout_sreg <= 0;
vidout_cntr <= 0;
vid_hsync_cntr <= 0;
vid_vsync_cntr <= 0;
snddsk_reg <= 0;
end
always @(posedge c16m) begin
if (n_res) begin
// All high speed sequential logic goes here.
// Generate the frequency-divided clock signals.
if (c16m_div2_cntr == 1) c8m <= ~c8m;
c16m_div2_cntr <= ~c16m_div2_cntr;
if (c16m_div4_cntr[3] == 1) begin
c3_7m <= ~c3_7m;
c16m_div4_cntr <= 1;
end
else
c16m_div4_cntr <= { c16m_div4_cntr[4:0], c16m_div4_cntr[5] };
if (c16m_div8_cntr[7] == 1) c2m <= ~c2m;
c16m_div8_cntr <= { c16m_div8_cntr[6:0], c16m_div8_cntr[7] };
end
end
always @(posedge c8m) begin
if (n_res) begin
// All CPU speed sequential logic goes here.
end
end
always @(posedge c3_7m) begin
if (n_res) begin
// All peripheral speed sequential logic goes here.
end
end
always @(posedge c2m) begin
if (n_res) begin
// Only DRAM operations go here.
end
end
always @(negedge c2m) begin
if (n_res) begin
// Only DRAM operations go here.
end
end
endmodule
/*
Any time we need to request DRAM access, we need to implement a
finite-state machine as follows.
For CPU memory accesses:
1. The *AS signal is asserted. This signals to us that we must
process DRAM on behalf of the CPU, provided that the upper
address bits are within the range of DRAM. We signal our
internal state accordingly.
2. Just before the proper cycle time we schedule PMCYC to be
enabled.
3. We assist in setting the row access strobe from what is not
set from the address multiplexers. We set write-enable if
required, depending on what the R/*W input is.
4. We assist in setting the column access strobe likewise.
4. Once the right DRAM cycle time has transpired, we enable the
EN245 bus switcher to the DRAM and we acknowledge DRAM is
ready to access via *DTACK. We finally terminate once *AS is
no longer asserted.
How do we handle 8-bit writes? Easy, we can use the separate
column access strobes to simply never access the columns we don't
want to mutate.
1. Enqueue a DRAM access request and set our state to waiting for
DRAM.
2. The next clock cycle finds out that a DRAM request is
enqueued, so the DRAM logic goes through its DRAM access
nexus, which again is a finite-state machine,.
1. Check if we must yield to processor memory accesses, to
prevent the BBU from starving the processor of memory
cycles. If so, let the processor go first, and continue
with our use.
2. Set the initial write-enable flag, and send the row access strobe.
3. Send the column access strobe.
4. After DRAM's cycle time, signal that the request has been
completed.
3. Our code operating on its sequential cycle clock will poll the
request status and see that the request has been completed.
Then it will read the DRAM output into its own register and we
will signal to execute the DRAM completion actions.
DRAM refresh? See if we can do this during horizontal trace I
guess.
Important notes, DRAM initialization, you must do at least 8 cycles of
RAS refresh or CAS before RAS refresh before the DRAM is ready to use.
Macintosh Plus and newer DRAM speed is rated at a maximum access time
of 150 ns. So, the 2 MHz clock is quite appropriate for row and
column access strobes. Finally, a note on timing. The important
thing is to make sure there is sufficient delay after asserting RAS
and CAS. You can just have the delay times uniform and the DRAM
readout is immediately accessible after. Oh, and write-enable?
Typically that is asserted before asserting CAS, but after asserting
RAS. However, as I see it, asserting earlier does no harm.
Write down all my questions thus far about the BBU:
* Where the boot-time OVERLAY controlled? I see no input signal
to the BBU, but I'd assume the BBU must be responsible for the
boot-time address overlays.
* Is SNDPG2 support truly eliminated from the Macintosh SE?
* How does the BBU refresh the DRAM? Is this once at the end of
drawing a video frame? Unlike the Apple II, video frame
drawing doesn't automatically refresh the DRAM because it
doesn't access all rows of DRAM.
Is column access strobe required for DRAM refresh, or does using
only row access strobe work just fine? Another way of asking, do we
use CAS before RAS refresh? I'd say we don't make the assumption
for greater flexibility on installed memory options.
* What are the timing requirements for DRAM access? When does
write-enable need to be set to function properly?
* What is the default configuration of the data bus when the CPU
is not requesting access? I'm assuming it is switched to
high-impedance, i.e. all of ROM, RAM, and peripherals are
disabled and not accessing the bus.
* Is the BBU designed to use CBR refresh on the DRAM, or does it use
RAS refresh?
/*
Okay, I think I'm sold on how to implement the RAM accessor circuit.
1. Set an internal register to signal the request, and the origin of
the request (CPU or BBU). We actually have two sets or registers
so both requests can be issued at the same time, and they will be
emptied in priority order, BBU first. This works fine since the
BBU does not tie up DRAM during all horizontal screen cycles,
only 50% of them. Alternatively, instead of setting an internal
register, we can use a wire and condition network to signal the
condition.
I say, wwhen we have questions like this, where we could design
"pipelining" or not, write your code so that we have blocks to
set and check the condition, and the decision whether to set and
read from a register can be programmed in easily.
2. On the C16M clock, check that this register is set. If it is,
set another register to schedule the next set of actions we will
execute.
3. On the falling edge of the C2M clock, as sampled by the C16M
clock, issue the row address. For processor memory cycles, this
means enabling *PMCYC (drop to zero) and setting the bits
directly controlled by the BBU. RAM address lines not set by the
BBU are switched to high-impedance. For BBU memory cycles, all
RAM address lines are set by the BBU, and *PMCYC is not used.
Set a register to schedule the next action.
4. On the very next C16M clock cycle, assert *RAS (drop to zero).
We wait one C16M cycle so that we ensure the edge trigger results
in reading a valid address. In the case of writes, assert
write-enable and the RAM data to write. For processor writes,
this means asserting *EN245 (drop to zero). For BBU writes, this
means setting the RAM data lines. Set a register to schedule the
next action.
5. On the rising edge of the C2M clock, as sampled by the C16M
clock, issue the column address. Same deal as for the row
address except that *PMCYC is already set ahead of time. Set a
register to schedule the next action.
6. On the very next C16M clock cycle, assert *CAS (drop to zero).
Set a register to schedule the next action.
7. After 4 C16M clock cycles, trigger the RAM access completion
conditions. De-assert *PMCYC, *RAS, *CAS, write-enable, *EN245,
RAM address lines, and RAM data lines (if applicable). But do
not de-assert *RAS, *CAS, and *EN245 for processor read memory
cycles. Assert *DTACK (drop to zero) for processor memory
cycles. Capture the RAM data read in an internal register for
BBU memory cycles. In the case of DRAM refreshes, increment the
refresh row address. In the case of BBU memory cycles, we can
immediately check if there are queued memory access requests and
execute the starting actions, or clear the registers to signal
completion. In the case of processor read memory cycles, we set
a register to wait for the event completion signal, namely the
de-assertion of *AS.
NOTE: Another idea. High-speed and low-speed circuit
communication may sound tricky, but this is our saving grace.
Our register latch updates are always assumed to be precise
enough to occur on a 16 MHz clock, it's just that the
intermediate computation actions leading up to that may take
longer. So, we can have both low-speed and high-speed circuits
access the same register, so long as they use combinatorial logic
gates to only ever allow one access at any given time.
For the sake of high-speed circuits, technically we'll set
a register to indicate an increment request and then do the
actual increment on a lower-speed clock. Then... because of the
clock differences, we must use two different register sets to
avoid write conflicts when we want to clear the register.
Message passing between high-speed and low-speed clock circuits.
8. For processor read memory cycles, once *AS is deasserted (raised
to one), de-assert *RAS, *CAS, and *EN245. Immediately check if
there are queued memory access requests and execute the starting
actions, or clear the registers to signal completion.
In practice, we will use a single shift register for the finite
state machine states, shift register for performance since it is
faster than adder and compare circuits.
*/