diff --git a/docs/emulating-a2hssc.txt b/docs/emulating-a2hssc.txt
new file mode 100644
index 0000000..e6c4269
--- /dev/null
+++ b/docs/emulating-a2hssc.txt
@@ -0,0 +1,1925 @@
+  EMULATING THE APPLE HIGH SPEED SCSI CARD: AN EXERCISE IN DIGITAL ARCHAEOLOGY
+
+                                by James Hammons
+
+    ~~==< Brought to you in Glorious 80-Column Monospace-o-Vision(TM) >==~~
+
+
+Motivations
+-----------
+
+While I was reading 4am's Twitter feed one day, he talked about his "Pitch
+Dark" hard drive image, which looked incredibly cool and like something that I
+would very much be interested in. But in reading about it, I came across a
+seemingly throwaway line about how all decent emulators can run them, which,
+sadly, Apple2 could not at the time. And so, in order to save Apple2 from
+indecency (and because I wanted to see if I could get 4am's "Pitch Dark" to
+work because it looked cool and interesting), I set about finding some
+documentation on how hard drives interfaced to Apple IIs--and ran into a
+complete dearth of information. There were little things sprinkled around here
+and there, but nothing of any deep, satisfying, technical significance.
+
+
+In Order To Run A Hard Drive Image, You Must First Create The Universe
+----------------------------------------------------------------------
+
+While it's a nice bit of hyperbole, it's not exactly true that you have to
+first create the Universe, as fortunately, that part has largely been taken
+care of. However, you still have to figure out how to emulate it if you are
+keen on running a hard drive image on your emulator of choice. And in so
+doing, you have to figure out what the requirements are; what the minimal
+pieces are that are required to have a functioning hard drive system; you also
+have to figure out how that system talks to the emulated computer. And that
+all requires information.
+I wasn't asking for much, but something along the lines
+of Jim Sather's "Understanding The Apple IIe" for hard drives would have been a
+nice thing to have.
+
+
+The Next Part, In Which Nice Things To Have Are Not Forthcoming
+---------------------------------------------------------------
+
+Unfortunately, neither Jim Sather nor anybody else, as far as I can tell, ever
+wrote such a document, and so I did what any lazy programmer would do: I took a
+look at some other project's source--in this case, AppleWin's source. I didn't
+really *want* to look at it, having looked at it before and recoiled in horror
+at the sight, but my search-fu, apparently not being up to the task of finding
+relevant information, drove me to it. And looking at it didn't really provide
+any illumination; to me it looked like some kind of hacky thing and I wasn't
+interested in that kind of approach at all--so I abandoned the idea. As I dug
+a little deeper into the minute literature that existed as such on the subject,
+I learned that pretty much any time you wanted to hook up a hard drive to your
+Apple II, you had to use an interface card, and typically that meant some kind
+of SCSI card. And looking here, there was no shortage of SCSI cards that you
+could use to hook up your hard drive therewith.
+
+So, that being a promising looking path to pursue on the road to this
+particular perdition, the question then became, which one should I choose? At
+first I thought the RAMFast card would fit the bill as it seemed to be very
+popular, but there was literally no technical information on the thing. The
+Apple SCSI card looked promising, but then I saw that it "ghosted" a slot,
+meaning that it would have to occupy two consecutive slots in order to work and
+I didn't much care for that.
+And so, after looking at, and rejecting, card
+after card for pretty much the same reason, I settled on the Apple High Speed
+SCSI card for a few reasons--one, it was purportedly fast; two, it worked on
+the Apple IIe (as well as the IIgs, but I didn't really care that much about
+that to be honest); three, it had a user's manual that wasn't completely
+devoid of technical information; four, it had a schematic; and five, it had a
+firmware image. This looked like a promising start--how hard could it be to
+make this work?
+
+
+Things Aren't Exactly Hard, But They Aren't Exactly Soft Either
+---------------------------------------------------------------
+
+One of the necessary things that I didn't have out of all of that was good
+information on how the thing worked. I knew that it was a SCSI card, and I
+knew that it talked to the SCSI bus using an NCR 53C80 chip, but I had no idea
+exactly how. But I did have something that *did* know how to talk to it: the
+firmware for the card.
+
+Now when you take a look at the firmware, the first thing you notice is that
+it's 32K in size--which is *much* larger than the typical 256 bytes that you
+encounter when looking at Apple II card drivers. It also happens to be quite a
+bit larger than the 2K "bonus" space that Apple II cards have available to them
+in the $C800 to $CFFF address space. So what gives?
+
+Fortunately for me, Apple2 has a built-in disassembler (which will probably
+stay in for all time, as it turns out to be a very useful thing to have on
+hand), and so I split that out into a stand-alone command line driven program,
+called d65c02, in order to be able to disassemble such things as device driver
+firmware blobs. It isn't fancy, it doesn't do any analysis on what is code and
+what is data, but it gets the job done in turning incomprehensible binary
+gibberish (except to certain mad geniuses who will go heretofore unnamed) into
+human readable ASCII gibberish.
+Thus I used said tool to disassemble the firmware blob.
+
+Pulling up the results in my text editor, I could see that at least the front
+of the listing looked like it could plausibly be code that would go into the
+usual 256 byte card slot address space of $Cx00 to $CxFF, where x ranges from 1
+to 7 depending on the slot number. Looking further, I could see this first 256
+bytes of code was repeated three times, meaning that this was a good candidate
+for the slot device code. I could also see that it was written as relocatable
+code, and it contained this little tidbit:
+
+001B: A9 60     LDA #$60    ; Stuff an RTS into RAM somewhere
+001D: 8D F8 07  STA $07F8
+0020: 20 F8 07  JSR $07F8   ; Jump there and return in order to get evidence
+                            ; of where in memory we did it from
+0023: BA        TSX         ; Retrieve the stack pointer
+0024: BD 00 01  LDA $0100,X ; Get the hi byte of the address we just pushed on
+                            ; the stack in order to come back here
+0027: 8D F8 07  STA $07F8   ; & save it for later perusal
+
+which meant that it was an excellent candidate for the slot device code. But
+why should that be?
+
+
+A Short Digression Into Why Slot Code Must Be Relocatable
+---------------------------------------------------------
+
+Slot code must be relocatable because such a card may be installed into any
+given slot in an Apple II--which means its code will show up anywhere from
+$C100 to $C700 (it always shows up on a page boundary). By virtue of this, it
+also means that the I/O address for the card will also show up in the
+corresponding $C090 to $C0F0 address range (it always shows up on a 16-byte
+boundary). And so, because of this, you have to write your slot code in such a
+way that it will work regardless of which slot it's installed in, which means
+the code must be relocatable--which ultimately means you can't use any JMP
+instructions to addresses in your driver, and you can't use absolute addressing
+to refer to stuff in the slot address space.
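
To make that slot arithmetic concrete, here is a minimal Python sketch. It only
illustrates the address ranges described above; it is not code from the card's
firmware or from Apple2, and the function names are made up for the example.

```python
def slot_rom_base(slot):
    """Start of the 256-byte slot ROM page ($Cn00) for slot n (1-7)."""
    if not 1 <= slot <= 7:
        raise ValueError("slot must be in the range 1..7")
    return 0xC000 + (slot << 8)       # always on a page boundary


def slot_io_base(slot):
    """Start of the 16-byte slot I/O range ($C080 + n * 16) for slot n."""
    if not 1 <= slot <= 7:
        raise ValueError("slot must be in the range 1..7")
    return 0xC080 + (slot << 4)       # always on a 16-byte boundary


def slot_from_page(hi_byte):
    """Recover the slot number from the $Cn hi byte found on the stack."""
    return hi_byte & 0x0F
```

For instance, slot_from_page(0xC7) gives 7, and slot_io_base(7) then gives
$C0F0, the top of the range quoted above.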
+
+So, using the stack trick shown earlier, a clever coder can figure out what
+slot their code is executing in and they can then use that knowledge to figure
+out which is the proper I/O range to use for the card. All this being
+necessary in order to make a seamless experience for the end user of the card.
+
+
+The Next Part, In Which 32K Is Still Larger Than 256
+----------------------------------------------------
+
+So, in looking at the code that comes after the Code Which Looks Like It
+Belongs In Slot Memory (which makes the wonderful acronym CWLLIBISM), I noticed
+that it seemed to be organized in 1K chunks. And further perusal of said
+chunks made it seem very likely that they resided in the $CC00 to $CFFF memory
+space. However, the "extra" memory space given to cards to use starts 1K
+earlier--at $C800. What could this mean?
+
+Well, in looking at the schematic for the card, one not only finds the 32K ROM
+chip, but also an 8K static RAM. Which means that it's very likely that the
+address space from $C800 to $CBFF is mapped to that 8K static RAM. But 8K is
+larger than 1K; how does that work?
+
+As it turns out, it's bank switched, but I didn't know it at the time--we'll
+get to that eventually. In the meantime, with further perusal of the code (the
+code gets perused quite a bit), it seems very likely that the 1K address range
+from $C800 to $CBFF is said RAM as that range is written to by the 1K code
+chunks quite frequently.
+
+Finding that the code in the firmware is divvied up into 1K chunks would seem
+to imply that it's bank switched into the $CC00 to $CFFF range.
+And in looking at the CWLLIBISM, we see the following:
+
+005C: A9 0B     LDA #$0B    ; Get 11 in the accumulator
+005E: AE 08 C8  LDX $C808   ; Get offset to proper I/O space in X
+0061: 5A        PHY         ; Save Y on the stack for later
+0062: A8        TAY         ; Copy the accumulator to Y
+0063: 29 1F     AND #$1F    ; Strip off the upper three bits
+0065: 9D 6E C0  STA $C06E,X ; & write to card I/O location $E
+
+which implies it heavily. Taking the number put into the accumulator and then
+masking it down to the lower 5 bits creates a range that goes from 0 to 31,
+which is 32 distinct values, which corresponds to 32 1K chunks of code.
+
+The above code, which is part of the initialization of the card, heavily
+implies that it's selecting a 1K chunk of code from bank 11 (counting from
+zero, naturally) to put into the $CC00 to $CFFF address range. And so we get
+to(*) look there for a start.
+
+(*) While changing 'have to' to 'get to' can make life awesome in many ways,
+this is far from a universal truth. 'Getting to' have one's arm amputated is
+never, ever awesome.
+
+
+The Next Part, In Which We Sadly Bid Adieu To CWLLIBISM
+-------------------------------------------------------
+
+But before we do that, in order to understand what's going on in those wicked
+little 1K chunks of code, we should first take a closer look at CWLLIBISM. So
+let's jump in:
+
+0000: A2 20     LDX #$20    ; The bytes after the LDX # identify this card as
+0002: A2 00     LDX #$00    ; being capable of SmartPort calls, and the $82 at
+0004: A2 03     LDX #$03    ; $FB further identifies it as a SCSI card ($2)
+0006: A2 00     LDX #$00    ; that supports extended calls ($8).
+
+The way that I was able to find out that this seemingly useless bit of code was
+a way of identifying SmartPort capable cards was in the serendipitous find of
+the "Technical Manual for the Apple SCSI Card"(*), which, while helpful in some
+ways, was almost completely useless in trying to figure out what the card I/O
+addresses did.
+
+(*) No relation to the Apple High Speed SCSI Card
+
+0008: 2C 58 FF  BIT $FF58   ; Check byte in ROM (usually, an RTS lives here)
+000B: 70 05     BVS $0012   ; Bit 6 set? >> $12 (which means, this branch
+                            ; will be taken...)
+
+This little tidbit checks a ROM location that usually carries an RTS (at least
+it does in the Apple IIe), which is $60. Which means that the following BVS
+will always be taken and skip over the following:
+
+000D: 38        SEC         ; ProDOS entry point
+000E: B0 01     BCS $0011   ; Branch over the following CLC
+0010: 18        CLC         ; SmartPort DISPATCH
+0011: B8        CLV         ; Signal we're doing normal I/O, not init code
+
+So this clever little bit here, according to the "Technical Manual for the
+Apple SCSI Card", sets some flags so that later on in the firmware, it can
+discern whether it's being called from ProDOS (in which the carry flag will be
+set) or if it's a SmartPort call (in which the carry flag will be clear).
+Either way, the overflow flag is cleared to let the firmware know that this is
+a request to talk to the drive, and not initialization. Initialization skips
+over this code and ends up here:
+
+0012: D8        CLD         ; Clear the decimal flag, to prevent bad math
+0013: 08        PHP         ; Save the carry & overflow flags for later
+0014: 78        SEI         ; Turn IRQs off
+0015: AD FF CF  LDA $CFFF   ; Turn INTC8ROM off (puts card in $C800-CFFF)
+0018: 8D 00 CC  STA $CC00   ; ???
+
+This bit of code is a bit of housekeeping; making sure the decimal flag isn't
+set so that ADC & SBC both work as expected, saving the flags register so that
+the firmware code later can determine whether it's an initialization call or a
+regular I/O call, making sure that IRQs don't happen while in the firmware
+code, and turning on the "extra" addresses in the $C800 to $CFFF range.
+
+The store to $CC00 is mysterious, as it's a ROM location and stores to ROM
+locations are usually void and of null effect.
+This likely means that it's some kind of soft-switch that controls something in
+the card, but exactly what would require a few things that I don't have,
+namely: the contents of the two PALs on the card (which sit between the address
+lines of the slot and the rest of the card), and a description of what the
+ports on the Sandwich II do (the chip that sits between the Apple IIe proper
+and the NCR 53C80). So, moving right along:
+
+001B: A9 60     LDA #$60    ; See where we're executing from
+001D: 8D F8 07  STA $07F8
+0020: 20 F8 07  JSR $07F8
+0023: BA        TSX
+0024: BD 00 01  LDA $0100,X ; Get the address we just pushed on the stack
+0027: 8D F8 07  STA $07F8   ; Save it
+
+We've seen this already, this is the code that determines which slot it's
+sitting in. Say, for example, that it's sitting in slot 7; the byte that it
+will retrieve from the stack will be $C7 (for the sake of completeness, the lo
+byte will be $22--as to why, this is left as an exercise for the reader). In
+order to turn that into something that it can use to hit the proper slot I/O
+addresses, it does the following:
+
+002A: 29 0F     AND #$0F    ; Get the lo nybble
+002C: 0A        ASL A       ; Multiply it x16
+002D: 0A        ASL A
+002E: 0A        ASL A
+002F: 0A        ASL A
+0030: 18        CLC
+0031: 69 20     ADC #$20    ; Add $20 to it for some reason
+0033: AA        TAX         ; & stick in the X register
+
+The important part of the $C7 hi byte of the address we found through
+cleverness and trickery is the slot number, which will always fall in the lower
+4 bits. And, in order to be useful to find the correct slot I/O address range,
+that slot number needs to be multiplied by 16, as each of the slot I/O address
+ranges cover exactly sixteen bytes. Note that masking the value down to its
+bottom 4 bits, as is done with the AND #$0F instruction, is unnecessary as the
+four ASL A instructions after it will necessarily shift the top four bits out
+of the picture.
+
+The one thing that stands out as not typical of this kind of device driver code
+is the adding of $20 to the index.
+Typically, writers of this kind of I/O code
+will use $C080 to $C08F (plus the contents of the X register to reach the
+correct slot I/O range) as the base address for slot I/O, but, for some reason,
+the writers of this card's firmware chose to use $C060 to $C06F, thus
+necessitating the addition of $20 to the value in the X register to reach the
+correct range for slot I/O.
+
+0034: A9 00     LDA #$00    ;
+0036: 9D 6E C0  STA $C06E,X ; Select bank #0 (register $E, lower 5 bits)
+0039: A9 0F     LDA #$0F
+003B: 9D 6F C0  STA $C06F,X ; Store a $F in register $F
+003E: 8E 08 C8  STX $C808   ; Put slot # at $C808 (banked RAM in $C800-CBFF)
+0041: 9C 09 C8  STZ $C809   ; Put zero at $C809
+0044: 9C F2 C8  STZ $C8F2   ; & $C8F2
+
+One thing I forgot to mention is that the Apple High Speed SCSI card is only
+usable by enhanced Apple IIe and IIgs machines, and that's because it relies on
+instructions only found in the 65C02 like STZ and PHY; a regular 6502 will not
+even remotely do the same things that those instructions do on the 65C02--so
+they're right out.
+
+At any rate, the above code does some writing to the slot I/O address range and
+sets up some values in the card's static RAM, including saving the contents of
+the X register for later.
+
+0047: A2 22     LDX #$22    ; Transfer 35 bytes from ZP ($40) to $C82D
+0049: B5 40     LDA $40,X
+004B: 9D 2D C8  STA $C82D,X
+004E: CA        DEX
+004F: 10 F8     BPL $0049
+
+This bit of code transfers 35 bytes in page zero RAM to the card's static RAM,
+presumably to restore them later.
+
+0051: AD F8 07  LDA $07F8   ; Get original $Cx byte again
+0054: 8D 01 C8  STA $C801   ; Put it in $C801
+0057: A9 61     LDA #$61    ;
+0059: 8D 00 C8  STA $C800   ; Put $61 in $C800 (= $Cx61)
+005C: A9 0B     LDA #$0B
+005E: AE 08 C8  LDX $C808   ; Get X from $C808
+
+This little bit of code sets up for the code that comes below; it sets up
+locations $C800-1 as a location for an indirect jump that seems to happen a lot
+in the 1K chunks that come later.
+The address it sets up as the jump target is the code that comes next:
+
+0061: 5A        PHY         ; Save Y (follow on bank, passed in by caller)
+0062: A8        TAY         ; Save A register
+0063: 29 1F     AND #$1F    ; Keep the lower 5 bits
+0065: 9D 6E C0  STA $C06E,X ; First time, select bank 11:0 (I/O register $E)
+0068: 98        TYA         ; Restore the A register
+0069: 29 E0     AND #$E0    ; Keep the upper 3 bits
+006B: 4A        LSR A       ; & shift them down
+006C: 4A        LSR A
+006D: 4A        LSR A
+006E: 4A        LSR A
+006F: A8        TAY         ; Use as an index into a table (Y x 2)
+
+What this does is save the Y register on the stack, then separates the
+accumulator into an upper 3-bit part and a lower 5-bit part. The lower 5 bits
+go into I/O slot register $E, which presumably selects which 1K chunk of code
+will appear in the $CC00 to $CFFF address range while the upper 3 bits are used
+as an index into a table that appears near the end of each 1K chunk:
+
+0070: B9 F0 CF  LDA $CFF0,Y ; Get address of current 1K bank
+0073: 85 54     STA $54     ; & stuff it into $54/55
+0075: B9 F1 CF  LDA $CFF1,Y
+0078: 85 55     STA $55
+
+So it uses the Y register as index into the current selected bank's $CFF0
+address range and stuffs them into $54 and $55, so that it can jump to the
+address at some point.
+
+007A: AD F8 07  LDA $07F8   ; Get original $Cx byte again
+007D: A8        TAY         ; Put it in Y
+007E: 48        PHA         ; Put it to the stack
+007F: A9 86     LDA #$86
+0081: 48        PHA         ; Push $86: return address is now $Cx87
+
+What this does is set up the stack for what I'm going to name (for lack of a
+better term, or any at all to be honest) an "RTS call". This takes advantage
+of how the CPU uses the stack to return execution to the instruction after a
+JSR instruction: when the CPU encounters a JSR opcode, it pushes the
+location of the program counter, plus two, onto the stack before loading the
+program counter with the address that comes after the JSR.
+When an RTS opcode
+is then encountered, it restores the program counter from the stack and adds
+one to it before resuming execution.
+
+The upshot of this is that you can transfer execution of a program from one
+place to the next, without using JMP, JSR or branch instructions by simulating
+this behavior--which also turns out to be a necessity when you're writing
+relocatable code. So what the above code does is set up the stack so that it
+will jump to location $Cx87 when it encounters an RTS.
+
+0082: 5A        PHY         ; Push $Cx
+0083: A9 8B     LDA #$8B    ; Push $8B: return address is now $Cx8C
+0085: 48        PHA
+
+Similarly, this code sets up the stack so it will jump to $Cx8C when it
+encounters an RTS as well. So it will go there first, then to $Cx87 second
+when the routine first called via RTS call, er, uh, returns.
+
+0086: 60        RTS         ; First time, will "return" to $Cx8C
+
+Thus, this first RTS transfers control to the JMP ($0054) down below, which was
+set up above as an address somewhere in a 1K code chunk. Since the code that
+goes into the 1K code chunk is a JMP instruction, once that code returns, it
+will then find the address that was pushed on the stack earlier, and execute
+the following code:
+
+0087: 68        PLA         ; After the $CCxx block is done, it comes here
+0088: 9D 6E C0  STA $C06E,X ; Restore last block (one passed in Y reg)
+008B: 60        RTS         ; & return to calling code in that block
+
+This code pops the Y register that was saved way back up at location $Cx61 and
+uses it to set the I/O register at $E, which, presumably, is the bank switch
+I/O address for the card. This will turn out to be of vital importance later,
+but we'll leave it for now. The RTS, finally, returns from initialization and
+back from whence it came.
+
+008C: 6C 54 00  JMP ($0054) ; Jump to the $CCxx block code
+
+This indirect JMP instruction, called up above via RTS call, kicks things off.
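
The "RTS call" and the way the dispatcher splits the accumulator can be modeled
in a few lines of Python. This is a rough sketch for illustration only: a plain
list stands in for the 65C02 stack, the function names are hypothetical, and
none of it is code from the firmware or from Apple2.

```python
def push_rts_target(stack, target):
    """Push an address so that a later RTS lands on `target`.

    RTS adds one to the address it pulls, so we push target - 1,
    hi byte first and lo byte second, the same order JSR uses.
    """
    addr = (target - 1) & 0xFFFF
    stack.append(addr >> 8)
    stack.append(addr & 0xFF)


def rts(stack):
    """Pull lo then hi from the stack and return the resume address."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def split_dispatch_byte(a):
    """Split the accumulator the way the code at $Cx61 does.

    Returns (bank, table_offset): the lower 5 bits pick the 1K bank,
    and the upper 3 bits, shifted right four times, give a byte offset
    (function number x 2) into the table at $CFF0.
    """
    return a & 0x1F, (a & 0xE0) >> 4
```

With the card in slot 7, pushing $C787 and then $C78C leaves $C7, $86, $C7, $8B
on the stack, so the first rts() resumes at $C78C (the indirect JMP) and the
second at $C787 (the bank restore). Likewise, split_dispatch_byte(0x4B) yields
bank 11 and table offset 4, i.e. the third entry of that bank's table.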
+
+008F-00FA: 00               ; $6C worth of zeroes
+00FB: 82 00 00 BF 0D        ; ID/offset bytes
+
+So these bytes that look like a bit of detritus actually do serve a useful
+function in ProDOS. The $0D at the very end serves as an offset from the
+beginning of the code to the ProDOS entry point, which in this case works out
+to $Cx0D. It also serves as the entry point for SmartPort calls (by adding 3
+to it), which works out to $Cx10.
+
+Further, the "Technical Manual for the Apple SCSI Card" says the following
+about the byte at $FB: "An additional byte, at $CnFB, should contain $82,
+indicating that the device is the SCSI card ($2) and that it supports extended
+calls ($8)." This just happens to be one of a small handful of those
+aforementioned tiny bits of useful information that I was able to glean from
+that source.
+
+And so, at last, we come to the realization that this is definitely the slot
+ROM code, and thus CWLLIBISM becomes CWSISM (Code Which Sits In Slot Memory).
+
+
+And Now For Something Not Quite So Completely Different
+-------------------------------------------------------
+
+And with that digression into CWSISM, we turn our attention back to the 1K
+chunk of initialization code that sits in bank 11. In looking at the table
+that we discovered sits at $CFF0, we find the following in the 11th (counting
+from zero) 1K chunk:
+
+CFF0: 00 CC
+CFF2: 91 CE
+CFF4: 9A CD
+CFF6: 00 00 00 00 00 00 00 00 00 00
+
+This tells us that there are only three valid addresses in the table (as the
+zeroes will take you nowhere), and that further, they are $CC00, $CE91 and
+$CD9A. And since the CWSISM set up the $Cx61 dispatch call with $0B (at
+$Cx5C), it will pick the zeroth address in that list, namely, $CC00.
+So, looking at the code that lies there, what we see looks promising:
+
+CC00: 68        PLA         ; Discard the 2nd return path (bank switch back)
+CC01: 68        PLA
+CC02: 68        PLA         ; Discard the follow on bank #, as there is none
+
+Since this is initialization code, we can discard the RTS call from the stack
+since we aren't calling this code from another bank. Which also means that we
+can discard that parameter which tells the RTS call what bank to select before
+returning.
+
+CC03: 86 5E     STX $5E     ; Save slot # (+$20) in $5E
+CC05: 9C 93 C8  STZ $C893   ; Zero out $C893 & $5D
+CC08: 64 5D     STZ $5D
+CC0A: 20 C1 CC  JSR $CCC1   ; Test for GS hardware + DMA switch
+
+This is basically housekeeping, and the routine called at $CCC1 tests if the
+card is running on an Apple IIgs and sets bit 6 of zero page location $5D if it
+detects that. It also checks the physical DMA on/off switch on the card as
+well; if it's set, it sets bit 5 of $5D. The following bit of code checks $5D
+to see if bit 6 is clear and skips the instructions at $CC11 to $CC19 if
+so--and since I'm emulating an Enhanced Apple IIe, it *will* skip those
+instructions:
+
+CC0D: 24 5D     BIT $5D     ; Check if bit 6 of $5D is set (means it's a GS)
+CC0F: 50 0B     BVC $CC1C   ; Skip over if not set (it's not a IIgs)
+CC11: AD 36 C0  LDA $C036   ; IIgs Speed Reg.
+CC14: 8D 96 C8  STA $C896   ; Save it for later...
+CC17: 09 80     ORA #$80    ; Set speed to 2.8 MHz
+CC19: 8D 36 C0  STA $C036   ; & modify
+
+Luckily there exists a very good technical reference manual for the Apple
+IIgs; unluckily, it's a bit hard to track down. But once you do, the
+information in it is quite good. The above bit of code shows that the card
+firmware shifts the IIgs into high gear while running on the card. However, we
+don't really care about that bit of code; which is why we spent so much time
+explaining what it does.
+
+CC1C: 68        PLA         ; Get flags from slot init
+
+Way back in CWSISM, at slot location $Cx13, there was an innocuous looking PHP
+instruction; here is where we finally take a look at the contents of it.
+
+CC1D: A8        TAY         ; Save them in Y
+CC1E: 29 04     AND #$04    ; Check if I flag is set
+CC20: F0 05     BEQ $CC27   ; Skip if I is not set
+CC22: A9 80     LDA #$80    ; Else, signal I flag is set ($80 -> $C893)
+CC24: 8D 93 C8  STA $C893
+
+Here we look at the interrupt disable bit in the processor flags that we saved
+earlier; if it's not set we skip on over to the next bit of code below.
+Otherwise, the code sets $80 into memory location $C893 to signal that
+initialization code was called with the I flag set.
+
+CC27: 98        TYA         ; Restore flags from Y
+CC28: 09 04     ORA #$04    ; Set I flag
+CC2A: 48        PHA         ; Push them to the stack
+CC2B: 28        PLP         ; & restore flags for real
+
+Since we need to get the values of the overflow and carry flags back, which
+were set way back in CWSISM at addresses $Cx0D through $Cx11, we have to
+retrieve them from the Y register, then push them onto the stack and then use a
+PLP to get them back into the flags register proper. Along the way, we set the
+interrupt disable flag at $CC28 (the ORA #$04 instruction).
+
+And in looking at code as we're doing here, it's hard not to look at it with a
+critical eye and notice that the coder could have saved a byte by deleting the
+ORA #$04 (which takes two bytes) and putting an SEI after the PLP (which takes
+one byte). And, since we don't have any source code to look at, we may never
+know what the intention was; though it's quite likely that this was just a
+simple oversight.
+
+CC2C: 50 09     BVC $CC37   ; If SmartPort call, skip over
+
+Here we see that if the card firmware was called via the SmartPort vector at
+$Cx10, the overflow flag would be clear and we would skip over the following.
+But, since the flag was definitely set, we know that we will execute what
+follows:
+
+CC2E: BA        TSX         ; Slot init & regular ProDOS dispatch get here
+CC2F: 8E 07 C8  STX $C807   ; Save stack pointer in $C807
+CC32: A9 0F     LDA #$0F
+CC34: 4C 5F CF  JMP $CF5F   ; Jump to bank 15:0 for rest of init
+
+This saves the stack pointer and sets up to jump to a new bank, which means we
+won't be coming back here. Onward:
+
+CF5F: A6 5E     LDX $5E     ; Restore slot # (+$20) in X
+CF61: A0 0B     LDY #$0B    ; Y gets loaded with bank to return to on RTS
+CF63: 6C 00 C8  JMP ($C800) ; & go!
+
+There are variants of this piece of code throughout every 1K bank of firmware
+code. And since we took a good long look at CWSISM, we know that CWSISM set up
+location $C800 and $C801 to point to the card slot I/O location of $Cx61, and
+suddenly it becomes clear what that bit of code does.
+
+Since the firmware code bounces around a lot in different banks (as we will
+discover shortly), it needs a mechanism to get back to the place that called it
+in the first place. The problem is this: once a new 1K bank of code is
+switched into the $CC00 to $CFFF address space, there's no way for the 65C02 to
+get back to the caller with a simple RTS; any code that attempted to do so
+would end up executing the wrong code as the 65C02 knows nothing about bank
+switching and has no built-in mechanism to handle such things.
+
+And so, by virtue of this, the code needs a way to do this manually. Which is
+why the $Cx61 code in CWSISM saves the bank number on the stack, and then sets
+up a pair of RTS calls which first, sets the correct bank and calls the correct
+function number in that bank and second, sets the bank to the bank that made
+the call in the first place before executing a final RTS which then goes back
+to the correct address.
+
+And since we saw up above that it passed $0F into the calling routine (well,
+actually, it jumped there), we know that it's going to call function #0 in bank
+15.
+As it turns out, the function table for bank 15 looks like this:
+
+CFF0: 00 CC
+CFF2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+
+which means bank 15 only contains one function, and it starts at $CC00.
+
+
+The Next Part, In Which We Peruse Bank 15
+-----------------------------------------
+
+The story so far: we started in slot ROM, set up a bunch of variables, then
+bounced to bank 11, and just now bounced to bank 15.
+
+CC00: A9 40     LDA #$40
+CC02: 8D 09 C8  STA $C809   ; Put $40 into $C809
+CC05: 8D 32 BF  STA $BF32   ; & $BF32(!)
+CC08: 9C 0A C8  STZ $C80A   ; Zero out $C80A
+
+So far this is all normal housekeeping boilerplate, though putting the value
+$40 into RAM at address $BF32 makes me raise an eyebrow (to this day, I still
+have no idea what that's supposed to do). So then we come to the heart of the
+matter:
+
+CC0B: A9 03     LDA #$03
+CC0D: 20 AF CF  JSR $CFAF   ; Call bank 3:0 (enumerate all connected drives)
+
+Here is the first proper JSR into bank switched code, and in taking a cursory
+glance at the code there, well... It's a bit of a Gordian knot. So we'll
+ignore the stones in the field for now, and keep on plowing ahead:
+
+CC10: AE 08 C8  LDX $C808   ; Restore slot # (+$20) to X
+CC13: A5 4F     LDA $4F
+CC15: F0 03     BEQ $CC1A   ; Skip over if call was successful ($4F == 0)
+CC17: 4C F0 CC  JMP $CCF0   ; Else, do a LDA #$2B, JMP $CFAF to bank 11:1
+
+So here the code retrieves the slot I/O offset in X from the location set way
+back in CWSISM, then checks what looks like some kind of error condition. If
+it fails, it skips on over to function 1 in bank 11; otherwise, it keeps going
+here:
+
+CC1A: 24 5D     BIT $5D     ; Are we running on a IIgs?
+CC1C: 70 05     BVS $CC23   ; If so, skip over & keep going
+
+Since we're not running on a IIgs, this branch is not taken and thus it can be
+safely ignored.
+Continuing on:
+
+CC1E: A9 4B     LDA #$4B    ; Else, jump to bank 11:2 (normal success path)
+CC20: 4C AF CF  JMP $CFAF
+;
+CFAF: A6 5E     LDX $5E     ; Restore slot (+$20) in X
+CFB1: A0 0F     LDY #$0F    ; Make sure we come back here...
+CFB3: 6C 00 C8  JMP ($C800) ; & go!!
+
+So what this means is that if the function call to bank 3:0 succeeded, the code
+will then bounce to function 2 in bank 11. And, as we saw above, function 2
+starts at $CD9A in bank 11.
+
+
+The Next Part, In Which We Bounce Back To Bank 11 And Find Something Familiar
+-----------------------------------------------------------------------------
+
+So far, this little expedition is proving to be circuitous, but not
+impenetrable. And it makes sense that we would come back to bank 11, as that's
+where the initialization code sent us in the first place. And so, pressing on,
+we find:
+
+CD9A: 86 5E     STX $5E     ; Save X in $5E
+CD9C: A9 01     LDA #$01    ; Put 1 in $43, $44
+CD9E: 85 43     STA $43
+CDA0: 85 44     STA $44
+CDA2: 64 46     STZ $46     ; Zero out $46, $47, $48, $49
+CDA4: 64 47     STZ $47
+CDA6: 64 48     STZ $48
+CDA8: 64 49     STZ $49
+CDAA: A9 08     LDA #$08    ; Put $08 in $41
+CDAC: 85 41     STA $41
+CDAE: 64 40     STZ $40     ; Zero out $40, $42
+CDB0: 64 42     STZ $42
+
+This is again more housekeeping boilerplate, initializing a bunch of zero page
+locations. Then we find this:
+
+CDB2: A9 09     LDA #$09
+CDB4: 20 5F CF  JSR $CF5F   ; Call bank 9:0 (directly)
+
+So this calls function 0 in bank 9, which lives at $CC00. And looking through
+that code, well, let's just put that aside for now as it's long and involved
+and will require a fair amount of study. Continuing:
+
+CDB7: A5 4F     LDA $4F
+CDB9: D0 0C     BNE $CDC7   ; Fail if $4F is non-zero
+
+This looks at the error flag we saw up above in bank 15, and jumps to function
+1 in this bank if the error flag is non-zero.
+
+CDBB: AD 01 08  LDA $0801   ; Get byte @ $801 (!)
+CDBE: F0 07     BEQ $CDC7   ; Fail if it's zero
+
+Now here is something interesting!
+Why this is interesting is because when
+booting from a floppy disk, the disk driver typically loads at least one sector
+(256 bytes of data) into location $800. So we can deduce that the above call
+into function 0 in bank 9 is loading something similar from the hard drive into
+memory at a similar address. With this bit of knowledge, we can see up above
+where it puts address $800 into zero page locations $40 and $41 that those
+locations must be a loading address.
+
+CDC0: AD 00 08  LDA $0800   ; Get byte @ $800 (!)
+CDC3: C9 01     CMP #$01
+CDC5: F0 03     BEQ $CDCA   ; Keep going if it's equal to 1
+CDC7: 4C 91 CE  JMP $CE91   ; Else, jump to function 1 (failure point)
+
+Again, this is interesting because with floppy disks, the first byte of the
+first sector loaded into memory at $800 contains the number of sectors that the
+floppy driver should load into memory; this looks eerily similar--only in this
+case, it will jump to the failure path if it sees it wanting more than one
+block. Assuming all is well, we then have this:
+
+CDCA: 8D 09 C8  STA $C809   ; Put a 1 into $C809
+CDCD: AD F8 07  LDA $07F8   ; Get $7F8
+CDD0: 0A        ASL A       ; x16
+CDD1: 0A        ASL A
+CDD2: 0A        ASL A
+CDD3: 0A        ASL A
+CDD4: AA        TAX         ; Store it in X
+CDD5: A9 00     LDA #$00    ; Stuff 0 in $C035 (GS location?)
+CDD7: 8D 35 C0  STA $C035
+CDDA: 8D 01 CC  STA $CC01   ; What does this do?
+CDDD: 4C 01 08  JMP $0801   ; Run the code from block 0
+
+And here we see it hand off execution to data that it pulled from the hard
+drive by jumping to $801, and thus we see that this must be the end of the hard
+drive boot logic. As far as the firmware is concerned, its initialization job
+of bootstrapping the hard drive is concluded.
+
+However, we still really don't know anything that tells us what the slot I/O
+addresses do (aside from location $E) and we still have no idea how the card
+talks to the hard drive. At least we have a pretty good idea of where to look.
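
The three checks the firmware makes before handing control to block 0 can be
summarized in a short Python sketch. It is an illustration of the logic traced
above, not the firmware itself: the function name and the bytearray standing in
for Apple II memory are assumptions made for the example.

```python
BOOT_ENTRY = 0x0801      # where the firmware jumps on success (JMP $0801)
FAILURE_ENTRY = 0xCE91   # function 1 in bank 11, the failure path


def boot_decision(error_flag, memory):
    """Return the address the code at $CDB7-$CDDD ends up jumping to.

    error_flag models zero page location $4F after the block read;
    memory is any indexable object covering $0800 and $0801.
    """
    if error_flag != 0:            # BNE $CDC7: the read reported an error
        return FAILURE_ENTRY
    if memory[0x0801] == 0:        # BEQ $CDC7: nothing to execute at $801
        return FAILURE_ENTRY
    if memory[0x0800] != 1:        # CMP #$01: first byte must be exactly 1
        return FAILURE_ENTRY
    return BOOT_ENTRY
```

A 64K bytearray with a 1 at $800 and any non-zero byte at $801 returns $0801;
clearing either condition, or passing a non-zero error flag, returns $CE91.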
+ + +What Are All These Eels, And What Are They Doing In My Hovercraft +----------------------------------------------------------------- + +So at last we get to take a look at function 0 in bank 3. And, much like a +hovercraft full of eels, it's a twisty mass of slippery, squirming code. And, +looking at it more closely, it does a bunch of things which don't make much +sense until you understand other code, which bounces around to lots of other +banks. And a lot of it is opaque unless you somewhat understand what the ports +on the NCR 53C80 do and how the SCSI protocol works. + +So while we have an excellent start on understanding, for the most part, the +broad outlines of how the card works, we are still stuck with a profound lack +of critical knowledge on how the thing talks to the hard drive and, +conversely, how the hard drive talks to the card. And without that knowledge, +we perish. + + +The Next Part, In Which We Are Not Ready To Perish +-------------------------------------------------- + +Fortunately, the NCR 5380 and, by extension, the 53C80 is well documented and +said documentation is readily available, and so I availed myself of it. I took +another look at the schematic for the card and noticed that the 53C80 had three +address lines on it, which implied that it had eight ports for controlling it. +Unfortunately, there's an error on the schematic in which they have the address +lines hooked up in reverse, and this caused me no small amount of consternation. + +It seemed obvious that those eight ports were hooked up to the slot I/O +addresses, and also seemed very plausible, after having looked at and analyzed +a lot of code heretofore unmentioned, that it was connected to the lower half +of that address space. So, in order to confirm my suspicions, I started +writing the hard drive emulator.
+ +This started out, simply, as a bunch of statements that output human readable +words to a log file whenever the slot I/O addresses were accessed by the card +firmware; I used the firmware's access to the slot I/O to tell me what it said +and what it was listening for. Well, that, and some code to properly handle +the bank selection of the ROM space as well. In this way, I was able to +enlarge my understanding of what the card expected to see as well as what the +ports that weren't connected to the 53C80 (which were likely connected to the +Sandwich II) might be up to. + +So in fits and starts, I used the code that writes to the Mode Register of the +53C80 to get the code to successfully... do something. It was at that point I +could see that it was getting through the initialization phase of the card's +firmware as Apple2 would be able to boot a floppy image inserted into a drive +in slot 6 at that point. But in tracing the reads and writes to the slot I/O +address space in the log I could see that it was getting through the card's +firmware in a failure mode. It was progress, of a sort. Even failure tells +you something. + +And what it told me was that I needed to dig into the SCSI specification to +figure out how the protocol worked. Looking back I can see that I was getting +through to the MESSAGE phase and, because of the way I was responding to that +message, that the firmware would then send an ABORT message, but that's all +pretty much meaningless as I haven't explained anything about the SCSI protocol +and how it works. + +And here, while there is a lot of information about the latter day iterations +of the SCSI protocol, there wasn't much pertaining to the kind of SCSI that the +Apple High Speed SCSI card spoke, which in its case, has been retroactively +labeled SCSI-1. 
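
The logging trick described a few paragraphs back is simple enough to sketch.
The Python below is only an illustration of the idea, not the code that
actually went into Apple2: it assumes the standard Apple II layout where slot
n's sixteen I/O locations start at $C080 + n * 16, and that the lower eight of
them are the 53C80's ports (the suspicion being confirmed above). The function
names and the log format are invented for this example; the port names are the
ones from the NCR datasheet.

```python
SLOT_IO_BASE = 0xC080  # slot n's I/O space starts at $C080 + n * 16

# Names for the lower eight locations; the upper eight are not 53C80 ports.
PORT_NAMES = {
    0: "Data",
    1: "Initiator Command",
    2: "Mode",
    3: "Target Command",
    4: "Bus Status/Select Enable",
    5: "Bus and Status/Start DMA Send",
    6: "Input Data/Start DMA Target Receive",
    7: "Reset Parity-Interrupt/Start DMA Initiator Receive",
}


def decode_slot_io(address: int) -> tuple:
    """Split a slot I/O address into (slot, register)."""
    if not 0xC080 <= address <= 0xC0FF:
        raise ValueError("not a slot I/O address")
    offset = address - SLOT_IO_BASE
    return offset >> 4, offset & 0x0F


def describe_access(address: int, value: int, is_write: bool) -> str:
    """Build one human-readable log line for a slot I/O access."""
    slot, reg = decode_slot_io(address)
    name = PORT_NAMES.get(reg, "non-53C80 port")
    verb = "W" if is_write else "R"
    return f"slot {slot} {verb} reg ${reg:X} ({name}) = ${value:02X}"
```

Writing each line out to a file from the emulator's slot I/O read and write
handlers is all it takes to watch what the firmware says and what it listens
for.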
+ +And when looking at the SCSI protocol, the first thing that hits you is that +it's a very well designed, robust protocol and it's nothing short of a minor +miracle that it survived and still survives to this day. However, the +documentation on how it *really* works is a bit lacking. Yes, you can discover +that there are nine phases, and the first three are fairly easy to understand; +it's what comes after that where things get murky. + + +Talk SCSI To Me +--------------- + +So here is a crash course in the SCSI-1 protocol. The SCSI bus is engineered +such that it allows for eight devices to connect to said bus; devices connected +to the bus can have Initiator and/or Target roles. Devices can talk to each +other by passing messages over this bus, however only one pair of devices can +use the bus at any one time. In order to prevent deadlock from happening when +more than one device attempts to take control of the bus, there is an enforced +hierarchy of devices wherein they all have a unique ID; a device that contends +for use of the bus at the same time as another device wins this contention if +and only if its device ID is higher than the other device's ID (128 in this +case being the highest, and 1 being the lowest). The bus is an 8-bit parallel +data bus that is controlled by a variety of signals (and these are typically +called "lines"). + +In contending for and utilizing the bus, there are nine phases that all SCSI +devices must understand and negotiate. They are as follows: + + - Bus Free + - Arbitration + - Selection + - Message In + - Message Out + - Data In + - Data Out + - Command + - Status + +In the Bus Free phase, as one might expect, no devices are using the bus. This +is the ground state of the SCSI protocol, the phase from whence all +communication starts and where it all ends. Any device that wishes to talk to +another device on the bus must start here.
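
The ID hierarchy mentioned above is easier to see with the bits laid out. This
is a small Python sketch under the usual SCSI convention that a device with ID
n puts the single bit 1 << n on the data bus, and that the highest bit present
wins; the function names are invented for illustration.

```python
def id_to_bus_bit(scsi_id: int) -> int:
    """A device with ID n is represented on the data bus by the bit 1 << n."""
    if not 0 <= scsi_id <= 7:
        raise ValueError("SCSI-1 IDs run from 0 to 7")
    return 1 << scsi_id


def arbitration_winner(contending_ids):
    """Return the ID that wins when all of these arbitrate at the same time."""
    bus = 0
    for scsi_id in contending_ids:
        bus |= id_to_bus_bit(scsi_id)  # every contender drives its own bit
    if bus == 0:
        return None                    # nobody is contending
    return bus.bit_length() - 1        # the highest bit set is the winner
```

This is why a host adapter usually sits at ID 7: its bit is $80, and nothing
else on the bus can outrank it.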
+ +Once a device sees that the bus is free, it can enter the Arbitration phase as +an Initiator; it does so by first setting the bit that corresponds to its +device ID on the data bus. If another device tries to do this at the same +time, the device with the lower ID will remove its bit from the data bus and +try again when it detects that the bus is free again. When the Initiator has +waited a certain amount of time with no other contention, it then asserts the +SEL line and goes into the Selection phase. + +In the Selection phase, the Initiator sets the bit that corresponds to the +device ID it wants to talk to (the Target) on the data bus. Every other device +on the bus, by virtue of the asserted SEL line, knows it's in the Selection +phase and can see the device ID bits being asserted on the data bus; if none of +the bits match its own ID, it will stay silent. If the Target device doesn't +respond in a timely manner, the device that tried "calling" it drops the bits +it asserted on the data bus and drops the SEL line. Otherwise, if the Target +device sees its ID on the data bus, it responds by asserting the BSY (BuSY) +line. + +The device that started all of this (the Initiator) then drops the SEL line and +the Initiator and Target devices then enter the next phase. What phase that is +took some teasing out of lots of different papers, datasheets and manuals--as +well as much trial and error in the emulation code. And what I found was this: +once the devices are in the Selection phase, they typically(*) dance through +the following set of phases, in order, before being done with their +transaction: Message Out(**), Command, Data In/Out, Status, Message In. 
+ +(*) One exception to this is the TEST UNIT READY command, which will skip the +Data In/Out phase + +(**) Note that the qualifiers "In" and "Out" come strictly from the perspective +of the Initiator + +Once the devices have successfully negotiated the Message In phase at the end +of their phase dance, the Target device drops the BSY line and the bus is then +free again for another transaction. + +One thing I forgot to mention is that each phase transition, once the devices +are in the Selection phase, is punctuated by a REQ/ACK handshake. Typically, +the Target asserts and drops the REQ line while the Initiator asserts and drops +the ACK line. Basically, when the Target is ready to move to a different +phase, it will assert the REQ line; the Initiator will see this and then assert +the ACK line. Once the Target sees the ACK line asserted, it will drop the REQ +line; the Initiator, seeing this, will then drop the ACK line. And thus hands +are shaken, and all are in agreement as to where they are and what they are +doing. + +One interesting consequence of this kind of handshaking is that it means that +every phase past Arbitration is driven by the Target device. + + +By Your Command +--------------- + +And so having deciphered the proper steps in the post-Selection phase dance, we +come at last to the heart of the matter: the Command phase. Commands come in a +few different flavors: the six byte, the ten byte and the twelve byte. The +flavor is given by the top three bits of the first byte while the command +itself is given by the bottom five bits. Treating those top three bits as a +number from zero to seven, the flavors fall into the following groups: + +six byte: 0 +ten byte: 1, 2 +twelve byte: 5 + +Yes, 3, 4, 6 and 7 are all missing, and, for the purposes of this crash course, +can be safely ignored(*). + +(*) For the terminally curious, 3 and 4 are (were?)
"reserved", and 6 and 7 are +for "vendor specific" commands + +Having now discerned their form, the question arises: just what do these +commands do? Basically, they tell the Target what the Initiator wants from it. +For example, let's say that the Initiator wants to know if a device on the bus +is ready to receive commands. It would send out, during the Command phase, a +TEST UNIT READY command which has the following form: + +00 00 00 00 00 00 + +Assuming the device receiving this command actually is ready to receive +commands, it would then send back a status message (in the Message In phase +following the Status phase) saying "Good" (which, in this case, is coded as +$00). + +Other commands follow basically the same form; only instead of going directly +to the Status phase, as the TEST UNIT READY command does, it will go into +either the Data In or Data Out phase before going to the Status +phase--depending on what the command does. For example, a READ command will go +to the Data In phase, because the Initiator is requesting data from the Target; +likewise, a WRITE command will go to the Data Out phase because the Initiator +wants to send data to the Target. + + +Back To Our Regularly Scheduled Analysis +---------------------------------------- + +So, before we diverged into a crash course of the SCSI-1 protocol, we were +looking at where I had been able to have the card's firmware return back to the +Apple IIe's Autostart program, but in a failure mode. Which, while ultimately +unsatisfying, *was* a step in the right direction. + +So I could see that with my hard-coded responses to the firmware's inquiries, I +was getting an IDENTIFY message ($80) followed by an ABORT message ($06). 
It +was at this point I could also see that I was going to have to start writing the +actual hard drive device emulator code as well, as trying to keep track of all +the phase changes in the slot I/O register code was turning into an +impenetrable mess and wasn't going to be fruitful in the long run. + +This also necessitated a closer look at the code for function 0 in bank 3. I +took copious notes on where the code went and what it did, and eventually found +that almost everything, at some point, seemed to end up calling function 0 in +bank 16. + + +All Roads Lead To Bank 16:0 +--------------------------- + +The one thing I was trying to figure out from this code was: what was the +failure mode that would get you out cleanly? Because in order for the code +that called here to work properly, it would have to have some kind of clean +failure mode to indicate that there was no drive present at this device ID; +also in my first attempts to get the firmware code to successfully run (for +some value of "successfully" > 0), it would hang up somewhere in this code. +And that meant, since I didn't understand the SCSI chip, that I would have to +understand the SCSI chip and how it worked to have any hope of untangling the +tangled mass of code here. + +So before we take a quick look at that, let's take a look at the top level code +that lives at function 0, bank 16. At first glance, it doesn't look all that +bad: + +CC00: 8D 00 CD STA $CD00 ; Write to $CD00 (what does it do?) +CC03: 20 D0 CD JSR $CDD0 ; Clear DMA bit (1) from reg. $2, init some stuff +CC06: 20 CE CE JSR $CECE ; Check if reg. $4 has 0, 2 (/SEL) or 4 (/I/O) +CC09: B0 16 BCS $CC21 ; If failure, skip over + +This is pretty straightforward stuff; the routine at $CECE will set the carry +flag if slot I/O register $4 is not exactly one of: 0, 2, or 4.
If the carry +is set, it bypasses the following sections of code: + +CC0B: 20 42 CF JSR $CF42 ; Check if bit 7 in $C893 is set (success == yes) +CC0E: 20 24 CC JSR $CC24 ; Do Arbitration phase +CC11: B0 03 BCS $CC16 ; If Arbitration timed out, jump over Selection + +It wasn't obvious when I first encountered this code, but, once I delved into +the SCSI protocol I was able to figure out that the code at $CC24 was +negotiating the Arbitration phase. + +CC13: 20 7A CC JSR $CC7A ; Do Selection phase + +Likewise, it was not obvious that the code at $CC7A was negotiating the +Selection phase--but I was able to figure out that the code could cleanly exit +this bank (in a failure mode, naturally) if the BSY line was not asserted. + +CC16: 20 58 CF JSR $CF58 ; Check if bit 7 in $C893 is set (success = yes) +CC19: B0 06 BCS $CC21 ; Skip over if it failed + +Since the address at $C893 got loaded with $80 way back in function 0 in bank +11, the carry flag will be clear and we will execute the following: + +CC1B: 20 E4 CC JSR $CCE4 ; Do SCSI communication with target +CC1E: 20 A0 CD JSR $CDA0 ; Do nothing if $C88F is nonzero, else check on + ; $C8EC + +The code at $CCE4 was quite mystifying for some time, even after I had educated +myself on the intricacies of the SCSI protocol and the ins and outs of the NCR +53C80's ports. I wasn't able to make sense of this until I was able to +understand the phases after Selection and how they were expected to be +negotiated. + +CC21: 4C 18 CE JMP $CE18 ; Do some post cleanup before returning + +The code at $CE18 basically does some error checking and cleanup before +returning back to whence it came; it's fairly easy to digest. But before we +dig into subroutines of bank 16:0, we need to take a short digression into how +the ports of the 53C80 work. 
+ + +A Somewhat Brief Digression Into The 53C80's Ports +-------------------------------------------------- + +And so, having avoided looking into the 53C80 and how it works up until this +point, we find we can no longer avoid it and thus, finally bite the bullet. +The 53C80 has eight ports (also called registers) with which the Apple IIe's +CPU can communicate. They are: + +$0 - Data on the SCSI bus +$1 - Initiator Command +$2 - Mode +$3 - Target Command +$4 - Current SCSI Bus Status (R), Select Enable (W) +$5 - Bus and Status (R), Start DMA Send (W) +$6 - Input Data (R), Start DMA Target Receive (W) +$7 - Reset Parity/Interrupt (R), Start DMA Initiator Receive (W) + +Note too that there is a one-to-one correspondence with the port numbers as +they appear on the 53C80 and their location in the slot I/O address range. +What follows is an explanation of what the registers do: + +Register $0 is pretty much what it says it is; data on the SCSI bus will appear +here barring this caveat: it only works when bit 0 of register $1 (ASSERT DATA +BUS) is set. Which brings us to... + +Register $1 is used to monitor and assert signals on the SCSI bus. The bits +are: + +7 - RST +6 - AIP/TEST MODE +5 - LA/DIFF ENBL +4 - ACK +3 - BSY +2 - SEL +1 - ATN +0 - DATA BUS + +RST (ReSeT) sets the RST signal on the SCSI bus and resets the internal state +of the 53C80; it stays in the reset state until this bit is cleared. AIP/TEST +MODE (Arbitration In Progress) is a bit that is split between two functions: +when read, it signals whether or not the Arbitration phase is in progress; when +a one is written to it, it disables all output from the chip (zero restores +output). LA/DIFF ENBL (Lost Arbitration) is another split signal: when read, +it signals whether or not Arbitration was lost; writing has no effect. ACK +(ACKnowledge) sets or clears the ACK line; BSY (BuSY), SEL (SELect), ATN +(ATteNtion) and DATA BUS all do the same for their respective lines.
+ +The important thing to note here is that by setting the ATN line on the SCSI +bus, the Initiator signals to the Target that it wants to send a message and +so, at the appropriate time, the Target will then assert the MSG and C/D lines +in response. + +Register $2 controls various modes of the 53C80, as well as whether or not +certain interrupts will be triggered. The bits are: + +7 - BLOCK MODE DMA +6 - TARGET MODE +5 - ENABLE PARITY CHECKING +4 - ENABLE PARITY INTERRUPT +3 - ENABLE EOP INTERRUPT +2 - MONITOR BUSY +1 - DMA MODE +0 - ARBITRATE + +The only two of real interest are bits 1 (DMA MODE) and 0 (ARBITRATE); the +former sets the chip into DMA mode, readying it for a DMA transfer while the +latter tells the chip to start the Arbitration phase. + +Register $3 is used mainly if the chip is operating in Target mode, as all the +lines controlled by it are typically only controllable by the Target device. +The only exception is when the Initiator is sending data to the Target; in that +case, bits 0, 1 and 2 must match the lines being asserted by the Target. The +bits are (where X means unused): + +7 - LAST BYTE SENT +6 - X +5 - X +4 - X +3 - REQ +2 - MSG +1 - C/D +0 - I/O + +Register $4 is another split register. When read, it returns the state of the +following lines on the SCSI bus: + +7 - RST +6 - BSY +5 - REQ +4 - MSG +3 - C/D +2 - I/O +1 - SEL +0 - DBP + +When written to, it enables an interrupt to occur if the device ID written to +the SCSI bus is present, BSY is clear and SEL is set. + +The important thing about this register is that it allows monitoring of the +MSG, C/D and I/O lines of the SCSI bus. These three bits are what the Target +uses to signal moves from phase to phase; without these three bits it would be +impossible, as an Initiator, to figure out what to do once in the Selection +phase.
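
Since register $4 is the one the emulated hard drive has to get right, here is
a small Python sketch of how its read side can be picked apart. The bit
positions are the ones listed above; the names BUS_STATUS_BITS, asserted_lines
and phase_bits are invented for this example.

```python
# Register $4 (Current SCSI Bus Status), read side, bit 7 down to bit 0
BUS_STATUS_BITS = {
    "RST": 0x80,
    "BSY": 0x40,
    "REQ": 0x20,
    "MSG": 0x10,
    "C/D": 0x08,
    "I/O": 0x04,
    "SEL": 0x02,
    "DBP": 0x01,
}


def asserted_lines(reg4: int) -> list:
    """Names of the bus lines whose bits are set in a register $4 value."""
    return [name for name, mask in BUS_STATUS_BITS.items() if reg4 & mask]


def phase_bits(reg4: int) -> int:
    """MSG, C/D and I/O as a three-bit number (MSG is the top bit).

    This is the same order that bits 2, 1 and 0 of register $3 use.
    """
    return (reg4 >> 2) & 0x07
```

For example, a read of $78 from register $4 means BSY, REQ, MSG and C/D are
all asserted, and the three phase bits come out as 6.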
+ +And with three bits, you would expect there to be eight phases controlled here, +but only six are controlled from these signals--having MSG set to 1 while C/D +is set to 0 is an illegal combination, and that knocks two of the combinations +right out of contention. Each legal combination corresponds to a phase, and +this is, as it turns out, vital information: + +Data Out: MSG = 0, C/D = 0, I/O = 0 (0) +Data In: MSG = 0, C/D = 0, I/O = 1 (1) +Command: MSG = 0, C/D = 1, I/O = 0 (2) +Status: MSG = 0, C/D = 1, I/O = 1 (3) +Message Out: MSG = 1, C/D = 1, I/O = 0 (6) +Message In: MSG = 1, C/D = 1, I/O = 1 (7) + +Note that there's nothing magical about the order of these three lines; they +could be in any order whatsoever and they would still work the same way. The +only reason that they are presented this way is one, this is how they are laid +out in the NCR 53C80 chip (in this register in particular) and two, this is +the order that they are used in the firmware. + +Register $5 is--you guessed it--another split register. When read, it returns +some internal state registers as well as a couple more SCSI bus lines: + +7 - END OF DMA +6 - DMA REQUEST +5 - PARITY ERROR +4 - IRQ ACTIVE +3 - PHASE MATCH +2 - BUSY ERROR +1 - ATN +0 - ACK + +When written to, it initiates a DMA send transfer from memory to the SCSI bus. + +Register $6, another split register, when read, holds data coming from the SCSI +bus during a DMA transfer. When written to, it initiates a DMA receive +transfer from the SCSI bus (the Target) to memory. + +And finally, register $7 is yet another split register, that when read, resets +the internal PARITY ERROR, IRQ ACTIVE and BUSY ERROR bits in register $5; when +written to it initiates a DMA receive transfer from the SCSI bus (the +Initiator) to memory. + + +Back To Bank 16 +--------------- + +So, with that info-dump out of the way, let's return back to the first +subroutine of the initial code of bank 16:0.
We start with the routine at +$CC24: + +CC24: 9E 63 C0 STZ $C063,X ; Zero reg $3 (Target Command) +CC27: 20 2F CF JSR $CF2F ; Toggle bit 7 of reg. $E (ON-off-ON) +CC2A: AD DA C8 LDA $C8DA ; Get SCSI ID of initiator device +CC2D: 9D 60 C0 STA $C060,X ; & put it in reg. $0 (Output Data) +; +CC30: 9E 62 C0 STZ $C062,X ; Zero out reg. $2 (Mode) +CC33: A9 01 LDA #$01 +CC35: 9D 62 C0 STA $C062,X ; Set bit 0 (ARBITRATE) of reg. $2 + +This code zeroes out the Target Command register, then toggles bit 7 of +register $E on, then off, then back on. It then puts the SCSI ID of the +initiator device into the SCSI Data Bus register, then clears and sets the +ARBITRATE bit of the Mode register. This is the start of the Arbitration phase. + +CC38: BD 6C C0 LDA $C06C,X ; Get reg. $C +CC3B: 89 10 BIT #$10 ; Check bit 4 +CC3D: D0 05 BNE $CC44 ; Skip over this if it's set +CC3F: 20 0C CF JSR $CF0C ; Toggle bit 7 of register $E ON-off-ON + ; # of times before C is set is in $C817/8 +CC42: B0 2E BCS $CC72 ; Signal failure if C is set + +There is a lot of this code and variants thereof sprinkled liberally throughout +the firmware code. I'm still not sure what bit 4 of register $C is a signal +for, but it seems clear that it indicates some kind of error condition because +whenever it's not set, it toggles bit 7 of register $E and will eventually, +when this has happened enough times, signal an error and exit. + +CC44: 3C 61 C0 BIT $C061,X ; Check bit 6 (AIP) of reg. $1 +CC47: 50 E7 BVC $CC30 ; Try again if it's not set + +This little bit of code checks the AIP (Arbitration In Progress) bit, and loops +back to try again if it's not set. + +CC49: EA NOP ; Do a small delay +CC4A: EA NOP +CC4B: A9 20 LDA #$20 +CC4D: 3D 61 C0 AND $C061,X ; Check if bit 5 (LA) of reg.
$1 is set +CC50: D0 DE BNE $CC30 ; Try again if it's set + +After checking to see if the AIP bit is set, it then waits a short amount of +time before checking to see if the LA (Lost Arbitration) bit is set; if it's +set, it loops back to try again. + +CC52: BD 60 C0 LDA $C060,X ; Get reg. $0 +CC55: 4D DA C8 EOR $C8DA ; EOR it with what we put there to begin with +CC58: F0 05 BEQ $CC5F ; If it's the same, bypass (we won arbitration) +CC5A: CD DA C8 CMP $C8DA ; Otherwise, see if the EORed value is >= orig +CC5D: B0 D1 BCS $CC30 ; Try again if so + +Here we look at the data on the SCSI bus and see if there were any other +devices attempting to arbitrate at the same time. If there were, and their +SCSI ID was higher than ours, then loop back and try again; otherwise, we won +arbitration and continue on: + +CC5F: A9 20 LDA #$20 +CC61: 3D 61 C0 AND $C061,X ; Check if bit 5 (LA) of reg. $1 is set +CC64: D0 CA BNE $CC30 ; Try again if so + +We check the LA bit one more time to ensure it's not set; if it is, then loop +back and try again. + +CC66: A9 06 LDA #$06 ; Set bits 1-2 (ASSERT /ATN, /SEL) of reg. $1 +CC68: 1D 61 C0 ORA $C061,X +CC6B: 29 9F AND #$9F ; And clear bits 5-6 (TEST MODE, DIFF ENBL) of $1 +CC6D: 9D 61 C0 STA $C061,X +CC70: 18 CLC ; Signal success +CC71: 60 RTS ; & return + +Now that we've won the Arbitration phase, we assert the ATN and SEL lines and +make sure that the TEST MODE and DIFF ENBL lines are dropped. By setting the +ATN line, we signal to the Target that we want to go to the Message Out phase +after the Selection phase is done. Once that's done, we signal success and +return. + +CC72: A9 80 LDA #$80 +CC74: 8D 8F C8 STA $C88F +CC77: 4C 91 CD JMP $CD91 ; Signal failure + +This bit is called if the code that checks register $C fails; this is the only +failure path for the Arbitration phase code. 
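
The win-or-retry decision at $CC52-$CC5D is compact enough that it helps to
see it spelled out. Here is a Python rendering of just those four
instructions; the function name is invented, and own_id_bit stands in for the
value the firmware keeps at $C8DA.

```python
def won_arbitration(bus_data: int, own_id_bit: int) -> bool:
    """Mirror of $CC52-$CC5D: did we win with this value on the data bus?"""
    # EOR $C8DA: strip our own bit, leaving whatever the other devices put up
    others = (bus_data ^ own_id_bit) & 0xFF
    # BEQ $CC5F: nothing left over means nobody else was contending
    if others == 0:
        return True
    # CMP $C8DA / BCS $CC30: retry if what's left is >= our own ID bit
    return others < own_id_bit
```

For an emulated bus with no other Initiator on it, a read of register $0 can
simply hand back what was last written there, in which case the first branch
is the one that gets taken.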
+ + +A Fine SELECTion Of Devices +--------------------------- + +Now that the Initiator (us) has won the Arbitration phase, it's time to see if +the device we want to talk to exists, and is ready and able to talk. + +CC7A: 9E 64 C0 STZ $C064,X ; Zero out reg. $4 (Select Enable) +CC7D: AD DA C8 LDA $C8DA ; Host ID +CC80: 0D DB C8 ORA $C8DB ; Target ID +CC83: 9D 60 C0 STA $C060,X ; Store $C8DA & DB (ORed) into reg. $0 (Data Bus) +CC86: A9 41 LDA #$41 ; Set bits 0 (DATA BUS) & 6 (TEST MODE) in reg. $1 +CC88: 1D 61 C0 ORA $C061,X ; Then clear bits 5-6 (DIFF ENBL, TEST MODE) in $1 +CC8B: 29 9F AND #$9F +CC8D: 9D 61 C0 STA $C061,X + +The code here clears the Select Enable register to ensure no IRQs are generated +during the Select phase, then puts both the Initiator's SCSI ID and the +Target's SCSI ID into the 53C80's data register. It then does something that +doesn't seem to make any sense, as it sets the DATA BUS ENABLE and TEST MODE +bits. The former puts the 53C80's data register onto the SCSI data bus, while +the latter disables all outputs of the 53C80. Maybe this was necessary because +of the Sandwich II chip and the way it was hooked up to the slot I/O bus and +the 53C80, but there's no way to know for sure without access to actual +hardware. + +After this, it disables the TEST MODE bit, which then enables the outputs of +the 53C80, and thus the Target's SCSI ID is then visible to all the devices +connected to the SCSI bus. + +CC90: A9 FE LDA #$FE ; Clear bit 0 (ARBITRATE) in reg. $2 +CC92: 3D 62 C0 AND $C062,X +CC95: 9D 62 C0 STA $C062,X +CC98: A9 02 LDA #$02 ; Set bit 1 (DMA MODE) in reg. $2 +CC9A: 1D 61 C0 ORA $C061,X +CC9D: 9D 61 C0 STA $C061,X +CCA0: AD DC C8 LDA $C8DC ; Get $C8DC, set hi bit, save in $C821 +CCA3: 09 80 ORA #$80 +CCA5: 8D 21 C8 STA $C821 +CCA8: A9 F7 LDA #$F7 ; Clear bit 3 (ASSERT /BSY) in reg. $1 +CCAA: 3D 61 C0 AND $C061,X +CCAD: 9D 61 C0 STA $C061,X + +This is all pretty straightforward stuff. 
It clears the ARBITRATE bit, sets +the DMA MODE bit, and clears BSY (if it was set before; more likely than not, +it will have been cleared already). It also sets bit 7 of $C8DC and saves it +in $C821, but it's not clear just why yet. + +CCB0: 20 51 CD JSR $CD51 ; Wait for bit 6 (/BSY) of reg. $4 to be set +CCB3: 90 03 BCC $CCB8 ; Skip over JSR if success +CCB5: 20 75 CD JSR $CD75 ; Shorter wait for bit 6 in reg. $4 to be set + +This bit of code waits for the Target to assert the BSY line; if it fails after +the first attempt, it will try again with a shorter wait time. + +CCB8: A9 FB LDA #$FB ; Clear bit 2 (ASSERT /SEL) in reg. $1 +CCBA: 3D 61 C0 AND $C061,X +CCBD: 9D 61 C0 STA $C061,X +CCC0: 90 10 BCC $CCD2 ; Skip over if the JSR was successful + +This code drops the SEL line, and depending on whether or not the Target +asserted the BSY line, will either drop through to the failure path or skip +over to the success path. + +CCC2: A9 FE LDA #$FE ; Clear bit 0 (DATA BUS) in reg. $1 +CCC4: 3D 61 C0 AND $C061,X +CCC7: 9D 61 C0 STA $C061,X +CCCA: A9 81 LDA #$81 ; Put $81 in $C88F +CCCC: 8D 8F C8 STA $C88F +CCCF: 4C 91 CD JMP $CD91 ; Signal failure + +This is the only failure path in the Selection phase code, but, unlike the +Arbitration phase code, this code path will *not* lock up waiting for signals. +It will wait only so long for the Target to assert the BSY line before giving +up and signalling failure. It will also bail out of this bank completely, so +it will not try any further communication--for now. + +CCD2: A9 9D LDA #$9D ; Clear bits 1, 5-6 (TEST, DIFF E., DMA) in $1 +CCD4: 3D 61 C0 AND $C061,X +CCD7: 9D 61 C0 STA $C061,X +CCDA: A9 FE LDA #$FE ; Then clear bit 0 (DATA BUS) in $1 +CCDC: 3D 61 C0 AND $C061,X +CCDF: 9D 61 C0 STA $C061,X +CCE2: 18 CLC ; Signal success +CCE3: 60 RTS ; & return + +Otherwise, the code clears TEST MODE, DIFF ENBL and DMA MODE before clearing +DATA BUS, signalling success and returning. 
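
From the emulated hard drive's side of the fence, all the Selection code above
really wants is to see BSY show up in register $4 while it's holding SEL and
the two ID bits on the bus. A minimal sketch of that response in Python might
look like the following; the function and parameter names are invented, and
this is an illustration of the idea, not the code from any emulator.

```python
BSY = 0x40  # bit 6 of register $4 (Current SCSI Bus Status)
SEL = 0x02  # bit 1 of register $4


def bus_status_during_select(data_bus, initiator_bit, attached_target_bits):
    """Value read from register $4 while the Initiator is asserting SEL.

    data_bus is the Initiator's ID bit ORed with the Target's ID bit;
    attached_target_bits holds the ID bit of each emulated drive.
    """
    wanted = data_bus & ~initiator_bit & 0xFF  # drop the Initiator's own bit
    status = SEL
    if any(wanted & bit for bit in attached_target_bits):
        status |= BSY  # a drive saw its ID on the bus and answers with BSY
    return status
```

If nothing is attached at the requested ID, BSY never appears, and the
firmware takes the failure path at $CCC2 after its wait times out.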
+ + +The Next Part, In Which We Find Ourselves In A Maze Of Twisty Code +------------------------------------------------------------------ + +Now that we've successfully navigated the Selection phase, it's time to talk +SCSI. For the sake of brevity, we will refer to this code as The Code That +Comes After Selection, or TCTCAS for short. This bit of code calls a bunch of +other code which in turn calls even more code; keeping it all straight was +quite the challenge. + +CCE4: BD 6C C0 LDA $C06C,X ; Get $C +CCE7: 89 10 BIT #$10 ; Is bit 4 set? +CCE9: D0 05 BNE $CCF0 ; Skip ahead if so +CCEB: 20 0C CF JSR $CF0C ; Else, toggle bit 7 of $E (ON-off-ON) w/countdown +CCEE: B0 40 BCS $CD30 ; Exit if countdown hit zero + +Here again we see the boilerplate checking of bit 4 of register $C. + +CCF0: BD 64 C0 LDA $C064,X ; Get reg. $4 +CCF3: 29 42 AND #$42 ; Are bits 1 (/SEL) & 6 (/BSY) clear? +CCF5: F0 3A BEQ $CD31 ; If so, we're done (jump down, signal error) + +Here we're checking the BSY and SEL lines; if both have been dropped after the +last phase, we jump down to $CD31 and do some final checking before exiting. + +CCF7: C9 40 CMP #$40 ; Is only bit 6 (/BSY) set? +CCF9: D0 E9 BNE $CCE4 ; Loop back if not... + +The second check looks to see if only BSY is set; if not it loops back to the +start of this subroutine, otherwise it continues on: + +CCFB: BD 62 C0 LDA $C062,X ; Clear bit 1 (DMA MODE) of reg. $2 +CCFE: A8 TAY +CCFF: 29 FD AND #$FD +CD01: 9D 62 C0 STA $C062,X +CD04: 98 TYA ; Then restore its previous state +CD05: 1D 62 C0 ORA $C062,X +CD08: 9D 62 C0 STA $C062,X + +This little bit of code toggles the DMA MODE line off then on if it was set to +begin with, otherwise it does nothing. Well, it doesn't *do* nothing, but the +effect is null and void. + +CD0B: BD 64 C0 LDA $C064,X ; Is bit 5 (/REQ) of reg. $4 clear? +CD0E: A8 TAY +CD0F: 29 20 AND #$20 +CD11: F0 D1 BEQ $CCE4 ; Loop back if so...
+ +This checks to see if the REQ line has been asserted by the Target yet, and if +not, loops back to the beginning of the subroutine. + +CD13: AD 1F C8 LDA $C81F ; Save $C81F in $C820 (last 3-bit pattern we saw) +CD16: 8D 20 C8 STA $C820 + +Here we save the last phase that was seen in $C820. + +CD19: 98 TYA ; Restore reg. $4 from Y +CD1A: 29 1C AND #$1C ; Keep only bits 2-4 (/I/O, /C/D, /MSG) +CD1C: 8D 1F C8 STA $C81F ; & save in $C81F + +Earlier we saved the contents of register $4 (which holds the MSG, C/D and I/O +bits) in the Y register, now we retrieve them and mask off the MSG, C/D and I/O +bits and save them for later. By virtue of this, every time we get here the +previous value that was in $C81F must be different than the last value we saw +here. + +As to why: when I first encountered this code, I approached it the way I +usually approach unknown code: by feeding it zeroes. However, when I did that, +these lines of code caused a failure mode later on. And so I had to dig a +little deeper into all things SCSI and 53C80 to figure out why--we'll see why +that caused a failure later on. + +CD1F: 4A LSR A +CD20: 8D 2B C8 STA $C82B ; & put /2 in $C82B + +Here we shift it right one bit and stuff it into $C82B; this is also a clever +way of making it into an index for a jump table. + +CD23: A8 TAY ; & use as index into jump table +CD24: 4A LSR A ; & /2 again +CD25: 9D 63 C0 STA $C063,X ; Write it to reg. $3 (Target Command) + +Here we put it into the Y register and then shift it to the right one more time +to set the bits in the Target Command register properly. The Initiator needs +to set this register properly at each phase change, otherwise the 53C80 will +signal a phase match error. + +CD28: 20 48 CD JSR $CD48 ; Use Y as idx to jump table and go there + +So here the code uses the three phase bits (MSG, C/D and I/O) as an index into +a jump table to handle the six phases after the Selection phase (Data Out, Data +In, Command, Status, Message Out, Message In).
We'll have more to say about +this shortly. + +CD2B: 2C 06 C8 BIT $C806 ; Is bit 7 of $C806 clear? +CD2E: 10 B4 BPL $CCE4 ; Loop back if so... +CD30: 60 RTS + +This simply checks bit 7 of $C806, which only gets set under very specific +circumstances; those being that MSG, C/D and I/O are all asserted (Message In +phase), and that the value returned from the Target is a "Good" message, and +that the prior phase was either Message In, Message Out, or Status. + +CD31: AD 8F C8 LDA $C88F ; Get $C88F +CD34: D0 08 BNE $CD3E ; If $C88F is != 0, just return +CD36: A9 82 LDA #$82 ; Stuff $82 into $C88F +CD38: 8D 8F C8 STA $C88F +CD3B: 4C 91 CD JMP $CD91 ; Signal failure (?) & return +CD3E: 80 F0 BRA $CD30 + +This is the code path taken if the BSY and SEL lines are dropped. It signals +that something went wrong before returning. + + +The Next Part, In Which Things Start To Make Sense +-------------------------------------------------- + +So TCTCAS is, as it turns out, where the Target drives the Initiator; which in +this case is the hard drive driving the card. As I mentioned up above, when I +first started poking around at this code, I was feeding it zeroes at first as a +place to start seeing if I could get it to do something meaningful. However, +when you try that, you run into the following bit of code which says, "No, +fuggetaboutit." + +CEE5: AD 1F C8 LDA $C81F ; Get the current MSG, C/D, I/O values +CEE8: CD 20 C8 CMP $C820 ; Compare it to the previous values +CEEB: D0 05 BNE $CEF2 ; If they're different, skip over +CEED: A9 27 LDA #$27 ; (This is ignored by the jump target) +CEEF: 4C 6C CE JMP $CE6C ; Else, do a soft, then a hard reset of the card +CEF2: ... + +And so, after looking over the SCSI documentation for the umpteenth time, I +realized that what it was saying is that you can't do a Data Out phase directly +after the Selection phase; it has to be Something Else. 
And this is because +$C81F gets initialized with zero (which corresponds to the Data Out +phase)--which means starting with zero Won't Work. + +As luck would have it, however, we know that in the Selection phase, it +asserted the ATN line, which in turn tells the Target to assert the MSG and C/D +lines (but not I/O). Which means that we *know* that the Target will first go +to the Message Out phase, every time. + +And so, by writing the hard drive emulator to properly respond to the MSG, C/D +and I/O lines I got it to handshake the Message Out phase properly. But I +could see that after that, it wasn't exiting; it was running through another +round of seeing what was in MSG, C/D and I/O and running the appropriate +handler. + +Now I was a bit stuck here, as there was *no* documentation on how a Target +device, such as a hard drive, would drive the handshaking for the Initiator +device. And it wasn't clear what phase the firmware was expecting to come +next, so guessing wasn't likely to yield positive results. + +So, by the serendipitous luck of the Search Engine gods, I stumbled upon a page +which looked like a scan of a book mixed with some bespoke images made by +someone whose primary language was not English. One of the images, which had +misaligned text set next to it, was, however, suggestive. It showed a sequence +of phases that went from Bus Free to Arbitration to Selection to Message Out to +Command to Data In to Status to Message In to Bus Free. This was the first +time I had seen anything like this; in all of the SCSI literature that I had +surveyed, there was nothing beyond the vaguest hints that there was a typical +order to the phases. Sure, they would say that one *could* go from one phase +to another, and how the handshaking worked, but there was *nothing* saying that +there was a definite order to the phases that should be observed. + +So, as I said, this image was highly suggestive. Could this be the key to the +whole thing that I was missing? 
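To make the order in that image concrete, here is a minimal sketch in C. It is
not code from the emulator, and all of the names in it are hypothetical. It
assumes the usual SCSI encoding of the phase lines as
(MSG << 2) | (C/D << 1) | I/O, which is also the value the firmware ends up
writing to the Target Command register, and it mimics the comparison at $CEE5
to show why a sequence starting at Message Out gets through where a stream of
zeroes does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Phase codes as (MSG << 2) | (C/D << 1) | I/O; names are hypothetical */
enum {
    PHASE_DATA_OUT = 0,
    PHASE_DATA_IN  = 1,
    PHASE_COMMAND  = 2,
    PHASE_STATUS   = 3,
    PHASE_MSG_OUT  = 6,
    PHASE_MSG_IN   = 7
};

/* Order from the image, leaving out Bus Free, Arbitration and Selection */
const uint8_t typicalRead[] = {
    PHASE_MSG_OUT, PHASE_COMMAND, PHASE_DATA_IN, PHASE_STATUS, PHASE_MSG_IN
};

/* What "feeding it zeroes" amounts to: Data Out over and over */
const uint8_t allZeroes[] = { 0, 0, 0 };

/* Returns 1 if no phase repeats the one before it, 0 otherwise */
int PhaseCheckPasses(const uint8_t * seq, size_t len)
{
    uint8_t previous = PHASE_DATA_OUT;  /* $C81F is initialized with zero */

    for (size_t i = 0; i < len; i++)
    {
        if (seq[i] == previous)
            return 0;                   /* firmware would reset the card */

        previous = seq[i];
    }

    return 1;
}
```

With these definitions, PhaseCheckPasses(typicalRead, 5) is nonzero, while
PhaseCheckPasses(allZeroes, 3) is zero because the very first zero matches the
initial contents of $C81F.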
+ +I had set things up in the hard drive emulation to go to the Message Out phase +after the Selection phase, and so I added code to go to the Command phase after +that. I could see that the firmware was sending something in the Command phase +at this point, which was the following six bytes: 00 00 00 00 00 00. And +looking that up in the SCSI literature showed that to be the TEST UNIT READY +command. But the firmware was still looking for more. + +From what I saw in the logs, it didn't look like it was going for a Data In +phase next, so I set it up to go to the Status phase, and that got things going +a little bit further. To me, this looked like it should be the end of the +dance, but the firmware was *still* looking for more. + +But even though a byte was sent from the Target to the Initiator during the +Status phase, it seemed that the Status response was actually sent in the +Message In phase. Once I had coded this into the hard drive emulation, I could +see the TEST UNIT READY command going into TCTCAS and coming out of it in a +non-failure mode. + +The dance has steps, and they must be followed in order. + + +Dancing In The Dark +------------------- + +However, something was still not quite right; my assumption--that all the +firmware needed to do to see if there was a drive on the bus was to probe +through to the Selection phase and then, if anything responded, to see if it +successfully responded to the TEST UNIT READY command--turned out to be wrong. +How wrong?
Let's take a look back at the code in bank 3:0 which attempts to +enumerate all devices it can see on the SCSI bus: + +CC55: A0 07 LDY #$07 +CC57: 8C 73 C8 STY $C873 ; Save Y in $C873 +CC5A: 9C DC C8 STZ $C8DC ; Zero out $C8DC +CC5D: B9 F4 CF LDA $CFF4,Y ; Get SCSI ID from table into A +CC60: CD DA C8 CMP $C8DA ; Compare it to our SCSI ID (default is $01) +CC63: F0 1F BEQ $CC84 ; Skip over if it's equal (don't query our SCSI ID) + +So here it's looping through all eight SCSI IDs, starting with the lowest +priority and working its way up to the highest (for reference, the table at +$CFF4 has the following values: $01, $02, $04, $08, $10, $20, $40, $80). It +compares the SCSI ID from the table to the SCSI ID of the card, and skips over +the following code (down to $CC84) if it's the same. + +CC65: 8D DB C8 STA $C8DB ; Else, put SCSI ID to look at in $C8DB +CC68: 64 4F STZ $4F ; Zero out $4F (error flag) +CC6A: 20 5F CF JSR $CF5F ; Do TEST UNIT READY (calls bank 16:0) + +This is the code that I was now able to successfully navigate with my hard +drive emulation. It emulated exactly one SCSI ID, and that one ID returned +here successfully (every other ID, obviously with nothing connected to the bus, +returned failure). However, I could see from the log file that it was trying +to issue some more commands--which was puzzling, but told me that I needed to +dig even deeper into the code. + +CC6D: A5 4F LDA $4F ; Get error code +CC6F: D0 0F BNE $CC80 ; Skip over if error occurred + +This is fairly straightforward; it checks the error code returned from the call +we made to bank 16:0, and if it's anything but zero, skip over the following +code: + +CC71: EE 0D C8 INC $C80D ; Success means add one to $C80D (# of devices) +CC74: 20 9F CC JSR $CC9F ; & call Function 1 in this bank (INQUIRY + MORE) +CC77: 90 0B BCC $CC84 ; Check next ID if C == 0 + +So here we increment a counter, which we suppose to be a count of the number of +valid devices we have found on the SCSI bus. 
And here, we come to the +realization that it isn't just hard drives that can talk to the Apple High +Speed SCSI card, it's also printers, scanners, tape drives and whatnot. And +so, it makes perfect sense that TEST UNIT READY is only the first step in +discovering if a device is a hard drive or not because here, it calls function +1 of bank 3 (the bank we're currently in) which is what issues more commands to +figure out what the device it's talking to actually *is*. + +CC79: A9 99 LDA #$99 ; Else, stuff $99 into $C887 +CC7B: 8D 87 C8 STA $C887 +CC7E: 80 17 BRA $CC97 ; & signal success + +So if the call to $CC9F (INQUIRY + MORE) returned with the carry flag set, it +stuffs a magic number into $C887, signals success and returns. + +CC80: C9 80 CMP #$80 ; Was error $80? +CC82: F0 16 BEQ $CC9A ; Signal NoDrive error if so + +This is where it lands if the TEST UNIT READY call returned a non-zero result +in the "error code" memory location. If it equals $80, it puts the ProDOS error +code for a "NoDrive" error into the error code and returns. + +CC84: AC 73 C8 LDY $C873 ; Restore Y +CC87: 88 DEY ; Done looking at all IDs? +CC88: 10 CD BPL $CC57 ; Go back if not. + +Here we decrement the counter and loop back if we haven't looked at all eight +(except for the card's) SCSI IDs. Otherwise, we've finished, and fall through +to the following: + +CC8A: A9 77 LDA #$77 ; Else, stuff $77 into $C80A & $C887 +CC8C: 8D 0A C8 STA $C80A +CC8F: 8D 87 C8 STA $C887 +CC92: AD 0D C8 LDA $C80D ; Did we find any devices? +CC95: F0 03 BEQ $CC9A ; Signal NoDrive if not +CC97: 64 4F STZ $4F ; Else, signal success +CC99: 60 RTS ; & return + +So here it stuffs the magic number $77 into $C887 and $C80A; it also checks the +"number of devices found" memory location, and signals a "NoDrive" error if the +count is equal to zero.
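Taken as a whole, the loop that starts at $CC55 behaves roughly like the
following C sketch. This is a paraphrase for illustration only, not the
emulator's code: the function and type names are hypothetical, the two
callbacks stand in for the calls to $CF5F and $CC9F, and it assumes that entry
Y of the table at $CFF4 is simply 1 << Y. The common failure exit at $CC9A is
modeled as returning $28, the ProDOS "NoDrive" code.

```c
#include <stdint.h>

#define ERR_NONE    0x00
#define ERR_NODRIVE 0x28    /* ProDOS "NoDrive" code placed in $4F */

/* Hypothetical stand-ins for the firmware calls */
typedef uint8_t (* TestUnitReadyFn)(uint8_t idMask);  /* $CF5F; 0 = success */
typedef int (* IdentifyFn)(uint8_t idMask);           /* $CC9F; carry flag */

uint8_t EnumerateBus(uint8_t ownIdMask, TestUnitReadyFn testUnitReady,
    IdentifyFn identify, uint8_t * found)
{
    *found = 0;                                 /* device count in $C80D */

    for (int y = 7; y >= 0; y--)
    {
        uint8_t idMask = (uint8_t)(1 << y);     /* assumed table layout */

        if (idMask == ownIdMask)
            continue;                           /* never query ourselves */

        uint8_t error = testUnitReady(idMask);

        if (error == 0x80)
            return ERR_NODRIVE;                 /* exit through $CC9A */

        if (error != 0)
            continue;                           /* nothing usable here */

        (*found)++;

        if (identify(idMask))
            return ERR_NONE;                    /* carry set: stop early */
    }

    return (*found == 0 ? ERR_NODRIVE : ERR_NONE);
}

/* Example bus: one device answering at mask $40; the error value returned
   for the empty IDs is made up */
uint8_t FakeTestUnitReady(uint8_t idMask)
{
    return (idMask == 0x40 ? 0x00 : 0x01);
}

int FakeIdentify(uint8_t idMask)
{
    (void)idMask;
    return 0;                                   /* carry clear: keep going */
}
```

Calling EnumerateBus($01, FakeTestUnitReady, FakeIdentify, &found) on this
made-up bus reports success with one device found, which is the situation the
emulation was in at this point: exactly one ID answering TEST UNIT READY.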
+ +CC9A: A9 28 LDA #$28 ; Return $28 (NoDrive) in $4F +CC9C: 85 4F STA $4F +CC9E: 60 RTS + +This is the landing location for the various failure modes seen up above; it +simply puts the ProDOS "NoDrive" error into the error flag and returns. + +So now I get to figure out what the commands are in that call to 3:1 that are +causing the card to return in a failure mode. + + +The Test Is Easy, When You Have The Answer Key +---------------------------------------------- + +At this point, even though I had the hard drive emulation doing a proper dance +through the TEST UNIT READY command, it was in a very crude state and couldn't +really do anything else. And so I had to take a closer look at the seemingly +impenetrable code that set up a bunch of memory locations before calling bank +16:0 to see if I could make sense of it. + +Rather than go through every last one, I will go through part of the first such +piece of code, as it's instructive: + +CD0E: 20 A4 CF JSR $CFA4 ; Set $60/1 to $C923, $56/7 to $C92F +CD11: 20 B9 CF JSR $CFB9 ; Put $C9C3 into $C92F/30, zero $C931 +CD14: A9 12 LDA #$12 ; Put $12 into $C923 +CD16: 8D 23 C9 STA $C923 +CD19: 9C 24 C9 STZ $C924 ; Zero out $C924-6, $C928 +CD1C: 9C 25 C9 STZ $C925 +CD1F: 9C 26 C9 STZ $C926 +CD22: 9C 28 C9 STZ $C928 +CD25: A9 1E LDA #$1E ; Put $1E in $C927, $C933 (length of reply, 30) +CD27: 8D 27 C9 STA $C927 +CD2A: 8D 33 C9 STA $C933 + +So we can see right off the bat that it's setting up zero page locations $60 +and $61 to point to memory at $C923, and that it sets up six bytes at that +location with the following: + +C923: 12 00 00 00 1E 00 + +Reaching back to our crash course on SCSI commands, we can see by the first +byte, since the top three bits are all zero, that this must be a six-byte +command. And after that, uh, well, we don't really know much of anything.
So +after digging around some more for something even remotely relevant, I found a +document dealing with SCSI-2 and SCSI-3 hard disk interfacing--which told me, +first of all, that $12 was the INQUIRY command, and second, that the fifth byte +in the command was the length of the message that the Initiator was expecting +back from the target in response to this command. Progress! + +CD2D: 20 CB CF JSR $CFCB ; Call bank 16:0 (Do INQUIRY command) +CD30: A5 4F LDA $4F +CD32: F0 05 BEQ $CD39 ; Skip over if no error + +And this, as we now know, does the phase to phase dance from start to finish, +and checks the resulting error code to do any necessary error handling. But +what of the response? How do we know what to say from our emulated hard disk +back to the firmware? The hard disk interface document had something that +looked plausible, if overlong (it seems that latter day SCSI drives are +expected to return 148 bytes instead of 30). So I expected that I could adapt +that to suit the purposes of the emulation. + +It was obvious that I had to write code to handle more than just the TEST UNIT +READY command, and that it had to be able to send and receive data over the +SCSI bus, which it, in its current state, couldn't do. Eventually I was able +to get that working and I could see that the firmware was successfully +negotiating the INQUIRY command *and* coming to the conclusion that it was +talking to a hard disk. More progress! + +And, as it turns out, this first call in bank 3:1 is what determines what the +device we're talking to actually is, and it sets up appropriate memory +locations to signal that to other parts of the firmware. 
This is another one +of those places where the "Technical Manual for the Apple SCSI Card" had a +useful tidbit, namely a small table that looked something like this: + +Code Device Type +------------------------------ +$03 Nonspecific SCSI +$05 CD-ROM +$06 Direct-access tape drive +$07 Hard disk +$08 Scanner +$09 Printer + +These device codes are different from the device codes that the INQUIRY command +returns, and this bit of code also does the translation from one to the other. + + +The Next Part, In Which More Progress Is Made +--------------------------------------------- + +And so, in using similar analysis in the other parts of the code called by bank +3:1, I was able to discern that after the INQUIRY command, it was calling the +MODE SENSE, MODE SELECT, READ CAPACITY and READ commands afterward. And since +I didn't know exactly what these commands returned, I used the time honored +method of returning messages consisting of all zeroes. + +And, in fixing up the hard drive emulation to respond to these commands, I +could see the firmware was making it all the way through the bank 3:1 code +successfully, and not in a failure mode. It didn't boot anything yet, as I +hadn't written the code to load a hard disk image much less dole it out over +the SCSI bus, but it was a good result and I could finally see the end of this +Herculean task coming into view. + +However, I could see from the log file that something still wasn't quite right. + + +The Next Part, In Which Things Start Getting LUN-ey +--------------------------------------------------- + +The problem was one of too much success. It wasn't going through the set of +INQUIRY, MODE SENSE, MODE SELECT, READ CAPACITY and READ commands just once, it +was doing it *eight* times. 
And in looking for the culprit, I found the +following tidbit: + +CCE5: EE DC C8 INC $C8DC ; Increment a counter +CCE8: AD DC C8 LDA $C8DC +CCEB: C9 08 CMP #$08 +CCED: D0 B0 BNE $CC9F ; Loop back if we haven't checked 8 times yet + +It wasn't obvious on first examination, but I eventually figured out that +location $C8DC was being put into the top three bits of byte one of every +command being sent over the SCSI bus--as I could see the INQUIRY command was +changing every time it was called like so: + +12 00 00 00 1E 00 +12 20 00 00 1E 00 +12 40 00 00 1E 00 +12 60 00 00 1E 00 +12 80 00 00 1E 00 +12 A0 00 00 1E 00 +12 C0 00 00 1E 00 +12 E0 00 00 1E 00 + +And so, after more digging into the hard disk interface document, I could see +that the field being modified was called the Logical Unit Number, or LUN for +short. Further, hard disks conforming to SCSI-2 and SCSI-3 had a +commandment, that being as follows: + +The LUN Shall Be Zero, And Zero Shall The LUN Be. It Shall Be No Other Number +Save For Zero, For Any Other Number Shall Be An Abomination Before The Drive. + +Well, going by simple logic, it would appear that the SCSI-1 protocol was not +bound by such a rule, and so you could have eight Logical Units for each SCSI +device on the bus. But this presents an interesting challenge. We need to +tell the firmware to pound sand for all but one LUN. + + +Failure Is An Option +-------------------- + +And so I found myself in the position of needing to have the hard drive +emulation fail in a meaningful way, which sounds like an oxymoron but really +isn't. I needed to code the hard drive emulation to respond with a CHECK +CONDITION status, which, as I eventually discovered, is how you signal an error +condition in the SCSI protocol. When I did this, the firmware then sent a +REQUEST SENSE command, and I wasn't sure how to craft a response that would +signal failure for an invalid LUN.
Responding with all zeroes didn't signal +failure as I hoped it would, so it was back to the hard disk interface document +to find the missing information. + +There I found out that byte two of the response is a four-bit "Sense Key", and +that zero corresponds to "No Sense", which means the command was successful. +Which, as it turns out, is no way to signal failure. The one that fit the bill +was five, which corresponds to "Illegal Request". + +And so it seems that 16 Sense Keys was not enough for the designers of the SCSI +protocol, so those Sense Keys correspond to broad categories. To give even +more fine-grained responses to what went wrong, there are at least two more +eight-bit bytes called the "Additional Sense Code" and "Additional Sense Code +Qualifier", which, taken together, provide for 65,536 different combinations. +And, in the interface document, I found $08 $00 which corresponds to "Logical +Unit Communication Failure" which seemed like a reasonable message for this +failure mode. + +Coding up the meaningful failure path and running the emulation showed that +this mostly satisfied the firmware; it would almost get all the way to the +point where it attempted to read block zero from the hard drive in a +non-failure mode, but there was still a small problem. + + +Every Problem Is Small, From A Certain Point Of View +---------------------------------------------------- + +There is a call in the bank 3:1 code that calls bank 4:0 to read a block from +the disk and do some analysis on what it finds. The logs also showed that this +code was also doing a lot of writing to slot I/O register $F. 
Much of it being +calls to the following brief routine: + +CFB4: AD 86 C8 LDA $C886 ; Get the value in $C886 +CFB7: 4A LSR A ; Shift the hi nybble to the lo nybble +CFB8: 4A LSR A +CFB9: 4A LSR A +CFBA: 4A LSR A +CFBB: 09 08 ORA #$08 ; Set the high bit of the lo nybble +CFBD: 9D 6F C0 STA $C06F,X ; & store it in slot I/O register $F +CFC0: 60 RTS + +This was some highly suggestive code, and what it suggested was that it was +using three bits of a value set up elsewhere which made for eight combinations. +The only significant loose end, as far as the hardware was concerned, was the +8K static RAM; in all of the analysis I had done up to this point, it *seemed* +that only 1K of it was ever used. But this code suggested otherwise. + +It was suggesting that slot I/O register $F was a bank select soft switch for +the 8K static RAM; once I coded it up as such, the firmware was then completely +satisfied and would get all the way to where it attempted to read block zero +from the hard drive in a non-failure mode. + + +The End Is Nigh +--------------- + +And so, having studiously and painstakingly laid the foundation for the actual +purpose of the hard drive emulation--that being the transfer of data to and +from the thing--I came at last to the part where I had to actually write code +to have real data flowing to and from the emulated hard disk. And this, as it +turns out, was the least interesting part of the whole thing; getting the +contents of files into memory and parsing them is a really trivial thing and +usually quite boring. + +So in writing this bit of code, I used 4am's "Pitch Dark" hard drive image, and +added the necessary code to serve up appropriate slices of it in response to +the firmware's READ command. And, of course, after running the new emulation +it failed to load anything. + +It was then that I remembered that I sent back messages of all zeroes to +requests from commands, for the most part, with a few exceptions. 
One of these +that was sure to cause problems without a proper response was the READ CAPACITY +command. When the firmware inquired about the size of the hard drive, the +emulator would happily tell it that it had zero capacity--which meant that any +attempted reads by the firmware would be out of range. + +So I coded up a proper response for the size of the hard drive image I was +using and fired up the emulator and... It still didn't work. The logs told me +that it was sending a ten-byte command, and one I hadn't seen before, which was +basically the ten-byte variant of the READ command. Once I had *that* coded up +properly, I fired up the emulator and after a few seconds, found myself in the +monitor. + +What? Why? How does this even-- + +To quell the questions that were pooling up in my head I wrote some hooks into +the emulator to trigger a code trace at the appropriate time; that being where +the code transferred control to memory address $801, the ostensible entry point +of the code that the firmware had read from block zero and placed in memory at +$800. And I knew that it was getting to that point successfully because the +firmware doesn't get there unless everything is working on the SCSI bus as it +should, and the trace in the log file confirmed this. + +There are worse things than being dumped into the Apple II monitor; at least I +could poke around memory and disassemble things to try to figure out what was +going wrong. And I could see that the block that was loaded into memory was +looking at the slot ROM for a certain value that caused it to take a branch +that landed it in a crash zone. This made no sense whatsoever. + +Fortunately for me though, I have the ability to disassemble a snapshot of any +memory range that I desire--so I disassembled the entire block from $800 to +$9FF. And what I saw there was still strange; near the end of the block it +just kind of ran out of instructions, like something was missing.
And looking +near the middle of the block, I saw something eerily similar to what I saw at +the end. + +Then I realized it wasn't similar, it was *identical*. Looking through the +hard drive emulator code, I was not surprised to find this: + +static uint8_t * buf; +static uint8_t bufPtr; + +Yes, I had made the rookie mistake of using too small a type for my buffer +pointer; it was loading the correct block, but, because the buffer pointer was +only eight bits wide, it copied only the first 256 bytes of it out of the hard +drive image, *twice*. + +As embarrassing as this was, it was also good news, as it meant that the +firmware bootstrap code was working; it was reading real data from the hard +drive emulation and running correctly. Which meant that once I fixed the size +of my buffer pointer, the emulated hard drive should boot up correctly. + +And once I coded up the fix and started up the emulator once more, after six or +so seconds, "Pitch Dark" came up on the screen and it was glorious... + + +Sic Transit Gloria Mundi +------------------------ + +I was able to navigate forward and back through the various games on the hard +drive image; I could even view the artwork that came with each one. And lo and +behold: the games worked! + +I was playing through a bit of Wishbringer when I got to a point where I wanted +to save my game. And, even though there was no WRITE command hooked up yet, I +tried it anyway and got a nice hard lockup on the emulator. This would never +do--to have a hard disk that was read-only--so I coded up the WRITE command +handler. + +And upon booting up the hard drive, it looked like it was OK, only there were +problems; namely, while you could navigate through the various games, you could +not launch them. As a matter of fact, the only game that *could* be launched +was Zork I, which was the first game to pop up on the menu. So after looking +at the code, I noticed an asymmetry in the ports used for reading +and writing to the SCSI bus.
Which requires a brief digression into data +transference. + + +To DMA, Or Not To DMA, That Is The Question +------------------------------------------- + +As it turns out, I was finally able to figure out that the physical DMA on/off +switch on the card was wired to bit 6 of slot I/O register $C. I further found +out that, since I was defaulting to zero for any unknown bit in the slot I/O +registers, that it was treating the DMA switch as if it were in the off +position. However, even so, the firmware was still treating this as a DMA +transfer. + +And, looking at the 53C80 manual, I could see that it supported three distinct +kinds of bus I/O: Programmed I/O (or PIO for short), Direct Memory Access (or +DMA for short) and Pseudo DMA. Of these three, PIO is the slowest, as it +relies 100% on handshaking on the SCSI bus for data transfer, while DMA is the +fastest, as all you need to do is set some registers and tell the 53C80 to go +and it handles the transfer all in the background without the need for any +intervention from the CPU whatsoever. But what the firmware was doing, in this +DMA switch in the off position mode, was Pseudo DMA. + +How it works for reading data from the SCSI bus is that the CPU monitors bit 6 +(DMA REQ) in the slot I/O register $5, then reads the data that shows up in +slot I/O register $6 when the DMA REQ bit is asserted. For this kind of +transfer to work, however, there must be some kind of address decoding that +will assert the DACK (Dma ACKnowledge) line once the data is read. Because +this code works, we can logically deduce that the read to slot I/O register $6 +is wired to produce this signal, even if we can't prove it conclusively through +the schematic of the card. + +Writing works in a similar manner by monitoring the DMA REQ line, but instead +of writing to slot I/O register $6 (which is a trigger for starting a DMA +transfer) it writes to slot I/O register $0. 
And, as we inferred through logic +about the setting of the DACK line in the reading case, we can similarly infer +that the DACK line is being set in a similar manner in the writing case. + +The upshot is, even though Pseudo DMA transfers are still CPU intensive, they +are faster than PIO transfers. And when it comes to relatively slow CPUs like +the 65C02, faster is better. + + +And They All Lived Happily Ever After-ish +----------------------------------------- + +So in looking at the code for the WRITE command, I could see that I had it +using register $6 for the data transfer, which, as we can see from the short +digression above, won't work. Fixing this to look at the correct register ($0) +brought things into alignment, and a thorough test of "Pitch Dark" confirmed +that I had indeed solved the problem. + +So, in the final analysis, I was finally able to restore decency to Apple2 and +play "Pitch Dark" on it to boot. But was it worth it? In my opinion the +answer is an unequivocal "yes", and not just because it enables the use of hard +drive images in emulators. + +The reason this little exercise in digital archaeology was worth the effort +expended is that it underscores a problem that seems to have gone largely +underappreciated: the early microcomputers, in some respects, are very well +documented; however, in many others, they are not--and the knowledge of exactly +how they worked is in danger of disappearing. The fact that the documentation +for the Apple High Speed SCSI card is of a consumer oriented nature with very +little technical content was of little use in figuring out how it really +worked, and shows a marked contrast to the early days of Apple where they +published very detailed information about their computers and how they worked, +including schematics and source code. 
+ +All that is to say that unless those of us who still remember these artifacts +and have the ability to analyze them to tease out their inner workings actually +*do* so, these things *will* disappear, and they will pass out of human memory +forever. + + +-------------- +v1.0: 6/3/2019 +v1.1: 1/10/2020