See https://github.com/TomHarte/dsk2woz/blob/master/dsk2woz.c#L366 and in particular https://github.com/TomHarte/dsk2woz/blob/master/dsk2woz.c#L308 for an implementation of '6 and 2' disk encoding.
Physical Encoding
Apple's GCR encoding was designed in reaction to FM encoding, and uses the same data density and bit clock. However in its more efficient '6 and 2' form it uses only two-thirds as much disk surface area as FM to encode the same data. It predates and is not as efficient as MFM encoding.
Two forms of Apple GCR were used during the Apple II's lifetime: '5 and 3' and '6 and 2'. Each name refers to the number of bits of information obtained from every eight flux transition windows on a disk. '5 and 3' is less efficient than '6 and 2': five bits of information are decoded from each eight flux windows, rather than six.
GCR disks have a third encoding of data, '4 and 4', which is similar to FM. It is used only for sector metadata, not for sector contents.
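The relative densities can be checked with a little arithmetic. The following is a minimal sketch rather than anything taken from Apple's code; the function name encoded_length is hypothetical. It computes how many on-disk bytes are needed to carry a given number of source bytes when each on-disk byte (eight flux windows) holds 4, 5 or 6 bits of information, ignoring headers, check bytes, prologues and epilogues.

```c
#include <stddef.h>

/* Number of on-disk bytes needed to hold `source_bytes` bytes of data when
   each on-disk byte carries `bits_per_disk_byte` bits of information.
   Rounds up, since a partially used on-disk byte still occupies eight
   flux windows. */
size_t encoded_length(size_t source_bytes, unsigned bits_per_disk_byte) {
    size_t source_bits = source_bytes * 8;
    return (source_bits + bits_per_disk_byte - 1) / bits_per_disk_byte;
}
```

For a 256-byte sector this gives 512 on-disk bytes at four bits per byte ('4 and 4'), 410 at five bits ('5 and 3') and 342 at six bits ('6 and 2'). The ratio 342 to 512 is the two-thirds figure quoted above.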
Track Layout
Apple's track layout derives from that of FM data: for each sector there is a header, then a gap, then the data, then another gap.
The gaps contain synchronisation information, allowing the controller to align its read window with the on-disk data.
A header consists of the sector's track and sector number, the disk's volume identifier, and a check byte. Each of those is one byte long. The check byte is an exclusive OR of the other values.
The data consists of 256 bytes of information plus a check byte. As with the header, the check byte is an exclusive OR of the other values.
Flux window content rules
The Disk II utilises an 8-bit lsb-to-msb shift register. It shifts at the on-disk data rate. It will shift in 1s wherever a flux transition is found on the disk and 0s where the flux transition is absent. Encoded bytes will always have the most significant bit set and the shift register is reset to 0s if read while full.
The archetypal polling loop to obtain the next byte from the Disk II is:
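.loop   LDA shift_register  ; fetch the current contents of the shift register
        BPL .loop           ; top bit clear: byte not yet complete, so poll again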
If the msb of the shift register is 0, it will shift immediately upon detecting the flux transition. If the msb is 1, it will pause for two bit durations before shifting in the next bit, so there is a lengthened window for consumption of a finished byte.
A further constraint is imposed by the analogue-to-digital conversion: its automatic gain control is prone to amplify noise into signal if more than two consecutive flux windows pass without a transition in them.
Applying those two constraints — the msb set and no more than two consecutive zeros — motivates '6 and 2' encoding as the number of usable bytes is a little more than 64.
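The two constraints can be turned into a brute-force count. The sketch below is an illustration only: the function names are hypothetical, and it tests exactly the two rules stated above (top bit set, and no run of zeros longer than a given limit) without any further restrictions that real disk software may apply. The limit is a parameter so that the stricter single-zero rule can be tried too.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `value` has its top bit set and contains no run of more
   than `max_zero_run` consecutive zero bits. */
bool is_usable_disk_byte(uint8_t value, int max_zero_run) {
    if (!(value & 0x80)) return false;
    int run = 0;
    for (int bit = 7; bit >= 0; bit--) {
        if (value & (1 << bit)) {
            run = 0;                      /* a 1 ends any run of zeros */
        } else if (++run > max_zero_run) {
            return false;                 /* too many zeros in a row */
        }
    }
    return true;
}

/* Counts how many of the 256 possible byte values satisfy the rules. */
int count_usable_disk_bytes(int max_zero_run) {
    int count = 0;
    for (int value = 0; value < 256; value++) {
        if (is_usable_disk_byte((uint8_t)value, max_zero_run)) count++;
    }
    return count;
}
```

Under this reading of the rules, count_usable_disk_bytes(2) returns 81, enough for the 64 values that six bits require, and count_usable_disk_bytes(1) returns 34, enough for the 32 values that five bits require.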
Steve Wozniak designed the '6 and 2' encoding but had previously been under the impression there could be no more than a single consecutive zero bit; the less-efficient '5 and 3' encoding is the result of conforming to that stricter constraint.
Sync words
Sync words lie in the gaps between sectors and between the header and data parts of sectors.
A sync word is an ff followed by as many zeroes as the content rules will allow: a single zero for '5 and 3' and two zeroes for '6 and 2'.
Given the top-bit-set rule, a series of sync words has the effect of bringing a CPU polling loop as above into phase with the start of each sync word.
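The self-aligning effect can be demonstrated with a toy model. The sketch below is a simplification and not a model of the real hardware's timing: it assumes the register shifts in one bit per flux window, that the CPU always collects a byte the moment the top bit becomes set, and that the register is cleared at that point. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills `bits` with `word_count` '6 and 2' sync words: eight 1s followed by
   two 0s, one flux window per array element. Returns the number of bits
   written (10 per word). */
size_t make_sync_stream(uint8_t *bits, size_t word_count) {
    size_t length = 0;
    for (size_t word = 0; word < word_count; word++) {
        for (int i = 0; i < 8; i++) bits[length++] = 1;
        bits[length++] = 0;
        bits[length++] = 0;
    }
    return length;
}

/* Simplified shift register: shifts each bit in at the low end and, as soon
   as the top bit is set, hands the byte to the caller and clears itself.
   Zeros arriving while the register is empty therefore have no effect.
   Returns the number of complete bytes stored in `out`. */
size_t read_disk_bytes(const uint8_t *bits, size_t bit_count,
                       uint8_t *out, size_t out_capacity) {
    uint8_t shift_register = 0;
    size_t bytes_read = 0;
    for (size_t i = 0; i < bit_count && bytes_read < out_capacity; i++) {
        shift_register = (uint8_t)((shift_register << 1) | (bits[i] & 1));
        if (shift_register & 0x80) {
            out[bytes_read++] = shift_register;
            shift_register = 0;
        }
    }
    return bytes_read;
}
```

Starting read_disk_bytes three bits into a stream of sync words yields 0xf9, then 0xfe, then nothing but 0xff. In this model every possible starting offset falls into step after at most four misaligned bytes, because the trailing zeroes of each sync word are swallowed whenever the register is empty.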
'4 and 4' Encoding
Sector header content is encoded in '4 and 4' form regardless of the encoding in use for sector contents.
'4 and 4' encoding encodes the source byte b, with bits b7, b6, b5 ... b0, as the two on-disk bytes:
1 b7 1 b5 1 b3 1 b1
1 b6 1 b4 1 b2 1 b0
This is equivalent to FM encoding except in bit order. Bit order differs from FM to allow for efficient decoding:
(((1 b7 1 b5 1 b3 1 b1) << 1) | 1) & (1 b6 1 b4 1 b2 1 b0) = original byte
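In C, the encoding and the shift-and-AND decoding above might be sketched as follows. The function names are hypothetical; the bit manipulation follows the layout given above.

```c
#include <stdint.h>

/* Splits `value` into two on-disk bytes: odd-numbered bits first, then
   even-numbered bits, each interleaved with 1s. */
void encode_4_and_4(uint8_t value, uint8_t encoded[2]) {
    encoded[0] = (uint8_t)((value >> 1) | 0xaa);  /* 1 b7 1 b5 1 b3 1 b1 */
    encoded[1] = (uint8_t)(value | 0xaa);         /* 1 b6 1 b4 1 b2 1 b0 */
}

/* Recombines the two on-disk bytes: shifting the first left by one and
   setting its low bit lines the odd bits up against 1s, so an AND with the
   second byte restores the original value. */
uint8_t decode_4_and_4(const uint8_t encoded[2]) {
    return (uint8_t)(((encoded[0] << 1) | 1) & encoded[1]);
}
```

For example, the value 254 encodes to the pair 0xff, 0xfe, and the value 0 encodes to 0xaa, 0xaa.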
A complete sector header is formed on disk as:
three bytes prologue: 0xd5, 0xaa, 0x96
two bytes: '4 and 4' encoded volume
two bytes: '4 and 4' encoded track
two bytes: '4 and 4' encoded sector
two bytes: '4 and 4' encoded check value — the exclusive OR of (volume, track, sector)
three bytes epilogue: 0xde, 0xaa, 0xeb
track counts upwards from zero, zero being the outermost track.
sector counts upwards from zero, zero being the first sector on any track.
volume has at least two context-dependent meanings. In both of Apple's operating systems it defaults to 254 for Disk II-compatible media. In ProDOS it is used to confirm the volume type. Some software prefers to use it as a volume number, for distinguishing different disks or sides of a disk. It should be written as 254 unless there is a reason to do otherwise.
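Putting the header layout together, a minimal sketch in C might look like the following. The names write_sector_header, put_4_and_4 and SECTOR_HEADER_LENGTH are hypothetical; the '4 and 4' helper is repeated so that the block stands alone.

```c
#include <stdint.h>

enum { SECTOR_HEADER_LENGTH = 14 };

/* Writes the '4 and 4' encoding of `value` at `out` and returns the next
   free position. */
static uint8_t *put_4_and_4(uint8_t *out, uint8_t value) {
    *out++ = (uint8_t)((value >> 1) | 0xaa);
    *out++ = (uint8_t)(value | 0xaa);
    return out;
}

/* Builds a complete sector header: prologue, volume, track, sector,
   check value and epilogue. */
void write_sector_header(uint8_t volume, uint8_t track, uint8_t sector,
                         uint8_t out[SECTOR_HEADER_LENGTH]) {
    uint8_t *p = out;
    *p++ = 0xd5; *p++ = 0xaa; *p++ = 0x96;                   /* prologue */
    p = put_4_and_4(p, volume);
    p = put_4_and_4(p, track);
    p = put_4_and_4(p, sector);
    p = put_4_and_4(p, (uint8_t)(volume ^ track ^ sector));  /* check value */
    p[0] = 0xde; p[1] = 0xaa; p[2] = 0xeb;                   /* epilogue */
}
```

With volume 254, track 0 and sector 0 this produces d5 aa 96 ff fe aa aa aa aa ff fe de aa eb.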
In principle the entire disk contents could have been encoded in '4 and 4' form, to give exactly the same data density as FM encoding. In practice '5 and 3' encoding was the first to be deployed; '5 and 3' was phased out in favour of '6 and 2' in 1980.
'6 and 2' Encoding
Apple's second deployed sector data encoding fits six data bits into every on-disk byte.
An entire buffer of 256 bytes is encoded into 342 bytes as an atomic unit.
In overview:
- each of the source bytes is cut into two parts: its highest 6 bits and its lowest two;
- the first 86 bytes of the encoded sector are used to keep the lowest two bits of all bytes;
- the remaining portions of six bits fill the final 256 on-disk bytes of the sector;
- an exclusive OR checksum is used, but to reduce decoding time it is applied within the six-bit data as described below; and
- a three-byte prologue and a three-byte epilogue are applied.
After decoding, the first 84 bytes following the prologue are formed as:
byte n = {
bits 4 & 5: low two bits of source byte n + 172, reversed
bits 2 & 3: low two bits of source byte n + 86, reversed
bits 0 & 1: low two bits of source byte n, reversed
}
n+172 would be out of bounds for the 85th and 86th bytes, so they are formed as:
byte n = {
bits 2 & 3: low two bits of source byte n + 86, reversed
bits 0 & 1: low two bits of source byte n, reversed
}
From the 87th byte onwards, i.e. from zero-based index 86, the content is:
byte n + 86 = high six bits of source byte n
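The packing described above can be sketched in C as follows, together with its inverse. This is an illustration under the layout given in this section, using zero-based indices throughout (so the 87th byte is index 86); the function names are hypothetical. The output values are six-bit quantities, before the checksum step and before mapping to on-disk bytes.

```c
#include <stdint.h>

/* Swaps the two bits of a two-bit value: 01 becomes 10 and vice versa. */
static const uint8_t reverse_two_bits[4] = {0, 2, 1, 3};

/* Packs 256 source bytes into 342 six-bit values. */
void pack_six_and_two(const uint8_t source[256], uint8_t packed[342]) {
    for (int n = 0; n < 86; n++) {
        uint8_t value = (uint8_t)(reverse_two_bits[source[n] & 3] |
                                  (reverse_two_bits[source[n + 86] & 3] << 2));
        /* n + 172 only stays within the source for the first 84 values. */
        if (n < 84) {
            value |= (uint8_t)(reverse_two_bits[source[n + 172] & 3] << 4);
        }
        packed[n] = value;
    }
    for (int n = 0; n < 256; n++) {
        packed[n + 86] = source[n] >> 2;   /* high six bits */
    }
}

/* Reverses pack_six_and_two. */
void unpack_six_and_two(const uint8_t packed[342], uint8_t source[256]) {
    for (int n = 0; n < 256; n++) {
        int slot = n % 86;            /* which of the first 86 values */
        int shift = (n / 86) * 2;     /* which bit pair within that value */
        uint8_t low = reverse_two_bits[(packed[slot] >> shift) & 3];
        source[n] = (uint8_t)((packed[n + 86] << 2) | low);
    }
}
```

Because reversing two bits is its own inverse, the same small table serves both directions.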
To produce a checksum, the 342nd byte is duplicated to create a 343rd. Each of the first 342 unencoded six-bit values is then exclusive ORed with the value one before it. The first value is unaltered, and the 343rd remains a plain copy of the original 342nd.
This allows the decoder to keep a running exclusive OR tally from the start of the data to the end; it is seeded with the decoded value in byte 0, and from then on determines the true value of byte n by running from start to finish, exclusive ORing the received value with the running tally and storing the result.
As long as the final byte decoded matches the one stored on disk as the 343rd, the exclusive OR test passes.
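A minimal sketch of both directions of the checksum step, operating on six-bit values before the 6-to-8 mapping, is given below. The function and constant names are hypothetical. Starting the tally at zero and exclusive ORing in the first received value is equivalent to seeding the tally with byte 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PACKED_LENGTH = 342, CHAINED_LENGTH = 343 };

/* On entry values[0..341] hold six-bit values; on exit values[0..342] hold
   the exclusive ORed form, with values[342] acting as the check value. */
void apply_xor_chain(uint8_t values[CHAINED_LENGTH]) {
    values[PACKED_LENGTH] = values[PACKED_LENGTH - 1];
    /* Work backwards so each value meets its still-unmodified predecessor. */
    for (size_t i = PACKED_LENGTH - 1; i > 0; i--) {
        values[i] ^= values[i - 1];
    }
}

/* Recovers the original six-bit values into `decoded` and returns true if
   the final tally matches the stored check value. */
bool undo_xor_chain(const uint8_t received[CHAINED_LENGTH],
                    uint8_t decoded[PACKED_LENGTH]) {
    uint8_t tally = 0;
    for (size_t i = 0; i < PACKED_LENGTH; i++) {
        tally ^= received[i];
        decoded[i] = tally;
    }
    return tally == received[PACKED_LENGTH];
}
```

A single corrupted value anywhere in the sector alters every tally from that point on, so the final comparison fails.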
For writing to disk each six-bit value is mapped to an eight-bit value using the table:
const uint8_t six_and_two_mapping[] = {
0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};
Where six-bit value n maps to the 8-bit value six_and_two_mapping[n].
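A decoder needs the reverse lookup. One way to obtain it, sketched here as an assumption rather than as any particular implementation's method, is to build a 256-entry inverse table from the mapping at start-up. The name build_six_and_two_inverse is hypothetical, and the mapping table is repeated so that the block stands alone.

```c
#include <stdint.h>

static const uint8_t six_and_two_mapping[64] = {
    0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
    0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
    0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
    0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
    0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
    0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
    0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
    0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

/* Fills `inverse` so that inverse[disk_byte] is the six-bit value that maps
   to disk_byte, or -1 if disk_byte never appears in the table. */
void build_six_and_two_inverse(int8_t inverse[256]) {
    for (int i = 0; i < 256; i++) inverse[i] = -1;
    for (int n = 0; n < 64; n++) {
        inverse[six_and_two_mapping[n]] = (int8_t)n;
    }
}
```

The bytes 0xd5 and 0xaa, which begin every prologue, do not appear in the table, so they receive -1 and cannot be mistaken for encoded data.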
The prologue for sector data is d5, aa, ad; the epilogue is de, aa, eb, which is the same epilogue as for sector headers.
So the complete sector data is formed on disk as:
prologue d5, aa, ad
[ XOR section:
86 bytes containing combinations of the low two bits of source bytes
256 bytes containing the high six bits of source bytes
], all encoded via the 6-to-8 table
the XOR check value, which is the same as the high six bits of the final source byte, 6-to-8 encoded
epilogue de, aa, eb
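The whole data field can be assembled in one pass. The following is a sketch under the layout described in this section, not a copy of any existing encoder; write_sector_data and SECTOR_DATA_LENGTH are hypothetical names, and the mapping table is repeated so that the block stands alone.

```c
#include <stdint.h>

enum { SECTOR_DATA_LENGTH = 3 + 343 + 3 };

static const uint8_t six_and_two_mapping[64] = {
    0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
    0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
    0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
    0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
    0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
    0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
    0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
    0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

/* Encodes 256 source bytes as prologue, 343 on-disk bytes and epilogue. */
void write_sector_data(const uint8_t source[256],
                       uint8_t out[SECTOR_DATA_LENGTH]) {
    static const uint8_t reverse_two_bits[4] = {0, 2, 1, 3};
    uint8_t *values = out + 3;   /* the 343 six-bit values, built in place */

    out[0] = 0xd5; out[1] = 0xaa; out[2] = 0xad;

    /* 86 values holding the reversed low two bits of up to three bytes. */
    for (int n = 0; n < 86; n++) {
        uint8_t value = (uint8_t)(reverse_two_bits[source[n] & 3] |
                                  (reverse_two_bits[source[n + 86] & 3] << 2));
        if (n < 84) {
            value |= (uint8_t)(reverse_two_bits[source[n + 172] & 3] << 4);
        }
        values[n] = value;
    }
    /* 256 values holding the high six bits of each source byte. */
    for (int n = 0; n < 256; n++) values[n + 86] = source[n] >> 2;

    /* Duplicate the last value as the check value, then XOR backwards. */
    values[342] = values[341];
    for (int i = 341; i > 0; i--) values[i] ^= values[i - 1];

    /* Map every six-bit value, including the check value, to a disk byte. */
    for (int i = 0; i < 343; i++) values[i] = six_and_two_mapping[values[i]];

    out[346] = 0xde; out[347] = 0xaa; out[348] = 0xeb;
}
```

A sector of 256 zero bytes encodes to the prologue, 343 copies of 0x96 and the epilogue, which makes a convenient sanity check.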
Sector interleaving and file formats
Apple's '5 and 3' operating systems physically interleave sectors on the disk surface. The '6 and 2' operating systems do not; a raw reading of the disk surface would show sector 0, followed by sector 1, followed by sector 2, etc. Instead they apply an internal remapping of DOS logical sectors to on-disk physical sectors.
The most common type of Apple disk image — variously DSK, DO, PO and other extensions — is a sector-contents-only dump, containing the original sectors in logical order. So to map them back to real media a program must produce sectors in ordinary ascending order but pick sector contents from non-sequential parts of the file: e.g. for a DOS 3.3 image the first on-disk sector, labelled as sector 0, should contain the first sector from the file but the second on-disk sector, labelled as sector 1, should contain the 8th sector from the file.
Therefore the proper physical interpretation of that sort of disk image is tightly coupled to the internals of the specific software that created it.
Specifically, on-disk sectors 0 to 15 of a DOS 3.3 image (ordinarily having the extension DSK or DO) should contain the contents of the image sectors at offsets: 0, 7, 14, 6, 13, 5, 12, 4, 11, 3, 10, 2, 9, 1, 8, 15; i.e. increase by 7 at each step, and take the result modulo 15 if it is out of bounds.
The on-disk sectors 0 to 15 of a ProDOS image (ordinarily PO) should contain the sectors at offsets 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15; i.e. increase by 8 at each step, and take the result modulo 15 if it is out of bounds.
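Both sequences can be generated rather than tabulated. The sketch below uses hypothetical function names. The image_offset helper additionally assumes that the image stores tracks consecutively from track 0, with 16 sectors of 256 bytes each; that is typical of such sector dumps but is an assumption here.

```c
#include <stddef.h>

/* Image sector whose contents belong in on-disk sector `physical` (0 to 15)
   when the image is in DOS 3.3 order. Sector 15 maps to itself. */
int dos33_image_sector(int physical) {
    return physical == 15 ? 15 : (physical * 7) % 15;
}

/* As above, for an image in ProDOS order. */
int prodos_image_sector(int physical) {
    return physical == 15 ? 15 : (physical * 8) % 15;
}

/* Byte offset of a sector within the image file, assuming 16 sectors of
   256 bytes per track and tracks stored in ascending order. */
size_t image_offset(int track, int image_sector) {
    return ((size_t)track * 16 + (size_t)image_sector) * 256;
}
```

For example, the contents of on-disk sector 1 on track 0 of a DOS 3.3 image would be read from image_offset(0, dos33_image_sector(1)), which is byte 1792, the start of the 8th sector in the file.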