Attempts some minor copy improvements.

2025-08-13 00:25:26 +00:00 · 2019-06-03 22:20:15 -04:00
parent 629679b942
commit e77b6f053d
1 changed files with 32 additions and 26 deletions
--- a/Apple-GCR-disk-encoding.md
+++ b/Apple-GCR-disk-encoding.md
@@ -1,51 +1,53 @@
+_See https://github.com/TomHarte/dsk2woz/blob/master/dsk2woz.c for an implementation of '6 and 2' disk encoding._
+
 # Physical Encoding
-Apple's GCR encoding was designed in reaction to FM encoding, and uses the same data density and bit clock. However in its more efficient '6 and 2' form, it uses only two-thirds as much disk surface area as FM to encode the same data. It predates and is not as efficient as MFM encoding.
+Apple's GCR encoding was designed in reaction to FM encoding, and uses the same data density and bit clock. However in its more efficient '6 and 2' form it uses only two-thirds as much disk surface area as FM to encode the same data. It predates and is not as efficient as MFM encoding.

-Two forms of Apple GCR were used during the Apple II's lifetime: '5 and 3' and '6 and 2'. Each name refers to the number of bits of information you get from each 8 flux transition windows on a disk. '5 and 3' is therefore less efficient than '6 and 2' because only five bits of information are decoded from each eight flux windows, rather than six.
+Two forms of Apple GCR were used during the Apple II's lifetime: '5 and 3' and '6 and 2'. Each name refers to the number of bits of information you get from every eight flux transition windows on a disk. '5 and 3' is less efficient than '6 and 2': five bits of information are decoded from every eight flux windows, rather than six.

-GCR disks have a third encoding of data, '4 and 4' which is very similar to FM. It is used only for sector metadata, not for sector contents themselves.
+GCR disks have a third encoding of data, '4 and 4', which is similar to FM. It is used only for sector metadata, not for sector contents.

 ## Track Layout
 Apple's track layout derives from that of FM data: for each sector there is a header, then a gap, then the data, then another gap.

 The gaps contain synchronisation information, allowing the controller to align its read window with the on-disk data.

-A header consists of the sector's track and sector number, the disk's volume identifier, and a check value. Each of those is one byte long. The check value is a simple exclusive OR of the other values.
+A header consists of the sector's track and sector number, the disk's volume identifier, and a check byte. Each of those is one byte long. The check byte is an exclusive OR of the other values.

-The data consists of 256 bytes of information plus a check byte. As with the header, the check byte is a simple exclusive OR of the other values.
+The data consists of 256 bytes of information plus a check byte. As with the header, the check byte is an exclusive OR of the other values.

 ## Flux window content rules
-The Disk II utilises an 8-bit lsb-to-msb shift register. It shifts at the on-disk data rate. It will shift in 1s wherever a flux transition is found on the disk and 0s where the flux transition is absent.
+The Disk II utilises an 8-bit lsb-to-msb shift register. It shifts at the on-disk data rate. It will shift in 1s wherever a flux transition is found on the disk and 0s where the flux transition is absent. Encoded bytes will always have the most significant bit set and the shift register is reset to 0s if read while full.

-If the MSB of the register is 0, it will shift immediately upon detecting the flux transition. If the msb is 1, it will pause slightly before shifting in the next bit.
-
-This is to allow adherence to a rule that for the purpose of synchronisation, encoded bytes will always have the msb set. The archetypal polling loop to obtain the next byte from the Disk II is:
+The archetypal polling loop to obtain the next byte from the Disk II is:

    .loop    LDA shift_register
             BPL .loop

-A further constraint is imposed by the analogue-to-digital conversion that looks for flux transitions; its automatic gain control is prone to amplifying noise into signal if more than two consecutive flux windows pass without a transition in them.
+If the msb of the shift register is 0, it will shift immediately upon detecting the flux transition. If the msb is 1, it will pause for two bit durations before shifting in the next bit, so there is a lengthened window for consumption of a finished byte.

-Applying those two constraints — the msb set and no more than two consecutive zeros — motivates '6 and 2' encoding as the number of bytes with that property is between 64 and 128, making 6 the easiest number of bits to encode in a byte for base two.
+A further constraint is imposed by the analogue-to-digital conversion: its automatic gain control is prone to amplify noise into signal if more than two consecutive flux windows pass without a transition in them.

-Steve Wozniak who designed the '6 and 2' encoding had previously been under the impression that the rule was that there could be no more than a single consecutive zero bit; the less-efficient '5 and 3' encoding is the result of conforming to that stricter constraint.
+Applying those two constraints — the msb set and no more than two consecutive zeros — motivates '6 and 2' encoding as the number of usable bytes is a little more than 64.
+
+Steve Wozniak designed the '6 and 2' encoding but had previously been under the impression there could be no more than a single consecutive zero bit; the less-efficient '5 and 3' encoding is the result of conforming to that stricter constraint.
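
The two counts behind that reasoning can be checked by brute force. The following is a minimal sketch that takes the rules exactly as stated in this section (msb set, and a limit on consecutive zero bits within the byte); `longest_zero_run` and `count_usable_bytes` are illustrative names, not part of any existing code.

```c
#include <stdint.h>

/* Longest run of consecutive zero bits within one byte. */
static int longest_zero_run(uint8_t byte) {
    int longest = 0, current = 0;
    for (int bit = 7; bit >= 0; bit--) {
        if (byte & (1 << bit)) {
            current = 0;
        } else if (++current > longest) {
            longest = current;
        }
    }
    return longest;
}

/* Count the bytes with the msb set (0x80..0xff) whose longest
   run of zero bits does not exceed max_zero_run. */
int count_usable_bytes(int max_zero_run) {
    int count = 0;
    for (int value = 0x80; value <= 0xff; value++) {
        if (longest_zero_run((uint8_t)value) <= max_zero_run) count++;
    }
    return count;
}
```

Under these assumptions `count_usable_bytes(2)` gives 81, enough for the 64 values that six bits need, and `count_usable_bytes(1)` gives 34, enough only for the 32 values that five bits need. Because every byte starts with a 1, a run of zeros can never continue across a byte boundary, so checking each byte in isolation is sufficient.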

 ## Sync words
-As above, sync words lie in the gaps between sectors and between the header and data parts of sectors.
+Sync words lie in the gaps between sectors and between the header and data parts of sectors.

-A sync word is simply an `ff` encoded byte followed by as many zeroes as the content rules will allow: a single zero for '5 and 3', or two zeroes for '6 and 2'.
+A sync word is an `ff` followed by as many zeroes as the content rules will allow: a single zero for '5 and 3' and two zeroes for '6 and 2'.

 Given the top-bit-set rule, a series of sync words has the effect of bringing a CPU polling loop as above into phase with the start of each sync word.

 # '4 and 4' Encoding
 Sector header content is encoded in '4 and 4' form regardless of the encoding in use for sector contents.

-'4 and 4' encoding encodes the source byte b, with bits b7, b6, b5 ... b0 as the two on-disk bytes:
+'4 and 4' encoding encodes the source byte b, with bits b7, b6, b5 ... b0, as the two on-disk bytes:

    1 b7 1 b5 1 b3 1 b1
    1 b6 1 b4 1 b2 1 b0

-Which is equivalent to FM encoding other than in bit order. The bits are ordered different to allow for efficient decoding:
+This is equivalent to FM encoding except in bit order. The bit order differs from FM to allow for efficient decoding:

    (((1 b7 1 b5 1 b3 1 b1) << 1) | 1) & (1 b6 1 b4 1 b2 1 b0) = original byte
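
As a rough illustration, the bit layout and the decoding expression above can be written in C as follows. The function names `encode_4_and_4` and `decode_4_and_4` are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Split one source byte into its two on-disk bytes. */
void encode_4_and_4(uint8_t source, uint8_t out[2]) {
    out[0] = (uint8_t)((source >> 1) | 0xaa);  /* 1 b7 1 b5 1 b3 1 b1 */
    out[1] = (uint8_t)(source | 0xaa);         /* 1 b6 1 b4 1 b2 1 b0 */
}

/* Recombine two on-disk bytes into the original byte: shift the
   first left by one, fill the vacated bit with 1, then AND. */
uint8_t decode_4_and_4(const uint8_t in[2]) {
    return (uint8_t)(((in[0] << 1) | 1) & in[1]);
}
```

For example, the default volume value 254 (`0xfe`) encodes as the pair `0xff`, `0xfe`, and decoding that pair returns `0xfe`.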

@@ -58,23 +60,25 @@ A complete sector header is formed on disk as:
    two bytes: '4 and 4' encoded check value — the exclusive OR of (volume, track, sector)
    three bytes epilogue: 0xde, 0xaa, 0xeb

-`track` counts upwards from zero for the outermost track. `sector` counts upward from zero for the first sector on a track.
+`track` counts upwards from zero, zero being the outermost track. `sector` counts upwards from zero, zero being the first sector on any track.

-`volume` has at least two context-dependent meanings. In both of Apple's operating systems it defaults to `254` for Disk II-compatible media and in Pro DOS is used to confirm the volume *type*. Some software prefers to use it as volume *number*, for distinguishing different disks or sides of a disk. It should be written as `254` unless there is a reason to do otherwise.
+`volume` has at least two context-dependent meanings. In both of Apple's operating systems it defaults to `254` for Disk II-compatible media. In ProDOS it is used to confirm the volume *type*. Some software prefers to use it as a volume *number*, for distinguishing different disks or sides of a disk. It should be written as `254` unless there is a reason to do otherwise.

 In principle the entire disk contents could have been encoded in '4 and 4' form, to give exactly the same data density as FM encoding. In practice '5 and 3' encoding was the first to be deployed; '5 and 3' was phased out in favour of '6 and 2' in 1980.

 # '6 and 2' Encoding
 Apple's second deployed sector data encoding fits six data bits into every on-disk byte.

+An entire buffer of 256 bytes is encoded into 342 bytes as an atomic unit.
+
 In overview:
-* the two lowest bits are taken from each of the 256 source bytes;
-* the remaining six bits for each of the source bytes fill the final 256 on-disk bytes of the sector;
-* 86 bytes before those 256 contain the separated low bits, so the total data size is 256+86=342 bytes;
-* an exclusive OR checksum is used, but to reduce decoding time it is applied within the six-bit data rather than as a completely orthogonal field, as described below; and
+* each of the source bytes is cut into two parts: its highest 6 bits and its lowest two;
+* the first 86 bytes of the encoded sector are used to keep the lowest two bits of all bytes;
+* the remaining portions of six bits fill the final 256 on-disk bytes of the sector;
+* an exclusive OR checksum is used, but to reduce decoding time it is applied within the six-bit data as described below; and
 * a three-byte prologue and a three-byte epilogue are applied.

-The first 84 bytes after the prologue are consistently formed as:
+After decoding, the first 84 bytes after the prologue are formed as:

    byte n = {
        bits 4 & 5: low two bits of source byte n + 172, reversed
@@ -89,11 +93,13 @@ n+172 would be out of bounds for the 85th and 86th bytes, so they are formed as:
        bits 0 & 1: low two bits of source byte n, reversed
    }

-From the 87th byte onwards the content is then:
+From the 87th byte onwards the content is:

    byte n+87 = high six bits of source byte n

-For checksumming purposes, the 342nd byte is duplicated to create a 343rd. Each six-bit value is then exclusive ORd with the value one before it. The first value is unaltered. This allows the decoder to keep a running exclusive OR tally from the start of the data to the end; it is seeded with the decoded value in byte 0, and from then on determines the true value of byte n by running from start to finish, exclusive ORing the received value with the running tally and storing the result.
+To produce a checksum, the 342nd byte is duplicated to create a 343rd. Each unencoded six-bit value is exclusive ORed with the value one before it. The first value is unaltered.
+
+This allows the decoder to keep a running exclusive OR tally from the start of the data to the end; it is seeded with the decoded value in byte 0, and from then on determines the true value of byte n by running from start to finish, exclusive ORing the received value with the running tally and storing the result.

 As long as the final byte decoded matches the one stored on disk as the 343rd, the exclusive OR test passes.
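
The exclusive OR chain described above might be sketched as below. This is one reading of the text rather than a definitive implementation: it assumes the 343rd value is stored as a plain copy of the 342nd, and that the chaining happens on six-bit values before they are translated to on-disk bytes. `GCR_VALUES`, `xor_chain_encode` and `xor_chain_decode` are names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GCR_VALUES 342

/* Chain 342 six-bit values and append the check value, giving 343. */
void xor_chain_encode(const uint8_t values[GCR_VALUES],
                      uint8_t out[GCR_VALUES + 1]) {
    out[0] = values[0];                        /* first value is unaltered */
    for (size_t i = 1; i < GCR_VALUES; i++) {
        out[i] = values[i] ^ values[i - 1];    /* XOR with the value before */
    }
    out[GCR_VALUES] = values[GCR_VALUES - 1];  /* assumed: plain duplicate */
}

/* Undo the chain with a running tally; returns true if the final
   decoded value matches the stored 343rd value. */
bool xor_chain_decode(const uint8_t in[GCR_VALUES + 1],
                      uint8_t values[GCR_VALUES]) {
    uint8_t tally = 0;
    for (size_t i = 0; i < GCR_VALUES; i++) {
        tally ^= in[i];     /* seeded by value 0, then accumulates */
        values[i] = tally;
    }
    return tally == in[GCR_VALUES];
}
```

A single corrupted value changes every later tally, so it shows up as a mismatch against the final check value without a separate pass over the data.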

@@ -125,7 +131,7 @@ So the complete sector data is formed on disk as:
    epilogue de, aa, eb

 # Sector interleaving and file formats
-Apple's '5 and 3' operating systems physically interleave sectors on the disk surface. The `6 and 2` operating systems do not; a raw reading of the disk surface would show sector 0, followed by sector 1, followed by sector 2, etc. Instead they apply an internal remapping of DOS logical sectors to on-disk physical sectors.
+Apple's '5 and 3' operating systems physically interleave sectors on the disk surface. The '6 and 2' operating systems do not; a raw reading of the disk surface would show sector 0, followed by sector 1, followed by sector 2, etc. Instead they apply an internal remapping of DOS logical sectors to on-disk physical sectors.

 The most common type of Apple disk image — variously DSK, DO, PO and other extensions — is a sector-contents-only dump, containing the original sectors in logical order. So to map them back to real media a program must produce sectors in ordinary ascending order but pick sector contents from non-sequential parts of the file: e.g. for a DOS 3.3 image the first on-disk sector, labelled as sector 0, should contain the first sector from the file but the second on-disk sector, labelled as sector 1, should contain the 8th sector from the file.
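
A minimal sketch of that lookup for a DOS 3.3 order image follows. The multiply-by-7 rule reproduces the example above (physical sector 1 takes the 8th sector in the file); applying it to every sector, along with the 16 sectors per track and 256 bytes per sector used for the offset, are assumptions of this example. The function names are hypothetical.

```c
#include <stddef.h>

/* Map a physical sector number (0..15) to the DOS 3.3 logical
   sector whose contents it should hold. */
size_t dos33_logical_sector(size_t physical) {
    return (physical == 15) ? 15 : (physical * 7) % 15;
}

/* Byte offset within a sector-contents-only image of the data
   for a given track and physical sector. */
size_t dos33_file_offset(size_t track, size_t physical) {
    return (track * 16 + dos33_logical_sector(physical)) * 256;
}
```

With this mapping, physical sectors 0 to 15 draw from logical sectors 0, 7, 14, 6, 13, 5, 12, 4, 11, 3, 10, 2, 9, 1, 8, 15. ProDOS order images use a different mapping and are not covered by this sketch.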