Don Worth's Beneath Apple DOS extracted from his Apple II floppies

Don Worth's Beneath Apple DOS

Don Worth wrote a very cool book for the Apple II. Actually, he wrote several, but here is one of them that I happened to need. He found a bunch of his disks containing the original text in his garage, and he was happy to have them released into the hands of whoever might want to use them. Since the OCR versions of this book are ... less than great ... I've decided to try to convert his originals.

The Goal

I'd like to see a proper version of this book. Text, figures, all of it. That is not going to be trivial, but it starts with clean text. We don't have that on archive.org yet, but perhaps we can fix that? Please feel free to join in--send patches, help add stuff, etc.

The Method

Documenting this so other texts can be converted the same way in future...

First we need to extract the text documents from the disks and turn them into something we can use on a modern system:

  1. Dump the DOS 3.3 disks using cppo
  2. Apply scripts/extract_piewriter.py to each document file, which does the following transformations:
    • For characters 0xa0-0xfe, strip the high bit to get pure ASCII
    • Convert 0x0d and 0x8d (return) characters to 0x0a (newline)
    • Escape all else in C-style
  3. Remove the NUL at the end of the .txt files and rename the assembly source to .s
  4. Remove trailing whitespace
  5. Normalize dot commands (lowercase, spacing) for easier mechanical parsing
  6. Remove the obvious dot commands (.pp is a paragraph break, .sp creates vertical space, .br seems to be a line break, .bp a page break) and attempt to remove or interpret others as seems appropriate
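
For anyone repeating this on other disks, here is a rough Python sketch of steps 2, 4, 5 and 6. It is not the actual scripts/extract_piewriter.py. The function names, the `\xNN` escape form, the rule for what counts as a dot command, and the blank-line rendering of .pp, .sp and .bp are all assumptions made for illustration.

```python
import re


def decode_piewriter(raw: bytes) -> str:
    """Step 2: turn Apple II high-bit text bytes into ASCII."""
    out = []
    for b in raw:
        if b in (0x0D, 0x8D):        # return, with or without the high bit
            out.append("\n")
        elif 0xA0 <= b <= 0xFE:      # high-bit printable: strip bit 7
            out.append(chr(b & 0x7F))
        else:                        # everything else: C-style escape (assumed \xNN)
            out.append(f"\\x{b:02x}")
    return "".join(out)


# Assumption: a dot command is a line starting with "." followed by letters.
DOT_COMMAND = re.compile(r"^\s*\.\s*([A-Za-z]+)\s*(.*)$")


def normalize_dot_command(line: str) -> str:
    """Step 5: lowercase the command name and collapse its spacing."""
    m = DOT_COMMAND.match(line)
    if not m:
        return line
    name = m.group(1).lower()
    args = " ".join(m.group(2).split())
    return f".{name} {args}".rstrip()


def clean_text(text: str) -> str:
    """Steps 4 and 5: strip trailing whitespace, then normalize dot commands."""
    lines = [normalize_dot_command(line.rstrip()) for line in text.split("\n")]
    return "\n".join(lines)


def drop_simple_commands(lines):
    """Step 6 for the four obvious commands only; others pass through."""
    out = []
    for line in lines:
        name = line.split(" ", 1)[0]
        if name in (".pp", ".sp", ".bp"):
            out.append("")           # assumed: render the break as a blank line
        elif name == ".br":
            continue                 # assumed: the newline already breaks the line
        else:
            out.append(line)
    return out
```

Anything `drop_simple_commands` does not recognize is left in place, so the remaining dot commands can still be handled by hand.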

This process has probably broken the .s files. Some files don't appear to have actually been part of the text (or maybe they were edits and revisions?). There was also bitrot in the files, suggesting that the disks the source documents were stored on were losing their integrity.
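
One possible way to hunt for bitrot candidates is to look for escape sequences left behind by the extraction step, since stray non-text bytes end up escaped. The sketch below assumes escapes look like `\xNN` with lowercase hex digits (the real script may use a different form), and `find_suspect_lines` is a made-up helper name.

```python
import re

# Assumed escape form: backslash, "x", two lowercase hex digits.
ESCAPE = re.compile(r"\\x[0-9a-f]{2}")


def find_suspect_lines(text: str):
    """Return (line_number, line) pairs that still contain an escape."""
    return [
        (number, line)
        for number, line in enumerate(text.split("\n"), start=1)
        if ESCAPE.search(line)
    ]
```

A hit is only a hint: the byte could be bitrot, or it could be a control code the word processor used on purpose, so each one still needs a human look.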