Add doc/About.md describing the content of the doc/ directory, as well as
the procedure used when creating the new scans.
Philip Zembrod 2024-08-04 13:40:59 +02:00
parent 9be38bc7ad
commit c0d0e019f5

doc/About.md Normal file

@ -0,0 +1,122 @@
# About the Scanned Manuals
This directory's main content is scanned versions of the original German
manuals of VolksForth from the 80s. There were 4 main flavours of VolksForth
and accordingly 4 manuals: C64/C16/Plus4, Atari ST, CP/M and MSDOS.
The manuals for C64/C16/Plus4, Atari ST and MSDOS have recently been rescanned
and OCR-ed. For the CP/M manual we have an older scan and an almost complete
[Org Mode](https://orgmode.org/) transcript. A partial Org Mode transcript also
exists for the MSDOS manual.
Based on the different text versions of the different manuals (transcripts,
sidecar files from `ocrmypdf`), an English translation of the manual has been
started in the 6502/C64/doc directory for the C64 3.9.6 release. Eventually
this is intended to result in a unified manual for all versions.
Note: The mix of Org Mode and Markdown in documents here stems from the
different preferences or past habits of different contributors.
## VolksForth CBM 3.80 Manual
The [doc/cbm/](cbm) directory contains the German manual for the C64/C16/Plus4
VolksForth version 3.80.
* [vf-cbm-380-manual-de.pdf](cbm/vf-cbm-380-manual-de.pdf) is the scanned and
OCR-ed PDF.
* [vf-cbm-380-manual-de.sidecar.txt](cbm/vf-cbm-380-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](cbm/raw-scans) contains the raw PDF files as produced by the
scanner from the paper original.
## VolksForth Atari ST 3.80 Manual
The [doc/atari-st/](atari-st) directory contains the German manual for the
Atari ST VolksForth version 3.80.
* [vf-atari-st-380-manual-de.pdf](atari-st/vf-atari-st-380-manual-de.pdf) is
the scanned and OCR-ed PDF.
* [vf-atari-st-380-manual-de.sidecar.txt](atari-st/vf-atari-st-380-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](atari-st/raw-scans) contains the raw PDF files as produced by the
scanner from the paper original.
* [LIESMICH.TXT](atari-st/LIESMICH.TXT) is an overview, in German,
of VolksForth and of the files that come with the Atari ST version.
Note: The .SCR files are Forth screen files, i.e. sources, and they have
since been renamed to .FB (for Forth Block source).
* [README.TXT](atari-st/README.TXT) is the same, in English.
* [CHANGES.ORG](atari-st/CHANGES.ORG) is a change log, in German, between
versions 3.7 and 3.80.
## VolksForth CP/M 3.80 Manual
The [doc/cpm/](cpm) directory contains the German manual for the CP/M
VolksForth version 3.80. Note that the CP/M VolksForth was shipped with the
C64/C16/Plus4 manual, and the CP/M manual only describes the CP/M VolksForth's
differences compared to the C64 etc. version.
* [VolksForth-3.80-CPM.pdf](cpm/VolksForth-3.80-CPM.pdf) is the scanned
and OCR-ed PDF.
* [readme.org](cpm/readme.org) is a transcript of the scanned PDF. Note that
the order of the chapters differs slightly between scan and transcript.
## VolksForth MSDOS 3.81 Manual
The [doc/msdos/](msdos) directory contains the German manual for the MSDOS
VolksForth version 3.81.
* [vf-msdos-381-manual-de.pdf](msdos/vf-msdos-381-manual-de.pdf) is the scanned
and OCR-ed PDF.
* [vf-msdos-381-manual-de.sidecar.txt](msdos/vf-msdos-381-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](msdos/raw-scans) contains the raw PDF files as produced by the
scanner from the paper original.
* [LIESMICH.TXT](msdos/LIESMICH.TXT) is a partial transcript of the scanned
PDF.
* [README.TXT](msdos/README.TXT) is the beginning of a cross-platform overview
of VolksForth, in English.
## Scanning and OCR notes
For the record, this is the procedure used to create the 3 newly scanned PDFs:
The scans were made from 3 printed manual copies in mint condition; the manuals
are in A5 format.
The scanner used is an HP Color LaserJet MFP M477fdn which has a document feeder
with two-sided scanning ability, and a fixed A4 scanning size.
Since a full VolksForth manual exceeds the capacity of the feeder,
each manual was split into 3 batches; the resulting A4 PDFs are now sitting
in the `raw-scans/` directories.
The raw scans `scan0000.pdf` to `scan0002.pdf` were concatenated and cropped
using the Linux GUI tool `pdfarranger` (version 1.4.2). Steps:
* Drag & drop all files from `raw-scans/` into the `pdfarranger` window.
* Press Ctrl-A to select all pages.
* Edit -> Crop
  * Set lower margin to 29% (1 - 1 / sqrt(2)).
  * Set left and right margin to 14.5% (29% / 2).
  * Click "OK".
* Edit -> Edit Properties
  * Set Creator to "Forth Gesellschaft e.V." (in other PDF viewers this is
    displayed as the Author property).
* Save as "newly-cropped.pdf"
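
The 29% and 14.5% margins in the Crop step follow from the paper formats: A5 and A4 share the same aspect ratio, so an A5 page covers 1/sqrt(2) of each A4 dimension. A minimal sketch of that arithmetic, assuming `awk` is available; the function name `crop_margins` is made up for illustration:

```shell
# Hypothetical helper: compute the crop fractions for an A5 page on an A4 scan.
# Assumption: the page sits at the top of the scan area and is centred
# horizontally, as implied by the margins used in the procedure.
crop_margins() {
  LC_ALL=C awk 'BEGIN {
    total = 1 - 1 / sqrt(2)   # fraction of each A4 dimension not covered by A5
    printf "lower=%.1f%% side=%.1f%%\n", total * 100, total * 100 / 2
  }'
}

crop_margins   # prints: lower=29.3% side=14.6%
```

The procedure rounds these values to 29% and 14.5%.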
The final searchable PDF was created from the intermediate `newly-cropped.pdf`
by adding an OCR text layer using OCRmyPDF:
```
ocrmypdf -l deu -d -c -i newly-cropped.pdf vf-<version>-manual-de.pdf --sidecar vf-<version>-manual-de.sidecar.txt
```
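
For readers unfamiliar with the short options, the sketch below prints the same invocation with OCRmyPDF's long option names: `--language` (`-l`), `--deskew` (`-d`), `--clean` (`-c`) and `--clean-final` (`-i`). The helper `ocr_command` is hypothetical and only prints the command rather than running it; its argument stands for the `<version>` part of the file name, for example `cbm-380`.

```shell
# Hypothetical dry-run helper: print the OCRmyPDF command for one manual.
# $1 is the <version> part of the output file name, e.g. "cbm-380".
ocr_command() {
  version="$1"
  printf 'ocrmypdf --language deu --deskew --clean --clean-final %s %s --sidecar %s\n' \
    "newly-cropped.pdf" \
    "vf-${version}-manual-de.pdf" \
    "vf-${version}-manual-de.sidecar.txt"
}

ocr_command cbm-380
```

Printing the command first makes it easy to check the file names before starting a long OCR run.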
The sidecar file contains the OCR-ed text added into the text layer and is
expected to be useful as input for a machine-aided translation of the manual
into English.
A note about PDF versions: The raw scans are PDF-1.4. `pdfarranger` outputs
PDF-1.3, which seems to cause problems (error 14) when opening files with
Adobe Acrobat. `ocrmypdf` produces PDF/A-2b, which does not seem to cause these
problems.
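
One way to check which PDF version a file declares is to read its header, since a PDF file starts with a marker such as `%PDF-1.4`. A minimal sketch, assuming a `head` that supports `-c`; the function name `pdf_version` is made up for illustration:

```shell
# Hypothetical helper: print the version header of a PDF file.
# The first 8 bytes of a PDF are its header, e.g. "%PDF-1.4".
pdf_version() {
  head -c 8 "$1"
  echo
}
```

For example, `pdf_version newly-cropped.pdf` would show the version written by `pdfarranger`. This reads only the header and does not validate the file.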