Merge branch 'manuals-ocr' into x16-r47

Philip Zembrod
2024-08-04 14:28:53 +02:00
7 changed files with 44434 additions and 0 deletions

122
doc/About.md Normal file

@@ -0,0 +1,122 @@
# About the Scanned Manuals
This directory's main content is scanned versions of the original German
manuals of VolksForth from the 80s. There were 4 main flavours of VolksForth
and accordingly 4 manuals: C64/C16/Plus4, Atari ST, CP/M and MSDOS.
The manuals for C64/C16/Plus4, Atari ST and MSDOS have recently been rescanned
and OCR-ed. Of the CP/M manual we have an older scan and an almost complete
[Org Mode](https://orgmode.org/) transcript. A partial Org Mode transcript also
exists for the MSDOS manual.
Based on the text versions of the different manuals (transcripts and
sidecar files from `ocrmypdf`), work on an English manual has started
in the 6502/C64/doc directory for the C64 3.9.6 release. Eventually
this is intended to result in a unified manual for all versions.
Note: The mix of Org Mode and Markdown in documents here stems from the
different preferences or past habits of different contributors.
## VolksForth CBM 3.80 Manual
The [doc/cbm/](cbm) directory contains the German manual for the C64/C16/Plus4
VolksForth version 3.80.
* [vf-cbm-380-manual-de.pdf](cbm/vf-cbm-380-manual-de.pdf) is the scanned and
OCR-ed PDF.
* [vf-cbm-380-manual-de.sidecar.txt](cbm/vf-cbm-380-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](cbm/raw-scans) contains the raw PDF files as produced by the
scanner from the paper original.
## VolksForth Atari ST 3.80 Manual
The [doc/atari-st/](atari-st) directory contains the German manual for the
Atari ST VolksForth version 3.80.
* [vf-atari-st-380-manual-de.pdf](atari-st/vf-atari-st-380-manual-de.pdf) is
the scanned and OCR-ed PDF.
* [vf-atari-st-380-manual-de.sidecar.txt](atari-st/vf-atari-st-380-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](atari-st/raw-scans) contains the raw PDF files as produced by
the scanner from the paper original.
* [LIESMICH.TXT](atari-st/LIESMICH.TXT) is an overview, in German,
of VolksForth and of the files that come with the Atari ST version.
Note: The .SCR files are Forth screen files, i.e. sources, and they have
since been renamed to .FB (for Forth Block source).
* [README.TXT](atari-st/README.TXT) is the same, in English.
* [CHANGES.ORG](atari-st/CHANGES.ORG) is a change log, in German, between
versions 3.7 and 3.80.
## VolksForth CP/M 3.80 Manual
The [doc/cpm/](cpm) directory contains the German manual for the CP/M
VolksForth version 3.80. Note that the CP/M VolksForth was shipped with the
C64/C16/Plus4 manual, and the CP/M manual only describes the CP/M VolksForth's
differences compared to the C64 etc. version.
* [VolksForth-3.80-CPM.pdf](cpm/VolksForth-3.80-CPM.pdf) is the scanned
and OCR-ed PDF.
* [readme.org](cpm/readme.org) is a transcript of the scanned PDF. Note that
the order of the chapters differs slightly between scan and transcript.
## VolksForth MSDOS 3.81 Manual
The [doc/msdos/](msdos) directory contains the German manual for the MSDOS
VolksForth version 3.81.
* [vf-msdos-381-manual-de.pdf](msdos/vf-msdos-381-manual-de.pdf) is the scanned
and OCR-ed PDF.
* [vf-msdos-381-manual-de.sidecar.txt](msdos/vf-msdos-381-manual-de.sidecar.txt)
is the sidecar text output generated by
[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)'s option `--sidecar`.
* [raw-scans/](msdos/raw-scans) contains the raw PDF files as produced by the
scanner from the paper original.
* [LIESMICH.TXT](msdos/LIESMICH.TXT) is a partial transcript of the scanned
PDF.
* [README.TXT](msdos/README.TXT) is the start of a cross-platform overview of
VolksForth, in English.
## Scanning and OCR notes
For the record, this is the procedure used to create the 3 newly-scanned PDFs:
The scans were made from 3 printed manual copies in mint condition; the manuals
are in A5 format.
The scanner used is an HP Color LaserJet MFP M477fdn, which has a document
feeder with two-sided scanning ability and a fixed A4 scanning size.
Since a full VolksForth manual exceeds the capacity of the feeder,
each manual was split into 3 batches; the resulting A4 PDFs are now sitting
in the `raw-scans/` directories.
The raw scans `scan0000.pdf` to `scan0002.pdf` were concatenated and cropped
using the Linux GUI tool `pdfarranger` (version 1.4.2). Steps:
* Drag & drop all files from `raw-scans/` into the `pdfarranger` window.
* Press ctrl-A to select all pages.
* Edit -> Crop
* Set lower margin to 29% (1 - 1 / sqrt(2)).
* Set left and right margin to 14.5% (29% / 2).
* Click "OK".
* Edit -> Edit Properties
* Set Creator to "Forth Gesellschaft e.V." (in other PDF viewers this is
displayed as the Author property).
* Save as "newly-cropped.pdf"
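
The crop percentages in the steps above follow from the ISO 216 paper sizes, where each side of A5 is the matching side of A4 scaled by 1 / sqrt(2). As a quick check, the minimal sketch below recomputes them. The function name `crop_margins` is made up for illustration, and the split assumes the A5 page sat at the top edge and horizontally centred in the A4 scan area, which is what the chosen margins suggest. The exact values are slightly above the rounded 29% and 14.5% used in the steps.

```shell
# Hypothetical helper: crop margins for cutting an A5 page out of an A4 scan.
# Assumption: the A5 page is top-aligned and horizontally centred on the A4 area.
crop_margins() {
    awk 'BEGIN {
        # Fraction of each A4 side that is not covered by the A5 page.
        total = 1 - 1 / sqrt(2)
        # All of the vertical excess goes to the lower margin,
        # the horizontal excess is split evenly between left and right.
        printf "lower=%.2f%% left=%.2f%% right=%.2f%%\n", total * 100, total * 50, total * 50
    }'
}

crop_margins   # prints: lower=29.29% left=14.64% right=14.64%
```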
The final searchable PDF was created from the intermediate `newly-cropped.pdf`
by adding an OCR text layer using OCRmyPDF:
```
ocrmypdf -l deu -d -c -i newly-cropped.pdf vf-<version>-manual-de.pdf --sidecar vf-<version>-manual-de.sidecar.txt
```
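For readers unfamiliar with the short options, the same call can be spelled with OCRmyPDF's long option names. The sketch below only prints the command instead of running it. The helper name `ocr_command` is hypothetical, and passing the `<version>` part as an argument (for example `cbm-380`, inferred from the file names listed in the sections above) is an assumption made for illustration.

```shell
# Hypothetical helper: print the OCR command for one manual without running it.
# $1 is the <version> part of the file name, e.g. "cbm-380" or "msdos-381".
ocr_command() {
    # --language deu : use the German OCR language data (-l deu)
    # --deskew       : straighten slightly rotated pages (-d)
    # --clean        : clean scanning artifacts from pages before OCR (-c)
    # --clean-final  : also use the cleaned pages in the output PDF (-i)
    printf 'ocrmypdf --language deu --deskew --clean --clean-final %s %s --sidecar %s\n' \
        newly-cropped.pdf "vf-$1-manual-de.pdf" "vf-$1-manual-de.sidecar.txt"
}

ocr_command cbm-380
```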
The sidecar file contains the OCR-ed text added into the text layer and is
expected to be useful as input for a machine-aided translation of the manual
into English.
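
When feeding the sidecar text to a translation tool, it may help to work page by page. The sketch below assumes that the sidecar file separates pages with form feed characters, which is how OCRmyPDF's sidecar output is understood to be laid out; it is worth verifying this on the actual files. The function name `sidecar_page` is made up for illustration.

```shell
# Hypothetical helper: print the text of one page from a sidecar file.
# Assumption: pages are separated by form feed characters (\f).
# $1 = sidecar file, $2 = 1-based page number
sidecar_page() {
    # Treat each form-feed-delimited chunk as one awk record and print
    # only the record whose number matches the requested page.
    awk -v RS='\f' -v page="$2" 'NR == page { printf "%s", $0 }' "$1"
}

# Usage, e.g.:
#   sidecar_page vf-cbm-380-manual-de.sidecar.txt 12
```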
A note about PDF versions: The raw scans are PDF-1.4; `pdfarranger` outputs
PDF-1.3, which seems to cause problems (error 14) when opening files with
Adobe Acrobat. `ocrmypdf` produces PDF/A-2b, which does not seem to cause these
problems.
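
To see which PDF version a given file declares, its header can be inspected, since a PDF file starts with a `%PDF-x.y` marker. The following is a minimal sketch with a made-up function name `pdf_version`. It reads only the header, so it would not notice a newer version declared elsewhere in the file, and it says nothing about PDF/A conformance.

```shell
# Hypothetical helper: print the version from a PDF file's "%PDF-x.y" header.
# $1 = path to the PDF file
pdf_version() {
    # The header occupies the first 8 bytes, e.g. "%PDF-1.4".
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}

# Usage, e.g.:
#   pdf_version raw-scans/scan0000.pdf
#   pdf_version newly-cropped.pdf
```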
