We're using the latest community GS/OS release as a convenient example of a
large ProDOS volume with lots of directory structure, forks, and at least a few
files that can be identified by eyeball as having been extracted correctly.
Then I went looking for a DOS 3.3 disk with T (text) type files on it to
verify that those were being extracted correctly. I randomly stumbled
across The Correspondent 4.4, which has them but also had some surprises
for us: it crashed cppo! Removing the bashbyter functions and ensuring a
variable gets initialized was enough to let cppo cat and dump the disk. We
do nothing special about filenames made up entirely of control characters,
however, so they'll print incorrectly and extract with non-printable names
on your filesystem. Such names are legal (if crazy) on most UNIX
filesystems. I have _no idea_ whether or how to handle them on macOS or
Windows.
The old way involved a lot more sequence duplication. Now we just turn
the bytes object into a mutable bytearray, iterate through the mask and
change what we need, then convert it back to bytes.
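
A minimal sketch of that bytes -> bytearray -> bytes round trip. The
function name and the shape of the mask (one true/false flag per byte,
true meaning "lowercase this byte") are assumptions made for
illustration; cppo's real mask handling may look different.

```python
def lowercase_masked(name, mask):
    """Return a copy of `name` (bytes) with flagged bytes lowercased.

    `mask` is assumed to be an iterable of flags, one per byte of `name`.
    This is an illustrative helper, not cppo's actual function.
    """
    buf = bytearray(name)            # mutable copy, made once
    for i, flag in enumerate(mask):
        if flag and i < len(buf):
            # bytes.lower() only affects ASCII letters
            buf[i] = bytes((buf[i],)).lower()[0]
    return bytes(buf)                # back to an immutable bytes object
```

Only one mutable copy is made, however many bytes the mask touches.
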
A few clumsy if statements got rewritten, with truth tables used to
verify that the somewhat simplified conditions still evaluate the same.
Other changes are mostly cosmetic.
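
The truth-table check can also be done mechanically. A sketch, using a
made-up pair of conditions (these are not the actual conditions from
cppo): enumerate every combination of inputs and compare the old and
new forms.

```python
from itertools import product

def original(a, b, c):
    # Hypothetical "clumsy" nested form
    if a:
        if b:
            return True
        else:
            if c:
                return True
            return False
    return False

def simplified(a, b, c):
    # Hypothetical simplified form of the same condition
    return a and (b or c)

def equivalent(f, g, nargs):
    """True if f and g agree on every row of the truth table."""
    return all(
        bool(f(*row)) == bool(g(*row))
        for row in product((False, True), repeat=nargs)
    )
```

With three inputs that is only eight rows, so checking all of them is
cheap.
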
Originally cppo was written as a shell script and was never intended to
be a library of functions to be used by anyone else. Single-file Python
modules are often written so that they can also be run as standalone
programs, either to do their job from the command line or for testing
purposes. In theory you make a module importable like that when your
code provides useful stuff for other programs to use, and it's hard to
argue that cppo does that yet, but it is intended to do so in the
future. Let's start working toward that.
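
The usual Python idiom for a file that is both an importable module and
a standalone program is a check on __name__. A minimal sketch; the
function names and messages are hypothetical and do not reflect cppo's
real interface.

```python
import sys

def describe(path):
    """Hypothetical library function that other programs could import."""
    return "would extract files from " + path

def main(args):
    """Hypothetical command-line entry point."""
    if not args:
        print("usage: example.py imagefile", file=sys.stderr)
        return
    for path in args:
        print(describe(path))

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when imported.
    main(sys.argv[1:])
```
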
A few of my local copies of cppo have some/most of the code reformatted
in a more "pythonic" coding style. I still use hard tabs for
indentation because even us diehard console-using developers have
editors that can have whatever tabstop we want on a per-file basis, and
we have editorconfig. It's 2017, get with the times, even for a program
made for accessing files for a 1977 computer! ;) No functional changes
here, except that the if statement processing extensions now replaces
multiple conditionals with an "if x in tuple" construct.
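
For illustration, the membership-test form looks roughly like this. The
function name and the list of extensions are examples only, not
necessarily the ones cppo checks.

```python
def is_image_extension(ext):
    """Return True if `ext` looks like a disk image extension.

    Illustrative only: the extension list is an assumption.
    """
    # Before: if ext == "dsk" or ext == "do" or ext == "po" or ...
    # After: one membership test against a tuple
    return ext.lower() in ("dsk", "do", "po", "2mg")
```
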
Python's native sequence slicing takes a start with optional stop and
step. This is sometimes exactly what you want, but especially when
parsing binary files, you're gonna want start/length instead. If start
is an expression, seq[start:start+length] repeats it and gets messy.
In cppo, there's a function slyce that returns a sliced sequence using
start/length/step parameters, and it is used exclusively for slicing
sequences. Except sometimes you really want Python's start/stop...
I figure: Let's do it Python's way with the slicing syntax, but instead
of seq[start:start+length], you can use sli(): seq[sli(start,length)].
It's not currently used that way, but it now can be. :)
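
A sketch of what such a helper can look like, built on Python's slice()
builtin. Only sli(start, length) is described above; the default length
and the optional step argument are assumptions here.

```python
def sli(start, length=1, step=None):
    """Build a slice object from start/length instead of start/stop."""
    return slice(start, start + length, step)

data = bytes(range(16))

# Same result as data[4:4 + 3], but start is written only once
chunk = data[sli(4, 3)]
```

Because sli() returns an ordinary slice object, it works with anything
that supports slicing syntax: bytes, bytearray, lists and strings.
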
As a holdover from DOS 8.3 filenames, filenames on Windows cannot end
with a dot. We append a - to such names on Windows platforms in all
operations, which should solve the problem, but we'd just duplicated
that code about a dozen times. No need: do it once, and we can add
whatever host filesystem rules we need in one spot.
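
Roughly what "do it once" can look like. The function name and the
windows parameter (there so the rule can be exercised on any host) are
assumptions for this sketch, not cppo's actual code.

```python
import os

def host_filename(name, windows=None):
    """Adjust `name` so the host filesystem will accept it.

    If `windows` is None, the platform is detected via os.name.
    """
    if windows is None:
        windows = (os.name == "nt")
    if windows and name.endswith("."):
        # Windows cannot store names ending in a dot; append a -
        name += "-"
    # Any further host-specific rules would go here, in one place.
    return name
```
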
This one's missing a lot of the cleanups I've done to the others (it
isn't even Python 3), but it has the debug print statements and the
formatting is generally pretty good. I'll go through my local trees and
begin applying some fixes to this code in various repositories and we'll
see if we can't begin refactoring it completely.