The code has been informally tested using some sample data.
Trivia: as I discovered by accident, apparently LibreOffice on Linux can
open binhex-encoded PICTs, as long as the file uses a `\n` line ending
after the header (`binhex.py` uses a `\r\r` separator, but manually
editing this produces a result that LibreOffice opens).
The test data appears not to have actually taken advantage of RLE, but
it could be successfully decoded, and re-encoding and re-decoding the
result was a successful round trip.
New approach: explicitly split the input into runs (tracking the length
and byte value for each run), and emit encoded data for each run when
the next one starts (and at the end of the data). This seems overall
easier to understand.
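
The run-splitting idea can be sketched roughly as follows. This is only an illustration, not the committed code: the names `rle_encode_sketch` and `_emit_run`, the 255-byte run cap, and the "only compress runs longer than 3" threshold are all assumptions made for the example.

```python
RUNCHAR = 0x90   # RLE90 marker byte
MAX_RUN = 255    # assumed cap: the count has to fit in one byte


def _emit_run(out, value, length):
    """Append the encoded form of one run to `out` (hypothetical helper)."""
    if value == RUNCHAR:
        # Assumption: runs of the marker are never compressed; each
        # literal 0x90 is written as the escape pair 0x90 0x00.
        out.extend(bytes([RUNCHAR, 0]) * length)
    elif length > 3:
        # value, marker, total run length
        out.extend((value, RUNCHAR, length))
    else:
        # Short runs are cheaper (or no worse) written out literally.
        out.extend(bytes([value]) * length)


def rle_encode_sketch(data):
    out = bytearray()
    value, length = None, 0
    for byte in data:
        if byte == value and length < MAX_RUN:
            length += 1            # still inside the current run
            continue
        if length:
            _emit_run(out, value, length)   # the previous run just ended
        value, length = byte, 1
    if length:
        _emit_run(out, value, length)       # flush the final run
    return bytes(out)
```

Under these assumptions, `rle_encode_sketch(b"AAAAA")` gives `b"A\x90\x05"`, while `b"AAB"` is passed through unchanged.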
The rle_encode implementation works somewhat differently from the
original C; rather than using numeric indices into the original data
with a nested loop to detect runs, it tries to use a single stateful
loop (remembering how many repeated characters have been seen so far).
It is an open question whether runs of the RUNCHAR byte (0x90) should be
encoded. The specification is ambiguous:
http://fileformats.archiveteam.org/wiki/RLE90
so it's unclear whether a sequence like 0x90 0x00 0x90 0x09 can be used
to encode 10 0x90 bytes, or whether this would represent 0x90 followed
by 9 0x00 bytes, or something else entirely. It appears that the original
standard library decoder interprets this as "10 0x90 bytes", since it
gets the byte to be repeated from the output buffer. However, it also
appears that the original standard library encoder doesn't attempt this
encoding, and will turn runs of 0x90 input into runs of alternating 0x90
and 0x00 bytes.
The code here is not tested, but should preserve that behaviour.
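
To make the decoder behaviour described above concrete, here is a minimal sketch of a decoder that takes the byte to repeat from its own output buffer. It is not the standard library code. The name `rle_decode_sketch` is hypothetical, and treating the count as the total run length (including the byte already written) is an assumption of this sketch, so the exact length it produces for the disputed sequence may differ from the figure quoted above.

```python
RUNCHAR = 0x90  # RLE90 marker byte


def rle_decode_sketch(data):
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte != RUNCHAR:
            out.append(byte)           # ordinary literal
            continue
        count = next(stream, None)     # the byte after the marker
        if count is None:
            raise ValueError("truncated input after run marker")
        if count == 0:
            out.append(RUNCHAR)        # 0x90 0x00 is an escaped literal 0x90
            continue
        if not out:
            raise ValueError("run marker with nothing to repeat")
        # The repeated byte comes from the output buffer, so a run that
        # follows an escaped 0x90 repeats 0x90 itself.
        # Assumption: `count` is the total run length.
        out.extend(bytes([out[-1]]) * (count - 1))
    return bytes(out)
```

With this reading, `0x90 0x00 0x90 0x09` decodes to nothing but 0x90 bytes, and the alternating `0x90 0x00` pairs that the encoder is described as producing decode back to plain runs of 0x90.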
Since the standard library `binascii` module is unavailable whenever the
vendored `binhex` would be used, the latter is modified to refer to
`hqx` in the same package instead.
For future-proofing, `hqx` delegates to a `slow_hqx` implementation,
which could be overridden by a C implementation when available.
The function names are also slightly altered to make more sense in the
context of this change. However, the original interface is preserved,
so no further change to `binhex.py` is required.
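
One way such a delegation could look is a loader that tries backends in order of preference. This is a sketch only: `load_hqx_backend` and the module name `_fast_hqx_hypothetical` are invented for illustration, and `base64` merely stands in for the pure-Python fallback so that the example runs outside the package.

```python
import importlib


def load_hqx_backend(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this backend is not installed; try the next one
    raise ImportError("no hqx backend available: %r" % (candidates,))


# Hypothetical C accelerator first, stand-in for the slow fallback second.
backend = load_hqx_backend(["_fast_hqx_hypothetical", "base64"])
```

Because callers only ever see `backend`, a faster implementation could be added later without touching `binhex.py`.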
Previously the Configfile would say something like
'HWInitCodeOffset=0x00000000=HWInit' if the size field was mistakenly
set to zero. Now we just search for whatever can be found at that
location.
Conceived in a dream, this feature turned out to be more of a bug. The
whole idea of tbxi is to work with easily-editable text files and naked
binaries. Patching is best accomplished with a separate script.
The Pippin ROM has lower checksums than expected, by about 4%. Without
recovering the missing data, this probably cannot be replicated, so we
just detect it and live with it.