Set of tools to deal with specially encoded Macintosh files

This is version 2.0b3 of macutil (22-OCT-1992).

This package contains the following utilities:
	macunpack
	hexbin
	macsave
	macstream
	binhex
	tomac
	frommac

Requirements:
a.  Of course a C compiler.
b.  A 32-bit machine with large memory (or at least the ability to 'malloc'
    large chunks of memory).  For reasons of efficiency and simplicity the
    programs work 'in-core', also many files are first read in core.
    If somebody can take the trouble to do it differently, go ahead!
    There are probably also implicit assumptions in a number of places that
    an int is 32 bits.  If you encounter such occurrences feel free to
    notify me.
c.  A Unix (tm) machine, or something very close.  There are probably quite
    a lot of Unix dependencies.  Also here, if you have replacements, feel
    free to send comments.
d.  This version normally uses the 'mkdir' system call available on BSD Unix
    and some versions of SysV Unix.  You can change that, see the makefile for
    details.

File name translation:

The programs use a table-driven routine to do Mac filename -> Unix filename
translation.  When compiled without further changes the translation is as
follows:
    Printable ASCII characters except space and slash are not changed.
    Slash and space are changed to underscore, as are all characters that
    do not fall in the following group.
    Accented letters are translated to their unaccented counterparts.
If your system supports the Latin-1 character set, you can change this
translation scheme by specifying '-DLATIN1' for the 'CF' macro in the
makefile.  This will translate all accented letters (and some symbols)
to their Latin-1 counterpart.  This feature is untested (I do not have
access to systems that cater for Latin-1), so use with care.
Future revisions of the program will have user settable conversions.

Another feature of filename translation is that when the -DNODOT flag is
specified in the CF macro an initial period will be translated to underscore.
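
As an illustration only (this is not the package's own C code), the default
translation described above could be sketched in Python as follows.  The
function name mac_to_unix_name and the nodot parameter are hypothetical, and
the sketch assumes Python's "mac_roman" codec and Unicode decomposition as a
stand-in for the real translation table, so individual characters may map
differently from the actual programs.

```python
import unicodedata

def mac_to_unix_name(raw: bytes, nodot: bool = False) -> str:
    """Translate a Mac filename (Mac Roman bytes) to a plain ASCII Unix name."""
    out = []
    for ch in raw.decode("mac_roman"):
        if ch in " /":
            out.append("_")                 # space and slash become underscore
        elif "!" <= ch <= "~":
            out.append(ch)                  # other printable ASCII is unchanged
        else:
            # Split an accented letter into base letter + combining marks
            # and keep only the base letter; anything else becomes underscore.
            base = unicodedata.normalize("NFD", ch)[0]
            out.append(base if base.isascii() and base.isalpha() else "_")
    name = "".join(out)
    if nodot and name.startswith("."):
        name = "_" + name[1:]               # mimics the -DNODOT behaviour
    return name
```

For example, mac_to_unix_name(b"My File/1") gives "My_File_1".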

MacBinary stream:

Most programs allow MacBinary streams as either input or output.  A
MacBinary stream is a series of files in MacBinary format pasted
together.  Embedded within a MacBinary stream can be information about
folders.  So a MacBinary stream can contain all information about a
folder and its constituents.
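
To make the idea of files "pasted together" concrete, here is a minimal
Python sketch that walks a stream of plain MacBinary entries.  It assumes the
usual MacBinary layout (a 128-byte header, then the data fork and the
resource fork, each padded to a multiple of 128 bytes) and it does not
interpret the embedded folder information, whose encoding is not described
here.  The function name split_macbinary_stream is hypothetical.

```python
import struct

def split_macbinary_stream(stream: bytes):
    """Yield (name, type, creator, data_fork, resource_fork) per file entry."""
    pos = 0
    while pos + 128 <= len(stream):
        header = stream[pos:pos + 128]
        name_len = header[1]
        if header[0] != 0 or not 1 <= name_len <= 63:
            raise ValueError("not a plain MacBinary header at offset %d" % pos)
        name = header[2:2 + name_len]
        ftype, creator = header[65:69], header[69:73]
        # Fork lengths are big-endian 32-bit values at offsets 83 and 87.
        data_len, rsrc_len = struct.unpack(">II", header[83:91])
        pos += 128
        data = stream[pos:pos + data_len]
        pos += (data_len + 127) // 128 * 128    # skip padding to 128 bytes
        rsrc = stream[pos:pos + rsrc_len]
        pos += (rsrc_len + 127) // 128 * 128
        yield name, ftype, creator, data, rsrc
```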

Appleshare support:

Optionally the package can be compiled for systems that support the sharing
of Unix and Mac filesystems.  The package supports AUFS (AppleTalk Unix File
Server) from CAP (Columbia AppleTalk Package) and AppleDouble (from Apple).
It will not support both at the same time.  Moreover this support requires
the existence of the 'mkdir' system call.  Finally, as implemented it
probably will work on big-endian BSD compatible systems.  If you have a SysV
system with restricted filename lengths you can get problems.  I also do not
know whether the structures are stored native or Apple-wise on little-endian
systems.  Nor did I test it fully, having no access to either AUFS or
AppleDouble systems.

Acknowledgements:
a.  Macunpack is for a large part based on the utilities 'unpit' and 'unsit'
    written by:
	Allan G. Weber
	weber%brand.usc.edu@oberon.usc.edu
    (wondering whether that is still valid!).  I combined the two into a
    single program and did a lot of modification.  For information on the
    originals, see the files README.unpit and README.unsit.
b.  The crc-calculating routines are based on a routine originally written by:
	Mark G. Mendel
	UUCP: ihnp4!umn-cs!hyper!mark
    (this will not work anymore for sure!).  Also here I modified the stuff
    and expanded it, see the files README.crc and README.crc.orig.
c.  LZW-decompression is taken from the sources of compress that are floating
    around.  Probably I did not use the most efficient version, but this
    program was written to get it done.  The version I based it on (4.0) is
    authored by:
	Steve Davies            (decvax!vax135!petsd!peora!srd)
	Jim McKie               (decvax!mcvax!jim)  (Hi Jim!)
	Joe Orost               (decvax!vax135!petsd!joe)
	Spencer W. Thomas       (decvax!harpo!utah-cs!utah-gr!thomas)
	Ken Turkowski           (decvax!decwrl!turtlevax!ken)
	James A. Woods          (decvax!ihnp4!ames!jaw)
    I am sure those e-mail addresses also will not work!
d.  Optional AUFS support comes from information supplied by:
	Casper H.S. Dik
	University of Amsterdam
	Kruislaan  409
	1098 SJ  Amsterdam
	Netherlands

	phone: +31205922022
	email: casper@fwi.uva.nl
    This is an e-mail address that will work, but the address and phone
    number are no longer valid.
    See the makefile.
    Some caveats are applicable:
    1.  I did not fully test it (we do not use it).  But the unpacking
	appears to be correct.  Anyhow, as the people who initially compile
	it and use it will be system administrators I am confident they are
	able to locate bugs!  (What if an archive contains a Macfile with
	the name .finderinfo or .resource?  I have had two inputs for AUFS
	support [I took Caspers; his came first], but both do not deal with
	that.  Does CAP deal with it?)  Also I have no idea whether this
	version supports it under SysV, so beware.
    2.	From one of the README's supplied by Casper:
	    Files will not appear in an active folder, because Aufs doesn't like
	    people working behind it's back.
	    Simply opening and closing the folder will suffice.
	Appears to be the same problem as when you are unpacking or in some
	other way creating files in a folder open to multifinder.  I have seen
	bundle bits disappear this way.  So if after unpacking you see the
	generic icon; check whether a different icon should appear and check
	the bundle bit.
	    The desktop isn't updated, but that doesn't seem to matter. 
	I dunno, not using it.
e.  Man pages are now supplied.  The base was provided by:
	Douglas Siebert
	ISCA
	dsiebert@icaen.uiowa.edu
f.  Because of some problems the Uncompactor has been rewritten; it is now
    based on sources from the dearchiver unzip (of PC fame).  Apparently the
    base code is by:
	Samuel H. Smith
    I have no further address available, but as soon as I find a better
    attribution, I will include it.
g.  UnstuffIt's LZAH code comes from lharc (also of PC fame) by:
	Haruhiko Okumura,
	Haruyasu Yoshizaki,
	Yooichi Tagawa.
h.  Zoom's code comes from information supplied by Jon W{tte
    (d88-jwa@nada.kth.se).  The Zoo decompressor is based on the routine
    written by Rahul Dhesi (dhesi@cirrus.COM).  This again is based on
    code by Haruhiko Okumura.  See also the file README.zoom.
i.  MacLHa's decompressors are identical to the ones mentioned in g and h.
j.  Most of hexbin's code is based on code written/modified by:
	Dave Johnson, Brown University Computer Science
	Darin Adler, TMQ Software
	Jim Budler, amdcad!jimb
	Dan LaLiberte, liberte@uiucdcs
	ahm (?)
	Jeff Meyer, John Fluke Company
	Guido van Rossum, guido@cwi.nl (Hi!)
    (most of the e-mail addresses will not work, the affiliation may also
    be incorrect by now.)  See also the file README.hexbin.
k.  The dl code in hexbin is based on the original distribution of
    SUMacC.
l.  The mu code in hexbin is a slight modification of the hcx code (the
    compressions are identical).
m.  The MW code for StuffIt is loosely based on code by Daniel H. Bernstein
    (brnstnd@acf10.nyu.edu).
n.  Tomac and frommac are loosely based on the original macput and macget
    by (the e-mail address will not work anymore):
	Dave Johnson
	ddj%brown@csnet-relay.arpa
	Brown University Computer Science

-------------------------------------------------------------------------------
Macunpack will unpack PackIt, StuffIt, Diamond, Compactor/Compact Pro, most
StuffIt Classic/StuffIt Deluxe, and all Zoom and LHarc/MacLHa archives, and
archives created by later versions of DiskDoubler.
Also it will decode files created by BinHex 5.0, MacBinary, UMCP,
Compress It, ShrinkToFit, MacCompress, DiskDoubler and AutoDoubler.

(PackIt, StuffIt, Diamond, Compactor/Compact Pro, Zoom and LHarc/MacLHa are
archivers written by respectively: Harry R. Chesley, Raymond Lau, Denis Sersa,
Bill Goodman, Jon W{tte* and Kazuaki Ishizaki.  BinHex 5.0, MacBinary and
UMCP are by respectively: Yves Lempereur, Gregory J. Smith, Information
Electronics.  ShrinkToFit is by Roy T. Hashimoto, Compress It by Jerry
Whitnell, and MacCompress, DiskDoubler and AutoDoubler are all by
Lloyd Chambers.)

* from his signature:
	Jon W{tte - Yes, that's a brace - Damn Swede.
Actually it is an a with two dots above; some (German inclined) people
refer to it (incorrectly) as a-umlaut.

It does not deal with:
a.  Password protected archives.
b.  Multi-segment archives.
c.  Plugin methods for Zoom.
d.  MacLHa archives not packed in MacBinary mode (the program deals very
    poorly with that!).

Background:
There are millions of ways to pack files, and unfortunately, all have been
implemented one way or the other.  Below I will give some background
information about the packing schemes used by the different programs
mentioned above.  But first some background about compression (I am no
expert, more comprehensive information can be found in for instance:
Timothy Bell, Ian H. Witten and John G. Cleary, Modeling for Text
Compression, ACM Computing Surveys, Vol 21, No 4, Dec 1989, pp 557-591).

Huffman encoding (also called Shannon-Fano coding or some other variation
    of the name).  An encoding where the length of the code for the symbols
    depends on the frequency of the symbols.  Frequent symbols have shorter
    codes than infrequent symbols.  The normal method is to first scan the
    file to be compressed, and to assign codes when this is done (see for
    instance: D. E. Knuth, the Art of Computer Programming).  Later methods
    have been designed to create the codes adaptively; for a survey see:
    Jeffrey S. Vitter, Design and Analysis of Dynamic Huffman Codes, JACM,
    Vol 34, No 4, Oct 1987, pp 825-845.
LZ77: The first of two Ziv-Lempel methods.  Using a window of past encoded
    text, output consists of triples for each sequence of newly encoded
    symbols: a back pointer and length of past text to be repeated and the
    first symbol that is not part of that sequence.  Later versions allowed
    deviation from the strict alternation of pointers and uncoded symbols
    (LZSS by Bell).  Later Brent included Huffman coding of the pointers
    (LZH).
LZ78: While LZ77 uses a window of already encoded text as a dictionary,
    LZ78 dynamically builds the dictionary.  Here again pointers are strictly
    alternated with unencoded new symbols.  Later Welch (LZW) managed to
    eliminate the output of unencoded symbols.  This algorithm is about
    the same as the one independently invented by Miller and Wegman (MW).
    A problem with these two schemes is that they are patented.  Thomas
    modified LZW to LZC (as used in the Unix compress command).  While LZ78
    and LZW become static once the dictionary is full, LZC has possibilities
    to reset the dictionary.  Many LZC variants are in use, depending on the
    size of memory available.  They are distinguished by the maximum number
    of bits that are used in a code.
A number of other schemes are proposed and occasionally used.  The main
advantage of the LZ type schemes is that (especially) decoding is fairly fast.
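
As a small illustration of the two-pass ("first scan, then assign codes")
Huffman method described above, here is a Python sketch.  It is not taken
from any of the programs discussed here; the function names are hypothetical
and the codes are kept as strings of '0' and '1' for readability rather than
packed into bits.

```python
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict:
    """Return {symbol: bit string} for a static Huffman code over data."""
    freq = Counter(data)                 # first pass: count the symbols
    if not freq:
        return {}
    if len(freq) == 1:                   # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_encode(data: bytes, code: dict) -> str:
    return "".join(code[b] for b in data)

def huffman_decode(bits: str, code: dict) -> bytes:
    lookup = {c: s for s, c in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in lookup:            # codes are prefix-free, so this is safe
            out.append(lookup[current])
            current = ""
    return bytes(out)
```

With input b"aaaabbc" the frequent symbol 'a' gets a 1-bit code and the
rarer 'b' and 'c' get 2-bit codes.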

Programs background:

Plain programs:
BinHex 5.0:
    Unlike what its name suggests this is not a true successor of BinHex 4.0.
    BinHex 5.0 takes the MacBinary form of a file and stores it in the data
    fork of the newly created file.
    Although BinHex 5.0 does not create BinHex 4.0 compatible files, StuffIt
    will give the creator type of BinHex 5.0 (BnHq) to its binhexed files,
    rather than the creator type of BinHex 4.0 (BNHQ).  The program knows
    about that.
MacBinary:
    As its name suggests, it does the same as BinHex 5.0.
UMCP:
    Looks similar, but the file as stored by UMCP is not true MacBinary.
    Size fields are modified, the result is not padded to a multiple of 128,
    etc.  Macunpack deals with all that, but until now is not able to
    correctly restore the finder flags of the original file.  Also, UMCP
    created files have type "TEXT" and creator "ttxt", which can create a
    bit of confusion.  Macunpack will recognize these files only if the
    creator has been modified to "UMcp".

Compressors:
ShrinkToFit:
    This program uses a Huffman code to compress.  It has an option (default
    checked for some reason), COMP, for which I do not yet know the
    meaning.  Compressing more than a single file in a single run results
    in a failure for the second and subsequent files.
Compress It:
    Also uses a Huffman code to compress.
MacCompress:
    MacCompress has two modes of operation, the first mode is (confusingly)
    MacCompress, the second mode is (again confusingly) UnixCompress.  In
    MacCompress mode both forks are compressed using the LZC algorithm.
    In UnixCompress mode only the data fork is compressed, and some shuffling
    of resources is performed.  Up to now macunpack only deals with MacCompress
    mode.  The LZC variant MacCompress uses depends on memory availability.
    12 bit to 16 bit LZC can be used.
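
The role of the maximum code width can be illustrated with a simplified LZW
sketch in Python.  It is an assumption-laden teaching version, not the code
used by MacCompress or macunpack: codes are kept as a list of integers
instead of being bit-packed, the dictionary simply freezes once 2^max_bits
entries exist, and the dictionary reset that LZC adds is left out.  The
function names are hypothetical.

```python
def lzw_compress(data: bytes, max_bits: int = 12) -> list:
    """Return the list of LZW codes for data (no bit packing)."""
    table = {bytes([i]): i for i in range(256)}
    limit = 1 << max_bits                # dictionary size cap
    codes, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current])
            if len(table) < limit:       # freeze the dictionary when full
                table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: list, max_bits: int = 12) -> bytes:
    """Invert lzw_compress, rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    table = [bytes([i]) for i in range(256)]
    limit = 1 << max_bits
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):         # code defined by the step in progress
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code %d" % code)
        out.append(entry)
        if len(table) < limit:
            table.append(prev + entry[:1])
        prev = entry
    return b"".join(out)
```

Both sides must agree on max_bits, which is why the variants are
distinguished by that number.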

Archivers:
ArcMac:
    Nearly PC-Arc compatible.  Arc knows 9 compression methods; I have seen
    all of them used by ArcMac, except the LZW techniques.  Here they are:
    1:	No compression, shorter header
    2:	No compression
    3:	(packing) Run length encoding
    4:	(squeezing) RLE followed by Huffman encoding
    5:	(crunching) LZW
    6:	(crunching) RLE followed by LZW
    7:	(crunching) as the previous but with a different hash function
    8:	(crunching) RLE followed by 12-bit LZC
    9:	(squashing) 13-bit LZC
PackIt:
    When archiving a file PackIt either stores the file uncompressed or
    stores the file Huffman encoded.  In the latter case both forks are
    encoded using the same Huffman tree.
StuffIt and StuffIt Classic/Deluxe:
    These have the ability to use different methods for the two forks of a
    file.  I know about the following standard methods (the last three
    are only used by the Classic/Deluxe version 2.0 of StuffIt):
    0:	No compression
    1:	Run length encoding
    2:	14-bit LZC compression
    3:	Huffman encoding
    5:	LZAH: like LZH, but the Huffman coding used is adaptive
    6:	A Huffman encoding using a fixed (built-in) Huffman tree
    8:	A MW encoding
Diamond:
    Uses a LZ77 like frontend plus a Fraenkel-Klein like backend (see
    Apostolico & Galil, Combinatorial Algorithms on Words, pages 169-183).
Compactor/Compact Pro:
    Like StuffIt, different encodings are possible for data and resource fork.
    Only two possible methods are used:
    0:	Run length encoding
    1:	RLE followed by some form of LZH
Zoom:
    Data and resource fork are compressed with the same method.  The standard
    uses either no compression or some form of LZH.
MacLHa:
    Has two basic modes of operation, Mac mode and Normal mode.  In Mac mode
    the file is archived in MacBinary form.  In normal mode only the forks
    are archived.  Normal mode should not be used (and can not be unpacked
    by macunpack) as all information about data fork size/resource fork size,
    type, creator etc. is lost.  It knows quite a few methods, some are
    probably only used in older versions, the only methods I have seen used
    are -lh0-, -lh1- and -lh5-.  Methods known by MacLHa:
    -lz4-:  No compression
    -lz5-:  LZSS
    -lzs-:  LZSS, another variant
    -lh0-:  No compression
    -lh1-:  LZAH (see StuffIt)
    -lh2-:  Another form of LZAH
    -lh3-:  A form of LZH, different from the next two
    -lh4-:  LZH with a 4096 byte buffer (as far as I can see the coding in
	    MacLHa is wrong)
    -lh5-:  LZH with a 8192 byte buffer
DiskDoubler:
    The older version of DiskDoubler is compatible with MacCompress.  It does
    not create archives, it only compresses files.  The newer version (since
    3.0) does both archiving and compression.  The older version uses LZC as
    its compression algorithm, the newer version knows a number of different
    compression algorithms.  Many (all?) are algorithms used in other
    archivers.  Probably this is done to simplify conversion from other formats
    to DiskDoubler format archives.  I have seen actual DiskDoubler archives
    that used methods 0, 1 and 8:
    0:	No compression
    1:	LZC
    2:	unknown
    3:	RLE
    4:	Huffman (or no compression)
    5:	unknown
    6:	unknown
    7:	An improved form of LZSS
    8:	Compactor/Compact Pro compatible RLE/LZH or RLE only
    9:	unknown
    The DiskDoubler archive format contains many subtle twists that make it
    difficult to properly read the archive (or perhaps this is on purpose?).
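
Several of the archivers above list run length encoding among their methods.
As an illustration, here is a Python sketch of the marker-byte scheme
documented for BinHex 4.0, where 0x90 followed by a count repeats the
previous byte and 0x90 0x00 stands for a literal 0x90.  That each archiver
above uses exactly this variant is an assumption, not something stated in
this text; the function name rle90_decode is hypothetical.

```python
MARKER = 0x90

def rle90_decode(packed: bytes) -> bytes:
    """Expand run length encoded data that uses 0x90 as the repeat marker."""
    out = bytearray()
    i = 0
    while i < len(packed):
        byte = packed[i]
        i += 1
        if byte != MARKER:
            out.append(byte)             # ordinary byte, copied as is
            continue
        if i >= len(packed):
            raise ValueError("marker at end of input")
        count = packed[i]
        i += 1
        if count == 0:
            out.append(MARKER)           # 0x90 0x00 is a literal 0x90
        else:
            if not out:
                raise ValueError("repeat marker with nothing to repeat")
            # count is the total run length, one copy is already in out
            out.extend(out[-1:] * (count - 1))
    return bytes(out)
```

For example, the bytes 22 90 06 33 (hex) expand to six 22 bytes followed by
a single 33.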

Naming:
Some people have complained about the name conflict with the unpack utility
that is already available on Sys V boxes.  I had forgotten it, so there
really was a problem.  The best way to solve it was to trash pack/unpack/pcat
and replace them by compress/uncompress/zcat.  Sure, man uses it; but man uses
pcat, so you can retain pcat.  If that was not an option, you were free to
rename the program.  But finally I relented.  It is now macunpack.

When you have problems unpacking an archive feel free to ask for information.
I am especially keen when the program detects an unknown method.  If you
encounter such an archive, please, mail a 'binhexed' copy of the archive
to me so that I can deal with it.  Password protected archives are (as
already stated) not implemented.  I do not have much inclination to do that.
Also I feel no inclination to do multi-segment archives.

-------------------------------------------------------------------------------
Hexbin will de-hexify files created in BinHex 4.0 compatible format (hqx)
but also the older format (dl, hex and hcx).  Moreover it will uudecode
files uuencoded by UUTool (the only program I know that does UU hexification
of all Mac file information).

There are currently many programs that are able to create files in BinHex 4.0
compatible format.  There are however some slight differences, and most
de-hexifiers are not able to deal with all the variations.  This program is
very simple minded.  First it will intuit (based on the input) whether the
file is in dl, hex, hcx or hqx format.  Next it will de-hexify the file.
When the format is hqx, it will check whether more files follow, and continue
processing.  So you can catenate multiple (hqx) hexified files together and
feed them as a single file to hexbin.  Also hexbin does not mind whether lines
are separated by CR's, LF's or combinations of the two.  Moreover, it will
strip all leading, trailing and intermediate garbage introduced by mailers
etc.  Next, it does not mind if a file is not terminated by a CR or an LF
(as StuffIt 1.5.1 and earlier did), but in that case a second file is not
allowed to follow it.  Last, while most hexifiers output lines of equal length,
some do not.  Hexbin will deal with that, but there are some caveats; see the
-c option in the man page.

Background:

dl format:
    This was the first hexified format used.  Programs to deal with it came
    from SUMacC.  This format only coded resource forks, 4 bits in a byte.
hex format:
    I think this is the first format from Yves Lempereur.  Like dl format,
    it codes 4 bits in a byte, but is able to code both resource and
    data fork.  Is it BinHex 2.0?
hcx format:
    A compressing variant of hex format.  Codes 6 bits in a byte.
    Is it BinHex 3.0?
hqx format:
    Like hcx, but using a different coding (possibly to allow for ASCII->EBCDIC
    and EBCDIC->ASCII translation, which does not always result in an
    identical file).  Moreover this format also encodes the original Mac
    filename.
mu format:
    The conversion can be done by the UUTool program from Octavian Micro
    Development.  It encodes both forks and also some finder info.  You will
    in general not use this with uudecode on non-Mac systems; with uudecode
    only the data fork will be uudecoded.  UU hexification is well known (and
    fairly old) in Unix environments.  Moreover it has been ported to lots of
    other systems.
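
To show what "6 bits in a byte" means for the hqx format, here is a Python
sketch of just that layer.  The 64-character alphabet is the one documented
for BinHex 4.0 and is an assumption as far as this text is concerned.  The
sketch does not locate the data between the ':' delimiters and does not
handle the run length expansion, the header or the CRCs of a real hqx file.
The function names are hypothetical.

```python
HQX_ALPHABET = "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr"

def decode_6bit(payload: str) -> bytes:
    """Turn hqx characters (6 bits each) back into 8-bit bytes."""
    value = {ch: i for i, ch in enumerate(HQX_ALPHABET)}
    acc = bits = 0
    out = bytearray()
    for ch in payload:
        if ch in "\r\n":
            continue                     # line breaks carry no data
        acc = (acc << 6) | value[ch]
        bits += 6
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1       # keep only the leftover bits
    return bytes(out)

def encode_6bit(data: bytes) -> str:
    """Inverse of decode_6bit, padding the last character with zero bits."""
    acc = bits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 6:
            bits -= 6
            chars.append(HQX_ALPHABET[(acc >> bits) & 0x3F])
            acc &= (1 << bits) - 1
    if bits:
        chars.append(HQX_ALPHABET[(acc << (6 - bits)) & 0x3F])
    return "".join(chars)
```

Three input bytes always become four characters, so three zero bytes encode
as "!!!!".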
-------------------------------------------------------------------------------
Macsave reads a MacBinary stream from standard input and writes the
files according to the options.
-------------------------------------------------------------------------------
Macstream reads files from the Unix host and will output a MacBinary stream
containing all those files together with information about the directory
structure.
-------------------------------------------------------------------------------
Binhex will read a MacBinary stream, or will read files/directories as
indicated on the command line, and will output all files in binhexed (.hqx)
format.  Information about the directory structure is lost.
-------------------------------------------------------------------------------
Tomac will transmit a MacBinary stream, or named files to the Mac using
the XMODEM protocol.
-------------------------------------------------------------------------------
Frommac will receive one or more files from the Mac using the XMODEM protocol.
-------------------------------------------------------------------------------
This is an ongoing project, more stuff will appear.

All comments are still welcome.  Thanks for the comments I already received.

dik t. winter, amsterdam, nederland
email: dik@cwi.nl

--
Note:
In these programs all algorithms are implemented based on publicly available
software to prevent any claim that would prevent redistribution due to
Copyright.  Although parts of the code would indeed fall under the Copyright
by the original author, use and redistribution of all such code is explicitly
allowed.  For some parts of it the GNU software license does apply.
--
Appendix.

BinHex 4.0 compatible file creators:

Type	Creator		Created by

"TEXT"	"BthX"		BinHqx
"TEXT"	"BNHQ"		BinHex
"TEXT"	"BnHq"		StuffIt and StuffIt Classic
"TEXT"	"ttxt"		Compactor

Files recognized by macunpack:

Type	Creator		Recognized as

"APPL"	"DSEA"		"DiskDoubler"		Self extracting
"APPL"	"EXTR"		"Compactor"		Self extracting
"APPL"	"Mooz"		"Zoom"			Self extracting
"APPL"	"Pack"		"Diamond"		Self extracting
"APPL"	"arc@"		"ArcMac"		Self extracting (not yet)
"APPL"	"aust"		"StuffIt"		Self extracting
"ArCv"	"TrAS"		"AutoSqueeze"				(not yet)
"COMP"	"STF "		"ShrinkToFit"
"DD01"	"DDAP"		"DiskDoubler"
"DDAR"	"DDAP"		"DiskDoubler"
"DDF."	"DDAP"		"DiskDoubler" (any fourth character)
"DDf."	"DDAP"		"DiskDoubler" (any fourth character)
"LARC"	"LARC"		"MacLHa (LHARC)"
"LHA "	"LARC"		"MacLHa (LHA)"
"PACT"	"CPCT"		"Compactor"
"PIT "	"PIT "		"PackIt"
"Pack"	"Pack"		"Diamond"
"SIT!"	"SIT!"		"StuffIt"
"SITD"	"SIT!"		"StuffIt Deluxe"
"Smal"	"Jdw "		"Compress It"
"TEXT"	"BnHq"		"BinHex 5.0"
"TEXT"	"GJBU"		"MacBinary 1.0"
"TEXT"	"UMcp"		"UMCP"
"ZIVM"	"LZIV"		"MacCompress(M)"
"ZIVU"	"LZIV"		"MacCompress(U)"			(not yet)
"mArc"	"arc*"		"ArcMac"				(not yet)
"zooM"	"zooM"		"Zoom"
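
As an illustration of how such a table can drive recognition, here is a
Python sketch using a few rows from the list above.  It is not macunpack's
actual code; the names KNOWN_FILES and recognize are hypothetical, and only
a subset of the rows is included.

```python
# (type, creator, recognized as); a type ending in "." matches any
# fourth character, as noted for the DiskDoubler rows above.
KNOWN_FILES = [
    ("APPL", "DSEA", "DiskDoubler"),
    ("APPL", "aust", "StuffIt"),
    ("COMP", "STF ", "ShrinkToFit"),
    ("DD01", "DDAP", "DiskDoubler"),
    ("DDAR", "DDAP", "DiskDoubler"),
    ("DDF.", "DDAP", "DiskDoubler"),
    ("DDf.", "DDAP", "DiskDoubler"),
    ("PACT", "CPCT", "Compactor"),
    ("PIT ", "PIT ", "PackIt"),
    ("SIT!", "SIT!", "StuffIt"),
    ("SITD", "SIT!", "StuffIt Deluxe"),
    ("TEXT", "BnHq", "BinHex 5.0"),
    ("TEXT", "UMcp", "UMCP"),
    ("zooM", "zooM", "Zoom"),
]
WILDCARD_TYPES = {"DDF.", "DDf."}

def recognize(ftype: str, creator: str):
    """Return the 'recognized as' name for a type/creator pair, or None."""
    for pattern, known_creator, name in KNOWN_FILES:
        if creator != known_creator:
            continue
        if pattern in WILDCARD_TYPES:
            # Only the first three characters of the type are significant.
            if len(ftype) == 4 and ftype[:3] == pattern[:3]:
                return name
        elif ftype == pattern:
            return name
    return None
```

So recognize("DDF3", "DDAP") gives "DiskDoubler", while a plain "TEXT"/"ttxt"
file is not recognized, which matches the remark about UMCP files above.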