mirror of
https://github.com/fadden/nulib2.git
synced 2024-09-27 00:54:57 +00:00
730 lines
43 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
|
|
<head>
|
|
<title>NuFX Addendum</title>
|
|
|
|
<meta http-equiv="Content-Language" content="en-us">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
|
|
|
<meta content="t, default" name="Microsoft Border">
|
|
|
|
<link href="../main.css" rel="stylesheet" type="text/css" />
|
|
</head>
|
|
|
|
<body bgcolor="#FFFFFF" text="#000000"><!--msnavigation--><table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td>
|
|
|
|
<p align="center"><font size="6"><strong>NuFX Addendum</strong></font><br>
|
|
<nobr>[ <a href="../index.htm" target="">Home</a> ]</nobr> <nobr>[ <a href="index.htm" target="">Up</a> ]</nobr> <nobr>[ NuFX Addendum ]</nobr> <nobr>[ <a href="nulib2-preserve.htm" target="">ProDOS Attribute Preservation</a> ]</nobr></p>
|
|
<hr>
|
|
|
|
</td></tr><!--msnavigation--></table>
|
|
|
|
<!--msnavigation--><msnavigation border="0" cellpadding="0" cellspacing="0" dir="ltr" width="100%"><tr><!--msnavigation--><msnavigation valign="top"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><msnavigation border="0" cellpadding="0" cellspacing="0" dir="ltr" width="100%"><tr><msnavigation valign="top"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><tr><msnavigation valign="top">
|
|
|
|
|
|
<h6>NuFX Addendum - <b>By Andy McFadden - Last revised 2022/11/06</b></h6>
|
|
<p align="left">This addendum clarifies and extends certain aspects of the
|
|
<a href="FTN.e08002.htm"> NuFX specification</a>. This is not an
|
|
"official" modification
|
|
of the original document - it has not been reviewed and approved by
|
|
the original author - but anyone developing NuFX utilities would do
|
|
well to follow these recommendations.</p>
|
|
|
|
<h2 align="left">Purpose</h2>
|
|
<p align="left">The NuFX specification defines a very loose structure, and
|
|
leaves much to the imagination of the implementer. For example, "If a
|
|
utility finds a redundancy in a Thread Record, it must decide whether to skip
|
|
the record or to do something with that particular thread...".
|
|
A strict specification would define a standard approach that all applications
|
|
must follow when dealing with the anomalous condition, to ensure consistent
|
|
handling of all archives.</p>
|
|
<p align="left">This document refines the NuFX specification and brings some of
|
|
the "fuzzy" areas into sharper focus. Nothing in this document
|
|
contravenes the original document.</p>
|
|
<p align="left">In the text below, "<b>must</b>" is an imperative that
|
|
has to be obeyed, and "<b>should</b>" is a recommendation that authors
|
|
are strongly encouraged to follow.</p>
|
|
<hr>
|
|
<h2 align="left"> Clarifications</h2>
|
|
<h3 align="left"> Pronunciation</h3>
|
|
<p align="left">What's the correct way to pronounce "NuFX"? One
|
|
approach is letter-by-letter ("en you eff ecks"), another is minimal-syllable
|
|
("new fix"). According to the file type note, it's a bit of both ("new eff ecks").</p>
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Use of ".SDK" suffix</h3>
|
|
<p align="left">Originally, only ".SHK" was used to represent a NuFX
|
|
archive. Over time, a convention of using ".SDK" to represent
|
|
archives with a single disk image in them has arisen. This is very
|
|
convenient for emulators on systems that rely on the file extension (e.g.
|
|
Windows), so use of ".SDK" is encouraged.</p>
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Archives with no records</h3>
|
|
<p align="left">An archive without records, i.e. nothing but a master header
|
|
block, serves no purpose. However, it can be useful to have a "create new
|
|
archive" operation that creates an empty file to be populated later.</p>
|
|
<p align="left"><b>Creating:</b> Archives without any records in them may be
|
|
created.</p>
|
|
<p align="left"><b>Opening:</b> If asked to open a record-less archive,
|
|
the application should recognize that the archive is empty and proceed as if it
|
|
were a new archive.</p>
|
|
<p align="left"><b>Modifying:</b> If all records in an archive are deleted, the
|
|
archive file should be deleted as well.</p>
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left"> Records with no threads</h3>
|
|
<p align="left">A record without threads is pretty pointless. The initial
|
|
release of the NuFX spec mandated that there be at least one thread attached
|
|
to each record, but this language was removed from later versions.</p>
|
|
<p align="left"><b>Creating:</b> Records without threads must never be
|
|
created. All records must have at least one thread.</p>
|
|
<p align="left"><b>Extracting: </b>Empty records should be ignored.</p>
|
|
<p align="left"> </p>
|
|
<h3 align="left">Records with only a filename thread</h3>
|
|
<p align="left">GS/ShrinkIt v1.1 has a bug that prevents it from creating an empty
|
|
data thread when asked to add a zero-byte file. This results in a record
with a filename thread and nothing else. (If it was the first new record added,
|
|
it will have an empty comment thread as well.)<p align="left">There is no valid
|
|
reason for deliberately creating such a file.
|
|
<p align="left"><b>Creating:</b> Records composed solely of a filename thread
|
|
must not be created.
|
|
<p align="left"><b>Extracting:</b> Records with nothing but a filename thread
|
|
should be ignored. <i>For GSHK v1.1 bug compatibility</i>: if a record has a filename
|
|
thread, and no other threads except "message" threads (i.e. no data
|
|
threads or control threads), then a zero-byte data fork file should be
|
|
created. Otherwise, the record should be ignored. If the ProDOS
|
|
storage type field indicates an extended file, a zero-byte resource fork should
|
|
also be created.
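<p align="left">The extraction rule above can be sketched in Python as follows. The thread-class labels and the <code>plan_filename_only_record</code> helper are hypothetical names chosen for illustration; a real parser would derive the classes from the numeric thread fields.</p>

```python
# Hypothetical thread-class labels (not taken from any NuFX library).
FILENAME, MESSAGE, CONTROL, DATA = "filename", "message", "control", "data"

STORAGE_EXTENDED = 5  # ProDOS storage type for an extended (forked) file


def plan_filename_only_record(thread_classes, storage_type):
    """Decide which zero-byte forks to create for a filename-only record.

    thread_classes: one class label per thread in the record.
    Returns a list of fork names to create as empty files, or None if the
    GSHK v1.1 workaround does not apply to this record.
    """
    if FILENAME not in thread_classes:
        return None
    if any(c in (DATA, CONTROL) for c in thread_classes):
        return None  # ordinary record; use the normal extraction path
    forks = ["data"]
    if storage_type == STORAGE_EXTENDED:
        forks.append("resource")
    return forks
```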
|
|
<p align="left">
|
|
<h3 align="left">Records with no filename</h3>
|
|
<p align="left">A record without a filename thread is a curious beast.
|
|
Ideally, there wouldn't be any such thing as a filename thread, since it doesn't
|
|
really make sense to have a record without one. Expanding the record
|
|
header to hold a pre-sized buffer would've made many things simpler.<p align="left">This
|
|
particular situation occurred with older versions of ShrinkIt (e.g. v1.1) that failed to store
|
|
a volume name when compressing a DOS 3.3 disk. There was no filename in
|
|
the record header, nor one in a filename thread.<p align="left">The only
|
|
situation where a record without a filename makes sense is if the record holds
|
|
nothing but comments or other archive "meta data", such as a
|
|
"create directory" control thread.<p align="left"><b>Creating:
|
|
</b>Records without filenames must not be created, unless the record is intended
|
|
to contain
|
|
nothing but archive meta-data. Deletion of the filename thread should only
|
|
be done if a new filename thread is being added. If data threads are added
|
|
to a record without a filename, then a filename thread must be added as well.<p align="left"><b>Extracting</b>:
|
|
If the record contains file data, the application may either prompt the user for
|
|
a filename to use, or supply a generated one.<p align="left">
|
|
<h3 align="left">Records with more than one filename thread</h3>
|
|
<p align="left">This is an unusual situation that should only arise if an
|
|
application is buggy. Every record created by a modern application should
|
|
have no more than one filename thread.<p align="left"><b>Creating:</b> Records
|
|
with multiple filename threads must not be created.<p align="left"><b>Extracting:</b>
|
|
Applications must use the first filename thread. If a buggy application
appends an additional filename thread, the extra filename will be ignored.<p align="left"> 
<h3 align="left">Records with filenames in two places</h3>
|
|
<p align="left">The old way of storing filenames, used by NuLib and old versions of ShrinkIt, was to
|
|
put the filename in the record header. To facilitate renaming, the
|
|
filename was moved into a thread. Thus, there are two possible locations
|
|
where the filename may live, and no guarantee that only one will be used.
|
|
<p align="left"><b>Creating:</b>
|
|
Never put the filename in the record header when creating a new record.
|
|
It's okay to leave existing records alone, but if an application has the
|
|
opportunity to rewrite the record header, the record filename must be removed.
|
|
<p align="left"><b>Extracting:</b>
|
|
The thread filename takes precedence over the record header filename.
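<p align="left">A minimal sketch of the filename lookup order when extracting, combining this rule with the "first filename thread wins" rule. The function name and argument shapes are assumptions made for the example. An empty name is treated as missing.</p>

```python
def choose_filename(header_name, thread_names):
    """Pick the name to use for a record when extracting.

    header_name: bytes stored in the record header (b"" if none).
    thread_names: bytes from each filename thread, in archive order.
    Returns None when the record has no usable filename.
    """
    # The first filename thread takes precedence; later ones are ignored.
    if thread_names and thread_names[0]:
        return thread_names[0]
    # Fall back to the old-style name in the record header.
    if header_name:
        return header_name
    return None
```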
|
|
|
|
<p align="left">
|
|
<h3 align="left">Filename character set</h3>
|
|
<p align="left">Filenames in NuFX archives use the Mac OS Roman character set,
|
|
which is ASCII plus some symbols and the usual set of Latin-alphabet characters
|
|
(see <a href="https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT">
|
|
Unicode definition</a>). The NuFX filename definition was intended to
|
|
accommodate files from HFS volumes, which may contain any character except ':'.
|
|
Control characters, including NUL ('\0'), were allowed but discouraged.
|
|
<p align="left">
|
|
On modern systems, converting between Mac OS Roman and Unicode is useful and
|
|
(mostly) straightforward. Dealing with embedded null bytes is very
|
|
annoying in C-like languages though.
|
|
<p align="left"><strong>Creating:</strong>
|
|
Convert Unicode to Mac OS Roman, replacing any untranslatable characters with
|
|
'?'. Embedded nulls may be replaced with '?'.</p>
|
|
<p align="left"><strong>Extracting:</strong> Convert Mac OS Roman to Unicode. If embedded nulls
|
|
are encountered, they may be replaced with something appropriate for the
|
|
current system. Applications should not ignore the problem and
|
|
truncate the filename; if they do, they must be prepared to handle duplicate or empty
|
|
filenames.</p>
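<p align="left">As an illustration, Python's built-in <code>mac_roman</code> codec can perform both conversions. The helper names are hypothetical, and the choice of '_' as the stand-in for an embedded null on extraction is an assumption; any character suitable for the host system would do.</p>

```python
def to_archive_name(name):
    """Convert a Unicode filename to Mac OS Roman bytes for storage."""
    # Embedded nulls may be replaced with '?'.
    name = name.replace("\0", "?")
    # errors="replace" substitutes '?' for untranslatable characters.
    return name.encode("mac_roman", errors="replace")


def from_archive_name(raw, nul_replacement="_"):
    """Convert stored Mac OS Roman bytes to a Unicode filename."""
    text = raw.decode("mac_roman")
    # Replace embedded nulls rather than truncating the name.
    return text.replace("\0", nul_replacement)
```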
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Filesystem separator characters</h3>
|
|
<p align="left">Every record header has a "file system separator"
|
|
character ("fssep") in the "file_sys_info" word. This
|
|
is usually something like ':' for GS/OS or '/' for UNIX. It's necessary to
|
|
know what the separator is in order to break a pathname down into its individual
|
|
components.</p>
|
|
<p align="left">Not all filesystems support subdirectories, however,
|
|
which means that not all filenames need to have a separator. The
|
|
appropriate separator character for such a filesystem is not defined in the NuFX
|
|
spec. Clearly it should be something illegal on the source filesystem, or
|
|
we could inadvertently see pathnames where they don't exist (e.g. a file called
|
|
"foo:bar" on DOS 3.3 if the fssep char were set to ':').</p>
|
|
<p align="left">The trouble is, DOS 3.3 doesn't actually have any illegal characters, just a field
|
|
of 30 characters padded with spaces. Pascal disks are similar. Since
|
|
we must define an fssep for every filename, our best choice is to use '\0'
|
|
(0x00), because it's unlikely to occur, and any program that stores names in C
|
|
strings will find it awkward to store and scan for '\0'.</p>
|
|
<p align="left">This situation also applies to archived disk images, which must
|
|
be simple filenames.</p>
|
|
<p>The application should have some understanding of which filesystems
|
|
have subdirectories and which don't, which would allow it to disregard the
|
|
fssep char when it can't be relevant for a record, but it's easier to
|
|
let the fssep char's usefulness be self-evident.</p>
|
|
<p align="left">(NOTE: NufxLib v2.0.3 rejected 0x00 as an fssep character. This was a bug.)</p>
|
|
|
|
<p align="left"><b>Creating:</b> When adding files directly from filesystems without subdirectories, use 0x00 as
|
|
the fssep char.</p>
|
|
<p align="left"><b>Extracting:</b> An fssep char of 0x00 means
|
|
the pathname is just the filename.</p>
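<p align="left">A short sketch of how an extractor might honor the fssep value. The function name is hypothetical. The stored name is handled as raw bytes so that a 0x00 separator needs no special string handling.</p>

```python
def split_pathname(name, fssep):
    """Split a stored pathname (bytes) into its components.

    fssep is the separator byte value from file_sys_info. A value of 0x00
    means the source filesystem has no subdirectories, so the whole name
    is a single component.
    """
    if fssep == 0x00:
        return [name]
    return name.split(bytes([fssep]))
```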
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Disk image pathnames</h3>
|
|
<p align="left">While files may have multiple path components (e.g.
|
|
"subdir:subdir2:filename"), it makes no sense for disk images to have
|
|
them. The stored filename for a disk is either the disk's ProDOS volume
|
|
name, or for non-ProDOS disks, a simple label defined by the user. Since
|
|
the eventual target is a disk device, specifying a subdirectory path makes no
|
|
sense.</p>
|
|
<p align="left">The issue becomes a little more confusing when storage of
|
|
disk images used for emulators is considered. At first glance, it seems
|
|
useful to be able to store a hierarchy of disk images. In practice, such
|
|
images would either be archived as a hierarchy of .PO files, or as an archive of
|
|
.SDK archives.</p>
|
|
<p>Ultimately, the disk volume name is embedded in the disk image itself. The
|
|
name stored in the archive is purely decorative.</p>
|
|
<p align="left"><b>Adding/renaming:</b> Applications must
strip any leading path components from disk image "storage names".
(The NuFX specification does explicitly forbid the use of a filesystem separator
character in a disk volume name.)</p>
|
|
<p align="left"><b>Extracting:</b>
|
|
Applications extracting directly to a disk must strip leading path components
|
|
before assigning the ProDOS volume name. Applications extracting images to
|
|
a file don't need to do anything unusual.
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Filename case sensitivity</h3>
|
|
<p align="left">There isn't a "filename is case-sensitive" flag in
|
|
NuFX archives. Since it was designed primarily for ProDOS and HFS
|
|
filesystems, neither of which is case-sensitive, we should assume that case is
|
|
not meant to be significant when determining whether two records have the same
|
|
filename. This becomes important when adding files (to test for
|
|
duplicates), extracting files by name, and when attempting
|
|
to display archive contents as a hierarchical tree.</p>
|
|
<p>HFS files will use the Mac OS Roman character set, so a simple ASCII
|
|
case conversion will be inadequate. An HFS filename comparison routine must
|
|
be used.</p>
|
|
<p align="left">Applications
|
|
should try to recognize that "foo/bar", "foo/BAR", and
|
|
"FOO/bar" are the same file, but it's probably not worth
|
|
"probing" a case-sensitive filesystem like Linux ext2 to guarantee
|
|
such.
|
|
|
|
<p align="left"> <h3 align="left">Duplicate filenames</h3>
|
|
<p align="left">There is nothing in the NuFX specification that prevents having
|
|
more than one file with the same name in an archive. In practice, this is
|
|
inconvenient, especially for users with command-line tools. On the other
|
|
hand, if the underlying filesystem is case-sensitive, the extracted files may
|
|
not actually collide, so it may not make sense for all applications to treat
|
|
this as an iron-clad rule.<p align="left">When comparing names, be sure to take
|
|
the filesystem separator character into account. "foo:bar" could
|
|
be a simple filename or a partial pathname depending on whether ':' is the
|
|
separator. Two names should be considered identical if each distinct path
|
|
component matches, so "foo/bar" and "foo:bar" are identical
|
|
if the separators are '/' and ':', respectively. Comparisons should be
|
|
case-insensitive.<p align="left"><b>Adding/renaming:</b> Applications
|
|
should prevent multiple records from having the same case-insensitive filename.
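<p align="left">The comparison described above might look like the following sketch. It splits each name on its own separator and compares components case-insensitively. Note the assumption: Python's <code>casefold()</code> is used here as a stand-in, and it is not a true HFS filename comparison routine, so results may differ for some non-ASCII characters.</p>

```python
def same_name(name_a, fssep_a, name_b, fssep_b):
    """Return True if two stored names (bytes) identify the same entry."""

    def components(name, fssep):
        # An fssep of 0x00 means the name has no path components.
        parts = [name] if fssep == 0x00 else name.split(bytes([fssep]))
        # casefold() approximates case-insensitive matching (assumption).
        return [p.decode("mac_roman").casefold() for p in parts]

    return components(name_a, fssep_a) == components(name_b, fssep_b)
```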
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Pre-sized or not pre-sized</h3>
|
|
<p align="left">The specification declares that filename threads and comments
|
|
use pre-sized buffers. It does not define what other members of the
|
|
message and filename classes are, which makes it difficult to know what to do
|
|
with a request to create a heretofore undefined thread type. The NuFX
|
|
format does not provide any definitive clue as to whether a thread is pre-sized,
|
|
so such decisions must be based on the thread class and thread kind.</p>
|
|
<p align="left">Filename threads and comment threads are pre-sized. All
|
|
other threads are not pre-sized (including other members of the
|
|
"filename" and "message" classes).</p>
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Proper pre-size for filename threads</h3>
|
|
<p align="left">ShrinkIt allocates a 32-byte pre-sized buffer for the
|
|
filename. If the filename is larger than 32 bytes, the buffer grows to fit
|
|
the filename exactly. If renaming files is considered useful, then the
|
|
buffer should always be slightly larger than is needed to hold the
|
|
filename. (Filenames longer than 32 characters are most likely the result
|
|
of nested directories, so renaming the file itself is inhibited if the buffer
|
|
length is an exact match.)</p>
|
|
<!-- <p align="left">Side note: GSHK appears to have a bug where it can't deal with
|
|
32-byte HFS filenames (e.g. "foo:abcdefghijabcdefghijabcdefghijxy"
|
|
can't be added to an archive). Emulating this behavior is discouraged.</p> -->
|
|
<p>Side note: the specification does not define a minimum or maximum length
|
|
for a filename. The specification notes that "GS/OS can create
|
|
8,000-character filenames", so that seems safe to use as an upper bound.
|
|
Zero-length filenames cannot be stored in a record header, because a length
|
|
of zero indicates that the filename is in a thread, so it's reasonable to
|
|
require that filenames be at least one character long. A zero-length filename
|
|
should be treated the same as a missing filename.</p>
|
|
<p align="left"><b>Creating:</b> If GS/ShrinkIt compatibility is not important,
|
|
all filenames should have at least 8 bytes of free space in the filename thread.
|
|
For GSHK compatibility, the filename thread compThreadEOF must be the greater of
|
|
32 and the filename length.</p>
|
|
<p align="left"><b>Renaming:</b> It is acceptable to have fewer than 8 bytes of
|
|
free space remaining after a file is renamed. However, if the filename
|
|
itself exceeds the buffer size and the thread must be rebuilt, the 8-byte
|
|
padding should be added.</p>
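<p align="left">The sizing rules can be expressed as two small functions. This is a sketch only; the function and constant names are made up for the example, and sizes are in bytes.</p>

```python
GSHK_BUFFER = 32  # ShrinkIt's pre-sized buffer for short filenames
FREE_SPACE = 8    # recommended slack when GSHK compatibility is not needed


def new_filename_buffer_size(name_len, gshk_compatible=False):
    """Buffer size (compThreadEOF) for a newly created filename thread."""
    if gshk_compatible:
        return max(GSHK_BUFFER, name_len)
    return name_len + FREE_SPACE


def buffer_size_after_rename(current_size, new_name_len):
    """Buffer size after renaming; rebuild only if the name no longer fits."""
    if new_name_len <= current_size:
        return current_size  # fewer than 8 spare bytes is acceptable here
    return new_name_len + FREE_SPACE
```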
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Thread ordering</h3>
|
|
<p align="left">The NuFX specification defines a general ordering for
|
|
threads ("blocks must occur in the following fashion"), but doesn't indicate
|
|
what should be done if they appear out of order. Handling out-of-order
|
|
threads isn't impossible, but it can be inconvenient.</p>
|
|
<p align="left">For example, if an archive is being unpacked as it is received,
|
|
it is important to know the filename before receiving the data. If the
|
|
filename thread comes after the data threads, the application has to write the
|
|
incoming data into a temp file, and then rename it later when the filename
|
|
thread finally shows up. It would also be nice to be able to display file
|
|
comments as the file is being downloaded.</p>
|
|
<p align="left"><b>Creating:</b> The filename thread must precede all other
|
|
threads. The recommended ordering for common thread types is:</p>
|
|
<ul>
|
|
<li>Filename</li>
|
|
<li>Message(s) (i.e. comments)</li>
|
|
<li>Data threads (data fork, resource fork, disk image)</li>
|
|
<li>all other threads</li>
|
|
</ul>
|
|
<p align="left"><b>Extracting:</b> If the filename thread does not appear before
|
|
the first data-class thread, the record may be ignored.</p>
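<p align="left">A sketch of both sides of this rule, with threads modeled as hypothetical <code>(class, payload)</code> tuples. The class labels are illustrative and do not correspond to numeric values in the format.</p>

```python
# Recommended position for each thread class; anything else goes last.
ORDER = {"filename": 0, "message": 1, "data": 2}


def order_threads(threads):
    """Return threads in the recommended order (stable within a class)."""
    return sorted(threads, key=lambda t: ORDER.get(t[0], 3))


def filename_precedes_data(threads):
    """Return False if a data-class thread appears before the filename."""
    for thread_class, _payload in threads:
        if thread_class == "filename":
            return True
        if thread_class == "data":
            return False
    return True  # no data-class threads, so there is nothing to reject
```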
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Incompatible thread types</h3>
|
|
<p align="left">There are some combinations of threads that must never appear in
|
|
a single record.</p>
|
|
<p align="left"><b>Creating:</b></p>
|
|
<ul>
|
|
<li> If a <b>data fork</b> is present, the record must not
|
|
contain another data fork or a disk image.</li>
|
|
<li> If a <b>resource fork</b> is present, the record must not
|
|
contain another resource fork or a disk image.</li>
|
|
<li> If a <b>disk image</b> is present, the record must not
|
|
contain another disk image, a data fork, or a resource fork.</li>
|
|
<li> If a <b>control-class thread</b> is present, the record must
|
|
not contain any data-class threads.</li>
|
|
</ul>
|
|
<p align="left"><b>Extracting:</b> When incompatible threads are found, they
|
|
should be ignored in favor of the earlier threads. For example, if two
|
|
data forks are found in the same record, only the first one should be extracted.
|
|
If a data-class thread is found first, subsequent control-class threads should
|
|
be ignored, and vice-versa.</p>
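<p align="left">One way to apply the "earlier thread wins" rule when extracting is sketched below. The thread-kind labels and the function name are hypothetical. Multiple control threads are allowed to coexist here, which is an assumption based on the rules listed above.</p>

```python
DATA_KINDS = {"data_fork", "resource_fork", "disk_image"}


def compatible_threads(kinds):
    """Return the indices of threads to keep, ignoring later conflicts."""
    kept = []
    seen = set()
    for index, kind in enumerate(kinds):
        if kind in ("data_fork", "resource_fork"):
            conflict = kind in seen or "disk_image" in seen or "control" in seen
        elif kind == "disk_image":
            conflict = bool(seen & DATA_KINDS) or "control" in seen
        elif kind == "control":
            conflict = bool(seen & DATA_KINDS)
        else:
            conflict = False  # filename, comment, and similar threads
        if not conflict:
            kept.append(index)
            seen.add(kind)
    return kept
```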
|
|
<p align="left"> </p>
|
|
<h3 align="left">Compressed threads</h3>
|
|
<p align="left">Some threads are compressed, some aren't. The
|
|
specification isn't very specific.</p>
|
|
<p align="left">All data-class threads may be compressed. All other
|
|
classes of threads must not be compressed.</p>
|
|
<p align="left"> </p>
|
|
<h3 align="left">ProDOS storage type</h3>
|
|
<p align="left">The ProDOS storage type has little meaning on most
|
|
systems. However, certain values are significant.</p>
|
|
<ul>
|
|
<li>
|
|
<p align="left">For records with <b>only a data fork</b>, the storage type
|
|
must be one of 0, 1, 2, or 3. The specific choice is not useful to
|
|
anyone, but a nonzero value (say, 1) should be used.</li>
|
|
<li>
|
|
<p align="left">For records with <b>a resource fork</b>, the storage type
|
|
must be "5" (ProDOS extended file).</li>
|
|
<li>
|
|
<p align="left">For records with a <b>disk image</b> thread, the storage
|
|
type must be equal to the disk block size (typically 512).</li>
|
|
<li>
|
|
<p align="left">For records <b>without data-class threads</b>, the storage
|
|
type must be "0".</li>
|
|
</ul>
|
|
<p align="left">Storage type 0x0d, which is used by ProDOS for directories, must
|
|
not be used.</p>
|
|
<p align="left">It is important to update the storage type as threads are added
|
|
and deleted, so that it always accurately reflects the contents of the record.</p>
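<p align="left">A sketch of how the storage type could be recomputed whenever threads are added or deleted. The thread-kind labels and the function name are hypothetical; the returned values follow the list above.</p>

```python
def storage_type_for(kinds, block_size=512, current=1):
    """Return the storage type that matches a record's thread kinds.

    kinds: collection of thread-kind labels present in the record.
    block_size: disk block size, used only for disk image records.
    current: the record's existing storage type, kept if still valid.
    """
    if "disk_image" in kinds:
        return block_size
    if "resource_fork" in kinds:
        return 5  # ProDOS extended file
    if "data_fork" in kinds:
        # 0-3 are all legal, but a nonzero value is preferred.
        return current if current in (1, 2, 3) else 1
    return 0  # no data-class threads
```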
|
|
<p>The spec seems to claim that HFS volumes have 524 bytes per block (though
|
|
the assertion was weakened from "would" to "might" in the final version).
|
|
This refers to the 12 "tag" bytes available on 3.5" floppies, which are
|
|
accessible from Mac OS but not actually required by HFS.</p>
|
|
|
|
<p> </p>
|
|
<h3>GS/OS option lists and HFS file types</h3>
|
|
<p>GS/OS was designed to work with a variety of different filesystems.
|
|
Instead of trying to handle all conceivable file attributes explicitly,
|
|
GS/OS returns filesystem-specific values in "option lists". These can
|
|
be provided to the get/set file info calls when copying files around.</p>
|
|
<p>Files on HFS volumes have two four-byte values, called file type and
|
|
creator, that identify the file contents. These are part of the Macintosh
|
|
Finder info structures, called FInfo and FXInfo. Files copied from HFS
|
|
to ProDOS may have this data stored in the extended key block of a forked
|
|
file (see ProDOS technical note #25). This appears as two 18-byte chunks,
|
|
consisting of a size byte followed by a type byte, and then 16 bytes of
|
|
FInfo or FXInfo data (which are defined in <i>Inside Macintosh: Macintosh
|
|
Toolbox Essentials</i>, page 7-47). To expose the data to applications,
|
|
certain GS/OS calls pass an "option list" with the contents. Most of
|
|
the fields are uninteresting to anything but the Mac Finder on the system
|
|
where the files were stored, so for our purposes the option list may be
|
|
viewed simply as a way to preserve the file type and creator.</p>
|
|
|
|
<p>Experiments with the GS/OS Exerciser reveal that the option list returned
|
|
doesn't include the size/type bytes. For an HFS file copied to ProDOS
|
|
with GS/OS, the GetFileInfo call returns a 32-byte buffer that begins
|
|
with FInfo. When called on an HFS volume, the option list is 36 bytes,
|
|
with the last four bytes set to 02 00 00 00. GSHK appears to record these
|
|
exactly as it receives them, which means the first four bytes hold the
|
|
HFS file type, and the second four bytes hold the HFS creator, in
|
|
big-endian byte order. Because most of the fields only have meaning to the
Macintosh Finder, the rest of the data is zeroes. Files archived from an
|
|
HFS volume created by a Macintosh would presumably have nonzero data in
|
|
more places.</p>
|
|
<p>When archiving files from an HFS volume under GS/OS, GSHK records the
|
|
ProDOS type/auxtype rather than the full HFS file type and creator,
|
|
because that's what the GS/OS file info query returns. The only way to
|
|
recover the original Mac Finder types is through the option list.</p>
|
|
<p>Sometimes the option list found in a NuFX archive is a little messed up,
|
|
e.g. the size field says 36 bytes, but there's only space for 18 bytes in
|
|
the record header.</p>
|
|
<p>Side note: the NuFX specification reversed the values of MFS and HFS
|
|
in the file_sys_id enumeration. In practice, GS/ShrinkIt
|
|
correctly uses the GS/OS FST definitions: MFS=5, HFS=6.</p>
|
|
<p><b>Opening:</b> Assume the option_size field is correct
|
|
unless it exceeds attrib_count-2. If it's too large, clip it down to size.
|
|
If the filesystem type is ProDOS or HFS, the option list is at least 16 bytes
|
|
long, and the second 4 bytes are nonzero,
|
|
use the first 4 bytes of the option list data as the file type and
|
|
the second 4 bytes as the creator. If a secondary test is desired to
|
|
avoid garbage, the creator value is usually ASCII.</p>
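<p>The opening rule might be implemented along these lines. This is a sketch under stated assumptions: <code>max_size</code> is the limit the caller derives from attrib_count, the ProDOS value of 1 for file_sys_id is assumed from the NuFX enumeration, and the function name is hypothetical.</p>

```python
FS_PRODOS = 1  # assumed value from the NuFX file_sys_id enumeration
FS_HFS = 6     # GS/OS FST numbering, as GS/ShrinkIt uses it


def hfs_types_from_option_list(file_sys_id, option_size, option_data, max_size):
    """Return (file_type, creator) as 4-byte values, or None if unusable."""
    # Clip a too-large size field down to what is actually available.
    size = min(option_size, max_size, len(option_data))
    if file_sys_id not in (FS_PRODOS, FS_HFS) or size < 16:
        return None
    file_type, creator = option_data[0:4], option_data[4:8]
    if creator == b"\0\0\0\0":
        return None  # the second 4 bytes must be nonzero
    return file_type, creator
```

<p>A stricter caller could additionally check that the creator bytes are printable ASCII before trusting the result.</p>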
|
|
<p><b>Creating:</b> If a record has HFS type values, generate a
|
|
filesystem-specific option list (32 bytes for ProDOS or 36 bytes for HFS)
|
|
and store them there.</p>
|
|
<p><b>Updating:</b> Always output the actual record size. Do not propagate
|
|
incorrect size values. Retaining option lists for ProDOS and HFS entries
|
|
is required, since they may have the only copy of the original file type
|
|
and creator, but only if at least one of the first 8 bytes of the option
list is nonzero. Updates to the archive attributes that alter the file/aux
|
|
type should usually retain the option list, since the purpose may be to
|
|
improve ProDOS usability without losing the original type information.</p>
|
|
|
|
<p> </p>
|
|
<h3>ProDOS vs. HFS file types</h3>
|
|
<p>The initial release of the specification stated that the HFS file type and
|
|
creator should be stored in the record header. The final version of the
|
|
specification abdicates responsibility for defining the field, stating simply,
|
|
"For ProDOS 8 or GS/OS, this field should always be what the operating system
|
|
returns when asked".</p>
|
|
<p>For reference, when an application asks GS/OS to get the information for
|
|
a file on an HFS volume, it returns a ProDOS file type and aux type (usually
|
|
BIN), and puts the HFS type and creator into an option list. If this
|
|
behavior defines the field, then this is how the types should be stored.</p>
|
|
<p>However, the vague wording of the specification raises the possibility that
|
|
a Mac OS-based archiver should store the file type and creator directly in
|
|
the record header, because that's what "the operating system" returned. The
|
|
record header does not provide a way to define the source of the type values,
|
|
so an extraction program attempting to set the file info would need to draw
|
|
conclusions based on whether the types are small enough values to be valid
|
|
for ProDOS.</p>
|
|
<p>It's worth noting that files on an AppleShare volume have independent
|
|
ProDOS and HFS file types. When a ProDOS file is written to the AppleShare
|
|
FST, Mac OS type and creator values are generated according to a scheme
|
|
documented in the AppleShare FST public ERS document. It's possible that a
Mac archiver could store HFS file types that are really encoded ProDOS file
types, which would have to be decoded according to a collection of
rules.</p>
|
|
<p>To avoid ambiguity, we want to follow the GS/OS behavior, regardless of
|
|
what the host operating system does.</p>
|
|
<p><b>Creating:</b> store the ProDOS file type and aux type in the record
|
|
header. For files on HFS volumes, put a simple ProDOS type (BIN or TXT)
|
|
in the record header, and put the file type and creator in an option list.</p>
|
|
<p><b>Extracting:</b> if the file type and aux type do not fit in 8 and 16
|
|
bits, respectively, treat them as values from HFS.</p>
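<p>The extraction test is a simple range check on the two record header fields. In this sketch the function name is hypothetical, and treating the pair as HFS values when <i>either</i> field is out of range is an assumption; the rule above does not say whether one oversized field is enough.</p>

```python
def looks_like_hfs_types(file_type, extra_type):
    """Return True if the header values cannot be ProDOS type/auxtype."""
    # ProDOS file types fit in 8 bits and aux types fit in 16 bits.
    return file_type > 0xFF or extra_type > 0xFFFF
```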
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Disk image size values</h3>
|
|
<p>For a compressed disk image, the "storage_type" and
|
|
"extra_type" fields take on different meanings: the storage_type
field holds the block size (usually 512), and the extra_type field holds
|
|
the block count (e.g. 280 for a 140KB disk).</p>
|
|
<p>These fields are more important than you might expect, because
|
|
ShrinkIt doesn't appear to set the thread EOF value for disk images. (A quick
|
|
test with ShrinkIt v3.4 on a 5.25" DOS disk yielded a thread EOF of zero,
|
|
while GS/ShrinkIt v1.1 on a 3.5" ProDOS disk generated a mysterious
|
|
thread EOF of $4a00.)
|
|
Worse, some older versions of ShrinkIt tended to leave the
|
|
"storage_type" set to 2.
|
|
Apparently, ShrinkIt just uses extra_type * 512 as the uncompressed size when
|
|
trying to figure out what sort of disk it has. An early version of
|
|
GS/ShrinkIt went one step further: it used a block count of 280 with a block
|
|
size of 256, resulting in archives that apparently held 70KB disk images.</p>
|
|
<p>It is simple enough to disregard the thread EOF value, and
|
|
replace the storage_type when it is absurdly small, but there is a deeper
|
|
problem. If you delete a 140KB disk image thread and replace it with an
|
|
800KB disk image thread, the block count stored in the extra_type no longer
|
|
accurately reflects the contents of the record. (This linkage between the
|
|
record header and the thread contents is the reason why this document forbids
|
|
mixing of disk image threads with any other data-class thread, including other
|
|
disk images.)</p>
|
|
<p>Because the length of the disk image thread can only be determined from
|
|
the extra_type field, it is important for applications that support changing
|
|
the file and aux types to prevent such changes in records with disk images.</p>
|
|
<p><b>Creating:</b> Applications must update the record's storage_type and
|
|
extra_type fields whenever a disk image thread is added. The value
|
|
(storage_type * extra_type) must be equal to the uncompressed size. The
|
|
application should reject disk image files that are not a multiple of
|
|
512 bytes. For consistency with other applications, the thread EOF field
|
|
should be zeroed.</p>
|
|
<p><b>Extracting:</b> The application must ignore the thread EOF, and
|
|
normalize storage_type to 512 if it is less than 16 (0x0f is the largest
|
|
valid ProDOS storage type). The value (512 * extra_type) should be
|
|
used as the uncompressed size. If the uncompressed size is zero, the
|
|
thread may be ignored.</p>
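<p>The two rules above can be sketched as a pair of helpers. The function names are hypothetical; the arithmetic follows the text directly.</p>

```python
def disk_image_geometry(storage_type, extra_type):
    """Return (block_size, uncompressed_size) for an archived disk image.

    The thread EOF is deliberately not consulted. A size of zero means
    the thread may be ignored.
    """
    if storage_type < 16:  # looks like a ProDOS storage type, not a size
        storage_type = 512
    return storage_type, 512 * extra_type


def disk_image_header_fields(image_size, block_size=512):
    """Return (storage_type, extra_type) to store for a new disk image."""
    if image_size % 512 != 0:
        raise ValueError("disk image size must be a multiple of 512 bytes")
    return block_size, image_size // block_size
```

<p>Note that the extraction helper uses 512 * extra_type even when the stored block size is 256, which handles the early GS/ShrinkIt archives described above.</p>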
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Access permissions</h3>
|
|
<p align="left">NuFX supports four boolean access permission flags (read, write,
|
|
destroy, rename) and two boolean attributes (backup needed, invisible) in the
|
|
"access" field. This matches up with ProDOS capabilities nicely,
|
|
but very few other operating systems support all six.</p>
|
|
<p align="left">Application authors should consider the following approaches:</p>
|
|
<ol>
|
|
<li>
|
|
<p align="left"><b>Preserve all.</b> All flags in the access field
|
|
must be preserved. It is not required that the extracted files obey
|
|
the original semantics -- an "invisible" file might be visible,
|
|
and a file with "rename" disabled might still be rename-able --
|
|
but when the files are re-added, the permissions must match.</p></li>
|
|
<li>
|
|
<p align="left"><b>Locked/unlocked.</b> A file with read enabled, and
|
|
write, destroy, rename, and invisible disabled, is considered
|
|
"locked" (access 0x01 or 0x21). All other files are
|
|
considered "unlocked". When a file is extracted and then
|
|
added to an archive, the locked/unlocked status must be preserved.
|
|
Locked files are added with access 0x21, and unlocked files are added with
|
|
access 0xe3.</p></li>
|
|
</ol>
|
|
<p align="left">It is acceptable for an application to find a middle ground
|
|
between these two, and preserve more of the flags accurately than approach #2
|
|
does, but approach #2 should be considered the minimum acceptable level of
|
|
support.</p>
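<p align="left">A minimal sketch of approach #2 follows. The bit assignments
are the usual ProDOS access byte values, which this document does not list,
and the function names are hypothetical.</p>

```python
ACCESS_READ = 0x01
ACCESS_WRITE = 0x02
ACCESS_INVISIBLE = 0x04
ACCESS_BACKUP = 0x20
ACCESS_RENAME = 0x40
ACCESS_DESTROY = 0x80

ACCESS_LOCKED = 0x21    # read + backup needed
ACCESS_UNLOCKED = 0xE3  # destroy + rename + backup needed + write + read


def is_locked(access):
    """True if only 'read' is set, ignoring the backup-needed attribute."""
    return (access & ~ACCESS_BACKUP) == ACCESS_READ


def normalize_access(access):
    """Reduce an access value to the locked/unlocked value used when adding."""
    return ACCESS_LOCKED if is_locked(access) else ACCESS_UNLOCKED
```

<p align="left">With this reduction, 0x01 and 0x21 both come back as 0x21,
and every other value comes back as 0xe3.</p>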
|
|
<p align="left"> </p>
|
|
<h3 align="left">Empty directories</h3>
|
|
<p align="left">Directories do not need to be stored explicitly unless they are
|
|
empty. The NuFX specification manages to avoid describing how directories
|
|
are actually supposed to be stored, saying only: "A Thread Record must exist to inform a utility that a directory is to
|
|
be created through the use of the proper control_thread value."</p>
|
|
<p align="left">What is in a "create directory" control thread?
|
|
It appears that the intent was to have the thread contain the pathname that
|
|
needed to be created. In theory, you could have several of these things,
|
|
and create an entire hierarchy from a single record. Such threads should
|
|
not be compressed, and their compThreadEOF should always match their threadEOF
|
|
(i.e. they're not pre-sized).</p>
|
|
<p align="left">It's a little tricky to say, "add a control thread whenever
|
|
you find a directory with nothing in it". What if the directory has
|
|
files in it, but you don't have the access permissions necessary to read the
|
|
files?</p>
|
|
<p align="left">Does such a record require a filename? Probably not.
|
|
However, if it doesn't have a filename, ShrinkIt might not display the record,
|
|
and you'd have no way to manipulate it. Adding a "record label"
|
|
is easy and useful.</p>
|
|
<p align="left">(I'm strongly tempted to punt on the control threads and just
|
|
use storage type 0x0d to indicate that a directory should be created. This
|
|
is in direct opposition to the NuFX specification, however, so I'm reluctant to
|
|
do so.)</p>
|
|
<p align="left"><b>Creating:</b> Applications not interested in preserving empty
|
|
directories need do nothing. Otherwise, the application must add a
|
|
"create directory" control thread whenever a directory is encountered
|
|
for which no files are added to the archive.</p>
|
|
<p align="left"><b>Extracting:</b> A directory must be created when a control
|
|
thread is present. As noted in the NuFX specification, the application
|
|
must also create any directories listed in the record's pathname that don't yet
|
|
exist.</p>
|
|
|
|
<p align="left"> </p>
|
|
<h3 align="left">Message thread format</h3>
|
|
<p align="left">The specification says that message threads are ASCII text, but
|
|
doesn't specify an EOL character. For the benefit of Apple II utilities,
|
|
it's best to use a carriage return (Ctrl+M). The comments are expected to
|
|
be readable on 8-bit Apple IIs, so plain ASCII rather than Mac OS Roman should
|
|
be used.</p>
|
|
<p align="left"><b>Creating:</b> Convert any EOL markers to CR, and any
|
|
non-ASCII characters (i.e. bytes with the high bit set) to ASCII.</p>
|
|
<p align="left"><b>Extracting:</b> Assume that the comment may be using CR, LF,
|
|
or CRLF, and convert as needed for display. GS/ShrinkIt used a
|
|
proportional font, so there is no need to worry about formatting to preserve
|
|
"ASCII art" in comments.</p>
|
|
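<p align="left">The conversions described above might look like the following
sketch. Replacing non-ASCII characters with "?" and converting to LF for
display are assumptions made for the example; an application could
transliterate instead, or use whatever EOL its display code expects.</p>

```python
def encode_comment(text):
    """Prepare a comment for storage: CR line endings, plain ASCII only."""
    # Handle CRLF first so it becomes a single CR rather than two.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    # Assumption: characters outside ASCII are replaced with "?".
    return text.encode("ascii", errors="replace")


def decode_comment(data):
    """Convert a stored comment for display, accepting CR, LF, or CRLF."""
    text = data.decode("ascii", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```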
|
|
<p> </p>
|
|
<h3>Message thread maximum length</h3>
|
|
<p>Comments are rarely used, and when they are they tend to be fairly short.
|
|
The contents are never compressed, aren't covered by a CRC, and aren't
|
|
extracted to files, making them a bad way to convey vital information.
|
|
Adding and editing the comment field was introduced with GS/ShrinkIt, which
|
|
creates a pre-sized comment on the first entry in each batch. The editor
|
|
does not expand or reduce the length of the field, which is limited to
|
|
1,000 bytes. It does support longer comments created by other programs.</p>
|
|
<p>It's convenient to assign a maximum possible length to comments, so that
|
|
they can be manipulated by code that doesn't need to handle their maximum
|
|
possible length of 4GB. A cap of 64KB (same as ZIP) seems reasonable as an
|
|
absolute maximum, considering likely content and what Apple II software can
|
|
support.</p>
|
|
<p><b>Creating:</b> Limit comments to 64KB. Applications may establish a
|
|
lower limit, but should allow them to be at least 1,000 bytes.</p>
|
|
<p><b>Updating:</b> Truncation of comments longer than 64KB is
|
|
discouraged but allowed.</p>
|
|
|
|
<p> </p>
|
|
<h3>Master EOF</h3>
|
|
<p>For the most part, ShrinkIt correctly sets the MasterEOF field
|
|
in the Master Header block. The field was introduced with version 1 of the
|
|
header definition. A very old version of ShrinkIt left it set to
|
|
zero (this is the same version that completely omitted the filename for DOS 3.3
|
|
disk images). GS/ShrinkIt appears to initialize it to 48 (the size of the
|
|
MH block), and if the creation process is interrupted you can end up with a
|
|
partial archive with a nonzero EOF.</p>
|
|
<p>The master EOF is useful as a quick file truncation test, but provides
|
|
no other value. The record count in the header is more important.</p>
|
|
<p><b>Opening:</b> Don't assume the master EOF is accurate. Walk through the
|
|
list of records to determine the actual end-of-file before appending new
|
|
records.</p>
|
|
<p><b>Updating:</b> Applications must write the correct MasterEOF
|
|
value if an archive is modified.</p>
|
|
|
|
<p> </p>
|
|
<hr>
|
|
|
|
<h2 align="left">Extensions</h2>
|
|
<p align="left">Unofficial extensions to the NuFX specification. Anyone
|
|
working with NuFX archives should take heed.</p>
|
|
<h3 align="left">New compression formats</h3>
|
|
<p align="left">Thread formats 0x0000 through 0x0005 are already defined. The
|
|
following thread format values have been added:</p>
|
|
<ul>
|
|
<li>
|
|
<p align="left">0x0006 - deflate. The thread contains data conforming
|
|
to RFC 1951 (deflate 1.3 specification), which is the compression format
|
|
used by ZIP and gzip. The canonical implementation is "zlib".
|
|
Visit <a href="https://zlib.net/">zlib.net</a> for more details.</p></li>
|
|
<li>
|
|
<p align="left">0x0007 - bzip2. The thread contains BWT+Huffman
|
|
compressed data as output by "libbz2". Visit
|
|
<a href="https://sourceware.org/bzip2/">sourceware.org/bzip2</a>
|
|
for more information.</p></li>
|
|
</ul>
|
|
<p align="left">Support for these formats is nonexistent on the Apple II, so
|
|
they should not be used except in situations where compatibility is unimportant
|
|
(e.g. collections of disk archives for use with A2 emulators).</p>
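<p align="left">As a rough illustration, both formats can be produced with
the Python standard library. The sketch assumes that format 0x0006 holds a raw
RFC 1951 stream with no zlib or gzip wrapper, and that format 0x0007 holds a
complete bzip2 stream; the constant and function names are hypothetical.</p>

```python
import bz2
import zlib

THREAD_FORMAT_DEFLATE = 0x0006
THREAD_FORMAT_BZIP2 = 0x0007


def compress_thread(data, thread_format):
    """Compress thread data with one of the two extension formats."""
    if thread_format == THREAD_FORMAT_DEFLATE:
        # Negative wbits selects raw deflate output (no header or checksum).
        comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
        return comp.compress(data) + comp.flush()
    if thread_format == THREAD_FORMAT_BZIP2:
        return bz2.compress(data)
    raise ValueError("unsupported thread format")


def expand_thread(data, thread_format):
    """Reverse compress_thread()."""
    if thread_format == THREAD_FORMAT_DEFLATE:
        return zlib.decompress(data, -zlib.MAX_WBITS)
    if thread_format == THREAD_FORMAT_BZIP2:
        return bz2.decompress(data)
    raise ValueError("unsupported thread format")
```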
|
|
|
|
<p align="left">I found that "deflate" generally does as well as or
|
|
better than "bzip2" on Apple II binaries, disk images, and small text
|
|
files. Deflate is also faster and uses less memory, and you're more likely
|
|
to find libz installed on a given system than you are libbz2. For these
|
|
reasons, use of deflate should be encouraged in favor of bzip2.</p>
|
|
|
|
<hr>
|
|
<h2 align="left">NuFX Quirks</h2>
|
|
<p align="left">This section identifies some quirks in NuFX or ShrinkIt that,
|
|
while not bugs, are worth noting.</p>
|
|
<h3 align="left">Filename separator character</h3>
|
|
<p align="left">Originally, the filename was stored in the record header, so it
|
|
made sense that the filename separator character ("fssep char") should
|
|
also be there. When the filenames were moved into threads, the fssep char
|
|
got left behind. If a record has two filenames, they'd better have the
|
|
same fssep char, or interpreting one of them will be impossible. (This is
|
|
one of the reasons why it's important to clearly define which filename takes
|
|
precedence in all circumstances.)</p>
|
|
|
|
<h3 align="left">Files with zero or two CRCs</h3>
|
|
<p align="left">The "threadCRC" field in the thread header block can
|
|
have one of three meanings: nothing (v0, v1), the CRC of the compressed data
|
|
(v2), or the CRC of the uncompressed data (v3). Version 2 records weren't
|
|
generated by anything significant, and can be ignored. (If you actually find
|
|
an archive with v2 records, it's reasonable to just treat them as v1.)</p>
|
|
<p align="left">Version 1 records generally have threads compressed with LZW/1.
The LZW/1 compression format includes the 16-bit CRC of the uncompressed
|
|
data at the start of the thread. Version 3 records generally have threads compressed
|
|
with LZW/2, which does not include a CRC.</p>
|
|
<p align="left">Applications like P8 ShrinkIt and NuLib create v1 records and
|
|
compress with LZW/1, while GS/ShrinkIt and NuLib2 create v3 records and compress
|
|
with LZW/2. This means that each compressed thread has exactly one CRC.
|
|
(Uncompressed data stored by P8 ShrinkIt has no CRC at all.)
|
|
So what happens if you tell NuLib2 to create a new record with
|
|
LZW/1, or tell it to add a new LZW/2 thread to an existing v1 record?</p>
|
|
<p align="left">In one case, you end up with two CRCs; in the other, you end up
|
|
with no CRC on your data at all. Unfortunately, the v3 thread
|
|
CRC is computed with a different initial value, so it is necessary to compute
|
|
the CRC twice for LZW/1 data, not merely store the same value twice.</p>
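<p align="left">The effect of the initial value can be seen in a small
sketch. It assumes the 16-bit CRC uses the CCITT polynomial 0x1021, processed
most significant bit first, with a seed of 0x0000 for the LZW/1 CRC and 0xffff
for the v3 thread CRC. Neither the polynomial nor the seeds are stated in this
document, so check them against real archives before relying on them.</p>

```python
def crc16(data, crc=0x0000):
    """Bitwise CRC-16 with polynomial 0x1021 (assumed), MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def both_crcs(uncompressed):
    """Return (LZW/1 CRC, v3 thread CRC) under the assumed seeds."""
    return crc16(uncompressed, 0x0000), crc16(uncompressed, 0xFFFF)
```

<p align="left">Because the two results differ for the same input, one value
cannot simply be copied into the other field.</p>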
|
|
<p>When replacing a data thread in an existing record, it's tempting to
|
|
update the record to the latest (v3), but this may come at a cost. For
|
|
example, if the record has both resource and data forks, and only the data fork
|
|
is being replaced, it would be necessary to uncompress the resource fork to
|
|
calculate its uncompressed CRC. Programs that rewrite records should be
|
|
prepared to output v1 or v3.</p>
|
|
|
|
<h3 align="left">Extra data in compressed threads</h3>
|
|
<p align="left">ShrinkIt adds an extra byte at the end of all LZW compressed
|
|
data, probably due to an off-by-one bug in the compression code. It turns
|
|
out that it's possible to get even more "extra" bytes at the end.</p>
|
|
<p align="left">ShrinkIt's LZW/1 algorithm always operates on a 4K buffer,
|
|
largely because it was originally designed for compressing 5.25" disks with
|
|
4K tracks.
|
|
On small files, or at the end of a large one, the last bit of data is padded out
|
|
to 4K and then compressed. Ordinarily this is barely noticeable, because
|
|
the compression routines do an RLE (Run-Length Encoding) pass before applying
|
|
LZW.</p>
|
|
<p align="left">However, if both RLE and LZW fail to make the 4K block any
|
|
smaller, it is stored without compression. This means the whole 4K,
|
|
complete with padding, gets written to the archive. This doesn't cause any
|
|
problems, but can make you wonder where all the extra bits came from.</p>
|
|
<p align="left">The SQ compression algorithm, as implemented by Don Elton's SQ3,
|
|
appears to add an extra 0xff to the end of the compressed data. It can
|
|
safely be ignored.</p>
|
|
|
|
<h3 align="left">Preserving BXY and SEA wrappers</h3>
|
|
<p align="left">Preserving BXY wrappers is pretty easy, since the Binary II
|
|
format is well documented. Updating block counts and file lengths is all
|
|
that is required.</p>
|
|
<p>Preserving SEA wrappers is a little more obscure, since there
|
|
is no documentation on the format. A bit of reverse engineering reveals that
|
|
SEA files are OMF executables with two segments. The first segment holds the
|
|
extraction code, and is the same for all archives. The second holds the NuFX
|
|
data, and requires that a few length values in the segment header be adjusted.
|
|
Also, to be correct, the file must have a $00 byte appended after the NuFX
|
|
data (it's an OMF "END" opcode).</p>
|
|
<p>The archives have a minor bug: an offset field in the header is off by one,
|
|
so actually loading the segment in GS/OS would likely fail. The segment
|
|
header has the "skip" flag set, though, so this isn't a problem in practice.</p>
|
|
|
|
<h3 align="left">Y2K</h3>
|
|
<p align="left">The NuFX standard says that the Date/Time format is the same as
|
|
that returned by the IIgs ReadTimeHex toolbox call. That call returns the
|
|
year as (year - 1900), so the year 2000 is stored as "100".
|
|
ProDOS 8 clock drivers, on the other hand, return 40-99 for 1940-1999, and
|
|
0-39 for 2000-2039. As a result, archives created with P8 ShrinkIt use 0
|
|
for the year 2000 instead of 100.</p>
|
|
<p align="left">When creating archives, always use 100 for the year 2000, but
|
|
also accept the year 0. However, if you find a Date/Time with zero in all
|
|
useful fields (second, minute, hour, day, month, year), treat it as an
|
|
unspecified date rather than midnight of January 1, 2000.</p>
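<p align="left">These rules can be sketched as below. The function names and
the argument order are hypothetical; only the year handling and the all-zero
test come from the text above.</p>

```python
def decode_year(year_byte):
    """Map a stored year to a four-digit year, accepting both conventions."""
    if year_byte < 40:
        # ProDOS 8 convention: 0-39 means 2000-2039.
        return 2000 + year_byte
    # 40-99 means 1940-1999; 100 and up follows the IIgs (year - 1900) rule.
    return 1900 + year_byte


def encode_year(year):
    """Always store (year - 1900), so the year 2000 is written as 100."""
    if not 1900 <= year <= 2155:
        raise ValueError("year does not fit in one byte")
    return year - 1900


def decode_date(second, minute, hour, day, month, year):
    """Return None for an unspecified date, else the fields with a full year."""
    if not any((second, minute, hour, day, month, year)):
        return None
    return (second, minute, hour, day, month, decode_year(year))
```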
|
|
<hr>
|
|
<p>This document is Copyright © 2000-2004 by <a href="https://www.fadden.com/">Andy
|
|
McFadden</a>. All Rights Reserved.</p>
|
|
<p>The latest version can be found on the NuLib web site at
|
|
<a href="https://www.nulib.com/">https://www.nulib.com/</a>.</p>
|
|
</td></tr></table></td></tr></table></td></tr></table></td></tr></table><!--msnavigation--></td></tr><!--msnavigation--></table><!--msnavigation--></td></tr><!--msnavigation--></table></body>
|
|
|
|
</html>
|