<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
    <title>NuFX Addendum</title>

    <meta http-equiv="Content-Language" content="en-us">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1" />

    <meta content="t, default" name="Microsoft Border">

    <link href="../main.css" rel="stylesheet" type="text/css" />
</head>

<body bgcolor="#FFFFFF" text="#000000"><!--msnavigation--><table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td>

<p align="center"><font size="6"><strong>NuFX Addendum</strong></font><br>
<nobr>[&nbsp;<a href="../index.htm" target="">Home</a>&nbsp;]</nobr> <nobr>[&nbsp;<a href="index.htm" target="">Up</a>&nbsp;]</nobr> <nobr>[&nbsp;NuFX&nbsp;Addendum&nbsp;]</nobr> <nobr>[&nbsp;<a href="nulib2-preserve.htm" target="">ProDOS&nbsp;Attribute&nbsp;Preservation</a>&nbsp;]</nobr></p>
<hr>

</td></tr><!--msnavigation--></table>

<!--msnavigation--><msnavigation border="0" cellpadding="0" cellspacing="0" dir="ltr" width="100%"><tr><!--msnavigation--><msnavigation valign="top"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><msnavigation border="0" cellpadding="0" cellspacing="0" dir="ltr" width="100%"><tr><msnavigation valign="top"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><msnavigation border="0" cellpadding="0" cellspacing="0" width="100%"><tr><msnavigation valign="top">


<h6>NuFX Addendum - <b>By Andy McFadden - Last revised 2022/11/06</b></h6>
<p align="left">This addendum clarifies and extends certain aspects of the
<a href="FTN.e08002.htm"> NuFX specification</a>. This is not an
&quot;official&quot; modification
of the original document - it has not been reviewed and approved by
the original author - but anyone developing NuFX utilities would do
well to follow these recommendations.</p>

<h2 align="left">Purpose</h2>
<p align="left">The NuFX specification defines a very loose structure, and
leaves much to the imagination of the implementer.&nbsp; For example, &quot;If a
utility finds a redundancy in a Thread Record, it must decide whether to skip
the record or to do something with that particular thread...&quot;.&nbsp;
A strict specification would define a standard approach that all applications
must follow when dealing with the anomalous condition, to ensure consistent
handling of all archives.</p>
<p align="left">This document refines the NuFX specification and brings some of
the &quot;fuzzy&quot; areas into sharper focus.&nbsp; Nothing in this document
contravenes the original document.</p>
<p align="left">In the text below, &quot;<b>must</b>&quot; is an imperative that
has to be obeyed, and &quot;<b>should</b>&quot; is a recommendation that authors
are strongly encouraged to follow.</p>
<hr>
<h2 align="left"> Clarifications</h2>
<h3 align="left"> Pronunciation</h3>
<p align="left">What's the correct way to pronounce &quot;NuFX&quot;? One
approach is letter-by-letter ("en you eff ecks"), another is minimal-syllable
("new fix").  According to the file type note, it's a bit of both ("new eff ecks").</p>

<p align="left">&nbsp;</p>
<h3 align="left">Use of &quot;.SDK&quot; suffix</h3>
<p align="left">Originally, only &quot;.SHK&quot; was used to represent a NuFX
archive.&nbsp; Over time, a convention of using &quot;.SDK&quot; to represent
archives with a single disk image in them has arisen.&nbsp; This is very
convenient for emulators on systems that rely on the file extension (e.g.
Windows), so use of &quot;.SDK&quot; is encouraged.</p>

<p align="left">&nbsp;</p>
<h3 align="left">Archives with no records</h3>
<p align="left">An archive without records, i.e. nothing but a master header
block, serves no purpose.  However, it can be useful to have a "create new
archive" operation that creates an empty file to be populated later.</p>
<p align="left"><b>Creating:</b> Archives without any records in them may be
created.</p>
<p align="left"><b>Opening:</b> If asked to open a record-less archive,
the application should recognize that the archive is empty and proceed as if it
were a new archive.</p>
<p align="left"><b>Modifying:</b> If all records in an archive are deleted, the
archive file should be deleted as well.</p>

<p align="left">&nbsp;</p>
<h3 align="left"> Records with no threads</h3>
<p align="left">A record without threads is pretty pointless.  The initial
release of the NuFX spec mandated that there be at least one thread attached
to each record, but this language was removed from later versions.</p>
<p align="left"><b>Creating:</b> Records without threads must never be
created.&nbsp; All records must have at least one thread.</p>
<p align="left"><b>Extracting: </b>Empty records should be ignored.</p>
<p align="left">&nbsp;</p>

<h3 align="left">Records with only a filename thread</h3>
<p align="left">GS/ShrinkIt v1.1 has a bug that prevents it from creating an empty
data thread when asked to add a zero-byte file.&nbsp; This results in a record
with a filename thread and nothing else.&nbsp; (If it was the first new record added,
it will have an empty comment thread as well.)</p>
<p>GS/ShrinkIt does nothing when asked to extract records without threads.</p>
<p align="left"><b>Creating:</b> Records composed solely of a filename thread
must not be created.</p>
<p align="left"><b>Extracting:</b> Records with nothing but a filename thread
should be ignored.&nbsp; <i>For GSHK v1.1 bug compatibility</i>: if a record has a filename
thread, and no other threads except &quot;message&quot; threads (i.e. no data
threads or control threads), then a zero-byte data fork file should be
created.&nbsp; Otherwise, the record should be ignored.&nbsp; If the ProDOS
storage type field indicates an extended file, a zero-byte resource fork should
also be created.</p>
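<p>The GSHK v1.1 compatibility rule above can be expressed as a small decision function.  This is a minimal sketch, assuming each thread is represented as a hypothetical (thread_class, thread_kind) tuple that uses the NuFX class numbers (0 = message, 1 = control, 2 = data, 3 = filename).  The function name and the "data"/"resource" strings are illustrative only.</p>

```python
# Thread class values from the NuFX specification.
MESSAGE_CLASS, CONTROL_CLASS, DATA_CLASS, FILENAME_CLASS = 0, 1, 2, 3
EXTENDED_STORAGE_TYPE = 5  # ProDOS "extended file" (has a resource fork)


def gshk_empty_file_forks(threads, storage_type):
    """Return the zero-byte forks to create for a GSHK v1.1 "empty file" record.

    `threads` is a list of hypothetical (thread_class, thread_kind) tuples.
    An empty result means the record is not a filename-plus-messages record,
    so the normal extraction rules apply.
    """
    classes = [cls for cls, _kind in threads]
    if FILENAME_CLASS not in classes:
        return []
    # Any thread other than a filename or a message (i.e. data or control)
    # means this is not the GSHK bug case.
    if any(cls not in (FILENAME_CLASS, MESSAGE_CLASS) for cls in classes):
        return []
    forks = ["data"]
    if storage_type == EXTENDED_STORAGE_TYPE:
        forks.append("resource")
    return forks
```

<p>For example, a record holding only a filename thread and a comment thread, with storage type 5, yields ["data", "resource"].</p>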
<p align="left">&nbsp;</p>

<h3 align="left">Records with no filename</h3>
<p align="left">A record without a filename thread is a curious beast.&nbsp;
Ideally, there wouldn't be any such thing as a filename thread, since it doesn't
really make sense to have a record without one.&nbsp; Expanding the record
header to hold a pre-sized buffer would've made many things simpler.<p align="left">This
particular situation occurred with older versions of ShrinkIt (e.g. v1.1) that failed to store
a volume name when compressing a DOS 3.3 disk.&nbsp; There was no filename in
the record header, nor one in a filename thread.<p align="left">The only
situation where a record without a filename makes sense is if the record holds
nothing but comments or other archive &quot;meta data&quot;, such as a
&quot;create directory&quot; control thread.<p align="left"><b>Creating:
</b>Records without filenames must not be created, unless the record is intended
to contain
nothing but archive meta-data.&nbsp; Deletion of the filename thread should only
be done if a new filename thread is being added.&nbsp; If data threads are added
to a record without a filename, then a filename thread must be added as well.<p align="left"><b>Extracting</b>:
If the record contains file data, the application may either prompt the user for
a filename to use, or supply a generated one.<p align="left">&nbsp;
<h3 align="left">Records with more than one filename thread</h3>
<p align="left">This is an unusual situation that should only arise if an
application is buggy.&nbsp; Every record created by a modern application should
have no more than one filename thread.<p align="left"><b>Creating:</b> Records
with multiple filename threads must not be created.<p align="left"><b>Extracting:</b>
 Applications must use the first filename thread.&nbsp; If a buggy application
appends an additional filename thread, its filename will be ignored.<p align="left">&nbsp;<h3 align="left">Records
with filenames in two places</h3>
<p align="left">The old way of storing filenames, used by NuLib and old versions of ShrinkIt, was to
put the filename in the record header.&nbsp; To facilitate renaming, the
filename was moved into a thread.&nbsp; Thus, there are two possible locations
where the filename may live, and no guarantee that only one will be used.
<p align="left"><b>Creating:</b>
Never put the filename in the record header when creating a new record.&nbsp;
It's okay to leave existing records alone, but if an application has the
opportunity to rewrite the record header, the record filename must be removed.
<p align="left"><b>Extracting:</b>
The thread filename takes precedence over the record header filename.
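<p>Taken together, the filename rules in the last few sections suggest a lookup order.  The sketch below is one way to implement it, under the following assumptions.  Threads are hypothetical (thread_class, thread_kind, data) tuples, where data holds the bytes actually used in the thread.  When the first filename thread is empty, this sketch treats it as missing and falls back to the record header filename; that fallback is an interpretation, not a rule stated above.</p>

```python
FILENAME_CLASS = 3  # NuFX filename thread class


def resolve_filename(header_name, threads):
    """Pick a record's filename: first filename thread, else header filename.

    `header_name` is the (possibly empty) filename from the record header.
    `threads` is a list of hypothetical (thread_class, thread_kind, data)
    tuples.  Returns None when the record has no usable filename.
    """
    for cls, _kind, data in threads:
        if cls == FILENAME_CLASS:
            if data:
                return data  # later filename threads are ignored
            break  # an empty first filename thread counts as missing
    # No usable filename thread: fall back to the old-style header filename.
    return header_name or None
```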

<p align="left">
&nbsp;<h3 align="left">Filename character set</h3>
<p align="left">Filenames in NuFX archives use the Mac OS Roman character set,
which is ASCII plus some symbols and the usual set of Latin language characters
(see <a href="https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT">
Unicode definition</a>).&nbsp; The NuFX filename definition was intended to
accommodate files from HFS volumes, which may contain any character except ':'.&nbsp;
Control characters, including NUL ('\0'), were allowed but discouraged.
<p align="left">
On modern systems, converting between Mac OS Roman and Unicode is useful and
(mostly) straightforward.&nbsp; Dealing with embedded null bytes is very
annoying in C-like languages though.
<p align="left"><strong>Creating:</strong>
Convert Unicode to Mac OS Roman, replacing any untranslatable characters with
'?'.&nbsp; Embedded nulls may be replaced with '?'.</p>
<p align="left"><strong>Extracting:</strong> Convert Mac OS Roman to Unicode.&nbsp; If embedded nulls
are encountered, they may be replaced with something appropriate for the
current system.&nbsp; Applications should not ignore the problem and
truncate the filename; if they do, they must be prepared to handle duplicate or empty
filenames.</p>
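<p>Python ships a "mac_roman" codec, so the conversions above can be sketched in a few lines.  The function names are hypothetical, and using '_' as the replacement for embedded nulls on extraction is an arbitrary choice; any character appropriate for the host system would do.</p>

```python
def filename_to_nufx(name: str) -> bytes:
    """Convert a Unicode filename to Mac OS Roman for storage in an archive."""
    # errors="replace" makes the codec emit '?' for untranslatable characters.
    encoded = name.encode("mac_roman", errors="replace")
    # Embedded NULs are legal in NuFX but awkward elsewhere; replace them.
    return encoded.replace(b"\x00", b"?")


def filename_from_nufx(raw: bytes, nul_replacement: str = "_") -> str:
    """Convert a stored Mac OS Roman filename to Unicode."""
    # Every byte value has a mapping in Mac OS Roman, so decoding cannot fail.
    text = raw.decode("mac_roman")
    # Replace embedded NULs instead of truncating the name at the first one,
    # which avoids producing duplicate or empty filenames.
    return text.replace("\x00", nul_replacement)
```

<p>For instance, "café" is stored as the bytes 63 61 66 8E, because 'é' is 0x8E in Mac OS Roman.</p>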

<p align="left">&nbsp;</p>
<h3 align="left">Filesystem separator characters</h3>
<p align="left">Every record header has a &quot;file system separator&quot;
character (&quot;fssep&quot;) in the &quot;file_sys_info&quot; word.&nbsp; This
is usually something like ':' for GS/OS or '/' for UNIX.&nbsp; It's necessary to
know what the separator is in order to break a pathname down into its individual
components.</p>
<p align="left">Not all filesystems support subdirectories, however,
which means that not all filenames need to have a separator.&nbsp; The
appropriate separator character for such a filesystem is not defined in the NuFX
spec.&nbsp; Clearly it should be something illegal on the source filesystem, or
we could inadvertently see pathnames where they don't exist (e.g. a file called
&quot;foo:bar&quot; on DOS 3.3 if the fssep char were set to ':').</p>
<p align="left">The trouble is, DOS 3.3 doesn't actually have any illegal characters, just a field
of 30 characters padded with spaces.&nbsp; Pascal disks are similar.&nbsp; Since
we must define an fssep for every filename, our best choice is to use '\0'
(0x00), because it's unlikely to occur in a filename: any program that stores
names in C strings would find an embedded '\0' awkward to store and scan for.</p>
<p align="left">This situation also applies to archived disk images, which must
be simple filenames.</p>
<p>An application could instead keep track of which filesystems have
subdirectories and which don't, allowing it to disregard the fssep char when it
can't be relevant for a record, but it's easier to let the fssep char itself
show whether it is meaningful.</p>
<p align="left">(NOTE: NufxLib v2.0.3 rejected 0x00 as an fssep character. This was a bug.)</p>

<p align="left"><b>Creating:</b> When adding files directly from filesystems without subdirectories, use 0x00 as
the fssep char.</p>
<p align="left"><b>Extracting:</b> An fssep char of 0x00 means
the pathname is just the filename.</p>
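<p>A minimal sketch of pathname splitting that honors a 0x00 fssep.  The function name is hypothetical.  Discarding empty components, which come from leading, trailing, or doubled separators, is an assumption this sketch makes rather than a rule from the specification.</p>

```python
def split_pathname(name: bytes, fssep: int) -> list:
    """Split a stored pathname into components using the record's fssep char.

    An fssep of 0x00 means the name is a single filename, even if it contains
    bytes that would be separators on other filesystems.
    """
    if fssep == 0x00:
        return [name]
    # Assumption: empty components (e.g. from a leading separator) are dropped.
    return [part for part in name.split(bytes([fssep])) if part]
```

<p>With this approach, "foo:bar" from a DOS 3.3 disk (fssep 0x00) stays a single filename, while the same bytes from a GS/OS volume (fssep ':') become two components.</p>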

<p align="left">&nbsp;</p>
<h3 align="left">Disk image pathnames</h3>
<p align="left">While files may have multiple path components (e.g.
&quot;subdir:subdir2:filename&quot;), it makes no sense for disk images to have
them.&nbsp; The stored filename for a disk is either the disk's ProDOS volume
name, or for non-ProDOS disks, a simple label defined by the user.&nbsp; Since
the eventual target is a disk device, specifying a subdirectory path makes no
sense.</p>
<p align="left">The issue becomes a little more confusing when storage of
disk images used for emulators is considered.&nbsp; At first glance, it seems
useful to be able to store a hierarchy of disk images.&nbsp; In practice, such
images would either be archived as a hierarchy of .PO files, or as an archive of
.SDK archives.</p>
<p>Ultimately, the disk volume name is embedded in the disk image itself. The
name stored in the archive is purely decorative.</p>
<p align="left"><b>Adding/renaming:</b> Applications must
strip any leading path components from disk image &quot;storage names&quot;.&nbsp;
(The NuFX specification does explicitly forbid the use of a filesystem separator
character in a disk volume name.)</p>
<p align="left"><b>Extracting:</b>
Applications extracting directly to a disk must strip leading path components
before assigning the ProDOS volume name.&nbsp; Applications extracting images to
a file don't need to do anything unusual.

<p align="left">&nbsp;</p>
<h3 align="left">Filename case sensitivity</h3>
<p align="left">There isn't a &quot;filename is case-sensitive&quot; flag in
NuFX archives.&nbsp; Since it was designed primarily for ProDOS and HFS
filesystems, neither of which is case-sensitive, we should assume that case is
not meant to be significant when determining whether two records have the same
filename.&nbsp; This becomes important when adding files (to test for
duplicates), extracting files by name, and when attempting
to display archive contents as a hierarchical tree.</p>
<p>HFS files will use the Mac OS Roman character set, so a simple ASCII
case conversion will be inadequate.  An HFS filename comparison routine must
be used.</p>
<p align="left">Applications
should try to recognize that &quot;foo/bar&quot;, &quot;foo/BAR&quot;, and
&quot;FOO/bar&quot; are the same file, but it's probably not worth
&quot;probing&quot; a case-sensitive filesystem like Linux ext2 to guarantee
it.

<p align="left">&nbsp;<h3 align="left">Duplicate filenames</h3>
<p align="left">There is nothing in the NuFX specification that prevents having
more than one file with the same name in an archive.&nbsp; In practice, this is
inconvenient, especially for users with command-line tools.&nbsp; On the other
hand, if the underlying filesystem is case-sensitive, the extracted files may
not actually collide, so it may not make sense for all applications to treat
this as an iron-clad rule.<p align="left">When comparing names, be sure to take
the filesystem separator character into account.&nbsp; &quot;foo:bar&quot; could
be a simple filename or a partial pathname depending on whether ':' is the
separator.&nbsp; Two names should be considered identical if each distinct path
component matches, so &quot;foo/bar&quot; and &quot;foo:bar&quot; are identical
if the separators are '/' and ':', respectively.&nbsp; Comparisons should be
case-insensitive.<p align="left"><b>Adding/renaming:</b> Applications
should prevent multiple records from having the same case-insensitive filename.
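<p>The comparison rules above (split on each record's own separator, then compare components case-insensitively) could be sketched as follows.  All names are hypothetical.  Python's str.casefold() on the decoded Mac OS Roman text is only an <i>approximation</i> of a true HFS filename comparison routine.</p>

```python
def path_key(name: bytes, fssep: int) -> tuple:
    """Build a comparison key: the case-folded components of a pathname."""
    if fssep == 0x00:
        parts = [name]  # no subdirectories: the whole name is one component
    else:
        parts = [p for p in name.split(bytes([fssep])) if p]
    # NOTE: casefold() approximates, but is not identical to, HFS comparison.
    return tuple(p.decode("mac_roman").casefold() for p in parts)


def same_pathname(name_a: bytes, sep_a: int, name_b: bytes, sep_b: int) -> bool:
    """True if two stored names refer to the same file."""
    return path_key(name_a, sep_a) == path_key(name_b, sep_b)


def find_duplicate(existing, new_name: bytes, new_sep: int):
    """Return the first existing (name, fssep) entry that collides, or None."""
    key = path_key(new_name, new_sep)
    for name, sep in existing:
        if path_key(name, sep) == key:
            return (name, sep)
    return None
```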

<p align="left">&nbsp;</p>
<h3 align="left">Pre-sized or not pre-sized</h3>
<p align="left">The specification declares that filename threads and comments
use pre-sized buffers.&nbsp; It does not define what other members of the
message and filename classes are, which makes it difficult to know what to do
with a request to create a heretofore undefined thread type.&nbsp; The NuFX
format does not provide any definitive clue as to whether a thread is pre-sized,
so such decisions must be based on the thread class and thread kind.</p>
<p align="left">Filename threads and comment threads are pre-sized.&nbsp; All
other threads are not pre-sized (including other members of the
&quot;filename&quot; and &quot;message&quot; classes).</p>
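<p>Because the decision depends only on the thread class and kind, a lookup table is enough.  This sketch uses the class and kind numbers of the NuFX filename and comment threads (filename class 0x0003, kind 0x0000; message class 0x0000, kind 0x0001).  The function name is hypothetical.</p>

```python
# NuFX thread classes and the only kinds that use pre-sized buffers.
MESSAGE_CLASS, CONTROL_CLASS, DATA_CLASS, FILENAME_CLASS = 0, 1, 2, 3
COMMENT_KIND = 0x0001   # message class: comment
FILENAME_KIND = 0x0000  # filename class: filename

PRESIZED_THREADS = {
    (MESSAGE_CLASS, COMMENT_KIND),
    (FILENAME_CLASS, FILENAME_KIND),
}


def is_presized(thread_class: int, thread_kind: int) -> bool:
    """True only for comment and filename threads.

    Everything else, including unknown kinds in the message and filename
    classes, is treated as not pre-sized.
    """
    return (thread_class, thread_kind) in PRESIZED_THREADS
```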

<p align="left">&nbsp;</p>
<h3 align="left">Proper pre-size for filename threads</h3>
<p align="left">ShrinkIt allocates a 32-byte pre-sized buffer for the
filename.&nbsp; If the filename is larger than 32 bytes, the buffer grows to fit
the filename exactly.&nbsp; If renaming files is considered useful, then the
buffer should always be slightly larger than is needed to hold the
filename.&nbsp; (Filenames longer than 32 characters are most likely the result
of nested directories, so renaming the file itself is inhibited if the buffer
length is an exact match.)</p>
<!-- <p align="left">Side note: GSHK appears to have a bug where it can't deal with
32-byte HFS filenames (e.g. &quot;foo:abcdefghijabcdefghijabcdefghijxy&quot;
can't be added to an archive). Emulating this behavior is discouraged.</p> -->
<p>Side note: the specification does not specify a minimum or maximum length
for a filename.  The specification notes that "GS/OS can create
8,000-character filenames", so that seems safe to use as an upper bound.
Zero-length filenames cannot be stored in a record header, because a length
of zero indicates that the filename is in a thread, so it's reasonable to
require that filenames be at least one character long.  A zero-length filename
should be treated the same as a missing filename.</p>
<p align="left"><b>Creating:</b> If GS/ShrinkIt compatibility is not important,
all filenames should have at least 8 bytes of free space in the filename thread.&nbsp;
For GSHK compatibility, the filename thread compThreadEOF must be the greater of
32 and the filename length.</p>
<p align="left"><b>Renaming:</b> It is acceptable to have fewer than 8 bytes of
free space remaining after a file is renamed.&nbsp; However, if the filename
itself exceeds the buffer size and the thread must be rebuilt, the 8-byte
padding should be added.</p>
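<p>The sizing rules for filename threads reduce to a couple of small formulas.  In this sketch, the function and constant names are hypothetical, and rejecting a zero-length name follows the side note that such a name should be treated as missing.</p>

```python
GSHK_MIN_FILENAME_BUFFER = 32  # ShrinkIt's default filename buffer size
FILENAME_PADDING = 8           # recommended free space for later renames


def filename_buffer_size(name_len: int, gshk_compatible: bool) -> int:
    """compThreadEOF to use when creating a new filename thread."""
    if name_len == 0:
        raise ValueError("zero-length filenames are treated as missing")
    if gshk_compatible:
        # GSHK compatibility: the greater of 32 and the filename length.
        return max(GSHK_MIN_FILENAME_BUFFER, name_len)
    # Otherwise, leave at least 8 bytes of free space.
    return name_len + FILENAME_PADDING


def buffer_after_rename(new_len: int, comp_thread_eof: int) -> int:
    """Buffer size after a rename.

    Reuse the existing buffer if the new name fits; otherwise the thread
    must be rebuilt, and the 8-byte padding is added.
    """
    if new_len > comp_thread_eof:
        return new_len + FILENAME_PADDING
    return comp_thread_eof
```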

<p align="left">&nbsp;</p>
<h3 align="left">Thread ordering</h3>
<p align="left">The NuFX specification defines a general ordering for
threads ("blocks must occur in the following fashion"), but doesn't indicate
what should be done if they appear out of order.  Handling out-of-order
threads isn't impossible, but it can be inconvenient.</p>
<p align="left">For example, if an archive is being unpacked as it is received,
it is important to know the filename before receiving the data.  If the
filename thread comes after the data threads, the application has to write the
incoming data into a temp file, and then rename it later when the filename
thread finally shows up.  It would also be nice to be able to display file
comments as the file is being downloaded.</p>
<p align="left"><b>Creating:</b> The filename thread must precede all other
threads.  The recommended ordering for common thread types is:</p>
<ul>
  <li>Filename</li>
  <li>Message(s) (i.e. comments)</li>
  <li>Data threads (data fork, resource fork, disk image)</li>
  <li>all other threads</li>
</ul>
<p align="left"><b>Extracting:</b> If the filename thread does not appear before
the first data-class thread, the record may be ignored.</p>
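<p>A reader can check the ordering rule with a single pass over the record's threads.  This is a minimal sketch, assuming the thread classes are available as a list in archive order.  The function name is hypothetical.</p>

```python
FILENAME_CLASS, DATA_CLASS = 3, 2  # NuFX thread class values


def filename_precedes_data(thread_classes) -> bool:
    """Check that the filename thread comes before any data-class thread.

    `thread_classes` lists each thread's class in the order it appears in
    the record.  False means the record may be ignored on extraction.
    """
    for cls in thread_classes:
        if cls == FILENAME_CLASS:
            return True
        if cls == DATA_CLASS:
            return False  # data arrived before any filename
    # No data-class threads at all: nothing can be out of order.
    return True
```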

<p align="left">&nbsp;</p>
<h3 align="left">Incompatible thread types</h3>
<p align="left">There are some combinations of threads that must never appear in
a single record.</p>
<p align="left"><b>Creating:</b></p>
<ul>
  <li> If a <b>data fork</b> is present, the record must not
    contain another data fork or a disk image.</li>
  <li> If a <b>resource fork</b> is present, the record must not
    contain another resource fork or a disk image.</li>
  <li> If a <b>disk image</b> is present, the record must not
    contain another disk image, a data fork, or a resource fork.</li>
  <li> If a <b>control-class thread</b> is present, the record must
    not contain any data-class threads.</li>
</ul>
<p align="left"><b>Extracting:</b> When incompatible threads are found, they
should be ignored in favor of the earlier threads.&nbsp; For example, if two
data forks are found in the same record, only the first one should be extracted.&nbsp;
If a data-class thread is found first, subsequent control-class threads should
be ignored, and vice-versa.</p>
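<p>The "earlier thread wins" extraction rule can be sketched as a filter.  This sketch assumes threads are hypothetical (thread_class, thread_kind) tuples and uses the NuFX data-class kinds (0x0000 data fork, 0x0001 disk image, 0x0002 resource fork).  Unknown data kinds are not treated as conflicting with anything, which is an assumption this sketch makes.</p>

```python
CONTROL_CLASS, DATA_CLASS = 1, 2
DATA_FORK, DISK_IMAGE, RESOURCE_FORK = 0x0000, 0x0001, 0x0002

# For each data-class kind, the kinds that may not appear alongside it.
CONFLICTS = {
    DATA_FORK: {DATA_FORK, DISK_IMAGE},
    RESOURCE_FORK: {RESOURCE_FORK, DISK_IMAGE},
    DISK_IMAGE: {DISK_IMAGE, DATA_FORK, RESOURCE_FORK},
}


def usable_threads(threads):
    """Keep the earlier thread whenever two threads are incompatible.

    Threads of other classes (filename, message) pass through unchanged.
    """
    kept = []
    seen_data_kinds = set()
    seen_control = False
    for cls, kind in threads:
        if cls == DATA_CLASS:
            # Drop data after a control thread, or a kind that clashes
            # with an earlier data thread.
            if seen_control or not seen_data_kinds.isdisjoint(CONFLICTS.get(kind, set())):
                continue
            seen_data_kinds.add(kind)
        elif cls == CONTROL_CLASS:
            if seen_data_kinds:
                continue  # data came first, so the control thread is ignored
            seen_control = True
        kept.append((cls, kind))
    return kept
```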
<p align="left">&nbsp;</p>
<h3 align="left">Compressed threads</h3>
<p align="left">Some threads are compressed, some aren't.&nbsp; The
specification isn't very specific.</p>
<p align="left">All data-class threads may be compressed.&nbsp; All other
classes of threads must not be compressed.</p>
<p align="left">&nbsp;</p>
<h3 align="left">ProDOS storage type</h3>
<p align="left">The ProDOS storage type has little meaning on most
systems.&nbsp; However, certain values are significant.</p>
<ul>
  <li>
    <p align="left">For records with <b>only a data fork</b>, the storage type
    must be one of 0, 1, 2, or 3.  The specific choice is not useful to
    anyone, but a nonzero value (say, 1) should be used.</li>
  <li>
    <p align="left">For records with <b>a resource fork</b>, the storage type
    must be &quot;5&quot; (ProDOS extended file).</li>
  <li>
    <p align="left">For records with a <b>disk image</b> thread, the storage
    type must be equal to the disk block size (typically 512).</li>
  <li>
    <p align="left">For records <b>without data-class threads</b>, the storage
    type must be &quot;0&quot;.</li>
</ul>
<p align="left">Storage type 0x0d, which is used by ProDOS for directories, must
not be used.</p>
<p align="left">It is important to update the storage type as threads are added
and deleted, so that it always accurately reflects the contents of the record.</p>
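<p>Recomputing the storage type after every thread change is straightforward when it is derived from the thread list.  This is a minimal sketch under the same hypothetical (thread_class, thread_kind) representation.  The disk block size is passed in as a parameter with a default of 512.</p>

```python
DATA_CLASS = 2
DATA_FORK, DISK_IMAGE, RESOURCE_FORK = 0x0000, 0x0001, 0x0002
EXTENDED_FILE = 5  # ProDOS extended file


def storage_type_for(threads, disk_block_size: int = 512) -> int:
    """Compute a record's storage_type from its (class, kind) threads."""
    data_kinds = {kind for cls, kind in threads if cls == DATA_CLASS}
    if DISK_IMAGE in data_kinds:
        return disk_block_size
    if RESOURCE_FORK in data_kinds:
        return EXTENDED_FILE
    if DATA_FORK in data_kinds:
        return 1  # any of 1-3 is valid; a nonzero value such as 1 is recommended
    return 0      # no recognized data-class threads
```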
<p>The spec seems to claim that HFS volumes have 524 bytes per block (though
the assertion was weakened from "would" to "might" in the final version).
This refers to the 12 "tag" bytes available on 3.5" floppies, which are
accessible from Mac OS but not actually required by HFS.</p>

<p>&nbsp;</p>
<h3>GS/OS option lists and HFS file types</h3>
<p>GS/OS was designed to work with a variety of different filesystems.
Instead of trying to handle all conceivable file attributes explicitly,
GS/OS returns filesystem-specific values in "option lists".  These can
be provided to the get/set file info calls when copying files around.</p>
<p>Files on HFS volumes have two four-byte values, called file type and
creator, that identify the file contents.  These are part of the Macintosh
Finder info structures, called FInfo and FXInfo.  Files copied from HFS
to ProDOS may have this data stored in the extended key block of a forked
file (see ProDOS technical note #25).  This appears as two 18-byte chunks,
consisting of a size byte followed by a type byte, and then 16 bytes of
FInfo or FXInfo data (which are defined in <i>Inside Macintosh: Macintosh
Toolbox Essentials</i>, page 7-47).  To expose the data to applications,
certain GS/OS calls pass an "option list" with the contents.  Most of
the fields are uninteresting to anything but the Mac Finder on the system
where the files were stored, so for our purposes the option list may be
viewed simply as a way to preserve the file type and creator.</p>

<p>Experiments with the GS/OS Exerciser reveal that the option list returned
doesn't include the size/type bytes.  For an HFS file copied to ProDOS
with GS/OS, the GetFileInfo call returns a 32-byte buffer that begins
with FInfo.  When called on an HFS volume, the option list is 36 bytes,
with the last four bytes set to 02 00 00 00.  GSHK appears to record these
exactly as it receives them, which means the first four bytes hold the
HFS file type, and the second four bytes hold the HFS creator, in
big-endian byte order.  Because most of the fields only have meaning to the
Macintosh Finder, the rest of the data is zeroes.  Files archived from an
HFS volume created by a Macintosh would presumably have nonzero data in
more places.</p>
<p>When archiving files from an HFS volume under GS/OS, GSHK records the
ProDOS type/auxtype rather than the full HFS file type and creator,
because that's what the GS/OS file info query returns. The only way to
recover the original Mac Finder types is through the option list.</p>
<p>Sometimes the option list found in a NuFX archive is a little messed up,
e.g. the size field says 36 bytes, but there's only space for 18 bytes in
the record header.</p>
<p>Side note: the NuFX specification reversed the values of MFS and HFS
in the file_sys_id enumeration.  In practice, GS/ShrinkIt
correctly uses the GS/OS FST definitions: MFS=5, HFS=6.</p>
<p><b>Opening:</b> Assume the option_size field is correct
unless it exceeds attrib_count-2. If it's too large, clip it down to size.
If the filesystem type is ProDOS or HFS, the option list is at least 16 bytes
long, and the second 4 bytes are nonzero,
use the first 4 bytes of the option list data as the file type and
the second 4 bytes as the creator.  If a secondary test is desired to
avoid garbage, the creator value is usually ASCII.</p>
<p><b>Creating:</b> If a record has HFS type values, generate a
filesystem-specific option list (32 bytes for ProDOS or 36 bytes for HFS)
and store them there.</p>
<p><b>Updating:</b> Always output the actual record size. Do not propagate
incorrect size values. Retaining option lists for ProDOS and HFS entries
is required, since they may have the only copy of the original file type
and creator, but only if at least one of the first 8 bytes of the option
list is nonzero.  Updates to the archive attributes that alter the file/aux
type should usually retain the option list, since the purpose may be to
improve ProDOS usability without losing the original type information.</p>
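<p>The Opening and Creating rules for option lists might look like the sketch below.  It assumes the following:</p>
<ul>
  <li>option_data holds the option-list bytes actually present in the record header.  An oversized option_size is clipped to that length, which stands in for the attrib_count-2 bound described above.</li>
  <li>file_sys_id is 1 for ProDOS and 6 for HFS, as GS/ShrinkIt uses them.</li>
  <li>The generated option list mirrors the layout GSHK was observed to store.</li>
</ul>
<p>Function names are hypothetical.  The optional "creator is usually ASCII" test is omitted.</p>

```python
import struct

PRODOS_FSID, HFS_FSID = 1, 6  # file_sys_id values as used by GS/ShrinkIt


def hfs_types_from_option_list(file_sys_id, option_size, option_data):
    """Return (file_type, creator) from a record's option list, or None."""
    # Do not trust an option_size larger than the data actually present.
    size = min(option_size, len(option_data))
    if file_sys_id not in (PRODOS_FSID, HFS_FSID) or size < 16:
        return None
    # FInfo begins with the file type and creator, both big-endian.
    file_type, creator = struct.unpack(">II", option_data[:8])
    if creator == 0:
        return None
    return file_type, creator


def make_option_list(file_sys_id, file_type, creator):
    """Build a zero-filled option list that starts with type and creator.

    32 bytes for ProDOS; HFS lists get four more bytes, 02 00 00 00.
    """
    data = bytearray(32)
    data[0:8] = struct.pack(">II", file_type, creator)
    if file_sys_id == HFS_FSID:
        data += b"\x02\x00\x00\x00"
    return bytes(data)
```

<p>For example, an option list beginning with the ASCII bytes "TEXTttxt" yields file type 0x54455854 and creator 0x74747874.</p>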

<p>&nbsp;</p>
<h3>ProDOS vs. HFS file types</h3>
<p>The initial release of the specification stated that the HFS file type and
creator should be stored in the record header.  The final version of the
specification abdicates responsibility for defining the field, stating simply,
"For ProDOS 8 or GS/OS, this field should always be what the operating system
returns when asked".</p>
<p>For reference, when an application asks GS/OS to get the information for
a file on an HFS volume, it returns a ProDOS file type and aux type (usually
BIN), and puts the HFS type and creator into an option list.  If this
behavior defines the field, then this is how the types should be stored.</p>
<p>However, the vague wording of the specification raises the possibility that
a Mac OS-based archiver should store the file type and creator directly in
the record header, because that's what "the operating system" returned.  The
record header does not provide a way to define the source of the type values,
so an extraction program attempting to set the file info would need to draw
conclusions based on whether the types are small enough values to be valid
for ProDOS.</p>
<p>It's worth noting that files on an AppleShare volume have independent
ProDOS and HFS file types.  When a ProDOS file is written to the AppleShare
FST, Mac OS type and creator values are generated according to a scheme
documented in the AppleShare FST public ERS document.  It's possible that a
Mac archiver would store HFS file type and creator values that are really
encoded ProDOS file types, which must be decoded according to those
rules.</p>
<p>To avoid ambiguity, we want to follow the GS/OS behavior, regardless of
what the host operating system does.</p>
<p><b>Creating:</b> store the ProDOS file type and aux type in the record
header.  For files on HFS volumes, put a simple ProDOS type (BIN or TXT)
in the record header, and put the file type and creator in an option list.</p>
<p><b>Extracting:</b> if the file type and aux type do not fit in 8 and 16
bits, respectively, treat them as values from HFS.</p>
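<p>The extraction test is a simple range check, because ProDOS file types fit in 8 bits and aux types fit in 16.  This is a sketch; the function name and return strings are hypothetical.</p>

```python
def classify_record_types(file_type: int, aux_type: int) -> str:
    """Decide how to interpret the record header's type fields on extraction."""
    # Values too large for ProDOS must have come from an archiver that
    # stored the HFS file type and creator directly.
    if file_type > 0xFF or aux_type > 0xFFFF:
        return "hfs"
    return "prodos"
```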

<p align="left">&nbsp;</p>
<h3 align="left">Disk image size values</h3>
<p>For a compressed disk image, the &quot;storage_type&quot; and
&quot;extra_type&quot; fields take on different meanings: the storage_type
field holds the block size (usually 512), and the extra_type field holds
the block count (e.g. 280 for a 140KB disk).</p>
<p>These fields are more important than you might expect, because
ShrinkIt doesn't appear to set the thread EOF value for disk images.  (A quick
test with ShrinkIt v3.4 on a 5.25" DOS disk yielded a thread EOF of zero,
while GS/ShrinkIt v1.1 on a 3.5" ProDOS disk generated a mysterious
thread EOF of $4a00.)
Worse, some older versions of ShrinkIt tended to leave the
&quot;storage_type&quot; set to 2.
Apparently, ShrinkIt just uses extra_type * 512 as the uncompressed size when
trying to figure out what sort of disk it has. An early version of
GS/ShrinkIt went one step further: it used a block count of 280 with a block
size of 256, resulting in archives that apparently held 70K disk images.</p>
<p>It is simple enough to disregard the thread EOF value, and
replace the storage_type when it is absurdly small, but there is a deeper
problem. If you delete a 140KB disk image thread and replace it with an
800KB disk image thread, the block count stored in the extra_type no longer
accurately reflects the contents of the record. (This linkage between the
record header and the thread contents is the reason why this document forbids
mixing of disk image threads with any other data-class thread, including other
disk images.)</p>
<p>Because the length of the disk image thread can only be determined from
the extra_type field, it is important for applications that support changing
the file and aux types to prevent such changes in records with disk images.</p>
<p><b>Creating:</b> Applications must update the record's storage_type and
extra_type fields whenever a disk image thread is added. The value
(storage_type * extra_type) must be equal to the uncompressed size.  The
application should reject disk image files that are not a multiple of
512 bytes.  For consistency with other applications, the thread EOF field
should be zeroed.</p>
<p><b>Extracting:</b> The application must ignore the thread EOF, and
normalize storage_type to 512 if it is less than 16 (0x0f is the largest
valid ProDOS storage type). The value (512 * extra_type) should be
used as the uncompressed size. If the uncompressed size is zero, the
thread may be ignored.</p>
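<p>Both sides of the disk image rule can be sketched as two helpers.  The names are hypothetical.  Note that the extraction helper takes no thread EOF parameter, since that value must be ignored.</p>

```python
PRODOS_BLOCK_SIZE = 512
MAX_PRODOS_STORAGE_TYPE = 0x0F


def header_fields_for_image(image_len: int):
    """Return (storage_type, extra_type, thread_eof) for a new disk image."""
    if image_len % PRODOS_BLOCK_SIZE != 0:
        raise ValueError("disk image length is not a multiple of 512")
    # storage_type * extra_type must equal the uncompressed size, and the
    # thread EOF is zeroed for consistency with other applications.
    return PRODOS_BLOCK_SIZE, image_len // PRODOS_BLOCK_SIZE, 0


def image_size_on_extract(storage_type: int, extra_type: int):
    """Return (normalized storage_type, uncompressed size) for extraction.

    A resulting size of zero means the thread may be ignored.
    """
    if storage_type <= MAX_PRODOS_STORAGE_TYPE:
        storage_type = PRODOS_BLOCK_SIZE  # e.g. the stale value 2
    # Per the rule above, the uncompressed size is 512 * extra_type.
    return storage_type, PRODOS_BLOCK_SIZE * extra_type
```

<p>For a 140KB 5.25" disk, header_fields_for_image(143360) gives (512, 280, 0).  A record left with storage_type 2 and extra_type 280 extracts as (512, 143360).</p>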

<p align="left">&nbsp;</p>
<h3 align="left">Access permissions</h3>
<p align="left">NuFX supports four boolean access permission flags (read, write,
destroy, rename) and two boolean attributes (backup needed, invisible) in the
&quot;access&quot; field.&nbsp; This matches up with ProDOS capabilities nicely,
but very few other operating systems support all six.</p>
<p align="left">Application authors should consider the following approaches:</p>
<ol>
  <li>
    <p align="left"><b>Preserve all.</b>&nbsp; All flags in the access field
    must be preserved.&nbsp; It is not required that the extracted files obey
    the original semantics -- an &quot;invisible&quot; file might be visible,
    and a file with &quot;rename&quot; disabled might still be rename-able --
    but when the files are re-added, the permissions must match.</li>
  <li>
    <p align="left"><b>Locked/unlocked.</b>&nbsp; A file with read enabled, and
    write, destroy, rename, and invisible disabled, is considered
    &quot;locked&quot; (access 0x01 or 0x21).&nbsp; All other files are
    considered &quot;unlocked&quot;.&nbsp; When a file is extracted and then
    added to an archive, the locked/unlocked status must be preserved.&nbsp;
    Locked files are added with access 0x21, and unlocked files are added with
    access 0xe3.</li>
</ol>
<p align="left">It is acceptable for an application to find a middle ground
between these two, and preserve more of the flags accurately than approach #2
does, but approach #2 should be considered the minimum acceptable level of
support.</p>
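<p>Approach #2 can be implemented with a mask over the ProDOS access bits: destroy 0x80, rename 0x40, backup 0x20, invisible 0x04, write 0x02, read 0x01.  This is a minimal sketch with hypothetical function names.  The backup bit is left out of the mask, which is why both 0x01 and 0x21 count as locked.</p>

```python
# ProDOS/NuFX access bits.
ACCESS_READ = 0x01
ACCESS_WRITE = 0x02
ACCESS_INVISIBLE = 0x04
ACCESS_BACKUP = 0x20
ACCESS_RENAME = 0x40
ACCESS_DESTROY = 0x80

LOCKED_ACCESS = ACCESS_READ | ACCESS_BACKUP  # 0x21
UNLOCKED_ACCESS = (ACCESS_DESTROY | ACCESS_RENAME | ACCESS_BACKUP
                   | ACCESS_WRITE | ACCESS_READ)  # 0xe3


def is_locked(access: int) -> bool:
    """Read enabled; write, destroy, rename, and invisible disabled."""
    relevant = access & (ACCESS_READ | ACCESS_WRITE | ACCESS_INVISIBLE
                         | ACCESS_RENAME | ACCESS_DESTROY)
    return relevant == ACCESS_READ


def access_for_readd(locked: bool) -> int:
    """Access value to use when a previously extracted file is re-added."""
    return LOCKED_ACCESS if locked else UNLOCKED_ACCESS
```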
<p align="left">&nbsp;</p>
<h3 align="left">Empty directories</h3>
<p align="left">Directories do not need to be stored explicitly unless they are
empty.&nbsp; The NuFX specification manages to avoid describing how directories
are actually supposed to be stored, saying only: &quot;A Thread Record must exist to inform a utility that a directory is to
be created through the use of the proper control_thread value.&quot;</p>
<p align="left">What is in a &quot;create directory&quot; control thread?&nbsp;
It appears that the intent was to have the thread contain the pathname that
needed to be created.&nbsp; In theory, you could have several of these things,
and create an entire hierarchy from a single record.&nbsp; Such threads should
not be compressed, and their compThreadEOF should always match their threadEOF
(i.e. they're not pre-sized).</p>
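<p>Building such a thread can be sketched in Python. The sketch assumes the thread header layout from the NuFX specification (class, format, kind, and CRC as 16-bit values, then threadEOF and compThreadEOF as 32-bit values, all little-endian), control class 0x0001 with kind 0x0000 for &quot;create directory&quot;, and, following the guess above, a thread body holding the pathname joined with the record's fssep char. The function name is hypothetical, and the caller supplies the thread CRC.</p>

```python
import struct

THREAD_CLASS_CONTROL = 0x0001
THREAD_KIND_CREATE_DIR = 0x0000
THREAD_FORMAT_UNCOMPRESSED = 0x0000


def make_create_dir_thread(components, fssep=":", thread_crc=0):
    """Return (header, data) for a "create directory" control thread.

    components: directory names from the archive root down, e.g. ["DOCS", "NEW"].
    Assumption: the body is the pathname joined with the record's fssep char.
    """
    if any(fssep in c for c in components):
        raise ValueError("component contains the fssep char")
    data = fssep.join(components).encode("ascii")
    header = struct.pack(
        "<HHHHII",                   # little-endian, 16 bytes total
        THREAD_CLASS_CONTROL,
        THREAD_FORMAT_UNCOMPRESSED,  # control threads are not compressed
        THREAD_KIND_CREATE_DIR,
        thread_crc & 0xFFFF,
        len(data),                   # threadEOF
        len(data),                   # compThreadEOF: equal, so not pre-sized
    )
    return header, data
```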
<p align="left">It's a little tricky to say, &quot;add a control thread whenever
you find a directory with nothing in it&quot;.&nbsp; What if the directory has
files in it, but you don't have the access permissions necessary to read the
files?</p>
<p align="left">Does such a record require a filename?&nbsp; Probably not.&nbsp;
However, if it doesn't have a filename, ShrinkIt might not display the record,
and you'd have no way to manipulate it.&nbsp; Adding a &quot;record label&quot;
is easy and useful.</p>
<p align="left">(I'm strongly tempted to punt on the control threads and just
use storage type 0x0d to indicate that a directory should be created.&nbsp; This
is in direct opposition to the NuFX specification, however, so I'm reluctant to
do so.)</p>
<p align="left"><b>Creating:</b> Applications not interested in preserving empty
directories need do nothing.&nbsp; Otherwise, the application must add a
&quot;create directory&quot; control thread whenever a directory is encountered
for which no files are added to the archive.</p>
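<p>One way to pick those directories, sketched in Python under a few assumptions: paths are archive-relative with &quot;/&quot; separators, and the caller passes only the files that were actually added, so a directory whose files could not be read still gets a thread. Since a file's pathname or a deeper control thread also creates its parent directories, only the innermost uncovered directories are returned. The function name is hypothetical.</p>

```python
from pathlib import PurePosixPath


def dirs_needing_control_threads(all_dirs, added_files):
    """Return directories that need an explicit "create directory" thread.

    all_dirs: every directory found while scanning (archive-relative paths).
    added_files: files actually added; skipped (e.g. unreadable) files excluded.
    """
    # Directories created implicitly by some added file's pathname.
    covered = set()
    for path in added_files:
        covered.update(str(p) for p in PurePosixPath(path).parents)
    uncovered = [d for d in all_dirs if d not in covered]

    # A control thread for a deeper directory creates its parents too, so
    # drop any uncovered directory that is an ancestor of another one.
    ancestors = set()
    for d in uncovered:
        ancestors.update(str(p) for p in PurePosixPath(d).parents)
    return sorted(d for d in uncovered if d not in ancestors)
```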
<p align="left"><b>Extracting:</b> A directory must be created when a control
thread is present.&nbsp; As noted in the NuFX specification, the application
must also create any directories listed in the record's pathname that don't yet
exist.</p>
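<p>On the extraction side, a Python sketch that assumes the thread body holds the pathname separated by the record's fssep char. <code>os.makedirs</code> with <code>exist_ok=True</code> also creates any missing parent directories, which is the same step needed for directories implied by a record's pathname. The function name, and the rejection of &quot;.&quot; and &quot;..&quot; components, are illustrative choices rather than requirements of the specification.</p>

```python
import os


def create_dir_from_thread(dest_root, thread_data, fssep):
    """Create the directory named by a "create directory" control thread.

    Assumption: thread_data is an ASCII pathname using the record's fssep char.
    """
    name = thread_data.decode("ascii")
    parts = [p for p in name.split(fssep) if p]
    # Don't let an untrusted archive escape dest_root.
    if any(p in (".", "..") or os.sep in p for p in parts):
        raise ValueError("unsafe pathname in control thread: %r" % name)
    path = os.path.join(dest_root, *parts)
    os.makedirs(path, exist_ok=True)  # creates missing parents as well
    return path
```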

<p align="left">&nbsp;</p>
<h3 align="left">Message thread format</h3>
<p align="left">The specification says that message threads are ASCII text, but
doesn't specify an EOL character. For the benefit of Apple II utilities,
it's best to use a carriage return (Ctrl+M). The comments are expected to
be readable on 8-bit Apple IIs, so plain ASCII rather than Mac OS Roman should
be used.</p>
<p align="left"><b>Creating:</b> Convert any EOL markers to CR, and any
non-ASCII characters (i.e. bytes with the high bit set) to ASCII.</p>
<p align="left"><b>Extracting:</b> Assume that the comment may be using CR, LF,
or CRLF, and convert as needed for display. GS/ShrinkIt used a
proportional font, so there is no need to worry about formatting to preserve
&quot;ASCII art&quot; in comments.</p>
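<p>Both directions can be sketched in Python. The sketch assumes that bytes with the high bit set are Apple II &quot;high ASCII&quot;, so clearing bit 7 recovers the intended character; text that is really Mac OS Roman would need a proper character mapping instead. The function names are hypothetical.</p>

```python
def comment_for_archive(text: bytes) -> bytes:
    """Prepare comment bytes for storage: plain ASCII with CR line endings."""
    # Assumption: high-bit bytes are "high ASCII", so clearing bit 7 is enough.
    ascii_text = bytes(b & 0x7F for b in text)
    # Convert CRLF first so it becomes a single CR rather than two.
    return ascii_text.replace(b"\r\n", b"\r").replace(b"\n", b"\r")


def comment_for_display(raw: bytes) -> str:
    """Accept CR, LF, or CRLF line endings; return text using '\\n'."""
    text = raw.decode("ascii", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```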

<p>&nbsp;</p>
<h3>Message thread maximum length</h3>
<p>Comments are rarely used, and when they are they tend to be fairly short.
The contents are never compressed, aren't covered by a CRC, and aren't
extracted to files, making them a bad way to convey vital information.
Adding and editing the comment field was introduced with GS/ShrinkIt, which
creates a pre-sized comment on the first entry in each batch.  The editor
does not expand or reduce the length of the field, which is limited to
1,000 bytes.  It does support longer comments created by other programs.</p>
<p>It's convenient to assign a maximum possible length to comments, so that
they can be manipulated by code that doesn't need to handle their maximum
possible length of 4GB.  A cap of 64KB (same as ZIP) seems reasonable as an
absolute maximum, considering likely content and what Apple II software can
support.</p>
<p><b>Creating:</b> Limit comments to 64KB.  Applications may establish a
lower limit, but should allow comments of at least 1,000 bytes.</p>
<p><b>Updating:</b> Truncation of comments longer than 64KB is
discouraged but allowed.</p>
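<p>Preparing a comment thread might look like the Python sketch below. Two assumptions are worth flagging: &quot;64KB&quot; is taken as 65,535 bytes (the largest value a 16-bit length holds, as in ZIP), and the unused part of a pre-sized comment is zero-filled. For a pre-sized thread, threadEOF holds the comment length and compThreadEOF holds the space reserved for it, here at least 1,000 bytes to match GS/ShrinkIt. The names are hypothetical.</p>

```python
MAX_COMMENT_LEN = 65535   # assumption: "64KB" read as the ZIP 16-bit limit
MIN_COMMENT_LIMIT = 1000  # smallest limit an application should impose
GSHK_PRESIZE = 1000       # reserved size GS/ShrinkIt uses for new comments


def prepare_comment(text, app_limit=MAX_COMMENT_LEN, presize=GSHK_PRESIZE):
    """Return (thread_data, threadEOF, compThreadEOF) for a message thread."""
    if not MIN_COMMENT_LIMIT <= app_limit <= MAX_COMMENT_LEN:
        raise ValueError("comment limit must be 1000 to 65535 bytes")
    text = text[:app_limit]              # truncation: discouraged but allowed
    reserved = max(len(text), presize)   # pre-size so an editor can grow it
    data = text + bytes(reserved - len(text))  # zero-fill the unused space
    return data, len(text), reserved
```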

<p>&nbsp;</p>
<h3>Master EOF</h3>
<p>For the most part, ShrinkIt correctly sets the MasterEOF field
in the Master Header block.  The field was introduced with version 1 of the
header definition.  A very old version of ShrinkIt left it set to
zero (this is the same version that completely omitted the filename for DOS 3.3
disk images). GS/ShrinkIt appears to initialize it to 48 (the size of the
MH block), and if the creation process is interrupted you can end up with a
partial archive with a nonzero EOF.</p>
<p>The master EOF is useful as a quick file truncation test, but provides
no other value.  The record count in the header is more important.</p>
<p><b>Opening:</b> Don't assume the master EOF is accurate.  Walk through the
list of records to determine the actual end-of-file before appending new
records.</p>
<p><b>Updating:</b> Applications must write the correct MasterEOF
value if an archive is modified.</p>
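<p>Walking the records can be sketched in Python as follows. The field offsets are assumptions based on the NuFX specification's layout (record count at offset 8 of the master header; attrib_count at offset 6 and total_threads at offset 10 of the record header; filename_length in the last two bytes of the attribute section; compThreadEOF at offset 12 of each 16-byte thread header) and are worth checking against it. The sketch does not verify CRCs or handle BXY and SEA wrappers, and the function name is hypothetical.</p>

```python
import struct

MASTER_ID = bytes([0x4E, 0xF5, 0x46, 0xE9, 0x6C, 0xE5])  # "NuFile"
RECORD_ID = bytes([0x4E, 0xF5, 0x46, 0xD8])              # "NuFX"
MASTER_HEADER_LEN = 48
THREAD_HEADER_LEN = 16


def actual_archive_eof(buf: bytes) -> int:
    """Return the offset just past the last record, ignoring MasterEOF.

    A truncated archive raises struct.error or ValueError.
    """
    if buf[:6] != MASTER_ID:
        raise ValueError("not a NuFX archive")
    (total_records,) = struct.unpack_from("<I", buf, 8)
    offset = MASTER_HEADER_LEN
    for _ in range(total_records):
        if buf[offset:offset + 4] != RECORD_ID:
            raise ValueError("bad record ID at offset %d" % offset)
        (attrib_count,) = struct.unpack_from("<H", buf, offset + 6)
        (total_threads,) = struct.unpack_from("<I", buf, offset + 10)
        # filename_length is the last field of the attribute section.
        (filename_len,) = struct.unpack_from("<H", buf, offset + attrib_count - 2)
        pos = offset + attrib_count + filename_len
        data_len = 0
        for _ in range(total_threads):
            (comp_eof,) = struct.unpack_from("<I", buf, pos + 12)
            data_len += comp_eof
            pos += THREAD_HEADER_LEN
        offset = pos + data_len  # thread data follows all thread headers
    if offset > len(buf):
        raise ValueError("archive is truncated")
    return offset
```

<p>New records would then be appended at the returned offset, and MasterEOF rewritten to match, rather than trusting a stale header value.</p>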

<p>&nbsp;</p>
<hr>

<h2 align="left">Extensions</h2>
<p align="left">Unofficial extensions to the NuFX specification.&nbsp; Anyone
working with NuFX archives should take heed.</p>
<h3 align="left">New compression formats</h3>
<p align="left">Thread formats 0x0000 through 0x0005 are already defined.&nbsp; The
following thread format values have been added:</p>
<ul>
  <li>
    <p align="left">0x0006 - deflate. The thread contains data conforming
    to RFC 1951 (deflate 1.3 specification), which is the compression format
    used by ZIP and gzip.  The canonical implementation is "zlib".
    Visit <a href="https://zlib.net/">zlib.net</a> for more details.</li>
  <li>
    <p align="left">0x0007 - bzip2.&nbsp; The thread contains BWT+Huffman
    compressed data as output by &quot;libbz2&quot;. Visit
    <a href="https://sourceware.org/bzip2/">sourceware.org/bzip2</a>
    for more information.</li>
</ul>
<p align="left">Support for these formats is nonexistent on the Apple II, so
they should not be used except in situations where compatibility is unimportant
(e.g. collections of disk archives for use with A2 emulators).</p>

<p align="left">I found that &quot;deflate&quot; generally does as well as or
better than &quot;bzip2&quot; on Apple II binaries, disk images, and small text
files.&nbsp; Deflate is also faster and uses less memory, and you're more likely
to find libz installed on a given system than you are libbz2.&nbsp; For these
reasons, use of deflate should be encouraged in favor of bzip2.</p>
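<p>Dispatching on the thread format when expanding a thread can be sketched in Python with the standard <code>zlib</code> and <code>bz2</code> modules. Whether a deflate thread carries a zlib (RFC 1950) wrapper or bare RFC 1951 data is treated here as an open question, so the sketch tries the wrapped form first and falls back to raw. The older formats are only named, not implemented, and the function name is hypothetical.</p>

```python
import bz2
import zlib

THREAD_FORMAT_NAMES = {
    0x0000: "uncompressed",
    0x0001: "Huffman squeeze (SQ)",
    0x0002: "LZW/1",
    0x0003: "LZW/2",
    0x0004: "Unix compress, 12-bit",
    0x0005: "Unix compress, 16-bit",
    0x0006: "deflate",
    0x0007: "bzip2",
}


def expand_thread(thread_format: int, data: bytes, thread_eof: int) -> bytes:
    """Expand one data thread; only formats 0x0000, 0x0006, 0x0007 handled."""
    if thread_format == 0x0000:
        out = data
    elif thread_format == 0x0006:
        try:
            out = zlib.decompress(data)  # zlib-wrapped stream
        except zlib.error:
            out = zlib.decompress(data, wbits=-zlib.MAX_WBITS)  # raw deflate
    elif thread_format == 0x0007:
        out = bz2.decompress(data)
    else:
        name = THREAD_FORMAT_NAMES.get(thread_format, "unknown")
        raise NotImplementedError(
            "thread format 0x%04x (%s) not supported" % (thread_format, name))
    return out[:thread_eof]  # drop padding or pre-sized slack
```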

<hr>
<h2 align="left">NuFX Quirks</h2>
<p align="left">This section identifies some quirks in NuFX or ShrinkIt that,
while not bugs, are worth noting.</p>
<h3 align="left">Filename separator character</h3>
<p align="left">Originally, the filename was stored in the record header, so it
made sense that the filename separator character (&quot;fssep char&quot;) should
also be there.&nbsp; When the filenames were moved into threads, the fssep char
got left behind.&nbsp; If a record has two filenames, they'd better have the
same fssep char, or interpreting one of them will be impossible.&nbsp; (This is
one of the reasons why it's important to clearly define which filename takes
precedence in all circumstances.)</p>

<h3 align="left">Files with zero or two CRCs</h3>
<p align="left">The &quot;threadCRC&quot; field in the thread header block can
have one of three meanings: nothing (v0, v1), the CRC of the compressed data
(v2), or the CRC of the uncompressed data (v3). Version 2 records weren't
generated by anything significant, and can be ignored.  (If you actually find
an archive with v2 records, it's reasonable to just treat them as v1.)</p>
<p align="left">Version 1 records generally have threads compressed with LZW/1
data. The LZW/1 compression format includes the 16-bit CRC of the uncompressed
data at the start of the thread. Version 3 records generally have threads compressed
with LZW/2 data, which does not include a CRC.</p>
<p align="left">Applications like P8 ShrinkIt and NuLib create v1 records and
compress with LZW/1, while GS/ShrinkIt and NuLib2 create v3 records and compress
with LZW/2. This means that each compressed thread has exactly one CRC.
(Uncompressed data stored by P8 ShrinkIt has no CRC at all.)
So what happens if you tell NuLib2 to create a new record with
LZW/1, or tell it to add a new LZW/2 thread to an existing v1 record?</p>
<p align="left">In one case, you end up with two CRCs; in the other, you end up
with no CRC on your data at all. Unfortunately, the v3 thread
CRC is computed with a different initial value, so it is necessary to compute
the CRC twice for LZW/1 data, not merely store the same value twice.</p>
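<p>The difference between the two CRCs can be sketched in Python. This assumes both use the CCITT polynomial 0x1021, processed most-significant bit first, with an initial value of 0x0000 for the CRC embedded in LZW/1 data and 0xffff for the v3 thread CRC. The function names are hypothetical, and the counting function only restates the cases described above (treating v2 as v1).</p>

```python
def crc16_ccitt(data: bytes, crc: int) -> int:
    """Bitwise CRC-16, polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def lzw1_embedded_crc(uncompressed: bytes) -> int:
    return crc16_ccitt(uncompressed, 0x0000)  # assumed seed for LZW/1


def v3_thread_crc(uncompressed: bytes) -> int:
    return crc16_ccitt(uncompressed, 0xFFFF)  # assumed seed for v3 threadCRC


def crcs_present(record_version: int, thread_format: int) -> int:
    """Count the CRCs covering a thread's uncompressed data."""
    count = 0
    if thread_format == 0x0002:  # LZW/1 carries its own CRC
        count += 1
    if record_version >= 3:      # v3 threadCRC; v2 is treated as v1
        count += 1
    return count
```

<p>With these seeds the check string &quot;123456789&quot; gives 0x31c3 and 0x29b1 respectively, the published check values for CRC-16/XMODEM and CRC-16/CCITT-FALSE, so the same data really does produce two different values.</p>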
<p>When replacing a data thread in an existing record, it's tempting to
upgrade the record to the latest version (v3), but this may come at a cost.  For
example, if the record has both resource and data forks, and only the data fork
is being replaced, it would be necessary to uncompress the resource fork to
calculate its uncompressed CRC.  Programs that rewrite records should be
prepared to output v1 or v3.</p>

<h3 align="left">Extra data in compressed threads</h3>
<p align="left">ShrinkIt adds an extra byte at the end of all LZW compressed
data, probably due to an off-by-one bug in the compression code.&nbsp; It turns
out that it's possible to get even more &quot;extra&quot; bytes at the end.</p>
<p align="left">ShrinkIt's LZW/1 algorithm always operates on a 4K buffer,
largely because it was originally designed for compressing 5.25&quot; disks with
4K tracks.&nbsp;
On small files, or at the end of a large one, the last bit of data is padded out
to 4K and then compressed.&nbsp; Ordinarily this is barely noticeable, because
the compression routines do an RLE (Run-Length Encoding) pass before applying
LZW.</p>
<p align="left">However, if both RLE and LZW fail to make the 4K block any
smaller, it is stored without compression.&nbsp; This means the whole 4K,
complete with padding, gets written to the archive.&nbsp; This doesn't cause any
problems, but can make you wonder where all the extra bits came from.</p>
<p align="left">The SQ compression algorithm, as implemented by Don Elton's SQ3,
appears to add an extra 0xff to the end of the compressed data.&nbsp; It can
safely be ignored.</p>

<h3 align="left">Preserving BXY and SEA wrappers</h3>
<p align="left">Preserving BXY wrappers is pretty easy, since the Binary II
format is well documented.&nbsp; Updating block counts and file lengths is all
that is required.</p>
<p>Preserving SEA wrappers is a little more obscure, since there
is no documentation on the format.  A bit of reverse engineering reveals that
SEA files are OMF executables with two segments.  The first segment holds the
extraction code, and is the same for all archives.  The second holds the NuFX
data, and requires that a few length values in the segment header be adjusted.
Also, to be correct, the file must have a $00 byte appended after the NuFX
data (it's an OMF "END" opcode).</p>
<p>The archives have a minor bug: an offset field in the header is off by one,
so actually loading the segment in GS/OS would likely fail.  The segment
header has the "skip" flag set, though, so this isn't a problem in practice.</p>

<h3 align="left">Y2K</h3>
<p align="left">The NuFX standard says that the Date/Time format is the same as
that returned by the IIgs ReadTimeHex toolbox call.&nbsp; That call returns the
year as (year - 1900), so the year 2000 is stored as &quot;100&quot;.&nbsp;
ProDOS 8 clock drivers, on the other hand, return 40-99 for 1940-1999, and
0-39 for 2000-2039.&nbsp; As a result, archives created with P8 ShrinkIt use 0
for the year 2000 instead of 100.</p>
<p align="left">When creating archives, always use 100 for the year 2000, but
also accept the year 0.&nbsp; However, if you find a Date/Time with zero in all
useful fields (second, minute, hour, day, month, year), treat it as an
unspecified date rather than midnight of January 1, 2000.</p>
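<p>A Python sketch of decoding and encoding, assuming the ReadTimeHex byte order (second, minute, hour, year, day, month, filler, weekday) with zero-based day and month. Years 0-39 are read as 2000-2039 to match the P8 clock drivers, which goes slightly beyond the &quot;accept the year 0&quot; rule. The weekday byte is simply written as zero here, and the function names are hypothetical.</p>

```python
from datetime import datetime


def decode_nufx_datetime(raw: bytes):
    """Decode an 8-byte NuFX Date/Time; return None if it is unspecified."""
    second, minute, hour, year, day, month = raw[:6]
    if not any((second, minute, hour, year, day, month)):
        return None  # unspecified, not midnight on January 1, 2000
    if year < 40:
        year += 100  # P8 ShrinkIt style: 0-39 means 2000-2039
    return datetime(1900 + year, month + 1, day + 1, hour, minute, second)


def encode_nufx_datetime(when: datetime) -> bytes:
    """Encode with (year - 1900), so 2000 is stored as 100."""
    return bytes([when.second, when.minute, when.hour, when.year - 1900,
                  when.day - 1, when.month - 1,
                  0,    # filler
                  0])   # weekday: left as zero in this sketch
```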
<hr>
<p>This document is Copyright &copy; 2000-2004 by <a href="https://www.fadden.com/">Andy
McFadden</a>.&nbsp; All Rights Reserved.</p>
<p>The latest version can be found on the NuLib web site at
<a href="https://www.nulib.com/">https://www.nulib.com/</a>.</p>
</td></tr></table></td></tr></table></td></tr></table></td></tr></table><!--msnavigation--></td></tr><!--msnavigation--></table><!--msnavigation--></td></tr><!--msnavigation--></table></body>

</html>