diff --git a/library/nufx-addendum.htm b/library/nufx-addendum.htm
index fad3cac..162ab17 100644
--- a/library/nufx-addendum.htm
+++ b/library/nufx-addendum.htm
@@ -1,595 +1,618 @@
- - - - - NuFX Addendum - - - - - - - - - - -
- -

NuFX Addendum

-
- -
- - - - -
NuFX Addendum - By Andy McFadden - Last revised 2022/05/25
-

This addendum clarifies and extends certain aspects of the NuFX -specification.  This is not an "official" modification -of the original document - it has not been reviewed and approved by -the original author - but anyone developing NuFX utilities would do -well to follow these recommendations.

- -

Purpose

-

The NuFX specification defines a very loose structure, and -leaves much to the imagination of the implementer.  For example, "If a -utility finds a redundancy in a Thread Record, it must decide whether to skip -the record or to do something with that particular thread...".  -A strict specification would declare that the situation must never arise, and -define a standard approach for dealing with the anomalous condition.  The -current specification declares that the situation may arise, and -requires the application author to come up with a solution.

-

This document refines the NuFX specification and brings some of -the "fuzzy" areas into sharper focus.  Nothing in this document -contravenes the original document.

-

In the text below, "must" is an imperative that -has to be obeyed, and "should" is a recommendation that authors -are strongly encouraged to follow.

-
-

Clarifications

-

Pronunciation

-

What's the correct way to pronounce "NuFX"?  The -specification doesn't say.  There are two basic camps, letter-by-letter -("en you eff ecks") and minimal-syllable ("new fix" or -"new fuchs").  I don't recall how Andy Nicholas says it, so let -it be "new fix".

-

 

-

Use of ".SDK" suffix

-

Originally, only ".SHK" was used to represent a NuFX -archive.  Over time, a convention of using ".SDK" to represent -archives with a single disk image in them has arisen.  This is very -convenient for emulators on systems that rely on the file extension (e.g. -Windows), so use of ".SDK" is encouraged.

-

 

-

Archives with no records

-

An archive without records, i.e. nothing but a master header -block, serves no purpose.

-

Creating: Archives without any records in them must never -be created.  All archives must have at least one record.

-

Opening: If asked to open a record-less archive, -the application should recognize that the archive is empty and proceed as if it -were a new archive.

-

Modifying: If all records in an archive are deleted, the -archive file must be deleted as well.

-

 

-

Records with no threads

-

A record without threads is pretty pointless.

-

Creating: Records without threads must never be -created.  All records must have at least one thread.

-

Extracting: Empty records should be ignored.

-

 

-

Records with only a filename thread

-

GS/ShrinkIt v1.1 has a bug that prevents it from creating an empty -data thread when asked to add a zero-byte file.  This results in a record -with a filename thread and nothing else.  (If it was the first new record added, -it will have an empty comment thread as well.)

There is no valid -reason for deliberately creating such a record. -

Creating: Records composed solely of a filename thread -must not be created. -

Extracting: Records with nothing but a filename thread -should be ignored.  For GSHK v1.1 bug compatibility: if a record has a filename -thread, and no other threads except "message" threads (i.e. no data -threads or control threads), then a zero-byte data fork file should be -created.  Otherwise, the record should be ignored.  If the ProDOS -storage type field indicates an extended file, a zero-byte resource fork should -also be created. -
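The compatibility rule above comes down to a small decision based on the set of thread classes in a record. Here is a minimal Python sketch. It assumes the thread class values from the NuFX specification (message 0x0000, control 0x0001, data 0x0002, filename 0x0003). The function name and its return values are hypothetical and do not come from NufxLib or any other library.

```python
# Thread class values from the NuFX specification.
MESSAGE_CLASS = 0x0000
CONTROL_CLASS = 0x0001
DATA_CLASS = 0x0002
FILENAME_CLASS = 0x0003

EXTENDED_STORAGE_TYPE = 5  # ProDOS extended (forked) file


def filename_only_action(thread_classes, storage_type):
    """Decide how to extract a record, given the classes of its threads.

    Returns None when the record has data or control threads (extract it
    normally), "ignore" when there is nothing to extract, or a list of
    zero-byte forks to create (the GSHK v1.1 bug-compatibility case).
    """
    classes = set(thread_classes)
    if DATA_CLASS in classes or CONTROL_CLASS in classes:
        return None  # not the GSHK v1.1 case
    if FILENAME_CLASS not in classes:
        return "ignore"  # only messages, or nothing at all
    # Filename plus optional message threads: recreate an empty file.
    forks = ["data"]
    if storage_type == EXTENDED_STORAGE_TYPE:
        forks.append("resource")
    return forks
```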

  -

Records with no filename

-

A record without a filename thread is a curious beast.  -Ideally, there wouldn't be any such thing as a filename thread, since it doesn't -really make sense to have a record without one.  Expanding the record -header to hold a pre-sized buffer would've made many things simpler.

This -particular situation occurred with older versions of ShrinkIt (e.g. v1.1) that failed to store -a volume name when compressing a DOS 3.3 disk.  There was no filename in -the record header, nor one in a filename thread.

The only -situation where a record without a filename makes sense is if the record holds -nothing but comments or other archive "meta data", such as a -"create directory" control thread.

Creating: -Records without filenames must not be created, unless the record is intended -to contain -nothing but archive meta-data.  Deletion of the filename thread should only -be done if a new filename thread is being added.  If data threads are added -to a record without a filename, then a filename thread must be added as well.

Extracting: -If the record contains file data, the application may either prompt the user for -a filename to use, or supply a generated one.

  -

Records with more than one filename thread

-

This is an unusual situation that should only arise if an -application is buggy.  Every record created by a modern application should -have no more than one filename thread.

Creating: Records -with multiple filename threads must not be created.

Extracting: - Applications must use the first filename thread.  If a buggy application wants to -append an additional filename thread, their buggy filename will be ignored.

 

Records -with filenames in two places

-

The old way of storing filenames, used by NuLib and old versions of ShrinkIt, was to -put the filename in the record header.  To facilitate renaming, the -filename was moved into a thread.  Thus, there are two possible locations -where the filename may live, and no guarantee that only one will be used.

Creating: -Never put the filename in the record header when creating a new record.  -It's okay to leave existing records alone, but if an application has the -opportunity to rewrite the record header, the record filename must be removed.

Extracting: -The thread filename takes precedence over the record header filename.

Filename character set

-

Filenames in NuFX archives use the Mac OS Roman character set, -which is ASCII plus some symbols and the usual set of Latin-language characters -(see -Unicode definition).  The NuFX filename definition was intended to -accommodate files from HFS volumes, which may contain any character except ':'.  -Control characters, including NUL ('\0'), were allowed but discouraged.

-On modern systems, converting between Mac OS Roman and Unicode is useful and -(mostly) straightforward.  Dealing with embedded null bytes is very -annoying in C-like languages though.

Creating: -Convert Unicode to Mac OS Roman, replacing any untranslatable characters with -'?'.  Embedded nulls must be replaced with '?'.

-Extracting: Convert Mac OS Roman to Unicode.  If embedded nulls -are encountered, they should be replaced with something appropriate for the -current system.  Applications are allowed to ignore the problem and -truncate the filename, but must be prepared to handle duplicate or empty -filenames.
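Python ships a "mac_roman" codec, which makes both directions short. The sketch below is minimal and illustrative. The function names are hypothetical. The choice of "_" as the NUL replacement on extraction is only an example of "something appropriate for the current system".

```python
def filename_to_nufx(name: str) -> bytes:
    """Creating: convert a Unicode filename to Mac OS Roman bytes."""
    # errors="replace" substitutes b"?" for characters Mac OS Roman lacks.
    raw = name.encode("mac_roman", errors="replace")
    # Embedded NULs must not be stored; replace them with '?' as well.
    return raw.replace(b"\x00", b"?")


def filename_from_nufx(raw: bytes, nul_replacement: str = "_") -> str:
    """Extracting: convert Mac OS Roman bytes to a Unicode filename."""
    # All 256 byte values are defined in Mac OS Roman, so this cannot fail.
    name = raw.decode("mac_roman")
    # What to use in place of NUL is a per-system choice.
    return name.replace("\x00", nul_replacement)
```

The caller still has to cope with the empty or duplicate names this can produce.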

 

File -system separator characters

-

Every record header has a "file system separator" -character ("fssep") in the "file_sys_info" word.  This -is usually something like ':' for GS/OS or '/' for UNIX.  It's necessary to -know what the separator is in order to break a pathname down into its individual -components.

Not all filesystems support subdirectories, however, -which means that not all filenames need to have a separator.  The -appropriate separator character for such a filesystem is not defined in the NuFX -spec.  Clearly it should be something illegal on the source filesystem, or -we could inadvertently see pathnames where they don't exist (e.g. a file called -"foo:bar" on DOS 3.3 if the fssep char were set to ':').

The -trouble is, DOS 3.3 doesn't actually have any illegal characters, just a field -of 30 characters padded with spaces.  Pascal disks are similar.  Since -we must define an fssep for every filename, our best choice is to use '\0' -(0x00), because it's unlikely to occur, and any program that stores names in C -strings will find it awkward to store and scan for '\0'.

This -situation also applies to archived disk images, which must be simple filenames.

(NOTE: -as of v2.0.3, NufxLib rejects 0x00 as an fssep character.  This is a bug.)

Creating: -When adding files directly from filesystems without subdirectories, use 0x00 as -the fssep char.

Extracting: An fssep char of 0x00 means -the pathname is just the filename.
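Splitting a stored pathname with these rules might look like the following minimal Python sketch. The function name is hypothetical. Dropping the empty components that leading or doubled separators produce is my own choice and is not part of the rule above.

```python
def split_pathname(name: str, fssep: int) -> list[str]:
    """Break a stored pathname into components using the record's fssep."""
    if fssep == 0x00:
        # No subdirectories on the source filesystem: the whole string is
        # one filename, even if it happens to contain ':' or '/'.
        return [name]
    # Ignore empty components from leading or doubled separators.
    return [part for part in name.split(chr(fssep)) if part]
```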

 

Disk -image pathnames

-

While files may have multiple path components (e.g. -"subdir:subdir2:filename"), it makes no sense for disk images to have -them.  The stored filename for a disk is either the disk's ProDOS volume -name, or for non-ProDOS disks, a simple label defined by the user.  Since -the eventual target is a disk device, specifying a subdirectory path makes no -sense.

The issue becomes a little more confusing when storage of -disk images used for emulators is considered.  At first glance, it seems -useful to be able to store a hierarchy of disk images.  In practice, such -images would either be archived as a hierarchy of .PO files, or as an archive of -.SDK archives.

Adding/renaming: Applications must -strip any leading path components from disk image "storage names".  -(The NuFX specification does explicitly forbid the use of a filesystem separator -character in a disk volume name.)

Extracting: -Applications extracting directly to a disk must strip leading path components -before assigning the ProDOS volume name.  Applications extracting images to -a file don't need to do anything unusual. - -

 

Filename case sensitivity

-

There isn't a "filename is case-sensitive" flag in -NuFX archives.  Since it was designed primarily for ProDOS and HFS -filesystems, neither of which is case-sensitive, we should assume that case is -not meant to be significant when determining whether two records have the same -filename.  This becomes important when adding files (to test for -duplicates), extracting files by name, and when attempting -to display archive contents as a hierarchical tree.

Applications -should try to recognize that "foo/bar", "foo/BAR", and -"FOO/bar" are the same file, but it's probably not worth -"probing" a case-sensitive filesystem like Linux ext2 to guarantee -such. - -

 

Duplicate filenames

-

There is nothing in the NuFX specification that prevents having -more than one file with the same name in an archive.  In practice, this is -inconvenient, especially for users with command-line tools.  On the other -hand, if the underlying filesystem is case-sensitive, the extracted files may -not actually collide, so it may not make sense for all applications to treat -this as an iron-clad rule.

When comparing names, be sure to take -the filesystem separator character into account.  "foo:bar" could -be a simple filename or a partial pathname depending on whether ':' is the -separator.  Two names should be considered identical if each distinct path -component matches, so "foo/bar" and "foo:bar" are identical -if the separators are '/' and ':', respectively.  Comparisons should be -case-insensitive.

Adding/renaming: Applications -should prevent multiple records from having the same filename. -
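A duplicate check that respects each record's fssep and ignores case could be sketched in Python as follows. The helper names are hypothetical. I use `str.casefold()` as the case-insensitive comparison, which is a reasonable approximation of ProDOS/HFS behavior but not an exact one.

```python
def _components(name: str, fssep: int) -> list[str]:
    # fssep 0x00 means the name has no path structure at all.
    if fssep == 0x00:
        return [name]
    return [part for part in name.split(chr(fssep)) if part]


def same_filename(name_a: str, fssep_a: int, name_b: str, fssep_b: int) -> bool:
    """True if two records name the same file: same number of path
    components, each matching without regard to case."""
    parts_a = _components(name_a, fssep_a)
    parts_b = _components(name_b, fssep_b)
    return len(parts_a) == len(parts_b) and all(
        a.casefold() == b.casefold() for a, b in zip(parts_a, parts_b)
    )
```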

 

-

Pre-sized or not pre-sized

-

The specification declares that filename threads and comments -use pre-sized buffers.  It does not define what other members of the -message and filename classes are, which makes it difficult to know what to do -with a request to create a heretofore undefined thread type.  The NuFX -format does not provide any definitive clue as to whether a thread is pre-sized, -so such decisions must be based on the thread class and thread kind.

-

Filename threads and comment threads are pre-sized.  All -other threads are not pre-sized (including other members of the -"filename" and "message" classes).
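Because the decision depends only on class and kind, it fits in a one-line lookup. This is a minimal Python sketch. It assumes that the filename is kind 0x0000 of the filename class and that the comment is kind 0x0001 of the message class. The function name is hypothetical.

```python
FILENAME_CLASS = 0x0003
MESSAGE_CLASS = 0x0000
FILENAME_KIND = 0x0000  # filename class: the filename itself
COMMENT_KIND = 0x0001   # message class: comment (assumed kind value)


def is_presized(thread_class: int, thread_kind: int) -> bool:
    """Only filename and comment threads use pre-sized buffers."""
    return (thread_class, thread_kind) in {
        (FILENAME_CLASS, FILENAME_KIND),
        (MESSAGE_CLASS, COMMENT_KIND),
    }
```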

-

 

-

Proper pre-size for filename threads

-

ShrinkIt allocates a 32-byte pre-sized buffer for the -filename.  If the filename is larger than 32 bytes, the buffer grows to fit -the filename exactly.  If renaming files is considered useful, then the -buffer should always be slightly larger than is needed to hold the -filename.  (Filenames longer than 32 characters are most likely the result -of nested directories, so renaming the file itself is inhibited if the buffer -length is an exact match.) -

Side note: GSHK appears to have a bug where it can't deal with -32-byte HFS filenames (e.g. "foo:abcdefghijabcdefghijabcdefghijxy" -can't be added to an archive).  Emulating this behavior is discouraged. -

Creating: If GS/ShrinkIt compatibility is not important, -all filenames should have at least 8 bytes of free space in the filename thread.  -For GSHK compatibility, the filename thread compThreadEOF must be the greater of -32 and the filename length.

-

Renaming: It is acceptable to have fewer than 8 bytes of -free space remaining after a file is renamed.  However, if the filename -itself exceeds the buffer size and the thread must be rebuilt, the 8-byte -padding should be added.
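The sizing rules above reduce to a little arithmetic on the filename length. Here is a minimal Python sketch. The function and constant names are hypothetical, and the sizes are byte counts for the filename thread's compThreadEOF.

```python
GSHK_MIN_FILENAME_BUF = 32  # ShrinkIt's default filename buffer size
FILENAME_PADDING = 8        # recommended free space for later renames


def new_filename_buf_size(name_len: int, gshk_compat: bool) -> int:
    """compThreadEOF for a newly created filename thread."""
    if gshk_compat:
        return max(GSHK_MIN_FILENAME_BUF, name_len)
    return name_len + FILENAME_PADDING


def renamed_filename_buf_size(new_name_len: int, current_buf_size: int) -> int:
    """compThreadEOF after a rename: keep the buffer if the new name fits
    (even with fewer than 8 spare bytes), else rebuild with padding."""
    if new_name_len <= current_buf_size:
        return current_buf_size
    return new_name_len + FILENAME_PADDING
```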

-

 

-

Thread ordering

-

The NuFX specification does not require that threads appear in -any particular order.  However, writing them in a certain order can make -some operations significantly easier.

-

For example, if an archive is being unpacked as it is received, -it is important to know the filename before receiving the data.  If the -filename thread comes after the data threads, the application has to write the -incoming data into a temp file, and then rename it later when the filename -thread finally shows up.  It would also be nice to be able to display file -comments as the file is being downloaded.

-

Creating: The filename thread must precede all other -threads.  The recommended (but not required) ordering for common thread -types is:

-
    -
  • -

    Filename

  • -
  • -

    Message(s) (i.e. comments)

  • -
  • -

    Data fork

  • -
  • -

    Disk image

  • -
  • -

    Resource fork

  • -
  • -

    all other threads

  • -
-

Extracting: If the filename thread does not appear before -the first data-class thread, the record may be ignored.
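Both halves of the ordering rule can be sketched with a sort key and a single scan. This minimal Python sketch represents threads as hypothetical (class, kind) tuples. It assumes the NuFX data-class kinds 0x0000 (data fork), 0x0001 (disk image), and 0x0002 (resource fork).

```python
MESSAGE_CLASS, CONTROL_CLASS, DATA_CLASS, FILENAME_CLASS = 0x0000, 0x0001, 0x0002, 0x0003
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002  # data-class kinds


def _order_key(thread):
    cls, kind = thread
    if cls == FILENAME_CLASS:
        return 0
    if cls == MESSAGE_CLASS:
        return 1
    if cls == DATA_CLASS:
        return {DATA_FORK: 2, DISK_IMAGE: 3, RSRC_FORK: 4}.get(kind, 5)
    return 5  # all other threads


def order_threads(threads):
    """Creating: recommended order. sorted() is stable, so threads of
    equal rank keep their original relative order."""
    return sorted(threads, key=_order_key)


def filename_precedes_data(threads) -> bool:
    """Extracting: False if a data-class thread shows up before any
    filename thread (such a record may be ignored)."""
    for cls, _kind in threads:
        if cls == FILENAME_CLASS:
            return True
        if cls == DATA_CLASS:
            return False
    return True  # no data-class threads, so the rule does not apply
```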

-

 

-

Incompatible thread types

-

There are some combinations of threads that must never appear in -a single record.

-

Creating:

-
    -
  • -

    If a data fork is present, the record must not - contain another data fork or a disk image.

  • -
  • -

    If a resource fork is present, the record must not - contain another resource fork or a disk image.

  • -
  • -

    If a disk image is present, the record must not - contain another disk image, a data fork, or a resource fork.

  • -
  • -

    If a control-class thread is present, the record must - not contain any data-class threads.

  • -
-

Extracting: When incompatible threads are found, they -should be ignored in favor of the earlier threads.  For example, if two -data forks are found in the same record, only the first one should be extracted.  -If a data-class thread is found first, subsequent control-class threads should -be ignored, and vice-versa.
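The "earlier thread wins" rule can be applied in a single pass over the threads. Here is a minimal Python sketch using hypothetical (class, kind) tuples and the same assumed data-class kind values as elsewhere. Threads of other classes, such as filenames and messages, are always kept.

```python
CONTROL_CLASS, DATA_CLASS = 0x0001, 0x0002
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002


def drop_incompatible(threads):
    """Extracting: keep earlier threads, skip later ones that conflict."""
    kept = []
    seen = set()        # data-class kinds already kept
    has_control = False
    for cls, kind in threads:
        if cls == DATA_CLASS:
            if has_control:
                continue  # a control thread came first
            if kind == DISK_IMAGE and seen:
                continue  # a disk image can't share a record with data threads
            if kind in (DATA_FORK, RSRC_FORK) and (kind in seen or DISK_IMAGE in seen):
                continue  # duplicate fork, or fork alongside a disk image
            seen.add(kind)
        elif cls == CONTROL_CLASS:
            if seen:
                continue  # a data-class thread came first
            has_control = True
        kept.append((cls, kind))
    return kept
```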

-

 

-

Compressed threads

-

Some threads are compressed, some aren't.  The -specification isn't very specific.

-

All data-class threads may be compressed.  All other -classes of threads must not be compressed.

-

 

-

ProDOS storage type

-

The ProDOS storage type has little meaning on most -systems.  However, certain values are significant.

-
    -
  • -

    For records with only a data fork, the storage type - must be one of 0, 1, 2, or 3.  The value "2" is recommended - for applications that don't wish to mimic ProDOS behavior exactly.

  • -
  • -

    For records with a resource fork, the storage type - must be "5" (ProDOS extended file).

  • -
  • -

    For records with a disk image thread, the storage - type must be equal to the disk block size (typically 512).

  • -
  • -

    For records without data-class threads, the storage - type must be "0".

  • -
-

Storage type 0x0d, which is used by ProDOS for directories, must -not be used.

-

It is important to update the storage type as threads are added -and deleted, so that it always accurately reflects the contents of the record.
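Recomputing the storage type after every thread change is straightforward if it is derived from the threads each time. This is a minimal Python sketch with hypothetical names. It uses the recommended value 2 for data-only records rather than mimicking ProDOS's 1/2/3 seedling/sapling/tree choice.

```python
DATA_CLASS = 0x0002
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002
EXTENDED = 5  # ProDOS extended (forked) file
DEFAULT_BLOCK_SIZE = 512


def storage_type_for(threads, disk_block_size=DEFAULT_BLOCK_SIZE):
    """Derive a record's storage_type from its (class, kind) threads."""
    kinds = {kind for cls, kind in threads if cls == DATA_CLASS}
    if DISK_IMAGE in kinds:
        return disk_block_size  # disk images store the block size here
    if RSRC_FORK in kinds:
        return EXTENDED
    if DATA_FORK in kinds:
        return 2  # recommended when not mimicking ProDOS exactly
    return 0  # no data-class threads
```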

-

 

-

Disk block size and block count

-

For a compressed disk image, the "storage_type" and -"extra_type" fields take on a different meaning, namely the block -size (typically 512) and block count (e.g. 280 for a 140K disk) of the disk.

-

These fields are more important than you might expect, because -some older versions of ShrinkIt would set the thread EOF to a strange value like -68096 (which, curiously enough, is 133 * 512).  These same versions of -ShrinkIt tended to leave the "storage_type" set to 2.  -Apparently, ShrinkIt just used extra_type * 512 as the uncompressed size when -trying to figure out what sort of disk it had.  An early version of -GS/ShrinkIt went one step further: it used a block count of 280 with a block -size of 256, resulting in archives that apparently held 70K disk images.

-

It is simple enough to disregard the thread EOF value, and -replace the storage_type when it is absurdly small, but there is a deeper -problem.  If you delete a 140K disk image thread and replace it with an -800K disk image thread, the block count stored in the extra_type no longer -accurately reflects the contents of the record.  (This linkage between the -record header and the thread contents is the reason why this document forbids -mixing of disk image threads with any other data-class thread, including other -disk images.)

-

Creating: Applications must update the extra_type -whenever a disk image thread is added.  The value (storage_type * -extra_type) must be equal to the uncompressed size.  The application may -wish to reject threads that are not a multiple of 512 bytes.

-

Extracting: The application must normalize storage_type -to 512 if it is less than 16 (0x0f is the largest possible ProDOS storage -type).  The value storage_type * extra_type must then be used as the -uncompressed size.  If the uncompressed size is zero, the thread may be -ignored.
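The creating and extracting rules can be sketched in Python as follows. The function names are hypothetical. Raising an exception for lengths that are not a multiple of 512 is just one way to "reject" such threads, and that rejection is optional.

```python
MAX_PRODOS_STORAGE_TYPE = 0x0F
BLOCK_SIZE = 512


def disk_image_size(storage_type: int, extra_type: int) -> int:
    """Extracting: uncompressed size of a disk image thread.
    The thread EOF is deliberately not consulted."""
    if storage_type <= MAX_PRODOS_STORAGE_TYPE:
        storage_type = BLOCK_SIZE  # old ShrinkIt left a ProDOS storage type here
    return storage_type * extra_type  # 0 means the thread may be ignored


def disk_header_fields(image_len: int) -> tuple[int, int]:
    """Creating: (storage_type, extra_type) for an image of image_len bytes."""
    if image_len == 0 or image_len % BLOCK_SIZE:
        raise ValueError("disk image length is not a nonzero multiple of 512")
    return BLOCK_SIZE, image_len // BLOCK_SIZE
```

For example, the old-ShrinkIt case of storage_type 2 with 280 blocks is normalized to 512 * 280 = 143360 bytes.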

-

 

-

Access permissions

-

NuFX supports four boolean access permission flags (read, write, -destroy, rename) and two boolean attributes (backup needed, invisible) in the -"access" field.  This matches up with ProDOS capabilities nicely, -but very few other operating systems support all six.

-

Application authors should consider the following approaches:

-
    -
  1. Preserve all.  All flags in the access field - must be preserved.  It is not required that the extracted files obey - the original semantics -- an "invisible" file might be visible, - and a file with "rename" disabled might still be rename-able -- - but when the files are re-added, the permissions must match.

  2. Locked/unlocked.  A file with read enabled, and - write, destroy, rename, and invisible disabled, is considered - "locked" (access 0x01 or 0x21).  All other files are - considered "unlocked".  When a file is extracted and then - added to an archive, the locked/unlocked status must be preserved.  - Locked files are added with access 0x21, and unlocked files are added with - access 0xe3.
-

It is acceptable for an application to find a middle ground -between these two, and preserve more of the flags accurately than approach #2 -does, but approach #2 should be considered the minimum acceptable level of -support.
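Approach #2 comes down to one mask test and two constants. This minimal Python sketch assumes the standard ProDOS access bit layout (destroy 0x80, rename 0x40, backup 0x20, invisible 0x04, write 0x02, read 0x01), which is consistent with the 0x21 and 0xe3 values above. The function names are hypothetical.

```python
# ProDOS access bits (assumed standard ProDOS layout).
ACCESS_DESTROY = 0x80
ACCESS_RENAME = 0x40
ACCESS_BACKUP = 0x20
ACCESS_INVISIBLE = 0x04
ACCESS_WRITE = 0x02
ACCESS_READ = 0x01

LOCKED_ACCESS = 0x21    # read + backup needed
UNLOCKED_ACCESS = 0xE3  # destroy + rename + backup + write + read

# Bits that decide locked/unlocked; backup-needed is deliberately excluded.
_LOCK_MASK = ACCESS_DESTROY | ACCESS_RENAME | ACCESS_INVISIBLE | ACCESS_WRITE | ACCESS_READ


def is_locked(access: int) -> bool:
    """Read enabled; write, destroy, rename, and invisible all disabled."""
    return (access & _LOCK_MASK) == ACCESS_READ


def access_for(locked: bool) -> int:
    """Access value to use when a file is (re)added under approach #2."""
    return LOCKED_ACCESS if locked else UNLOCKED_ACCESS
```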

-

 

-

Empty directories

-

Directories do not need to be stored explicitly unless they are -empty.  The NuFX specification manages to avoid describing how directories -are actually supposed to be stored, saying only: "A Thread Record must exist to inform a utility that a directory is to -be created through the use of the proper control_thread value."

-

What is in a "create directory" control thread?  -It appears that the intent was to have the thread contain the pathname that -needed to be created.  In theory, you could have several of these things, -and create an entire hierarchy from a single record.  Such threads should -not be compressed, and their compThreadEOF should always match their threadEOF -(i.e. they're not pre-sized).

-

It's a little tricky to say, "add a control thread whenever -you find a directory with nothing in it".  What if the directory has -files in it, but you don't have the access permissions necessary to read the -files?

-

Does such a record require a filename?  Probably not.  -However, if it doesn't have a filename, ShrinkIt might not display the record, -and you'd have no way to manipulate it.  Adding a "record label" -is easy and useful.

-

(I'm strongly tempted to punt on the control threads and just -use storage type 0x0d to indicate that a directory should be created.  This -is in direct opposition to the NuFX specification, however, so I'm reluctant to -do so.)

-

Creating: Applications not interested in preserving empty -directories need do nothing.  Otherwise, the application must add a -"create directory" control thread whenever a directory is encountered -for which no files are added to the archive.

-

Extracting: A directory must be created when a control -thread is present.  As noted in the NuFX specification, the application -must also create any directories listed in the record's pathname that don't yet -exist.
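When creating an archive, one way to choose the directories that need a control thread is to remove every directory that some added file already implies. This is a minimal Python sketch with a hypothetical function name, using '/'-separated relative paths for illustration. The input is the list of files actually added, so a directory whose files could not be read also gets a control thread. That sidesteps the permissions question raised above.

```python
from pathlib import PurePosixPath


def dirs_needing_control_threads(directories, added_files):
    """Directories not created implicitly by any added file beneath them."""
    implied = set()
    for f in added_files:
        # Every ancestor of an added file is created when the file is extracted.
        implied.update(PurePosixPath(f).parents)
    return sorted(d for d in directories if PurePosixPath(d) not in implied)
```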

-

 

-

Message thread format

-

The specification says that message threads are ASCII text, but -doesn't specify an EOL character.  For the benefit of Apple II utilities, -it's best to use a carriage return (ctrl-M).  The comments are expected to -be readable on 8-bit Apple IIs, so plain ASCII rather than Mac OS Roman should -be used.

-

Creating: Convert any EOL markers to CR, and any -non-ASCII characters (i.e. bytes with the high bit set) to ASCII.

-

Extracting: Assume that the comment may be using CR, LF, -or CRLF, and convert as needed for display.  GS/ShrinkIt used a -proportional font, so there is no need to worry about formatting to preserve "ASCII art" in -comments.
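The conversions in both directions are a few string operations. Here is a minimal Python sketch with hypothetical names. On extraction, bytes with the high bit set become U+FFFD. That is my own choice, since the rule above only covers line endings.

```python
def comment_to_nufx(text: str) -> bytes:
    """Creating: CR line endings, plain 7-bit ASCII."""
    # Normalize CRLF first so it doesn't turn into two CRs.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    # Non-ASCII characters become '?'.
    return text.encode("ascii", errors="replace")


def comment_from_nufx(raw: bytes) -> str:
    """Extracting: accept CR, LF, or CRLF; return text with LF endings."""
    text = raw.decode("ascii", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```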

-

 

- -

GS/OS option lists and HFS file types

-

Files on HFS volumes have two four-byte values, called file type and -creator, that identify the file contents. These are part of the -Macintosh Finder info structures, called FInfo and FXInfo. -Files copied from HFS to ProDOS may have this data stored in the extended -key block of a forked file (see ProDOS technical note #25). This appears -as two 18-byte chunks, consisting of a size byte followed by a type -byte, and then 16 bytes of FInfo or FXInfo data. -To expose the data to applications, GS/OS returns an "option list" -with the contents on certain calls. Most of the fields are uninteresting -to anything but the Mac Finder, so the option list may be viewed simply -as a way to preserve the file type and creator.

- -

GS/ShrinkIt tries to record this data, but doesn't entirely succeed. A -file archived from HFS will have a 36-byte option list in the record, but -with the size/type bytes removed, and some extra junk near the end. In some -archives it appears to drop some of the data without altering the size, -e.g. the size field says 36 bytes, but there's only space for 18 bytes -in the record header.

-

Unfortunately, when archiving files from an HFS volume under GS/OS, -GSHK records the ProDOS type/auxtype rather than the full HFS file type -and creator (likely because that's what GS/OS provides). The only way to -recover the original Finder types is through the malformed option list.

-

Side note: the NuFX specification reversed the values of MFS and HFS -in the file_sys_id enumeration. In practice, GS/ShrinkIt -correctly uses the GS/OS FST definitions: MFS=5, HFS=6.

-

Opening: Assume the option_size field is correct -unless it exceeds attrib_count-2. If it's too large, clip it down to size. -If the filesystem type is ProDOS or HFS, and the first 8 bytes look like -ASCII, use the first 4 bytes of the option list data as the file type and -the second 4 bytes as the creator.

-

Updating: Always use the actual size. Do not -propagate incorrect values. Retaining option lists for ProDOS and HFS -entries is required, since that may have the only record of the original -file type and creator. Updates to the archive attributes that alter -the file/aux type should modify the values in the record and delete the -option list, or provide a way to edit the option list independently.
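The "Opening" rule might be sketched in Python as follows. It uses the GS/OS FST ids ProDOS=1 and HFS=6 and hypothetical function names. "Looks like ASCII" is interpreted here as printable 7-bit ASCII, which is an assumption.

```python
PRODOS_FS = 1  # GS/OS FST id for ProDOS
HFS_FS = 6     # GS/OS FST id for HFS (the NuFX spec has MFS/HFS swapped)


def clip_option_size(option_size: int, attrib_count: int) -> int:
    """Opening: trust option_size unless it exceeds attrib_count - 2."""
    limit = max(0, attrib_count - 2)  # guard against a nonsensical attrib_count
    return min(option_size, limit)


def finder_types(fs_id: int, option_data: bytes):
    """Return (file_type, creator) as 4-character strings, or None."""
    if fs_id not in (PRODOS_FS, HFS_FS) or len(option_data) < 8:
        return None
    head = option_data[:8]
    # Accept only printable 7-bit ASCII (space through '~').
    if not all(0x20 <= b < 0x7F for b in head):
        return None
    return head[:4].decode("ascii"), head[4:8].decode("ascii")
```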

- -

 

-

Master EOF

-

For the most part, ShrinkIt correctly sets the MasterEOF field -in the Master Header block.  A very old version of ShrinkIt left it set to -zero (this is the same version that completely omitted the filename for DOS 3.3 -disk images).  GS/ShrinkIt appears to initialize it to 48 (the size of the -MH block), and if the creation process is interrupted you can end up with a -partial archive with a nonzero EOF.

-

Opening: Accept a MasterEOF of zero, but reject a -MasterEOF of 48.  Don't assume the MasterEOF is accurate.

-

Updating: Applications must write the correct MasterEOF -value if an archive is modified.

-
-

Extensions

-

Unofficial extensions to the NuFX specification.  Anyone -working with NuFX archives should take heed.

-

New compression formats

-

Thread formats 0x0000 through 0x0005 are already defined.  The -following thread format values have been added:

-
    -
  • -

    0x0006 - deflate.  The thread contains data conforming - to RFC 1951 (DEFLATE specification version 1.3).  Put more practically, - it contains exactly the data that zlib v1.1.4 outputs.  Visit http://www.zlib.org/ - for more details.

  • -
  • -

    0x0007 - bzip2.  The thread contains BWT+Huffman - compressed data as output by Julian Seward's "libbz2" v1.0.2.  Visit - http://sources.redhat.com/bzip2/ - for more information.

  • -
-

Support for these formats is nonexistent on the Apple II, so -they should not be used except in situations where compatibility is unimportant -(e.g. collections of disk archives for use with A2 emulators).

- -

I found that "deflate" generally does as well as or -better than "bzip2" on Apple II binaries, disk images, and small text -files.  Deflate is also faster and uses less memory, and you're more likely -to find libz installed on a given system than you are libbz2.  For these -reasons, use of deflate should be encouraged in favor of bzip2.

- -
-

NuFX Quirks

-

This section identifies some quirks in NuFX or ShrinkIt that, -while not bugs, are worth noting.

-

Filename separator character

-

Originally, the filename was stored in the record header, so it -made sense that the filename separator character ("fssep char") should -also be there.  When the filenames were moved into threads, the fssep char -got left behind.  If a record has two filenames, they'd better have the -same fssep char, or interpreting one of them will be impossible.  (This is -one of the reasons why it's important to clearly define which filename takes -precedence in all circumstances.)

-

Files with zero or two CRCs

-

The "threadCRC" field in the thread header block can -have one of three meanings: nothing (v0, v1), the CRC of the compressed data -(v2), or the CRC of the uncompressed data (v3).  The version 2 meaning -wasn't used in anything significant, and can be ignored.

-

Version 1 records generally have threads compressed with LZW/1 -data.  The LZW/1 compression format includes a 16-bit CRC at the start of -the thread.  Version 3 records generally have threads compressed with LZW/2 -data, which does not include a CRC.

-

Applications like P8 ShrinkIt and NuLib create v1 records and -compress with LZW/1, while GS/ShrinkIt and NuLib2 create v3 records and compress -with LZW/2.  This means that each compressed thread has exactly one CRC.  -So what happens if you tell NuLib2 to create a new record with -LZW/1, or tell it to add a new LZW/2 thread to an existing v1 record?

-

In one case, you end up with two CRCs; in the other, you end up -with no CRC on your data at all.  For some bizarre reason, the v3 thread -CRC is computed with a different initial value, so it is necessary to compute -the CRC twice, not merely store the same value twice.

-

Please select your compression methods appropriately.  -Also, bear in mind that uncompressed data stored with P8 ShrinkIt has no CRC -whatsoever.
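Because the two CRCs start from different values, storing both means running the calculation twice. This is a minimal Python sketch that assumes the CRC-16 variant with polynomial 0x1021 (the one XMODEM uses). The text above does not say which initial value belongs to which CRC, so the 0x0000/0xFFFF pair below is an assumption to check against NufxLib before relying on it.

```python
def crc16_ccitt(data: bytes, initial: int) -> int:
    """Bitwise CRC-16, polynomial 0x1021, MSB first, no final XOR.
    Only the starting value differs between the two CRC uses."""
    crc = initial & 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

For the same input, `crc16_ccitt(data, 0x0000)` and `crc16_ccitt(data, 0xFFFF)` generally differ, which is why one stored value cannot stand in for the other.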

-

Extra data in compressed threads

-

ShrinkIt adds an extra byte at the end of all LZW compressed -data, probably due to an off-by-one bug in the compression code.  It turns -out that it's possible to get even more "extra" bytes at the end.

-

ShrinkIt's LZW-I algorithm always operates on a 4K buffer, -largely because it was originally designed for compressing 5.25" disks with -4K tracks.  -On small files, or at the end of a large one, the last bit of data is padded out -to 4K and then compressed.  Ordinarily this is barely noticeable, because -the compression routines do an RLE (Run-Length Encoding) pass before applying -LZW.

-

However, if both RLE and LZW fail to make the 4K block any -smaller, it is stored without compression.  This means the whole 4K, -complete with padding, gets written to the archive.  This doesn't cause any -problems, but can make you wonder where all the extra bits came from.

-

The SQ compression algorithm, as implemented by Don Elton's SQ3, -appears to add an extra 0xff to the end of the compressed data.  It can -safely be ignored.

-

Preserving BXY and SEA wrappers

-

Preserving BXY wrappers is pretty easy, since the Binary II -format is well documented.  Updating block counts and file lengths is all -that is required.

-

Preserving SEA wrappers is a little harder, since (as far as I -can tell) there is no documentation on the format.  A little -experimentation shows that the SEA header is always 12005 bytes long, and the -only part that changes from file to file is a short piece right before the NuFX -archive begins.

-

It is necessary to update the file length in three different -places, all right next to each other, one of which is offset by 64 bytes.  -I would guess the header allows for more than one archive to be present, but -since such things have never actually been created, the possibility can be -ignored.

-

Y2K

-

The NuFX standard says that the Date/Time format is the same as -that returned by the IIgs ReadTimeHex toolbox call.  That call returns the -year as (year - 1900), so the year 2000 is stored as "100".  -ProDOS 8 clock drivers, on the other hand,  return 40-99 for 1940-1999, and -0-39 for 2000-2039.  As a result, archives created with P8 ShrinkIt use 0 -for the year 2000 instead of 100.

-

When creating archives, always use 100 for the year 2000, but -also accept the year 0.  However, if you find a Date/Time with zero in all -useful fields (second, minute, hour, day, month, year), treat it as an -unspecified date rather than midnight of January 1, 2000.
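Decoding a Date/Time under these rules could look like the following minimal Python sketch. It assumes that day and month are zero-based, as ReadTimeHex returns them. That assumption is consistent with "all zeros" otherwise meaning midnight of January 1, 2000. The function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def nufx_to_datetime(second, minute, hour, year, day, month) -> Optional[datetime]:
    """Decode a NuFX Date/Time (day and month assumed zero-based)."""
    if second == minute == hour == year == day == month == 0:
        return None  # unspecified, not midnight on January 1, 2000
    # ReadTimeHex stores year - 1900 (100 = 2000); P8 ShrinkIt uses 0-39 for 2000-2039.
    full_year = 2000 + year if year < 40 else 1900 + year
    return datetime(full_year, month + 1, day + 1, hour, minute, second)


def year_to_nufx(full_year: int) -> int:
    """Creating: always store year - 1900, so 2000 becomes 100."""
    return full_year - 1900
```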

-
-

This document is Copyright © 2000-2004 by Andy -McFadden.  All Rights Reserved.

-

The latest version can be found on the NuLib web site at -http://www.nulib.com/.

- - - + + + + + NuFX Addendum + + + + + + + + + + +
+ +

NuFX Addendum

+
+ +
+ + + + +
NuFX Addendum - By Andy McFadden - Last revised 2022/05/28
+

This addendum clarifies and extends certain aspects of the NuFX specification. It is not an "official" modification of the original document (it has not been reviewed or approved by the original author), but anyone developing NuFX utilities would do well to follow these recommendations.

Purpose

The NuFX specification defines a very loose structure and leaves much to the imagination of the implementer. For example: "If a utility finds a redundancy in a Thread Record, it must decide whether to skip the record or to do something with that particular thread...". A strict specification would define a standard approach that all applications must follow when dealing with such an anomalous condition, so that every archive is handled consistently.

This document refines the NuFX specification and brings some of the "fuzzy" areas into sharper focus. Nothing in this document contravenes the original document.

In the text below, "must" is an imperative that has to be obeyed, and "should" is a recommendation that authors are strongly encouraged to follow.

Clarifications

Pronunciation

What's the correct way to pronounce "NuFX"? The specification doesn't say. There are two basic camps: letter-by-letter ("en you eff ecks") and minimal-syllable ("new fix" or "new fuchs"). I don't recall how Andy Nicholas says it, so let it be "new fix".

Use of ".SDK" suffix

Originally, only ".SHK" was used to represent a NuFX archive. Over time, a convention has arisen of using ".SDK" for archives that hold a single disk image. This is very convenient for emulators on systems that rely on the file extension (e.g. Windows), so use of ".SDK" is encouraged.

Archives with no records

An archive without records, i.e. nothing but a master header block, serves no purpose.

Creating: Archives without any records must never be created. All archives must have at least one record.

Opening: If asked to open a record-less archive, the application should recognize that the archive is empty and proceed as if it were a new archive.

Modifying: If all records in an archive are deleted, the archive file must be deleted as well.

Records with no threads

A record without threads is pretty pointless.

Creating: Records without threads must never be created. All records must have at least one thread.

Extracting: Empty records should be ignored.

Records with only a filename thread

GS/ShrinkIt v1.1 has a bug that prevents it from creating an empty data thread when asked to add a zero-byte file. The result is a record with a filename thread and nothing else. (If it was the first new record added, it will have an empty comment thread as well.)

There is no valid reason for deliberately creating such a record.

Creating: Records composed solely of a filename thread must not be created.

Extracting: Records with nothing but a filename thread should normally be ignored. The exception is GSHK v1.1 bug compatibility: if a record has a filename thread and no threads other than "message" threads (i.e. no data threads or control threads), a zero-byte data fork file should be created. If the ProDOS storage type field indicates an extended file, a zero-byte resource fork should also be created.
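The GSHK v1.1 compatibility rule can be sketched in a few lines of Python. This is a minimal illustration and not code from NufxLib or ShrinkIt. It assumes the caller has already parsed the record into the list of its thread classes plus the ProDOS storage type. The function name `gshk_empty_forks` is hypothetical. The class values are the thread classes from the NuFX specification.

```python
# Thread classes from the NuFX specification.
MESSAGE_CLASS = 0x0000
CONTROL_CLASS = 0x0001
DATA_CLASS = 0x0002
FILENAME_CLASS = 0x0003

EXTENDED_FILE = 5  # ProDOS storage type for a forked file


def gshk_empty_forks(thread_classes, storage_type):
    """Return the zero-byte forks to create for a GSHK v1.1 "empty file"
    record, or None if the record doesn't match that pattern.

    thread_classes: thread class of every thread in the record, in order.
    """
    if FILENAME_CLASS not in thread_classes:
        return None
    # Only filename and message threads may be present; any data or control
    # thread (or an unrecognized class) means this isn't the GSHK bug.
    if any(c not in (FILENAME_CLASS, MESSAGE_CLASS) for c in thread_classes):
        return None
    forks = ["data"]
    if storage_type == EXTENDED_FILE:
        forks.append("resource")
    return forks
```

A `None` result only means the special case doesn't apply. The record then goes through normal extraction, or is ignored if it has no data.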


Records with no filename

A record without a filename thread is a curious beast. Ideally, there wouldn't be any such thing as a filename thread, since it doesn't really make sense to have a record without a filename. Expanding the record header to hold a pre-sized buffer would have made many things simpler.

This particular situation occurred with older versions of ShrinkIt (e.g. v1.1) that failed to store a volume name when compressing a DOS 3.3 disk. There was no filename in the record header, nor one in a filename thread.

The only situation where a record without a filename makes sense is if the record holds nothing but comments or other archive "meta data", such as a "create directory" control thread.

Creating: Records without filenames must not be created, unless the record is intended to contain nothing but archive meta-data. A filename thread should only be deleted if a new filename thread is being added. If data threads are added to a record without a filename, a filename thread must be added as well.

Extracting: If the record contains file data, the application may either prompt the user for a filename or supply a generated one.

Records with more than one filename thread

This is an unusual situation that should only arise if an application is buggy. Every record created by a modern application should have no more than one filename thread.

Creating: Records with multiple filename threads must not be created.

Extracting: Applications must use the first filename thread. If a buggy application appends an additional filename thread, its filename will be ignored.

Records with filenames in two places

The old way of storing filenames, used by NuLib and old versions of ShrinkIt, was to put the filename in the record header. To facilitate renaming, the filename was moved into a thread. Thus, there are two possible locations where the filename may live, and no guarantee that only one will be used.

Creating: Never put the filename in the record header when creating a new record. It's okay to leave existing records alone, but if an application has the opportunity to rewrite the record header, the record filename must be removed.

Extracting: The thread filename takes precedence over the record header filename.
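Both precedence rules fit in one small function: the first filename thread wins, and any filename thread beats the record-header filename. This is a sketch only. It assumes threads have already been read into `(thread_class, thread_kind, data)` tuples, with the filename bytes trimmed to the thread's threadEOF. Both the tuple layout and the name `record_filename` are illustrative.

```python
FILENAME_CLASS = 0x0003
FILENAME_KIND = 0x0000


def record_filename(threads, header_filename=None):
    """Pick the filename to use for a record.

    threads: (thread_class, thread_kind, data) tuples in archive order, with
             data already trimmed to the thread's threadEOF.
    header_filename: the filename stored in the record header, or None.
    """
    for thread_class, thread_kind, data in threads:
        if thread_class == FILENAME_CLASS and thread_kind == FILENAME_KIND:
            return data  # first filename thread wins; later ones are ignored
    return header_filename  # old-style location, used only as a fallback
```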

Filename character set

Filenames in NuFX archives use the Mac OS Roman character set, which is ASCII plus some symbols and the usual set of Latin-language accented characters (see Unicode definition). The NuFX filename definition was intended to accommodate files from HFS volumes, which may contain any character except ':'. Control characters, including NUL ('\0'), were allowed but discouraged.

On modern systems, converting between Mac OS Roman and Unicode is useful and (mostly) straightforward. Embedded null bytes, however, are very annoying to deal with in C-like languages.

Creating: Convert Unicode to Mac OS Roman, replacing any untranslatable characters with '?'. Embedded nulls must be replaced with '?'.

Extracting: Convert Mac OS Roman to Unicode. If embedded nulls are encountered, they should be replaced with something appropriate for the current system. Applications are allowed to ignore the problem and truncate the filename at the null, but must then be prepared to handle duplicate or empty filenames.
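Python's standard `mac_roman` codec keeps both conversion rules short. A sketch, assuming host-side names are Python `str` values. Using '_' as the NUL replacement on extraction is just one example of "something appropriate for the current system", and the function names are hypothetical.

```python
def filename_to_archive(name: str) -> bytes:
    """Creation: Unicode -> Mac OS Roman, with '?' for anything unmappable."""
    # When encoding, errors="replace" substitutes b"?" for untranslatable chars.
    raw = name.encode("mac_roman", errors="replace")
    # Embedded NULs must never be stored in an archive filename.
    return raw.replace(b"\x00", b"?")


def filename_from_archive(raw: bytes, nul_replacement: str = "_") -> str:
    """Extraction: Mac OS Roman -> Unicode, replacing embedded NULs."""
    # Every byte value has a mapping in mac_roman, so this decode can't fail.
    return raw.decode("mac_roman").replace("\x00", nul_replacement)
```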

File system separator characters

Every record header has a "file system separator" character ("fssep") in the "file_sys_info" word. This is usually something like ':' for GS/OS or '/' for UNIX. It's necessary to know what the separator is in order to break a pathname down into its individual components.

Not all filesystems support subdirectories, however, so not all filenames need a separator. The NuFX specification does not define the appropriate separator character for such a filesystem. Clearly it should be something illegal on the source filesystem, or we could inadvertently see pathnames where none exist (e.g. a file called "foo:bar" on DOS 3.3 if the fssep char were set to ':').

The trouble is, DOS 3.3 doesn't actually have any illegal characters, just a field of 30 characters padded with spaces. Pascal disks are similar. Since we must define an fssep for every filename, our best choice is '\0' (0x00): it's unlikely to occur in a filename, and any program that stores names in C strings would find an embedded '\0' awkward to store and scan for.

This situation also applies to archived disk images, which must have simple filenames.

An application could keep track of which filesystems have subdirectories and which don't, which would allow it to disregard the fssep char when it can't be relevant for a record, but it's easier to let the fssep char's usefulness be self-evident.

(NOTE: as of v2.0.3, NufxLib rejects 0x00 as an fssep character. This is a bug.)

Creating: When adding files directly from filesystems without subdirectories, use 0x00 as the fssep char.

Extracting: An fssep char of 0x00 means the pathname is just the filename.
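Splitting a stored pathname into components then depends only on the record's fssep char. A minimal sketch with a hypothetical function name. Silently dropping empty components is a design choice of this example, not something the addendum requires.

```python
from typing import List


def split_pathname(pathname: bytes, fssep: int) -> List[bytes]:
    """Break a stored pathname into components using the record's fssep char."""
    if fssep == 0x00:
        # 0x00 means "no subdirectories": the whole string is one filename,
        # even if it happens to contain ':' or '/'.
        return [pathname]
    sep = bytes([fssep])
    # Skip empty components from leading, trailing, or doubled separators.
    return [part for part in pathname.split(sep) if part]
```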

Disk image pathnames

While files may have multiple path components (e.g. "subdir:subdir2:filename"), it makes no sense for disk images to have them. The stored filename for a disk is either the disk's ProDOS volume name or, for non-ProDOS disks, a simple label defined by the user. Since the eventual target is a disk device, specifying a subdirectory path makes no sense.

The issue becomes a little more confusing when storage of disk images used for emulators is considered. At first glance, it seems useful to be able to store a hierarchy of disk images. In practice, such images would be archived either as a hierarchy of .PO files or as an archive of .SDK archives.

Adding/renaming: Applications must strip any leading path components from disk image "storage names". (The NuFX specification does explicitly forbid the use of a filesystem separator character in a disk volume name.)

Extracting: Applications extracting directly to a disk must strip leading path components before assigning the ProDOS volume name. Applications extracting images to a file don't need to do anything unusual.
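Stripping leading components from a disk image's storage name amounts to keeping whatever follows the last separator. A sketch with a hypothetical function name, assuming the name and fssep char come straight from the record.

```python
def disk_image_storage_name(pathname: bytes, fssep: int) -> bytes:
    """Return the last path component of a disk image's storage name."""
    if fssep == 0x00:
        return pathname  # no separator defined, so there is nothing to strip
    # Keep only what follows the final separator.
    return pathname.rsplit(bytes([fssep]), 1)[-1]
```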

 

Filename case sensitivity

There isn't a "filename is case-sensitive" flag in NuFX archives. Since NuFX was designed primarily for ProDOS and HFS filesystems, neither of which is case-sensitive, we should assume that case is not meant to be significant when determining whether two records have the same filename. This becomes important when adding files (to test for duplicates), when extracting files by name, and when displaying archive contents as a hierarchical tree.

Applications should try to recognize that "foo/bar", "foo/BAR", and "FOO/bar" are the same file, but it's probably not worth "probing" a case-sensitive filesystem like Linux ext2 to guarantee this.

Duplicate filenames

There is nothing in the NuFX specification that prevents an archive from holding more than one file with the same name. In practice, this is inconvenient, especially for users of command-line tools. On the other hand, if the underlying filesystem is case-sensitive, the extracted files may not actually collide, so it may not make sense for all applications to treat this as an iron-clad rule.

When comparing names, be sure to take the filesystem separator character into account. "foo:bar" could be a simple filename or a partial pathname depending on whether ':' is the separator. Two names should be considered identical if each distinct path component matches, so "foo/bar" and "foo:bar" are identical if their separators are '/' and ':', respectively. Comparisons should be case-insensitive.

Adding/renaming: Applications should prevent multiple records from having the same filename.
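One way to follow these comparison rules is to reduce each name to a tuple of lower-cased path components and compare the tuples. A sketch under two assumptions: names have already been decoded to `str`, and `str.lower()` is an acceptable stand-in for ProDOS/HFS case folding (it is only an approximation). The helper names are hypothetical.

```python
def path_key(pathname: str, fssep: str) -> tuple:
    """Normalize a stored pathname so equivalent names compare equal."""
    if fssep == "\0":
        parts = [pathname]  # no subdirectories: a single component
    else:
        parts = [p for p in pathname.split(fssep) if p]
    # ProDOS and HFS are case-insensitive; str.lower() approximates that.
    return tuple(p.lower() for p in parts)


def find_duplicates(names):
    """names: iterable of (pathname, fssep). Returns (first, duplicate) pairs."""
    seen = {}
    duplicates = []
    for pathname, fssep in names:
        key = path_key(pathname, fssep)
        if key in seen:
            duplicates.append((seen[key], pathname))
        else:
            seen[key] = pathname
    return duplicates
```

For example, "foo/bar" with '/' and "FOO:Bar" with ':' reduce to the same key, while "foo:bar" with '/' stays a single component and does not match either.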

 

Pre-sized or not pre-sized

The specification declares that filename threads and comments use pre-sized buffers. It does not say whether other members of the message and filename classes are pre-sized, which makes it difficult to know what to do with a request to create a heretofore undefined thread type. The NuFX format does not provide any definitive clue as to whether a thread is pre-sized, so such decisions must be based on the thread class and thread kind.

Filename threads and comment threads are pre-sized. All other threads are not pre-sized (including other members of the "filename" and "message" classes).
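Because the decision rests entirely on class and kind, it reduces to a lookup. A sketch assuming the NuFX class/kind values for filename (class 0x0003, kind 0x0000) and comment (class 0x0000, kind 0x0001) threads. Any other pair, including other message kinds, is treated as not pre-sized. The function name is hypothetical.

```python
# Thread class/kind values from the NuFX specification.
MESSAGE_CLASS, FILENAME_CLASS = 0x0000, 0x0003
COMMENT_KIND = 0x0001   # within the message class
FILENAME_KIND = 0x0000  # within the filename class

PRESIZED = {
    (FILENAME_CLASS, FILENAME_KIND),
    (MESSAGE_CLASS, COMMENT_KIND),
}


def is_presized(thread_class: int, thread_kind: int) -> bool:
    """Only filename and comment threads use pre-sized buffers."""
    return (thread_class, thread_kind) in PRESIZED
```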

Proper pre-size for filename threads

ShrinkIt allocates a 32-byte pre-sized buffer for the filename. If the filename is longer than 32 bytes, the buffer grows to fit the filename exactly. If renaming files is considered useful, the buffer should always be slightly larger than is needed to hold the filename. (Filenames longer than 32 characters are most likely the result of nested directories, so if the buffer length is an exact match, renaming the file itself is inhibited.)

Side note: GSHK appears to have a bug where it can't deal with 32-byte HFS filenames (e.g. "foo:abcdefghijabcdefghijabcdefghijxy" can't be added to an archive). Emulating this behavior is discouraged.

Creating: If GS/ShrinkIt compatibility is not important, all filenames should have at least 8 bytes of free space in the filename thread. For GSHK compatibility, the filename thread compThreadEOF must be the greater of 32 and the filename length.

Renaming: It is acceptable to have fewer than 8 bytes of free space remaining after a file is renamed. However, if the new filename exceeds the buffer size and the thread must be rebuilt, the 8-byte padding should be added.
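The creation and renaming rules come down to a little arithmetic on compThreadEOF. A sketch with hypothetical function names. It uses exactly 8 bytes of slack, the minimum the recommendation asks for, and rebuilds a too-small buffer the same way whether or not GSHK compatibility was requested.

```python
GSHK_FILENAME_BUF = 32  # ShrinkIt's default pre-sized filename buffer
FILENAME_SLACK = 8      # recommended free space when GSHK compat isn't needed


def new_filename_comp_eof(name_len: int, gshk_compat: bool) -> int:
    """compThreadEOF (buffer size) for a newly created filename thread."""
    if gshk_compat:
        return max(GSHK_FILENAME_BUF, name_len)
    return name_len + FILENAME_SLACK


def renamed_filename_comp_eof(current_comp_eof: int, new_name_len: int) -> int:
    """Buffer size after a rename: reuse the buffer if the new name fits,
    otherwise rebuild the thread with the 8-byte padding."""
    if new_name_len <= current_comp_eof:
        return current_comp_eof
    return new_name_len + FILENAME_SLACK
```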

Thread ordering

The NuFX specification does not require that threads appear in any particular order. However, writing them in a certain order can make some operations significantly easier.

For example, if an archive is being unpacked as it is received, it is important to know the filename before receiving the data. If the filename thread comes after the data threads, the application has to write the incoming data into a temp file and then rename it when the filename thread finally shows up. It would also be nice to be able to display file comments while the file is being downloaded.

Creating: The filename thread must precede all other threads. The recommended (but not required) ordering for common thread types is:

  • Filename
  • Message(s) (i.e. comments)
  • Data fork
  • Disk image
  • Resource fork
  • All other threads

Extracting: If the filename thread does not appear before the first data-class thread, the record may be ignored.
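The recommended order can be expressed as a sort key, and the extraction check as a single scan. A sketch assuming threads are `(class, kind)` pairs using the NuFX data-class kinds (0x0000 data fork, 0x0001 disk image, 0x0002 resource fork). The function names are hypothetical.

```python
# NuFX thread classes and data-class kinds.
MESSAGE, CONTROL, DATA, FILENAME = 0x0000, 0x0001, 0x0002, 0x0003
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002


def order_rank(thread):
    """Rank a (class, kind) pair by the recommended write order."""
    thread_class, kind = thread
    if thread_class == FILENAME:
        return 0
    if thread_class == MESSAGE:
        return 1
    if thread_class == DATA:
        return {DATA_FORK: 2, DISK_IMAGE: 3, RSRC_FORK: 4}.get(kind, 5)
    return 5  # control threads and anything unrecognized


def order_threads(threads):
    # sorted() is stable, so threads with equal rank keep their relative order.
    return sorted(threads, key=order_rank)


def filename_comes_first(threads):
    """True unless a data-class thread appears before the filename thread."""
    for thread_class, _kind in threads:
        if thread_class == FILENAME:
            return True
        if thread_class == DATA:
            return False
    return True
```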

Incompatible thread types

There are some combinations of threads that must never appear in a single record.

Creating:

  • If a data fork is present, the record must not contain another data fork or a disk image.
  • If a resource fork is present, the record must not contain another resource fork or a disk image.
  • If a disk image is present, the record must not contain another disk image, a data fork, or a resource fork.
  • If a control-class thread is present, the record must not contain any data-class threads.

Extracting: When incompatible threads are found, the later threads should be ignored in favor of the earlier ones. For example, if two data forks are found in the same record, only the first one should be extracted. If a data-class thread is found first, subsequent control-class threads should be ignored, and vice versa.
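The conflict table and the "earlier thread wins" rule can be combined into a single pass over the record. A sketch assuming `(class, kind)` pairs with the NuFX class and data-kind values. Unrecognized data kinds ("other_data" here) are assumed to conflict only with control threads, which is an interpretation and not something the addendum spells out. The names are hypothetical.

```python
CONTROL, DATA = 0x0001, 0x0002
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002

# For each category, the categories that must not already be present.
CONFLICTS = {
    "data_fork": {"data_fork", "disk_image", "control"},
    "rsrc_fork": {"rsrc_fork", "disk_image", "control"},
    "disk_image": {"disk_image", "data_fork", "rsrc_fork", "control"},
    "other_data": {"control"},
    "control": {"data_fork", "rsrc_fork", "disk_image", "other_data"},
}


def category(thread_class, kind):
    if thread_class == CONTROL:
        return "control"
    if thread_class == DATA:
        return {DATA_FORK: "data_fork", DISK_IMAGE: "disk_image",
                RSRC_FORK: "rsrc_fork"}.get(kind, "other_data")
    return None  # message and filename threads never conflict


def extractable_threads(threads):
    """Drop any thread that conflicts with an earlier one (first one wins)."""
    kept, seen = [], set()
    for thread_class, kind in threads:
        cat = category(thread_class, kind)
        if cat is not None:
            if CONFLICTS[cat] & seen:
                continue
            seen.add(cat)
        kept.append((thread_class, kind))
    return kept
```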

Compressed threads

Some threads are compressed, some aren't. The specification isn't very specific.

All data-class threads may be compressed. All other classes of threads must not be compressed.

ProDOS storage type

The ProDOS storage type has little meaning on most systems. However, certain values are significant.

  • For records with only a data fork, the storage type must be one of 0, 1, 2, or 3. The value "2" is recommended for applications that don't wish to mimic ProDOS behavior exactly.
  • For records with a resource fork, the storage type must be "5" (ProDOS extended file).
  • For records with a disk image thread, the storage type must be equal to the disk block size (typically 512).
  • For records without data-class threads, the storage type must be "0".

Storage type 0x0d, which ProDOS uses for directories, must not be used.

It is important to update the storage type as threads are added and deleted, so that it always accurately reflects the contents of the record.
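Recomputing the storage type after every add or delete is easiest if it is derived from the threads rather than patched by hand. A sketch assuming `(class, kind)` pairs with the NuFX data-class kinds. It follows the recommended value 2 for data-fork-only records, and the function name is hypothetical. Note that it can never produce the forbidden 0x0d.

```python
DATA = 0x0002
DATA_FORK, DISK_IMAGE, RSRC_FORK = 0x0000, 0x0001, 0x0002
EXTENDED_FILE = 5  # ProDOS storage type for forked files


def storage_type_for(threads, block_size=512, plain_file_type=2):
    """Compute the record's storage_type from its (class, kind) threads."""
    data_kinds = {kind for thread_class, kind in threads if thread_class == DATA}
    if DISK_IMAGE in data_kinds:
        return block_size       # disk images: storage_type is the block size
    if RSRC_FORK in data_kinds:
        return EXTENDED_FILE
    if data_kinds:
        return plain_file_type  # data fork only: 0-3 allowed, 2 recommended
    return 0                    # no data-class threads
```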

Disk block size and block count

For a compressed disk image, the "storage_type" and "extra_type" fields take on a different meaning: they hold the block size (typically 512) and block count (e.g. 280 for a 140K disk) of the disk.

These fields are more important than you might expect, because some older versions of ShrinkIt would set the thread EOF to a strange value like 68096 (which, curiously enough, is 133 * 512). These same versions of ShrinkIt tended to leave the "storage_type" set to 2. Apparently, ShrinkIt just used extra_type * 512 as the uncompressed size when trying to figure out what sort of disk it had. An early version of GS/ShrinkIt went one step further: it used a block count of 280 with a block size of 256, resulting in archives that apparently held 70K disk images.

It is simple enough to disregard the thread EOF value and replace the storage_type when it is absurdly small, but there is a deeper problem. If you delete a 140K disk image thread and replace it with an 800K disk image thread, the block count stored in the extra_type no longer accurately reflects the contents of the record. (This linkage between the record header and the thread contents is why this document forbids mixing disk image threads with any other data-class thread, including other disk images.)

Creating: Applications must update the extra_type whenever a disk image thread is added. The value (storage_type * extra_type) must be equal to the uncompressed size. The application may wish to reject threads that are not a multiple of 512 bytes.

Extracting: The application must normalize storage_type to 512 if it is less than 16 (0x0f is the largest possible ProDOS storage type). The value storage_type * extra_type must then be used as the uncompressed size. If the uncompressed size is zero, the thread may be ignored.
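Both directions of the block-size/block-count bookkeeping are short. This sketch uses hypothetical function names and chooses to reject non-multiple sizes on creation, which the addendum permits but does not require. A 140K image gives 512 * 280 = 143360 bytes.

```python
PRODOS_MAX_STORAGE_TYPE = 0x0F


def disk_image_uncompressed_size(storage_type: int, extra_type: int) -> int:
    """Extraction: uncompressed size of a disk image, from the record header."""
    # Old ShrinkIt versions left a real storage type (e.g. 2) in this field.
    block_size = 512 if storage_type <= PRODOS_MAX_STORAGE_TYPE else storage_type
    return block_size * extra_type


def disk_image_header_fields(image_len: int, block_size: int = 512):
    """Creation: (storage_type, extra_type) for an image of image_len bytes."""
    if image_len % block_size != 0:
        # Optional: the addendum allows rejecting non-512-multiple images.
        raise ValueError("disk image is not a whole number of blocks")
    return block_size, image_len // block_size
```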

Access permissions

NuFX supports four boolean access permission flags (read, write, destroy, rename) and two boolean attributes (backup needed, invisible) in the "access" field. This matches up nicely with ProDOS capabilities, but very few other operating systems support all six.

Application authors should consider the following approaches:

  1. Preserve all. All flags in the access field must be preserved. The extracted files are not required to obey the original semantics (an "invisible" file might be visible, and a file with "rename" disabled might still be renameable), but when the files are re-added, the permissions must match.
  2. Locked/unlocked. A file with read enabled, and write, destroy, rename, and invisible disabled, is considered "locked" (access 0x01 or 0x21). All other files are considered "unlocked". When a file is extracted and then added to an archive, the locked/unlocked status must be preserved. Locked files are added with access 0x21, and unlocked files are added with access 0xe3.

It is acceptable for an application to find a middle ground between these two and preserve more of the flags accurately than approach #2 does, but approach #2 should be considered the minimum acceptable level of support.
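Approach #2 reduces to one mask test and two constants. A sketch assuming the ProDOS access-byte bit assignments as I understand them: destroy 0x80, rename 0x40, backup 0x20, invisible 0x04, write 0x02, read 0x01. The backup-needed bit is deliberately left out of the test, which is why both 0x01 and 0x21 count as locked. The function names are hypothetical.

```python
ACCESS_DESTROY = 0x80
ACCESS_RENAME = 0x40
ACCESS_BACKUP = 0x20
ACCESS_INVISIBLE = 0x04
ACCESS_WRITE = 0x02
ACCESS_READ = 0x01

LOCKED_ACCESS = 0x21    # read + backup needed
UNLOCKED_ACCESS = 0xE3  # destroy, rename, backup, write, read

# Bits that decide locked/unlocked; backup-needed is deliberately ignored.
_LOCK_MASK = (ACCESS_READ | ACCESS_WRITE | ACCESS_DESTROY
              | ACCESS_RENAME | ACCESS_INVISIBLE)


def is_locked(access: int) -> bool:
    """Locked means read enabled and write/destroy/rename/invisible disabled."""
    return (access & _LOCK_MASK) == ACCESS_READ


def access_for_add(locked: bool) -> int:
    """Access value to use when (re-)adding a file under approach #2."""
    return LOCKED_ACCESS if locked else UNLOCKED_ACCESS
```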

Empty directories

Directories do not need to be stored explicitly unless they are empty. The NuFX specification manages to avoid describing how directories are actually supposed to be stored, saying only: "A Thread Record must exist to inform a utility that a directory is to be created through the use of the proper control_thread value."

What is in a "create directory" control thread? It appears that the intent was to have the thread contain the pathname that needed to be created. In theory, you could have several of these things and create an entire hierarchy from a single record. Such threads should not be compressed, and their compThreadEOF should always match their threadEOF (i.e. they're not pre-sized).

It's a little tricky to say, "add a control thread whenever you find a directory with nothing in it". What if the directory has files in it, but you don't have the access permissions necessary to read the files?

Does such a record require a filename? Probably not. However, if it doesn't have a filename, ShrinkIt might not display the record, and you'd have no way to manipulate it. Adding a "record label" is easy and useful.

(I'm strongly tempted to punt on the control threads and just use storage type 0x0d to indicate that a directory should be created. This is in direct opposition to the NuFX specification, however, so I'm reluctant to do so.)

Creating: Applications not interested in preserving empty directories need do nothing. Otherwise, the application must add a "create directory" control thread whenever a directory is encountered for which no files are added to the archive.

Extracting: A directory must be created when a control thread is present. As noted in the NuFX specification, the application must also create any directories listed in the record's pathname that don't yet exist.

Message thread format

The specification says that message threads are ASCII text, but doesn't specify an EOL character. For the benefit of Apple II utilities, it's best to use a carriage return (ctrl-M). The comments are expected to be readable on 8-bit Apple IIs, so plain ASCII rather than Mac OS Roman should be used.

Creating: Convert any EOL markers to CR, and convert any non-ASCII characters (i.e. bytes with the high bit set) to ASCII.

Extracting: Assume that the comment may be using CR, LF, or CRLF, and convert as needed for display. GS/ShrinkIt used a proportional font, so there is no need to worry about preserving "ASCII art" formatting in comments.
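A sketch of both conversions, assuming comments are handled as Python `str` on the host. On creation, non-ASCII characters become '?'. That is one possible reading of "convert to ASCII"; stripping the high bit of raw bytes would be another. On extraction, LF is assumed to be the host's preferred line ending. The function names are hypothetical.

```python
def comment_to_archive(text: str) -> bytes:
    """Creation: CR line endings, plain 7-bit ASCII."""
    # Normalize CRLF first so it becomes a single CR, then catch lone LFs.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    # errors="replace" substitutes b"?" for anything outside ASCII.
    return text.encode("ascii", errors="replace")


def comment_from_archive(raw: bytes) -> str:
    """Extraction: accept CR, LF, or CRLF and return LF line endings."""
    text = raw.decode("ascii", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```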

GS/OS option lists and HFS file types

Files on HFS volumes have two four-byte values, called file type and creator, that identify the file contents. These are part of the Macintosh Finder info structures, called FInfo and FXInfo. Files copied from HFS to ProDOS may have this data stored in the extended key block of a forked file (see ProDOS technical note #25). The data appears as two 18-byte chunks, each consisting of a size byte followed by a type byte and then 16 bytes of FInfo or FXInfo data (which are defined in Inside Macintosh: Macintosh Toolbox Essentials, page 7-47). To expose the data to applications, certain GS/OS calls pass an "option list" with the contents. Most of the fields are uninteresting to anything but the Mac Finder, so for our purposes the option list may be viewed simply as a way to preserve the file type and creator.

Experiments with the GS/OS exerciser reveal that the option list doesn't include the size/type bytes. For an HFS file copied to ProDOS with GS/OS, the GetFileInfo call returns a 32-byte buffer that begins with FInfo. When called on an HFS volume, the option list is 36 bytes, with the last four bytes set to 02 00 00 00. GSHK appears to record these exactly as it receives them, which means the first four bytes hold the HFS file type and the second four bytes hold the HFS creator. Because most of the fields only have meaning to the Macintosh Finder, the rest of the data is zeroes. Files archived from an HFS volume created by a Macintosh would presumably have nonzero data in more places.

Sometimes the option list is a little messed up, e.g. the size field says 36 bytes, but there's only space for 18 bytes in the record header.

Unfortunately, when archiving files from an HFS volume under GS/OS, GSHK records the ProDOS type/auxtype rather than the full HFS file type and creator (likely because that's what GS/OS provides). The only way to recover the original Mac Finder types is through the option list.

Side note: the NuFX specification reversed the values of MFS and HFS in the file_sys_id enumeration. In practice, GS/ShrinkIt correctly uses the GS/OS FST definitions: MFS=5, HFS=6.

Opening: Assume the option_size field is correct unless it exceeds attrib_count-2. If it's too large, clip it down to size. If the filesystem type is ProDOS or HFS, and the second 4 bytes look like ASCII, use the first 4 bytes of the option list data as the file type and the second 4 bytes as the creator.
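A sketch of the "Opening" rule, with several assumptions. The caller has already located the option list data and knows how much room the record header really has for it (`max_size`, a hypothetical parameter standing in for the attrib_count-2 limit). "Looks like ASCII" is interpreted as printable 7-bit ASCII. ProDOS is assumed to be file_sys_id 1, and HFS is 6 as noted above. The function name is hypothetical.

```python
FS_PRODOS = 0x0001
FS_HFS = 0x0006  # GS/OS numbering, which GS/ShrinkIt uses in practice


def finder_types(option_size, option_data, max_size, fs_id):
    """Return (file_type, creator) from an option list, or None.

    max_size: room actually available for the option list in the record
    header (hypothetical; derive it from attrib_count in your header parser).
    """
    size = min(option_size, max_size, len(option_data))  # clip bogus sizes
    if fs_id not in (FS_PRODOS, FS_HFS) or size < 8:
        return None
    file_type = bytes(option_data[0:4])
    creator = bytes(option_data[4:8])
    # Only trust the data if the creator looks like printable ASCII.
    if not all(0x20 <= b < 0x7F for b in creator):
        return None
    return file_type, creator
```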

Creating: For broadest compatibility, it would be best to store a ProDOS type/auxtype in the record header and generate a filesystem-specific option list with the HFS Finder types (i.e. 32 bytes for ProDOS or 36 bytes for HFS). For an HFS record, simply using the four-byte HFS types is allowed.

Updating: Always output the actual record size. Do not propagate incorrect size values. Retaining option lists for ProDOS and HFS entries is required, since they may hold the only copy of the original file type and creator. Updates to the archive attributes that alter the file/aux type should usually retain the option list, since the purpose may be to improve ProDOS usability without losing the original type information. However, applications are allowed to delete the option list in this case, especially if they are storing the full HFS types in the record.

Master EOF

For the most part, ShrinkIt correctly sets the MasterEOF field in the Master Header block. A very old version of ShrinkIt left it set to zero (this is the same version that completely omitted the filename for DOS 3.3 disk images). GS/ShrinkIt appears to initialize it to 48 (the size of the MH block), so if the creation process is interrupted, you can end up with a partial archive with a nonzero EOF.

Opening: Accept a MasterEOF of zero, but reject a MasterEOF of 48. Don't assume the MasterEOF is accurate.

Updating: Applications must write the correct MasterEOF value if an archive is modified.
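A sketch of the "Opening" rule that classifies the MasterEOF instead of trusting it. It assumes the caller knows how many bytes follow the start of the Master Header block (`archive_len`, excluding any BXY/SEA wrapper). What to do with a "suspect" value, such as walking the records to find the real end, is left to the application. The function name and return labels are hypothetical.

```python
MASTER_HEADER_LEN = 48  # size of the Master Header block


def check_master_eof(master_eof: int, archive_len: int) -> str:
    """Classify a MasterEOF value read from the Master Header block.

    archive_len: bytes from the start of the Master Header to end of file
    (i.e. excluding any BXY/SEA wrapper in front of it).
    """
    if master_eof == 0:
        return "unknown"   # very old ShrinkIt: accept, but don't rely on it
    if master_eof == MASTER_HEADER_LEN:
        return "reject"    # GSHK's initial value; likely an interrupted archive
    if master_eof != archive_len:
        return "suspect"   # don't assume it's accurate
    return "ok"
```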

Extensions

Unofficial extensions to the NuFX specification. Anyone working with NuFX archives should take heed.

New compression formats

Thread formats 0x0000 through 0x0005 are already defined. The following thread format values have been added:

  • 0x0006 - deflate. The thread contains data conforming to RFC 1951 (DEFLATE specification version 1.3). A more practical way of putting it is that it contains exactly the data that zlib v1.1.4 outputs. Visit http://www.zlib.org/ for more details.
  • 0x0007 - bzip2. The thread contains BWT+Huffman compressed data as output by Julian Seward's "libbz2" v1.0.2. Visit http://sources.redhat.com/bzip2/ for more information.

Support for these formats is nonexistent on the Apple II, so they should not be used except in situations where compatibility is unimportant (e.g. collections of disk archives for use with A2 emulators).

I found that "deflate" generally does as well as or better than "bzip2" on Apple II binaries, disk images, and small text files. Deflate is also faster and uses less memory, and you're more likely to find libz installed on a given system than libbz2. For these reasons, deflate should be preferred over bzip2.

NuFX Quirks

This section identifies some quirks in NuFX or ShrinkIt that, while not bugs, are worth noting.

Filename separator character

Originally, the filename was stored in the record header, so it made sense that the filename separator character ("fssep char") should also be there. When the filenames were moved into threads, the fssep char got left behind. If a record has two filenames, they'd better have the same fssep char, or interpreting one of them will be impossible. (This is one of the reasons why it's important to clearly define which filename takes precedence in all circumstances.)

Files with zero or two CRCs

The "threadCRC" field in the thread header block can have one of three meanings: nothing (v0, v1), the CRC of the compressed data (v2), or the CRC of the uncompressed data (v3). The version 2 meaning wasn't used in anything significant, and can be ignored.

Version 1 records generally have threads compressed with LZW/1. The LZW/1 compression format includes a 16-bit CRC at the start of the thread. Version 3 records generally have threads compressed with LZW/2, which does not include a CRC.

Applications like P8 ShrinkIt and NuLib create v1 records and compress with LZW/1, while GS/ShrinkIt and NuLib2 create v3 records and compress with LZW/2. This means that each compressed thread has exactly one CRC. So what happens if you tell NuLib2 to create a new record with LZW/1, or tell it to add a new LZW/2 thread to an existing v1 record?

In one case, you end up with two CRCs; in the other, you end up with no CRC on your data at all. For some bizarre reason, the v3 thread CRC is computed with a different initial value, so it is necessary to compute the CRC twice, not merely store the same value twice.

Please select your compression methods appropriately. Also, bear in mind that uncompressed data stored with P8 ShrinkIt has no CRC whatsoever.

Extra data in compressed threads

ShrinkIt adds an extra byte at the end of all LZW compressed data, probably due to an off-by-one bug in the compression code. It turns out that it's possible to get even more "extra" bytes at the end.

ShrinkIt's LZW-I algorithm always operates on a 4K buffer, largely because it was originally designed for compressing 5.25" disks with 4K tracks. On small files, or at the end of a large one, the last bit of data is padded out to 4K and then compressed. Ordinarily this is barely noticeable, because the compression routines do an RLE (Run-Length Encoding) pass before applying LZW.

However, if both RLE and LZW fail to make the 4K block any smaller, it is stored without compression. This means the whole 4K, complete with padding, gets written to the archive. This doesn't cause any problems, but can make you wonder where all the extra bits came from.

The SQ compression algorithm, as implemented by Don Elton's SQ3, appears to add an extra 0xff to the end of the compressed data. It can safely be ignored.

Preserving BXY and SEA wrappers

Preserving BXY wrappers is pretty easy, since the Binary II format is well documented. Updating block counts and file lengths is all that is required.

Preserving SEA wrappers is a little harder, since (as far as I can tell) there is no documentation on the format. A little experimentation shows that the SEA header is always 12005 bytes long, and the only part that changes from file to file is a short piece right before the NuFX archive begins.

It is necessary to update the file length in three different places, all right next to each other, one of which is offset by 64 bytes. I would guess the header allows for more than one archive to be present, but since such things have never actually been created, the possibility can be ignored.

Y2K

The NuFX standard says that the Date/Time format is the same as that returned by the IIgs ReadTimeHex toolbox call. That call returns the year as (year - 1900), so the year 2000 is stored as "100". ProDOS 8 clock drivers, on the other hand, return 40-99 for 1940-1999 and 0-39 for 2000-2039. As a result, archives created with P8 ShrinkIt use 0 for the year 2000 instead of 100.

When creating archives, always use 100 for the year 2000, but also accept the year 0. However, if you find a Date/Time with zero in all useful fields (second, minute, hour, day, month, year), treat it as an unspecified date rather than midnight of January 1, 2000.
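The year handling described above fits in three small functions. A sketch with hypothetical names. It assumes the raw field values are passed exactly as stored: with ReadTimeHex-style fields, an all-zero day and month correspond to January 1. Stored years 0-39 are read as the P8 encoding of 2000-2039, and everything from 40 up is read as (year - 1900).

```python
def decode_year(year_field: int) -> int:
    """Convert the stored year byte to a four-digit year."""
    if year_field < 40:
        return 2000 + year_field  # P8 ShrinkIt: 0-39 means 2000-2039
    return 1900 + year_field      # ReadTimeHex style: 40-99, then 100 and up


def encode_year(year: int) -> int:
    """Year byte to write: always year - 1900, so 2000 is stored as 100."""
    return year - 1900


def is_unspecified(second, minute, hour, day, month, year_field) -> bool:
    """All-zero date/time fields mean "no date", not midnight Jan 1, 2000."""
    return (second, minute, hour, day, month, year_field) == (0, 0, 0, 0, 0, 0)
```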

This document is Copyright © 2000-2004 by Andy McFadden. All Rights Reserved.

The latest version can be found on the NuLib web site at http://www.nulib.com/.