NufxLib API
 
NufxLib v3.0.0 API - By Andy McFadden - Last revised 2015/01/09

Introduction

NuFX, short for "New File Exchange", is a file format developed by Andy Nicholas for archiving files and disks on the Apple II series of computers.  The format was devised in tandem with the development of ShrinkIt, which became the standard archive software for the Apple II soon after its release in 1989.  NuFX archives usually have filenames that end in ".SHK".

This document describes the API (Application Program Interface) for NufxLib, a library of functions that manipulate NuFX archives.

Good engineering practices dictate that an API should be minimal and complete.  The confusion generated by redundant and overlapping interfaces can be as harmful as an omitted vital feature.  I feel pretty good about the "complete" part, since NufxLib provides a way to do pretty much everything that I can reasonably expect somebody to want to do, but in some cases "minimal" has been swept aside in the name of convenience.  (See the Design Notes section for additional commentary on this topic.)

The NuFX specification is extremely general, and does not explicitly allow or forbid unusual conditions like having a record with two filenames in it.  NufxLib follows the NuFX specification on everything that is spelled out, but restricts some of the undefined behaviors to the subset defined in the NuFX Addendum.

In this document, the term "threads" usually refers to NuFX threads -- structures in the archive -- not CPU threads.

Goals

Not Goals

That explains what I set out to do.  Here's a quick summary of what I accomplished.

Features

The library is protected by copyright, but can be distributed under the terms of the BSD License.  See the file "COPYING-LIB" for full details.

Interface changes from v1.x to v2.x

Some changes were made during the development of NufxLib that broke binary compatibility with version 1.1 of the library.  The changes were:

In addition, a NuTestRecord call was added.

Applications written against v1.x may need to be updated.  Check the NufxLib "samples" directory for examples of programs that use the updated calls.

To make version management easier, v2.x includes the version number in the NufxLib.h header file.  This allows dynamically-linked applications to compare a "compiled" version against a "linked" version.

Interface changes from v2.x to v3.x

This was a major source code cleanup effort, one aspect of which was switching from general C types ("unsigned long") to types with explicit sizes ("uint32_t").  In some cases this caused some compilers to report errors, even though there's a fair chance that binary compatibility wasn't affected.  Since it was an API-breaking change at some level, the major version number was bumped.

The other major API change was the separation of Mac OS Roman and Unicode strings, which were previously blended freely.


NuFX Archive Format Overview

This document assumes that you are already familiar with the NuFX archive format, as described in the Apple II File Type Note for $e0/8002 and the Winter 1990 issue of Call-A.P.P.L.E.  For those unwilling to wade through the technical documentation, here is a quick overview.

A NuFX archive is composed of a Master Header followed by a series of Records.  Each Record is composed of a Record Header and one or more Threads.  The general idea is to store one file per Record.

Each Thread holds a blob of data.  The data can be a data or resource fork of a file, a disk image, a comment, or the filename for the Record.  The Threads are identified by a "class" and a "kind".  The "class" tells you if it's a data thread, comment, filename, or something else, and the "kind" refines the class.  For example, the resource fork of a file is a data-class thread with a "kind" of 2.

Some Threads, notably filenames and comments, are pre-sized, meaning that the space allocated for them in the archive is larger than what is actually used.  Filenames usually have at least 32 bytes set aside for them, though in practice a simple ProDOS filename will be shorter.  This makes it possible to rename files and update comments without having to reconstruct the archive.

The archive Master Header has only a few bits of information, such as the number of records and the date the archive was created.  Unlike a ZIP archive, NuFX has no central table of contents.  If you want to display the contents of an archive, you have to read the first Record header, pull the filename out (usually by finding and reading a filename Thread), compute the total size of the Record, and seek forward past the data.  Repeat the process with each subsequent record, until you reach the end of the archive. 
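To make the scanning loop concrete, here is a minimal sketch in C.  It does not use the real NuFX on-disk layout: ToyRecordHeader, its two fields, and CountToyRecords are hypothetical stand-ins that only illustrate the "read header, compute total size, skip forward" pattern over records stored back to back in a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header; NOT the real NuFX layout. */
typedef struct {
    uint32_t headerLen;  /* bytes in this header, including these fields */
    uint32_t dataLen;    /* total bytes of thread data that follow */
} ToyRecordHeader;

/*
 * Walk up to "numRecords" records stored back to back in "buf".  Returns
 * the number of records skipped over successfully; stops early if a
 * header is malformed or a record would run past the end of the buffer.
 */
size_t CountToyRecords(const uint8_t* buf, size_t bufLen, size_t numRecords)
{
    size_t offset = 0, count = 0;

    assert(buf != NULL);
    while (count < numRecords) {
        ToyRecordHeader hdr;
        uint64_t total;

        if (bufLen - offset < sizeof(hdr))
            break;                          /* truncated header */
        memcpy(&hdr, buf + offset, sizeof(hdr));
        if (hdr.headerLen < sizeof(hdr))
            break;                          /* malformed header */

        /* total record size = header + data; "seek" past all of it */
        total = (uint64_t) hdr.headerLen + hdr.dataLen;
        if (total > bufLen - offset)
            break;                          /* truncated record */
        offset += (size_t) total;
        count++;
    }
    return count;
}
```

Because there is no table of contents to consult, the cost of listing an archive in this model grows with the number of records, which is the behavior the paragraph above describes.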

The predominant compression algorithm is a slightly modified LZW (Lempel-Ziv-Welch).  It's fast, but not very effective compared to the standard methods used in modern archivers.

API Overview

There are five basic categories of API calls.

ReadOnly calls do not modify the archive in any way.  The operations include things like listing and extracting files.  These can be used on archive files opened read-only or read-write.

StreamingReadOnly calls are a subset of ReadOnly calls that can be made on a streaming archive.  A "streaming" archive is one that cannot be seeked, e.g. an archive being received over a network socket or a pipe from stdin.  The same functions are invoked as for ReadOnly archives, but in some rare cases the results may be different.

ReadWrite calls change the archive contents.  Functions that add and delete files are here.  These can only be used on archive files opened read-write.

General calls can be made regardless of how the archive was opened.  Functions included here can get and set archive parameters and define callbacks.

Helper functions don't do anything to the archive.  They're functions or macros that do useful things with some of the data types returned.  They're included as a convenience.

The library does everything it can to aid multi-threading.  You should be able to perform simultaneous operations on multiple archives (assuming you have the reentrant versions of certain libc calls available).  You cannot, however, invoke multiple simultaneous operations on a single archive.

There is a general philosophy of laziness employed.  For example, opening an archive does not cause the entire table of contents to be read.  (In a NuFX archive, that would require scanning through most of the file.)  As a result, there are actually three different ways to get the table of contents out of an archive:

  1. Read-as-you-go, forgetful.  On StreamingReadOnly files, we handle an individual record and then throw the data away.  We can't seek back to deal with the record again, so there's no point in holding onto it.  This feature allows applications with very limited memory to list the contents of and extract files from very large archives.
  2. Read-as-you-go.  While performing a whole-archive operation (e.g. getting the complete list of contents, or extracting all files) on a non-streaming file, we read and save the table of contents as we go.  This saves us from having to scan through the archive once to get the contents, and then running through it again to extract the files.  This might seem silly, but if you're extracting from an archive on a slow medium (e.g. floppy or a sluggish CD-ROM drive), it cuts the time required nearly in half.
  3. Read up front.  For operations like single-file extractions and anything in the ReadWrite category, we want to have full knowledge of the archive up front.  There are cases where this will be less efficient than a more cleverly designed algorithm, but it's much simpler this way.

For write operations, a certain form of laziness is again employed.  If you want to delete three records from various points in an archive, you don't want to have to update the archive three times.  NufxLib handles this by deferring all write operations until a "flush" call is made.  In most cases, a "flush" results in a new archive being constructed in a temp file, which is subsequently renamed over the original.  The flush call does not close the archive, so it is possible to do things like:

  1. Open the archive.
  2. Queue up a bunch of operations.
  3. Flush changes.
  4. Queue up some more operations.
  5. Abort changes.
  6. Queue up yet more operations.
  7. Flush changes.
  8. Close archive.
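
The sequence above can be modeled with a toy pending-operations counter.  This is only a sketch of the deferred-write idea, not NufxLib code: ToyArchive, ToyQueue, ToyFlush, ToyAbort, and ToyRunSequence are hypothetical names, and a real flush does far more than add two numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an open archive; NOT the real NuArchive. */
typedef struct {
    size_t committedOps;  /* operations already written by a flush */
    size_t pendingOps;    /* operations queued since the last flush/abort */
} ToyArchive;

/* Queue one deferred operation; nothing is written yet. */
void ToyQueue(ToyArchive* pArc)
{
    assert(pArc != NULL);
    pArc->pendingOps++;
}

/* Commit everything queued so far.  The archive stays open. */
void ToyFlush(ToyArchive* pArc)
{
    assert(pArc != NULL);
    pArc->committedOps += pArc->pendingOps;
    pArc->pendingOps = 0;
}

/* Discard everything queued since the last flush. */
void ToyAbort(ToyArchive* pArc)
{
    assert(pArc != NULL);
    pArc->pendingOps = 0;
}

/* Runs the eight-step sequence; returns the number of committed operations. */
size_t ToyRunSequence(void)
{
    ToyArchive arc = { 0, 0 };                       /* 1. open */
    ToyQueue(&arc); ToyQueue(&arc);                  /* 2. queue two ops */
    ToyFlush(&arc);                                  /* 3. flush: 2 committed */
    ToyQueue(&arc);                                  /* 4. queue one more */
    ToyAbort(&arc);                                  /* 5. abort: it is dropped */
    ToyQueue(&arc); ToyQueue(&arc); ToyQueue(&arc);  /* 6. queue three */
    ToyFlush(&arc);                                  /* 7. flush: 5 committed */
    return arc.committedOps;                         /* 8. close */
}
```

The operation queued in step 4 never reaches the archive, while the work flushed in step 3 is unaffected by the abort in step 5.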

In certain restricted cases, such as updating a comment or appending new records, the original archive can (optionally) be updated in place, saving a potentially lengthy copying of data.

As a final example of laziness, NufxLib does not re-read the archive it has just written after a Flush.  It would have been easier to write all changes, throw out all data structures, and re-read the archive from scratch, but that could be slow.  Instead, the library keeps track of the changes it has made -- something that gets a little tricky when filename threads are updated.  Being lazy is often more work.

Filenames stored in archives use the Mac OS Roman character set.  The low 128 characters are ASCII, the high 128 are specified here.  NufxLib will convert between Mac OS Roman and Unicode when necessary, and provides conversion functions for application use.

When specifying a "local filename", i.e. a file on Linux or Windows, the API expects a Unicode string.  When referring to an archived file by name (the "storage name"), the API uses the Mac OS Roman form.  The parameter and field names reflect the character set ("UNI" or "MOR"), and use the UNICHAR type for Unicode strings.  On Linux and Mac OS X the filename is encoded with UTF-8.  On Windows it should be encoded with UTF-16, but that hasn't been implemented yet, so the API still uses 8-bit characters and effectively treats MOR strings as if they were Windows Code Page 1252.  (This means the behavior of NufxLib is essentially unchanged for 3.0 on Windows.)

Data Types and Source Conventions

All API calls and data types begin with "Nu", and all constants start with "kNu".  All internal functions start with "Nu_", and any internal data tables with global scope start with "gNu".  Hopefully these rules will avoid compile-time and link-time name conflicts.

For details about the fields available in different structures, see the NufxLib.h header file.  Everything in NufxLib.h is public.  Most of these types have a direct analog with a field or structure in the NuFX specification.

UNICHAR (char -or- wchar_t): All filenames for "local" files, i.e. files on the Linux or Windows filesystem, should use UNICHAR.  This will be char on Linux and Mac OS X.  Someday it will be wchar_t for Win32, but for now it's an 8-bit char there as well.

Windows uses UTF-16 encoding, so wchar_t is required.  (Unicode filename handling for Windows is incomplete, so the code does not currently use wide chars.)

NuError (enum): Most library functions return NuError.  A value of zero (kNuErrNone) indicates success, anything else indicates failure.  Values less than zero are NufxLib errors, while values greater than zero are system errors (like ENOENT).
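
The sign convention can be captured in a few lines.  This is a sketch only: ToyErrClass and ToyClassifyError are hypothetical names, and the function takes a plain int rather than the real NuError type so that it stands alone.

```c
/* Hypothetical classification labels; only the sign rule comes from the text. */
typedef enum {
    kToyErrClassNone,     /* zero: success */
    kToyErrClassNufxLib,  /* negative: NufxLib error */
    kToyErrClassSystem    /* positive: system error such as ENOENT */
} ToyErrClass;

/* Classify an error code by its sign. */
ToyErrClass ToyClassifyError(int err)
{
    if (err == 0)
        return kToyErrClassNone;
    return (err < 0) ? kToyErrClassNufxLib : kToyErrClassSystem;
}
```

An application might use a test like this to decide whether to look up a message with strerror (positive values) or to report a library-specific failure (negative values).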

NuResult (enum): Callback functions return these values to tell NufxLib how things went.  For example, an error callback can tell the library to Abort, Retry, or Skip.  (Okay, it can Ignore too.)

NuRecordIdx and NuThreadIdx (uint32_t): These are used to identify a specific record or thread in API calls.  Their values are assigned when the archive file is read.  They aren't reused, so if you delete some records and add some new ones, the indices of the deleted records won't appear again.  Do not assume that the indices start at a specific value or are assigned in a particular order.  The indices are assigned when the archive is opened, and if you close and reopen the archive, they may be completely different.

NuThreadID (uint32_t): This is a combination of the 16-bit "thread class" and the 16-bit "thread kind".  Constants are defined for common values, e.g. kNuThreadIDDataFork (0x00020000) indicates a data fork.
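
As a sketch of how the two 16-bit halves combine, assuming the class occupies the high 16 bits (which is consistent with the kNuThreadIDDataFork value above): the helper names below are hypothetical, and NufxLib.h may well provide its own equivalents.

```c
#include <stdint.h>

/* Hypothetical helpers; check NufxLib.h for the library's own versions. */
static inline uint32_t ToyMakeThreadID(uint16_t threadClass, uint16_t threadKind)
{
    /* class in the high 16 bits, kind in the low 16 bits */
    return ((uint32_t) threadClass << 16) | threadKind;
}

static inline uint16_t ToyThreadIDClass(uint32_t threadID)
{
    return (uint16_t) (threadID >> 16);
}

static inline uint16_t ToyThreadIDKind(uint32_t threadID)
{
    return (uint16_t) (threadID & 0xffff);
}
```

Under that assumption, class 2 with kind 0 yields 0x00020000 (the data fork value), and class 2 with kind 2 yields 0x00020002, matching the earlier description of a resource fork as a data-class thread with a "kind" of 2.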

NuThreadFormat (enum): An enumeration of constants representing the 16-bit "thread format" value.  This is used to specify a type of compression (uncompressed, LZW/1, LZW/2, etc).

NuFileSysID (enum): An enumeration of GS/OS file system identifiers.

NuStorageType (enum): An enumeration of ProDOS storage types.  There are extended (forked) files, directories, and three types of plain files.

NuArchive (opaque struct): This is the fundamental state structure for all API calls.  Every call takes one of these as an argument.  The structure contains all of the information about the archive and pending operations.

NuCallback (pointer to function): Callback function declarations must match this type.  An example would be "NuResult MyFunction(NuArchive* pArchive, void* args)".

NuValueID (enum): An identifier for settable values.  You can change certain NufxLib parameters after opening an archive.  This enum is how you specify which parameter you want to change.

NuValue (uint32_t): The new value for the parameter specified by the NuValueID.

NuAttrID (enum): An identifier for archive attributes.  You can get information about archive attributes (characteristics of the archive itself) through a NufxLib interface.  This type has an enumeration of the legal values.

NuAttr (uint32_t): The value for the attribute specified by the NuAttrID is placed in one of these.

NuDataSource (opaque struct): Some of the fancier NufxLib calls allow you to use data from a file on disk, a file that's already open, or a buffer of memory.  This struct contains that specification.

NuDataSink (opaque struct): Like NuDataSource, this specifies a data location.  This struct is for data being extracted.

NuDateTime (struct): This holds the date and time in an expanded format, using the same structure as TimeRec from "misctool.h" on the IIgs.

NuThread (struct): The fields from the thread header, as well as a few new ones like the absolute file offset, are accessible.

NuRecord (struct): This has all of the fields from the NuFX Record structure, as well as some convenience fields (like "filename", which always points to the right filename whether it was stored in the record header or came out of a thread).  Some calls cause a NuRecord structure to be passed to a callback function, where it can be accessed directly.  The Threads are represented as an array of NuThread structures attached to the NuRecord.

NuMasterHeader (struct): This holds the data from the archive's master header block.

NuRecordAttr (struct): Some of the fields in a NuRecord can be changed, such as the file type and modification date.  This structure contains the modifiable fields, and is used as an argument to two of the API calls.

NuFileDetails (struct): When adding files, it is up to the application to supply many of the details about the file, such as the file type, access permissions, and modification date.  This structure provides a way to pass those values into the library.

NuSelectionProposal (struct): Selection callback functions receive one of these.

NuPathnameProposal (struct): Pathname filter callback functions receive one of these.

NuProgressData (struct): Progress update callback functions receive one of these.

NuProgressState (enum): A component of NuProgressData, this tells the callback function what the library is doing.

NuErrorStatus (struct): Error handling callback functions receive one of these.

Files are referenced with standard libc FILE* pointers.  The library uses fseek and ftell, which are defined by POSIX to take a signed long integer for the offset argument, so archives larger than 2GB cannot be handled.


ReadOnly Interfaces

These interfaces can be used on read-only and read-write archives.  A subset, described later, can also be used on streaming-read-only archives.

NuError NuOpenRO(const UNICHAR* archivePathnameUNI, NuArchive** ppArchive)

Creates a new NuArchive structure for the "archivePathnameUNI" file.  The file will be opened in read-only mode.

Attempting to use ReadWrite interfaces on a read-only archive will fail.

NuError NuContents(NuArchive* pArchive, NuCallback contentFunc)

Read the list of entries from the archive.  If the full table of contents has already been read, the in-memory copy will be used.

"contentFunc" is a callback function that will be called once for every record in the archive.  The callback function should look something like this:

NuResult EntryListing(NuArchive* pArchive, const NuRecord* pRecord)

(Depending on your compiler, you may have to declare "pRecord" as a void* and cast it in the function.)

The record passed to the callback function does not reflect the results of any un-flushed changes.  Additions, deletions, and updates will not be visible until NuFlush is called.

The application must not attempt to retain a copy of "pRecord" after the callback returns, as the structure may be freed.  Anything of interest should be copied out.
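
Here is a sketch of the callback pattern just described: the second argument is declared as void*, cast inside the function, and the interesting data is copied out rather than retained.  To keep the example free-standing it uses hypothetical stand-in types (ToyArchive, ToyRecord, ToyResult, ToyCallback) in place of the real NufxLib types, and ToyInvoke merely plays the role of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the library types, for illustration only. */
typedef struct ToyArchive ToyArchive;            /* opaque, never dereferenced */
typedef struct { const char* filename; } ToyRecord;
typedef enum { kToyOK = 0, kToyAbort = 1 } ToyResult;
typedef ToyResult (*ToyCallback)(ToyArchive* pArchive, void* args);

/* Where the callback keeps its own copy of the most recent filename. */
static char gLastName[64];

/* Callback declared with void*, cast to the record type inside. */
ToyResult EntryListing(ToyArchive* pArchive, void* vpRecord)
{
    const ToyRecord* pRecord = (const ToyRecord*) vpRecord;
    (void) pArchive;    /* unused in this sketch */

    assert(pRecord != NULL && pRecord->filename != NULL);
    /* copy the string out; don't keep pRecord or pRecord->filename */
    strncpy(gLastName, pRecord->filename, sizeof(gLastName) - 1);
    gLastName[sizeof(gLastName) - 1] = '\0';
    return kToyOK;
}

/* Minimal driver that plays the role of the library calling back. */
ToyResult ToyInvoke(ToyCallback func, ToyRecord* pRecord)
{
    return func(NULL, pRecord);
}
```

After the callback returns, only gLastName is safe to use; the record pointer itself must be treated as dead.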

NuError NuExtract(NuArchive* pArchive)

Try to extract all files from the archive.  Each entry is passed through the SelectionFilter callback, if one has been supplied, to determine whether or not it should be extracted.  The OutputPathnameFilter callback is invoked to convert the filenames to something appropriate for the target filesystem.

On systems that support forked files, a record with both data and resource forks can be extracted to the individual forks of the same file.  On systems without native support for forks, the data can be extracted into two different files by using the OutputPathnameFilter.  If the system doesn't support forks, and no OutputPathnameFilter is specified, then the forks will be extracted into the same file.  Depending on the value of the kNuValueHandleExisting parameter, this could result in one fork overwriting the other, in one fork not getting extracted, or in the HandleError callback getting invoked.  (The HandleError callback can choose to rename the file, overwrite it, skip the current entry, or abort the entire process.)

The global EOL conversion setting is applied to all threads, but is automatically turned off for disk image threads.

NuError NuExtractRecord(NuArchive* pArchive, NuRecordIdx recordIdx)

Extract a single record.  Otherwise identical to NuExtract.  The SelectionFilter callback, if specified, will be invoked.

There are a number of ways to get the recordIdx.  You can call NuContents and use the callback to find the one you want.  You can get the recordIdx by the filename stored in the archive, with NuGetRecordIdxByName.  Or, you can get it by the record's offset in the archive, using NuGetRecordIdxByPosition.

NuError NuExtractThread(NuArchive* pArchive, NuThreadIdx threadIdx, NuDataSink* pDataSink)

Extract a single thread.  Specify the thread index and a place to put the data.  The SelectionFilter callback, if specified, will be invoked.

Remember that, if EOL conversion is enabled in the data sink, the amount of data that comes out of a thread may not match pThread's "actualThreadEOF" value.

(In some ways it doesn't really make sense to call the SelectionFilter callback when a specific thread has been singled out for extraction.  However, it's easy to disable (set the callback to NULL), it may prove useful, and it keeps the interface consistent.)

NuError NuTest(NuArchive* pArchive)

The NuTest call is functionally equivalent to NuExtract in every way but one: it doesn't actually extract anything.  If you want to test a subset of the files, supply a SelectionFilter callback.

This won't test filenames or comments because those aren't extracted by NuExtract.  However, since such threads don't have CRCs, there's really nothing to test anyway.  The parts that can be tested for correctness are verified automatically when the archive table of contents is read.

NuError NuTestRecord(NuArchive* pArchive, NuRecordIdx recordIdx)

A single-record version of NuTest.

NuError NuGetRecord(NuArchive* pArchive, NuRecordIdx recordIdx, const NuRecord** pRecord)

Get a pointer to the record header.  The thread array can be accessed through this pointer.

As with callbacks, when you get a const pointer, it is very important that you don't try to modify it.  The structure pointed to is part of the current archive state, so the effects of changes are unpredictable.  If you wish to alter fields in the Record header, use the NuSetRecordAttr call.

IMPORTANT: you must discard this pointer if you call NuFlush or NuClose.

NuError NuGetRecordIdxByName(NuArchive* pArchive, const char* nameMOR, NuRecordIdx* pRecordIdx)

Get the recordIdx for the first record in the archive whose filename matches "nameMOR".  The comparison is case-insensitive.  The value retrieved can be used with any call that takes a NuRecordIdx argument.

Apart from case, the "nameMOR" string must match the record's filename exactly, including the filename separator character.

If you know what you want to extract from an archive by name, use this.

NuError NuGetRecordIdxByPosition(NuArchive* pArchive, uint32_t position, NuRecordIdx* pRecordIdx)

Get the recordIdx for the nth record in the archive.  "position" is zero-based, meaning the very first record in the archive is at position 0, the next is at position 1, and so on.  The value retrieved can be used with any call that takes a NuRecordIdx argument.

This could be useful when an application is certain that it is only interested in the very first record in the archive, e.g. an Apple II emulator opening a disk image.


StreamingReadOnly Interfaces

A streaming archive is presented to the library as a FILE* that can't be seeked, generally because it was handed to the application via a pipe or shell redirect.  A subset of the ReadOnly interfaces is supported.  All of them leave the stream pointed at the first byte past the end of the archive.

These calls are also useful for files on disk in situations where memory is at a premium.  Because it's impossible to seek backwards in the archive, no attempt is made to remember anything about records other than the one most recently read.

The interfaces supported are:

NuContents: Behaves just like the non-streaming version.

NuTest: Behaves just like the non-streaming version.

NuExtract: Behaves like the non-streaming version for well-formed archives.  If a filename is stored in a thread, and the filename thread comes after a data thread, NufxLib would need to extract the data before it knows what filename to use.  This currently results in an error.

There is one interface that only applies to StreamingReadOnly archives:

NuError NuStreamOpenRO(FILE* infp, NuArchive** ppArchive)

Creates a new NuArchive structure for "infp".  The file must be positioned at the start of the archive.

It should be possible to concatenate multiple archives together, and use them by issuing consecutive NuStreamOpenRO calls.

If your system requires fopen(filename, "rb") instead of "r" (e.g. Win32), make sure the archive file was opened with "b", or you may get "unexpected EOF" complaints.


ReadWrite Interfaces

NuError NuOpenRW(const UNICHAR* archivePathnameUNI, const UNICHAR* tempPathnameUNI, uint32_t flags, NuArchive** ppArchive)

Open a file for read-write operations.  A pointer to the new archive is returned via "ppArchive".

"archivePathnameUNI" is the name of the archive to open.  If the file has zero length, the archive will be treated as if NufxLib had just created it.

"tempPathnameUNI" is the name of the temp file to use.  The call will fail if the temp file already exists.  The temp file must be in a location that allows it to be renamed over the original archive when a "flush" operation has completed.  If "tempPathnameUNI" ends in six 'X's, e.g. "tmpXXXXXX", the name will be treated as a mktemp-style pattern, and a unique six-character string will be substituted before the file is opened.  Note that the temp file will be opened even if "kNuValueModifyOrig" is set.

"flags" is a bit vector of boolean flags that affect how the archive is opened.  If no flags are set, and the archive doesn't exist, the call will fail.  If "kNuOpenCreat" is set, the archive will be created if it doesn't exist.  If "kNuOpenCreat" and "kNuOpenExcl" are both set, the call will fail if "archivePathnameUNI" already exists (i.e. the archive *must* be created).

If the archive was just created, "kNuValueModifyOrig" will be set to "true".
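
Two details of the NuOpenRW arguments, the mktemp-style temp name and the flag combinations, can be sketched as follows.  Everything here is hypothetical: the kToy flag values are placeholders rather than the real constants from NufxLib.h, and the sketch assumes "kNuOpenExcl" without "kNuOpenCreat" is simply ignored, which the description above does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values and outcomes; the real constants are in NufxLib.h. */
enum { kToyOpenCreat = 1 << 0, kToyOpenExcl = 1 << 1 };
typedef enum { kToyOpenExisting, kToyCreateNew, kToyFail } ToyOpenOutcome;

/* Returns 1 if "path" ends in six 'X' characters (mktemp-style), else 0. */
int ToyIsTempPattern(const char* path)
{
    static const char kSuffix[] = "XXXXXX";
    const size_t suffixLen = sizeof(kSuffix) - 1;
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    if (len < suffixLen)
        return 0;
    return strcmp(path + len - suffixLen, kSuffix) == 0;
}

/* Decide what an open call should do, given the flags and file state. */
ToyOpenOutcome ToyDecideOpen(unsigned flags, int archiveExists)
{
    if (archiveExists) {
        /* Creat+Excl means the archive *must* be created by this call */
        if ((flags & kToyOpenCreat) && (flags & kToyOpenExcl))
            return kToyFail;
        return kToyOpenExisting;
    }
    /* archive is missing: only Creat allows the call to succeed */
    return (flags & kToyOpenCreat) ? kToyCreateNew : kToyFail;
}
```

The flag logic mirrors the familiar O_CREAT / O_EXCL convention of open(2), which the flag names appear to echo.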

NufxLib can tell the difference between a BXY file (NuFX in a Binary II wrapper) and a BNY file with several entries whose first entry happens to be a NuFX archive.  Access to BNY files that happen to have a ShrinkIt archive in them isn't supported.

NuError NuFlush(NuArchive* pArchive, long *pStatusFlags)

Commits all pending write operations.

"pStatusFlags" gets a bit vector of flags regarding the status of the archive.  If a non-kNuErrNone result is returned, "pStatusFlags" may contain one or more of the following:

Some of the above are mutually exclusive, e.g. only one of kNuFlushSucceeded, kNuFlushAborted, and kNuFlushCorrupted will be set.

Any records without threads -- either created that way or having had all threads deleted -- will be removed.  Newly-created records without filename threads will have one added.  (Existing records without filenames are frowned upon but left alone.)

Normally, the archive is reconstructed in the temp file, and the temp file is renamed over the original archive after all of the operations have completed successfully.  As a performance optimization, if kNuValueModifyOrig is "true", NuFlush will try to modify the archive in place.  This is only possible if the changes made to the archive consist entirely of additions of new files, updates to pre-sized threads, and/or setting record attributes.  If other changes have been made, the update will be done through the temp file.
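
The eligibility rule for in-place updates can be expressed as a bit-mask test.  This is a sketch under assumptions: the change categories and ToyCanModifyInPlace are hypothetical, and the library's real bookkeeping is certainly more detailed.

```c
/* Hypothetical change categories, tracked as a bit mask of pending work. */
enum {
    kToyChangeAddRecord      = 1 << 0,  /* new files added */
    kToyChangeUpdatePresized = 1 << 1,  /* pre-sized thread updated */
    kToyChangeSetAttr        = 1 << 2,  /* record attributes changed */
    kToyChangeDelete         = 1 << 3,  /* record or thread deleted */
    kToyChangeOther          = 1 << 4   /* anything else */
};

/* Returns 1 if a flush could modify the original archive in place. */
int ToyCanModifyInPlace(unsigned pendingChanges, int modifyOrig)
{
    const unsigned kInPlaceOK =
        kToyChangeAddRecord | kToyChangeUpdatePresized | kToyChangeSetAttr;

    if (!modifyOrig)
        return 0;                       /* optimization not requested */
    /* every pending change must be one of the in-place-safe kinds */
    return (pendingChanges & ~kInPlaceOK) == 0;
}
```

A single deletion in the pending set is enough to force the update through the temp file, even if every other change would have qualified.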

If an operation fails during the flush, all changes will be aborted.  If something fails in a way that can't be recovered from, such as failing to rename the temp file after a successful flush or failing partway through an update to the original archive, the archive may be switched to read-only mode to prevent future operations from compounding the problem.

NuError NuAddRecord(NuArchive* pArchive, const NuFileDetails* pFileDetails, NuRecordIdx* pRecordIdx)

Add a new record with no threads.  The index of the created record is returned in "pRecordIdx".  This always creates a "version 3" record, and expects that the filename will be stored in a thread.

"pFileDetails" is a pointer to a NuFileDetails structure.  This contains most of the interesting fields in a record, such as access flags, dates, file types, and the filename.  The "threadID" field is ignored for this call.

"pRecordIdx" may be NULL.  However, the only way to add threads to the record is with NuAddThread, which requires the record index as a parameter, so you almost certainly want to get this value.

If no filename thread is added, the NuFlush call will use the "storageName" field from the "pFileDetails" parameter to create a filename thread for it.

If no threads are added at all, the NuFlush call will throw the record away.

The "pFileDetails->storageName" may not start with the filename separator character, e.g. "/tmp/foo" is illegal but "tmp/foo" is okay.
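
An application may want to check storage names before handing them to the library.  A minimal sketch, assuming the separator is passed in alongside the name; ToyIsValidStorageName is a hypothetical helper, and its rejection of empty names is an extra assumption not stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if "storageName" is non-empty and does not begin with "fssep". */
int ToyIsValidStorageName(const char* storageName, char fssep)
{
    assert(storageName != NULL);
    if (storageName[0] == '\0')
        return 0;       /* assumption: empty names are rejected too */
    return storageName[0] != fssep;
}
```

With '/' as the separator, "tmp/foo" passes and "/tmp/foo" does not.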

If a disk image thread is added to the record, and the "storageType" and "extraType" values set by "pFileDetails" aren't compatible, the entries will be replaced with values appropriate for the thread.  For records with non-disk data-class threads, the storageType will be adjusted when necessary.

Depending on the values of kNuValueAllowDuplicates and kNuValueHandleExisting, this may replace an existing record in the archive.  See Replacing Existing Records and Files for details.

NuError NuAddThread(NuArchive* pArchive, NuRecordIdx recordIdx, NuThreadID threadID, NuDataSource* pDataSource, NuThreadIdx* pThreadIdx)

Add a new thread to a record.  You may add threads to an existing record or a newly created one.  Some combinations of threads are not allowed, and will cause an error to be returned.  (See the NuFX Addendum for details.)

"recordIdx" is the index of the record being added to.

"threadID" is the class and kind of the thread being added.  This defines how the data is labeled in the archive, and whether the contents of "pDataSource" are to be regarded as pre-sized or not.

"pDataSource" is where the data comes from.  If the source is uncompressed, the thread will be compressed with the compression value currently defined by kNuValueDataCompression.  (You can set the value independently for each call to NuAddThread.)  Only data-class threads will be compressed.  If you're adding a pre-sized thread, such as a comment or filename, set the "otherLen" field in the data source.

"pThreadIdx" gets the thread index of the newly created thread.  This parameter may be set to NULL.

Threads will be arranged in an appropriate order that may not be the same as the order in which NuAddThread was called.

If "threadID" indicates the thread is a disk image, then the uncompressed length must either be a multiple of 512 bytes, or must be equal to (recExtraType * recStorageType) in the record header.
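
The length rule for disk image threads is easy to get subtly wrong because of integer overflow in the product.  Here is a sketch of the check; ToyIsValidDiskImageLen is a hypothetical helper, not a NufxLib function.

```c
#include <stdint.h>

/* Returns 1 if "length" is acceptable for a disk image thread. */
int ToyIsValidDiskImageLen(uint32_t length, uint32_t recExtraType,
    uint32_t recStorageType)
{
    /* widen before multiplying so the product cannot overflow */
    uint64_t expected = (uint64_t) recExtraType * recStorageType;

    if (length % 512 == 0)
        return 1;                       /* whole number of 512-byte blocks */
    return (uint64_t) length == expected;
}
```

For example, a 1000-byte image is rejected by the multiple-of-512 test but accepted when recExtraType is 4 and recStorageType is 250.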

On successful completion, the library takes ownership of "pDataSource".  The structure will be freed after a NuFlush call completes successfully or all changes are aborted.  Until NuFlush or NuAbort completes, it is vital that you don't free the underlying resource.  That is, don't close the FILE*, delete the file, or free the buffer that the data source references.  If you don't want to keep track of the resources used by FP and Buffer sources, you can specify "fcloseFunc" or "freeFunc" functions to have them released automatically.  See the explanation of NuDataSource for details.

NuError NuAddFile(NuArchive* pArchive, const UNICHAR* pathnameUNI, const NuFileDetails* pFileDetails, short fromRsrcFork, NuRecordIdx* pRecordIdx)

Add a file to the archive.  This is a combination of NuAddRecord and NuAddThread, but goes a little beyond that.  If you add a file whose pFileDetails->threadID indicates a data fork, and another file whose pFileDetails->threadID indicates a resource fork, and both files have the same pFileDetails->storageName, then the two files will be combined into a single record.

"pathnameUNI" is how to open the file.  It does not have any bearing on the filename stored in the archive.  Because all write operations are deferred, NufxLib will not open or even test the existence of the file before NuFlush is called.

"pFileDetails" describes the file types, dates, and access flags associated with the file, as well as the filename that will be stored in the archive ("storageName").  If two forks are placed in the same record, whichever was added first will determine the record's characteristics.

"fromRsrcFork" should be set if NufxLib should get the data out of the "pathnameUNI" file's resource fork.  If the underlying filesystem doesn't support resource forks, then the argument has no effect.  It does not have any impact on whether the data is stored as a data fork thread or resource fork thread -- that is decided by the "threadID" field of "pFileDetails".

"pRecordIdx" gets the record index of the new (or existing) record.  This argument may be NULL.

The "pFileDetails->storageName" may not start with the filename separator character, e.g. "/tmp/foo" is illegal but "tmp/foo" is okay.

If "pFileDetails->threadID" indicates the thread is a disk image, then the uncompressed length must either be a multiple of 512 bytes, or must be equal to recExtraType * recStorageType.

On systems with forked files, such as GS/OS and Mac OS, it will be necessary to call NuAddFile twice on forked files.  The call will automatically join forks with identical names.

Depending on the values of kNuValueAllowDuplicates and kNuValueHandleExisting, this may replace an existing record in the archive.  See Replacing Existing Records and Files for details.

Adding a directory will not cause NufxLib to recursively descend through the directory hierarchy.  That's the application's job.  Requests to add directories are currently ignored.  [A future release may add a "create directory" control thread, so we can store empty directories.]

NuError NuRename(NuArchive* pArchive, NuRecordIdx recordIdx, const char* pathnameMOR, char fssep)

Rename an existing record.  Pass in the index of the record to update, the new name, and the filename separator character.  Setting the name to an empty string is not permitted.

This call will do one of three things to the archive.  If a filename thread is present in the record, and it has enough room to hold the new filename, then the existing thread will be updated.  If a filename thread is present, but doesn't have enough space to hold the new name, then the existing thread will be deleted and a new filename thread will be added.  Finally, if no filename thread is present, a new one will be added, and the filename in the record header (if one was set) will be dropped.
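The three outcomes can be summarized as a small decision function.  This is an illustration of the logic described above, not the library's code; the enum, the function name, and the reading of "enough room" as "new length <= thread capacity" are all assumptions.

```c
#include <stddef.h>

/* Illustrative names; none of these exist in NufxLib. */
typedef enum {
    kRenameUpdateInPlace,   /* existing filename thread is big enough */
    kRenameReplaceThread,   /* delete the old thread and add a new one */
    kRenameAddThread        /* no filename thread: add one, drop header name */
} RenameAction;

/*
 * hasFilenameThread: nonzero if the record already has a filename thread.
 * threadCapacity:    bytes available in that thread (ignored if absent).
 * newNameLen:        length of the new name, in bytes.
 */
RenameAction ChooseRenameAction(int hasFilenameThread, size_t threadCapacity,
    size_t newNameLen)
{
    if (!hasFilenameThread)
        return kRenameAddThread;
    if (newNameLen <= threadCapacity)
        return kRenameUpdateInPlace;
    return kRenameReplaceThread;
}
```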

NufxLib does not currently test for the existence of records with an identical name.  This is probably a bug (it ought to obey the kNuValueAllowDuplicates setting).

NuError NuSetRecordAttr(NuArchive* pArchive, NuRecordIdx recordIdx, const NuRecordAttr* pRecordAttr)

Set a record's attributes.  The fields in the NuRecordAttr struct replace the fields in the record.  This can be used to change filetypes, modification dates, access flags, and the file_sys_id field.

The changes become visible to NuContents calls only after NuFlush is called.

You can fill in values in the NuRecordAttr from a NuRecord struct with the NuRecordCopyAttr call.

NuError NuUpdatePresizedThread(NuArchive* pArchive, NuThreadIdx threadIdx, NuDataSource* pDataSource, long* pMaxLen)

Update the contents of a pre-sized thread.  This can only be used on filename and comment threads.  Attempting to use it on other threads results in a kNuErrNotPreSized return value.

"threadIdx" is the index of the thread to update, and "pDataSource" is where the data comes from.  The "otherLen" field in "pDataSource" is ignored, because this call cannot be used to resize an existing thread.  (The only way to do that is to delete the thread and then create a new one.)

"pMaxLen" will hold the maximum size of the thread if the call succeeds.  If the call fails because the existing thread is too small, kNuErrPreSizeOverflow is returned and "pMaxLen" will be valid.  (You can also get the size by examining the thread's thCompThreadEOF field.)

This cannot be used on newly-added, deleted, or updated threads.

On successful completion, the library takes ownership of "pDataSource".  The structure will be freed after a NuFlush call completes successfully or all changes are aborted.  Until NuFlush or NuAbort completes, it is vital that you don't free the underlying resource.  That is, don't close the FILE*, delete the file, or free the buffer that the data source references.  If you don't want to keep track of the resources used by FP and Buffer sources, you can specify "fcloseFunc" or "freeFunc" functions to have them closed automatically.  See the explanation of NuDataSource for details.

NuError NuDelete(NuArchive* pArchive)

Bulk delete.  This tries to delete every record in the archive, invoking the SelectionFilter callback if one has been specified.

You cannot delete a record that is newly-added, has been modified, has already been deleted, or has had threads added, deleted, or updated.  Such records will be skipped over, so your selection filter simply won't see them.

Because deletion is a deferred write operation, none of the records will actually be deleted until NuFlush is called.  If NuDelete was successful in its attempt to delete every record, and no new records were added, the NuFlush call will mark the archive as being brand new (this differs from v1.0, which failed with kNuErrAllDeleted).  As a result, if you close the empty archive without adding anything to it, the archive file will be removed.

NuError NuDeleteRecord(NuArchive* pArchive, NuRecordIdx recordIdx)

Delete a single record, specified by record index.

You cannot delete a record that is newly-added, has been modified, has already been deleted, or has had threads added, deleted, or updated.

The record will be removed when NuFlush is called.

NuError NuDeleteThread(NuArchive* pArchive, NuThreadIdx threadIdx)

Delete a single thread, specified by thread index.  If you delete all of the threads in a record, and don't add any new ones, the record will be removed.

You cannot delete a thread that is newly-added, deleted, or has been updated.

The thread will not be removed until NuFlush is called.


General Interfaces

Archive Operations

NuError NuClose(NuArchive* pArchive)

Closes the archive.  If the archive was opened read-write, any pending changes will be flushed first.  If the flush attempt fails, NuClose will leave the archive open and return with an error.

When the archive is closed, the temp file associated with a read/write archive will be closed and removed.

All data structures associated with the archive are freed.  Attempting to use "pArchive" further results in an error (or worse).

NuError NuAbort(NuArchive* pArchive)

Abort all pending changes.  NufxLib will throw out every pending modification request, returning to the state it was in following the most recent Open or Flush.

This does not close or manipulate any files, except for those pointed to by data sources with "fcloseFunc" set.  For the most part it simply updates internal data structures.

It's perfectly safe to call this if there are no pending changes.  The call just returns without doing anything.

NuError NuGetMasterHeader(NuArchive* pArchive, const NuMasterHeader** ppMasterHeader)

Get a pointer to the NuFX MasterHeader block.  One useful item here is the number of records in the archive.

IMPORTANT: do not retain the pointer after calling NuFlush or NuAbort.

NuError NuGetExtraData(NuArchive* pArchive, void** ppData)
NuError NuSetExtraData(NuArchive* pArchive, void* pData)

Store an arbitrary void* pointer in the NuArchive structure.  This can be useful for accessing application data within a callback without resorting to global variables.

NuError NuGetValue(NuArchive* pArchive, NuValueID ident, NuValue* pValue)
NuError NuSetValue(NuArchive* pArchive, NuValueID ident, NuValue value)

Manipulate one of NufxLib's configurable values.  See the tables for details.

NuError NuGetAttr(NuArchive* pArchive, NuAttrID ident, NuAttr* pAttr)

Get an archive attribute, such as whether it's wrapped in a Binary II header.  See the tables for details.

NuError NuDebugDumpArchive(NuArchive* pArchive)

Print debugging information to stdout.  The output contains a rather verbose description of the archive.  This call is only functional if the library was built with debugging enabled.  If the library was built without assertions or debug messages, this call returns an error.

 

Data Sources

Sources and sinks provide a way for the application to add from and extract to something other than a named file on disk.  There are three kinds of sources and sinks:

  1. File objects are named files on disk.  They are accessed by filename.
  2. FP objects are FILE pointers.  Pass in a pointer to any file at any offset.
  3. Buffer objects are pointers to memory.

NuDataSource objects are used in conjunction with deferred write calls.  They specify a location from which data is read.  All DataSource creation calls share the "threadFormat", "otherLen", and "ppDataSource" arguments.

The remaining arguments are detailed next.

NuError NuCreateDataSourceForFile(NuThreadFormat threadFormat, uint32_t otherLen, const UNICHAR* pathnameUNI, short isFromRsrcFork, NuDataSource** ppDataSource)

Create a data source from a file on disk.  Because all write operations are deferred, the file will not actually be opened until NuFlush is called.  This means that if the file is unreadable or doesn't exist, the data source create call will succeed, but the eventual NuFlush call will fail.

The entire contents of the file will be used.  The file is opened when needed and closed when processing completes.

"pathnameUNI" is the name of the file to open.  If you use the same pathname with more than one data source, each data source will open and close the file.

"isFromRsrcFork" determines whether the data fork or resource fork should be opened.  This only has meaning on systems like Mac OS and GS/OS, where the "open" call determines which fork is opened.  For other systems, always set it to "false".

NuError NuCreateDataSourceForFP(NuThreadFormat threadFormat, uint32_t otherLen, FILE* fp, long offset, long length, NuCallback fcloseFunc, NuDataSource** ppDataSource)

Create a data source from a FILE*.  The FILE* must be seekable, i.e. you can't use a stream like stdin.  Because all write operations are deferred, any problems with the stream, such as an early EOF, will not be detected until the NuFlush call is made.

"fp" is the stream to use.  It will be seeked immediately before use, so it is permissible to use the same fp in more than one data source.  If you are developing for a system that differentiates between fopen(filename, "r") and fopen(filename, "rb"), use the latter or you may get "unexpected EOF" failures.

"offset" is the starting offset in the file.  The file will be seeked to this point right before it is used.

"length" is the number of bytes to use.

The "fcloseFunc" parameter points to a function that calls fclose() on its argument.  It's bad practice (especially in the Win32 DLL world) to allocate in the app and free in the library, so this provides a way to let the library choose when to close the file, but let the application manage its own heap.  If this argument is nil, the FILE* will not be closed when processing on this data source completes.

IMPORTANT: if you use the same FILE* in more than one data source, do not provide an fcloseFunc for any of them.  Deferred write operations are not guaranteed to happen in any particular order, so if you set fcloseFunc the library may close the file when it is still needed.
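A function suitable for "fcloseFunc" might look like the sketch below.  The two typedefs are local stand-ins so the fragment compiles without the library header; a real program would include the NufxLib header instead.  Returning kNuOK, and the assumption that "args" carries the FILE* given to the data source, are my reading of the description above rather than something this document spells out.

```c
#include <stdio.h>

/* Local stand-ins for illustration only; use the library header in real code. */
typedef struct NuArchive NuArchive;
typedef enum { kNuOK = 0 } NuResult;

/* Has the NuCallback shape: NuResult (*)(NuArchive* pArchive, void* args). */
NuResult CloseFileCallback(NuArchive* pArchive, void* args)
{
    (void) pArchive;            /* not needed to close the stream */
    if (args != NULL)
        fclose((FILE*) args);   /* assumed: "args" is the data source's FILE* */
    return kNuOK;
}
```

The application still owns the decision to close: pass this function only for a FILE* that is used by a single data source.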

NuError NuCreateDataSourceForBuffer(NuThreadFormat threadFormat, uint32_t otherLen, const uint8_t* buffer, long offset, long length, NuCallback freeFunc, NuDataSource** ppDataSource)

Create a data source from a memory buffer.  Invalid memory references will not be detected until NuFlush is called.

"buffer" is a pointer to the memory you want to use.  It is okay for "buffer" to be nil so long as "offset" and "length" are zero.  This may be useful when creating an empty comment thread.

"offset" is the offset from "buffer" at which the data starts.

"length" is the number of bytes to use.

The "freeFunc" parameter points to a function that calls "free", "delete", or "delete[]" on its argument.  There's no way for NufxLib to know exactly how the memory was allocated (malloc/new/new[]/custom), so the application needs to supply a function to clean it up.  If this argument is nil, the buffer will not be freed when processing on this data source completes.  (Side note: the "offset" parameter exists so that you can use part of a buffer and then let the library free the whole thing afterward.)

IMPORTANT: if you use the same memory buffer in more than one data source, do not provide a freeFunc for any of them.  Deferred write operations are not guaranteed to happen in any particular order.

NuError NuFreeDataSource(NuDataSource* pDataSource)

Free a data source.  You should only do this if the data source was not used in a successful deferred write call.

If "fcloseFunc" or "freeFunc" is set in the data source, the appropriate action will be taken.  (NufxLib may actually make copies of DataSource objects with ref-counting, so freeing your object may not cause an immediate fclose or free.)

void NuDataSourceSetRawCrc(NuDataSource* pDataSource)

When the data source contains already-compressed data, there's no way for NufxLib to compute the CRC of the uncompressed data without expanding it.  Version 3 records require a data CRC in the thread header.  This provides a way for the application to specify what value should be in the "thThreadCrc" field.

 

Data Sinks

NuDataSink calls are used with the thread extraction function.  They allow the application to specify where data is to be written to.  All DataSink creation calls share the "doExpand", "convertEOL", and "ppDataSink" arguments.

The remaining arguments are detailed next.

NuError NuCreateDataSinkForFile(short doExpand, NuValue convertEOL, UNICHAR* pathnameUNI, UNICHAR fssep, NuDataSink** ppDataSink)

Create a data sink for a named file on disk.  The file will be opened, written to, and then closed.

Because of a peculiarity in NufxLib design, the OutputPathnameFilter callback will be invoked during the extraction if one has been installed.  Since your application supplied the filename, it most likely won't want to change it, but this can still be useful in the case where the file exists and needs to be renamed.  (This might even be useful, e.g. if your application insists on using the record's filename directly when creating a data sink.)

"pathnameUNI" is the full pathname of the file to write to.

"fssep" is the filesystem separator used in the pathname.  This is necessary so NufxLib can build any missing directory components.

Using the same pathname in more than one data sink will likely yield disappointing results, as subsequent extractions will overwrite the earlier ones.

NuError NuCreateDataSinkForFP(short doExpand, NuValue convertEOL, FILE* fp, NuDataSink** ppDataSink)

Create a data sink from a FILE*.  The stream must be writeable, and must be seeked to the desired offset before the extract call is made.

"fp" is the stream to use.

Using the same FILE* in more than one data sink isn't necessary: you can just re-use the same data sink.  The stream is never seeked, so subsequent extractions will append to the earlier ones.

NuError NuCreateDataSinkForBuffer(short doExpand, NuValue convertEOL, uint8_t* buffer, uint32_t bufLen, NuDataSink** ppDataSink)

Use a memory buffer as a data sink.

"buffer" is a pointer to the memory buffer.

"bufLen" is the maximum amount of data that the memory buffer can hold.

You can re-use a buffer data sink on multiple extractions.  The pointer will be advanced, and bufLen decreased.  Exceeding the size of the buffer causes the extraction to fail with a buffer overrun error.  (Thus, you can extract more than one thread into the same buffer, but you can't extract one thread into multiple buffers.)
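The bookkeeping described above can be modeled in a few lines.  This is a stand-in to illustrate the "advance the pointer, shrink the remaining length" behavior, not the library's NuDataSink; the DemoBufferSink type and both functions are hypothetical, and failing without writing anything on overrun is a simplification.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for a buffer sink; not the library's NuDataSink. */
typedef struct {
    uint8_t*  pos;          /* where the next extraction will land */
    uint32_t  remaining;    /* space left in the buffer */
    uint32_t  outCount;     /* total bytes written so far */
} DemoBufferSink;

void DemoSinkInit(DemoBufferSink* sink, uint8_t* buffer, uint32_t bufLen)
{
    sink->pos = buffer;
    sink->remaining = bufLen;
    sink->outCount = 0;
}

/* Returns 0 on success, -1 if the data would overrun the buffer. */
int DemoSinkWrite(DemoBufferSink* sink, const uint8_t* data, uint32_t len)
{
    if (len > sink->remaining)
        return -1;          /* overrun: fail without writing */
    memcpy(sink->pos, data, len);
    sink->pos += len;       /* later writes append after this one */
    sink->remaining -= len;
    sink->outCount += len;
    return 0;
}
```

Two writes of 3 and 4 bytes into an 8-byte buffer both succeed and leave one byte free; a further 2-byte write fails.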

NuError NuFreeDataSink(NuDataSink* pDataSink)

Free a NuDataSink.

NuError NuDataSinkGetOutCount(NuDataSink* pDataSink, uint32_t* pOutCount)

Get the number of bytes that have been written to a data sink.  The result will be placed into "pOutCount".  This can come in handy if you've extracted a number of things into a memory buffer and aren't sure exactly how much is in there (perhaps because of EOL conversions).

 

Callback Setters

These functions allow you to set callbacks on a per-archive basis.

Most NufxLib calls are illegal in a callback function (NufxLib is not reentrant for a single NuArchive).  The only calls you are allowed to make are NuGetExtraData, NuSetExtraData, NuGetValue, NuSetValue, and NuGetAttr.

The application must not keep copies of pointers passed to a callback.  If you want to keep the information from (say) a NuRecord*, you will need to copy the contents of the struct to local storage.

If something has a "const" pointer, don't write to it.  The results of doing so are unpredictable (but most likely bad).

All callbacks are of type NuCallback, which is defined as:

NuResult (*NuCallback)(NuArchive* pArchive, void* args);

The "set" functions return the previous callback, all of which default to NULL.  If the "pArchive" argument is invalid, the calls will fail and return kNuInvalidCallback.

 

NuError NuSetSelectionFilter(NuArchive* pArchive, NuCallback filterFunc)

The selection filter callback is used to select records and threads during bulk operations. The argument passed into the callback is a "const NuSelectionProposal*":

typedef struct NuSelectionProposal {
	const NuRecord*     pRecord;
	const NuThread*     pThread;
} NuSelectionProposal;

These are pointers to the NuFX record and thread that we are about to act upon. During an extract operation, "pThread" will point at the thread we are about to extract. During a delete operation, "pThread" will point at the first thread in the record we are about to delete.

Valid return values from a selection filter:

If no selection filter is specified, then all records will be selected.

 

NuError NuSetOutputPathnameFilter(NuArchive* pArchive, NuCallback filterFunc)

When extracting files, this callback allows you to change the name of the file that will be opened on disk. It will be called once for every thread we extract. The argument to the callback is a "NuPathnameProposal*":

typedef struct NuPathnameProposal {
	const UNICHAR*      pathnameUNI;
	UNICHAR             filenameSeparator;
	const NuRecord*     pRecord;
	const NuThread*     pThread;

	const UNICHAR*      newPathnameUNI;
	UNICHAR             newFilenameSeparator;
	NuDataSink*         newDataSink;
} NuPathnameProposal;

The fields are:

If a record contains a data fork and a resource fork, your filter will be called twice with the same pathname. (You can examine pThread to see what kind of fork is being extracted.) If the OS requires that extended files be initially created as such, then the file will always be created as "extended" if the record indicates that a resource fork is present.

This mechanism can be used to implement a "rename file being extracted" feature. If an error handler is defined, and it returns kNuRename when NufxLib tries to overwrite an existing file, then the pathname filter will be invoked again.

Valid return values from the output pathname filter:

If no OutputPathnameFilter is set, the files will be opened with the names that appear in the archive.

 

NuError NuSetProgressUpdater(NuArchive* pArchive, NuCallback updateFunc)

During add, extract, and test operations, NufxLib will send progress update messages via the ProgressUpdater callback. The argument to the callback is a "const NuProgressData*":

typedef struct NuProgressData {
	NuOperation         operation;
	NuProgressState     state;
	short               percentComplete;

	const UNICHAR*      origPathnameUNI;
	const UNICHAR*      pathnameUNI;
	const UNICHAR*      filenameUNI;
	const NuRecord*     pRecord;

	uint32_t            uncompressedLength;
	uint32_t            uncompressedProgress;

	struct {
		NuThreadFormat  threadFormat;
	} compress;

	struct {
		uint32_t            totalCompressedLength;
		uint32_t            totalUncompressedLength;

		const NuThread*     pThread;
		NuValue             convertEOL;
	} expand;
} NuProgressData;

The possible values for a NuOperation value are:

Deleting files and listing contents don't cause the progress update callback to be called, so you'll never see "kNuOpDelete" or "kNuOpContents" in a progress handler. The possible values for a NuProgressState value are:

Some values (say, kNuProgressCompressing) are only appropriate for certain operations (kNuOpAdd).

Valid return values from a progress updater are:

If no ProgressUpdater is defined, no progress update information will be sent.

 

NuError NuSetErrorHandler(NuArchive* pArchive, NuCallback errorFunc)

The ErrorHandler callback deals with all exceptional conditions that arise. The callback may define hard-coded policy or query the user for directions. The argument to the callback is a "const NuErrorStatus*":

typedef struct NuErrorStatus {
	NuOperation         operation;
	NuError             err;
	int                 sysErr;
	const UNICHAR*      message;
	const NuRecord*     pRecord;
	const UNICHAR*      pathnameUNI;
	const void*         origPathname;
	UNICHAR             filenameSeparator;

	char                canAbort;
	char                canRetry;
	char                canIgnore;
	char                canSkip;
	char                canRename;
	char                canOverwrite;
} NuErrorStatus;

Some situations that may arise:

operation == kNuOpExtract
  • err == kNuErrFileExists: We're extracting a file to the same pathname as an existing file, and our overwrite policy is set to "maybe".
  • err == kNuErrNotNewer: We're extracting a file to the same pathname as an existing file, and we're only allowed to overwrite older files.
  • err == kNuErrDuplicateNotFound: We're extracting a file in the "must overwrite" mode, but the file doesn't exist.

operation == kNuOpAdd
  • err == kNuErrRecordExists: We're adding a file whose "storage name" matches a file already in the archive, and our overwrite policy is set to "maybe".
  • err == kNuErrFileNotFound: We tried to add a file, but when we went to open it the file didn't exist.

operation == kNuOpTest
  • err == kNuErrBadMHCRC: The master header CRC was bad.
  • err == kNuErrBadRHCRC: A record header CRC was bad.
  • err == kNuErrBadThreadCRC: A thread header CRC was bad.
  • err == kNuErrBadDataCRC: A CRC in the thread data (e.g. LZW/1 CRC) was bad.

The valid return values are defined by the "can" flags in the NuErrorStatus structure (canAbort, canRetry, canIgnore, canSkip, canRename, canOverwrite).

If no ErrorHandler is defined, an appropriate default action (usually kNuAbort) is taken.

 

NuError NuSetErrorMessageHandler(NuArchive* pArchive, NuCallback messageFunc)
NuError NuSetGlobalErrorMessageHandler(NuCallback messageFunc)

Specify a callback to receive text error messages.  These are typically an error message followed by an explanation of the error code that the library is about to return.  The callback takes an argument of type "const NuErrorMessage*", which is defined as:

typedef struct NuErrorMessage {
	const char*         message;
	NuError             err;
	short               isDebug;

	const char*         file;
	int                 line;
	const char*         function;
} NuErrorMessage;

The return value is ignored.

Some error messages aren't associated with an archive, generally because they occur when an archive is being opened.  Since there's no way to associate them with a single archive, the handler must be global to the entire library.  The second form of this call allows you to specify where global error messages should be sent.  The arguments to the callback are identical, but "pArchive" will be nil.

If no callback is specified, the messages are sent to stderr.  If your application doesn't have a stderr (perhaps it's a GUI application), be sure to set both the ErrorMessage and GlobalErrorMessage handlers.


Helper Functions

Some of these are macros, some are functions.  None require that an archive be open.

NuError NuGetVersion(int32_t* pMajorVersion, int32_t* pMinorVersion, int32_t* pBugVersion, const char** ppBuildDate, const char** ppBuildFlags)

Get some information about NufxLib's version.  This sets the major, minor, and bug version numbers, as well as setting strings with the build date and some build flags.

Any or all of the arguments may be NULL, for values you aren't interested in.

The format of "ppBuildDate" is not defined [though it probably should be].

The format of "ppBuildFlags" is a string of compiler flags separated by white space (spaces or tabs).  It is expected to represent an "interesting subset" of the flags sent to the compiler, such as the level of optimization used.

const char* NuStrError(NuError err)

Return a pointer to a string describing a NufxLib error.  NufxLib errors are "err" values less than zero.  "err" values greater than zero are system errors that can be processed with strerror() or perror(), and an "err" value of zero indicates success.
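The sign convention lends itself to a small dispatch routine.  The sketch below only classifies the value; the enum and function name are hypothetical, and the comments note which reporting call applies to each class.

```c
/* Illustrative classification; these names are not part of NufxLib. */
typedef enum {
    kErrClassSuccess,   /* err == 0 */
    kErrClassNufxLib,   /* err < 0: describe with NuStrError() */
    kErrClassSystem     /* err > 0: describe with strerror() or perror() */
} ErrClass;

ErrClass ClassifyError(int err)
{
    if (err == 0)
        return kErrClassSuccess;
    if (err < 0)
        return kErrClassNufxLib;
    return kErrClassSystem;
}
```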

NuError NuTestFeature(NuFeature feature)

Test for support of an optional feature.  See the tables for a list.  Returns kNuErrNone on success, kNuErrUnsupFeature if the feature is known but not supported, or kNuErrUnknownFeature if the feature is not recognized at all (probably because the version of NufxLib you're linked with is older than what you compiled against).

uint32_t NuMakeThreadID(unsigned short class, unsigned short kind)

Construct a NuThreadID, given a thread class and thread kind.

uint32_t NuGetThreadID(const NuThread* pThread)

Construct a NuThreadID, using the thread class and thread kind defined in a NuThread.

uint16_t NuThreadIDGetClass(NuThreadID threadID)

Pull the thread class out of a NuThreadID.

uint16_t NuThreadIDGetKind(NuThreadID threadID)

Pull the thread kind out of a NuThreadID.
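The three thread ID helpers are simple bit packing.  The stand-ins below show the idea under the assumption that the class occupies the upper 16 bits and the kind the lower 16; the "Demo" names are hypothetical, and real code should use the library's own helpers.

```c
#include <stdint.h>

/* Stand-ins for illustration; assumes class in the high half, kind in the low. */
uint32_t DemoMakeThreadID(uint16_t threadClass, uint16_t threadKind)
{
    return ((uint32_t) threadClass << 16) | threadKind;
}

uint16_t DemoThreadIDGetClass(uint32_t threadID)
{
    return (uint16_t) (threadID >> 16);
}

uint16_t DemoThreadIDGetKind(uint32_t threadID)
{
    return (uint16_t) (threadID & 0xffff);
}
```

Under that assumption, class 0x0002 and kind 0x0001 pack to 0x00020001, and the two accessors recover the original values.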

char NuGetSepFromSysInfo(unsigned short sysInfo)

Pull the filename separator character out of the file_sys_info word.

uint16_t NuSetSepInSysInfo(unsigned short sysInfo, char newSep)

Put the filename separator character into a file_sys_info word.  Returns the new value.
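The two separator helpers can be pictured as byte masking on the 16-bit file_sys_info word.  This sketch assumes the separator lives in the low byte and that the high byte must be preserved; the "Demo" names are hypothetical stand-ins for the library calls.

```c
#include <stdint.h>

/* Stand-ins for illustration; assumes the separator is the low byte. */
char DemoGetSepFromSysInfo(uint16_t sysInfo)
{
    return (char) (sysInfo & 0xff);
}

uint16_t DemoSetSepInSysInfo(uint16_t sysInfo, char newSep)
{
    /* keep the high byte, replace the low byte */
    return (uint16_t) ((sysInfo & 0xff00) | (uint8_t) newSep);
}
```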

uint32_t NuRecordGetNumThreads(const NuRecord* pRecord)

Return the number of threads in a record.

const NuThread* NuGetThread(const NuRecord* pRecord, int idx)

Get the idx-th thread from pRecord.  If idx is less than zero or past the end of the thread array, nil is returned.
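Because an out-of-range index yields nil rather than an error, a loop can simply run until nil comes back.  The sketch below models that with stand-in types; DemoRecord, DemoThread, and both functions are hypothetical and only mimic the bounds behavior described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins; the real NuRecord and NuThread have many more fields. */
typedef struct {
    uint32_t threadIdx;
} DemoThread;

typedef struct {
    uint32_t    totalThreads;
    DemoThread* pThreads;
} DemoRecord;

/* Mirrors the documented behavior: NULL for idx < 0 or past the end. */
const DemoThread* DemoGetThread(const DemoRecord* pRecord, int idx)
{
    if (idx < 0 || (uint32_t) idx >= pRecord->totalThreads)
        return NULL;
    return &pRecord->pThreads[idx];
}

/* Walk every thread in the record; returns how many were visited. */
uint32_t DemoCountThreads(const DemoRecord* pRecord)
{
    uint32_t count = 0;
    int idx;

    for (idx = 0; DemoGetThread(pRecord, idx) != NULL; idx++)
        count++;
    return count;
}
```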

void NuRecordCopyAttr(NuRecordAttr* pRecordAttr, const NuRecord* pRecord)

Copy data from "pRecord" into "pRecordAttr".  Only the fields that exist in a NuRecordAttr are copied.  This can be useful in conjunction with the SetRecordAttr call.

NuError NuRecordCopyThreads(const NuRecord* pRecord, NuThread** ppThreads)

Copy the thread array out of a record.  This is useful if you want to keep your own copy of a thread array.

short NuIsPresizedThread(NuThreadID threadID)

Returns "true" if the threadID is considered pre-sized by NufxLib.  Right now, only filenames and comments are given this treatment.

size_t NuConvertMORToUNI(const char* stringMOR, UNICHAR* bufUNI, size_t bufSize)

Convert Mac OS Roman to Unicode (UTF-8 or UTF-16).  Returns the number of bytes required to hold the converted string.  "bufUNI" may be NULL.  [Not implemented for Win32.]

size_t NuConvertUNIToMOR(const UNICHAR* stringUNI, char* bufMOR, size_t bufSize)

Convert Unicode to Mac OS Roman.  Returns the number of bytes required to hold the converted string.  "bufMOR" may be NULL.  [Not implemented for Win32.]

 


Tables

Configurable Values (NuValue)

kNuValueIgnoreCRC Boolean (false).  Don't verify header or data CRCs.  This can provide a minor speed improvement, but allows certain kinds of errors to go undetected.
kNuValueDataCompression Enum (kNuCompressLZW2).  Threads that can be compressed (i.e. data-class threads) will be compressed with the specified compression.  Possible values are:
  • kNuCompressNone (no compression)
  • kNuCompressSQ (SQueeze)
  • kNuCompressLZW1 (ShrinkIt's dynamic LZW/1)
  • kNuCompressLZW2 (ShrinkIt's dynamic LZW/2)
  • kNuCompressLZC12 (12-bit LZW from "compress")
  • kNuCompressLZC16 (16-bit LZW from "compress")
  • kNuCompressDeflate [requires zlib]
kNuValueDiscardWrapper Boolean (false).  If changes are made to the archive that cause a new copy to be reconstructed in the temp file, then when this is set to "true" any BXY, BSE, or SEA wrapper will be stripped off.  This also causes any "junk" at the start of the file to be removed.
kNuValueEOL Enum (system-dependent).  End-of-line marker appropriate for the current system.  If EOL conversion is enabled, extracted files will be converted to this EOL value.  Valid values are:
  • kNuEOLCR (carriage return, for ProDOS, GS/OS, Mac OS)
  • kNuEOLLF (line feed, for UNIX)
  • kNuEOLCRLF (CR+LF, for MS-DOS and Win32)
kNuValueConvertExtractedEOL Enum (kNuConvertOff).  This determines whether "bulk" extractions do EOL conversions.  Possible values:
  • kNuConvertOff (don't try to convert)
  • kNuConvertOn (always convert)
  • kNuConvertAuto (convert if the input appears to be a text file)
kNuValueOnlyUpdateOlder Boolean (false).  If set, only overwrite existing records and files if the item being added or extracted is newer than the one being replaced.  Useful for an "update" or "freshen" option.  The date used for comparison is the modification date.
kNuValueAllowDuplicates Boolean (false).  If set to "true", duplicate records are allowed in the archive.  If "false", the collision will be handled according to the kNuValueHandleExisting setting.  Filename comparisons are case-insensitive.
kNuValueHandleExisting Enum (kNuMaybeOverwrite).  This determines how duplicate filename collisions are handled.  Valid values:
  • kNuMaybeOverwrite (the ErrorHandler callback is invoked)
  • kNuNeverOverwrite (the file being added or extracted is skipped)
  • kNuAlwaysOverwrite (the existing file or record is deleted)
  • kNuMustOverwrite (fails if the file or record doesn't exist, useful for a "freshen" option)

The case sensitivity when extracting is determined by the underlying filesystem.

kNuValueModifyOrig Boolean (false, unless the archive was just created by NufxLib).  If this is "true", then an effort will be made to handle all updates in the original archive, rather than reconstructing the entire archive in a temp file.  Updates to pre-sized threads, changes to record attributes, and additions of new files can all be made to the original archive.  There is some risk of corruption if the flush fails, so use this with caution.
kNuValueMimicSHK Boolean (false).  If set, attempt to mimic the behavior of ShrinkIt as closely as possible.  See the ShrinkIt Compatibility Mode section.
kNuValueMaskDataless Boolean (false).  If set to "true", records without data threads have "fake" threads created for them, so that they appear as they would had they been created correctly.
kNuValueStripHighASCII Boolean (false).  If set to "true", files composed entirely of high-ASCII characters will have the high bit stripped from each byte, if and only if an EOL conversion is performed.
kNuValueJunkSkipMax Integer (1024).  If the archive file doesn't start with a recognized sequence, NufxLib will assume that some junk has been added to the start of the file and will scan forward at most this many bytes in an attempt to locate the real archive start.
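The junk-skipping scan can be sketched as a bounded forward search.  This version looks only for the six-byte NuFX master header ID; the library also recognizes wrapped archives (Binary II, self-extracting), which are not handled here.  FindArchiveStart is a hypothetical name, and allowing a match at exactly "junkSkipMax" bytes in is an assumption about the boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* NuFX master header ID: "NuFile" with the high bit set on alternate bytes. */
static const uint8_t kNuFileID[6] = { 0x4e, 0xf5, 0x46, 0xe9, 0x6c, 0xe5 };

/*
 * Hypothetical helper.  Returns the offset of the first NuFX ID found
 * after skipping at most "junkSkipMax" bytes, or -1 if none is found.
 */
long FindArchiveStart(const uint8_t* data, size_t len, size_t junkSkipMax)
{
    size_t offset;

    if (len < sizeof(kNuFileID))
        return -1;
    for (offset = 0;
         offset <= junkSkipMax && offset + sizeof(kNuFileID) <= len;
         offset++)
    {
        if (memcmp(data + offset, kNuFileID, sizeof(kNuFileID)) == 0)
            return (long) offset;
    }
    return -1;
}
```

A nonzero result corresponds to the amount of junk reported by kNuAttrJunkOffset.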
kNuValueIgnoreLZW2Len Boolean (false).  If set to "true", the length value embedded in LZW2 compressed chunks is ignored.  This is useful for archives created with a specific broken application.  (This is deprecated -- use HandleBadMac instead.)
kNuValueHandleBadMac Boolean (false).  Recognize and handle "bad Mac" archives, which have a bad value ('?') for the filename separator character, and write an LZW/2 length value in big-endian order.

 

Archive Attributes (NuAttr)

kNuAttrArchiveType Returns one of the following:
  • kNuArchiveNuFX (NuFX archive, e.g. ".SHK" or ".SDK")
  • kNuArchiveBinaryII (Binary II, e.g. ".BNY" or ".BQY") [not supported]
  • kNuArchiveNuFXInBNY (NuFX inside Binary II, e.g. ".BXY")
  • kNuArchiveNuFXSelfEx (self-extracting GSHK archive, e.g. ".SEA")
  • kNuArchiveNuFXSelfExInBNY (self-ex inside Binary II, e.g. ".BSE")
kNuAttrNumRecords Returns the number of records in the archive.  This value does not reflect unflushed changes.
kNuAttrHeaderOffset Returns the offset of the NuFX header from the start of the file.  This will be nonzero for archives with a Binary II or self-extracting wrapper.
kNuAttrJunkOffset Returns the amount of junk found at the start of the file.  A nonzero value here indicates that junk was found.

 

Feature Tests (NuFeature)

kNuFeatureCompressSQ Test for support of SQueeze compression
kNuFeatureCompressLZW Test for support of ShrinkIt LZW/1 and LZW/2 compression
kNuFeatureCompressLZC Test for support of 12- and 16-bit LZC
kNuFeatureCompressDeflate Test for support of zlib "deflate"
kNuFeatureCompressBzip2 Test for support of libbz2 "bzip2"

 


Additional Commentary

Replacing Existing Records and Files

When using NuAddFile or NuAddRecord, there are three flags that affect what happens when an existing record has the same name:

  • AllowDuplicates (default false): if set, adding a record with the same name as an existing record is allowed.
  • OnlyUpdateOlder (default false): if set, we refuse to replace an existing record unless its modification date is older.
  • HandleExisting (default "maybe overwrite"): can be set to "maybe overwrite" (prompts the user), "never overwrite" (returns an error), "always overwrite" (overwrites the existing record), and "must overwrite" (causes an error if the record *doesn't* exist). The "maybe overwrite" value is treated as "never overwrite" if an error-handling callback isn't defined.

It's important to understand how these interact with each other, and what they mean to both existing records and newly-added (pre-flush) records. Two of them also have an effect when extracting files.

The AllowDuplicates flag determines whether or not we think duplicate records are at all interesting. If an application sets it to true, the floodgates are opened, and the two other flags are ignored.

The OnlyUpdateOlder flag is considered next. If it's set to true, and an existing, identically named file in the archive appears to be the same age or newer than the file being added, the record creation attempt fails with an error (kNuErrNotNewer).

The HandleExisting flag comes into play if we get past the first two. If a matching entry is found in the archive, NufxLib either deletes it and allows the add, prompts the user for instructions, or rejects it with an error. NuAddFile and NuAddRecord will return with kNuErrRecordExists if they can't replace an existing record. If "must overwrite" is set, and a matching record does not exist, then kNuErrDuplicateNotFound is returned.
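
The order in which the three flags are consulted can be sketched as a small decision function.  This is only a model of the rules described above for records that are already in the archive; the enum, struct, and function names are invented for the example and are not NufxLib identifiers, and the outcome names merely mirror the error codes mentioned in the text.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the HandleExisting settings. */
typedef enum {
    kMaybeOverwrite, kNeverOverwrite, kAlwaysOverwrite, kMustOverwrite
} HandleExisting;

/* Possible results of an add attempt in this model. */
typedef enum {
    kOutAdd,                /* new record is simply added */
    kOutReplace,            /* existing record is deleted, add proceeds */
    kOutAskUser,            /* error-handling callback is consulted */
    kOutNotNewer,           /* corresponds to kNuErrNotNewer */
    kOutRecordExists,       /* corresponds to kNuErrRecordExists */
    kOutDuplicateNotFound   /* corresponds to kNuErrDuplicateNotFound */
} AddOutcome;

typedef struct {
    bool allowDuplicates;
    bool onlyUpdateOlder;
    HandleExisting handleExisting;
    bool haveErrorCallback;
} AddFlags;

AddOutcome decide_add(const AddFlags *f, bool matchExists, bool existingIsOlder)
{
    if (f->allowDuplicates)
        return kOutAdd;                 /* the other two flags are ignored */
    if (!matchExists)
        return f->handleExisting == kMustOverwrite
            ? kOutDuplicateNotFound : kOutAdd;
    if (f->onlyUpdateOlder && !existingIsOlder)
        return kOutNotNewer;            /* same age or newer: refuse */
    switch (f->handleExisting) {
    case kAlwaysOverwrite:
    case kMustOverwrite:
        return kOutReplace;
    case kMaybeOverwrite:
        /* without a callback, "maybe" behaves like "never" */
        return f->haveErrorCallback ? kOutAskUser : kOutRecordExists;
    case kNeverOverwrite:
    default:
        return kOutRecordExists;
    }
}
```

The model deliberately leaves out newly-added (pre-flush) records and fork matching.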

Both AddFile and AddRecord check for duplicates among existing and newly added files. You aren't allowed to delete items that were just added, so the HandleExisting flag is ignored for files you have marked for addition but haven't yet flushed.

AddFile has an additional behavior that takes precedence over all of the flags: it will try to match up the individual forks of a file. If it finds a file in the newly-added list with the same name and a compatible data thread, the new file will be added to the existing record. (A "compatible" data thread is the other half of a forked file, e.g. the application added the data fork, and is now adding the resource fork from the same file.) If the record was found but is not compatible, the AllowDuplicates behavior is used to decide if another "new" record with the same name should be added, or if an error should be returned.

If this sort of treatment is undesirable, i.e. you want a data fork and a resource fork with the same filename to be stored as two separate records, then you should call AddRecord and AddThread. AddFile is meant as a convenience for common operations.

It is possible for NuAddFile and NuAddRecord to partially complete. If a record exists and is deleted, but the call later fails for some other reason, the record will still be deleted.

Searching for existing records can take time on a large archive.  Enabling AllowDuplicates will allow NufxLib to avoid having to search through the lists of records to find matches.

 

When extracting files from an archive, the "OnlyUpdateOlder" and "HandleExisting" flags are applied to the files on disk. The checks work much as described above for adding records, with the file on disk taking the place of the existing record.

To implement NuLib2's "update" feature, "OnlyUpdateOlder" needs to be set to true. To implement the "freshen" feature, "OnlyUpdateOlder" is set to true and "HandleExisting" is set to "must overwrite".

When extractions are done in bulk, the kNuErrDuplicateNotFound and kNuErrNotNewer errors are passed to the application's error handler function. The error handler is expected to return kNuSkip after perhaps updating the progress status message, but is allowed to abort or require NufxLib to overwrite the file anyway. If no error handler is defined, the file is skipped silently.
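
The shape of such a handler can be sketched without the library.  The types below are stand-ins invented for this example; the real error codes and callback result values come from NufxLib.h and are not reproduced here.  Aborting on every other error is just a choice made for the sketch.

```c
#include <stdio.h>

/* Stand-ins for illustration only, not NufxLib definitions. */
typedef enum { kSimErrNotNewer, kSimErrDuplicateNotFound, kSimErrOther } SimError;
typedef enum { kSimSkip, kSimAbort } SimResult;

/* State the application keeps across a bulk extraction. */
typedef struct {
    int skipped;        /* number of files passed over so far */
    char status[64];    /* last progress message */
} BulkStats;

SimResult on_bulk_extract_error(SimError err, const char *pathname,
                                BulkStats *stats)
{
    switch (err) {
    case kSimErrNotNewer:
    case kSimErrDuplicateNotFound:
        /* expected during "update" and "freshen": note it and move on */
        stats->skipped++;
        snprintf(stats->status, sizeof stats->status, "skipped %s", pathname);
        return kSimSkip;
    default:
        return kSimAbort;
    }
}
```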

 

ShrinkIt Compatibility Mode

One of the goals was to be as compatible with ShrinkIt as possible.  ShrinkIt and GS/ShrinkIt occasionally do some strange things, so some of the compatible behaviors are only activated when the "mimic ShrinkIt" flag is set.

These behaviors are:

Some GS/ShrinkIt behaviors are not fully emulated:

Regarding the last item: a quick test with a handful of empty files showed that GS/ShrinkIt v1.1 failed to extract the empty files it had just archived.  P8 ShrinkIt v3.4 gets really confused on such archives, and insists that the first entry is a zero-byte disk archive, while the other empty files are actually four bytes long.  When asked to extract the files, it does nothing.  When adding empty files, P8 ShrinkIt v3.4 does the correct thing, and creates an empty data thread.

The default NufxLib behavior is to work around the bug.  When extracting files with a filename but no data or control threads, a zero-byte data file will be created.  (In NufxLib v1.0 the default was to ignore such entries unless the "mimic" flag was enabled.  This was changed in v1.1 to be enabled at all times.  As of v1.2, an empty resource fork is also created if the record's storage type indicates it's an extended file.)  If the "MaskDataless" flag is enabled, fake data threads are created, and applications won't even know there's a problem in the archive.

In general, with "mimic ShrinkIt" mode enabled, it should be possible to extract files from a GS/ShrinkIt archive and re-add them to a new archive, with little perceptible difference between the two.  Of course, it's up to the application to ensure that all threads (including comments) are retained, file dates aren't altered, and so on.  The only situations where NufxLib cannot produce identical results are bugs (e.g. zero-length data files always require more space) and option lists (which NufxLib does not currently support).

The bottom line: it is perfectly normal for NufxLib archives to be a few bytes smaller than GS/ShrinkIt archives, even when "mimic ShrinkIt" mode is enabled.  (An example: my 20MB boot partition compressed to about 14MB.  With "mimic" mode off, the file was 13K smaller, or about 0.1%.)

 

Compression Formats

Of the various compression formats that NufxLib supports, only LZW/1 and LZW/2 are widely supported. The latest versions of ShrinkIt and II Unshrink try to unpack SQ compression but fail. Archives that use SQ, LZC12, and LZC16 can only be unpacked by GS/ShrinkIt, NuLib, and NufxLib v1.1 and later.

The "deflate" and "bzip2" algorithms are not supported by anything other than NufxLib v1.1+ and CiderPress. They are intended to be used with archives that will never be unpacked on an Apple II. Disk images compressed with these algorithms are especially useful with emulators that use NufxLib.

Some tests with deflate and bzip2 showed that, surprisingly, deflate is nearly always better than bzip2 for Apple II files.  This is because deflate appears to do slightly better on machine code and small (< 32K) text files.  Since most Apple II files and disk images fall into these categories, there is little advantage to using bzip2.  Because deflate uses less memory and is faster, and libbz2 isn't nearly as ubiquitous as libz, I've chosen to disable bzip2 by default.

You can use the NuTestFeature call to test for the presence of any of the compression algorithms.  This makes it possible to build a library or DLL without LZW in it.
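
One way an application might act on the results of those tests is sketched below.  It assumes the application has already run the library's feature test once per algorithm and recorded the answers in a bit mask; the mask bits, the Choice enum, and the function are hypothetical names, and the policy is only one reasonable reading of the compatibility notes above.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; none are NufxLib identifiers. */
enum { kHaveLZW = 1u << 0, kHaveDeflate = 1u << 1 };
typedef enum { kUseNone, kUseLZW2, kUseDeflate } Choice;

/* Pick a format from what the library build supports.  Archives that
 * must be unpacked on an Apple II are limited to LZW. */
Choice choose_compression(unsigned available, bool mustUnpackOnApple2)
{
    if (!mustUnpackOnApple2 && (available & kHaveDeflate))
        return kUseDeflate;
    if (available & kHaveLZW)
        return kUseLZW2;
    return kUseNone;    /* store uncompressed */
}
```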


Porting

NufxLib v1.0 was developed under Solaris 2.5 and Red Hat Linux 6.0, and was ported to Win32 shortly before the alpha release.  Porting to other UNIX-like platforms has been straightforward, with most differences contained in the "autoconf" configuration system.  For example, the BeOS/PPC port was largely a matter of getting the compiler settings right.

Mac OS and GS/OS have the ability to store file types and resource forks natively.  Support for this is not currently part of NufxLib.  A data-fork-only port, akin to what is used on UNIX and Win32, should be straightforward though.  (In fact, Mac OS X "just worked".)

Once upon a time a GS/OS port was imagined.  This never happened, and likely never will.


Design Notes

The decision to pass FILE* structures instead of file descriptors was somewhat arbitrary.  The library uses buffered I/O internally, so it was convenient to have them passed in, rather than having an fd passed in and relying on the existence of an fdopen() call.  On the other hand, if an application is built with a different version of the stdio library (in which the layout of the FILE structure is different), linking with NufxLib might not work.  Given that NufxLib is distributed as BSD-licensed source code, I don't see this as being a major problem, since you can always rebuild NufxLib with the altered stdio lib.  (This caused me some grief under Windows, because the non-debug multithreaded DLL version of libc apparently does something wonky with FILE* and fwrite.  If the Win32 DLL and Win32 app aren't linked against the same libc, fwrite() will crash.  Other versions of libc, e.g. debug multithreaded and debug single-threaded, interact just fine with each other.)  My conclusion after fighting with Win32 is that it would have been better to pass file descriptors or a "NuFILE*" with read/write/seek operations that reside wholly within the NufxLib library.

The decision to pass data sources and sinks around as structures rather than as function pointers was born of a desire to reduce complexity.  Setting up a data source or sink requires making a function call with a lot of arguments, but once that's done you can forget all about it -- the code will happily close your files and free your memory when you're done with it.  A functional interface would require passing in read, write, close, and seek functions, which gives the application more flexibility but essentially requires the application to implement its own version of the data source and sink structures.  Since NufxLib is intended for manipulating archives, not compressing streams of data, the added flexibility did not justify the cost.  (I'm becoming less certain of that as time goes by.  If I had it all to do again, I probably would use the functional interface for all file accesses.)

It might have been useful to allow read/write/seek hooks for the archive itself.  The current architecture prevents you from processing an archive that has been loaded into memory, unless you have memory-based FILE* streams in your libc.  This became annoying during the development of CiderPress, because I wanted to handle archives within a wrapper, such as ".shk.gz".
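
A hook-based interface of the kind discussed above might look something like the following.  This is a sketch of the general technique, not anything NufxLib provides: the IoHooks and MemSource types and the mem_* functions are hypothetical, and only read and seek are shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table: the caller supplies the operations and an
 * opaque context, so the data can live anywhere. */
typedef struct {
    size_t (*read)(void *ctx, void *buf, size_t len);
    int    (*seek)(void *ctx, long offset);  /* absolute; 0 on success */
    void   *ctx;
} IoHooks;

/* An archive image that has been loaded into memory. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
} MemSource;

static size_t mem_read(void *ctx, void *buf, size_t len)
{
    MemSource *m = ctx;
    size_t avail = m->size - m->pos;
    if (len > avail)
        len = avail;            /* short read at end of buffer */
    memcpy(buf, m->data + m->pos, len);
    m->pos += len;
    return len;
}

static int mem_seek(void *ctx, long offset)
{
    MemSource *m = ctx;
    if (offset < 0 || (size_t)offset > m->size)
        return -1;
    m->pos = (size_t)offset;
    return 0;
}

/* Build a hook table that reads from the given memory buffer. */
IoHooks mem_hooks(MemSource *m)
{
    IoHooks h = { mem_read, mem_seek, m };
    return h;
}
```

Code written against IoHooks never touches FILE*, so it does not care which stdio library the application was linked with.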

Returning a pointer to an allocated NuArchive structure worked pretty well until I wanted to set a parameter that affected the way open works (the junk-skipping feature).  Creating the structure in a separate call before the "open" would have been better.

It might, for portability reasons, have been better to require a "create file" callback.  This would offload most of the system-dependent stuff in FileIO.c onto the application.  I chose not to do this because I felt it moved too much of the work out of NufxLib and into the hands of the developer.  Requiring the application to deal with the "OnlyUpdateOlder" and "HandleExisting" flags seemed excessive, and if there really are wide variations in the way files are created and modification dates are tested, then we might as well solve the problem once and for all in the library instead of requiring every application to solve it for itself.  (I could, of course, provide sample code for several different platforms, but sample code tends to suffer from bit rot.)

There is no real support for GS/OS option lists.  The only place you'd ever want to add these is on a IIgs, and I find it unlikely that NufxLib will edge out GS/ShrinkIt as the preferred archiver.  (Besides, I question their value even on a IIgs.)  NufxLib will very carefully preserve them when modifying a record, but there's no way to add, delete, or modify them directly.

The use of RecordIdx and ThreadIdx, rather than record filename and thread offset number, was chosen for a number of reasons.  The most important was that they are unambiguous.  Consider that two records may have the same filename, that one record may have two filenames, and that a record may have no filename at all, and the need for RecordIdx becomes painfully clear.  I could've avoided ThreadIdx by using a combination of RecordIdx and thread number, but after a few additions and deletions there is a clear advantage to having a unique identifier for every thread.  Besides, it allowed me to design calls like NuExtractThread so that they only had to take one identifier as an argument, reducing the amount of stuff an application needs to keep track of, as well as the amount of error checking that has to be done in the library.
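
The benefit of a unique identifier over a name or a position can be shown with a toy record list.  Everything here is hypothetical (the Rec and RecList types, the rec_* functions, the counter-based numbering) and says nothing about how NufxLib assigns its indices internally.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RECS 8

typedef struct { unsigned long idx; char name[32]; } Rec;
typedef struct { Rec recs[MAX_RECS]; size_t count; unsigned long nextIdx; } RecList;

/* Append a record; returns its index, or 0 if the list is full. */
unsigned long rec_add(RecList *list, const char *name)
{
    if (list->count == MAX_RECS)
        return 0;
    Rec *r = &list->recs[list->count++];
    r->idx = ++list->nextIdx;       /* never reused, even after deletes */
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    return r->idx;
}

/* Return the current position of the record with this index, or -1. */
long rec_find(const RecList *list, unsigned long idx)
{
    for (size_t i = 0; i < list->count; i++)
        if (list->recs[i].idx == idx)
            return (long)i;
    return -1;
}

/* Remove a record by index; later records shift down one position. */
int rec_delete(RecList *list, unsigned long idx)
{
    long pos = rec_find(list, idx);
    if (pos < 0)
        return -1;
    memmove(&list->recs[pos], &list->recs[pos + 1],
            (list->count - (size_t)pos - 1) * sizeof(Rec));
    list->count--;
    return 0;
}
```

Two records named "README" and one with an empty name each get a distinct index, and those indices keep referring to the same records after a deletion changes every position.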

Allowing threads to be copied without expanding and recompressing them is neat, but if I'd known how cluttered the interface would become I probably wouldn't have supported the feature.  The NuDataSource calls are confusing enough as it is with the pre-sized thread stuff.

The "bulk" NuAdd interface can be cumbersome.  When extracting you can skip the bulk approach and handle filename conflicts yourself, but when adding you don't have a good alternative if you're adding lots of files.  You would have to follow every NuAdd with a NuFlush, which has some performance problems because (unless you configure the safety options off) NuFlush will write all data to the temp file and rotate it.

I chose not to implement EOL conversions when adding files.  It's too painful to do this in the library.  It would be easier for the application to write an EOL-converted file into a temp file, then use the file add call on that.  (The "storage name" is set independently from the source file name, so there's no problem with temp file names showing up in the archive.)  One approach that could be used within NufxLib would be to "pre-flight" the file by doing an EOL conversion pass to determine the final file length, then feed that length into the compression functions.  The "NuStraw" interface would do the conversion transparently to the compression routines.
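
The "pre-flight" pass mentioned above amounts to counting output bytes without writing them.  The sketch below assumes the target convention is a single CR (the Apple II text convention) and that the whole input is available in one buffer; the function name is hypothetical.

```c
#include <stddef.h>

/* Return the length the buffer would have after converting CRLF, LF,
 * and CR line endings to a single CR.  Only CRLF changes the length. */
size_t eol_converted_length(const unsigned char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            i++;        /* CRLF collapses to one output byte */
        out++;
    }
    return out;
}
```

A version that reads the file in chunks would also have to remember a CR that falls on the last byte of one chunk, in case the next chunk starts with LF.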

The compression functions might have been better written with a zlib-like API.  This would have made it easier to extract the code and use it in other projects.  The only disadvantage of doing so is that it adds a little extra buffer copying overhead.

Some attention should have been paid to internationalization.

Some perhaps useful calls that weren't implemented:


History

A brief history of NufxLib releases.  See "ChangeLog.txt" in the sources for more detail.

Version   Date         Comments
v0.0      mid-1998     Work begins
v0.1      2000/01/17   First version viewed by test volunteers.
v0.5      2000/02/09   Alpha test version.
v0.6      2000/03/05   Beta test version.
v1.0      2000/05/18   Initial release.
v1.0.1    2000/05/22   Added workaround for badly-formed archives.
v1.1      2002/10/20   Many new features, notably support for several compression formats.
v2.0      2003/03/18   Support for Win32 DLL features.
v2.0.1    2003/10/16   Added junk-skipping and a workaround for bad option lists; Mac OS X stuff from sheppy.
v2.0.2    2004/03/10   Handle zeroed MasterEOF, and correctly set permissions on "locked" files.
v2.0.3    2004/10/11   Fixed some obscure bugs that CiderPress was hitting.
v2.1.0    2005/09/17   Added kNuValueIgnoreLZW2Len.
v2.1.1    2006/02/18   Fix two minor bugs.
v2.2.0    2007/02/19   Switched to BSD license.  Identify "bad Mac" archives automatically.
v2.2.2    2014/10/30   Updated build files, especially for Win32.  Moved to github.
v3.0.0    2015/01/09   Source code overhaul.  Added Unicode filename handling.
 

Acknowledgements

I'd like to thank Eric Shepherd for participating in some ping-pong e-mail sessions while I tried to get autoconf, BeOS, and some crufty versions of "make" figured out for v1.0.


This document is Copyright © 2000-2015 by Andy McFadden.  All Rights Reserved.

The latest version can be found on the NuLib web site at http://www.nulib.com/.