NufxLib v3.0.0 API - By Andy McFadden - Last revised 2015/01/09

Table of contents
Introduction

NuFX, short for "New File Exchange", is a file format developed by Andy Nicholas for archiving files and disks on the Apple II series of computers. The format was devised in tandem with the development of ShrinkIt, which became the standard archive software for the Apple II soon after its release in 1989. NuFX archives usually have filenames that end in ".SHK".

This document describes the API (Application Program Interface) for NufxLib, a library of functions that manipulate NuFX archives.

Good engineering practices dictate that an API should be minimal and complete. The confusion generated by redundant and overlapping interfaces can be as harmful as an omitted vital feature. I feel pretty good about the "complete" part, since NufxLib provides a way to do pretty much everything that I can reasonably expect somebody to want to do, but in some cases "minimal" has been swept aside in the name of convenience. (See the Design Notes section for additional commentary on this topic.)

The NuFX specification is extremely general, and does not explicitly allow or forbid unusual conditions like having a record with two filenames in it. NufxLib follows the NuFX specification on everything that is spelled out, but restricts some of the undefined behaviors to the subset defined in the NuFX Addendum.

In this document, the term "threads" usually refers to NuFX threads -- structures in the archive -- not CPU threads.

Goals
Not Goals
That explains what I set out to do. Here's a quick summary of what I accomplished. Features
The library is protected by copyright, but can be distributed under the terms of the BSD License. See the file "COPYING-LIB" for full details.

Interface changes from v1.x to v2.x

Some changes were made during the development of NufxLib that broke binary compatibility with version 1.1 of the library. The changes were:
In addition, a NuTestRecord call was added.

Applications written against v1.x may need to be updated. Check the NufxLib "samples" directory for examples of programs that use the updated calls.

To make version management easier, v2.x includes the version number in the NufxLib.h header file. This allows dynamically-linked applications to compare a "compiled" version against a "linked" version.

Interface changes from v2.x to v3.x

This was a major source code cleanup effort, one aspect of which was switching from general C types ("unsigned long") to types with explicit sizes ("uint32_t"). In some cases this caused some compilers to report errors, even though there's a fair chance that binary compatibility wasn't affected. Since it was an API-breaking change at some level, the major version number was bumped.

The other major API change was the separation of Mac OS Roman and Unicode strings, which were previously blended freely.

NuFX Archive Format Overview

This document assumes that you are already familiar with the NuFX archive format, as described in the Apple II File Type Note for $e0/8002 and the Winter 1990 issue of Call-A.P.P.L.E. For those unwilling to wade through the technical documentation, here is a quick overview.

A NuFX archive is composed of a Master Header followed by a series of Records. Each Record is composed of a Record Header and one or more Threads. The general idea is to store one file per Record.

Each Thread holds a blob of data. The data can be a data or resource fork of a file, a disk image, a comment, or the filename for the Record. The Threads are identified by a "class" and a "kind". The "class" tells you if it's a data thread, comment, filename, or something else, and the "kind" refines the class. For example, the resource fork of a file is a data-class thread with a "kind" of 2.

Some Threads, notably filenames and comments, are pre-sized, meaning that the space allocated for them in the archive is larger than what is actually used.
Filenames usually have at least 32 bytes set aside for them, though in practice a simple ProDOS filename will be shorter. This makes it possible to rename files and update comments without having to reconstruct the archive.

The archive Master Header has only a few bits of information, such as the number of records and the date the archive was created.

Unlike a ZIP archive, NuFX has no central table of contents. If you want to display the contents of an archive, you have to read the first Record header, pull the filename out (usually by finding and reading a filename Thread), compute the total size of the Record, and seek forward past the data. Repeat the process with each subsequent record, until you reach the end of the archive.

The predominant compression algorithm is a slightly modified LZW (Lempel-Ziv-Welch). It's fast, but not very effective compared to the standard methods used in modern archivers.

API Overview

There are five basic categories of API calls.

ReadOnly calls do not modify the archive in any way. The operations include things like listing and extracting files. These can be used on archive files opened read-only or read-write.

StreamingReadOnly calls are a subset of ReadOnly calls that can be made on a streaming archive. A "streaming" archive is one that cannot be seeked, e.g. an archive being received over a network socket or a pipe from stdin. The same functions are invoked as for ReadOnly archives, but in some rare cases the results may be different.

ReadWrite calls change the archive contents. Functions that add and delete files are here. These can only be used on archive files opened read-write.

General calls can be made regardless of how the archive was opened. Functions included here can get and set archive parameters and define callbacks.

Helper functions don't do anything to the archive. They're functions or macros that do useful things with some of the data types returned. They're included as a convenience.
The library does everything it can to aid multi-threading. You should be able to perform simultaneous operations on multiple archives (assuming you have the reentrant versions of certain libc calls available). You cannot, however, invoke multiple simultaneous operations on a single archive. There is a general philosophy of laziness employed. For example, opening an archive does not cause the entire table of contents to be read. (In a NuFX archive, that would require scanning through most of the file.) As a result, there are actually three different ways to get the table of contents out of an archive:
For write operations, a certain form of laziness is again employed. If you want to delete three records from various points in an archive, you don't want to have to update the archive three times. NufxLib handles this by deferring all write operations until a "flush" call is made. In most cases, a "flush" results in a new archive being constructed in a temp file, which is subsequently renamed over the original. The flush call does not close the archive, so it is possible to do things like:
In certain restricted cases, such as updating a comment or appending new records, the original archive can (optionally) be updated in place, saving a potentially lengthy copying of data.

As a final example of laziness, NufxLib does not re-read the archive it has just written after a Flush. It would have been easier to write all changes, throw out all data structures, and re-read the archive from scratch, but that could be slow. Instead, the library keeps track of the changes it has made -- something that gets a little tricky when filename threads are updated. Being lazy is often more work.

Filenames stored in archives use the Mac OS Roman character set. The low 128 characters are ASCII, the high 128 are specified here. NufxLib will convert between Mac OS Roman and Unicode when necessary, and provides conversion functions for application use.

When specifying a "local filename", i.e. a file on Linux or Windows, the API expects a Unicode string. When referring to an archived file by name (the "storage name"), the API uses the Mac OS Roman form. The parameter and field names reflect the character set ("UNI" or "MOR"), and use the UNICHAR type for Unicode strings. On Linux and Mac OS X the filename is encoded with UTF-8. On Windows it should be encoded with UTF-16, but that hasn't been implemented yet, so the API still uses 8-bit characters and effectively treats MOR strings as if they were Windows Code Page 1252. (This means the behavior of NufxLib is essentially unchanged for 3.0 on Windows.)

Data Types and Source Conventions

All API calls and data types begin with "Nu", and all constants start with "kNu". All internal functions start with "Nu_", and any internal data tables with global scope start with "gNu". Hopefully these rules will avoid compile-time and link-time name conflicts.

For details about the fields available in different structures, see the NufxLib.h header file. Everything in NufxLib.h is public.
Most of these types have a direct analog with a field or structure in the NuFX specification.

UNICHAR (char -or- wchar_t): All filenames for "local" files, i.e. files on the Linux or Windows filesystem, should use UNICHAR. On Linux and Mac OS X, where filenames are encoded with UTF-8, this is char. Windows uses UTF-16 encoding, so UNICHAR is meant to be wchar_t there, but as noted above that has not been implemented yet.

NuError (enum): Most library functions return NuError. A value of zero (kNuErrNone) indicates success, anything else indicates failure. Values less than zero are NufxLib errors, while values greater than zero are system errors (like ENOENT).

NuResult (enum): Callback functions return these values to tell NufxLib how things went. For example, an error callback can tell the library to Abort, Retry, or Skip. (Okay, it can Ignore too.)

NuRecordIdx and NuThreadIdx (uint32_t): These are used to identify a specific record or thread in API calls. Their values are assigned when the archive file is read. They aren't reused, so if you delete some records and add some new ones, the indices of the deleted records won't appear again. Do not assume that the indices start at a specific value or are assigned in a particular order. The indices are assigned when the archive is opened, and if you close and reopen the archive, they may be completely different.

NuThreadID (uint32_t): This is a combination of the 16-bit "thread class" and the 16-bit "thread kind". Constants are defined for common values, e.g. kNuThreadIDDataFork (0x00020000) indicates a data fork.

NuThreadFormat (enum): An enumeration of constants representing the 16-bit "thread format" value. This is used to specify a type of compression (uncompressed, LZW/1, LZW/2, etc).

NuFileSysID (enum): An enumeration of GS/OS file system identifiers.

NuStorageType (enum): An enumeration of ProDOS storage types. There are extended (forked) files, directories, and three types of plain files.

NuArchive (opaque struct): This is the fundamental state structure for all API calls. Every call takes one of these as an argument. The structure contains all of the information about the archive and pending operations.

NuCallback (pointer to function): Callback function declarations must match this type. An example would be "NuResult MyFunction(NuArchive* pArchive, void* args)".
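The NuThreadID packing and the NuError sign convention described above can be sketched in a few lines of C. This is only an illustration: the helper names (make_thread_id, thread_class_of, thread_kind_of, classify_error) and the ErrSource enum are hypothetical and are not part of NufxLib. The sketch assumes the thread class occupies the upper 16 bits, which is consistent with the documented value 0x00020000 for a data fork.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 16-bit thread class and a 16-bit thread
 * kind into one 32-bit value, class in the upper half (an assumption
 * consistent with kNuThreadIDDataFork == 0x00020000). */
uint32_t make_thread_id(uint16_t thread_class, uint16_t thread_kind)
{
    return ((uint32_t) thread_class << 16) | (uint32_t) thread_kind;
}

/* Hypothetical helpers: split a packed ID back into its two halves. */
uint16_t thread_class_of(uint32_t id) { return (uint16_t) (id >> 16); }
uint16_t thread_kind_of(uint32_t id)  { return (uint16_t) (id & 0xffff); }

/* Hypothetical classification of a NuError-style value by its sign:
 * zero is success, negative is a library error, positive is a system
 * error such as ENOENT. */
typedef enum { ERR_SRC_NONE, ERR_SRC_NUFXLIB, ERR_SRC_SYSTEM } ErrSource;

ErrSource classify_error(int err)
{
    if (err == 0)
        return ERR_SRC_NONE;
    return (err < 0) ? ERR_SRC_NUFXLIB : ERR_SRC_SYSTEM;
}

/* Values taken from the text: data class is 0x0002, the data fork has
 * kind 0, and the resource fork is a data-class thread with kind 2. */
void check_documented_values(void)
{
    assert(make_thread_id(0x0002, 0x0000) == 0x00020000u);
    assert(thread_kind_of(make_thread_id(0x0002, 0x0002)) == 2);
}
```

In real code the constants from NufxLib.h should be used directly; the helpers only show how the two 16-bit fields relate to the 32-bit ID.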
NuValueID (enum): An identifier for settable values. You can change certain NufxLib parameters after opening an archive. This enum is how you specify which parameter you want to change.

NuValue (uint32_t): The new value for the parameter specified by the NuValueID.

NuAttrID (enum): An identifier for archive attributes. You can get information about archive attributes (characteristics of the archive itself) through a NufxLib interface. This type has an enumeration of the legal values.

NuAttr (uint32_t): The value for the attribute specified by the NuAttrID is placed in one of these.

NuDataSource (opaque struct): Some of the fancier NufxLib calls allow you to use data from a file on disk, a file that's already open, or a buffer of memory. This struct contains that specification.

NuDataSink (opaque struct): Like NuDataSource, this specifies a data location. This struct is for data being extracted.

NuDateTime (struct): This holds the date and time in an expanded format, using the same structure as TimeRec from "misctool.h" on the IIgs.

NuThread (struct): The fields from the thread header, as well as a few new ones like the absolute file offset, are accessible.

NuRecord (struct): This has all of the fields from the NuFX Record structure, as well as some convenience fields (like "filename", which always points to the right filename whether it was stored in the record header or came out of a thread). Some calls cause a NuRecord structure to be passed to a callback function, where it can be accessed directly. The Threads are represented as an array of NuThread structures attached to the NuRecord.

NuMasterHeader (struct): This holds the data from the archive's master header block.

NuRecordAttr (struct): Some of the fields in a NuRecord can be changed, such as the file type and modification date. This structure contains the modifiable fields, and is used as an argument to two of the API calls.
NuFileDetails (struct): When adding files, it is up to the application to supply many of the details about the file, such as the file type, access permissions, and modification date. This structure provides a way to pass those values into the library.

NuSelectionProposal (struct): Selection callback functions receive one of these.

NuPathnameProposal (struct): Pathname filter callback functions receive one of these.

NuProgressData (struct): Progress update callback functions receive one of these.

NuProgressState (enum): A component of NuProgressData, this tells the callback function what the library is doing.

NuErrorStatus (struct): Error handling callback functions receive one of these.

Files are referenced with standard libc FILE* pointers. The library uses fseek and ftell, which are defined by POSIX to take a signed long integer for the offset argument, so archives larger than 2GB cannot be handled.

ReadOnly Interfaces

These interfaces can be used on read-only and read-write archives. A subset, described later, can also be used on streaming-read-only archives.

NuError NuOpenRO(const UNICHAR* archivePathnameUNI, NuArchive** ppArchive)

Creates a new NuArchive structure for the "archivePathnameUNI" file. The file will be opened in read-only mode. Attempting to use ReadWrite interfaces on a read-only archive will fail.

NuError NuContents(NuArchive* pArchive, NuCallback contentFunc)

Read the list of entries from the archive. If the full table of contents has already been read, the in-memory copy will be used.

"contentFunc" is a callback function that will be called once for every record in the archive. The callback function should look something like this:
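A minimal sketch of such a callback follows. So that it compiles on its own, it declares small stand-in types instead of including NufxLib.h; with the real library, delete the stand-ins and include the header. The function name ShowContents, the reduced NuRecord with a single "filename" field, and the NuResult constant names are assumptions made for illustration, so check NufxLib.h for the actual definitions (the v3 header may use a character-set-specific field name).

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in declarations so this sketch is self-contained. They are
 * assumptions, not the library's definitions; with the real library,
 * remove them and #include "NufxLib.h" instead. */
typedef struct NuArchive NuArchive;               /* opaque archive state */
typedef enum NuResult { kNuOK = 0, kNuSkip, kNuAbort } NuResult;
typedef struct NuRecord {
    const char* filename;                         /* convenience field */
} NuRecord;

/* Matches the NuCallback shape: NuResult fn(NuArchive*, void*).
 * The record arrives as a void* and is cast back inside the function. */
NuResult ShowContents(NuArchive* pArchive, void* vpRecord)
{
    const NuRecord* pRecord = (const NuRecord*) vpRecord;

    (void) pArchive;                              /* unused in this sketch */
    assert(pRecord != NULL);

    /* Copy out or print anything of interest now; the record must not
     * be retained after the callback returns. */
    printf("*** Filename = '%s'\n", pRecord->filename);
    return kNuOK;
}
```

With the real header in place, the function would be passed as the "contentFunc" argument, e.g. NuContents(pArchive, ShowContents).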
(Depending on your compiler, you may have to declare "pRecord" as a void* and cast it in the function.)

The record passed to the callback function does not reflect the results of any un-flushed changes. Additions, deletions, and updates will not be visible until NuFlush is called.

The application must not attempt to retain a copy of "pRecord" after the callback returns, as the structure may be freed. Anything of interest should be copied out.

NuError NuExtract(NuArchive* pArchive)

Try to extract all files from the archive. Each entry is passed through the SelectionFilter callback, if one has been supplied, to determine whether or not it should be extracted. The OutputPathnameFilter callback is invoked to convert the filenames to something appropriate for the target filesystem.

On systems that support forked files, a record with both data and resource forks can be extracted to the individual forks of the same file. On systems without native support for forks, the data can be extracted into two different files by using the OutputPathnameFilter. If the system doesn't support forks, and no OutputPathnameFilter is specified, then the forks will be extracted into the same file. Depending on the value of the kNuValueHandleExisting parameter, this could result in one fork overwriting the other, in one fork not getting extracted, or in the HandleError callback getting invoked. (The HandleError callback can choose to rename the file, overwrite it, skip the current entry, or abort the entire process.)

The global EOL conversion setting is applied to all threads, but is automatically turned off for disk image threads.

NuError NuExtractRecord(NuArchive* pArchive, NuRecordIdx recordIdx)

Extract a single record. Otherwise identical to NuExtract. The SelectionFilter callback, if specified, will be invoked.

There are a number of ways to get the recordIdx. You can call NuContents and use the callback to find the one you want.
You can get the recordIdx by the filename stored in the archive, with NuGetRecordIdxByName. Or, you can get it by the record's position in the archive, using NuGetRecordIdxByPosition.

NuError NuExtractThread(NuArchive* pArchive, NuThreadIdx threadIdx, NuDataSink* pDataSink)

Extract a single thread. Specify the thread index and a place to put the data. The SelectionFilter callback, if specified, will be invoked.

Remember that, if EOL conversion is enabled in the data sink, the amount of data that comes out of a thread may not match pThread's "actualThreadEOF" value.

(In some ways it doesn't really make sense to call the SelectionFilter callback when a specific thread has been singled out for extraction. However, it's easy to disable (set the callback to NULL), it may prove useful, and it keeps the interface consistent.)

NuError NuTest(NuArchive* pArchive)

The NuTest call is functionally equivalent to NuExtract in every way but one: it doesn't actually extract anything. If you want to test a subset of the files, supply a SelectionFilter callback.

This won't test filenames or comments because those aren't extracted by NuExtract. However, since such threads don't have CRCs, there's really nothing to test anyway. The parts that can be tested for correctness are verified automatically when the archive table of contents is read.

NuError NuTestRecord(NuArchive* pArchive, NuRecordIdx recordIdx)

A single-record version of NuTest.

NuError NuGetRecord(NuArchive* pArchive, NuRecordIdx recordIdx, const NuRecord** pRecord)

Get a pointer to the record header. The thread array can be accessed through this pointer.

As with callbacks, when you get a const pointer, it is very important that you don't try to modify it. The structure pointed to is part of the current archive state, so the effects of changes are unpredictable. If you wish to alter fields in the Record header, use the NuSetRecordAttr call.

IMPORTANT: you must discard this pointer if you call NuFlush or NuClose.
NuError NuGetRecordIdxByName(NuArchive* pArchive, const char* nameMOR, NuRecordIdx* pRecordIdx)

Get the recordIdx for the first record in the archive whose case-insensitive filename matches "nameMOR". The value retrieved can be used with any call that takes a NuRecordIdx argument.

The "nameMOR" string must match the record's entire filename (apart from case), including the filename separator character.

If you know what you want to extract from an archive by name, use this.

NuError NuGetRecordIdxByPosition(NuArchive* pArchive, uint32_t position, NuRecordIdx* pRecordIdx)

Get the recordIdx for the nth record in the archive. "position" is zero-based, meaning the very first record in the archive is at position 0, the next is at position 1, and so on. The value retrieved can be used with any call that takes a NuRecordIdx argument.

This could be useful when an application is certain that it is only interested in the very first record in the archive, e.g. an Apple II emulator opening a disk image.

StreamingReadOnly Interfaces

A streaming archive is presented to the library as a FILE* that can't be seeked, generally because it was handed to the application via a pipe or shell redirect. A subset of the ReadOnly interfaces is supported. All of them leave the stream pointed at the first byte past the end of the archive.

These calls are also useful for files on disk in situations where memory is at a premium. Because it's impossible to seek backwards in the archive, no attempt is made to remember anything about records other than the one most recently read.

The interfaces supported are:
There is one interface that only applies to StreamingReadOnly archives:

NuError NuStreamOpenRO(FILE* infp, NuArchive** ppArchive)

Creates a new NuArchive structure for "infp". The file must be positioned at the start of the archive.

It should be possible to concatenate multiple archives together, and use them by issuing consecutive NuStreamOpenRO calls.

If your system requires fopen(filename, "rb") instead of "r" (e.g. Win32), make sure the archive file was opened with "b", or you may get "unexpected EOF" complaints.

ReadWrite Interfaces

NuError NuOpenRW(const UNICHAR* archivePathnameUNI, const UNICHAR* tempPathnameUNI, uint32_t flags, NuArchive** ppArchive)

Open a file for read-write operations. A pointer to the new archive is returned via "ppArchive".

"archivePathnameUNI" is the name of the archive to open. If the file has zero length, the archive will be treated as if NufxLib had just created it.

"tempPathnameUNI" is the name of the temp file to use. The call will fail if the temp file already exists. The temp file must be in a location that allows it to be renamed over the original archive when a "flush" operation has completed. If "tempPathnameUNI" ends in six 'X's, e.g. "tmpXXXXXX", the name will be treated as a mktemp-style pattern, and a unique six-character string will be substituted before the file is opened. Note that the temp file will be opened even if "kNuValueModifyOrig" is set.

"flags" is a bit vector of boolean flags that affect how the archive is opened. If no flags are set, and the archive doesn't exist, the call will fail. If "kNuOpenCreat" is set, the archive will be created if it doesn't exist. If "kNuOpenCreat" and "kNuOpenExcl" are both set, the call will fail if "archivePathnameUNI" already exists (i.e. the archive *must* be created).

If the archive was just created, "kNuValueModifyOrig" will be set to "true".
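The flag rules and the temp-file pattern rule for NuOpenRW can be modeled with two small predicates. This is a sketch of the rules as stated above, not NufxLib's implementation: OPEN_CREAT and OPEN_EXCL are hypothetical stand-ins for kNuOpenCreat and kNuOpenExcl (their real values live in NufxLib.h), and predict_open and is_temp_pattern are illustrative names. The text does not say what the exclusive flag does on its own, so the sketch ignores it unless the create flag is also set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kNuOpenCreat / kNuOpenExcl; the values
 * here are assumptions chosen only for this sketch. */
enum { OPEN_CREAT = 0x1, OPEN_EXCL = 0x2 };

typedef enum { OPEN_FAILS, OPENS_EXISTING, CREATES_NEW } OpenOutcome;

/* Predict what an open request does, following the documented rules:
 *  - archive missing, create flag clear    -> the call fails
 *  - archive missing, create flag set      -> the archive is created
 *  - archive present, create+excl both set -> the call fails
 *  - archive present otherwise             -> the archive is opened */
OpenOutcome predict_open(bool archive_exists, unsigned flags)
{
    if (archive_exists) {
        if ((flags & OPEN_CREAT) && (flags & OPEN_EXCL))
            return OPEN_FAILS;
        return OPENS_EXISTING;
    }
    return (flags & OPEN_CREAT) ? CREATES_NEW : OPEN_FAILS;
}

/* True if the temp pathname ends in six 'X' characters, i.e. it would
 * be treated as a mktemp-style pattern. */
bool is_temp_pattern(const char* path)
{
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    return len >= 6 && strcmp(path + len - 6, "XXXXXX") == 0;
}
```

For example, predict_open(false, 0) yields OPEN_FAILS, and is_temp_pattern("tmpXXXXXX") is true while a name ending in only five 'X's is taken literally.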
NufxLib can tell the difference between a BXY file (NuFX in a Binary II wrapper) and a BNY file with several entries whose first entry happens to be a NuFX archive. Access to BNY files that happen to have a ShrinkIt archive in them isn't supported.

NuError NuFlush(NuArchive* pArchive, long *pStatusFlags)

Commits all pending write operations.

"pStatusFlags" gets a bit vector of flags regarding the status of the archive. If a non-kNuErrNone result is returned, "pStatusFlags" may contain one or more of the following:
Some of the above are mutually exclusive, e.g. only one of kNuFlushSucceeded, kNuFlushAborted, and kNuFlushCorrupted will be set.

Any records without threads -- either created that way or having had all threads deleted -- will be removed. Newly-created records without filename threads will have one added. (Existing records without filenames are frowned upon but left alone.)

Normally, the archive is reconstructed in the temp file, and the temp file is renamed over the original archive after all of the operations have completed successfully.

As a performance optimization, if kNuValueModifyOrig is "true", NuFlush will try to modify the archive in place. This is only possible if the changes made to the archive consist entirely of additions of new files, updates to pre-sized threads, and/or setting record attributes. If other changes have been made, the update will be done through the temp file.

If an operation fails during the flush, all changes will be aborted. If something fails in a way that can't be recovered from, such as failing to rename the temp file after a successful flush or failing partway through an update to the original archive, the archive may be switched to read-only mode to prevent future operations from compounding the problem.

NuError NuAddRecord(NuArchive* pArchive, const NuFileDetails* pFileDetails, NuRecordIdx* pRecordIdx)

Add a new record with no threads. The index of the created record is returned in "pRecordIdx". This always creates a "version 3" record, and expects that the filename will be stored in a thread.

"pFileDetails" is a pointer to a NuFileDetails structure. This contains most of the interesting fields in a record, such as access flags, dates, file types, and the filename. The "threadID" field is ignored for this call.

"pRecordIdx" may be NULL. However, the only way to add threads to the record is with NuAddThread, which requires the record index as a parameter, so you almost certainly want to get this value.
If no filename thread is added, the NuFlush call will use the "storageName" field from the "pFileDetails" parameter to create a filename thread for it. If no threads are added at all, the NuFlush call will throw the record away.

The "pFileDetails->storageName" may not start with the filename separator character, e.g. "/tmp/foo" is illegal but "tmp/foo" is okay.

If a disk image thread is added to the record, and the "storageType" and "extraType" values set by "pFileDetails" aren't compatible, the entries will be replaced with values appropriate for the thread. For records with non-disk data-class threads, the storageType will be adjusted when necessary.

Depending on the values of kNuValueAllowDuplicates and kNuValueHandleExisting, this may replace an existing record in the archive. See Replacing Existing Records and Files for details.

NuError NuAddThread(NuArchive* pArchive, NuRecordIdx recordIdx, NuThreadID threadID, NuDataSource* pDataSource, NuThreadIdx* pThreadIdx)

Add a new thread to a record. You may add threads to an existing record or a newly created one. Some combinations of threads are not allowed, and will cause an error to be returned. (See the NuFX Addendum for details.)

"recordIdx" is the index of the record being added to.

"threadID" is the class and kind of the thread being added. This defines how the data is labeled in the archive, and whether the contents of "pDataSource" are to be regarded as pre-sized or not.

"pDataSource" is where the data comes from. If the source is uncompressed, the thread will be compressed with the compression value currently defined by kNuValueDataCompression. (You can set the value independently for each call to NuAddThread.) Only data-class threads will be compressed. If you're adding a pre-sized thread, such as a comment or filename, set the "otherLen" field in the data source.

"pThreadIdx" gets the thread index of the newly created thread. This parameter may be set to NULL.
Threads will be arranged in an appropriate order that may not be the same as the order in which NuAddThread was called.

If "threadID" indicates the thread is a disk image, then the uncompressed length must either be a multiple of 512 bytes, or must be equal to (recExtraType * recStorageType) in the record header.

On successful completion, the library takes ownership of "pDataSource". The structure will be freed after a NuFlush call completes successfully or all changes are aborted. Until NuFlush or NuAbort completes, it is vital that you don't free the underlying resource. That is, don't close the FILE*, delete the file, or free the buffer that the data source references. If you don't want to keep track of the resources used by FP and Buffer sources, you can specify "fcloseFunc" or "freeFunc" functions to have them released automatically. See the explanation of NuDataSource for details.

NuError NuAddFile(NuArchive* pArchive, const UNICHAR* pathnameUNI, const NuFileDetails* pFileDetails, short fromRsrcFork, NuRecordIdx* pRecordIdx)

Add a file to the archive. This is a combination of NuAddRecord and NuAddThread, but goes a little beyond that. If you add a file whose pFileDetails->threadID indicates a data fork, and another file whose pFileDetails->threadID indicates a resource fork, and both files have the same pFileDetails->storageName, then the two files will be combined into a single record.

"pathnameUNI" is how to open the file. It does not have any bearing on the filename stored in the archive. Because all write operations are deferred, NufxLib will not open or even test the existence of the file before NuFlush is called.

"pFileDetails" describes the file types, dates, and access flags associated with the file, as well as the filename that will be stored in the archive ("storageName"). If two forks are placed in the same record, whichever was added first will determine the record's characteristics.
"fromRsrcFork" should be set if NufxLib should get the data out of the "pathnameUNI" file's resource fork. If the underlying filesystem doesn't support resource forks, then the argument has no effect. It does not have any impact on whether the data is stored as a data fork thread or resource fork thread -- that is decided by the "threadID" field of "pFileDetails".

"pRecordIdx" gets the record index of the new (or existing) record. This argument may be NULL.

The "pFileDetails->storageName" may not start with the filename separator character, e.g. "/tmp/foo" is illegal but "tmp/foo" is okay.

If "pFileDetails->threadID" indicates the thread is a disk image, then the uncompressed length must either be a multiple of 512 bytes, or must be equal to recExtraType * recStorageType.

On systems with forked files, such as GS/OS and Mac OS, it will be necessary to call NuAddFile twice on forked files. The call will automatically join forks with identical names.

Depending on the values of kNuValueAllowDuplicates and kNuValueHandleExisting, this may replace an existing record in the archive. See Replacing Existing Records and Files for details.

Adding a directory will not cause NufxLib to recursively descend through the directory hierarchy. That's the application's job. Requests to add directories are currently ignored. [A future release may add a "create directory" control thread, so we can store empty directories.]

NuError NuRename(NuArchive* pArchive, NuRecordIdx recordIdx, const char* pathnameMOR, char fssep)

Rename an existing record. Pass in the index of the record to update, the new name, and the filename separator character. Setting the name to an empty string is not permitted.

This call will do one of three things to the archive. If a filename thread is present in the record, and it has enough room to hold the new filename, then the existing thread will be updated.
If a filename thread is present, but doesn't have enough space to hold the new name, then the existing thread will be deleted and a new filename thread will be added. Finally, if no filename thread is present, a new one will be added, and the filename in the record header (if one was set) will be dropped.

NufxLib does not currently test for the existence of records with an identical name. This is probably a bug (ought to obey the kNuValueAllowDuplicates setting).

NuError NuSetRecordAttr(NuArchive* pArchive, NuRecordIdx recordIdx, const NuRecordAttr* pRecordAttr)

Set a record's attributes. The fields in the NuRecordAttr struct replace the fields in the record. This can be used to change filetypes, modification dates, access flags, and the file_sys_id field. The changes become visible to NuContents calls only after NuFlush is called.

You can fill in values in the NuRecordAttr from a NuRecord struct with the NuRecordCopyAttr call.

NuError NuUpdatePresizedThread(NuArchive* pArchive, NuThreadIdx threadIdx, NuDataSource* pDataSource, long* pMaxLen)

Update the contents of a pre-sized thread. This can only be used on filename and comment threads. Attempting to use it on other threads results in a kNuErrNotPreSized return value.

"threadIdx" is the index of the thread to update, and "pDataSource" is where the data comes from. The "otherLen" field in "pDataSource" is ignored, because this call cannot be used to resize an existing thread. (The only way to do that is to delete the thread and then create a new one.)

"pMaxLen" will hold the maximum size of the thread if the call succeeds. If the call fails because the existing thread is too small, kNuErrPreSizeOverflow is returned and "pMaxLen" will be valid. (You can also get the size by examining the thread's thCompThreadEOF field.)

This cannot be used on newly-added, deleted, or updated threads.

On successful completion, the library takes ownership of "pDataSource".
The structure will be freed after a NuFlush call completes successfully or all changes are aborted. Until NuFlush or NuAbort completes, it is vital that you don't free the underlying resource. That is, don't close the FILE*, delete the file, or free the buffer that the data source references. If you don't want to keep track of the resources used by FP and Buffer sources, you can specify "fcloseFunc" or "freeFunc" functions to have them closed automatically. See the explanation of NuDataSource for details.
NuError NuDelete(NuArchive* pArchive)
Bulk delete. This tries to delete every record in the archive, invoking the SelectionFilter callback if one has been specified.
You cannot delete a record that is newly-added, has been modified, has already been deleted, or has had threads added, deleted, or updated. Such records will be skipped over, so your selection filter simply won't see them.
Because deletion is a deferred write operation, none of the records will actually be deleted until NuFlush is called. If NuDelete was successful in its attempt to delete every record, and no new records were added, the NuFlush call will mark the archive as being brand new (this differs from v1.0, which failed with kNuErrAllDeleted). As a result, if you close the empty archive without adding anything to it, the archive file will be removed.
NuError NuDeleteRecord(NuArchive* pArchive, NuRecordIdx recordIdx)
Delete a single record, specified by record index.
You cannot delete a record that is newly-added, has been modified, has already been deleted, or has had threads added, deleted, or updated.
The record will be removed when NuFlush is called.
NuError NuDeleteThread(NuArchive* pArchive, NuThreadIdx threadIdx)
Delete a single thread, specified by thread index. If you delete all of the threads in a record, and don't add any new ones, the record will be removed.
You cannot delete a thread that is newly-added, deleted, or has been updated.
The thread will not be removed until NuFlush is called.
General Interfaces
Archive Operations
NuError NuClose(NuArchive* pArchive)
Closes the archive. If the archive was opened read-write, any pending changes will be flushed first. If the flush attempt fails, NuClose will leave the archive open and return with an error.
When the archive is closed, the temp file associated with a read/write archive will be closed and removed. All data structures associated with the archive are freed. Attempting to use "pArchive" further results in an error (or worse).
NuError NuAbort(NuArchive* pArchive)
Abort all pending changes. NufxLib will throw out every pending modification request, returning to the state it was in following the most recent Open or Flush.
This does not close or manipulate any files, except for those pointed to by data sources with "fcloseFunc" set. For the most part it simply updates internal data structures.
It's perfectly safe to call this if there are no pending changes. The call just returns without doing anything.
NuError NuGetMasterHeader(NuArchive* pArchive, const NuMasterHeader** ppMasterHeader)
Get a pointer to the NuFX MasterHeader block. One useful item here is the number of records in the archive.
IMPORTANT: do not retain the pointer after calling NuFlush or NuAbort.
NuError NuGetExtraData(NuArchive* pArchive, void** ppData)
|
kNuValueIgnoreCRC | Boolean (false). Don't verify header or data CRCs. This can provide a minor speed improvement, but allows certain kinds of errors to go undetected. |
kNuValueDataCompression | Enum (kNuCompressLZW2). Threads that can be compressed
(i.e. data-class threads) will be compressed with the specified
compression. Possible values are:
|
kNuValueDiscardWrapper | Boolean (false). If changes are made to the archive that cause a new copy to be reconstructed in the temp file, then when this is set to "true" any BXY, BSE, or SEA wrapper will be stripped off. This also causes any "junk" at the start of the file to be removed. |
kNuValueEOL | Enum (system-dependent). End-of-line marker
appropriate for the current system. If EOL conversion is enabled,
extracted files will be converted to this EOL value. Valid values
are:
|
kNuValueConvertExtractedEOL | Enum (kNuConvertOff). This determines whether
"bulk" extractions do EOL conversions. Possible values:
|
kNuValueOnlyUpdateOlder | Boolean (false). If set, only overwrite existing records and files if the item being added or extracted is newer than the one being replaced. Useful for an "update" or "freshen" option. The date used for comparison is the modification date. |
kNuValueAllowDuplicates | Boolean (false). If set to "true", duplicate records are allowed in the archive. If "false", the collision will be handled according to the kNuValueHandleExisting setting. Filename comparisons are case-insensitive. |
kNuValueHandleExisting | Enum (kNuMaybeOverwrite). This determines how
duplicate filename collisions are handled. Valid values:
The case sensitivity when extracting is determined by the underlying filesystem. |
kNuValueModifyOrig | Boolean (false, unless the archive was just created by NufxLib). If this is "true", then an effort will be made to handle all updates in the original archive, rather than reconstructing the entire archive in a temp file. Updates to pre-sized threads, changes to record attributes, and additions of new files can all be made to the original archive. There is some risk of corruption if the flush fails, so use this with caution. |
kNuValueMimicSHK | Boolean (false). If set, attempt to mimic the behavior of ShrinkIt as closely as possible. See the ShrinkIt Compatibility Mode section. |
kNuValueMaskDataless | Boolean (false). If set to "true", records without data threads have "fake" threads created for them, so that they appear as they would had they been created correctly. |
kNuValueStripHighASCII | Boolean (false). If set to "true", files filled with high-ASCII characters will be stripped if and only if an EOL conversion is performed. |
kNuValueJunkSkipMax | Integer (1024). If the archive file doesn't start with a recognized sequence, NufxLib will assume that some junk has been added to the start of the file and will scan forward at most this many bytes in an attempt to locate the real archive start. |
kNuValueIgnoreLZW2Len | Boolean (false). If set to "true", the length value embedded in LZW2 compressed chunks is ignored. This is useful for archives created with a specific broken application. (This is deprecated -- use HandleBadMac instead.) |
kNuValueHandleBadMac | Boolean (false). Recognize and handle "bad Mac" archives, which have a bad value ('?') for the filename separator character, and write an LZW/2 length value in big-endian order. |
kNuAttrArchiveType | Returns one of the following:
|
kNuAttrNumRecords | Returns the number of records in the archive. This value does not reflect unflushed changes. |
kNuAttrHeaderOffset | Returns the offset of the NuFX header from the start of the file. This will be nonzero for archives with a Binary II or self-extracting wrapper. |
kNuAttrJunkOffset | Returns the amount of junk found at the start of the file. A nonzero value here indicates that junk was found. |
kNuFeatureCompressSQ | Test for support of SQueeze compression |
kNuFeatureCompressLZW | Test for support of ShrinkIt LZW/1 and LZW/2 compression |
kNuFeatureCompressLZC | Test for support of 12- and 16-bit LZC |
kNuFeatureCompressDeflate | Test for support of zlib "deflate" |
kNuFeatureCompressBzip2 | Test for support of libbz2 "bzip2" |
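The forward scan controlled by kNuValueJunkSkipMax, described in the settings table above, can be pictured with a small self-contained sketch. This is not NufxLib's code: the helper name find_archive_start is hypothetical, and the sketch only looks for the six-byte NuFX master header ID ("NuFile" with the high bit set on alternating bytes), whereas the library also has to recognize wrapped archives such as Binary II.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NuFX master header ID: "NuFile" with the high bit set on alternating bytes. */
static const unsigned char kNuFileID[6] = { 0x4e, 0xf5, 0x46, 0xe9, 0x6c, 0xe5 };

/*
 * Hypothetical helper (not part of NufxLib): return the offset of the first
 * master header ID in "buf", skipping at most "junkSkipMax" leading junk
 * bytes.  Returns -1 if no ID starts within that window.
 */
long find_archive_start(const unsigned char* buf, size_t len, size_t junkSkipMax)
{
    size_t off;

    assert(buf != NULL);
    for (off = 0; off <= junkSkipMax && off + sizeof(kNuFileID) <= len; off++) {
        if (memcmp(buf + off, kNuFileID, sizeof(kNuFileID)) == 0)
            return (long) off;
    }
    return -1;
}
```

A nonzero result plays the same role as the value reported by kNuAttrJunkOffset: it says how many bytes of junk precede the archive.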
Replacing Existing Records and Files
When using NuAddFile or NuAddRecord, there are three flags that affect what happens when an existing record has the same name:
The AllowDuplicates flag determines whether or not we think duplicate records are at all interesting. If an application sets it to true, the floodgates are opened, and the two other flags are ignored.
The OnlyUpdateOlder flag is considered next. If it's set to true, and an existing, identically named file in the archive appears to be the same age or newer than the file being added, the record creation attempt fails with an error (kNuErrNotNewer).
The HandleExisting flag comes into play if we get past the first two. If a matching entry is found in the archive, NufxLib either deletes it and allows the add, prompts the user for instructions, or rejects it with an error. NuAddFile and NuAddRecord will return with kNuErrRecordExists if they can't replace an existing record. If "must overwrite" is set, and a matching record does not exist, then kNuErrDuplicateNotFound is returned.
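The order in which the three flags are consulted can be summarized as a small decision function. The following is a self-contained model of the rules described above, not code from NufxLib: every type and function name is a stand-in invented for the sketch, and the mapping of the "maybe" setting to prompting the user is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kNuValueHandleExisting settings (model names, not NufxLib's). */
typedef enum {
    kModelMaybeOverwrite,   /* assumed: ask the user for instructions */
    kModelNeverOverwrite,   /* reject the add */
    kModelAlwaysOverwrite,  /* delete the existing record, allow the add */
    kModelMustOverwrite     /* like "always", but fail if nothing matches */
} ModelHandleExisting;

/* Possible results; the error outcomes mirror the error codes named in the text. */
typedef enum {
    kOutcomeAddNew, kOutcomeReplace, kOutcomeAskUser,
    kOutcomeNotNewer,           /* kNuErrNotNewer */
    kOutcomeRecordExists,       /* kNuErrRecordExists */
    kOutcomeDuplicateNotFound   /* kNuErrDuplicateNotFound */
} ModelOutcome;

ModelOutcome decide_add(bool allowDuplicates, bool onlyUpdateOlder,
    ModelHandleExisting handleExisting, bool matchExists, bool addedIsNewer)
{
    /* 1. AllowDuplicates: floodgates open, the other two flags are ignored. */
    if (allowDuplicates)
        return kOutcomeAddNew;

    /* No collision: only "must overwrite" objects. */
    if (!matchExists)
        return (handleExisting == kModelMustOverwrite)
            ? kOutcomeDuplicateNotFound : kOutcomeAddNew;

    /* 2. OnlyUpdateOlder: same age or newer in the archive means failure. */
    if (onlyUpdateOlder && !addedIsNewer)
        return kOutcomeNotNewer;

    /* 3. HandleExisting decides what happens to the matching record. */
    switch (handleExisting) {
    case kModelNeverOverwrite:  return kOutcomeRecordExists;
    case kModelMaybeOverwrite:  return kOutcomeAskUser;
    case kModelAlwaysOverwrite: return kOutcomeReplace;
    case kModelMustOverwrite:   return kOutcomeReplace;
    }
    assert(0 && "unknown ModelHandleExisting value");
    return kOutcomeRecordExists;
}
```

For example, decide_add(false, true, kModelAlwaysOverwrite, true, false) yields kOutcomeNotNewer, because the age check is made before HandleExisting is consulted.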
Both AddFile and AddRecord check for duplicates among existing and newly added files. You aren't allowed to delete items that were just added, so the HandleExisting flag is ignored for files you have marked for addition but haven't yet flushed.
AddFile has an additional behavior that takes precedence over all of the flags: it will try to match up the individual forks of a file. If it finds a file in the newly-added list with the same name and a compatible data thread, the new file will be added to the existing record. (A "compatible" data thread is the other half of a forked file, e.g. the application added the data fork, and is now adding the resource fork from the same file.) If the record was found but is not compatible, the AllowDuplicates behavior is used to decide if another "new" record with the same name should be added, or if an error should be returned.
If this sort of treatment is undesirable, i.e. you want a data fork and a resource fork with the same filename to be stored as two separate records, then you should call AddRecord and AddThread. AddFile is meant as a convenience for common operations.
It is possible for NuAddFile and NuAddRecord to partially complete. If a record exists and is deleted, but the call later fails for some other reason, the record will still be deleted.
Searching for existing records can take time on a large archive. Enabling AllowDuplicates will allow NufxLib to avoid having to search through the lists of records to find matches.
When extracting files from an archive, the "OnlyUpdateOlder" and "HandleExisting" flags are applied to the files on disk. This is done much like the above.
To implement NuLib2's "update" feature, "OnlyUpdateOlder" needs to be set to true. To implement the "freshen" feature, "OnlyUpdateOlder" is set to true and "HandleExisting" is set to "must overwrite".
When extractions are done in bulk, the kNuErrDuplicateNotFound and kNuErrNotNewer errors are passed to the application's error handler function. The error handler is expected to return kNuSkip after perhaps updating the progress status message, but is allowed to abort or require NufxLib to overwrite the file anyway. If no error handler is defined, the file is skipped silently.
ShrinkIt Compatibility Mode
One of the goals was to be as compatible with ShrinkIt as possible. ShrinkIt and GS/ShrinkIt occasionally do some strange things, so some of the compatible behaviors are only activated when the "mimic ShrinkIt" flag is set.
These behaviors are:
Some GS/ShrinkIt behaviors are not fully emulated:
Regarding the last item: a quick test with a handful of empty files showed that GS/ShrinkIt v1.1 failed to extract the empty files it had just archived. P8 ShrinkIt v3.4 gets really confused on such archives, and insists that the first entry is a zero-byte disk archive, while the other empty files are actually four bytes long. When asked to extract the files, it does nothing. When adding empty files, P8 ShrinkIt v3.4 does the correct thing, and creates an empty data thread.
The default NufxLib behavior is to work around the bug. When extracting files with a filename but no data or control threads, a zero-byte data file will be created. (In NufxLib v1.0 the default was to ignore such entries unless the "mimic" flag was enabled. This was changed in v1.1 to be enabled at all times. As of v1.2, an empty resource fork is also created if the record's storage type indicates it's an extended file.) If the "MaskDataless" flag is enabled, fake data threads are created, and applications won't even know there's a problem in the archive.
In general, with "mimic ShrinkIt" mode enabled, it should be possible to extract files from a GS/ShrinkIt archive and re-add them to a new archive, with little perceptible difference between the two. Of course, it's up to the application to ensure that all threads (including comments) are retained, file dates aren't altered, and so on. The only situations where NufxLib cannot produce identical results are bugs (e.g. zero-length data files always require more space) and option lists (which NufxLib does not currently support).
The bottom line: it is perfectly normal for NufxLib archives to be a few bytes smaller than GS/ShrinkIt archives, even when "mimic ShrinkIt" mode is enabled. (An example: my 20MB boot partition compressed to about 14MB. With "mimic" mode off, the file was 13K smaller, or about 0.1%.)
Of the various compression formats that NufxLib supports, only LZW/1 and LZW/2 are widely supported. The latest versions of ShrinkIt and II Unshrink try to unpack SQ compression but fail. Archives that use SQ, LZC12, and LZC16 can only be unpacked by GS/ShrinkIt, NuLib, and NufxLib v1.1 and later.
The "deflate" and "bzip2" algorithms are not supported by anything other than NufxLib v1.1+ and CiderPress. They are intended to be used with archives that will never be unpacked on an Apple II. Disk images compressed with these algorithms are especially useful with emulators that use NufxLib.
Some tests with deflate and bzip2 showed that, surprisingly, deflate is nearly always better than bzip2 for Apple II files. This is because deflate appears to do slightly better on machine code and small (< 32K) text files. Since most Apple II files and disk images fall into these categories, there is little advantage to using bzip2. Because deflate uses less memory and is faster, and libbz2 isn't nearly as ubiquitous as libz, I've chosen to disable bzip2 by default.
You can use the NuTestFeature call to test for the presence of any of the compression algorithms. This makes it possible to build a library or DLL without LZW in it.
NufxLib v1.0 was developed under Solaris 2.5 and Red Hat Linux 6.0, and was ported to Win32 shortly before the alpha release. Porting to other UNIX-like platforms has been straightforward, with most differences contained in the "autoconf" configuration system. For example, the BeOS/PPC port was largely a matter of getting the compiler settings right.
Mac OS and GS/OS have the ability to store file types and resource forks natively. Support for this is not currently part of NufxLib. A data-fork-only port, akin to what is used on UNIX and Win32, should be straightforward though. (In fact, Mac OS X "just worked".)
Once upon a time a GS/OS port was imagined. This never happened, and likely never will.
The decision to pass FILE* structures instead of file descriptors was somewhat arbitrary. The library uses buffered I/O internally, so it was convenient to have them passed in, rather than having an fd passed in and relying on the existence of an fdopen() call. On the other hand, if an application is built with a different version of the stdio library (in which the structure of a FILE is different), linking with NufxLib might not work. Given that NufxLib is distributed as BSD-licensed source code, I don't see this as being a major problem, since you can always rebuild NufxLib with the altered stdio lib. (This caused me some grief under Windows, because the non-debug multithreaded DLL version of libc apparently does something wonky with FILE* and fwrite. If the Win32 DLL and Win32 app aren't linked against the same libc, fwrite() will crash. Other versions of libc, e.g. debug multithreaded and debug single-threaded, interact just fine with each other.) My conclusion after fighting with Win32 is that it would have been better to pass file descriptors or a "NuFILE*" with read/write/seek operations that reside wholly within the NufxLib library.
The decision to pass data sources and sinks around as structures rather than as function pointers was born of a desire to reduce complexity. Setting up a data source or sink requires making a function call with a lot of arguments, but once that's done you can forget all about it -- the code will happily close your files and free your memory when you're done with it. A functional interface would require passing in read, write, close, and seek functions, which gives the application more flexibility but essentially requires the application to implement its own version of the data source and sink structures. Since NufxLib is intended for manipulating archives, not compressing streams of data, the added flexibility did not justify the cost. (I'm becoming less certain of that as time goes by. If I had it all to do again, I probably would use the functional interface for all file accesses.)
It might have been useful to allow read/write/seek hooks for the archive itself. The current architecture prevents you from processing an archive that has been loaded into memory, unless you have memory-based FILE* streams in your libc. This became annoying during the development of CiderPress, because I wanted to handle archives within a wrapper, such as ".shk.gz".
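To make the idea of read/seek hooks concrete, here is a minimal sketch of what such an interface could look like when backed by an archive image held in memory. Nothing like this exists in NufxLib; ArchiveHooks, MemImage, and mem_hooks are hypothetical names, and a real design would also need write and close operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical hook table the library could call instead of using FILE*. */
typedef struct ArchiveHooks {
    void*  ctx;
    size_t (*read)(void* ctx, void* dst, size_t count);
    int    (*seek)(void* ctx, long offset, int whence);
} ArchiveHooks;

/* An archive image held entirely in memory. */
typedef struct MemImage {
    const unsigned char* data;
    size_t len;
    size_t pos;
} MemImage;

static size_t mem_read(void* ctx, void* dst, size_t count)
{
    MemImage* img = (MemImage*) ctx;
    size_t avail = img->len - img->pos;

    if (count > avail)
        count = avail;              /* short read at end of image */
    if (count > 0)
        memcpy(dst, img->data + img->pos, count);
    img->pos += count;
    return count;
}

static int mem_seek(void* ctx, long offset, int whence)
{
    MemImage* img = (MemImage*) ctx;
    long base, target;

    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long) img->pos; break;
    case SEEK_END: base = (long) img->len; break;
    default:       return -1;
    }
    target = base + offset;
    if (target < 0 || (size_t) target > img->len)
        return -1;                  /* refuse to seek outside the image */
    img->pos = (size_t) target;
    return 0;
}

/* Build a hook table that reads from "img". */
ArchiveHooks mem_hooks(MemImage* img)
{
    ArchiveHooks hooks;

    assert(img != NULL && img->pos <= img->len);
    hooks.ctx = img;
    hooks.read = mem_read;
    hooks.seek = mem_seek;
    return hooks;
}
```

With hooks of this shape, an application could decompress a ".shk.gz" wrapper into a buffer and hand the buffer to the library without going through a temp file.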
Returning a pointer to an allocated NuArchive structure worked pretty well until I wanted to set a parameter that affected the way open works (the junk-skipping feature). Creating the structure in a separate call before the "open" would have been better.
It might, for portability reasons, have been better to require a "create file" callback. This would offload most of the system-dependent stuff in FileIO.c onto the application. I chose not to do this because I felt it moved too much of the work out of NufxLib and into the hands of the developer. Requiring the application to deal with the "OnlyUpdateOlder" and "HandleExisting" flags seemed excessive, and if there really are wide variants in the way files are created and modification dates are tested, then we might as well solve the problem once and for all in the library instead of requiring every application to solve it for itself. (I could, of course, provide sample code for several different platforms, but sample code tends to suffer from bit rot.)
There is no real support for GS/OS option lists. The only place you'd ever want to add these is on a IIgs, and I find it unlikely that NufxLib will edge out GS/ShrinkIt as the preferred archiver. (Besides, I question their value even on a IIgs.) NufxLib will very carefully preserve them when modifying a record, but there's no way to add, delete, or modify them directly.
The use of RecordIdx and ThreadIdx, rather than record filename and thread offset number, was chosen for a number of reasons. The most important was that they are unambiguous. Consider that two records may have the same filename, that one record may have two filenames, and that a record may have no filename at all, and the need for RecordIdx becomes painfully clear. I could've avoided ThreadIdx, by using a combination of RecordIdx and thread number, but after a few additions and deletions there is a clear advantage to having a unique identifier for every thread. Besides, it allowed me to design calls like NuExtractThread so that they only had to take one identifier as an argument, reducing the amount of stuff an application needs to keep track of, as well as the amount of error checking that has to be done in the library.
Allowing threads to be copied without expanding and recompressing them is neat, but if I'd known how cluttered the interface would become I probably wouldn't have supported the feature. The NuDataSource calls are confusing enough as it is with the pre-sized thread stuff.
The "bulk" NuAdd interface can be cumbersome. When extracting you can skip the bulk approach and handle filename conflicts yourself, but when adding you don't have a good alternative if you're adding lots of files. You would have to follow every NuAdd with a NuFlush, which has some performance problems because (unless you configure the safety options off) NuFlush will write all data to the temp file and rotate it.
I chose not to implement EOL conversions when adding files. It's too painful to do this in the library. It would be easier for the application to write an EOL-converted file into a temp file, then use the file add call on that. (The "storage name" is set independently from the source file name, so there's no problem with temp file names showing up in the archive.) One approach that could be used within NufxLib would be to "pre-flight" the file by doing an EOL conversion pass to determine the final file length, then feed that length into the compression functions. The "NuStraw" interface would do the conversion transparently to the compression routines.
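The "pre-flight" pass mentioned above only has to compute how long the file will be after conversion. A minimal sketch of such a pass follows, whether it runs in the application or in the library. The function name preflight_eol_length is hypothetical, and the sketch assumes that every CR, LF, or CRLF in the input counts as one line ending, which is a simplification of whatever detection a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length "buf" would have after every line
 * ending (CR, LF, or CRLF) is rewritten as an EOL marker that is
 * "targetEolLen" bytes long (1 for CR or LF, 2 for CRLF).
 */
size_t preflight_eol_length(const unsigned char* buf, size_t len,
    size_t targetEolLen)
{
    size_t out = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    assert(targetEolLen == 1 || targetEolLen == 2);

    while (i < len) {
        if (buf[i] == '\r') {
            /* CR, optionally followed by LF: one line ending either way */
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;
            out += targetEolLen;
        } else if (buf[i] == '\n') {
            out += targetEolLen;
        } else {
            out++;
        }
        i++;
    }
    return out;
}
```

The result is the uncompressed length that would be fed to the compression functions, so the conversion itself can happen on the fly during the second pass.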
The compression functions might have been better written with a zlib-like API. This would have made it easier to extract the code and use it in other projects. The only disadvantage of doing so is that it adds a little extra buffer copying overhead.
Some attention should have been paid to internationalization.
Some perhaps useful calls that weren't implemented:
A brief history of NufxLib releases. See "ChangeLog.txt" in the sources for more detail.
Version | Date | Comments |
v0.0 | mid-1998 | Work begins |
v0.1 | 2000/01/17 | First version viewed by test volunteers. |
v0.5 | 2000/02/09 | Alpha test version. |
v0.6 | 2000/03/05 | Beta test version. |
v1.0 | 2000/05/18 | Initial release. |
v1.0.1 | 2000/05/22 | Added workaround for badly-formed archives. |
v1.1 | 2002/10/20 | Many new features, notably support for several compression formats. |
v2.0 | 2003/03/18 | Support for Win32 DLL features. |
v2.0.1 | 2003/10/16 | Added junk-skipping and a workaround for bad option lists; Mac OS X stuff from sheppy. |
v2.0.2 | 2004/03/10 | Handle zeroed MasterEOF, and correctly set permissions on "locked" files. |
v2.0.3 | 2004/10/11 | Fixed some obscure bugs that CiderPress was hitting. |
v2.1.0 | 2005/09/17 | Added kNuValueIgnoreLZW2Len. |
v2.1.1 | 2006/02/18 | Fix two minor bugs. |
v2.2.0 | 2007/02/19 | Switched to BSD license. Identify "bad Mac" archives automatically. |
v2.2.2 | 2014/10/30 | Updated build files, especially for Win32. Moved to github. |
v3.0.0 | 2015/01/09 | Source code overhaul. Added Unicode filename handling. |
I'd like to thank Eric Shepherd for participating in some ping-pong e-mail sessions while I tried to get autoconf, BeOS, and some crufty versions of "make" figured out for v1.0.
This document is Copyright © 2000-2015 by Andy McFadden. All Rights Reserved.
The latest version can be found on the NuLib web site at http://www.nulib.com/.