2000-05-23 01:55:31 +00:00
NufxLib NOTES
Last revised: 2000/01/23


The interface is documented in "nufxlibapi.html", available from the
www.nulib.com web site. This document discusses some of the internal
design that may be of interest.

Some familiarity with the NuFX file format is assumed.


Read-Write Data Structures
==========================

For both read-only and read-write files (but not streaming read-only files),
the archive is represented internally as a linked list of Records, each
of which has an array of Threads attached. No attempt is made to
optimize searches by filename, so use of the "replace existing entry when
filenames match" option should be restricted to situations where it is
necessary. Otherwise, adding N files can result in O(N^2) behavior,
because every addition scans the whole list.
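
As a rough illustration of the layout described above, here is a minimal
sketch. The type names, field names, and FindRecordByName are hypothetical
stand-ins, not NufxLib's actual declarations, and the exact-match strcmp
comparison is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the library's internal types. */
typedef struct Thread {
    unsigned long threadIdx;    /* unique thread identifier */
    unsigned int  kind;         /* e.g. filename, data fork, comment */
} Thread;

typedef struct Record {
    struct Record *next;        /* singly linked list of records */
    const char    *filename;    /* cached filename for this record */
    Thread        *threads;     /* array of threads attached to the record */
    size_t         numThreads;
} Record;

/*
 * Linear search by filename: O(N) per lookup.  Doing one lookup for each
 * of N added files is what produces the O(N^2) behavior.
 */
static Record *FindRecordByName(Record *head, const char *name)
{
    assert(name != NULL);
    for (Record *rec = head; rec != NULL; rec = rec->next) {
        if (rec->filename != NULL && strcmp(rec->filename, name) == 0)
            return rec;
    }
    return NULL;
}
```

An application that knows its filenames are unique can skip the lookup
entirely by leaving the "replace existing entry" option off.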

Modifications, such as deletions, changes to filename threads, and
additions of new records, are queued up in a separate list until a NuFlush
call is issued. The list works much the same way as the temporary file:
when the operation completes, the "new" list becomes the "original" list.
If the operation is aborted, the "new" list is scrubbed, and the "original"
list remains unmodified.

Just as it is inefficient to write data to the temp file when it's not
necessary to do so, it is inefficient to allocate a complete copy of the
records from the original list if none are changed. As a result, there are
actually two "new" lists, one with a copy of the original record list, and
one with new additions. The "copy" list starts out uninitialized, and
remains that way until one of the entries from the original list is
modified. When that happens, the entire original list is duplicated, and
the changes are made directly to members of the "copy" list. (This is
important for really large archives, like a by-file archive with the
entire contents of a hard drive, where the record index could be several
megabytes in size.)

It would be more *memory* efficient to simply maintain a list of what
has changed. However, we can't disturb the "original" list in any way or
we lose the ability to roll back quickly if the operation is aborted.
Consequently, we need to create a new list of records that reflects
the state of the new archive, so that when we rename the temp file over
the original, we can simply "rename" the new record list over the original.
Since we're going to need the new list eventually, we might as well create
it as soon as it is needed, and deal with memory allocation failures up
front rather than during the update process. (Some items, such as the
record's file offset in the archive, have to be updated even for records
that aren't themselves changing... which means we potentially need to
modify all existing record structures, so we need a complete copy of the
record list regardless of how little or how much has changed.)

This also ties into the "modify original archive file directly if possible"
option, which avoids the need for creating and renaming a temp file. If
the only changes are updates to pre-sized records (e.g. renaming a file
inside the archive, or updating a comment), or adding new records onto the
end, there is little risk and possibly a huge efficiency gain in just
modifying the archive in place. If none of the operations caused the
"copy" list to be initialized, then clearly there's no need to write to a
temp file. (It's not actually this simple, because updates to pre-sized
threads are annotated in the "copy" list.)
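
The rule of thumb in the paragraph above can be written down as a simple
predicate. This is only a sketch of the stated rule: ChangeKind and
CanUpdateInPlace are hypothetical names, and the library's real decision
is based on the state of its lists rather than on a flat array of changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a queued change. */
typedef enum ChangeKind {
    kChangeUpdatePresized,  /* rewrite a pre-sized thread (rename, comment) */
    kChangeAppendRecord,    /* add a new record at the end of the archive */
    kChangeDeleteRecord,    /* remove an existing record */
    kChangeDeleteThread,    /* remove a thread from an existing record */
    kChangeAddThread        /* add a thread to an existing record */
} ChangeKind;

/*
 * Returns true if every queued change is either a pre-sized update or an
 * append, i.e. nothing that would move or resize existing records, so the
 * archive could be modified in place instead of going through a temp file.
 */
static bool CanUpdateInPlace(const ChangeKind *changes, size_t count)
{
    assert(changes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (changes[i] != kChangeUpdatePresized &&
            changes[i] != kChangeAppendRecord)
            return false;
    }
    return true;
}
```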

One of the goals was to be able to execute a sequence of operations like:

    - open original archive
    - read original archive
    - modify archive
    - flush (success)
    - modify archive
    - flush (failure, rollback)
    - modify archive
    - flush (success)
    - close archive

The archive is opened at the start and held open across many operations.
There is never a need to re-read the entire archive. We could avoid the
need to allocate two complete Record lists by requiring that the archive be
re-scanned after changes are aborted; if we did that, we could just modify
the original record list in place, and let the changes become "permanent"
after a successful write. In many ways, though, it's cleaner to have two
lists.

Archives with several thousand entries should be sufficiently rare, and
virtual memory should be sufficiently plentiful, that this won't be a
problem for anyone. Scanning repeatedly through a 15MB archive stored on a
CD-ROM is likely to be very annoying though, so the design makes every
attempt to avoid repeated scans of the archive. And in any event, this
only applies to archive updates. The memory requirements for simple file
extraction are minimal.

In summary:

  "orig" list has original set of records, and is not disturbed until
    the changes are committed.
  "copy" list is created on first add/update/delete operation, and
    initially contains a complete copy of "orig".
  "new" list contains all new additions to the archive, including
    new additions that replace existing entries (the existing entry
    is deleted from "copy" and then added to "new").
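
The three lists and their commit/abort behavior can be modeled with a
small sketch. Everything here (RecNode, RecordSets, and the function
names) is hypothetical and heavily simplified. It only illustrates the
lazy "copy" list and the way a commit "renames" the pending lists over
"orig"; it is not the library's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down record node; real records carry far more. */
typedef struct RecNode {
    struct RecNode *next;
    long            recordIdx;
} RecNode;

typedef struct RecordSets {
    RecNode *orig;       /* committed state; untouched until a commit */
    RecNode *copy;       /* duplicate of "orig", made on first change */
    RecNode *added;      /* the "new" list: additions since last flush */
    bool     copyLoaded; /* tells "no copy yet" apart from "empty copy" */
} RecordSets;

static void FreeList(RecNode *head)
{
    while (head != NULL) {
        RecNode *next = head->next;
        free(head);
        head = next;
    }
}

/* Append a node to a list; returns 0 on success, -1 on failure. */
static int AppendNode(RecNode **pHead, long recordIdx)
{
    RecNode *node = malloc(sizeof(*node));
    assert(pHead != NULL);
    if (node == NULL)
        return -1;
    node->next = NULL;
    node->recordIdx = recordIdx;
    while (*pHead != NULL)
        pHead = &(*pHead)->next;
    *pHead = node;
    return 0;
}

/* Queue a new record on the "new" list. */
static int AddRecord(RecordSets *sets, long recordIdx)
{
    return AppendNode(&sets->added, recordIdx);
}

/* Queue a deletion: duplicate "orig" on first use, unlink from "copy". */
static int DeleteRecord(RecordSets *sets, long recordIdx)
{
    if (!sets->copyLoaded) {
        for (const RecNode *src = sets->orig; src != NULL; src = src->next) {
            if (AppendNode(&sets->copy, src->recordIdx) != 0) {
                FreeList(sets->copy);   /* fail up front; "orig" intact */
                sets->copy = NULL;
                return -1;
            }
        }
        sets->copyLoaded = true;
    }
    for (RecNode **pp = &sets->copy; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->recordIdx == recordIdx) {
            RecNode *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;          /* no such record, or already deleted */
}

/* Abort: throw away pending state; "orig" was never modified. */
static void AbortChanges(RecordSets *sets)
{
    FreeList(sets->copy);
    FreeList(sets->added);
    sets->copy = sets->added = NULL;
    sets->copyLoaded = false;
}

/* Commit: "copy" (if any) replaces "orig", then "new" is appended. */
static void CommitChanges(RecordSets *sets)
{
    RecNode **pTail;

    if (sets->copyLoaded) {
        FreeList(sets->orig);
        sets->orig = sets->copy;
    }
    for (pTail = &sets->orig; *pTail != NULL; pTail = &(*pTail)->next)
        ;
    *pTail = sets->added;
    sets->copy = sets->added = NULL;
    sets->copyLoaded = false;
}
```

Note that an abort costs nothing beyond freeing the pending lists, which
is the quick rollback the two-list design is meant to provide.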


Each Record in the list has a "thread modification" list attached to it.
Any changes to the record header or additions to the thread mod list are
made in the "copy" set; the "original" set remains untouched. The thread
mod list can have the following items in it:

    - delete thread (NuThreadIdx)
    - add thread (type, otherSize, format, +contents)
    - update pre-sized thread (NuThreadIdx, +contents)

Contents are specified with a NuDataSource, which allows the application
to indicate that the data is already compressed. This is useful for
copying parts of records between archives without having to expand and
recompress the data.
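
One plausible way to represent the three kinds of thread mod is a tagged
union, sketched below. The struct layout, the DataSource stand-in, and
ModSource are assumptions made for illustration. Only the three item kinds
and their parameters come from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long ThreadIdx;    /* stand-in for NuThreadIdx */

/* Stand-in for NuDataSource: where the contents come from. */
typedef struct DataSource {
    const void *data;
    size_t      length;
    bool        isCompressed;   /* caller says data is already compressed */
} DataSource;

typedef enum ThreadModKind {
    kModDelete,     /* delete thread (threadIdx) */
    kModAdd,        /* add thread (type, otherSize, format, +contents) */
    kModUpdate      /* update pre-sized thread (threadIdx, +contents) */
} ThreadModKind;

typedef struct ThreadMod {
    struct ThreadMod *next;     /* next entry in the record's mod list */
    ThreadModKind     kind;
    union {
        struct { ThreadIdx threadIdx; } del;
        struct {
            unsigned long threadType;
            unsigned long otherSize;
            unsigned int  format;
            DataSource    source;
        } add;
        struct { ThreadIdx threadIdx; DataSource source; } update;
    } u;
} ThreadMod;

/* Returns the contents attached to a mod, or NULL for a deletion. */
static const DataSource *ModSource(const ThreadMod *mod)
{
    assert(mod != NULL);
    switch (mod->kind) {
    case kModAdd:    return &mod->u.add.source;
    case kModUpdate: return &mod->u.update.source;
    default:         return NULL;
    }
}
```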

Some interactions and concepts that are important to understand:

When a file is added, the file type information will be placed in the
"new" Record immediately (subject to some restrictions: adding a data
fork always causes the type info to be updated, adding a resource fork
only updates the type info if a data fork is not already present).
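
The restriction on type info updates boils down to a two-input rule. A
minimal sketch, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ForkKind { kForkData, kForkRsrc } ForkKind;

/*
 * Adding a data fork always updates the record's type info; adding a
 * resource fork does so only if the record has no data fork yet.
 */
static bool ShouldUpdateTypeInfo(ForkKind added, bool hasDataFork)
{
    assert(added == kForkData || added == kForkRsrc);
    if (added == kForkData)
        return true;
    return !hasDataFork;
}
```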

Deleting a record results in the Record being removed from the "copy"
list immediately. Future modify operations on that NuRecordIdx will
fail. Future read operations will work just fine until the next
NuFlush is issued, because read operations use the "original" list.

Deleting all threads from a record results in the record being
deleted, but not until the NuFlush call is issued. It is possible to
delete all the existing threads and then add new ones.

It is *not* allowed to delete a modified thread, modify a deleted thread,
or delete a record that has been modified. This limitation was added to
keep the system simple. Note this does not mean you can't delete a data
fork and add a data fork; doing so results in operations on two threads
with different NuThreadIdx values. What you can't do is update the
filename thread and then delete it, or vice versa. (If anyone can think
of a reason why you'd want to rename a file and then delete it with the
same NuFlush call, I'll figure out a way to support it.)
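
The thread-level part of this restriction can be pictured as a tiny state
machine per thread. The names below are hypothetical. Two behaviors are
assumptions not stated in the text: deleting an already-deleted thread is
rejected, and updating an already-updated thread is allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Pending state of one existing thread between two flushes. */
typedef enum ThreadState {
    kThreadClean,       /* no pending change */
    kThreadModified,    /* an update is queued */
    kThreadDeleted      /* a deletion is queued */
} ThreadState;

typedef enum ThreadOp { kOpUpdate, kOpDelete } ThreadOp;

/* Returns 0 and records the change if allowed, -1 if it is rejected. */
static int ApplyThreadOp(ThreadState *state, ThreadOp op)
{
    assert(state != NULL);
    if (op == kOpDelete) {
        if (*state != kThreadClean)
            return -1;      /* can't delete a modified (or deleted) thread */
        *state = kThreadDeleted;
    } else {
        if (*state == kThreadDeleted)
            return -1;      /* can't modify a deleted thread */
        *state = kThreadModified;
    }
    return 0;
}
```

Deleting a data fork and adding a new one never trips this check, because
the new thread is tracked under its own index.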

Updating a filename thread is intercepted, and causes the Record's
filename cache to be updated as well. Adding a filename thread for
records where the filename is stored in the record itself causes the
"in-record" filename to be zeroed. Adding a filename thread to a
record that already has one isn't allowed; NufxLib restricts you to
a single filename thread per record.

Some actions on an archive are allowed but strongly discouraged. For
example, deleting a filename thread but leaving the data threads behind
is a valid thing to do, but leaves most archivers in a state of mild
confusion. Deleting the data threads but leaving the filename thread is
similarly perplexing.

You can't call "update thread" on a thread that doesn't yet exist,
even if an "add thread" call has been made. You can, however, call
"add thread" on a newly created Record.

When a new record is created because of a "create record" call, a filename
thread is created automatically. It is not necessary to explicitly add the
filename.

Failures encountered while committing changes to a record cause all
operations on that record to be rolled back. If, during a NuFlush, a
file add fails, the user is given the option of aborting the entire
operation or skipping the file in question (and perhaps retrying, among
other options). Aborting the flush causes a complete rollback. If only
the thread mod operation is canceled, then all thread mods for that record
are ignored. The temp file (or archive file) will have its file pointer
reset to the original start of the record, and if the record already
existed in the original archive, the full original record will be copied
over. This may seem drastic, but it helps ensure that you don't end up
with a record in a partially created state.
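
The per-record rollback can be illustrated with an in-memory stand-in for
the temp file. OutBuf, Piece, and CommitRecord are hypothetical, and a
write that does not fit in the buffer plays the role of an I/O failure.
The point is only the "rewind to the record start, then copy the original"
sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the temp (or archive) file. */
typedef struct OutBuf {
    unsigned char data[256];
    size_t        pos;          /* current write position ("file pointer") */
} OutBuf;

/* One chunk of record data, e.g. the header or a single thread. */
typedef struct Piece {
    const void *data;
    size_t      len;
} Piece;

static int OutWrite(OutBuf *out, const void *src, size_t len)
{
    if (len > sizeof(out->data) - out->pos)
        return -1;              /* no room: treated as a write failure */
    memcpy(out->data + out->pos, src, len);
    out->pos += len;
    return 0;
}

/*
 * Write the updated form of one record, piece by piece.  If any piece
 * fails, rewind to where the record started and, if the record existed in
 * the original archive, copy the original bytes over instead.
 * Returns 0 if the update was written, 1 if it was rolled back, and -1 if
 * the rollback itself failed.
 */
static int CommitRecord(OutBuf *out, const Piece *pieces, size_t numPieces,
    const Piece *original)
{
    size_t recordStart, i;

    assert(out != NULL);
    recordStart = out->pos;
    for (i = 0; i < numPieces; i++) {
        if (OutWrite(out, pieces[i].data, pieces[i].len) != 0)
            break;
    }
    if (i == numPieces)
        return 0;

    out->pos = recordStart;     /* discard the partially written record */
    if (original != NULL &&
        OutWrite(out, original->data, original->len) != 0)
        return -1;
    return 1;
}
```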

If a failure occurs during an "update in place", it isn't possible to
roll back all changes. If the failure was due to a bug in NufxLib, it
is possible that the archive could be unrecoverably damaged. NufxLib
tries to identify such situations, and will leave the archive open in
read-only mode after rolling back any new file additions.


Updating Filenames
==================

Updating filenames is a small nightmare, because the filename can be
either in the record header or in a filename thread. It's possible,
but illogical, to have a single record with a filename in the record
header and two or more filenames in threads.

NufxLib will not automatically "fix" broken records, but it will prevent
applications from creating situations that should not exist.

When reading an archive, NufxLib will use the filename from the
first filename thread found. If no filename threads are found, the
filename from the record header will be used.

If you add a filename thread to a record that has a filename in the
record header, the header name will be removed.

If you update a filename thread in a record that has a filename in
the record header, the header name will be left untouched.

Adding a filename thread is only allowed if no filename thread exists,
or all existing filename threads have been deleted.
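
The four rules above fit in a small model. The types, the fixed-size
arrays, and the function names are hypothetical. One simplification is an
assumption: this sketch looks at the pending view of a record, so
EffectiveName skips threads already marked for deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define kMaxNameThreads 4   /* arbitrary limit for this sketch */

typedef struct FilenameThread {
    const char *name;
    bool        deleted;    /* marked for deletion in the pending changes */
} FilenameThread;

typedef struct NameRec {
    char           headerName[64];  /* empty string if unused */
    FilenameThread nameThreads[kMaxNameThreads];
    size_t         numNameThreads;
} NameRec;

/* The first live filename thread wins; otherwise use the header name. */
static const char *EffectiveName(const NameRec *rec)
{
    assert(rec != NULL);
    for (size_t i = 0; i < rec->numNameThreads; i++) {
        if (!rec->nameThreads[i].deleted)
            return rec->nameThreads[i].name;
    }
    return rec->headerName;
}

/*
 * Adding is allowed only if no live filename thread remains.  On success
 * the header name is removed.  Returns 0 on success, -1 if rejected.
 */
static int AddFilenameThread(NameRec *rec, const char *name)
{
    assert(rec != NULL && name != NULL);
    for (size_t i = 0; i < rec->numNameThreads; i++) {
        if (!rec->nameThreads[i].deleted)
            return -1;
    }
    if (rec->numNameThreads == kMaxNameThreads)
        return -1;
    rec->nameThreads[rec->numNameThreads].name = name;
    rec->nameThreads[rec->numNameThreads].deleted = false;
    rec->numNameThreads++;
    rec->headerName[0] = '\0';
    return 0;
}

/* Updating an existing filename thread leaves the header name untouched. */
static int UpdateFilenameThread(NameRec *rec, size_t idx, const char *name)
{
    assert(rec != NULL && name != NULL);
    if (idx >= rec->numNameThreads || rec->nameThreads[idx].deleted)
        return -1;
    rec->nameThreads[idx].name = name;
    return 0;
}
```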