Mirror of https://github.com/fadden/ciderpress.git, synced 2025-01-27 12:32:36 +00:00

Merge in NufxLib v3.0.0 changes
A couple of minor fix-ups for the NufxLib snapshot.

Parent: b97584eeb6
Commit: b95575595f

@@ -1,3 +1,5 @@
+2015/01/09 ***** v3.0.0 shipped *****
+
 2015/01/03 fadden
  - Mac OS X: replace Carbon FinderInfo calls with BSD xattr.
  - Mac OS X: fix resource fork naming.
@@ -1,16 +1,17 @@
 NufxLib NOTES
-Last revised: 2000/01/23
+=============
+Last revised: 2015/01/04
 
 
 The interface is documented in "nufxlibapi.html", available from the
-www.nulib.com web site. This discusses some of the internal design that
-may be of interest.
+http://www.nulib.com/ web site. This discusses some of the internal
+design that may be of interest.
 
 Some familiarity with the NuFX file format is assumed.
 
-
-Read-Write Data Structures
-==========================
+- - -
+
+### Read-Write Data Structures ###
 
 For both read-only and read-write files (but not streaming read-only files),
 the archive is represented internally as a linked list of Records, each
@@ -64,15 +65,15 @@ threads are annotated in the "copy" list.)
 
 One of the goals was to be able to execute a sequence of operations like:
 
-- open original archive
-- read original archive
-- modify archive
-- flush (success)
-- modify archive
-- flush (failure, rollback)
-- modify archive
-- flush (success)
-- close archive
+open original archive
+read original archive
+modify archive
+flush (success)
+modify archive
+flush (failure, rollback)
+modify archive
+flush (success)
+close archive
 
 The archive is opened at the start and held open across many operations.
 There is never a need to re-read the entire archive. We could avoid the
@@ -92,11 +93,11 @@ extraction are minimal.
 
 In summary:
 
-"orig" list has original set of records, and is not disturbed until
+- "orig" list has original set of records, and is not disturbed until
  the changes are committed.
-"copy" list is created on first add/update/delete operation, and
+- "copy" list is created on first add/update/delete operation, and
  initially contains a complete copy of "orig".
-"new" list contains all new additions to the archive, including
+- "new" list contains all new additions to the archive, including
  new additions that replace existing entries (the existing entry
  is deleted from "copy" and then added to "new").
 
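The "orig"/"copy"/"new" arrangement summarized in the notes can be sketched in a few lines of C. This is a simplified model and not NufxLib code. Records are plain integer IDs held in fixed-size arrays instead of linked lists of NuRecord. All of the names (`RecordSets`, `EnsureCopy()`, `ReplaceRecord()`, `Commit()`, `Rollback()`) are hypothetical. The sketch shows why a failed flush can simply discard "copy" and "new": nothing writes to "orig" until a commit succeeds.

```c
#include <assert.h>
#include <string.h>

#define MAX_RECS 16

/* Hypothetical model: NufxLib keeps linked lists of records; this sketch
 * uses fixed-size arrays of integer record IDs instead. */
typedef struct {
    int orig[MAX_RECS];   /* records as last committed */
    int numOrig;
    int copy[MAX_RECS];   /* working copy of "orig", minus deletions */
    int numCopy;
    int haveCopy;         /* "copy" is created lazily */
    int added[MAX_RECS];  /* the "new" list */
    int numAdded;
} RecordSets;

/* Create "copy" from "orig" on the first add/update/delete. */
static void EnsureCopy(RecordSets* s)
{
    if (!s->haveCopy) {
        memcpy(s->copy, s->orig, sizeof(int) * (size_t) s->numOrig);
        s->numCopy = s->numOrig;
        s->haveCopy = 1;
    }
}

/* Remove a record from "copy"; "orig" is left alone. */
static void DeleteRecord(RecordSets* s, int id)
{
    EnsureCopy(s);
    for (int i = 0; i < s->numCopy; i++) {
        if (s->copy[i] == id) {
            memmove(&s->copy[i], &s->copy[i + 1],
                    sizeof(int) * (size_t) (s->numCopy - i - 1));
            s->numCopy--;
            return;
        }
    }
}

/* New additions always go to the "new" list. */
static void AddRecord(RecordSets* s, int id)
{
    EnsureCopy(s);
    assert(s->numAdded < MAX_RECS);
    s->added[s->numAdded++] = id;
}

/* Replacing an entry: delete it from "copy", then add to "new". */
static void ReplaceRecord(RecordSets* s, int oldId, int newId)
{
    DeleteRecord(s, oldId);
    AddRecord(s, newId);
}

/* Failed flush: discard pending changes; "orig" was never modified. */
static void Rollback(RecordSets* s)
{
    s->haveCopy = 0;
    s->numCopy = 0;
    s->numAdded = 0;
}

/* Successful flush: "orig" becomes "copy" followed by "new". */
static void Commit(RecordSets* s)
{
    if (!s->haveCopy)
        return;                 /* nothing was changed */
    assert(s->numCopy + s->numAdded <= MAX_RECS);
    memcpy(s->orig, s->copy, sizeof(int) * (size_t) s->numCopy);
    memcpy(s->orig + s->numCopy, s->added, sizeof(int) * (size_t) s->numAdded);
    s->numOrig = s->numCopy + s->numAdded;
    Rollback(s);                /* clear the now-applied pending lists */
}
```

Starting from {1, 2, 3}, suppose you replace record 2 with record 9 and then roll back. "orig" is still {1, 2, 3}. If you repeat the replacement and commit, "orig" becomes {1, 3, 9}. The sketch appends "new" after "copy" on commit. That ordering is its own assumption.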
@@ -106,9 +107,9 @@ Any changes to the record header or additions to the thread mod list are
 made in the "copy" set; the "original" set remains untouched. The thread
 mod list can have the following items in it:
 
 - delete thread (NuThreadIdx)
 - add thread (type, otherSize, format, +contents)
 - update pre-sized thread (NuThreadIdx, +contents)
 
 Contents are specified with a NuDataSource, which allows the application
 to indicate that the data is already compressed. This is useful for
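The three thread-mod entries listed in the notes map naturally onto a tagged union. The sketch below is only an illustration and is not NufxLib's internal definition. `ThreadModSketch`, the constructor functions, and `ContentsSketch` are all hypothetical. `ContentsSketch` stands in for NuDataSource and records only a buffer plus an "already compressed" flag. The index and type fields are plain integers here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NuDataSource. */
typedef struct {
    const void* data;
    size_t      length;
    int         isCompressed;   /* data is already compressed */
} ContentsSketch;

typedef enum {
    kModDeleteThread,       /* delete thread (NuThreadIdx) */
    kModAddThread,          /* add thread (type, otherSize, format, +contents) */
    kModUpdatePresized      /* update pre-sized thread (NuThreadIdx, +contents) */
} ThreadModKind;

typedef struct {
    ThreadModKind kind;
    union {
        struct { uint32_t threadIdx; } del;
        struct {
            uint32_t       threadType;
            uint32_t       otherSize;
            uint16_t       format;
            ContentsSketch contents;
        } add;
        struct { uint32_t threadIdx; ContentsSketch contents; } update;
    } u;
} ThreadModSketch;

static ThreadModSketch MakeDelete(uint32_t threadIdx)
{
    ThreadModSketch m = { kModDeleteThread };
    m.u.del.threadIdx = threadIdx;
    return m;
}

static ThreadModSketch MakeAdd(uint32_t threadType, uint32_t otherSize,
                               uint16_t format, ContentsSketch contents)
{
    ThreadModSketch m = { kModAddThread };
    m.u.add.threadType = threadType;
    m.u.add.otherSize = otherSize;
    m.u.add.format = format;
    m.u.add.contents = contents;
    return m;
}

static ThreadModSketch MakeUpdate(uint32_t threadIdx, ContentsSketch contents)
{
    ThreadModSketch m = { kModUpdatePresized };
    m.u.update.threadIdx = threadIdx;
    m.u.update.contents = contents;
    return m;
}

/* Delete and update name an existing thread by index; add creates a new one. */
static int TargetsExistingThread(const ThreadModSketch* mod)
{
    return mod->kind == kModDeleteThread || mod->kind == kModUpdatePresized;
}
```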
@@ -179,9 +180,9 @@ is possible that the archive could be unrecoverably damaged. NufxLib
 tries to identify such situations, and will leave the archive open in
 read-only mode after rolling back any new file additions.
 
-
-Updating Filenames
-==================
+- - -
+
+### Updating Filenames ###
 
 Updating filenames is a small nightmare, because the filename can be
 either in the record header or in a filename thread. It's possible,
@@ -191,16 +192,148 @@ header and two or more filenames in threads.
 NufxLib will not automatically "fix" broken records, but it will prevent
 applications from creating situations that should not exist.
 
-When reading an archive, NufxLib will use the filename from the
+- When reading an archive, NufxLib will use the filename from the
  first filename thread found. If no filename threads are found, the
  filename from the record header will be used.
 
-If you add a filename thread to a record that has a filename in the
+- If you add a filename thread to a record that has a filename in the
  record header, the header name will be removed.
 
-If you update a filename thread in a record that has a filename in
+- If you update a filename thread in a record that has a filename in
  the record header, the header name will be left untouched.
 
-Adding a filename thread is only allowed if no filename thread exists,
+- Adding a filename thread is only allowed if no filename thread exists,
  or all existing filename threads have been deleted.
 
+
+- - -
+
+### Unicode Filenames ###
+
+Modern operating systems support filenames with a broader range of
+characters than the Apple II did. This presents problems and opportunities.
+
+#### Background ####
+
+The Apple IIgs and old Macintoshes use the Mac OS Roman ("MOR") character
+set. This defines a set of characters outside the ASCII range, i.e.
+byte values with the high bit set. In addition to the usual collection
+of vowels with accents and umlauts, MOR has some less-common characters,
+including the Apple logo.
+
+On Windows, the high-ASCII values are generally interpreted according
+to Windows Code Page 1252 ("CP-1252"), which defines a similar set
+of vowels with accents and miscellaneous symbols. MOR and CP-1252
+have some overlap, but you can't really translate one into the other.
+The standards-approved equivalent of CP-1252 is ISO-8859-1, though
+according to [Wikipedia](http://en.wikipedia.org/wiki/Windows-1252)
+there was some confusion between the two.
+
+Modern operating systems support the Unicode Universal Character Set.
+This system allows for a very large number of characters (over a million),
+and includes definitions for all of the symbols in MOR and CP-1252.
+Each character is assigned a "code point", which is a numeric value between
+zero and 0x10FFFF. Most of the characters used in modern languages can
+be found in the Basic Multilingual Plane (BMP), which uses code points
+between zero and 0xFFFF (requiring only 16 bits).
+
+There are different ways of encoding code points. Consider, for example,
+Unicode LATIN SMALL LETTER A WITH ACUTE:
+
+MOR: 0x87
+CP-1252: 0xE1
+Unicode: U+00E1
+UTF-16: 0x00E1
+UTF-8: 0xC3 0xA1
+
+Or the humble TRADE MARK SIGN:
+
+MOR: 0xAA
+CP-1252: 0x99
+Unicode: U+2122
+UTF-16: 0x2122
+UTF-8: 0xE2 0x84 0xA2
+
+Modern Linux and Mac OS X use UTF-8 encoding in filenames. Because it's a
+byte-oriented encoding, and 7-bit ASCII values are trivially represented
+as 7-bit ASCII values, all of the existing system and library calls work
+as they did before (i.e. if they took a `char*`, they still do).
+
+Windows uses UTF-16, which requires at least 16 bits per code point.
+Filenames are now "wide" strings, based on `wchar_t*`. Windows includes
+an elaborate system of defines based around the `TCHAR` type, which can
+be either `char` or `wchar_t` depending on whether a program is compiled
+with `_MBCS` (Multi-Byte Character Set) or `_UNICODE`. A set of
+preprocessor definitions is provided that will map I/O function names,
+so you can call `_tfopen(TCHAR* ...)`, and the compiler will turn it into
+either `fopen(char* ...)` or `_wfopen(wchar_t* ...)`. MBCS is deprecated
+in favor of Unicode, so any new code should be strictly UTF-16 based.
+
+This means that, for code to work on both Linux and Windows, it has to
+work with incompatible filename string types and different I/O functions.
+
+#### Opening Archive Files ####
+
+On Linux and Mac OS X, NuLib2 can open any file named on the command line.
+On Windows, it's a bit trickier.
+
+The problem is that NuLib2 provides a `main()` function that is passed a
+vector of "narrow" strings. The filenames provided on the command line
+will be converted from wide to narrow, so unless the filename is entirely
+composed of ASCII or CP-1252 characters, some information will be lost
+and it will be impossible to open the file.
+
+NuLib2 must instead provide a `wmain()` function that takes wide strings.
+The strings must be stored and passed around as wide throughout the
+program, and passed into NufxLib this way (because NufxLib issues the
+actual `_wopen()` call). This means that the NufxLib API must take narrow strings
+when built for Linux, and wide strings when built for Windows.
+
+#### Adding/Extracting Mac OS Roman Files ####
+
+GS/ShrinkIt was designed to handle GS/OS files from HFS volumes, so NuFX
+archive filenames use the MOR character set. To preserve the encoding,
+we could simply extract the values as-is and let them appear as whatever
+values happen to line up in CP-1252, which is what pre-3.0 NuLib2 did.
+It's much nicer to translate from MOR to Unicode when extracting, and
+convert back from Unicode to MOR when adding files to an archive.
+
+The key consideration is that the character set associated with a
+filename must be tracked. The code can't simply extract a filename from
+the archive and pass it to a `creat()` call. Character set conversions
+must take place at appropriate times.
+
+With Windows it's a bit harder to confuse MOR and Unicode names, because
+one uses 8-bit characters and the other uses UTF-16, but the compiler
+doesn't catch everything.
+
+#### Current State ####
+
+NufxLib defines the UNICHAR type, which has a role very like TCHAR:
+it can be `char` or `wchar_t`, and can be accompanied by a set of
+preprocessor mappings that switch between I/O functions. The UNICHAR
+type will be determined based on a define provided from the compiler
+command line (perhaps `-DUSE_UTF16_FILENAMES`).
+
+The current version of NufxLib (v3.0.0) takes the first step, defining
+all filename strings as either UNICHAR or MOR, and converting between them
+as necessary. This, plus a few minor tweaks to NuLib2, was enough to
+get Unicode filename support working on Linux and Mac OS X.
+
+None of the work needed to make Windows work properly has been done.
+The string conversion functions are no-ops for Win32. As a result,
+NuLib2 for Windows treats filenames the same way in 3.x as it did in 2.x.
+
+There are some situations where things can go awry even with UNICHAR,
+most notably printf-style arguments. These are checked by gcc, but
+not by Visual Studio unless you run the static analyzer. A simple
+`printf("filename=%s\n", filename)` would be correct for narrow strings
+but wrong for wide strings. It will likely be necessary to define a
+filename format string (similar to `PRId64` for 64-bit values) and switch
+between "%s" and "%ls".
+
+This is a fair bit of work and requires some amount of uglification to
+NuLib2 and NufxLib. Since Windows users can use CiderPress, and the
+vast majority of NuFX archives use ASCII-only ProDOS file names, it's
+not clear that the effort would be worthwhile.
+
@@ -596,12 +596,12 @@ typedef struct NuSelectionProposal {
  */
 typedef struct NuPathnameProposal {
     const UNICHAR* pathnameUNI;
-    char filenameSeparator;
+    UNICHAR filenameSeparator;
     const NuRecord* pRecord;
     const NuThread* pThread;
 
     const UNICHAR* newPathnameUNI;
-    uint8_t newFilenameSeparator;
+    UNICHAR newFilenameSeparator;
     /*NuThreadID newStorage;*/
     NuDataSink* newDataSink;
 } NuPathnameProposal;
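The struct change switches both separator fields to UNICHAR. The notes describe UNICHAR as `char` or `wchar_t`, selected by a compiler define. They also suggest a filename format string for printf-style calls. The sketch below shows one way to do that. It assumes the `USE_UTF16_FILENAMES` define that the notes mention as a possibility. `FILENAME_FMT` and `FormatFilename()` are hypothetical and do not come from NufxLib.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch: pick the filename character type and the matching
 * printf conversion from a single compile-time define. */
#ifdef USE_UTF16_FILENAMES
typedef wchar_t UNICHAR;
# define FILENAME_FMT "%ls"
#else
typedef char UNICHAR;
# define FILENAME_FMT "%s"
#endif

/* Write "filename=<name>" into buf using the conversion that matches
 * UNICHAR; returns what snprintf() returns. */
static int FormatFilename(char* buf, size_t bufSize, const UNICHAR* name)
{
    return snprintf(buf, bufSize, "filename=" FILENAME_FMT, name);
}
```

Built without the define, `FormatFilename(buf, sizeof buf, "FOO.BAR")` produces `filename=FOO.BAR`. With `-DUSE_UTF16_FILENAMES`, callers must pass a wide string such as `L"FOO.BAR"`. In that case `%ls` converts it to a multibyte string according to the current locale.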
@@ -792,7 +792,7 @@ NUFXLIB_API NuError NuAddFile(NuArchive* pArchive, const UNICHAR* pathnameUNI,
         const NuFileDetails* pFileDetails, short fromRsrcFork,
         NuRecordIdx* pRecordIdx);
 NUFXLIB_API NuError NuRename(NuArchive* pArchive, NuRecordIdx recordIdx,
-        const char* pathnameMOR, UNICHAR fssep);
+        const char* pathnameMOR, char fssep);
 NUFXLIB_API NuError NuSetRecordAttr(NuArchive* pArchive, NuRecordIdx recordIdx,
         const NuRecordAttr* pRecordAttr);
 NUFXLIB_API NuError NuUpdatePresizedThread(NuArchive* pArchive,
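NuRename() takes its new pathname in MOR (`pathnameMOR`) along with a plain `char` separator. A caller that starts from a UTF-8 name would therefore have to convert first, which is the Unicode-to-MOR direction the notes describe for adding files. Below is a small sketch of that direction. It assumes that only the two characters from the notes' examples need mapping and that any unmappable character becomes '?'. `Utf8ToMor()` and its helpers are hypothetical and are not NufxLib's conversion functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence (1 to 3 bytes) and return the number of bytes
 * consumed, or 0 if the sequence is malformed or is a 4-byte sequence
 * (nothing outside the BMP exists in MOR). No overlong/surrogate checks. */
static size_t DecodeUtf8(const unsigned char* s, uint32_t* cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t) (s[0] & 0x0F) << 12) |
              ((uint32_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Partial reverse table: only the two examples from the notes. */
static char CodePointToMor(uint32_t cp)
{
    if (cp < 0x80)
        return (char) cp;
    switch (cp) {
    case 0x00E1: return (char) 0x87;    /* LATIN SMALL LETTER A WITH ACUTE */
    case 0x2122: return (char) 0xAA;    /* TRADE MARK SIGN */
    default:     return '?';            /* placeholder for unmappable chars */
    }
}

/* Convert UTF-8 to MOR. Returns the MOR length, or (size_t) -1 on
 * undecodable input or if out is too small. */
static size_t Utf8ToMor(const char* utf8, char* out, size_t outSize)
{
    const unsigned char* s = (const unsigned char*) utf8;
    size_t len = 0;

    assert(utf8 != NULL && out != NULL);
    if (outSize == 0)
        return (size_t) -1;
    while (*s != '\0') {
        uint32_t cp;
        size_t n = DecodeUtf8(s, &cp);
        if (n == 0 || len + 1 >= outSize)
            return (size_t) -1;
        out[len++] = CodePointToMor(cp);
        s += n;
    }
    out[len] = '\0';
    return len;
}
```

The decoder checks each continuation byte before it reads the next one. A sequence truncated by the string's NUL therefore fails instead of reading past the end.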
|
@@ -277,7 +277,7 @@ NuError CreateDosSource(const ImgHeader* pHeader, FILE* fp,
      * reversible transformation, i.e. if you do this twice you're back
      * to ProDOS ordering.
      */
-    for (offset = 0; offset < pHeader->dataLen; offset += 4096) {
+    for (offset = 0; offset < (long) pHeader->dataLen; offset += 4096) {
         size_t ignored;
         ignored = fread(diskBuffer + offset + 0x0000, 256, 1, fp);
         ignored = fread(diskBuffer + offset + 0x0e00, 256, 1, fp);
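The comment in this hunk says the sector shuffle is its own inverse. The hunk shows only the first two reads: sector 0 goes to offset 0x0000 and sector 1 goes to 0x0e00. The sketch below assumes the rest of the pattern reverses sectors 1 through 14 and leaves sectors 0 and 15 in place. That assumption fits both visible reads and the "do this twice" property. The sketch works on in-memory buffers instead of a `FILE*`. It also uses `size_t` offsets, which avoids the signed/unsigned comparison that the new `(long)` cast appears to work around. `MapSector()` and `ReorderImage()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE       256
#define SECTORS_PER_TRACK 16
#define TRACK_SIZE        (SECTOR_SIZE * SECTORS_PER_TRACK)  /* 4096, the loop stride */

/* Assumed mapping: sectors 0 and 15 stay put, sectors 1..14 are reversed.
 * Applying it twice gives back the original sector number. */
static int MapSector(int sector)
{
    if (sector == 0 || sector == SECTORS_PER_TRACK - 1)
        return sector;
    return (SECTORS_PER_TRACK - 1) - sector;
}

/* Reorder every whole track of src into dst (buffers must not overlap).
 * Any partial track at the end is ignored in this sketch. */
static void ReorderImage(const unsigned char* src, unsigned char* dst,
                         size_t dataLen)
{
    for (size_t offset = 0; offset + TRACK_SIZE <= dataLen;
         offset += TRACK_SIZE) {
        for (int sector = 0; sector < SECTORS_PER_TRACK; sector++) {
            memcpy(dst + offset + (size_t) MapSector(sector) * SECTOR_SIZE,
                   src + offset + (size_t) sector * SECTOR_SIZE,
                   SECTOR_SIZE);
        }
    }
}
```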
|
@@ -8,6 +8,9 @@ test-basic
 
 Basic tests. Run this to verify that things are working.
 
+On Win32 there will be a second executable, test-basic-d, that links against
+the DLL rather than the static library.
+
 
 exerciser
 =========
@@ -82,6 +85,15 @@ of memory on very large archives, you can reduce the memory requirements
 by specifying the "-f" flag.
 
 
+test-names
+==========
+
+Tests Unicode filename handling. Run without arguments.
+
+(This currently fails on Win32 because the Unicode filename support is
+incomplete there.)
+
+
 test-simple
 ===========
 