From b95575595f66a350d01b6eeba7003fbf687b50cc Mon Sep 17 00:00:00 2001 From: Andy McFadden Date: Fri, 9 Jan 2015 16:58:13 -0800 Subject: [PATCH] Merge in NufxLib v3.0.0 changes A couple of minor fix-ups for the NufxLib snapshot. --- nufxlib/ChangeLog.txt | 2 + nufxlib/{NOTES.txt => NOTES.md} | 185 +++++++++++++++++++++++++++----- nufxlib/NufxLib.h | 6 +- nufxlib/samples/ImgConv.c | 2 +- nufxlib/samples/README-S.txt | 12 +++ 5 files changed, 177 insertions(+), 30 deletions(-) rename nufxlib/{NOTES.txt => NOTES.md} (56%) diff --git a/nufxlib/ChangeLog.txt b/nufxlib/ChangeLog.txt index ca1c148..6ec7872 100644 --- a/nufxlib/ChangeLog.txt +++ b/nufxlib/ChangeLog.txt @@ -1,3 +1,5 @@ +2005/01/09 ***** v3.0.0 shipped ***** + 2015/01/03 fadden - Mac OS X: replace Carbon FinderInfo calls with BSD xattr. - Mac OS X: fix resource fork naming. diff --git a/nufxlib/NOTES.txt b/nufxlib/NOTES.md similarity index 56% rename from nufxlib/NOTES.txt rename to nufxlib/NOTES.md index a68de39..caccd89 100644 --- a/nufxlib/NOTES.txt +++ b/nufxlib/NOTES.md @@ -1,16 +1,17 @@ NufxLib NOTES -Last revised: 2000/01/23 +============= +Last revised: 2015/01/04 The interface is documented in "nufxlibapi.html", available from the -www.nulib.com web site. This discusses some of the internal design that -may be of interest. +http://www.nulib.com/ web site. This discusses some of the internal +design that may be of interest. Some familiarity with the NuFX file format is assumed. +- - - -Read-Write Data Structures -========================== +### Read-Write Data Structures ### For both read-only and read-write files (but not streaming read-only files), the archive is represented internally as a linked list of Records, each @@ -64,15 +65,15 @@ threads are annotated in the "copy" list.) One of the goals was to be able to execute a sequence of operations like: - - open original archive - - read original archive - - modify archive - - flush (success) - - modify archive - - flush (failure, rollback) - - modify archive - - flush (success) - - close archive + open original archive + read original archive + modify archive + flush (success) + modify archive + flush (failure, rollback) + modify archive + flush (success) + close archive The archive is opened at the start and held open across many operations. There is never a need to re-read the entire archive. We could avoid the @@ -92,11 +93,11 @@ extraction are minimal. In summary: - "orig" list has original set of records, and is not disturbed until + - "orig" list has original set of records, and is not disturbed until the changes are committed. - "copy" list is created on first add/update/delete operation, and + - "copy" list is created on first add/update/delete operation, and initially contains a complete copy of "orig". - "new" list contains all new additions to the archive, including + - "new" list contains all new additions to the archive, including new additions that replace existing entries (the existing entry is deleted from "copy" and then added to "new"). @@ -106,9 +107,9 @@ Any changes to the record header or additions to the thread mod list are made in the "copy" set; the "original" set remains untouched. The thread mod list can have the following items in it: - - delete thread (NuThreadIdx) - - add thread (type, otherSize, format, +contents) - - update pre-sized thread (NuThreadIdx, +contents) + - delete thread (NuThreadIdx) + - add thread (type, otherSize, format, +contents) + - update pre-sized thread (NuThreadIdx, +contents) Contents are specified with a NuDataSource, which allows the application to indicate that the data is already compressed. This is useful for @@ -179,9 +180,9 @@ is possible that the archive could be unrecoverably damaged. NufxLib tries to identify such situations, and will leave the archive open in read-only mode after rolling back any new file additions. +- - - -Updating Filenames -================== +### Updating Filenames ### Updating filenames is a small nightmare, because the filename can be either in the record header or in a filename thread. It's possible, @@ -191,16 +192,148 @@ header and two or more filenames in threads. NufxLib will not automatically "fix" broken records, but it will prevent applications from creating situations that should not exist. - When reading an archive, NufxLib will use the filename from the + - When reading an archive, NufxLib will use the filename from the first filename thread found. If no filename threads are found, the filename from the record header will be used. - If you add a filename thread to a record that has a filename in the + - If you add a filename thread to a record that has a filename in the record header, the header name will be removed. - If you update a filename thread in a record that has a filename in + - If you update a filename thread in a record that has a filename in the record header, the header name will be left untouched. - Adding a filename thread is only allowed if no filename thread exists, + - Adding a filename thread is only allowed if no filename thread exists, or all existing filename threads have been deleted. + +- - - + +### Unicode Filenames ### + +Modern operating systems support filenames with a broader range of +characters than the Apple II did. This presents problems and opportunities. + +#### Background #### + +The Apple IIgs and old Macintoshes use the Mac OS Roman ("MOR") character +set. This defines a set of characters outside the ASCII range, i.e. +byte values with the high bit set. In addition to the usual collection +of vowels with accents and umlauts, MOR has some less-common characters, +including the Apple logo. + +On Windows, the high-ASCII values are generally interpreted according +to Windows Code Page 1252 ("CP-1252"), which defines a similar set +of vowels with accents and miscellaneous symbols. MOR and CP-1252 +have some overlap, but you can't really translate one into the other. +The standards-approved equivalent of CP-1252 is ISO-8859-1, though +according to [wikipedia](http://en.wikipedia.org/wiki/Windows-1252) +there was some confusion between the two. + +Modern operating systems support the Unicode Universal Character Set. +This system allows for a very large number of characters (over a million), +and includes definitions for all of the symbols in MOR and CP-1252. +Each character is assigned a "code point", which is a numeric value between +zero and 0x10FFFF. Most of the characters used in modern languages can +be found in the Basic Multilingual Plane (BMP), which uses code points +between zero and 0xFFFF (requiring only 16 bits). + +There are different ways of encoding code points. Consider, for example, +Unicode LATIN SMALL LETTER A WITH ACUTE: + + MOR: 0x87 + CP-1252: 0xE1 + Unicode: U+00E1 + UTF-16: 0x00E1 + UTF-8: 0xC3 0xA1 + +Or the humble TRADE MARK SIGN: + + MOR: 0xAA + CP-1252: 0x99 + Unicode: U+2122 + UTF-16: 0x2122 + UTF-8: 0xE2 0x84 0xA2 + +Modern Linux and Mac OS X use UTF-8 encoding in filenames. Because it's a +byte-oriented encoding, and 7-bit ASCII values are trivially represented +as 7-bit ASCII values, all of the existing system and library calls work +as they did before (i.e. if they took a `char*`, they still do). + +Windows uses UTF-16, which requires at least 16 bits per code point. +Filenames are now "wide" strings, based on `wchar_t*`. Windows includes +an elaborate system of defines based around the `TCHAR` type, which can +be either `char` or `wchar_t` depending on whether a program is compiled +with `_MBCS` (Multi-Byte Character System) or `_UNICODE`. A set of +preprocessor definitions is provided that will map I/O function names, +so you can call `_tfopen(TCHAR* ...)`, and the compiler will turn it into +either `fopen(char* ...)` or `_wfopen(wchar_t* ...)`. MBCS is deprecated +in favor of Unicode, so any new code should be strictly UTF-16 based. + +This means that, for code to work on both Linux and Windows, it has to +work with incompatible filename string types and different I/O functions. + +#### Opening Archive Files #### + +On Linux and Mac OS X, NuLib2 can open any file named on the command line. +On Windows, it's a bit trickier. + +The problem is that NuLib2 provides a `main()` function that is passed a +vector of "narrow" strings. The filenames provided on the command line +will be converted from wide to narrow, so unless the filename is entirely +composed of ASCII or CP-1252 characters, some information will be lost +and it will be impossible to open the file. + +NuLib2 must instead provide a `wmain()` function that takes wide strings. +The strings must be stored and passed around as wide throughout the +program, and passed into NufxLib this way (because NufxLib issues the +actual _wopen call). This means that NufxLib API must take narrow strings +when built for Linux, and wide strings when built for Windows. + +#### Adding/Extracting Mac OS Roman Files #### + +GS/ShrinkIt was designed to handle GS/OS files from HFS volumes, so NuFX +archive filenames use the MOR character set. To preserve the encoding +we could simply extract the values as-is and let them appear as whatever +values happen to line up in CP-1252, which is what pre-3.0 NuLib2 did. +It's much nicer to translate from MOR to Unicode when extracting, and +convert back from Unicode to MOR when adding files to an archive. + +The key consideration is that the character set associated with a +filename must be tracked. The code can't simply extract a filename from +the archive and pass it to a 'creat()` call. Character set conversions +must take place at appropriate times. + +With Windows it's a bit harder to confuse MOR and Unicode names, because +one uses 8-bit characters and the other uses UTF-16, but the compiler +doesn't catch everything. + +#### Current State #### + +NufxLib defines the UNICHAR type, which has a role very like TCHAR: +it can be `char*` or `wchar_t*`, and can be accompanied by a set of +preprocessor mappings that switch between I/O functions. The UNICHAR +type will be determined based on a define provided from the compiler +command line (perhaps `-DUSE_UTF16_FILENAMES`). + +The current version of NufxLib (v3.0.0) takes the first step, defining +all filename strings as either UNICHAR or MOR, and converting between them +as necessary. This, plus a few minor tweaks to NuLib2, was enough to +get Unicode filename support working on Linux and Mac OS X. + +None of the work needed to make Windows work properly has been done. +The string conversion functions are no-ops for Win32. As a result, +NuLib2 for Windows treats filenames the same way in 3.x as it did in 2.x. + +There are some situations where things can go awry even with UNICHAR, +most notably printf-style arguments. These are checked by gcc, but +not by Visual Studio unless you run the static analyzer. A simple +`printf("filename=%s\n", filename)` would be correct for narrow strings +but wrong for wide strings. It will likely be necessary to define a +filename format string (similar to `PRI64d` for 64-bit values) and switch +between "%s" and "%ls". + +This is a fair bit of work and requires some amount of uglification to +NuLib2 and NufxLib. Since Windows users can use CiderPress, and the +vast majority of NuFX archives use ASCII-only ProDOS file names, it's +not clear that the effort would be worthwhile. + diff --git a/nufxlib/NufxLib.h b/nufxlib/NufxLib.h index 57c70ad..1ec1ada 100644 --- a/nufxlib/NufxLib.h +++ b/nufxlib/NufxLib.h @@ -596,12 +596,12 @@ typedef struct NuSelectionProposal { */ typedef struct NuPathnameProposal { const UNICHAR* pathnameUNI; - char filenameSeparator; + UNICHAR filenameSeparator; const NuRecord* pRecord; const NuThread* pThread; const UNICHAR* newPathnameUNI; - uint8_t newFilenameSeparator; + UNICHAR newFilenameSeparator; /*NuThreadID newStorage;*/ NuDataSink* newDataSink; } NuPathnameProposal; @@ -792,7 +792,7 @@ NUFXLIB_API NuError NuAddFile(NuArchive* pArchive, const UNICHAR* pathnameUNI, const NuFileDetails* pFileDetails, short fromRsrcFork, NuRecordIdx* pRecordIdx); NUFXLIB_API NuError NuRename(NuArchive* pArchive, NuRecordIdx recordIdx, - const char* pathnameMOR, UNICHAR fssep); + const char* pathnameMOR, char fssep); NUFXLIB_API NuError NuSetRecordAttr(NuArchive* pArchive, NuRecordIdx recordIdx, const NuRecordAttr* pRecordAttr); NUFXLIB_API NuError NuUpdatePresizedThread(NuArchive* pArchive, diff --git a/nufxlib/samples/ImgConv.c b/nufxlib/samples/ImgConv.c index 620b827..9eae3a7 100644 --- a/nufxlib/samples/ImgConv.c +++ b/nufxlib/samples/ImgConv.c @@ -277,7 +277,7 @@ NuError CreateDosSource(const ImgHeader* pHeader, FILE* fp, * reversible transformation, i.e. if you do this twice you're back * to ProDOS ordering. */ - for (offset = 0; offset < pHeader->dataLen; offset += 4096) { + for (offset = 0; offset < (long) pHeader->dataLen; offset += 4096) { size_t ignored; ignored = fread(diskBuffer + offset + 0x0000, 256, 1, fp); ignored = fread(diskBuffer + offset + 0x0e00, 256, 1, fp); diff --git a/nufxlib/samples/README-S.txt b/nufxlib/samples/README-S.txt index 3f432b2..e325638 100644 --- a/nufxlib/samples/README-S.txt +++ b/nufxlib/samples/README-S.txt @@ -8,6 +8,9 @@ test-basic Basic tests. Run this to verify that things are working. +On Win32 there will be a second executable, test-basic-d, that links against +the DLL rather than the static library. + exerciser ========= @@ -82,6 +85,15 @@ of memory on very large archives, you can reduce the memory requirements by specifying the "-f" flag. +test-names +========== + +Tests Unicode filename handling. Run without arguments. + +(This currently fails on Win32 because the Unicode filename support is +incomplete there.) + + test-simple ===========