Merge in NufxLib v3.0.0 changes

A couple of minor fix-ups for the NufxLib snapshot.
2025-01-27 12:32:36 +00:00 · 2015-01-09 16:58:13 -08:00 · b95575595f
commit b95575595f
parent b97584eeb6
5 changed files with 177 additions and 30 deletions
--- a/nufxlib/ChangeLog.txt
+++ b/nufxlib/ChangeLog.txt
@@ -1,3 +1,5 @@
 2015/01/09  ***** v3.0.0 shipped *****
 2015/01/03  fadden
    - Mac OS X: replace Carbon FinderInfo calls with BSD xattr.
    - Mac OS X: fix resource fork naming.
--- a/nufxlib/NOTES.txt
+++ b/nufxlib/NOTES.txt
@@ -1,16 +1,17 @@
 NufxLib NOTES
-Last revised: 2000/01/23
+=============
 Last revised: 2015/01/04
 The interface is documented in "nufxlibapi.html", available from the
-www.nulib.com web site.  This discusses some of the internal design that
+http://www.nulib.com/ web site.  This discusses some of the internal
-may be of interest.
+design that may be of interest.
 Some familiarity with the NuFX file format is assumed.
 - - -
-Read-Write Data Structures
+### Read-Write Data Structures ###
 ==========================
 For both read-only and read-write files (but not streaming read-only files),
 the archive is represented internally as a linked list of Records, each
@@ -64,15 +65,15 @@ threads are annotated in the "copy" list.)
 One of the goals was to be able to execute a sequence of operations like:
-    - open original archive
+    open original archive
-    - read original archive
+    read original archive
-    - modify archive
+    modify archive
-    - flush (success)
+    flush (success)
-    - modify archive
+    modify archive
-    - flush (failure, rollback)
+    flush (failure, rollback)
-    - modify archive
+    modify archive
-    - flush (success)
+    flush (success)
-    - close archive
+    close archive
 The archive is opened at the start and held open across many operations.
 There is never a need to re-read the entire archive.  We could avoid the
@@ -92,11 +93,11 @@ extraction are minimal.
 In summary:
-  "orig" list has original set of records, and is not disturbed until
+  - "orig" list has original set of records, and is not disturbed until
    the changes are committed.
-  "copy" list is created on first add/update/delete operation, and
+  - "copy" list is created on first add/update/delete operation, and
    initially contains a complete copy of "orig".
-  "new" list contains all new additions to the archive, including
+  - "new" list contains all new additions to the archive, including
    new additions that replace existing entries (the existing entry
    is deleted from "copy" and then added to "new").
@@ -106,9 +107,9 @@ Any changes to the record header or additions to the thread mod list are
 made in the "copy" set; the "original" set remains untouched.  The thread
 mod list can have the following items in it:
-    - delete thread (NuThreadIdx)
+  - delete thread (NuThreadIdx)
-    - add thread (type, otherSize, format, +contents)
+  - add thread (type, otherSize, format, +contents)
-    - update pre-sized thread (NuThreadIdx, +contents)
+  - update pre-sized thread (NuThreadIdx, +contents)
 Contents are specified with a NuDataSource, which allows the application
 to indicate that the data is already compressed.  This is useful for
@@ -179,9 +180,9 @@ is possible that the archive could be unrecoverably damaged.  NufxLib
 tries to identify such situations, and will leave the archive open in
 read-only mode after rolling back any new file additions.
 - - -
-Updating Filenames
+### Updating Filenames ###
 ==================
 Updating filenames is a small nightmare, because the filename can be
 either in the record header or in a filename thread.  It's possible,
@@ -191,16 +192,148 @@ header and two or more filenames in threads.
 NufxLib will not automatically "fix" broken records, but it will prevent
 applications from creating situations that should not exist.
-  When reading an archive, NufxLib will use the filename from the
+  - When reading an archive, NufxLib will use the filename from the
  first filename thread found.  If no filename threads are found, the
  filename from the record header will be used.
-  If you add a filename thread to a record that has a filename in the
+  - If you add a filename thread to a record that has a filename in the
  record header, the header name will be removed.
-  If you update a filename thread in a record that has a filename in
+  - If you update a filename thread in a record that has a filename in
  the record header, the header name will be left untouched.
-  Adding a filename thread is only allowed if no filename thread exists,
+  - Adding a filename thread is only allowed if no filename thread exists,
  or all existing filename threads have been deleted.
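The read-side rule above (first filename thread wins, otherwise fall back to the record header) can be sketched as follows.  The `SketchThread` and `SketchRecord` structs and the `ChooseFilename` function are hypothetical stand-ins invented for this illustration; they are not NufxLib's actual `NuRecord`/`NuThread` layout.

```c
#include <stddef.h>

/* Hypothetical, simplified thread: only the fields this sketch needs. */
typedef struct {
    int         isFilename;     /* nonzero if this is a filename thread */
    const char* nameMOR;        /* thread contents, Mac OS Roman */
} SketchThread;

/* Hypothetical, simplified record. */
typedef struct {
    const char*         headerNameMOR;  /* NULL if no name in the header */
    const SketchThread* threads;
    size_t              numThreads;
} SketchRecord;

/*
 * Return the filename a reader should use: the first filename thread
 * found, or the record header name if there are no filename threads.
 */
const char* ChooseFilename(const SketchRecord* pRecord)
{
    size_t i;

    for (i = 0; i < pRecord->numThreads; i++) {
        if (pRecord->threads[i].isFilename)
            return pRecord->threads[i].nameMOR;
    }
    return pRecord->headerNameMOR;
}
```

A record with both a header name and a filename thread yields the thread's name; a record with only a header name yields the header name.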
 - - -
 ### Unicode Filenames ###
 Modern operating systems support filenames with a broader range of
 characters than the Apple II did.  This presents problems and opportunities.
 #### Background ####
 The Apple IIgs and old Macintoshes use the Mac OS Roman ("MOR") character
 set.  This defines a set of characters outside the ASCII range, i.e.
 byte values with the high bit set.  In addition to the usual collection
 of vowels with accents and umlauts, MOR has some less-common characters,
 including the Apple logo.
 On Windows, the high-ASCII values are generally interpreted according
 to Windows Code Page 1252 ("CP-1252"), which defines a similar set
 of vowels with accents and miscellaneous symbols.  MOR and CP-1252
 have some overlap, but you can't really translate one into the other.
 The standards-approved equivalent of CP-1252 is ISO-8859-1, though
 according to [Wikipedia](http://en.wikipedia.org/wiki/Windows-1252)
 there was some confusion between the two.
 Modern operating systems support the Unicode Universal Character Set.
 This system allows for a very large number of characters (over a million),
 and includes definitions for all of the symbols in MOR and CP-1252.
 Each character is assigned a "code point", which is a numeric value between
 zero and 0x10FFFF.  Most of the characters used in modern languages can
 be found in the Basic Multilingual Plane (BMP), which uses code points
 between zero and 0xFFFF (requiring only 16 bits).
 There are different ways of encoding code points.  Consider, for example,
 Unicode LATIN SMALL LETTER A WITH ACUTE:
    MOR: 0x87
    CP-1252: 0xE1
    Unicode: U+00E1
    UTF-16: 0x00E1
    UTF-8: 0xC3 0xA1
 Or the humble TRADE MARK SIGN:
    MOR: 0xAA
    CP-1252: 0x99
    Unicode: U+2122
    UTF-16: 0x2122
    UTF-8: 0xE2 0x84 0xA2
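As a cross-check on the UTF-8 byte sequences listed above, here is a minimal sketch of encoding a single code point as UTF-8.  `EncodeUtf8` is a hypothetical helper written for this note, not a NufxLib function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8.  Writes up to 4 bytes to "out"
 * and returns the number of bytes written, or 0 if the value cannot be
 * encoded (a UTF-16 surrogate, or a value above 0x10FFFF).
 */
size_t EncodeUtf8(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {
        /* 7-bit ASCII is represented as itself */
        out[0] = (uint8_t) cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (uint8_t) (0xC0 | (cp >> 6));
        out[1] = (uint8_t) (0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* surrogates are not valid code points here */
        out[0] = (uint8_t) (0xE0 | (cp >> 12));
        out[1] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t) (0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (uint8_t) (0xF0 | (cp >> 18));
        out[1] = (uint8_t) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Passing 0x00E1 produces the two bytes 0xC3 0xA1, and 0x2122 produces 0xE2 0x84 0xA2, matching the tables above.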
 Modern Linux and Mac OS X use UTF-8 encoding in filenames.  Because it's a
 byte-oriented encoding, and 7-bit ASCII values are trivially represented
 as 7-bit ASCII values, all of the existing system and library calls work
 as they did before (i.e. if they took a `char*`, they still do).
 Windows uses UTF-16, which requires at least 16 bits per code point.
 Filenames are now "wide" strings, based on `wchar_t*`.  Windows includes
 an elaborate system of defines based around the `TCHAR` type, which can
 be either `char` or `wchar_t` depending on whether a program is compiled
 with `_MBCS` (Multi-Byte Character Set) or `_UNICODE`.  A set of
 preprocessor definitions is provided that will map I/O function names,
 so you can call `_tfopen(TCHAR* ...)`, and the compiler will turn it into
 either `fopen(char* ...)` or `_wfopen(wchar_t* ...)`.  MBCS is deprecated
 in favor of Unicode, so any new code should be strictly UTF-16 based.
 This means that, for code to work on both Linux and Windows, it has to
 work with incompatible filename string types and different I/O functions.
 #### Opening Archive Files ####
 On Linux and Mac OS X, NuLib2 can open any file named on the command line.
 On Windows, it's a bit trickier.
 The problem is that NuLib2 provides a `main()` function that is passed a
 vector of "narrow" strings.  The filenames provided on the command line
 will be converted from wide to narrow, so unless the filename is entirely
 composed of ASCII or CP-1252 characters, some information will be lost
 and it will be impossible to open the file.
 NuLib2 must instead provide a `wmain()` function that takes wide strings.
 The strings must be stored and passed around as wide throughout the
 program, and passed into NufxLib this way (because NufxLib issues the
 actual `_wopen` call).  This means that the NufxLib API must take narrow strings
 when built for Linux, and wide strings when built for Windows.
 #### Adding/Extracting Mac OS Roman Files ####
 GS/ShrinkIt was designed to handle GS/OS files from HFS volumes, so NuFX
 archive filenames use the MOR character set.  To preserve the encoding
 we could simply extract the values as-is and let them appear as whatever
 values happen to line up in CP-1252, which is what pre-3.0 NuLib2 did.
 It's much nicer to translate from MOR to Unicode when extracting, and
 convert back from Unicode to MOR when adding files to an archive.
 The key consideration is that the character set associated with a
 filename must be tracked.  The code can't simply extract a filename from
 the archive and pass it to a `creat()` call.  Character set conversions
 must take place at appropriate times.
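To make the conversion step concrete, here is a rough sketch of mapping one MOR byte to a Unicode code point.  The table is deliberately partial: it holds only the two values quoted earlier in these notes, and anything else in the high-bit range is reported as U+FFFD (REPLACEMENT CHARACTER).  `MorToCodePoint` and the table are hypothetical names for illustration; a real converter needs all 128 high-range entries and is not necessarily structured this way in NufxLib.

```c
#include <stddef.h>
#include <stdint.h>

#define kReplacementChar 0xFFFD     /* U+FFFD REPLACEMENT CHARACTER */

typedef struct {
    uint8_t     mor;            /* Mac OS Roman byte value */
    uint32_t    codePoint;      /* equivalent Unicode code point */
} MorMapping;

/* Partial table: only the entries quoted in these notes. */
static const MorMapping kPartialMorTable[] = {
    { 0x87, 0x00E1 },   /* LATIN SMALL LETTER A WITH ACUTE */
    { 0xAA, 0x2122 },   /* TRADE MARK SIGN */
};

/*
 * Convert one MOR byte to a code point.  ASCII passes through unchanged;
 * unmapped high-range values come back as kReplacementChar.
 */
uint32_t MorToCodePoint(uint8_t ch)
{
    size_t i;

    if (ch < 0x80)
        return ch;
    for (i = 0; i < sizeof(kPartialMorTable) / sizeof(kPartialMorTable[0]); i++) {
        if (kPartialMorTable[i].mor == ch)
            return kPartialMorTable[i].codePoint;
    }
    return kReplacementChar;
}
```

The point of the sketch is the boundary: a name read from the archive is MOR, and it only becomes safe to hand to the filesystem after each byte has gone through a mapping like this and been re-encoded for the host.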
 With Windows it's a bit harder to confuse MOR and Unicode names, because
 one uses 8-bit characters and the other uses UTF-16, but the compiler
 doesn't catch everything.
 #### Current State ####
 NufxLib defines the UNICHAR type, which has a role very like TCHAR:
 it can be `char` or `wchar_t`, and can be accompanied by a set of
 preprocessor mappings that switch between I/O functions.  The UNICHAR
 type will be determined based on a define provided from the compiler
 command line (perhaps `-DUSE_UTF16_FILENAMES`).
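A minimal sketch of what such a definition might look like, modeled on the `TCHAR` pattern.  The `UNI_TEXT`, `UNI_strlen`, and `UNI_fopen` macro names and the `OpenForRead` function are assumptions made up for this example, not names taken from NufxLib.h, and the wide branch compiles only against the Windows C runtime (which supplies `_wfopen`).

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

#ifdef USE_UTF16_FILENAMES
/* Wide build: filenames are UTF-16 strings. */
typedef wchar_t UNICHAR;
# define UNI_TEXT(s)    L ## s
# define UNI_strlen     wcslen
# define UNI_fopen      _wfopen     /* Windows CRT only */
#else
/* Narrow build: filenames are UTF-8 strings. */
typedef char UNICHAR;
# define UNI_TEXT(s)    s
# define UNI_strlen     strlen
# define UNI_fopen      fopen
#endif

/* Open a file for reading, using whichever string width is in effect. */
FILE* OpenForRead(const UNICHAR* pathnameUNI)
{
    return UNI_fopen(pathnameUNI, UNI_TEXT("rb"));
}
```

Code written against `UNICHAR` and the `UNI_` mappings compiles in either configuration without the caller needing to know which one is active.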
 The current version of NufxLib (v3.0.0) takes the first step, defining
 all filename strings as either UNICHAR or MOR, and converting between them
 as necessary.  This, plus a few minor tweaks to NuLib2, was enough to
 get Unicode filename support working on Linux and Mac OS X.
 None of the work needed to make Windows work properly has been done.
 The string conversion functions are no-ops for Win32.  As a result,
 NuLib2 for Windows treats filenames the same way in 3.x as it did in 2.x.
 There are some situations where things can go awry even with UNICHAR,
 most notably printf-style arguments.  These are checked by gcc, but
 not by Visual Studio unless you run the static analyzer.  A simple
 `printf("filename=%s\n", filename)` would be correct for narrow strings
 but wrong for wide strings.  It will likely be necessary to define a
 filename format string (similar to `PRId64` for 64-bit values) and switch
 between "%s" and "%ls".
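One possible shape for that format macro is sketched below.  `PRI_UNI` and `FormatFilenameMsg` are hypothetical names chosen for this example; the macro expands to the conversion specifier without the leading `%`, the same way the `<inttypes.h>` macros do.

```c
#include <stdio.h>
#include <wchar.h>

#ifdef USE_UTF16_FILENAMES
typedef wchar_t UNICHAR;
# define PRI_UNI "ls"       /* wide string argument */
#else
typedef char UNICHAR;
# define PRI_UNI "s"        /* narrow string argument */
#endif

/*
 * Format a message containing a filename.  Returns the snprintf result:
 * the number of characters that would be written, not counting the
 * terminating NUL.
 */
int FormatFilenameMsg(char* buf, size_t bufLen, const UNICHAR* filenameUNI)
{
    return snprintf(buf, bufLen, "filename=%" PRI_UNI "\n", filenameUNI);
}
```

Every printf-style call that takes a filename would need to be written this way, which is part of the uglification mentioned below.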
 This is a fair bit of work and requires some amount of uglification to
 NuLib2 and NufxLib.  Since Windows users can use CiderPress, and the
 vast majority of NuFX archives use ASCII-only ProDOS file names, it's
 not clear that the effort would be worthwhile.
--- a/nufxlib/NufxLib.h
+++ b/nufxlib/NufxLib.h
@@ -596,12 +596,12 @@ typedef struct NuSelectionProposal {
 */
 typedef struct NuPathnameProposal {
    const UNICHAR*  pathnameUNI;
-    char            filenameSeparator;
+    UNICHAR         filenameSeparator;
    const NuRecord* pRecord;
    const NuThread* pThread;
    const UNICHAR*  newPathnameUNI;
-    uint8_t         newFilenameSeparator;
+    UNICHAR         newFilenameSeparator;
    /*NuThreadID      newStorage;*/
    NuDataSink*     newDataSink;
 } NuPathnameProposal;
@@ -792,7 +792,7 @@ NUFXLIB_API NuError NuAddFile(NuArchive* pArchive, const UNICHAR* pathnameUNI,
            const NuFileDetails* pFileDetails, short fromRsrcFork,
            NuRecordIdx* pRecordIdx);
 NUFXLIB_API NuError NuRename(NuArchive* pArchive, NuRecordIdx recordIdx,
-            const char* pathnameMOR, UNICHAR fssep);
+            const char* pathnameMOR, char fssep);
 NUFXLIB_API NuError NuSetRecordAttr(NuArchive* pArchive, NuRecordIdx recordIdx,
            const NuRecordAttr* pRecordAttr);
 NUFXLIB_API NuError NuUpdatePresizedThread(NuArchive* pArchive,
--- a/nufxlib/samples/ImgConv.c
+++ b/nufxlib/samples/ImgConv.c
@@ -277,7 +277,7 @@ NuError CreateDosSource(const ImgHeader* pHeader, FILE* fp,
     * reversible transformation, i.e. if you do this twice you're back
     * to ProDOS ordering.
     */
-    for (offset = 0; offset < pHeader->dataLen; offset += 4096) {
+    for (offset = 0; offset < (long) pHeader->dataLen; offset += 4096) {
        size_t ignored;
        ignored = fread(diskBuffer + offset + 0x0000, 256, 1, fp);
        ignored = fread(diskBuffer + offset + 0x0e00, 256, 1, fp);
--- a/nufxlib/samples/README-S.txt
+++ b/nufxlib/samples/README-S.txt
@@ -8,6 +8,9 @@ test-basic
 Basic tests.  Run this to verify that things are working.
 On Win32 there will be a second executable, test-basic-d, that links against
 the DLL rather than the static library.
 exerciser
 =========
@@ -82,6 +85,15 @@ of memory on very large archives, you can reduce the memory requirements
 by specifying the "-f" flag.
 test-names
 ==========
 Tests Unicode filename handling.  Run without arguments.
 (This currently fails on Win32 because the Unicode filename support is
 incomplete there.)
 test-simple
 ===========