<html>
|
|
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
|
|
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
|
|
<meta name="ProgId" content="FrontPage.Editor.Document">
|
|
<title>FTN.e08002</title>
|
|
<meta content="t, default" name="Microsoft Border">
|
|
</head>
|
|
|
|
<body bgcolor="#FFFFFF" text="#000000">
|
|
|
|
<h2>Apple II FTN - ShrinkIt (NuFX) document</h2>
|
|
<p><a href="index.htm">Back to nulib.com library</a></p>
|
|
<h3>NOTE to CRC-16 seekers:</h3>
|
|
<p>Looks like a lot of people are hitting this page while looking for a CRC-16
|
|
algorithm. That's not what this page is about. If you want a CRC-16
|
|
implementation in C, try <a href="Crc16.c.txt">this one</a>.
|
|
If you want one in 6502 assembly, skip down to the end of this document.
|
|
<p>
|
|
<hr><pre>
|
|
Apple II
|
|
File Type Notes
|
|
_____________________________________________________________________________
|
|
Developer Technical Support
|
|
|
|
File Type: $E0 (224)
|
|
Auxiliary Type: $8002
|
|
|
|
Full Name: NuFile Exchange Archival Library
|
|
Short Name: ShrinkIt (NuFX) document
|
|
|
|
Revised by: Andy Nicholas and Matt Deatherage July 1990
|
|
Written by: Matt Deatherage July 1989
|
|
|
|
Files of this type and auxiliary type contain NuFX Archival Libraries.
|
|
Changes since July 1989: Rewrote major portions to reflect Master Version
|
|
$0002 of the NuFX standard.
|
|
_____________________________________________________________________________
|
|
|
|
Introduction
|
|
|
|
NuFX is a robust, full-featured archival standard for the Apple II family.
|
|
The standard, as presented in this Note, allows for full archival of ProDOS
|
|
and GS/OS files while keeping all file attributes with each file, as well as
|
|
providing necessary archival functions such as multiple compression schemes
|
|
and multiple archival implementations of the same standard. NuFX is
|
|
implemented in the application ShrinkIt, a free archival utility program for
|
|
enhanced IIe, IIc and IIgs computers. (Versions for earlier Apple II models
|
|
are also available.)
|
|
|
|
The NuFX standard was developed by Andrew Nicholas for Paper Bag Productions.
|
|
Comments or suggestions on the NuFX standard, or comments and suggestions on
|
|
ShrinkIt are welcome at:
|
|
|
|
Paper Bag Productions
|
|
8415 Thornberry Drive East
|
|
Upper Marlboro, MD 20772
|
|
Attn: NuFX Technical Support
|
|
America Online: ShrinkIt
|
|
GEnie: ShrinkIt
|
|
CompuServe: 70771,2615
|
|
|
|
|
|
History
|
|
|
|
The Apple II community has always lacked a well-defined method for archiving
|
|
files. NuFX is an attempt to rectify the situation by providing a flexible,
|
|
consistent standard for archiving files, disks, and other computer media.
|
|
Although many files are archived using the Binary II standard (see Apple II
|
|
File Type Note, File Type $E0, Auxiliary Type $8000), it was not designed as
|
|
an archival standard and its continued use as such creates problems. More
|
|
people are using Binary II as an archival standard than as a way to keep
|
|
attributes with a file when transferred, and this use is causing the original
|
|
intent of Binary II to become lost and unused.
|
|
|
|
NuFX, developed as an archival standard for the days of GS/OS, allows:
|
|
|
|
o Filenames longer than 64 characters (GS/OS can create 8,000-
|
|
character filenames).
|
|
o A convenient way to add to, remove from, and work on an archive.
|
|
o Including GS/OS files which contain resource forks.
|
|
o Including entire disk images.
|
|
o Including comments with a file.
|
|
o A convenient way to represent a file compressed or encrypted by a
|
|
specific application.
|
|
o A true archive standard. Binary II's original intent was to make
|
|
transfer of Apple II files from local machines to large
|
|
information services possible; otherwise, a file's attribute
|
|
information would be lost. Use of Binary II to archive files
|
|
rather than simply maintain their attributes stretches it beyond
|
|
its original intent.
|
|
|
|
Adding all of these features to the existing Binary II standard would be
|
|
nearly impossible without violating the existing standard and causing a great
|
|
deal of confusion. Although Binary II is flexible, it is simply unable to
|
|
address all of these concerns without alienating existing Binary II extraction
|
|
programs.
|
|
|
|
To provide some differentiation between standards and provide a better
|
|
functioning format, this Note presents a new standard called NuFX (NuFile
|
|
eXchange for the Apple II; pronounced new-F-X). NuFX fixes the problems that
|
|
Apple IIgs users would soon be experiencing as other filing systems become
|
|
available for GS/OS. NuFX attempts to stem a set of problems before they have
|
|
a chance to develop. NuFX provides all of the features of Binary II, but goes
|
|
further to allow the user the ultimate in flexibility, usefulness and
|
|
performance.
|
|
|
|
|
|
Additional Date/Time Data type:
|
|
|
|
Date/Time (8 Bytes):
|
|
|
|
+000 second Byte The second, 0 through 59.
|
|
+001 minute Byte The minute, 0 through 59.
|
|
+002 hour Byte The hour, 0 through 23.
|
|
+003 year Byte The current year minus 1900.
|
|
+004 day Byte The day, 0 through 30.
|
|
+005 month Byte The month, 0 through 11 (0 = January).
|
|
+006 filler Byte Reserved, must be zero.
|
|
+007 weekDay Byte The day of the week, 1 through 7 (1 = Sunday).
|
|
|
|
The format of the Date/Time field is identical to that described for the
|
|
ReadTimeHex call in the Apple IIgs Toolbox Reference Manual.
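
As a rough illustration of the layout above, the following Python sketch
decodes one 8-byte Date/Time field. The function name is hypothetical, and
treating an all-zero date as "unknown" is an assumption drawn from the
field descriptions later in this Note rather than from ReadTimeHex itself.

```python
import datetime

def parse_nufx_datetime(raw):
    """Decode an 8-byte Date/Time field; return None if the date is zeroed."""
    if len(raw) != 8:
        raise ValueError("Date/Time fields are exactly 8 bytes")
    second, minute, hour, year, day, month, _filler, _weekday = raw
    if not any(raw[:6]):
        return None  # assumption: an all-zero date means "not known"
    # day and month are stored zero-based; year is stored minus 1900
    return datetime.datetime(1900 + year, month + 1, day + 1,
                             hour, minute, second)

# 12:15:30 on Wednesday, July 4, 1990 (weekDay 4, with 1 = Sunday)
stamp = parse_nufx_datetime(bytes([30, 15, 12, 90, 3, 6, 0, 4]))
```

The weekDay byte is ignored here because it can be recomputed from the
date.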
|
|
|
|
|
|
Implementation
|
|
|
|
Figure 1 illustrates the basic structure of a NuFX archive.
|
|
|
|
                |      First Record     |      Next Record      |
 _______________|_______________________|_______________________|
| Master Header | Header |     Data     | Header |     Data     |
|_______________|________|______________|________|______________|
|
|
|
|
Figure 1-NuFX Archive Structure
|
|
|
|
A single master header block contains values which describe the entire archive
|
|
(those with knowledge of structured programming may consider them archive
|
|
globals). Each of the succeeding header blocks contains only information
|
|
about the record it precedes (consider each an archive local).
|
|
|
|
Each header block is followed by a list of threads, which is followed by the
|
|
actual threads. The data for each thread may be a data fork, resource fork,
|
|
message, control sequence for a NuFX utility program, or almost any kind of
|
|
sequential data.
|
|
|
|
Possible Block Combinations:
|
|
|
|
The blocks must occur in the following fashion:
|
|
|
|
Master Header Block containing N entries
|
|
|
|
Header Block
|
|
Threads list:
|
|
filename_thread (16 bytes)
|
|
message_thread (16 bytes)
|
|
data_thread (16 bytes)
|
|
.
|
|
.
|
|
.
|
|
filename_thread's data (filename_thread's comp_thread_eof # of bytes)
|
|
message_thread's data (message_thread's comp_thread_eof # of bytes)
|
|
data_thread's data (data_thread's comp_thread_eof # of bytes)
|
|
.
|
|
.
|
|
.
|
|
Next Header Block (notice no second Master Header block)
|
|
Threads list (message, control, data or resource)
|
|
.
|
|
.
|
|
.
|
|
Nth Header Block
|
|
Threads list (message, control, data or resource)
|
|
|
|
Master Header Block Contents
|
|
|
|
+000 nufile_id 6 Bytes These six bytes spell the word "NuFile" in
|
|
alternating ASCII (low, then high) for
|
|
uniqueness. The six bytes are $4E $F5 $46
|
|
$E9 $6C $E5.
|
|
+006 master_crc Word A 16-bit cyclic redundancy check
|
|
(CRC) of the remaining fields in this
|
|
block (bytes +008 through +047). Any
|
|
programs which modify the master header
|
|
block must recalculate the CRC for the
|
|
master header (see the section "A Sample
CRC Algorithm"). The initial value of this
|
|
CRC is $0000.
|
|
+008 total_records Long The total number of records in this
|
|
archive file. It is possible to chain
|
|
multiple records (files or disks)
|
|
together, as it is possible to chain
|
|
different types of records together (mixed
|
|
files and disks).
|
|
+012 archive_create_when
|
|
Date/Time The date and time on which this archive
|
|
was initially created. This field should
|
|
never be changed once initially written.
|
|
If the date is not known, or is unable to
|
|
be calculated, this field should be set to
|
|
zero. If the weekday is not known, or is
|
|
unable to be calculated, this field should
|
|
be set to null.
|
|
+020 archive_mod_when
|
|
Date/Time The date of the last modification to this
|
|
archive. This field should be changed
|
|
every time a change is made to any of the
|
|
records in the archive. If the date is
|
|
not known, or is unable to be calculated,
|
|
this field should be set to zero. If the
|
|
weekday is not known, or is unable to be
|
|
calculated, this field should be set to
|
|
null.
|
|
+028 master_version
|
|
Word The master version number of the NuFX
|
|
archive. This Note describes
|
|
master_version $0002, for which the next
|
|
eight bytes are zeroed.
|
|
+030 reserved 8 Bytes Must be null (all eight bytes $00).
|
|
+038 master_eof Long The length of the NuFX archive, in
|
|
bytes. Any programs which modify the
|
|
length of an archive, either increasing it
|
|
or decreasing it in size, must change this
|
|
field in the master header to reflect the
|
|
new size.
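
The master header fields above can be read with a sketch along these
lines. Two things here are assumptions not stated in this section:
multi-byte values are taken to be little-endian (low byte first), and the
CRC is computed with the polynomial $1021, most significant bit first.
The function and dictionary names are illustrative only.

```python
NUFILE_ID = bytes([0x4E, 0xF5, 0x46, 0xE9, 0x6C, 0xE5])
MASTER_HEADER_SIZE = 48  # master_crc covers bytes +008 through +047

def crc16(data, crc=0x0000):
    """CRC-16 with polynomial $1021, MSB first (assumed variant)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def le(raw, offset, size):
    """Read a little-endian unsigned integer (assumed byte order)."""
    return int.from_bytes(raw[offset:offset + size], "little")

def parse_master_header(raw):
    if len(raw) < MASTER_HEADER_SIZE or raw[:6] != NUFILE_ID:
        raise ValueError("not a NuFX master header")
    header = {
        "master_crc": le(raw, 6, 2),
        "total_records": le(raw, 8, 4),
        "archive_create_when": bytes(raw[12:20]),
        "archive_mod_when": bytes(raw[20:28]),
        "master_version": le(raw, 28, 2),
        "master_eof": le(raw, 38, 4),
    }
    # The stored CRC starts from $0000 and covers everything after itself.
    header["crc_ok"] = crc16(raw[8:MASTER_HEADER_SIZE]) == header["master_crc"]
    return header
```

A program that changes total_records or master_eof would recompute
crc16 over bytes +008 through +047 and store the result at +006.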
|
|
|
|
Header Block Contents:
|
|
|
|
Following the Master Header block is a regular Header Block, which precedes
|
|
each record within the NuFX archive. A cyclic redundancy check (CRC) has been
|
|
provided to detect archives which have possibly been corrupted. The only time
|
|
the CRC should be included in a block is for the Master Header and for each of
|
|
the regular Header Blocks. The CRC ensures reliability and data integrity.
|
|
|
|
+000 nufx_id 4 Bytes These four bytes spell the word "NuFX" in
|
|
alternating ASCII (low, then high) for
|
|
uniqueness. The four bytes are $4E $F5
|
|
$46 $D8.
|
|
+004 header_crc Word The 16-bit CRC of the remaining
|
|
fields of this block (bytes +006 through
|
|
the end of the header block and any
|
|
threads following it). This field is used
|
|
to verify the integrity of the rest of the
|
|
block. Programs which create NuFX
|
|
archives must include this in every
|
|
header. It is up to the discretion of the
|
|
extracting program to check the validity
|
|
of this CRC. Any programs which might
|
|
modify the header of a particular record
|
|
must recalculate the CRC for the header
|
|
block. The initial value for this CRC is
|
|
zero ($0000).
|
|
+006 attrib_count Word This field describes the length of
|
|
the attribute section of each record in
|
|
bytes. This count measures the distance
|
|
in bytes from the first field (offset
|
|
+000) up to and including the
|
|
filename_length field. By convention, the
|
|
filename_length field will always be the
|
|
last 2 bytes of the attribute section
|
|
regardless of what has preceded it.
|
|
+008 version_number
|
|
Word Version of this record. If version_number
|
|
is $0000, no option_list fields are
|
|
present. If the version_number is $0001
|
|
option_list fields may be present. If the
|
|
version_number is $0002 then option_list
|
|
fields may be present and a valid CRC-16
|
|
exists for the compressed data in the data
|
|
threads of this record. If the
|
|
version_number is $0003 then option_list
|
|
fields may be present and a valid CRC-16
|
|
exists for the uncompressed data in the
|
|
data threads of this record. The current
|
|
version number is $0003 and should always
|
|
be used when making archives.
|
|
+010 total_threads Long The number of thread subrecords
|
|
which should be expected immediately
|
|
following the filename or pathname at the
|
|
end of this header block. This field is
|
|
extremely important as it contains the
|
|
information about the length of the last
|
|
third of the header.
|
|
+014 file_sys_id Word The native file system identifier:
|
|
$0000 reserved
|
|
$0001 ProDOS/SOS
|
|
$0002 DOS 3.3
|
|
$0003 DOS 3.2
|
|
$0004 Apple II Pascal
|
|
$0005 Macintosh HFS
|
|
$0006 Macintosh MFS
|
|
$0007 Lisa File System
|
|
$0008 Apple CP/M
|
|
$0009 reserved, do not use (The
|
|
GS/OS Character FST returns
|
|
this value)
|
|
$000A MS-DOS
|
|
$000B High Sierra
|
|
$000C ISO 9660
|
|
$000D AppleShare
|
|
$000E-$FFFF Reserved, do not use
|
|
If the file system of a disk being
|
|
archived is not known, it should be set to
|
|
zero.
|
|
+016 file_sys_info Word Information about the current filing
|
|
system. The low byte of this word (offset
|
|
+016) is the native file system separator.
|
|
For ProDOS, this is the slash (/ or $2F).
|
|
For HFS and GS/OS, the colon (: or $3A) is
|
|
used, and for MS-DOS, the separator is the
|
|
backslash (\ or $5C). This separator is
|
|
provided so archival utilities may know
|
|
how to parse a valid file or pathname from
|
|
the filename field for the receiving file.
|
|
GS/OS archival utilities should not
|
|
attempt to parse pathnames, as it is not
|
|
possible to build in syntax rules for file
|
|
systems not currently defined. Instead,
|
|
pass the pathname directly to GS/OS and
|
|
attempt translation (asking the user for
|
|
suggestions) only if GS/OS returns an
|
|
"Invalid Path Name Syntax" error. The
|
|
high byte of this word is reserved and
|
|
should remain zero.
|
|
+018 access Flag Long Bits 31-8 reserved, must be zero
|
|
Bit 7 (D) 1 = destroy enabled
|
|
Bit 6 (R) 1 = rename enabled
|
|
Bit 5 (B) 1 = file needs to be
|
|
backed up
|
|
Bits 4-3 reserved, must be zero
|
|
Bit 2 (I) 1 = file is invisible
|
|
Bit 1 (W) 1 = write enabled
|
|
Bit 0 (R) 1 = read enabled
|
|
+022 file_type Long The file type of the file being archived.
|
|
For ProDOS 8 or GS/OS, this field should
|
|
always be what the operating system
|
|
returns when asked. For disks being
|
|
archived, this field should be zero.
|
|
+026 extra_type Long The auxiliary type of the file being
|
|
archived. For ProDOS 8 or GS/OS, this
|
|
field should always be what the operating
|
|
system returns when asked. For disks
|
|
being archived, this field should be the
|
|
total number of blocks on the disk.
|
|
+030 storage_type Word For Files: The storage type of the
|
|
file. Types $1 through $3 are standard
|
|
(one-forked) files, type $5 is an extended
|
|
(two-forked) file, and type $D is a
|
|
subdirectory.
|
|
file_sys_block_size
|
|
Word For Disks: The block size used by the
|
|
device should be placed in this field.
|
|
For example, under ProDOS, this field will
|
|
be 512, while for HFS it might be 524.
|
|
The GS/OS Volume call will return this
|
|
information if asked.
|
|
+032 create_when Date/Time The date and time on which
|
|
this record was initially created. If the
|
|
creation date and time are available from
|
|
a disk device, this information should be
|
|
included. If the date is not known, or is
|
|
unable to be calculated, this field should
|
|
be set to zero. If the weekday is not
|
|
known, or is unable to be calculated, this
|
|
field should be set to zero.
|
|
+040 mod_when Date/Time The date and time on which this record was
|
|
last modified. If the modification date
|
|
and time are available from a disk device,
|
|
this information should be included. If
|
|
the date is not known, or is unable to be
|
|
calculated, this field should be set to
|
|
zero. If the weekday is not known, or is
|
|
unable to be calculated, this field should
|
|
be set to zero.
|
|
+048 archive_when Date/Time The date and time on which
|
|
this record was placed in this archive.
|
|
If the date is not known, or is unable to
|
|
be calculated, this field should be set to
|
|
zero. If the weekday is not known, or is
|
|
unable to be calculated, this field should
|
|
be set to zero.
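
The fixed-position fields listed so far (+000 through +055) can be pulled
apart as in the following sketch. Little-endian byte order is assumed, and
the names parse_record_header and ACCESS_BITS are hypothetical. The three
Date/Time fields are returned as raw 8-byte strings.

```python
NUFX_ID = bytes([0x4E, 0xF5, 0x46, 0xD8])
FIXED_HEADER_SIZE = 56  # offsets +000 through +055

ACCESS_BITS = {7: "destroy", 6: "rename", 5: "backup",
               2: "invisible", 1: "write", 0: "read"}

def le(raw, offset, size):
    """Read a little-endian unsigned integer (assumed byte order)."""
    return int.from_bytes(raw[offset:offset + size], "little")

def parse_record_header(raw):
    if len(raw) < FIXED_HEADER_SIZE or raw[:4] != NUFX_ID:
        raise ValueError("not a NuFX record header")
    access = le(raw, 18, 4)
    return {
        "header_crc": le(raw, 4, 2),
        "attrib_count": le(raw, 6, 2),
        "version_number": le(raw, 8, 2),
        "total_threads": le(raw, 10, 4),
        "file_sys_id": le(raw, 14, 2),
        "separator": chr(raw[16]),  # low byte of file_sys_info
        "access": [name for bit, name in ACCESS_BITS.items()
                   if (access >> bit) & 1],
        "file_type": le(raw, 22, 4),
        "extra_type": le(raw, 26, 4),
        "storage_type": le(raw, 30, 2),  # file_sys_block_size for disks
        "create_when": bytes(raw[32:40]),
        "mod_when": bytes(raw[40:48]),
        "archive_when": bytes(raw[48:56]),
    }
```

An access value of $C3, for example, decodes to destroy, rename, write,
and read enabled.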
|
|
|
|
The following option_list information is only present if the NuFX version
|
|
number for this record is $0001 or greater.
|
|
|
|
+056 option_size Word The length of the FST-specific
|
|
portion of a GS/OS option_list returned by
|
|
GS/OS. This field may be $0000,
|
|
indicating the absence of a valid
|
|
option_list.
|
|
|
|
A GS/OS option_list is formatted as follows:
|
|
|
|
+000 buffer_size
|
|
Word Size of the buffer for GS/OS to
|
|
place the option_list in, including
|
|
this count word. This must be at
|
|
least $2E.
|
|
+002 list_size
|
|
Word The number of bytes of information
|
|
returned by GS/OS.
|
|
+004 file_sys_ID
|
|
Word A file system ID word (see list
|
|
above) identifying the FST owning
|
|
the file in question.
|
|
+006 option_bytes
|
|
Bytes The bytes returned by the FST.
|
|
There are (buffer_size - 6) of them.
|
|
|
|
The option_list contains information specific to native file systems that
|
|
GS/OS doesn't normally use (such as true creator_type, file_type, and access
|
|
privileges for AppleShare). Other FSTs released in the future will follow
|
|
similar conventions to return native file system specific parameters in the
|
|
option_list. Information in the option_list should always be copied from file
|
|
to file.
|
|
|
|
The value option_size in the NuFX header is the value of list_size minus two.
|
|
Immediately following the option_size count word are (list_size - 2) bytes.
|
|
To pass these values back to the destination file system, construct an
|
|
option_list with a suitably large buffer_size, a list_size of the NuFX
|
|
option_size + 2, the file_sys_id of the source file, and the FST-returned
|
|
option_bytes.
|
|
|
|
+058 list_bytes Bytes FST-specific bytes returned in an
|
|
option_list. These are the bytes in the
|
|
GS/OS option_list not including the FST ID
|
|
word. There are option_size of them. If
|
|
option_size is an odd number, one zero
|
|
byte of padding is added to keep the block
|
|
size an even number.
|
|
|
|
Because the attributes section does not have a fixed size, the next field must
|
|
be found by looking two bytes before the offset indicated by attrib_count
|
|
(+006).
|
|
|
|
+attrib_count - 2
|
|
filename_length
|
|
Word Obsolete, should be set to zero. In
|
|
previous versions of NuFX, this field was
|
|
the length of a file name or pathname
|
|
immediately following this field.
|
|
|
|
To allow the inclusion of future
|
|
additional parameters in the attributes
|
|
section, NuFX utility programs should rely
|
|
on the attrib_count field to find the
|
|
filename_length field.
|
|
|
|
Current convention is to zero this field
|
|
when building an archive and put the file
|
|
or pathname into a filename thread so the
|
|
record can be renamed in the archive.
|
|
Archival programs should recognize both
|
|
methods to find a valid file name or
|
|
pathname.
|
|
+attrib_count
|
|
filename Bytes Filename or partial pathname if
|
|
applicable. If this is a disk being
|
|
archived, then the volume_name should be
|
|
included in this field. If a volume name
|
|
is included in this field, a separator
|
|
should not be included in, or precede the
|
|
name. If a volume name is not available,
|
|
then this field should be zeros.
|
|
|
|
If a partial pathname is specified, the
|
|
directories to which the current pathname
|
|
refers need not have preceded this
|
|
particular record. The extraction program
|
|
must test each referenced directory
|
|
individually. If the directory in
|
|
question does not exist, the extracting
|
|
program should create it.
|
|
|
|
Any utility which extracts files from a
|
|
NuFX archive must not assume that this
|
|
field will be in a format it is able to
|
|
handle. In particular, extraction
|
|
programs should check for syntax
|
|
unacceptable to the operating system under
|
|
which they run and perform whatever
|
|
conversions are necessary to parse a legal
|
|
filename or pathname. In general, assume
|
|
nothing. (GS/OS programs should pass the
|
|
filename or pathname directly to GS/OS,
|
|
and only attempt to convert the name if
|
|
GS/OS returns an "invalid pathname syntax"
|
|
error.)
|
|
|
|
Both high and low ASCII values are valid
|
|
but may not mean the same to each file
|
|
system (for example, all eight bits are
|
|
significant in AppleShare pathnames while
|
|
only seven are significant in ProDOS
|
|
pathnames).
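
Locating the filename through attrib_count, as described above, might
look like this sketch. It assumes little-endian words and that the record
buffer starts at offset +000 of the header block. Both function names are
hypothetical.

```python
def read_inline_filename(record):
    """Return the filename bytes stored in the header, or None.

    None means filename_length is zero, which under current convention
    indicates the name is carried in a filename_thread instead.
    """
    attrib_count = int.from_bytes(record[6:8], "little")
    length = int.from_bytes(record[attrib_count - 2:attrib_count], "little")
    if length == 0:
        return None
    return bytes(record[attrib_count:attrib_count + length])

def low_ascii(name):
    """Drop the high bit, as a ProDOS-style reader might (7 bits matter)."""
    return bytes(b & 0x7F for b in name).decode("ascii")
```

The name is returned as raw bytes because the meaning of the high bit
depends on the source file system. low_ascii is only appropriate where
seven bits are significant.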
|
|
|
|
|
|
Threads
|
|
|
|
Thread Records are 16-byte records which immediately follow the Header Block
|
|
(composed of the attributes and file name of the current record) and describe
|
|
the types of data structures which are included with a given record. The
|
|
number of Thread Records is described in the attribute section by a Word,
|
|
total_threads.
|
|
|
|
Each Thread Record should be checked for the type of information that a given
|
|
utility program can extract. If a utility is incapable of extracting a
|
|
particular thread, that thread should be skipped (with the exception of
|
|
extended files under ProDOS 8, which should be dearchived into AppleSingle
|
|
format, or both threads should be skipped). If a utility finds a redundancy
|
|
in a Thread Record, it must decide whether to skip the record or to do
|
|
something with that particular thread (for example, if a utility finds two
|
|
message_thread threads it can either ignore the second one or display it.
|
|
Likewise, if a utility finds two data_thread threads for the same file, it
|
|
should inspect the thread_kind of each. If they match, it can either
|
|
overwrite the first thread extracted, or warn the user and skip the second
|
|
thread).
|
|
|
|
Thread records can be represented as follows:
|
|
|
|
+000 thread_class Word The classification of the thread:
|
|
$0000 message_thread
|
|
$0001 control_thread
|
|
$0002 data_thread
|
|
$0003 filename_thread
|
|
+002 thread_format Word The format of the data within the thread:
|
|
$0000 Uncompressed
|
|
$0001 Huffman Squeeze
|
|
$0002 Dynamic LZW/1 (ShrinkIt
|
|
specific)
|
|
$0003 Dynamic LZW/2 (ShrinkIt
|
|
specific)
|
|
$0004 Unix 12-bit Compress
|
|
$0005 Unix 16-bit Compress
|
|
+004 thread_kind Word Describes the kind of data within
|
|
the thread.
|
|
|
|
thread_kind must be interpreted on the basis of thread_class. See the table
|
|
below for the currently defined thread_kind interpretations:
|
|
|
|
            class $0000  class $0001       class $0002            class $0003
            -----------  ----------------  ---------------------  -----------
kind $0000  ASCII text   create directory  data fork of file      filename
kind $0001  see below    undefined         disk image             undefined
kind $0002  see below    undefined         resource fork of file  undefined
|
|
|
|
+006 thread_crc Word For version_number $0003, this field
|
|
is the CRC of the original data before it
|
|
was compressed or otherwise transformed.
|
|
The CRC-16's initial value is set to $FFFF.
|
|
+008 thread_eof Long The length of the thread when uncompressed.
|
|
+012 comp_thread_eof
|
|
Long The length of the thread when compressed.
|
|
|
|
Class $0000 with kind $0000 is obsolete and should not be used.
|
|
|
|
Class $0000 with kind $0001 has a predefined comp_thread_eof and a thread_eof
|
|
whose length may change. This way, a certain amount of space may be allocated
|
|
when a record is created and edited later.
|
|
|
|
Class $0000 with kind $0002 is a standard Apple IIgs icon. comp_thread_eof is
|
|
the length of the icon image; thread_eof is ignored.
|
|
|
|
Class $0003 with kind $0000 has a predefined comp_thread_eof and a thread_eof
|
|
whose length may change. After this record is placed into the archive, the
|
|
thread_eof can be changed if the name is changed, but the length of the name
|
|
may not extend beyond the space allocated for it, comp_thread_eof.
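
A 16-byte Thread Record can be decoded as in the sketch below, which
assumes little-endian fields. The lookup tables restate the class, format,
and kind values given above. The label used for class $0000 kind $0001 is
a paraphrase, and all names are illustrative.

```python
THREAD_CLASSES = {0: "message_thread", 1: "control_thread",
                  2: "data_thread", 3: "filename_thread"}
THREAD_FORMATS = {0: "Uncompressed", 1: "Huffman Squeeze",
                  2: "Dynamic LZW/1", 3: "Dynamic LZW/2",
                  4: "Unix 12-bit Compress", 5: "Unix 16-bit Compress"}
# thread_kind only has meaning in combination with thread_class
THREAD_KINDS = {(0, 0): "ASCII text (obsolete)",
                (0, 1): "message with preallocated space",
                (0, 2): "Apple IIgs icon",
                (1, 0): "create directory",
                (2, 0): "data fork", (2, 1): "disk image",
                (2, 2): "resource fork",
                (3, 0): "filename"}

def parse_thread_record(raw):
    if len(raw) != 16:
        raise ValueError("thread records are exactly 16 bytes")
    word = lambda off: int.from_bytes(raw[off:off + 2], "little")
    long_ = lambda off: int.from_bytes(raw[off:off + 4], "little")
    t_class, t_format, t_kind = word(0), word(2), word(4)
    return {
        "class": THREAD_CLASSES.get(t_class, "unknown"),
        "format": THREAD_FORMATS.get(t_format, "unknown"),
        "kind": THREAD_KINDS.get((t_class, t_kind), "undefined"),
        "thread_crc": word(6),
        "thread_eof": long_(8),
        "comp_thread_eof": long_(12),
    }
```

A utility that sees "unknown" or "undefined" would normally skip that
thread by advancing comp_thread_eof bytes.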
|
|
|
|
A thread_format of $0001 indicates Huffman Squeeze. NuFX's Huffman is the
|
|
same Huffman used by ARC v5.x, SQ and USQ, the source of which is publicly
|
|
available and was originally written by Richard Greenlaw. The first word of
|
|
the thread data is the number of nodes followed by the Huffman tree and the
|
|
actual data. This is also the same algorithm decoded by the Apple II version
|
|
of USQ written by Don Elton. The C source to this is widely available.
|
|
|
|
A thread_format of $0002 indicates a special variant of LZW (LZW/1) used by
|
|
ShrinkIt. The first two bytes of this thread are a CRC-16 of the uncompressed
|
|
data within the thread. This CRC-16 is initialized to zero ($0000). The third
|
|
byte is the low-level volume number used by the eight-bit version of ShrinkIt
|
|
to format 5.25" disks. The fourth byte is the run-length character used to
|
|
decode the rest of the thread. The data which comprises the compressed file
|
|
or disk immediately follows the RLE character.
|
|
|
|
When ShrinkIt compresses a file, it reads 4096-byte chunks of the file until
|
|
it reaches the file's EOF. The last 4096-byte chunk is padded with zeroes if
|
|
the file's length is not an exact multiple of 4096. Compressing a disk is
|
|
also done by reading sequential blocks of 4096 bytes.
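
The four-byte LZW/1 thread prefix and the 4096-byte chunking described
above can be sketched as follows. Little-endian order for the CRC word is
an assumption, and the helper names are hypothetical.

```python
CHUNK_SIZE = 4096

def parse_lzw1_prefix(thread):
    """Split off the four bytes that precede the first chunk."""
    if len(thread) < 4:
        raise ValueError("LZW/1 thread is too short")
    return {
        "crc": int.from_bytes(thread[0:2], "little"),  # CRC-16, init $0000
        "volume": thread[2],    # low-level volume number for 5.25" disks
        "rle_char": thread[3],  # run-length character, commonly $DB
        "data_offset": 4,
    }

def padded_chunks(data):
    """Yield 4096-byte chunks, padding the last one with zeroes."""
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        yield chunk + bytes(CHUNK_SIZE - len(chunk))
```

A 5000-byte file therefore produces two chunks, the second holding 904
bytes of data followed by 3192 zero bytes.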
|
|
|
|
Each 4K chunk is first compressed with RLE compression. The RLE character is
|
|
determined by reading the fourth byte of the thread. The RLE character which
|
|
is used by most current versions of ShrinkIt is $DB. A run of characters is
|
|
represented by three bytes, consisting of the run character, the number of
|
|
characters in the run and the character in the run. If the 4K chunk expands
|
|
after being compressed with RLE then the uncompressed 4K chunk is passed to
|
|
the LZW compressor. If the 4K chunk shrinks after being compressed with RLE
|
|
then the RLE-compressed image of the 4K chunk is passed to the LZW compressor.
|
|
|
|
ShrinkIt's LZW compressor individually compresses each 4K chunk passed to it
|
|
by using variable length (9 to 12 bits) codes. The way that ShrinkIt's LZW
|
|
compressor functions is almost identical to the algorithm used in the public
|
|
domain utility Compress. The first code is $0101. The LZW string table is
|
|
cleared before compressing each 4K chunk. If the compressed chunk increases
|
|
in size, then the pre-LZW form of the chunk (run-length-encoded or just
|
|
uncompressed data) is written to the file.
|
|
|
|
The first word of every 4K chunk is aligned to a byte boundary within the file
|
|
and is the length which resulted from the attempt at compressing the chunk
|
|
with RLE. If the value of this word is 4096, then RLE was not successful at
|
|
compressing the chunk. A single byte follows the word and indicates whether
|
|
or not LZW was performed on this chunk. A value of zero indicates that LZW
|
|
was not used, while a value of one indicates that LZW was used and that the
|
|
chunk must first be decompressed with LZW before doing any further processing.
|
|
|
|
To decompress a file, each 4K chunk must first be expanded if it was
|
|
compressed by LZW. If the 4K chunk wasn't compressed with LZW, then the word
|
|
which appears at the beginning of each chunk must be used to determine if the
|
|
data for the current chunk needs to be processed by the run-length decoder.
|
|
If the value of the word is 4096, then run-length decoding does not need to
|
|
occur because the data is uncompressed.
|
|
|
|
If the word indicates that the length of the chunk after being decompressed by
|
|
LZW is 4096 bytes long, then no run-length decoding needs to take place. If
|
|
the value of the word is less than 4096 then the chunk must be run-length decoded
|
|
to 4096 bytes.
|
|
|
|
There are four possible states for a chunk:

     o    Uncompressed data.
     o    Run-length-encoded data without LZW compression.
     o    Uncompressed data on which RLE was attempted (but failed), which
          was subsequently compressed with LZW.
     o    Data compressed with RLE and then also compressed with LZW.
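
The three-byte LZW/1 chunk header (a length word followed by an LZW flag
byte) is enough to tell which of the four cases applies. The sketch below
assumes a little-endian length word and returns the decoding steps in the
order they would be applied. The function name is illustrative.

```python
CHUNK_SIZE = 4096

def lzw1_decode_steps(header):
    """Return (rle_length, steps) for one LZW/1 chunk header."""
    rle_length = int.from_bytes(header[0:2], "little")
    lzw_flag = header[2]
    if rle_length > CHUNK_SIZE or lzw_flag not in (0, 1):
        raise ValueError("implausible LZW/1 chunk header")
    steps = []
    if lzw_flag == 1:
        steps.append("LZW")  # undo LZW first
    if rle_length != CHUNK_SIZE:
        steps.append("RLE")  # then expand runs back to 4096 bytes
    return rle_length, steps

# RLE shrank the chunk to 1000 bytes, then LZW was applied on top
example = lzw1_decode_steps((1000).to_bytes(2, "little") + b"\x01")
```

An empty list of steps means the chunk was stored as 4096 bytes of
uncompressed data.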
|
|
|
|
A thread_format of $0003 indicates a special variant of LZW (LZW/2) used by
|
|
ShrinkIt. The first byte is the low-level volume number used by the eight-bit
|
|
version of ShrinkIt to format 5.25" disks. The second byte is the run-length
|
|
character used to decode the rest of the thread. The data which comprises the
|
|
compressed file or disk immediately follows the second byte of the thread.
|
|
|
|
The format of LZW/2 is almost the same as LZW/1 with a few exceptions. Unlike
|
|
LZW/1, where the LZW string table is automatically cleared before each 4K
|
|
chunk is processed, the LZW string table used by LZW/2 is only cleared when
|
|
the table becomes full, indicating a change in the redundancy of the source
|
|
text. Not clearing the string table almost always yields improved compression
|
|
ratios because the compressor's dictionary is not being depleted every 4K and
|
|
larger strings are allowed to accumulate. The clear code used by ShrinkIt is
|
|
$100. Whenever the decompressor sees a $100 code, it must clear the string
|
|
table.
|
|
|
|
The string table is also cleared when the compressor has to "back track"
|
|
because a 4K chunk became larger. Whenever a chunk that is not compressed by
|
|
LZW is seen by the decompressor, the LZW string table must be cleared. Bits
|
|
0-12 of the first word of each chunk in an LZW/2 thread indicate the size of
|
|
the chunk after being compressed with RLE. The high bit (bit 15) indicates
|
|
whether or not LZW was used on the chunk. If LZW was not used (bit 15 = 0),
|
|
the data for the chunk immediately follows the first word. If LZW was used
|
|
(bit 15 = 1), a second word which is a count of the total number of bytes used
|
|
by the current chunk follows the first word. The mark of the next chunk can
|
|
be found by taking the mark at the beginning of the current chunk and adding
|
|
the second word to it, using that as an offset for a ProDOS 8 or GS/OS SetMark
|
|
call. This is not normally necessary because the next chunk is processed
|
|
immediately after the current chunk.
|
|
|
|
This second word is an improvement over LZW/1 because if a chunk becomes
|
|
corrupted, but the second word is valid, the next chunk can be found and most
|
|
of the file recovered. The second word is not needed (and not present) when
|
|
LZW is not used on the chunk because the first word is also a count of the
|
|
number of bytes which follow that word.
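
The LZW/2 chunk header rules above translate into the following sketch.
It assumes little-endian words and takes the offset of the chunk within
the thread data (that is, after the two-byte LZW/2 thread prefix). The
function and key names are hypothetical.

```python
def parse_lzw2_chunk_header(thread, offset):
    """Decode the header of the LZW/2 chunk that begins at offset."""
    first = int.from_bytes(thread[offset:offset + 2], "little")
    rle_length = first & 0x1FFF        # bits 0-12: size after RLE
    lzw_used = bool(first & 0x8000)    # bit 15: LZW flag
    if lzw_used:
        # second word: total bytes used by this chunk, counted from
        # the mark at the beginning of the chunk
        total = int.from_bytes(thread[offset + 2:offset + 4], "little")
        data_start = offset + 4
        next_chunk = offset + total
    else:
        # no second word; the first word counts the bytes that follow it
        data_start = offset + 2
        next_chunk = data_start + rle_length
    return {"rle_length": rle_length, "lzw_used": lzw_used,
            "data_start": data_start, "next_chunk": next_chunk}
```

Because next_chunk does not depend on decoding the chunk body, a reader
can skip a damaged LZW chunk and continue with the next one.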
|
|
|
|
A thread_format of $0004 indicates that Compress built this thread using a
maximum of 12 bits per LZW code. The thread data contains Compress's usual
three-byte signature, the third byte of which contains the number of bits
per LZW code that was actually used. The number of bits
|
|
may be less than or equal to 12. Optimally, this requires (at 12 bits) a 16K
|
|
hash table to decode and should be used only for transferring to machines with
|
|
limited amounts of memory. The C source to Compress is in the public domain
|
|
and is widely available.
|
|
|
|
A thread_format of $0005 indicates that Compress built this thread using a
maximum of 16 bits per LZW code. The thread data contains Compress's usual
three-byte signature, the third byte of which contains the number of bits
per LZW code that was actually used. The number of bits
|
|
may be less than or equal to 16. Optimally, this requires (at 16 bits) a 256K
|
|
hash table to decode. The C source to Compress is in the public domain and is
|
|
widely available.
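
A reader might sanity-check a Compress thread before decoding it, as in
this sketch. The signature bytes $1F $9D and the use of the low five bits
of the third byte for the code width come from the Unix Compress file
format, not from this Note, and should be treated as assumptions here.
The names are illustrative.

```python
COMPRESS_MAGIC = bytes([0x1F, 0x9D])             # assumed signature bytes
MAX_BITS_FOR_FORMAT = {0x0004: 12, 0x0005: 16}   # thread_format -> limit

def compress_code_bits(thread_data, thread_format):
    """Return the LZW code width recorded in the Compress signature."""
    if len(thread_data) != 0 and thread_data[:2] != COMPRESS_MAGIC:
        raise ValueError("missing Compress signature")
    if len(thread_data) in (0, 1, 2):
        raise ValueError("thread too short for a Compress signature")
    bits = thread_data[2] & 0x1F  # low five bits hold the code width
    limit = MAX_BITS_FOR_FORMAT[thread_format]
    if bits > limit:
        raise ValueError("code width exceeds the limit for this format")
    return bits
```

Rejecting a 16-bit stream labeled as thread_format $0004 lets a
memory-limited extractor fail early instead of overrunning its table.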
|
|
|
|
If a control_thread indicates that a directory should be created on the
|
|
destination device, the path to be created must take the form of a ProDOS
|
|
partial pathname. That is, the path must not be preceded with a volume name.
|
|
For example, /Stuff/SubDir is an invalid path for this control_thread, while
|
|
SubDir/AnotherSubDir is valid.
|
|
|
|
If a control_thread indicates that a path is to be created, all subdirectories
|
|
that are contained in the pathname must be created.
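
The two rules above (partial pathnames only, and every intermediate
subdirectory must be created) can be expressed as in this sketch. The
function name is hypothetical and the ProDOS slash separator is assumed
as the default.

```python
def directories_to_create(path, separator="/"):
    """List each directory a create-directory control_thread implies."""
    if path.startswith(separator):
        raise ValueError("path must be partial (no leading volume name)")
    parts = [part for part in path.split(separator) if part]
    # shortest prefix first, so parents are created before children
    return [separator.join(parts[:depth])
            for depth in range(1, len(parts) + 1)]

needed = directories_to_create("SubDir/AnotherSubDir")
```

For SubDir/AnotherSubDir the result is SubDir followed by
SubDir/AnotherSubDir. /Stuff/SubDir is rejected.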
|
|
|
|
control_thread threads will eventually be used to control the execution of
|
|
utility programs by allowing them to create, rename, and delete directories
|
|
and files and to move and modify files. A form of scripting language will
|
|
eventually be able to allow utility programs to perform these actions
|
|
automatically. control_thread threads will allow extraction programs to
|
|
perform operations similar to those of the Apple IIgs Installer, allowing
|
|
updates to program sets dependent on such things as creation or modification
|
|
dates and version numbers.
|
|
|
|
|
|
Extra Information
|
|
|
|
If the file system of a particular disk is not known, the file_sys_id field
|
|
should be set to zero, the volume name should also be zeroed, and all the
|
|
other fields pertaining only to files should be set to zero.
|
|
|
|
If the file system of a particular disk is known, as many of the fields as
|
|
possible should be filled with the correct information. Fields which do not
|
|
pertain to an archived disk should remain set to zero.
|
|
|
|
If an entire disk is added to the archive without some form of compression
|
|
(i.e., record_format = uncompressed), then the blocks which comprise the disk
|
|
image must be added sequentially from the first through the last block. Since
|
|
there will be no character included in the data stream to mark the end or
|
|
beginning of a block, extraction programs should rely on the
|
|
file_sys_block_size field to determine how many bytes to read from the record
|
|
to properly fill a block.
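
A minimal Python sketch of that rule follows. It assumes the uncompressed
thread data is already in memory and that its length is an exact multiple of
file_sys_block_size. The function name is hypothetical.

```python
def split_blocks(image, file_sys_block_size):
    """Cut an uncompressed disk image into blocks, first block first."""
    if file_sys_block_size <= 0:
        raise ValueError("file_sys_block_size must be positive")
    if len(image) % file_sys_block_size != 0:
        raise ValueError("image is not a whole number of blocks")
    return [image[i:i + file_sys_block_size]
            for i in range(0, len(image), file_sys_block_size)]
```

Because the stream carries no block markers, the block number is simply the
index of each slice in the returned list.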
|
|
|
|
Some Useful Thread Algorithms:
|
|
|
|
The beginning of the thread records can be found with the following algorithm:
|
|
|
|
Threads := (mark at beginning of header) + (attrib_count) +
|
|
(filename_length)
|
|
|
|
The end of the thread records can be found with the following algorithm:
|
|
|
|
endOfThreads := Threads + (16 * total_threads)
|
|
|
|
The beginning of a data_thread can be found with the following formula:
|
|
|
|
Data Mark := endOfThreads + (comp_thread_eof of all threads in the thread
|
|
list which are not data prior to finding a data_thread)
|
|
|
|
The beginning of a resource_thread may be found with the following algorithm:
|
|
|
|
Resource Mark := endOfThreads + (comp_thread_eof of all threads in the
|
|
thread list which are not resources prior to finding a
|
|
resource_thread)
|
|
|
|
The next record can be found using the following algorithm:
|
|
|
|
Next Mark := endOfThreads + (sum of comp_thread_eof of each thread)
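
The marks above can be worked through in a few lines of Python. This is only
a sketch: the Thread class and its "kind" labels are hypothetical stand-ins
for real Thread Records, and all marks are byte offsets from the start of the
archive. To locate a thread's data it adds the comp_thread_eof of every
thread that precedes the one sought.

```python
from dataclasses import dataclass

THREAD_RECORD_SIZE = 16          # bytes per Thread Record, as in the formula


@dataclass
class Thread:
    kind: str                    # hypothetical label, e.g. "data", "resource"
    comp_thread_eof: int         # bytes this thread occupies in the archive


def thread_marks(header_mark, attrib_count, filename_length, threads):
    """Return (Threads, endOfThreads) for one record."""
    threads_mark = header_mark + attrib_count + filename_length
    return threads_mark, threads_mark + THREAD_RECORD_SIZE * len(threads)


def thread_data_mark(end_of_threads, threads, kind):
    """Offset of the first thread of the given kind, or None if absent."""
    offset = end_of_threads
    for thread in threads:
        if thread.kind == kind:
            return offset
        offset += thread.comp_thread_eof   # skip an earlier thread's data
    return None


def next_record_mark(end_of_threads, threads):
    """Offset of the record that follows this one."""
    return end_of_threads + sum(t.comp_thread_eof for t in threads)
```

For a record at offset 0 with attrib_count 58, no header filename, and three
threads of 32, 60 and 40 bytes (filename, data, resource), endOfThreads is
106, the data begins at 138, the resource fork at 198, and the next record
at 238.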
|
|
|
|
The file name and its length can be found with the following algorithm:
|
|
|
|
if (filename_length > 0)
|
|
then
|
|
length of filename is filename_length;
|
|
filename is found at (mark at beginning of header) + attrib_count;
|
|
else
|
|
look through list of threads for a filename_thread;
|
|
if you find one, then length of filename is thread_eof;
|
|
if you don't find one, then you don't have a filename.
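
The same lookup, sketched in Python for illustration. The whole archive is
assumed to be in memory, and each thread is represented by a hypothetical
(kind, thread_eof, comp_thread_eof) tuple in thread-list order. The "filename"
label stands in for whatever test identifies a filename_thread.

```python
def find_filename(archive, header_mark, attrib_count, filename_length,
                  threads):
    """Return the record's filename as bytes, or None if it has none."""
    if filename_length > 0:
        # The name is stored in the header, right after the attributes.
        start = header_mark + attrib_count
        return archive[start:start + filename_length]

    # Otherwise walk the thread data, which starts after the 16-byte
    # Thread Records, looking for a filename_thread.
    offset = header_mark + attrib_count + 16 * len(threads)
    for kind, thread_eof, comp_thread_eof in threads:
        if kind == "filename":
            return archive[offset:offset + thread_eof]
        offset += comp_thread_eof
    return None
```

Only thread_eof bytes are returned, even though comp_thread_eof bytes are
skipped for each thread passed over.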
|
|
|
|
|
|
Directories
|
|
|
|
Directories are handled almost the same way that normal files are handled with
|
|
the exception that there will be no data in the thread which follows the
|
|
entry. A Thread Record must exist to inform a utility that a directory is to
|
|
be created through the use of the proper control_thread value.
|
|
|
|
Directories do not necessarily have to precede a record which references a
|
|
directory. For example, if a record contains Stuff/MyStuff, the directory
|
|
Stuff need not exist for the extracting program to properly extract the
|
|
record. The extracting program must check to see if each of the directories
|
|
referenced exists, and if one does not exist, create it. While this method
|
|
places a great burden on the abilities of the extraction program, it avoids
|
|
the anomalies associated with the deletion of directories within an archive.
|
|
|
|
|
|
A Sample CRC Algorithm
|
|
|
|
Paper Bag Productions provides the source code to a very fast routine which
|
|
does the CRC calculation as needed for NuFX archives. The routine makeLookup
|
|
needs to be called only once. After the first call, the routine doByte should
|
|
be called repeatedly with each new byte in succession to generate the
|
|
cumulative CRC for the block. The CRC word should be reset to null ($0000)
|
|
before beginning each new CRC.
|
|
|
|
This is the same CRC calculation which is done for CRC/Xmodem and Ymodem. The
|
|
code is easily portable to a 16-bit environment like the Apple IIgs. The only
|
|
detrimental factor with this routine is that it requires 512 bytes of main
|
|
memory to operate. If you can spare the space, this is one of the fastest
|
|
routines Paper Bag Productions knows of for generating a CRC-16 on a 6502-type
|
|
machine.
|
|
|
|
The CRC word should be reset to $0000 for normal CRC-16 and to $FFFF before
|
|
generating the CRC on the unpacked data for each data thread.
|
|
|
|
|
|
*-------------------------------
|
|
* fast crc routine based on table lookups by
|
|
* Andy Nicholas - 03/30/88 - 65C02 - easily portable to nmos 6502 also.
|
|
* easily portable into orca/m format, just snip and save.
|
|
* Modified for generic EDAsm type assemblers - MD 6/19/89
|
|
|
|
X6502 turn 65c02 opcodes on
|
|
|
|
*-------------------------------
|
|
* routine to make the lookup tables
|
|
*-------------------------------
|
|
|
|
makeLookup
|
|
LDX #0 zero first page
|
|
zeroLoop STZ crclo,x zero crc lo bytes
|
|
STZ crchi,x zero crc hi bytes
|
|
INX
|
|
BNE zeroLoop
|
|
|
|
*-------------------------------
|
|
* the following is the normal bitwise computation
|
|
* tweaked a little to work in the table-maker
|
|
|
|
docrc
|
|
LDX #0 number to do crc for
|
|
|
|
fetch TXA
|
|
EOR crchi,x add byte into high
|
|
STA crchi,x of crc
|
|
|
|
LDY #8 do 8 bits
|
|
loop ASL crclo,x shift current crc-16 left
|
|
ROL crchi,x
|
|
BCC loop1
|
|
|
|
* if previous high bit wasn't set, then don't add crc
|
|
* polynomial ($1021) into the cumulative crc. else add it.
|
|
|
|
LDA crchi,x add hi part of crc poly into
|
|
EOR #$10 cumulative crc hi
|
|
STA crchi,x
|
|
|
|
LDA crclo,x add lo part of crc poly into
|
|
EOR #$21 cumulative crc lo
|
|
STA crclo,x
|
|
loop1 DEY do next bit
|
|
BNE loop done? nope, loop
|
|
|
|
INX do next number in series (0-255)
|
|
BNE fetch didn't roll over, so fetch more
|
|
RTS done
|
|
|
|
crclo ds 256 space for low byte of crc table
|
|
crchi ds 256 space for high bytes of crc table
|
|
|
|
|
|
*-------------------------------
|
|
* do a crc on 1 byte/fast
|
|
* on initial entry, CRC should be initialized to 0000
|
|
* on entry, A = byte to be included in CRC
|
|
* on exit, CRC = new CRC
|
|
*-------------------------------
|
|
|
|
doByte
|
|
EOR crc+1 add byte into crc hi byte
|
|
TAX to make offset into tables
|
|
|
|
LDA crc get previous lo byte back
|
|
EOR crchi,x add it to the proper table entry
|
|
STA crc+1 save it
|
|
|
|
LDA crclo,x get new lo byte
|
|
STA crc save it back
|
|
|
|
RTS all done
|
|
|
|
crc dw 0000 cumulative crc for all data
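
For readers who want to cross-check an implementation, the same table-driven
method can be sketched in Python. This mirrors the structure of makeLookup
and doByte above but is not a transcription of them. The Python names are
hypothetical, and the table is held as 256 16-bit words rather than as
separate low-byte and high-byte pages.

```python
POLY = 0x1021                       # CRC polynomial used by the listing


def make_lookup():
    """Build the 256-entry table with the bitwise method."""
    table = []
    for n in range(256):
        crc = n << 8                # byte goes into the high half
        for _ in range(8):
            crc <<= 1
            if crc & 0x10000:       # a 1 bit was shifted out: add poly
                crc ^= POLY
            crc &= 0xFFFF
        table.append(crc)
    return table


TABLE = make_lookup()


def do_byte(crc, byte):
    """Fold one byte into a running 16-bit CRC."""
    index = ((crc >> 8) ^ byte) & 0xFF
    return ((crc << 8) & 0xFFFF) ^ TABLE[index]


def crc16(data, crc=0x0000):
    """CRC of a byte string, starting from the given initial value."""
    for byte in data:
        crc = do_byte(crc, byte)
    return crc
```

With this sketch, crc16(b"123456789") gives $31C3 when started from $0000
and $29B1 when started from $FFFF. TABLE[1] is $1021, the polynomial itself.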
|
|
|
|
The following CRC check is written in APW assembler format for an Apple IIgs
|
|
with 16-bit memory and registers on entry.
|
|
|
|
crcByte start
|
|
|
|
crc equ $0
|
|
crca equ $2
|
|
crcx equ $4
|
|
crctemp equ $6
|
|
|
|
sta crca 4
|
|
stx crcx 4
|
|
|
|
eor crc+1 on entry, number to add to CRC 4
|
|
and #$00ff is in (A) 3
|
|
asl a 2
|
|
tax 2
|
|
lda crc16Table,x 5
|
|
and #$00ff 3
|
|
sta crcTemp 4
|
|
|
|
lda crc-1 4
|
|
eor crc16Table,x 5
|
|
and #$ff00 3
|
|
ora crcTemp 4
|
|
sta crc 4
|
|
|
|
lda crca 4
|
|
ldx crcx 4
|
|
rts cycles = 59
|
|
|
|
|
|
;
|
|
; CRC-16 Polynomial = $1021
|
|
;
|
|
crc16table anop
|
|
dc i'$0000, $1021, $2042, $3063, $4084, $50a5, $60c6, $70e7'
|
|
dc i'$8108, $9129, $a14a, $b16b, $c18c, $d1ad, $e1ce, $f1ef'
|
|
dc i'$1231, $0210, $3273, $2252, $52b5, $4294, $72f7, $62d6'
|
|
dc i'$9339, $8318, $b37b, $a35a, $d3bd, $c39c, $f3ff, $e3de'
|
|
dc i'$2462, $3443, $0420, $1401, $64e6, $74c7, $44a4, $5485'
|
|
dc i'$a56a, $b54b, $8528, $9509, $e5ee, $f5cf, $c5ac, $d58d'
|
|
dc i'$3653, $2672, $1611, $0630, $76d7, $66f6, $5695, $46b4'
|
|
dc i'$b75b, $a77a, $9719, $8738, $f7df, $e7fe, $d79d, $c7bc'
|
|
dc i'$48c4, $58e5, $6886, $78a7, $0840, $1861, $2802, $3823'
|
|
dc i'$c9cc, $d9ed, $e98e, $f9af, $8948, $9969, $a90a, $b92b'
|
|
dc i'$5af5, $4ad4, $7ab7, $6a96, $1a71, $0a50, $3a33, $2a12'
|
|
dc i'$dbfd, $cbdc, $fbbf, $eb9e, $9b79, $8b58, $bb3b, $ab1a'
|
|
dc i'$6ca6, $7c87, $4ce4, $5cc5, $2c22, $3c03, $0c60, $1c41'
|
|
dc i'$edae, $fd8f, $cdec, $ddcd, $ad2a, $bd0b, $8d68, $9d49'
|
|
dc i'$7e97, $6eb6, $5ed5, $4ef4, $3e13, $2e32, $1e51, $0e70'
|
|
dc i'$ff9f, $efbe, $dfdd, $cffc, $bf1b, $af3a, $9f59, $8f78'
|
|
dc i'$9188, $81a9, $b1ca, $a1eb, $d10c, $c12d, $f14e, $e16f'
|
|
dc i'$1080, $00a1, $30c2, $20e3, $5004, $4025, $7046, $6067'
|
|
dc i'$83b9, $9398, $a3fb, $b3da, $c33d, $d31c, $e37f, $f35e'
|
|
dc i'$02b1, $1290, $22f3, $32d2, $4235, $5214, $6277, $7256'
|
|
dc i'$b5ea, $a5cb, $95a8, $8589, $f56e, $e54f, $d52c, $c50d'
|
|
dc i'$34e2, $24c3, $14a0, $0481, $7466, $6447, $5424, $4405'
|
|
dc i'$a7db, $b7fa, $8799, $97b8, $e75f, $f77e, $c71d, $d73c'
|
|
dc i'$26d3, $36f2, $0691, $16b0, $6657, $7676, $4615, $5634'
|
|
dc i'$d94c, $c96d, $f90e, $e92f, $99c8, $89e9, $b98a, $a9ab'
|
|
dc i'$5844, $4865, $7806, $6827, $18c0, $08e1, $3882, $28a3'
|
|
dc i'$cb7d, $db5c, $eb3f, $fb1e, $8bf9, $9bd8, $abbb, $bb9a'
|
|
dc i'$4a75, $5a54, $6a37, $7a16, $0af1, $1ad0, $2ab3, $3a92'
|
|
dc i'$fd2e, $ed0f, $dd6c, $cd4d, $bdaa, $ad8b, $9de8, $8dc9'
|
|
dc i'$7c26, $6c07, $5c64, $4c45, $3ca2, $2c83, $1ce0, $0cc1'
|
|
dc i'$ef1f, $ff3e, $cf5d, $df7c, $af9b, $bfba, $8fd9, $9ff8'
|
|
dc i'$6e17, $7e36, $4e55, $5e74, $2e93, $3eb2, $0ed1, $1ef0'
|
|
end
|
|
|
|
Further Reference
|
|
_____________________________________________________________________________
|
|
o ProDOS 8 Technical Reference Manual
|
|
o GS/OS Reference
|
|
o Apple IIgs Toolbox Reference Manual
|
|
o Apple II File Type Note, File Type $E0, Auxiliary Type $8000
|
|
o Apple II Miscellaneous Technical Note #14, Guidelines for
|
|
Telecommunication Programs
|
|
o "A Technique for High-Performance Data Compression," T. Welch,
|
|
IEEE Computer, Vol. 17, No. 6, June 1984, pp. 8-19.
|
|
</pre><hr>
|
|
|
|
<address>This document is Copyright by Apple Computer, Inc.</address>
|
|
|
|
</body>
|
|
|
|
</html>
|