diff --git a/LICENSE b/LICENSE.txt similarity index 99% rename from LICENSE rename to LICENSE.txt index 8f71f43..57bc88a 100644 --- a/LICENSE +++ b/LICENSE.txt @@ -178,7 +178,7 @@ APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "{}" + boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a @@ -186,7 +186,7 @@ same "printed page" as the copyright notice for easier identification within third-party archives. - Copyright {yyyy} {name of copyright owner} + Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. diff --git a/README.md b/README.md index 9ff23a3..68a63c1 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,268 @@ -# fhpack -Testing... +fhpack - compression for Apple II hi-res images +=============================================== + +fhpack is a compression tool with a singular purpose: to compress +Apple II hi-res graphics images. + + +## Origins ## + +I've had an idea for a project involving hi-res graphics compression +for several years, but didn't do much about it. After learning +about [LZ4](http://lz4.org/), and seeing uncompressors written +for the [6502](http://pferrie.host22.com/misc/appleii.htm) and +[65816](http://www.brutaldeluxe.fr/products/crossdevtools/lz4/index.html), +I decided to see if I could apply LZ4 to hi-res images. + +A few hi-res compressors were written back in The Day, usually employing +run-length encoding, which is easy to write and fast to encode and decode. +In the spirit of LZ4, I decided to put together an asymmetric codec, +meaning compression is very very slow, but uncompression is very very fast. + +The result is a modified form of LZ4 that consistently beats LZ4-HC, +and generally comes close to (and occasionally beats) ShrinkIt's LZW/II. +The decoder is tiny and extremely fast, especially on the 65816 where +the bulk data copy instructions can be used. + + +## About the fhpack Tool ## + +The compressor has two modes, similar to LZ4's "fast" and "high". The +fast mode uses greedy parsing and is not particularly fast, while the +high-compression mode uses optimal parsing and takes 12 times as long. +Both employ simple brute-force algorithms, which we can get away with +because we're only compressing 8KB of data. The high-compression mode +does about 4% better on average -- not huge, but not negligible. + +Other compression programs, such as gzip, produce significantly smaller +output, but uncompression is much slower and requires more memory. + +The comments in [fhpack.cpp](fhpack.cpp) describe the data format. + +There is no implementation of the compression side for the 6502. +An implementation that uses greedy parsing is feasible, as the bulk of the +time is spent comparing 8-bit strings that are less than 256 bytes long, +and the 6502 series is pretty good at that. The optimal parser could +theoretically be done on a machine with 128KB of RAM, but would take a +very long time to run. + + +#### Screen Holes #### + +The hi-res screen has a curious interleaved structure that leaves "holes" +in memory -- parts of the frame buffer that don't affect what appears +on screen. 
The screen layout is divided into 128-byte sections, with
+120 bytes of visible data followed by an eight-byte "hole".  The holes
+tend to be filled with zeroes, though sometimes they may contain
+garbage or program state.
+
+fhpack can do one of three things with the screen holes:
+
+ 1. Preserve them.  This mode is enabled with the "-h" flag.  If you
+    want the uncompressed data to exactly match the original, you
+    must specify this flag.
+ 2. Fill them with zeroes.
+ 3. Fill them with a pattern that matches the data immediately before
+    or after the hole.
+
+In some cases #2 provides the best results, in others #3 wins.
+The difference is usually minimal, with outliers in the 70-90 byte range.
+On modern hardware fhpack runs very quickly, so when not in hole-preserve
+mode the tool compresses everything twice, and keeps whichever approach
+yielded the smallest output.
+
+
+## Apple II Code and Demos ##
+
+The 6502/65816 versions of the uncompressor (source and binaries), as
+well as two slideshow applications written in Applesoft and a number
+of sample files, are provided on the attached disk images.
+
+There are six disk images.  The first three hold the slide show demo:
+
+ * LZ4FHDemo.do (/LZ4FH, 140KB) - Source and object code for the
+   uncompression routines, plus a few test images and the Applesoft
+   "SLIDESHOW" program.
+ * UncompressedSlides.do (/SLIDESHOW, 140KB) - A set of 16 uncompressed
+   hi-res images.
+ * CompressedSlides.do (/SLIDESHOW, 140KB) - A set of 42 compressed
+   hi-res images.
+
+To view the demo, put the LZ4FHDemo image in slot 6 drive 1, and one
+of the slide disks in slot 6 drive 2.  Boot the disk and "-SLIDESHOW".
+Just hit return at the prompt to accept the default prefix.
+
+The slideshow program will scan the specified directory and identify files
+that appear to be compressed or uncompressed hi-res images.  It will
+then start a slide show, moving through them as quickly as possible.
+By swapping the compressed and uncompressed disks and restarting the
+program, you can compare the performance with and without compression.
+(For a 5.25" disk, it's generally faster to load a compressed image and
+uncompress it than it is to load an uncompressed image.)
+
+
+There is a second demo, called "HYPERSLIDE", which shows off the raw
+performance by eliminating the disk accesses.  A set of 15 images is loaded
+into memory -- overwriting BASIC.System -- and presented as a slide show
+as quickly as possible.  The demo and selected images are on this disk:
+
+ * HyperSlide.po (/HYPERSLIDE, 140KB)
+
+To run the demo, put the disk image in slot 6 drive 1, boot the disk,
+and "-HYPERSLIDE".  If you are running on a IIgs, you may want to try it
+with the 65816 uncompressor, which is much faster than the 6502 version.
+If you want to compute frame timings, you can set an iteration count,
+and the slide show will beep at the start and end.
+
+A larger set of images is available on a pair of 800KB disks.  One disk
+has the compressed form, the other the uncompressed form:
+
+ * UncompressedImages.po (/IMAGES, 800KB)
+ * CompressedImages.po (/IMAGES.LZ4H, 800KB)
+
+It's worth noting that the images on CompressedSlides.do take up about
+135KB of disk space, but total only about 104KB of data.  The rest of the
+space is used up by filesystem overhead.  Storing them in a ShrinkIt
+archive would be more efficient, but would also make them far more
+difficult to unpack.
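+
+As an aside, the interleaved structure mentioned under Screen Holes can be
+made concrete.  The sketch below is not part of fhpack -- it's a host-side
+illustration of the standard hi-res address math, with made-up function
+names -- but it shows where the holes come from:
+
+```cpp
+#include <cstdio>
+
+// Base address of hi-res row y (0-191) on page 1 ($2000-$3FFF).
+// Rows are interleaved rather than sequential, which is why a solid
+// block of color is not contiguous in memory.
+unsigned hiresRowAddr(unsigned y)
+{
+    return 0x2000 + (y & 7) * 0x400 + ((y >> 3) & 7) * 0x80 + (y >> 6) * 0x28;
+}
+
+// Each 128-byte group holds three 40-byte rows; the last 8 bytes of
+// every group form a "screen hole".
+bool isScreenHole(unsigned offset)      // offset within the 8KB buffer
+{
+    return (offset & 0x7f) >= 120;
+}
+
+int main()
+{
+    for (unsigned y = 0; y < 4; y++) {
+        printf("row %u starts at $%04X\n", y, hiresRowAddr(y));
+    }
+    printf("offset 120 is in a hole? %d\n", isScreenHole(120));
+    return 0;
+}
+```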
+ + +#### Decoder Performance #### + +Running under AppleWin with "authentic" disk access speed enabled, +a slide show of uncompressed images runs at about 1.7 seconds per +image (about 0.6fps). With compressed images the time varies, because +the size of the compressed image affects the amount of disk activity, +but it averages about 1.4 seconds per image (about 0.7 fps). + +Removing disk activity from the equation, HyperSlide improves that to +about 3.7 fps, with very little variation between files. The decode +time is dominated by byte copies, and we're always copying 8KB, so the +consistency is expected. + +HyperSlide still has some overhead from Applesoft BASIC. The "blitz +test", included on the LZ4FH demo disk, generates machine language +calls that uncompress the same image 100x, eliminating all overhead +(and simulating what HyperSlide could do if it weren't written in +BASIC). The speed improves to 5.6 fps. + +The most significant boost in speed comes from using the 65816 data +move instructions. With a 65816 implementation, still running at 1MHz, +HyperSlide hits 6 fps, and BLITZTEST tops 12 fps. + + +#### Code Notes #### + +The uncompressor takes as arguments the addresses of the compressed data +and the buffer to uncompress to. These are poked into memory locations +$02FC and $02FE. In the current implementation, the output buffer must +be $2000 or $4000 (the two hi-res pages). + +Packed images use the FOT ($08) file type, with an auxtype of $8066 +(0x66 is ASCII 'f'). + + +## Experimental Results ## + +I grabbed a set of about 70 images, most from games, a few from early +"contributed program" disks. The latter include what look like digitized +scans that don't compress especially well. + +All images were compressed with LZ4 r131 in high-compression mode (`lz4 +-9`), NuLib2 with LZW/II, and LZ4FH (`fhpack -9`). fhpack output has a +one-byte magic number, while LZ4-HC has 15 bytes of headers and footers, +so for a fair "raw data" comparison the numbers should be adjusted +appropriately. + +Most source images are 8192 bytes long, some are a few bytes shorter. 
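+
+(The percentages in the TOTAL row below are presumably the compressed
+totals relative to the combined size of the originals -- 81 images at
+roughly 8,192 bytes each, or about 664KB; e.g. 248,473 / 663,552 is
+roughly 37.4%.)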
+ +Image File | LZ4-HC | fhpack | LZW/II | +----------------------- | -----: | -----: | -----: | +contrib/BABY.JANE | 3664 | 3617 | 2851 | +contrib/CHARACTERS | 1874 | 1836 | 1614 | +contrib/CHURCHILL | 4759 | 4723 | 3749 | +contrib/DIP.CHIPS | 3840 | 3785 | 3048 | +contrib/DOLLAR | 3838 | 3790 | 3483 | +contrib/DOUBLE.BESSEL | 3066 | 3010 | 2566 | +contrib/GIRLS.BEST.FRND | 5967 | 5933 | 4659 | +contrib/HOPALONG | 3394 | 3331 | 2713 | +contrib/JOE.SENT.ME | 7569 | 7546 | 7702 | +contrib/LADY.BE.GOOD | 4119 | 4060 | 3292 | +contrib/MACROMETER | 4405 | 4354 | 3613 | +contrib/MUSIC | 1077 | 1030 | 929 | +contrib/RANDOM.LADY | 5754 | 5720 | 5426 | +contrib/ROCKY.RACCOON | 5864 | 5754 | 5598 | +contrib/SHAKESPEARE | 4933 | 4884 | 4286 | +contrib/SPIRALLELLO | 5722 | 5704 | 5345 | +contrib/SQUEEZE | 3209 | 3163 | 2533 | +contrib/TEQUILA | 5881 | 5828 | 5484 | +contrib/TEX | 4463 | 4413 | 3677 | +contrib/TIME.MACHINE | 3628 | 3578 | 3023 | +contrib/UNCLE.SAM | 3057 | 3012 | 2784 | +contrib/WORLD.MAP | 2792 | 2723 | 2429 | +games/ABM.TITLE | 1387 | 1353 | 1581 | +games/ARCHON.TITLE | 5607 | 5589 | 4991 | +games/AZTEC.TITLE | 6378 | 6362 | 6414 | +games/BAM.TITLE | 5380 | 5377 | 4902 | +games/BANDITS.TITLE | 1442 | 1388 | 1376 | +games/BARDS.TALE.1 | 3853 | 3815 | 3523 | +games/BILESTOAD | 2140 | 1559 | 2552 | +games/BORG.TITLE | 2327 | 2559 | 2009 | +games/CAPT.GOODNIGHT | 2839 | 2820 | 2726 | +games/CHOPLIFTER | 1088 | 1007 | 1070 | +games/CRISIS.MT.GAME | 4144 | 4085 | 4037 | +games/CRISIS.MT.TITLE | 2290 | 2246 | 2384 | +games/DAVID.MIDNIGHT | 3362 | 3322 | 3225 | +games/DEFENDER | 728 | 696 | 678 | +games/EAMON.TITLE | 3744 | 3665 | 3252 | +games/GALACTIC.EMPIRE | 2161 | 2087 | 1956 | +games/GERMANY.1985 | 2566 | 2460 | 2390 | +games/HARD.HAT.MAC | 1524 | 1457 | 1678 | +games/KADASH.DEMO | 1796 | 1736 | 2120 | +games/KADASH.TITLE | 5317 | 5294 | 5393 | +games/KARATE.TITLE | 4040 | 3845 | 3952 | +games/KARATEKA.FORT | 4050 | 3955 | 3452 | +games/KARATEKA.GAME | 948 | 904 | 1108 | +games/LODE.RUNNER | 1133 | 1102 | 1428 | +games/MARIO.BROS | 1472 | 1406 | 1372 | +games/MAZE.CRAZE | 2703 | 2659 | 2485 | +games/MICROWAVE.TITLE | 2812 | 2737 | 2434 | +games/NIGHT.FLIGHT | 1109 | 1024 | 1183 | +games/ODYSSEY.TITLE | 3994 | 3953 | 3752 | +games/OUTWORLD | 2222 | 2157 | 2296 | +games/PCS | 1897 | 1837 | 1861 | +games/PCS.TITLE | 1881 | 1841 | 1882 | +games/QUESTRON.DEMO | 2569 | 2518 | 2253 | +games/QUESTRON.TITLE | 1536 | 1499 | 1837 | +games/RASTER.BLASTER | 2687 | 2636 | 2553 | +games/RESCUE.RAIDERS | 5377 | 4961 | 4883 | +games/ROADWAR2K.TITLE | 2063 | 1983 | 2068 | +games/SPARE.CHANGE | 2058 | 2009 | 2268 | +games/STAR.MAZE | 1253 | 1208 | 1600 | +games/STARSHIP.CMDR | 1453 | 1427 | 1845 | +games/STELLAR.7 | 1629 | 1412 | 1845 | +games/SUNDOG.TITLE | 3250 | 3188 | 3270 | +games/SWASHBUCK.GAME | 4690 | 4608 | 4286 | +games/SWASHBUCK.TITLE | 5077 | 5035 | 5085 | +games/TRANQUILITY | 1409 | 1363 | 1273 | +games/ULT2.LORD.BRIT | 1529 | 1514 | 1592 | +games/ULTIMA2.TITLE | 2220 | 2176 | 2201 | +games/WASTELAND.TITLE | 3540 | 3078 | 3510 | +games/WAYOUT | 1691 | 1669 | 1864 | +games/WOLFEN.TITLE | 2638 | 2588 | 2610 | +games/ZAXXON | 1884 | 1862 | 1769 | +misc/CRAPS.TABLE | 2286 | 2266 | 2548 | +misc/GHOSTBUST.LOGO | 1829 | 1724 | 1594 | +misc/LINE.CHART | 1753 | 1655 | 1578 | +misc/MICKEY | 3369 | 3316 | 2945 | +misc/WHO.LOGO | 1138 | 1084 | 1218 | +test/allgreen | 63 | 137 | 215 | +test/allzero | 62 | 136 | 38 | +test/nomatch | 8211 | 7928 | 7414 | +TOTAL | 248473 | 242771 | 232201 | + | 37.4% | 
36.5% | 34.9% |
+
+Note: test/nomatch is not compressible by LZ4 encoding.  fhpack was able
+to compress it because it zeroed out the "screen holes".  When processed
+in hole-preservation mode, test/nomatch expands to 8292 bytes.
+
diff --git a/fhpack.cpp b/fhpack.cpp
new file mode 100644
index 0000000..41cdb30
--- /dev/null
+++ b/fhpack.cpp
@@ -0,0 +1,1028 @@
+/*
+ * fhpack, an Apple II hi-res picture compressor.
+ * By Andy McFadden
+ * Version 1.0, August 2015
+ *
+ * Copyright 2015 by faddenSoft.  All Rights Reserved.
+ * See the LICENSE.txt file for distribution terms (Apache 2.0).
+ *
+ * Under Linux, you can build it with just:
+ *  g++ -O2 fhpack.cpp -o fhpack
+ */
+// TODO: prompt before overwriting output file (add "-f" to force)
+
+/*
+Format summary:
+
+LZ4FH (FH is "fadden's hi-res") is similar to LZ4 (http://lz4.org) in
+that the output is byte-oriented and has two kinds of chunks: "string of
+literals" and "match".  The format has been modified to make it easier
+(and faster) to decode on a 6502.
+
+As with LZ4, the goal is to get reasonable compression ratios with an
+extremely fast decoder.  On a CPU with 8-bit index registers, there is
+a distinct advantage to keeping copy lengths under 256 bytes.  Since the
+goal is to compress hi-res graphics, runs of identical bytes tend to be
+fairly short anyway -- the interleaved nature means that solid blocks of
+color aren't necessarily contiguous in memory -- so the ability to encode
+runs of arbitrary length adds baggage with little benefit.
+
+Files should use file type $08 (FOT) with auxtype $8066 (vendor-specific,
+0x66 is 'f').
+
+The format is very similar to LZ4, with a few key differences.  It
+retains the idea of encoding the lengths of the next literal string
+and next match in a single byte (4 bits each), so it is most efficient
+when matches and literals alternate.
+
+ file:
+  1 byte : 0x66 - format magic number for version 1
+    Not strictly necessary, but gives a hint if the images end up on
+    a DOS 3.3 disk where there's no dedicated file type.
+  [ ...one or more chunks follow... ]
+
+ chunk:
+  1 byte : length of literal string (hi 4 bits) and match (lo 4 bits)
+    A literal-string len of zero indicates no literals (match follows
+    match).  A literal-string len of 15 indicates that the literal
+    string is at least 15 bytes long, and the next byte must be added
+    to it.  The match len is stored as (length - 4), allowing us to
+    represent a match of length 4 to 18 with 4 bits.  A match len of 15
+    indicates that an additional byte is needed.
+  1 byte : (optional) continuation of literal len, 0 - 240
+    Add 15 to get a literal length of 15 - 255.
+  N bytes: 0 - 255 literal values
+
+  1 byte : (optional) continuation of match len, 0 - 236 -or- 253/254
+    Add 15 to get 15-251.  Factoring in the minimum match length of 4
+    yields 19 - 255.  A value of 253 indicates no match (literals
+    follow literals).  This is generally very rare, and is actually
+    impossible if we overwrite the screen holes as that will guarantee
+    a match every 120 literals.  A value of 254 indicates end-of-data.
+  2 bytes: (if match) offset to match
+    The offset is from the start of the output buffer, *not* back
+    from the current position.  That way, if we're writing the output
+    to $2000, instead of doing a 16-bit subtraction we can just
+    ORA #$20 into the high byte.
+
+We could save a byte by limiting the match distance to 8 bits (and probably
+making it relative to the current position), but the interleaved layout of
+the hi-res screen tends to spread things apart.
It won't really improve
+our speed, which is what we're mostly concerned with.
+
+The use of an explicit end indicator means we don't have to constantly
+check to see if we've consumed enough input or produced enough output.
+Unlike LZ4, we need to support adjacent runs of literals, so we already
+need a special-case check on the match length.  It also means we can
+choose to trim the file to $1ff8, losing the final "hole", or retain the
+original file length.
+
+Note that, in LZ4, the match offset comes before any optional match
+length extensions, while in LZ4FH it comes after.  This allows the match
+offset to be omitted when there's no match.  (This was not useful in LZ4
+because literals-follow-literals doesn't occur.)
+
+Expansion of uncompressible data is possible, but minimal.  The worst
+case is a file with no matches.  We add three bytes of overhead for
+every 255 literals (4/4 byte, 1 for literal len extension, 1 for match
+len extension that holds the "no match" symbol).  Globally we add +1 for
+the magic number.  The "end-of-data" symbol replaces the "no match"
+symbol, so overall it's int(ceil(8192/255)) * 3 + 1 = 100 bytes.
+
+*/
+/*
+Implementation notes:
+
+The compression code uses an exhaustive brute-force search for matches.
+The "greedy" approach is very slow; the "optimal" approach is extremely
+slow.  It executes quickly on a modern machine, but would take a long
+time to run on an Apple II.  On the bright side, with "greedy" parsing
+it uses very little memory, and an optimized 6502/65816 implementation
+might run in a reasonable amount of time.
+
+Unrelated to the compression is the handling of the "screen holes".
+Of the hi-res screen's 8192 bytes, 512 are invisible.  We can teach the
+compression code to skip over them, but that will require additional
+code and will interrupt our literal/match strings every 120 bytes, so
+it's better to alter the contents of the holes so that they blend into
+the surrounding data and handle them as a match string.
+
+Sometimes filling holes with a nearby pattern is not a win.  This is
+particularly noticeable for the old digitized images in the "contrib"
+folder, which have widely varying pixel values near the edges.  It turns
+out we do slightly better by zeroing the holes out, which allows them
+to match previous holes.  Also, sometimes there are patterns in the
+file that happen to match eight zeroes followed by a splash of color.
+
+Generally speaking the difference in output size is a few dozen bytes,
+though in rare cases it can noticeably improve (-200) or cost (+50).
+We resolve this conundrum by compressing the file twice and using whichever
+works best.
+
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+#include <unistd.h>
+
+enum ProgramMode {
+    MODE_UNKNOWN, MODE_COMPRESS, MODE_UNCOMPRESS, MODE_TEST
+};
+
+#define MAX_SIZE 8192
+#define MIN_SIZE (MAX_SIZE - 8)     // without final screen hole
+#define MAX_EXPANSION 100           // ((MAX_SIZE/255)+1) * 3 + 1
+
+#define MIN_MATCH_LEN 4
+#define MAX_MATCH_LEN 255
+#define MAX_LITERAL_LEN 255
+#define INITIAL_LEN 15
+
+#define EMPTY_MATCH_TOKEN 253
+#define EOD_MATCH_TOKEN 254
+
+#define LZ4FH_MAGIC 0x66
+
+//#define DEBUG_MSGS
+#ifdef DEBUG_MSGS
+# define DBUG(x) printf x
+#else
+# define DBUG(x)
+#endif
+
+/*
+ * Print usage info.
+ */ +static void usage(const char* argv0) +{ + fprintf(stderr, "fhpack v1.0 by Andy McFadden -*- "); + fprintf(stderr, "Copyright 2015 by faddenSoft\n"); + fprintf(stderr, + "Source code available from https://github.com/fadden/fhpack\n\n"); + fprintf(stderr, "Usage:\n"); + fprintf(stderr, " fhpack {-c|-d} [-h] [-1|-9] infile outfile\n\n"); + fprintf(stderr, " fhpack {-t} [-h] [-1|-9] infile1 [infile2...] \n\n"); + fprintf(stderr, "Use -c to compress, -d to decompress, -t to test\n"); + fprintf(stderr, " -h: don't fill or remove hi-res screen holes\n"); + fprintf(stderr, " -9: high compression (default)\n"); + fprintf(stderr, " -1: fast compression\n"); + fprintf(stderr, "\n"); + fprintf(stderr, "Example: fhpack -c foo.pic foo.lz4fh\n"); +} + + +/* + * Zero out the "screen holes". + */ +void zeroHoles(uint8_t* inBuf) +{ + uint8_t* inPtr = inBuf + 120; + + while (inPtr < inBuf + MAX_SIZE) { + memset(inPtr, 0, 8); + inPtr += 128; + } +} + +/* + * Fill in the "screen holes" in the image. The hi-res page has + * three 40-byte chunks of visible data, followed by 8 bytes of unseen + * data (padding it to 128). + * + * Instead of simply zeroing them out, we want to examine the data that + * comes before and after, copying whichever seems best into the hole. + * If there's a repeating color pattern (2a 55 2a 55), the hole just + * becomes part of the string, and will be handled as part of a long match. + * + * We can match the bytes that appear before or after the hole. + * Ideally we'd use whichever yields the longest run. + * + * "inBuf" holds MAX_SIZE bytes. + */ +void fillHoles(uint8_t* inBuf) +{ + uint8_t* inPtr = inBuf + 120; + while (inPtr < inBuf + MAX_SIZE) { + // check to see if the bytes that follow are a better match + // ("greedy" parsing can be suboptimal) + uint8_t* checkp = inPtr + 8; + bool useAfter = false; + if (checkp < inBuf + MAX_SIZE) { + if (checkp[0] == checkp[2] && checkp[1] == checkp[3]) { + DBUG((" bytes-after looks good at +0x%04lx\n", + checkp - inBuf)); + useAfter = true; + } else { + DBUG((" bytes-before used at +0x%04lx\n", checkp - inBuf)); + } + } else { + DBUG((" bytes-before used at end +0x%04lx\n", checkp - inBuf)); + } + + // Do an 8-byte overlapping copy. We can overlap by 2 bytes + // or 4 bytes depending on whether we want a 16-bit or 32-bit + // repeating pattern. + if (useAfter) { + for (int i = 7; i >= 0; i--) { + inPtr[i] = inPtr[i + 2]; + } + } else { + for (int i = 0; i < 8; i++) { + inPtr[i] = inPtr[i - 2]; + } + } + + inPtr += 128; + } +} + +/* + * Computes the number of characters that match. Stops when it finds + * a mismatching byte, or "count" is reached. + */ +size_t getMatchLen(const uint8_t* str1, const uint8_t* str2, size_t count) +{ + size_t matchLen = 0; + while (count-- && *str1++ == *str2++) { + matchLen++; + } + return matchLen; +} + +/* + * Finds a match for the string at "matchPtr", in the buffer pointed + * to by "inBuf" with length "inLen". "matchPtr" must be inside "inBuf". + * + * We explicitly allow data to copy over itself, so a run of 200 0x00 + * bytes could be represented by a literal 0x00 followed immediately + * by a match of length 199. We do need to ensure that the initial + * literal(s) go out first, though, so we use "maxStartOffset" to + * restrict where matches may be found. + * + * Returns the length of the longest match found, with the match + * offset in "*pMatchOffset". 
+ */ +size_t findLongestMatch(const uint8_t* matchPtr, const uint8_t* inBuf, + size_t inLen, size_t* pMatchOffset) +{ + size_t maxStartOffset = matchPtr - inBuf; + size_t longest = 0; + size_t longestOffset = 0; + DBUG((" findLongestMatch: maxSt=%zd\n", maxStartOffset)); + + // Brute-force scan through the buffer. Start from the beginning, + // and continue up to the point we've generated until now. (We + // can't search the *entire* buffer right away because the decoder + // can only copy matches from previously-decoded data.) + for (size_t ii = 0; ii < maxStartOffset; ii++) { + // Limit the length of the match by the length of the buffer. + // We don't want the match code to go wandering off the end. + // The match source is always earlier than matchPtr, so we + // want to cap the length based on the distance from matchPtr + // to the end of the buffer. + size_t maxMatchLen = inLen - (matchPtr - inBuf); + if (maxMatchLen > MAX_MATCH_LEN) { + maxMatchLen = MAX_MATCH_LEN; + } + if (maxMatchLen < MIN_MATCH_LEN) { + // too close to end of buffer, no point continuing + break; + } + + //DBUG((" maxMatchLen is %zd\n", maxMatchLen)); + + size_t matchLen = getMatchLen(matchPtr, inBuf + ii, maxMatchLen); + if (matchLen > longest) { + longest = matchLen; + longestOffset = ii; + } + if (matchLen == maxMatchLen) { + // Not going to find a longer one -- any future matches + // will be the same length or shorter. + break; + } + } + + + *pMatchOffset = longestOffset; + return longest; +} + +/* + * Compress a buffer, from "inBuf" to "outBuf". + * + * The input buffer holds between MIN_SIZE and MAX_SIZE bytes (inclusive), + * depending on the length of the source material and whether or not + * we're attempting to preserve the screen holes. + * + * Returns the amount of data in "outBuf" on success, or 0 on failure. + */ +size_t compressBufferOptimally(uint8_t* outBuf, const uint8_t* inBuf, + size_t inLen) +{ + // Optimal parsing for data compression is a lot like computing the + // shortest distance between two points in a directed graph. For + // each location, there are two possible "paths": a literal at this + // point, which advances us one byte forward, or a match at this + // point, which takes us several bytes forward. + // + // We walk through the file backward. At each position, we compute + // whether or not a match exists, and then determine the length from + // the current position to the end depending on whether we handle + // the value as a literal or the start of a match. When we reach the + // start of the file, we generate output by walking forward, selecting + // the path based on whether a literal or match results in the best + // outcome. + struct OptNode { + size_t totalCost; // running total "best" length + size_t matchLength; // zero if no match or literal is best + size_t matchOffset; + + size_t literalLength; // running total of literal run length + }; + OptNode* optList = (OptNode*) calloc(1, (inLen+1) * sizeof(OptNode)); + + // + // Pass 1: determine optimal path + // + + for (unsigned int i = inLen - 1; i < inLen; i--) { + size_t costForMatch, costForLiteral; + + // First consider the "match" path. It doesn't matter what + // follows the match, as that has no local effect on the output + // length. 
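+        // (Cost accounting, per the format summary at the top of the
+        // file: a match chunk costs 3 output bytes -- the mixed-length
+        // byte plus a two-byte offset -- and a long match pays one more
+        // byte for its length extension.  That is what the "+ 3" and
+        // conditional "++" below charge for.)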
+ size_t matchOffset; + size_t longestMatch = findLongestMatch(inBuf + i, inBuf, inLen, + &matchOffset); + if (longestMatch < MIN_MATCH_LEN) { + // no match to consider; leave optList[] values at zero + costForMatch = MAX_SIZE * 2; // arbitrary large value + } else { + // 4-14 bytes, fits in mixed-len byte + optList[i].matchLength = longestMatch; + optList[i].matchOffset = matchOffset; + + // total is previous total + 3 for match + costForMatch = optList[i + longestMatch].totalCost + 3; + if (longestMatch >= INITIAL_LEN) { + costForMatch++; + } + } + + // Now consider the "literal" path. If the next node is a + // literal, we add on to the existing run. If it's a match, + // we're a length-1 literal. + if (i == inLen - 1) { + // special-case start (essentially a 1-byte file) + optList[i].literalLength = 1; + optList[i].totalCost = 2; + costForLiteral = 2; // mixed-len byte + literal + } else { + if (optList[i+1].matchLength != 0) { + // next is match + optList[i].literalLength = 1; + costForLiteral = 1; // literal; mixed-len byte in match + } else if (optList[i+1].literalLength == MAX_LITERAL_LEN) { + // next is max-length literal, start a new one + optList[i].literalLength = 1; + costForLiteral = 3; // mixed-len byte + literal + nomatch + } else { + // next is sub-max-length literal, join it + size_t newLiteralLen = optList[i+1].literalLength + 1; + optList[i].literalLength = newLiteralLen; + costForLiteral = 1; + + if (newLiteralLen == INITIAL_LEN) { + // just hit 15, now need the extension byte + costForLiteral++; + } + } + costForLiteral += optList[i + 1].totalCost; + } + + if (costForLiteral > costForMatch) { + // use the match + assert(longestMatch != 0); + optList[i].totalCost = costForMatch; + DBUG(("0x%04x use-mat [l=%zd m=%zd] (len=%zd off=0x%04zx) --> 0x%04zx\n", + i, costForLiteral, costForMatch, longestMatch, + matchOffset, optList[i].totalCost)); + } else { + // use the literal -- zero the matchLength as a flag + optList[i].matchLength = 0; + optList[i].totalCost = costForLiteral; + DBUG(("0x%04x use-lit [l=%zd m=%zd] (len=%zd) --> 0x%04zx\n", + i, costForLiteral, costForMatch, optList[i].literalLength, + optList[i].totalCost)); + } + } + + // add one for the magic number; does not include end-of-data marker + // (which will be +1 if the last thing is a literal, +2 if a match) + size_t predictedLength = optList[0].totalCost + 1; + DBUG(("predicted length is %zd\n", predictedLength)); + + + // + // Pass 2: generate output from optimal path + // + + //const uint8_t* inPtr = inBuf; + uint8_t* outPtr = outBuf; + + *outPtr++ = LZ4FH_MAGIC; + + const uint8_t* literalSrcPtr = NULL; + size_t numLiterals = 0; + + for (unsigned int i = 0; i < inLen; ) { + if (optList[i].matchLength == 0) { + // no match at this point, select literals + if (numLiterals != 0) { + // Previous entry was literals. Because we parsed it + // backwards, we can end up with 32 literals followed + // by 255 literals, rather than the other way around. 
+ DBUG((" output literal-literal (%zd)\n", numLiterals)); + if (numLiterals <= INITIAL_LEN) { + *outPtr++ = (numLiterals << 4) | 0x0f; + } else { + *outPtr++ = 0xff; + *outPtr++ = numLiterals - INITIAL_LEN; + } + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + *outPtr++ = EMPTY_MATCH_TOKEN; + } + numLiterals = optList[i].literalLength; + literalSrcPtr = inBuf + i; + + // advance to next node + i += numLiterals; + } else { + // found a match, output previous literals first + size_t longestMatch = optList[i].matchLength; + size_t matchOffset = optList[i].matchOffset; + size_t adjustedMatch = longestMatch - MIN_MATCH_LEN; + + // Start by emitting the 4/4 length byte. + uint8_t mixedLengths; + if (adjustedMatch <= INITIAL_LEN) { + mixedLengths = adjustedMatch; + } else { + mixedLengths = INITIAL_LEN; + } + if (numLiterals <= INITIAL_LEN) { + mixedLengths |= numLiterals << 4; + } else { + mixedLengths |= INITIAL_LEN << 4; + } + DBUG((" match len=%zd off=0x%04zx lits=%zd mix=0x%02x\n", + longestMatch, matchOffset, numLiterals, + mixedLengths)); + *outPtr++ = mixedLengths; + + // Output the literals, starting with the extended length. + if (numLiterals >= INITIAL_LEN) { + *outPtr++ = numLiterals - INITIAL_LEN; + } + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + numLiterals = 0; + literalSrcPtr = NULL; // debug/sanity check + + // Now output the match, starting with the extended length. + if (adjustedMatch >= INITIAL_LEN) { + *outPtr++ = adjustedMatch - INITIAL_LEN; + } + *outPtr++ = matchOffset & 0xff; + *outPtr++ = (matchOffset >> 8) & 0xff; + + i += longestMatch; + } + } + + // housekeeping check -- factor in end-of-data circumstances + predictedLength++; + if (numLiterals == 0) { + predictedLength++; + } + + // Dump any remaining literals, with the end-of-data indicator + // in the match len. + if (numLiterals <= INITIAL_LEN) { + *outPtr++ = (numLiterals << 4) | 0x0f; + } else { + *outPtr++ = 0xff; + *outPtr++ = numLiterals - INITIAL_LEN; + } + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + + *outPtr++ = EOD_MATCH_TOKEN; + + DBUG(("Predicted length %zd, actual %ld\n", + predictedLength, outPtr - outBuf)); + + free(optList); + return outPtr - outBuf; +} + +/* + * Compress a buffer, from "inBuf" to "outBuf". + * + * The input buffer holds between MIN_SIZE and MAX_SIZE bytes (inclusive), + * depending on the length of the source material and whether or not + * we're attempting to preserve the screen holes. + * + * Returns the amount of data in "outBuf" on success, or 0 on failure. + */ +size_t compressBufferGreedily(uint8_t* outBuf, const uint8_t* inBuf, + size_t inLen) +{ + const uint8_t* inPtr = inBuf; + uint8_t* outPtr = outBuf; + + const uint8_t* literalSrcPtr = NULL; + size_t numLiterals = 0; + + *outPtr++ = LZ4FH_MAGIC; + + // Basic strategy: walk forward, searching for a match. When we + // find one, output the literals then the match. + // + // If the literal would cause us to exceed the maximum literal + // length, output the previous literals with a "no match" indicator. + while (inPtr < inBuf + inLen) { + DBUG(("Loop: off 0x%08lx\n", inPtr - inBuf)); + + // sanity-check on MAX_EXPANSION value + assert(outPtr - outBuf < MAX_SIZE + MAX_EXPANSION); + + size_t matchOffset; + size_t longestMatch = findLongestMatch(inPtr, inBuf, inLen, + &matchOffset); + if (longestMatch < MIN_MATCH_LEN) { + // No good match found here, emit as literal. 
+ if (numLiterals == MAX_LITERAL_LEN) { + // We've maxed out the literal string length. Emit + // the previously literals with an empty match indicator. + DBUG((" max literals reached")); + *outPtr++ = 0xff; // literal-len=15, match-len=15 + *outPtr++ = MAX_LITERAL_LEN - INITIAL_LEN; // 240 + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + + // Emit empty match indicator. + *outPtr++ = EMPTY_MATCH_TOKEN; + + // Reset literal len, continue. + numLiterals = 0; + } + if (numLiterals == 0) { + // Start of run of literals. Save pointer to data. + literalSrcPtr = inPtr; + } + numLiterals++; + inPtr++; + } else { + // Good match found. + size_t adjustedMatch = longestMatch - MIN_MATCH_LEN; + + // Start by emitting the 4/4 length byte. + uint8_t mixedLengths; + if (adjustedMatch <= INITIAL_LEN) { + mixedLengths = adjustedMatch; + } else { + mixedLengths = INITIAL_LEN; + } + if (numLiterals <= INITIAL_LEN) { + mixedLengths |= numLiterals << 4; + } else { + mixedLengths |= INITIAL_LEN << 4; + } + DBUG((" match len=%zd off=0x%04zx lits=%zd mix=0x%02x\n", + longestMatch, matchOffset, numLiterals, + mixedLengths)); + *outPtr++ = mixedLengths; + + // Output the literals, starting with the extended length. + if (numLiterals >= INITIAL_LEN) { + *outPtr++ = numLiterals - INITIAL_LEN; + } + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + numLiterals = 0; + literalSrcPtr = NULL; // debug/sanity check + + // Now output the match, starting with the extended length. + if (adjustedMatch >= INITIAL_LEN) { + *outPtr++ = adjustedMatch - INITIAL_LEN; + } + *outPtr++ = matchOffset & 0xff; + *outPtr++ = (matchOffset >> 8) & 0xff; + inPtr += longestMatch; + } + } + + // Dump any remaining literals, with the end-of-data indicator + // in the match len. + if (numLiterals <= INITIAL_LEN) { + *outPtr++ = (numLiterals << 4) | 0x0f; + } else { + *outPtr++ = 0xff; + *outPtr++ = numLiterals - INITIAL_LEN; + } + memcpy(outPtr, literalSrcPtr, numLiterals); + outPtr += numLiterals; + + *outPtr++ = EOD_MATCH_TOKEN; + + return outPtr - outBuf; +} + +/* + * Uncompress from "inBuf" to "outBuf". + * + * Given valid data, "inLen" is not necessary. It can be used as an + * error check. + * + * Returns the uncompressed length on success, 0 on failure. 
+ */ +size_t uncompressBuffer(uint8_t* outBuf, const uint8_t* inBuf, size_t inLen) +{ + uint8_t* outPtr = outBuf; + const uint8_t* inPtr = inBuf; + + if (*inPtr++ != LZ4FH_MAGIC) { + fprintf(stderr, "Missing LZ4FH magic\n"); + return 0; + } + + while (true) { + uint8_t mixedLen = *inPtr++; + + int literalLen = mixedLen >> 4; + if (literalLen != 0) { + if (literalLen == INITIAL_LEN) { + literalLen += *inPtr++; + } + DBUG(("Literals: %d\n", literalLen)); + if ((outPtr - outBuf) + literalLen > (long) MAX_SIZE || + (inPtr - inBuf) + literalLen > (long) inLen) { + fprintf(stderr, "Buffer overrun\n"); + return 0; + } + memcpy(outPtr, inPtr, literalLen); + outPtr += literalLen; + inPtr += literalLen; + } else { + DBUG(("Literals: none\n")); + } + + int matchLen = mixedLen & 0x0f; + if (matchLen == INITIAL_LEN) { + uint8_t addon = *inPtr++; + if (addon == EMPTY_MATCH_TOKEN) { + DBUG(("Match: none\n")); + matchLen = - MIN_MATCH_LEN; + } else if (addon == EOD_MATCH_TOKEN) { + DBUG(("Hit end-of-data at 0x%04lx\n", outPtr - outBuf)); + break; // out of while + } else { + matchLen += addon; + } + } + + matchLen += MIN_MATCH_LEN; + if (matchLen != 0) { + int matchOffset = *inPtr++; + matchOffset |= (*inPtr++) << 8; + DBUG(("Match: %d at %d\n", matchLen, matchOffset)); + // Can't use memcpy() here, because we need to guarantee + // that the match is overlapping. + uint8_t* srcPtr = outBuf + matchOffset; + if ((outPtr - outBuf) + matchLen > MAX_SIZE || + (srcPtr - outBuf) + matchLen > MAX_SIZE) { + fprintf(stderr, "Buffer overrun\n"); + return 0; + } + while (matchLen-- != 0) { + *outPtr++ = *srcPtr++; + } + } + } + + if (inPtr - inBuf != (long) inLen) { + fprintf(stderr, "Warning: uncompress used only %ld of %zd bytes\n", + inPtr - inBuf, inLen); + } + + return outPtr - outBuf; +} + +/* + * Compress a file, from "inFileName" to "outFileName". + * + * Returns 0 on success. + */ +int compressFile(const char* outFileName, const char* inFileName, + bool doPreserveHoles, bool useGreedyParsing) +{ + int result = -1; + uint8_t inBuf1[MAX_SIZE]; + uint8_t inBuf2[MAX_SIZE]; + uint8_t verifyBuf[MAX_SIZE]; + uint8_t outBuf1[MAX_SIZE + MAX_EXPANSION]; + uint8_t outBuf2[MAX_SIZE + MAX_EXPANSION]; + uint8_t* outBuf = NULL; + uint8_t* inBuf = NULL; + size_t outSize, sourceLen, uncompressedLen; + FILE* outfp = NULL; + FILE* infp; + + infp = fopen(inFileName, "rb"); + if (infp == NULL) { + perror("Unable to open input file"); + return -1; + } + + if (outFileName != NULL) { + outfp = fopen(outFileName, "wb"); + if (outfp == NULL) { + perror("Unable to open output file"); + fclose(infp); + return -1; + } + } + + fseek(infp, 0, SEEK_END); + long fileLen = ftell(infp); + rewind(infp); + if (fileLen < MIN_SIZE || fileLen > MAX_SIZE) { + fprintf(stderr, "ERROR: input file is %ld bytes, must be %d - %d\n", + fileLen, MIN_SIZE, MAX_SIZE); + goto bail; + } + + // Read data into buffer. + if (fread(inBuf1, 1, fileLen, infp) != (size_t) fileLen) { + perror("Failed while reading data"); + goto bail; + } + + if (doPreserveHoles) { + // Don't modify the input. 
+ sourceLen = fileLen; // retain original file length + if (useGreedyParsing) { + outSize = compressBufferGreedily(outBuf1, inBuf1, sourceLen); + } else { + outSize = compressBufferOptimally(outBuf1, inBuf1, sourceLen); + } + inBuf = inBuf1; + outBuf = outBuf1; + } else { + sourceLen = MIN_SIZE; // always drop the last 8 bytes + memcpy(inBuf2, inBuf1, sourceLen); + + // try it twice, with zero-filled holes and content-filled holes + + size_t outSize1; + zeroHoles(inBuf1); + if (useGreedyParsing) { + outSize1 = compressBufferGreedily(outBuf1, inBuf1, sourceLen); + } else { + outSize1 = compressBufferOptimally(outBuf1, inBuf1, sourceLen); + } + + size_t outSize2; + fillHoles(inBuf2); + if (useGreedyParsing) { + outSize2 = compressBufferGreedily(outBuf2, inBuf2, sourceLen); + } else { + outSize2 = compressBufferOptimally(outBuf2, inBuf2, sourceLen); + } + + if (false) { // save hole-punched output for examination + FILE* foo = fopen("HOLES", "wb"); + fwrite(inBuf2, 1, MIN_SIZE, foo); + fclose(foo); + } + + if (outSize1 <= outSize2) { + printf(" using zeroed-out holes (%zd vs. %zd)\n", + outSize1, outSize2); + outSize = outSize1; + inBuf = inBuf1; + outBuf = outBuf1; + } else { + printf(" using filled-in holes (%zd vs. %zd)\n", + outSize2, outSize1); + outSize = outSize2; + inBuf = inBuf2; + outBuf = outBuf2; + } + } + + if (outSize == 0) { + fprintf(stderr, "Compression failed\n"); + goto bail; + } + DBUG(("*** outSize is %zd\n", outSize)); + + // uncompress the data we just compressed + memset(verifyBuf, 0xcc, sizeof(verifyBuf)); + uncompressedLen = uncompressBuffer(verifyBuf, outBuf, outSize); + if (uncompressedLen != sourceLen) { + fprintf(stderr, "ERROR: verify expanded %zd of expected %zd bytes\n", + uncompressedLen, sourceLen); + goto bail; + } + + // byte-for-byte comparison + for (size_t ii = 0; ii < sourceLen; ii++) { + if (inBuf[ii] != verifyBuf[ii]) { + fprintf(stderr, + "ERROR: expansion mismatch (byte %zd, 0x%02x 0x%02x)\n", + ii, inBuf[ii], verifyBuf[ii]); + goto bail; + } + } + DBUG(("Verification succeeded\n")); + + if (outfp != NULL) { + /* write the data */ + if (fwrite(outBuf, 1, outSize, outfp) != outSize) { + perror("Failed while writing data"); + goto bail; + } + } else { + // must be in test mode + printf(" success -- compressed len is %zd\n", outSize); + } + + result = 0; + +bail: + fclose(infp); + if (outfp != NULL) { + fclose(outfp); + } + if (result != 0 && outFileName != NULL) { + unlink(outFileName); + } + return result; +} + +/* + * Uncompress data from one file to another. + * + * Returns 0 on success. + */ +int uncompressFile(const char* outFileName, const char* inFileName) +{ + int result = -1; + uint8_t inBuf[MAX_SIZE + MAX_EXPANSION]; + uint8_t outBuf[MAX_SIZE]; + size_t outSize; + FILE* outfp = NULL; + FILE* infp; + + infp = fopen(inFileName, "rb"); + if (infp == NULL) { + perror("Unable to open input file"); + return -1; + } + + outfp = fopen(outFileName, "wb"); + if (outfp == NULL) { + perror("Unable to open output file"); + fclose(infp); + return -1; + } + + fseek(infp, 0, SEEK_END); + long fileLen = ftell(infp); + rewind(infp); + if (fileLen < 10 || fileLen > MAX_SIZE + MAX_EXPANSION) { + // 10 just ensures we have enough for magic number, chunk, eod + fprintf(stderr, "ERROR: input file is %ld bytes, must be < %d\n", + fileLen, MAX_SIZE + MAX_EXPANSION); + goto bail; + } + + // Read data into buffer. 
+ if (fread(inBuf, 1, fileLen, infp) != (size_t) fileLen) { + perror("Failed while reading data"); + goto bail; + } + + outSize = uncompressBuffer(outBuf, inBuf, fileLen); + if (outSize == 0) { + goto bail; + } + DBUG(("*** outSize is %zd\n", outSize)); + + /* write the data */ + if (fwrite(outBuf, 1, outSize, outfp) != outSize) { + perror("Failed while writing data"); + goto bail; + } + + result = 0; + +bail: + fclose(infp); + fclose(outfp); + if (result != 0) { + unlink(outFileName); + } + return result; +} + +/* + * Process args. + */ +int main(int argc, char* argv[]) +{ + ProgramMode mode = MODE_UNKNOWN; + bool doPreserveHoles = false; + bool useGreedyParsing = false; + bool wantUsage = false; + int opt; + + while ((opt = getopt(argc, argv, "19cdth")) != -1) { + switch (opt) { + case '1': + useGreedyParsing = true; + break; + case '9': + useGreedyParsing = false; + break; + case 'c': + if (mode == MODE_UNKNOWN) { + mode = MODE_COMPRESS; + } else { + wantUsage = true; + } + break; + case 'd': + if (mode == MODE_UNKNOWN) { + mode = MODE_UNCOMPRESS; + } else { + wantUsage = true; + } + break; + case 't': + if (mode == MODE_UNKNOWN) { + mode = MODE_TEST; + } else { + wantUsage = true; + } + break; + case 'h': + doPreserveHoles = true; + break; + default: + usage(argv[0]); + return 2; + } + } + + if (argc - optind < 1 || + (mode != MODE_TEST && argc - optind != 2)) + { + wantUsage = true; + } + + if (mode == MODE_UNKNOWN || wantUsage) { + usage(argv[0]); + return 2; + } + + const char* inFileName = argv[optind]; + const char* outFileName = argv[optind+1]; + + int result = 0; + if (mode == MODE_COMPRESS) { + printf("Compressing %s -> %s\n", inFileName, outFileName); + result = compressFile(outFileName, inFileName, doPreserveHoles, + useGreedyParsing); + } else if (mode == MODE_UNCOMPRESS) { + printf("Expanding %s -> %s\n", inFileName, outFileName); + result = uncompressFile(outFileName, inFileName); + } else { + while (optind < argc) { + printf("Testing %s\n", argv[optind]); + result |= compressFile(NULL, argv[optind], doPreserveHoles, + useGreedyParsing); + optind++; + } + } + + return (result != 0); +} + diff --git a/fhpack_disks.zip b/fhpack_disks.zip new file mode 100644 index 0000000..8dc322f Binary files /dev/null and b/fhpack_disks.zip differ diff --git a/make-test-pic.cpp b/make-test-pic.cpp new file mode 100644 index 0000000..eb23b75 --- /dev/null +++ b/make-test-pic.cpp @@ -0,0 +1,114 @@ +/* + * Generate some 8K images for fhpack testing. + * By Andy McFadden + * Version 1.0, August 2015 + * + * Copyright 2015 by faddenSoft. All Rights Reserved. + * See the LICENSE.txt file for distribution terms (Apache 2.0). 
+ */ +#include +#include +#include +#include + +const char* TEST_ALL_ZERO = "allzero#060000"; +const char* TEST_ALL_GREEN = "allgreen#060000"; +const char* TEST_NO_MATCH = "nomatch#060000"; + +int main() +{ + FILE* fp; + + if (access(TEST_ALL_ZERO, F_OK) == 0) { + printf("NOT overwriting %s\n", TEST_ALL_ZERO); + } else { + fp = fopen(TEST_ALL_ZERO, "w"); + for (int i = 0; i < 8192; i++) { + putc('\0', fp); + } + fclose(fp); + } + + if (access(TEST_ALL_GREEN, F_OK) == 0) { + printf("NOT overwriting %s\n", TEST_ALL_GREEN); + } else { + fp = fopen(TEST_ALL_GREEN, "w"); + for (int i = 0; i < 4096; i++) { + putc(0x2a, fp); + putc(0x55, fp); + } + fclose(fp); + } + + if (access(TEST_NO_MATCH, F_OK) == 0) { + printf("NOT overwriting %s\n", TEST_NO_MATCH); + } else { + fp = fopen(TEST_NO_MATCH, "w"); + for (int ic = 0; ic < 252; ic++) { + putc(ic, fp); + putc(ic+1, fp); + putc(ic+2, fp); + putc(ic+3, fp); + } + // 1008 + for (int ic = 0; ic < 252; ic++) { + putc(ic, fp); + putc(ic+2, fp); + putc(ic+1, fp); + putc(ic+3, fp); + } + // 2016 + for (int ic = 0; ic < 252; ic++) { + putc(ic, fp); + putc(ic+1, fp); + putc(ic+3, fp); + putc(ic+2, fp); + } + // 3024 + for (int ic = 0; ic < 252; ic++) { + putc(ic, fp); + putc(ic+3, fp); + putc(ic+2, fp); + putc(ic+1, fp); + } + // 4032 + for (int ic = 0; ic < 252; ic++) { + putc(ic, fp); + putc(ic+3, fp); + putc(ic+1, fp); + putc(ic+2, fp); + } + // 5040 + for (int ic = 0; ic < 252; ic++) { + putc(ic+1, fp); + putc(ic, fp); + putc(ic+2, fp); + putc(ic+3, fp); + } + // 6048 + for (int ic = 0; ic < 252; ic++) { + putc(ic+1, fp); + putc(ic+2, fp); + putc(ic, fp); + putc(ic+3, fp); + } + // 7056 + for (int ic = 0; ic < 252; ic++) { + putc(ic+1, fp); + putc(ic+2, fp); + putc(ic+3, fp); + putc(ic, fp); + } + // 8064 + for (int ic = 0; ic < 32; ic++) { + putc(ic+2, fp); + putc(ic+1, fp); + putc(ic+3, fp); + putc(ic, fp); + } + fclose(fp); + } + + return 0; +} +