Merge pull request #12 from KrisKennaway/fix-encoder

Modernize and fix some image quality bugs
2024-06-18 03:29:31 +00:00 · 2023-01-28 21:42:58 +00:00 · ee66fa0bcf
commit ee66fa0bcf
parent 4f6d7a796a a925a897a7
12 changed files with 264 additions and 170 deletions
--- a/README.md
+++ b/README.md
@ -1,4 +1,4 @@
-# ]\[-Vision v0.2
+# ]\[-Vision v0.3

 Streaming video and audio for the Apple II.

@ -6,9 +6,9 @@ Streaming video and audio for the Apple II.
 Apple II hardware.

 Requires:
- 64K 6502 Apple II machine (only tested on //gs so far, but should work on older systems)
+- 64K 6502 Apple II machine (tested on //gs and //e but should also work on ]\[/]\[+)
 - [Uthernet II](http://a2retrosystems.com/products.htm) ethernet card
-  - AFAIK no emulators support this hardware so you'll need to run it on a real machine to see it in action
+  - AppleWin ([Windows](https://github.com/AppleWin/AppleWin) and [Linux](https://github.com/audetto/AppleWin)) and [Ample](https://github.com/ksherlock/ample) (Mac) emulate the Uthernet II.  ]\[-Vision has been confirmed to work with Ample.

 Dedicated to the memory of [Bob Bishop](https://www.kansasfest.org/2014/11/remembering-bob-bishop/), early pioneer of Apple II
 [video](https://www.youtube.com/watch?v=RiWE-aO-cyU) and [audio](http://www.faddensoftware.com/appletalker.png).
@ -17,6 +17,8 @@ Dedicated to the memory of [Bob Bishop](https://www.kansasfest.org/2014/11/remem

 Sample videos (recording of playback on Apple //gs with RGB monitor, or HDMI via VidHD)

+TODO: These are from older versions, for which quality was not as good.
+
 Double Hi-Res:
 - [Try getting this song out of your head](https://youtu.be/S7aNcyojoZI)
 - [Babylon 5 title credits](https://youtu.be/PadKk8n1xY8)
@ -28,8 +30,6 @@ Older Hi-Res videos:
 - [Paranoimia ft Max Headroom](https://youtu.be/wfdbEyP6v4o)
 - [How many of us still feel about our Apple II's](https://youtu.be/-e5LRcnQF-A)

-(These are from older versions, for which quality was not as good)
-
 There may be more on this [YouTube playlist](https://www.youtube.com/playlist?list=PLoAt3SC_duBiIjqK8FBoDG_31nUPB8KBM)

 ## Details
@ -40,7 +40,7 @@ This ends up streaming data at about 100KB/sec of which 56KB/sec are updates to

 The video frames are actually encoded at the original frame rate (or optionally by skipping frames), prioritizing differences in the screen content, so the effective frame rate is higher than this if only a fraction of the screen is changing between frames (which is the typical case). 

-I'm using the excellent (though under-documented ;) [BMP2DHR](http://www.appleoldies.ca/bmp2dhr/) to encode the input video stream into a sequence of memory maps, then post-processing the frame deltas to prioritize the screen bytes to stream in order to approximate these deltas as closely as possible within the timing budget. 
+I'm using the excellent (though under-documented ;) [BMP2DHR](https://github.com/digarok/b2d) to encode the input video stream into a sequence of memory maps, then post-processing the frame deltas to prioritize the screen bytes to stream in order to approximate these deltas as closely as possible within the timing budget. 

 ### KansasFest 2019 presentation

@ -50,27 +50,35 @@ TODO: link video once it is available.

 ## Installation

-This currently requires python3.7 because some dependencies (e.g. weighted-levenshtein) don't compile with 3.9+, and 3.8
-has a [bug](https://bugs.python.org/issue44439) in object pickling.  
+This currently requires python3.8 because some dependencies (e.g. weighted-levenshtein) don't compile with 3.9+.

 ```
-python3.7 -m venv venv
+python3.8 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
 ```

-To generate the data files required by the transcoder:
+Before you can run the transcoder you need to generate the data files it requires:

 ```
 % python transcoder/make_data_tables.py
 ```

-This takes about 3 hours on my machine.
+This is a one-time setup.  It takes about 90 minutes on my machine.

-TODO: download instructions
+## Sample videos
+
+Some sample videos are available [here](https://www.dropbox.com/sh/kq2ej63smrzruwk/AADZSaqbNuTwAfnPWT6r9TJra?dl=0) for
+streaming (see `server/server.py`).

 ## Release Notes

+### v0.3 (17 Jan 2023)
+
+- Fixed an image quality bug in the transcoder
+- Documentation/quality of life improvements to installation process
+- Stop using LFS to store the generated data files in git, since they were using up all my quota
+
 ### v0.2 (19 July 2019)

 #### Transcoder
--- a/requirements.txt
+++ b/requirements.txt
@ -1,12 +1,30 @@
+appdirs==1.4.4
 audioread==3.0.0
+certifi==2022.12.7
+cffi==1.15.1
+charset-normalizer==3.0.1
 colormath==3.0.0
+decorator==5.1.1
 etaprogress==1.1.1
+idna==3.4
+importlib-metadata==6.0.0
+joblib==1.2.0
 librosa==0.9.2
-networkx==2.6.3
-numpy==1.21.6
+llvmlite==0.39.1
+networkx==3.0
+numba==0.56.4
+numpy==1.22.4  # Until colormath supports 1.23+
+packaging==23.0
 Pillow==9.4.0
-scikit-learn==1.0.2
+pooch==1.6.0
+pycparser==2.21
+requests==2.28.2
+resampy==0.4.2
+scikit-learn==1.2.0
 scikit-video==1.1.11
-scipy==1.7.3
+scipy==1.10.0
 soundfile==0.11.0
+threadpoolctl==3.1.0
+urllib3==1.26.14
 weighted-levenshtein==0.2.1
+zipp==3.11.0
--- a/transcoder/audio.py
+++ b/transcoder/audio.py
@ -55,8 +55,8 @@ class Audio:
            'float32').reshape((f.channels, -1), order='F')

        a = librosa.core.to_mono(data)
-        a = librosa.resample(a, f.samplerate,
-                             self.sample_rate,
+        a = librosa.resample(a, orig_sr=f.samplerate,
+                             target_sr=self.sample_rate,
                             res_type='scipy', scale=True).flatten()

        return a
@ -64,8 +64,8 @@ class Audio:
    def _normalization(self, read_bytes=1024 * 1024 * 10):
        """Read first read_bytes of audio stream and compute normalization.

-        We compute the 2.5th and 97.5th percentiles i.e. only 2.5% of samples
-        will clip.
+        We normalize based on the 0.5th and 99.5th percentiles, i.e. at most
+        1% of samples will clip.

        :param read_bytes:
        :return:
@ -77,7 +77,7 @@ class Audio:
                if len(raw) > read_bytes:
                    break
        a = self._decode(f, raw)
-        norm = np.max(np.abs(np.percentile(a, [2.5, 97.5])))
+        norm = np.max(np.abs(np.percentile(a, [0.5, 99.5])))

        return 16384. / norm
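Reviewer note: the percentile change above can be checked in isolation. The following is a minimal sketch, not the project's code: `normalization_scale` is a hypothetical helper that mirrors the `np.percentile` expression and the `16384. / norm` return value from the hunk, and the Gaussian test signal is an assumption made purely for illustration.

```python
import numpy as np


def normalization_scale(samples, lo=0.5, hi=99.5, target=16384.0):
    """Return the gain that maps the larger of |lo-th| and |hi-th| percentile to target.

    Hypothetical helper; mirrors the expression used in Audio._normalization.
    """
    norm = np.max(np.abs(np.percentile(samples, [lo, hi])))
    return target / norm


# Synthetic stand-in for decoded audio (assumption: roughly Gaussian samples).
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 0.1, size=100_000)

scale = normalization_scale(samples)

# Fraction of samples whose scaled magnitude exceeds the target, i.e. would clip.
clipped_fraction = float(np.mean(np.abs(samples * scale) > 16384.0))
```

Because at most 0.5% of samples lie below the 0.5th percentile and at most 0.5% lie above the 99.5th, no more than about 1% of samples can exceed the chosen norm in magnitude.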

--- a/transcoder/data/.gitattributes
+++ b/transcoder/data/.gitattributes
@ -1 +0,0 @@
-*.bz2 filter=lfs diff=lfs merge=lfs -text
--- a/transcoder/data/DHGR_palette_0_edit_distance.pickle.bz2
+++ b/transcoder/data/DHGR_palette_0_edit_distance.pickle.bz2
@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b47eadfdf8c8e16c6539f9a16ed0b5a393b17e0cbd03831aacda7f659e9522d6
-size 120830327
--- a/transcoder/data/DHGR_palette_5_edit_distance.pickle.bz2
+++ b/transcoder/data/DHGR_palette_5_edit_distance.pickle.bz2
@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8c245981f91ffa89b47abdd1c9d646c2e79499a0c82c38c91234be0a59e52f1f
-size 118832545
--- a/transcoder/data/HGR_palette_0_edit_distance.pickle.bz2
+++ b/transcoder/data/HGR_palette_0_edit_distance.pickle.bz2
@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3fd52feb08eb6f99b267a1050c68905f25d0d106ad7c2c63473cc0a0f6aa1b25
-size 224334626
--- a/transcoder/data/HGR_palette_5_edit_distance.pickle.bz2
+++ b/transcoder/data/HGR_palette_5_edit_distance.pickle.bz2
@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:dbf83e3d0b6c7867ccf7ae1d55a6ed4e906409b08043dec514e1104cec95f0fc
-size 220565577
--- a/transcoder/make_data_tables.py
+++ b/transcoder/make_data_tables.py
@ -1,6 +1,5 @@
-import bz2
 import functools
-import pickle
+import os
 import sys
 from typing import Iterable, Type

@ -17,7 +16,7 @@ import screen


 PIXEL_CHARS = "0123456789ABCDEF"
-
+DATA_DIR = "transcoder/data"

 def pixel_char(i: int) -> str:
    return PIXEL_CHARS[i]
@ -39,7 +38,7 @@ class EditDistanceParams:
    # Smallest substitution value is ~20 from palette.diff_matrices, i.e.
    # we always prefer to transpose 2 pixels rather than substituting colours.
    # TODO: is quality really better allowing transposes?
-    transpose_costs = np.ones((128, 128), dtype=np.float64) * 100000  # 10
+    transpose_costs = np.ones((128, 128), dtype=np.float64)

    # These will be filled in later
    substitute_costs = np.zeros((128, 128), dtype=np.float64)
@ -113,7 +112,7 @@ def compute_edit_distance(
        edp: EditDistanceParams,
        bitmap_cls: Type[screen.Bitmap],
        nominal_colours: Type[colours.NominalColours]
-):
+) -> np.ndarray:
    """Computes edit distance matrix between all pairs of pixel strings.

    Enumerates all possible values of the masked bit representation from
@ -131,44 +130,45 @@ def compute_edit_distance(

    bitrange = np.uint64(2 ** bits)

-    edit = []
-    for _ in range(len(bitmap_cls.BYTE_MASKS)):
-        edit.append(
-            np.zeros(shape=np.uint64(bitrange * bitrange), dtype=np.uint16))
+    edit = np.zeros(
+        shape=(len(bitmap_cls.BYTE_MASKS), np.uint64(bitrange * bitrange)),
+        dtype=np.uint16)

-    # Matrix is symmetrical with zero diagonal so only need to compute upper
-    # triangle
-    bar = ProgressBar((bitrange * (bitrange - 1)) / 2, max_width=80)
+    bar = ProgressBar(
+        bitrange * (bitrange - 1) / 2 * len(bitmap_cls.PHASES), max_width=80)

    num_dots = bitmap_cls.MASKED_DOTS

    cnt = 0
    for i in range(np.uint64(bitrange)):
-        for j in range(i):
-            cnt += 1
+        pair_base = np.uint64(i) << bits
+        for o, ph in enumerate(bitmap_cls.PHASES):
+            # Compute this in the outer loop since it's invariant under j
+            first_dots = bitmap_cls.to_dots(i, byte_offset=o)
+            first_pixels = pixel_string(
+                colours.dots_to_nominal_colour_pixel_values(
+                    num_dots, first_dots, nominal_colours,
+                    init_phase=ph)
+            )

-            if cnt % 10000 == 0:
-                bar.numerator = cnt
-                print(bar, end='\r')
-                sys.stdout.flush()
+            # Matrix is symmetrical with zero diagonal so only need to compute
+            # upper triangle
+            for j in range(i):
+                cnt += 1
+                if cnt % 100000 == 0:
+                    bar.numerator = cnt
+                    print(bar, end='\r')
+                    sys.stdout.flush()

-            pair = (np.uint64(i) << bits) + np.uint64(j)
+                pair = pair_base + np.uint64(j)

-            for o, ph in enumerate(bitmap_cls.PHASES):
-                first_dots = bitmap_cls.to_dots(i, byte_offset=o)
                second_dots = bitmap_cls.to_dots(j, byte_offset=o)
-
-                first_pixels = pixel_string(
-                    colours.dots_to_nominal_colour_pixel_values(
-                        num_dots, first_dots, nominal_colours,
-                        init_phase=ph)
-                )
                second_pixels = pixel_string(
                    colours.dots_to_nominal_colour_pixel_values(
                        num_dots, second_dots, nominal_colours,
                        init_phase=ph)
                )
-                edit[o][pair] = edit_distance(
+                edit[o, pair] = edit_distance(
                    edp, first_pixels, second_pixels, error=False)

    return edit
@ -183,13 +183,17 @@ def make_edit_distance(
    """Write file containing (D)HGR edit distance matrix for a palette."""

    dist = compute_edit_distance(edp, bitmap_cls, nominal_colours)
-    data = "transcoder/data/%s_palette_%d_edit_distance.pickle.bz2" % (
-        bitmap_cls.NAME, pal.ID.value)
-    with bz2.open(data, "wb", compresslevel=9) as out:
-        pickle.dump(dist, out, protocol=pickle.HIGHEST_PROTOCOL)
+    data = "%s/%s_palette_%d_edit_distance.npz" % (
+        DATA_DIR, bitmap_cls.NAME, pal.ID.value)
+    np.savez_compressed(data, edit_distance=dist)


 def main():
+    try:
+        os.mkdir(DATA_DIR, mode=0o755)
+    except FileExistsError:
+        pass
+
    for p in palette.PALETTES.values():
        print("Processing palette %s" % p)
        edp = compute_substitute_costs(p)
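Reviewer note: the new storage format can be exercised end to end on a toy table. This is a sketch under stated assumptions rather than the transcoder's implementation: `BITS`, `toy_distance`, `build_table` and `fill_transpose` are hypothetical names, a 3-bit value width replaces the real masked widths, and Hamming distance stands in for the weighted edit distance. It shows the `(i << bits) + j` pair indexing with only the `j < i` entries computed, the `np.savez_compressed` / `np.load` round trip, and the transpose fill that makes the loaded table symmetric.

```python
import os
import tempfile

import numpy as np

BITS = 3  # toy width (assumption); the real tables use the bitmap's masked bit count


def toy_distance(a: int, b: int) -> int:
    """Hamming distance, a stand-in for the weighted edit distance."""
    return bin(a ^ b).count("1")


def build_table(bits: int) -> np.ndarray:
    """Fill only the entries with j < i; the matrix is symmetric with zero diagonal."""
    size = 1 << bits
    table = np.zeros(size * size, dtype=np.uint16)
    for i in range(size):
        pair_base = i << bits  # invariant under j, so hoisted out of the inner loop
        for j in range(i):
            table[pair_base + j] = toy_distance(i, j)
    return table


def fill_transpose(table: np.ndarray, bits: int) -> np.ndarray:
    """Mirror the computed half so that table[(a << bits) + b] == table[(b << bits) + a]."""
    identity = np.arange(1 << (2 * bits), dtype=np.int64)
    mask = (1 << bits) - 1
    transpose = (identity >> bits) + ((identity & mask) << bits)
    full = table.copy()
    # transpose is a permutation of identity, so every slot is updated exactly once.
    full[transpose] += table[identity]
    return full


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_edit_distance.npz")
    np.savez_compressed(path, edit_distance=build_table(BITS))
    with np.load(path) as data:
        loaded = data["edit_distance"]

full = fill_transpose(loaded, BITS)
```

Storing a single named array in an `.npz` archive avoids unpickling arbitrary objects at load time, which is one plausible motivation for moving away from `pickle` plus `bz2`.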
--- a/transcoder/movie.py
+++ b/transcoder/movie.py
@ -6,6 +6,7 @@ import audio
 import frame_grabber
 import machine
 import opcodes
+import screen
 import video
 from palette import Palette
 from video_mode import VideoMode
@ -58,34 +59,54 @@ class Movie:
        :return:
        """
        video_frames = self.frame_grabber.frames()
-        main_seq = None
-        aux_seq = None
+        op_seq = None

        yield opcodes.Header(mode=self.video_mode)

+        last_memory_bank = self.aux_memory_bank
        for au in self.audio.audio_stream():
            self.ticks += 1
-            if self.video.tick(self.ticks):
+            new_video_frame = self.video.tick(self.ticks)
+            if new_video_frame:
                try:
                    main, aux = next(video_frames)
                except StopIteration:
                    break

-                if ((self.video.frame_number - 1) % self.every_n_video_frames
-                        == 0):
-                    print("Starting frame %d" % self.video.frame_number)
-                    main_seq = self.video.encode_frame(main, is_aux=False)
+                should_encode_frame = (
+                        (self.video.frame_number - 1) %
+                        self.every_n_video_frames == 0
+                )
+                if should_encode_frame:
+                    if self.video_mode == VideoMode.DHGR:
+                        target_pixelmap = screen.DHGRBitmap(
+                            main_memory=main,
+                            aux_memory=aux,
+                            palette=self.palette
+                        )
+                    else:
+                        target_pixelmap = screen.HGRBitmap(
+                            main_memory=main,
+                            palette=self.palette
+                        )

-                    if aux:
-                        aux_seq = self.video.encode_frame(aux, is_aux=True)
+                    print("Starting frame %d" % self.video.frame_number)
+                    op_seq = self.video.encode_frame(
+                        target_pixelmap, is_aux=self.aux_memory_bank)
+                    self.video.out_of_work = {True: False, False: False}
+
+            if self.aux_memory_bank != last_memory_bank:
+                # We've flipped memory banks, start new opcode sequence
+                last_memory_bank = self.aux_memory_bank
+                op_seq = self.video.encode_frame(
+                    target_pixelmap, is_aux=self.aux_memory_bank)

            # au has range -15 .. 16 (step=1)
            # Tick cycles are units of 2
            tick = au * 2  # -30 .. 32 (step=2)
            tick += 34  # 4 .. 66 (step=2)

-            (page, content, offsets) = next(
-                aux_seq if self.aux_memory_bank else main_seq)
+            (page, content, offsets) = next(op_seq)

            yield opcodes.TICK_OPCODES[(tick, page)](content, offsets)
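Reviewer note: the two pieces of arithmetic in this hunk are easy to sanity-check on their own. The sketch below uses hypothetical helper names (`audio_to_tick`, `should_encode_frame`) and only restates the expressions visible in the diff: the audio sample range of -15 .. 16 mapped to tick cycles, and the frame-skipping test on `frame_number`.

```python
def audio_to_tick(au: int) -> int:
    """Map an audio sample in -15 .. 16 to a tick cycle count in 4 .. 66 (step 2)."""
    if not -15 <= au <= 16:
        raise ValueError("audio sample out of range: %d" % au)
    tick = au * 2  # -30 .. 32 (step=2)
    return tick + 34  # 4 .. 66 (step=2)


def should_encode_frame(frame_number: int, every_n_video_frames: int) -> bool:
    """True for frames 1, 1 + n, 1 + 2n, ... (frame numbers start at 1)."""
    return (frame_number - 1) % every_n_video_frames == 0


ticks = [audio_to_tick(au) for au in range(-15, 17)]
encoded = [n for n in range(1, 7) if should_encode_frame(n, 2)]
```

With `every_n_video_frames` set to 1, every frame is encoded, which matches the README's description of encoding at the original frame rate unless frames are skipped.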

--- a/transcoder/screen.py
+++ b/transcoder/screen.py
@ -268,6 +268,11 @@ class Bitmap:
            byte_offset, self.packed[page, packed_offset], value)
        self._fix_scalar_neighbours(page, packed_offset, byte_offset)

+        if is_aux:
+            self.aux_memory.write(page, offset, value)
+        else:
+            self.main_memory.write(page, offset, value)
+
    def _fix_scalar_neighbours(
            self,
            page: int,
@ -337,15 +342,13 @@ class Bitmap:

    @classmethod
    @functools.lru_cache(None)
-    def edit_distances(cls, palette_id: pal.Palette) -> List[np.ndarray]:
+    def edit_distances(cls, palette_id: pal.Palette) -> np.ndarray:
        """Load edit distance matrices for masked, shifted byte values."""

-        data = "transcoder/data/%s_palette_%d_edit_distance.pickle.bz2" % (
-            cls.NAME,
-            palette_id.value
+        data = "transcoder/data/%s_palette_%d_edit_distance.npz" % (
+            cls.NAME, palette_id.value
        )
-        with bz2.open(data, "rb") as ed:
-            dist = pickle.load(ed)  # type: List[np.ndarray]
+        dist = np.load(data)['edit_distance']

        # dist is an upper-triangular matrix of edit_distance(a, b)
        # encoded as dist[(a << N) + b] = edit_distance(a, b)
@ -358,8 +361,8 @@ class Bitmap:
                (identity & np.uint64(2 ** cls.MASKED_BITS - 1)) <<
                cls.MASKED_BITS)

-        for i in range(len(dist)):
-            dist[i][transpose] += dist[i][identity]
+        for i in range(dist.shape[0]):
+            dist[i, transpose] += dist[i, identity]

        return dist

@ -445,6 +448,51 @@ class Bitmap:

        return diff

+    # TODO: combine with _diff_weights
+    # TODO: unit test
+    def _diff_weights_page(
+            self,
+            source_packed: np.ndarray,
+            target_packed: np.ndarray,
+            is_aux: bool,
+            content: np.uint8 = None
+    ) -> np.ndarray:
+        """Computes edit distance matrix from source_packed to target_packed.
+
+        If content is set, the distance will be computed as if this value
+        was stored into each offset position of source_packed, i.e. to
+        allow evaluating which offsets (if any) should be chosen for storing
+        this content byte.
+        """
+
+        diff = np.ndarray((256,), dtype=np.int32)
+
+        offsets = self._byte_offsets(is_aux)
+
+        dists = []
+        for o in offsets:
+            if content is not None:
+                compare_packed = self.masked_update(o, source_packed, content)
+                self._fix_array_neighbours(compare_packed, o)
+            else:
+                compare_packed = source_packed
+
+            # Pixels influenced by byte offset o
+            source_pixels = self.mask_and_shift_data(compare_packed, o)
+            target_pixels = self.mask_and_shift_data(target_packed, o)
+
+            # Concatenate N-bit source and target into 2N-bit values
+            pair = (source_pixels << self.MASKED_BITS) + target_pixels
+            dist = self.edit_distances(self.palette)[o][pair].reshape(
+                pair.shape)
+            dists.append(dist)
+
+        # Interleave even/odd columns
+        diff[0::2] = dists[0]
+        diff[1::2] = dists[1]
+
+        return diff
+
    def _check_consistency(self):
        """Sanity check that headers and footers are consistent."""

@ -474,8 +522,9 @@ class Bitmap:
            assert ok

    # TODO: unit tests
-    def compute_delta(
+    def compute_delta_page(
            self,
+            page: int,
            content: int,
            diff_weights: np.ndarray,
            is_aux: bool
@ -490,10 +539,12 @@ class Bitmap:
        """
        # TODO: use error edit distance?

-        new_diff = self._diff_weights(self.packed, is_aux, content)
+        packed_page = self.packed[page, :].reshape(1, -1)

-        # TODO: try different weightings
-        return (new_diff * 5) - diff_weights
+        new_diff = self._diff_weights_page(
+            packed_page, packed_page, is_aux, content)
+
+        return new_diff - diff_weights


 class HGRBitmap(Bitmap):
@ -688,6 +739,7 @@ class HGRBitmap(Bitmap):
        return double

    @classmethod
+    @functools.lru_cache(None)
    def to_dots(cls, masked_val: int, byte_offset: int) -> int:
        """Convert masked representation to bit sequence of display dots.

--- a/transcoder/video.py
+++ b/transcoder/video.py
@ -27,7 +27,7 @@ class Video:
    ):
        self.mode = mode  # type: VideoMode
        self.frame_grabber = frame_grabber  # type: FrameGrabber
-        self.ticks_per_second = ticks_per_second  # type: float
+        self.ticks_per_second = float(ticks_per_second)  # type: float
        self.ticks_per_frame = (
                self.ticks_per_second / frame_grabber.input_frame_rate
        )  # type: float
@ -57,6 +57,10 @@ class Video:
        if self.mode == mode.DHGR:
            self.aux_update_priority = np.zeros((32, 256), dtype=np.int32)

+        # Indicates whether we have run out of work for the main/aux banks.
+        # Key is True for aux bank and False for main bank
+        self.out_of_work = {True: False, False: False}
+
    def tick(self, ticks: int) -> bool:
        """Keep track of when it is time for a new image frame."""

@ -67,7 +71,7 @@ class Video:

    def encode_frame(
            self,
-            target: screen.MemoryMap,
+            target: screen.Bitmap,
            is_aux: bool,
    ) -> Iterator[opcodes.Opcode]:
        """Converge towards target frame in priority order of edit distance."""
@ -91,30 +95,16 @@ class Video:
    def _index_changes(
            self,
            source: screen.MemoryMap,
-            target: screen.MemoryMap,
+            target_pixelmap: screen.Bitmap,
            update_priority: np.array,
-            is_aux: True
+            is_aux: bool
    ) -> Iterator[Tuple[int, int, List[int]]]:
        """Transform encoded screen to sequence of change tuples."""

-        if self.mode == VideoMode.DHGR:
-            if is_aux:
-                target_pixelmap = screen.DHGRBitmap(
-                    main_memory=self.memory_map,
-                    aux_memory=target,
-                    palette=self.palette
-                )
-            else:
-                target_pixelmap = screen.DHGRBitmap(
-                    main_memory=target,
-                    aux_memory=self.aux_memory_map,
-                    palette=self.palette
-                )
+        if self.mode == VideoMode.DHGR and is_aux:
+            target = target_pixelmap.aux_memory
        else:
-            target_pixelmap = screen.HGRBitmap(
-                main_memory=target,
-                palette=self.palette
-            )
+            target = target_pixelmap.main_memory

        diff_weights = target_pixelmap.diff_weights(self.pixelmap, is_aux)
        # Don't bother storing into screen holes
@ -124,11 +114,10 @@ class Video:
        # with new frame
        update_priority[diff_weights == 0] = 0
        update_priority += diff_weights
+        assert np.all(update_priority >= 0)

        priorities = self._heapify_priorities(update_priority)

-        content_deltas = {}
-
        while priorities:
            pri, _, page, offset = heapq.heappop(priorities)

@ -152,23 +141,14 @@ class Video:
            diff_weights[page, offset] = 0

            # Update memory maps
-            source.page_offset[page, offset] = content
            self.pixelmap.apply(page, offset, is_aux, content)

-            # Make sure we don't emit this offset as a side-effect of some
-            # other offset later.
-            for cd in content_deltas.values():
-                cd[page, offset] = 0
-                # TODO: what if we add another content_deltas entry later?
-                #  We might clobber it again
-
            # Need to find 3 more offsets to fill this opcode
            for err, o in self._compute_error(
                    page,
                    content,
                    target_pixelmap,
                    diff_weights,
-                    content_deltas,
                    is_aux
            ):
                assert o != offset
@ -180,13 +160,6 @@ class Video:
                    # Someone already resolved this diff.
                    continue

-                # Make sure we don't end up considering this (page, offset)
-                # again until the next image frame.  Even if a better match
-                # comes along, it's probably better to fix up some other byte.
-                # TODO: or should we recompute it with new error?
-                for cd in content_deltas.values():
-                    cd[page, o] = 0
-
                byte_offset = target_pixelmap.byte_offset(o, is_aux)
                old_packed = target_pixelmap.packed[page, o // 2]

@ -196,13 +169,11 @@ class Video:
                # Update priority for the offset we're emitting
                update_priority[page, o] = p

-                source.page_offset[page, o] = content
                self.pixelmap.apply(page, o, is_aux, content)
-
                if p:
                    # This content byte introduced an error, so put back on the
                    # heap in case we can get back to fixing it exactly
-                    # during this frame.  Otherwise we'll get to it later.
+                    # during this frame.  Otherwise, we'll get to it later.
                    heapq.heappush(
                        priorities, (-p, random.getrandbits(8), page, o))

@ -213,37 +184,71 @@ class Video:
            # Pad to 4 if we didn't find enough
            for _ in range(len(offsets), 4):
                offsets.append(offsets[0])
-            yield (page + 32, content, offsets)
+            yield page + 32, content, offsets

-        # # TODO: there is still a bug causing residual diffs when we have
-        # # apparently run out of work to do
-        if not np.array_equal(source.page_offset, target.page_offset):
-            diffs = np.nonzero(source.page_offset != target.page_offset)
-            for i in range(len(diffs[0])):
-                diff_p = diffs[0][i]
-                diff_o = diffs[1][i]
+        self.out_of_work[is_aux] = True

-                # For HGR, 0x00 or 0x7f may be visually equivalent to the same
-                # bytes with high bit set (depending on neighbours), so skip
-                # them
-                if (source.page_offset[diff_p, diff_o] & 0x7f) == 0 and \
-                        (target.page_offset[diff_p, diff_o] & 0x7f) == 0:
-                    continue
-
-                if (source.page_offset[diff_p, diff_o] & 0x7f) == 0x7f and \
-                        (target.page_offset[diff_p, diff_o] & 0x7f) == 0x7f:
-                    continue
-
-                print("Diff at (%d, %d): %d != %d" % (
-                    diff_p, diff_o, source.page_offset[diff_p, diff_o],
-                    target.page_offset[diff_p, diff_o]
-                ))
-                # assert False
+        # These debugging assertions validate that when we are out of work,
+        # our source and target representations should be identical.
+        #
+        # They only work correctly for palettes that do not have identical
+        # colours (e.g. IIGS but not NTSC which has two identical greys).
+        #
+        # The problem is that if we have substituted one grey for the other
+        # there may be no diff if they are part of an extended run of greys.
+        #
+        # The only difference is at the end of the run where these produce
+        # different artifact colours, but this may only be visible in the
+        # other bank.
+        #
+        # It may take several iterations of main/aux before we will notice and
+        # correct all of these differences.  That means we don't have a
+        # deterministic point in time when we can assert that all diffs should
+        # have been resolved.
+        # TODO: add flag to enable debug assertions
+        # if not np.array_equal(source.page_offset, target.page_offset):
+        #     diffs = np.nonzero(source.page_offset != target.page_offset)
+        #     for i in range(len(diffs[0])):
+        #         diff_p = diffs[0][i]
+        #         diff_o = diffs[1][i]
+        #
+        #         # For HGR, 0x00 or 0x7f may be visually equivalent to the same
+        #         # bytes with high bit set (depending on neighbours), so skip
+        #         # them
+        #         if (source.page_offset[diff_p, diff_o] & 0x7f) == 0 and \
+        #                 (target.page_offset[diff_p, diff_o] & 0x7f) == 0:
+        #             continue
+        #
+        #         if (source.page_offset[diff_p, diff_o] & 0x7f) == 0x7f and \
+        #                 (target.page_offset[diff_p, diff_o] & 0x7f) == 0x7f:
+        #             continue
+        #
+        #         print("Diff at (%d, %d): %d != %d" % (
+        #             diff_p, diff_o, source.page_offset[diff_p, diff_o],
+        #             target.page_offset[diff_p, diff_o]
+        #         ))
+        #         assert False
+        #
+        # # If we've finished both main and aux pages, there should be no residual
+        # # diffs in packed representation
+        # all_done = self.out_of_work[True] and self.out_of_work[False]
+        # if all_done and not np.array_equal(self.pixelmap.packed,
+        #                                    target_pixelmap.packed):
+        #     diffs = np.nonzero(
+        #         self.pixelmap.packed != target_pixelmap.packed)
+        #     print("is_aux: %s" % is_aux)
+        #     for i in range(len(diffs[0])):
+        #         diff_p = diffs[0][i]
+        #         diff_o = diffs[1][i]
+        #         print("(%d, %d): got %d want %d" % (
+        #             diff_p, diff_o, self.pixelmap.packed[diff_p, diff_o],
+        #             target_pixelmap.packed[diff_p, diff_o]))
+        #     assert False

        # If we run out of things to do, pad forever
        content = target.page_offset[0, 0]
        while True:
-            yield (32, content, [0, 0, 0, 0])
+            yield 32, content, [0, 0, 0, 0]

    @staticmethod
    def _heapify_priorities(update_priority: np.array) -> List:
@ -254,7 +259,9 @@ class Video:
        pages, offsets = update_priority.nonzero()
        priorities = [tuple(data) for data in np.stack((
            -update_priority[pages, offsets],
-            # Don't use deterministic order for page, offset
+            # Don't use deterministic order for page, offset.  Otherwise,
+            # we get the "venetian blind" effect when filling large blocks of
+            # colour.
            np.random.randint(0, 2 ** 8, size=pages.shape[0]),
            pages,
            offsets)
@ -265,24 +272,21 @@ class Video:

    _OFFSETS = np.arange(256)

-    def _compute_error(self, page, content, target_pixelmap, diff_weights,
-                       content_deltas, is_aux):
+    def _compute_error(
+            self, page, content, target_pixelmap, diff_weights, is_aux):
        """Build priority queue of other offsets at which to store content.

        Ordered by offsets which are closest to the target content value.
        """
-        # TODO: move this up into parent
-        delta_screen = content_deltas.get(content)
-        if delta_screen is None:
-            delta_screen = target_pixelmap.compute_delta(
-                content, diff_weights, is_aux)
-            content_deltas[content] = delta_screen
-
-        delta_page = delta_screen[page]
+        delta_page = target_pixelmap.compute_delta_page(
+            page, content, diff_weights[page, :], is_aux)
        cond = delta_page < 0
        candidate_offsets = self._OFFSETS[cond]
        priorities = delta_page[cond]

+        # Don't use deterministic order for page, offset.  Otherwise,
+        # we get the "venetian blind" effect when filling large blocks of
+        # colour.
        deltas = [
            (priorities[i], random.getrandbits(8), candidate_offsets[i])
            for i in range(len(candidate_offsets))
@ -290,8 +294,8 @@ class Video:
        heapq.heapify(deltas)

        while deltas:
-            pri, _, o = heapq.heappop(deltas)
+            pri, _, offset = heapq.heappop(deltas)
            assert pri < 0
-            assert o <= 255
+            assert 0 <= offset <= 255

-            yield -pri, o
+            yield -pri, offset
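Reviewer note: the random tie-breaker that the new comments describe can be illustrated with a small standalone sketch. `pop_in_priority_order` and the `priorities` dictionary are hypothetical and only imitate the shape of the tuples pushed onto the heap in `_heapify_priorities` and `_compute_error`: negated priority first (because `heapq` is a min-heap), then 8 random bits so that equal priorities are not always drained in page/offset order.

```python
import heapq
import random


def pop_in_priority_order(priorities, rng=None):
    """Yield (priority, page, offset) from highest to lowest priority.

    priorities maps (page, offset) -> positive int. Entries with priority 0
    are skipped, as there is nothing to update at those offsets.
    """
    rng = rng or random.Random()
    heap = [
        # Negate the priority for the min-heap; the random byte breaks ties so
        # that large uniform regions are not filled in strict address order.
        (-pri, rng.getrandbits(8), page, offset)
        for (page, offset), pri in priorities.items()
        if pri > 0
    ]
    heapq.heapify(heap)
    while heap:
        neg_pri, _, page, offset = heapq.heappop(heap)
        yield -neg_pri, page, offset


# Toy update priorities (assumption): three equal-priority offsets on one page.
priorities = {(0, 10): 5, (0, 11): 5, (0, 12): 5, (1, 0): 9, (2, 7): 0}
order = list(pop_in_priority_order(priorities, rng=random.Random(1)))
```

The highest-priority entry is always emitted first; only the relative order of the equal-priority entries depends on the random bits.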