Compare commits


100 Commits
v1.1...main

Author SHA1 Message Date
KrisKennaway f55e878b8e
Mark nogil functions as noexcept to avoid warning from newer cython (#15) 2023-10-29 14:43:55 +00:00
KrisKennaway 69a96b4719
Optimize palette initialization and NTSC image conversion (#13)
About 2x faster end-to-end with default dhr conversion options
2023-02-26 00:00:39 +00:00
KrisKennaway f8fbd768a5
Optimize dither_dhr.dither_image performance by about 2x (#12)
- avoid passing around a float[::1] memoryview across function barriers; this seems to require reference counting, which has a large overhead
- inline some functions
- C division
- float instead of double
2023-02-25 21:21:43 +00:00
kris 7a4e27e0da Bump version in header 2023-02-03 00:42:11 +00:00
KrisKennaway 3aa29f2d2c
Add support for hi-res conversions (#11)
Hi-Res is essentially a more constrained version of Double Hi-Res, in which only about half of the 560 horizontal screen pixels can be independently addressed.

In particular an 8-bit byte in screen memory controls 14 or 15 screen pixels.  Bits 0-6 are doubled, and bit 7 shifts these 14 dots to the right if enabled.  In this case bit 6 of the previous byte is repeated a third time.

This means that we have to optimize all 8 bits at once and move forward in increments of 14 screen pixels.

There's also a timing difference that results in a phase shift of the NTSC colour signal, which means the mappings from dot patterns to effective colours are rotated.

Error diffusion seems to give best results if we only distribute about 2/3 of the quantization error according to the dither pattern.
2023-02-03 00:40:32 +00:00
KrisKennaway 6573bad509
Merge pull request #10 from KrisKennaway/fix-lookahead
Fix --lookahead parsing
2023-01-31 21:32:02 +00:00
kris 0560409717 Fix --lookahead parsing 2023-01-31 21:28:40 +00:00
KrisKennaway 055851aa9c
Merge pull request #9 from KrisKennaway/create-data-dir
Create data directory before writing to it
2023-01-31 21:15:03 +00:00
kris 2ab582790c Create data directory before writing to it 2023-01-31 21:12:42 +00:00
KrisKennaway 05a3624866
Merge pull request #5 from KrisKennaway/mono
Add support for DHGR mono conversions and fix compatibility with python 3.10
2023-01-21 17:32:14 +00:00
kris 39e8eac8ed Add support for DHGR mono conversions 2023-01-21 17:30:27 +00:00
kris 629104b933 Fixes for python 3.10 and/or latest dependency versions 2023-01-21 17:29:06 +00:00
kris 7b8b6bc12b Tweak 2022-07-19 22:41:38 +01:00
kris 7ce574dc06 Add SHR details 2022-07-19 22:39:56 +01:00
kris 2a1874face Tweak wording 2022-07-19 22:39:11 +01:00
kris c61bd258fd Add sample SHR conversions 2022-07-19 22:38:59 +01:00
kris b11b322c39 Move DHR examples to subdir in preparation for adding SHR examples 2022-07-18 23:11:23 +01:00
KrisKennaway bce2153d97
Merge pull request #4 from KrisKennaway/shr
Add support for Super Hi-Res conversions
2022-07-18 23:06:18 +01:00
kris 1468e06d2f Tweak 2022-07-18 23:02:49 +01:00
kris 1486f8a394 Add TODO 2022-07-18 22:31:22 +01:00
kris 12d6805617 Update docs for 2.0 and split out the technical details of dhr into its own file 2022-07-18 22:30:56 +01:00
kris e156dd3b48 Add a requirements.txt to simplify installation 2022-07-18 22:11:32 +01:00
kris 3196369b7d Tidy a bit and add a --save-intermediate flag 2022-07-18 10:00:19 +01:00
kris 1ffb2c9110 Tidy 2022-07-18 09:59:01 +01:00
kris 8fd0ec5dc6 Set window title and clean up a bit 2022-07-16 22:13:26 +01:00
kris e71352490d Add comments 2022-07-16 22:00:42 +01:00
kris 99aa394196 Tweak comment 2022-07-16 22:00:14 +01:00
kris cfef9fa3c9 Add arg type 2022-07-16 21:57:45 +01:00
kris ccbb6980d9 Load data files relative to script path 2021-11-27 10:43:41 +00:00
kris a2b67ba882 Require a subcommand 2021-11-26 13:36:29 +00:00
kris 4d5dea2c41 Restore dhr conversion support 2021-11-26 13:15:57 +00:00
kris 0a964b377a Move SHR conversion out into convert_shr in preparation for re-enabling dhr support 2021-11-26 12:35:45 +00:00
kris ae89682dab Split out common utility functions into a shared module 2021-11-26 12:26:46 +00:00
kris 0dc2c0a7a0 Disable bounds checking and wraparound by default 2021-11-26 12:12:55 +00:00
kris 4221c00701 Split dither into dither_dhr and dither_shr 2021-11-26 12:08:48 +00:00
kris 1075ff0136 Tidy a bit and remove support for tunable parameters that are no longer needed 2021-11-26 10:36:39 +00:00
kris cf63a35797 Cython tweaks to remove some unnecessary C code 2021-11-26 09:54:42 +00:00
kris 25e6ed7b88 Preserve palette order when deduplicating entries
Also make sure we're not mutating _global_palettes, though this should
currently be harmless.
2021-11-25 21:57:27 +00:00
kris 61b4cbb184 Tweak k-means convergence criterion to return once the total centroid position error stops decreasing. 2021-11-25 21:33:12 +00:00
kris fc35387360 - Fill any palettes that have fewer than 16 unique entries after
clustering, using the most frequent pixel colours that are not yet
  in the palette

- Reassign any palettes that are duplicated after clustering
2021-11-25 13:14:22 +00:00
kris ad50ed103d Improvements to image quality:
- Preprocess the source image by dithering with the full 12-bit //gs
  colour palette, ignoring SHR palette restrictions (i.e. each pixel
  chosen independently from 4096 colours)

- Using this as the ground truth allows much better handling of
  e.g. solid colours, which were being dithered inconsistently with
  the previous approach

- Also when fitting an SHR palette, fix any colours that comprise more
  than 10% of source pixels.  This also encourages more uniformity in
  regions of solid colour.
2021-11-25 11:46:42 +00:00
kris 870c008827 Parametrize quantization error decay and minimum value. The latter
helps with images where there are large solid colour fields that
sometimes cause uneven dithering because of colours that cannot be
matched with the //gs palette, but it's not a viable solution in
general since it reduces overall quality (sometimes substantially,
e.g. in case of vertical colour gradients)
2021-11-25 09:09:40 +00:00
kris 8b5c3dc6c1 Fix bool flags 2021-11-24 16:03:55 +00:00
kris 9a77af37aa Add a --show-final-score to output the final image quality score.
This is useful when used as part of an image repository build
pipeline, to avoid replacing existing images if the new score is
higher.

Hide intermediate output behind --verbose
2021-11-24 15:49:56 +00:00
kris 0036ee9522 Add default values to help 2021-11-24 15:44:37 +00:00
kris 8d3ab4f50e Add the ability to disable saving preview images. Also rename --gamma_correct to --gamma-correct for consistency 2021-11-24 15:41:32 +00:00
kris 8175dcb052 Add --fixed-colours to control how many colours will be kept identical
across all 16 SHR palettes.
2021-11-24 15:27:34 +00:00
kris 5fefd0b0bb Don't initialize pygame if --no-show-output 2021-11-24 15:24:58 +00:00
kris e77e7abd43 Rename 2021-11-24 15:24:45 +00:00
kris d645cc5964 Tidy 2021-11-24 15:21:50 +00:00
kris c36de2b76b When initializing centroids for fitting the SHR palettes, only use the
reserved colours from the global palette, and pick unique random
points from the samples for the rest.  This encourages a larger range
of colours in the resulting images and may improve quality.

Iterate a max number of times without improvement in the outer loop as
well.

Save intermediate preview outputs.
2021-11-24 14:57:24 +00:00
kris 3b8767782b Each run seems to converge fairly quickly but there is a lot of variation across runs. Run in a loop and keep the running best. 2021-11-24 11:47:39 +00:00
kris de8a303de2 Initial attempt at fitting palettes to arbitrary lines instead of line ranges.
Works OK but isn't converging as well as I hoped.
2021-11-24 10:41:25 +00:00
kris 50c71d3a35 Whitespace 2021-11-24 09:19:35 +00:00
kris 04fd4f7427 Move reassigning palettes back to after fitting, otherwise it does the
wrong thing the first time.

Fix an off by one when splitting palette ranges
2021-11-24 09:18:59 +00:00
kris 62f23ff910 Don't mutate initial_centroids 2021-11-24 09:10:03 +00:00
kris 7179d009e1 Refactor
Reassign palettes before computing new ones instead of after
2021-11-23 15:09:12 +00:00
kris e488955c23 Reorder 2021-11-23 14:58:46 +00:00
kris 0b985a66b9 Reorder and tidy 2021-11-23 14:58:09 +00:00
kris c78f731cd7 Refactor 2021-11-23 14:55:45 +00:00
kris 0323b80e68 Refactor 2021-11-23 14:51:04 +00:00
kris 6988b19b43 Tidy 2021-11-23 14:00:57 +00:00
kris 1ce5c25764 Fix a bug where _fit_global_palette would crash if there were fewer
than 16 global colours computed.
2021-11-23 13:59:48 +00:00
kris 6e52680cf1 Dynamically tune the line ranges used to fit the 16 SHR palettes:
- start with an equal split
- with each iteration, pick a palette and adjust its line ranges by a small random amount
- if the proposed palette is accepted, continue to apply the same delta
- if not, revert the adjustment and pick a different one

In addition, often there will be palettes that are entirely unused by
the image.  For such palettes:

- find the palette with the largest line range.  If > 20, then
  subdivide this range and assign half each to both palettes
- if not, then pick a random line range for the unused palette

This helps to refine and explore more of the parameter space.
2021-11-23 13:01:50 +00:00
kris 189b4655ad Since fixing the bug in the previous commit there is no longer a need
to limit to neighbouring palettes (which was unaware of the dynamic
line splits anyway)
2021-11-23 12:49:37 +00:00
kris be55fb859d - Fix a serious bug in best_palette_for_line which was not actually computing the palette with lowest per-row error, rather the lowest per-pixel error!
- Tidy a bit
2021-11-23 12:46:36 +00:00
kris b78c42e287 Fix rounding 2021-11-18 22:35:15 +00:00
kris b1d3488182 Actually use equal-sized palette splits. With the previous version
the first and last were smaller.
2021-11-18 22:27:19 +00:00
kris 9e46ca48a0 Refactor to extract palette splits in preparation for tuning them dynamically 2021-11-18 22:08:09 +00:00
kris cfc150ed13 Remove some dead code 2021-11-18 22:03:18 +00:00
kris c608f6b961 Optimize calling _convert_cam16ucs_to_rgb12_iigs since it has
significant overhead
2021-11-18 21:50:39 +00:00
kris 3159a09c27 Uncomment 2021-11-18 20:33:21 +00:00
kris 7609297f0d Optimize a bit 2021-11-18 17:34:27 +00:00
kris d7969f50ba Remove cython checks and obsolete TODO 2021-11-18 17:24:12 +00:00
kris e53c085a91 Remove debugging prints 2021-11-17 22:55:47 +00:00
kris ed2082344a Working version! Quantize the k-means centroids in 12-bit //gs RGB
space but continue to use CAM16-UCS for distances and updating
centroid positions, before mapping back to the nearest legal 12-bit
RGB position.

Needs some more work to deal with the fact that now that there are
discrete distances (but no fixed minimum) between allowed centroid
positions, the previous notion of convergence doesn't apply.  Actually
the centroids can oscillate between positions.

There is room for optimization but this is already reasonably
performant, and the image quality is much higher \o/
2021-11-17 22:49:06 +00:00
kris 0009ce8913 - allow reserving a number of colours which are to be shared across
all palettes.  This will be useful for Total Replay which does an
  animation effect when displaying the image (first set palettes, then
  transition in pixels)

- this requires us to go back to computing k-means ourself instead of
  using sklearn, since it can't keep some centroids fixed

- try to be more careful about //gs RGB values, which are in the
  Rec.601 colour space.  This isn't quite right yet - the issue seems
  to be that since we dither in linear RGB space but quantize in the
  nonlinear space, small differences may lead to a +/- 1 in the 4-bit
  //gs RGB value, which is quite noticeable.  Instead we need to be
  clustering and/or dithering with awareness of the quantized palette
  space.
2021-11-17 17:09:42 +00:00
kris f2f07ddc04 Refactor and add comments 2021-11-16 23:45:11 +00:00
kris bb70eea7b0 Cleanup 2021-11-16 21:07:13 +00:00
kris 613a36909c Suppress pygame message at startup
Keep iterating until N iterations without quality improvement
2021-11-16 17:23:31 +00:00
kris 5111696d5c Compute number of unique colours. This does not seem to strongly
depend on the width of the palette sampling.

Note the potential issue that since we are clustering in CAM space but
then quantizing a (much coarser) 4-bit RGB value we could end up
picking multiple centroids that will be represented by the same RGB
value.  This doesn't seem to be a major issue though (e.g. 3-4 lost
colours per typical image)
2021-11-16 16:57:44 +00:00
kris 91e4fd7cba Add comment 2021-11-16 15:50:19 +00:00
kris 83b047b73f Whoops, fix a major bug with the iterated image fitting: we don't want
to mutate our source image!

Fix another bug introduced in the previous commit: convert from linear
rgb before quantizing //gs RGB palette since //gs RGB values are in
Rec.601 colour space.

Switch to double for colour_squared_distance and related variables,
not sure if it matters though.

When iterating palette clustering, reject the new palettes if they
would increase the total image error.  This prevents accepting changes
that are local improvements to one palette but which would introduce
more net errors elsewhere when this palette is reused.

This now seems to give monotonic improvements in image quality so no need
to write out intermediate images any more.
2021-11-16 15:44:04 +00:00
kris 8694ab364e Perform conversions in linear RGB space 2021-11-16 12:38:53 +00:00
kris 7ad560247b Clean up 2021-11-16 12:24:43 +00:00
kris 10c829906b Checkpoint
- Repeatedly refit palettes since k-means is only a local
  optimization.  This can produce incremental improvements in image
  quality but may also overfit, especially on complex images.
- use pygame to render incremental images
- Fix off-by-one in palette striping
- When fitting palettes, first cluster a 16-colour palette for the
  entire image and use this to initialize the centroids for individual
  palettes.  This improves quality when fitting images with large
  blocks of colour, since they will otherwise be fit separately and
  may have slight differences.  With a global initializer these will
  tend to be the same.  This also improves performance.
2021-11-16 11:21:53 +00:00
kris b363d60754 Checkpoint
- switch to pyclustering for kmedians
- allow choosing the same palette as previous line, with a multiplicative penalty to distance in case it's much better
- iterate kmedians multiple times and choose the best, since it's only a local optimum
2021-11-15 09:19:44 +00:00
kris 643e50349e Optimize more 2021-11-13 17:29:13 +00:00
kris 0596aefe0b Use pyclustering for kmedians instead of hand-rolled
Optimize cython code
2021-11-13 17:18:34 +00:00
kris 52af982159 k-means should be using median with L1 norm, otherwise it may not converge
Also optimize a tiny bit
2021-11-13 16:10:33 +00:00
kris 5cab854269 Fit palettes from overlapping line ranges, and map line to palette
when dithering with two limitations:

- cannot choose the same palette as the previous line (this avoids banding)
- must be within +/- 1 of the "base" palette for the line number

This gives pretty good results!
2021-11-11 16:10:03 +00:00
kris ee2229d0ea * Modify Floyd-Steinberg dithering to diffuse less error in the y
direction.  Otherwise, errors can accumulate in an RGB channel if
  there are no palette colours with an extremal value, and then when
  we introduce a new palette the error all suddenly discharges in a
  spurious horizontal line.  This now gives quite good results!

* Switch to using L1-norm for k-means, per suggestion of Lucas
  Scharenbroich: "A k-medians effectively uses an L1 distance metric
  instead of L2 for k-means.  Using a squared distance metric causes
  the fit to "fall off" too quickly and allows too many of the k
  centroids to cluster around areas of high density, which results in
  many similar colors being selected.  A linear cost function forces
  the centroids to spread out since the error influence has a broader
  range."
2021-11-11 11:10:22 +00:00
kris 8c34d87216 WIP - interleave 3 successive palettes for each contiguous row range.
Avoids the banding but not clear if it's overall better

Also implement my own k-means clustering which is able to keep some
centroids fixed, e.g. to be able to retain some fixed palette entries
while swapping out others.  I was hoping this would improve colour
blending across neighbouring palettes but it's also not clear if it
does.
2021-11-10 18:30:39 +00:00
kris 322123522c Assign scan lines randomly to palettes and cluster independently. This doesn't give good results either, since
neighbouring lines end up getting similar but not identical colours, which still results in horizontal striping.
2021-11-10 00:34:17 +00:00
kris fb52815412 Experiment with striping 16 palettes contiguously across line ranges.
As expected it has clear banding.  A better approach (though still not optimal)
might be to assign lines to palettes randomly.
2021-11-09 22:42:27 +00:00
kris 80885aabf9 Working SHR version. Still just uses a single palette 2021-11-09 22:26:34 +00:00
kris 21058084e2 Tidy 2021-11-09 16:14:37 +00:00
kris 01b19a4a06 Use 4-bit RGB values instead of 8-bit 2021-11-09 15:35:44 +00:00
kris a92c9cd7b5 Work in CAM16-UCS colour space and cythonize 2021-11-09 15:13:07 +00:00
kris 173c283369 First implementation of using k-means clustering in RGB space to dither a 320x200 SHR image. 2021-11-09 11:23:25 +00:00
187 changed files with 2021 additions and 472 deletions

README.md (415 lines changed)

@@ -1,15 +1,25 @@
# ][-pix
# ][-pix 2.2
][-pix is an image conversion utility targeting Apple II graphics modes, currently Double Hi-Res.
][-pix is an image conversion utility targeting Apple II graphics modes, currently Hi-Res (all models), Double Hi-Res
(enhanced //e, //c, //gs) and Super Hi-Res (//gs).
## Installation
Requires:
* python 3.x
* [numpy](http://numpy.org/)
* [cython](https://cython.org/)
* [Pillow](https://python-pillow.org/)
* [colour-science](https://www.colour-science.org/)
* [cython](https://cython.org/)
* [numpy](http://numpy.org/)
* [Pillow](https://python-pillow.org/)
* [pygame](https://www.pygame.org/)
* [scikit-learn](https://scikit-learn.org/)
These dependencies can be installed using the following command:
```bash
# Install python dependencies
pip install -r requirements.txt
```
To build ][-pix, run the following commands:
@@ -17,33 +27,114 @@ To build ][-pix, run the following commands:
# Compile cython code
python setup.py build_ext --inplace
# Precompute RGB/CAM16-UCS colour conversion matrix, used as part of image optimization
# Precompute colour conversion matrices, used as part of image optimization
python precompute_conversion.py
```
## Usage
# Usage
Then, to convert an image, the simplest usage is:
To convert an image, the basic command is:
```bash
python convert.py <mode> [<flags>] <input> <output>
```
where
* `mode` is one of the following:
* `hgr` for Hi-Res Colour (560x192 but only half of the horizontal pixels may be independently controlled)
* `dhr` for Double Hi-Res Colour (560x192)
* `dhr_mono` for Double Hi-Res Mono (560x192)
* `shr` for Super Hi-Res (320x200)
* `input` is the source image file to convert (e.g. `my-image.jpg`)
* `output` is the output filename to produce (e.g. `my-image.dhr`)
The following flags are supported in all modes:
* `--show-input` Whether to show the input image before conversion. (default: False)
* `--show-output` Whether to show the output image after conversion. (default: True)
* `--save-preview` Whether to save a .PNG rendering of the output image (default: True)
* `--verbose` Show progress during conversion (default: False)
* `--gamma-correct` Gamma-correct image by this value (default: 2.4)
For other available options, use `python convert.py <mode> --help`
See below for mode-specific instructions.
## Hi-Res
To convert an image to Hi-Res the simplest usage is:
```bash
python convert.py --palette ntsc <input> <output.dhr>
python convert.py hgr <input> <output.hgr>
```
`<output.hgr>` contains the hires image data in a form suitable for transfer to an Apple II disk image.
TODO: document flags
TODO: add more details about HGR - resolution and colour model.
## Double Hi-Res
To convert an image to Double Hi-Res (560x192, 16 colours but [it's complicated](docs/dhr.md)), the simplest usage is:
```bash
python convert.py dhr --palette ntsc <input> <output.dhr>
```
`<output.dhr>` contains the double-hires image data in a form suitable for transfer to an Apple II disk image. The 16K output consists of 8K AUX data first, 8K MAIN data second (this matches the output format of other DHGR image converters). i.e. if loaded at 0x2000, the contents of 0x2000..0x3fff should be moved to 0x4000..0x5fff in AUX memory, and the image can be viewed on DHGR page 2.
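As a minimal sketch of this file layout (the filename and variable names below are illustrative assumptions, not part of ][-pix):

```python
# Split a 16K .dhr file into its AUX and MAIN halves, per the layout above.
with open("my-image.dhr", "rb") as f:  # filename is illustrative
    data = f.read()
assert len(data) == 0x4000               # 16K total
aux = data[:0x2000]    # first 8K: AUX memory data (ends up at AUX 0x4000..0x5fff)
main = data[0x2000:]   # second 8K: MAIN memory data (DHGR page 2 at 0x4000..0x5fff)
```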
By default, a preview image will be shown after conversion, and saved as `<output>-preview.png`
For other available options, use `python convert.py --help`
TODO: document flags
## Examples
For more details about Double Hi-Res graphics and the conversion process, see [here](docs/dhr.md).
See [here](examples/gallery.md) for more sample image conversions.
## Super Hi-Res
To convert an image to Super Hi-Res (320x200, up to 256 colours), the simplest usage is:
```bash
python convert.py shr <input> <output.shr>
```
i.e. no additional options are required. In addition to the common flags described above, the following flags are supported for `shr` conversions:
* `--save-intermediate` Whether to save each intermediate iteration, or just the final image (default: False)
* `--fixed-colours` How many colours to fix as identical across all 16 SHR palettes. (default: 0)
* `--show-final-score` Whether to output the final image quality score (default: False)
TODO: link to KansasFest 2022 talk slides/video for more details
# Examples
## Hi-Res
This image was generated using
```bash
python convert.py hgr examples/hgr/mandarin-duck.jpg examples/hgr/mandarin-duck.bin
```
The image on the right is a screenshot taken from OpenEmulator.
| ![Mandarin duck](examples/hgr/mandarin-duck.jpg) | ![Mandarin duck](examples/hgr/mandarin-duck-openemulator.png) |
|--------------------------------------------------|---------------------------------------------------------------|
(Source: [Adrian Pingstone](https://commons.wikimedia.org/wiki/File:Mandarin.duck.arp.jpg), public domain, via Wikimedia Commons)
| ![Portrait](examples/hgr/portrait.jpg) | ![Portrait](examples/hgr/portrait-openemulator.png) |
|---|---|
(Source: [Devanath](https://www.pikist.com/free-photo-srmda/fr), public domain)
TODO: add more hi-res images
## Double Hi-Res
See [here](examples/dhr/gallery.md) for more sample Double Hi-Res image conversions.
### Original
![Two colourful parrots sitting on a branch](examples/parrots-original.png)
![Two colourful parrots sitting on a branch](examples/dhr/parrots-original.png)
(Source: [Shreygadgil](https://commons.wikimedia.org/wiki/File:Vibrant_Wings.jpg), [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons)
@@ -52,18 +143,18 @@ See [here](examples/gallery.md) for more sample image conversions.
This image was generated using
```bash
python convert.py --lookahead 8 --palette openemulator examples/parrots-original.png examples/parrots-iipix-openemulator.dhr
python convert.py dhr --lookahead 8 --palette openemulator examples/dhr/parrots-original.png examples/dhr/parrots-iipix-openemulator.dhr
```
The resulting ][-pix preview PNG image is shown here.
![Two colourful parrots sitting on a branch](examples/parrots-iipix-openemulator-preview.png)
![Two colourful parrots sitting on a branch](examples/dhr/parrots-iipix-openemulator-preview.png)
### OpenEmulator screenshot
This is a screenshot taken from OpenEmulator when viewing the Double Hi-res image.
![Two colourful parrots sitting on a branch](examples/parrots-iipix-openemulator-openemulator.png)
![Two colourful parrots sitting on a branch](examples/dhr/parrots-iipix-openemulator-openemulator.png)
Some difference in colour tone is visible due to blending of colours across pixels (e.g. brown blending into grey, in the background). This is because OpenEmulator simulates the reduced chroma bandwidth of the NTSC signal.
@@ -71,287 +162,51 @@ Some difference in colour tone is visible due to blending of colours across pixe
This is an OpenEmulator screenshot of the same image converted with `--palette=ntsc` instead of `--palette=openemulator`. Colour match to the original is substantially improved, and more colour detail is visible, e.g. in the shading of the background.
![Two colourful parrots sitting on a branch](examples/parrots-iipix-ntsc-openemulator.png)
![Two colourful parrots sitting on a branch](examples/dhr/parrots-iipix-ntsc-openemulator.png)
## Super Hi-Res
## Some background on Apple II Double Hi-Res graphics
Like other (pre-//gs) Apple II graphics modes, Double Hi-Res relies on [NTSC Artifact Colour](https://en.wikipedia.org/wiki/Composite_artifact_colors), which means that the colour of a pixel is entirely determined by its horizontal position on the screen, and the on/off status of preceding horizontal pixels.
In Double Hi-Res mode, the 560 horizontal pixels per line are individually addressable. This is an improvement over the (single) Hi-Res mode, which also has 560 horizontal pixels, but which can only be addressed in groups of two (with an option to shift blocks of 7 pixels each by one dot). See _Assembly Lines: The Complete Book_ (Wagner) for a detailed introduction to this, or _Understanding the Apple IIe_ (Sather) for a deep technical discussion.
Double Hi-Res is usually characterized as being capable of producing 16 display colours, but with heavy restrictions on how these colours can be arranged horizontally.
### Naive model: 140x192x16
One simple model for Double Hi-Res graphics is to only treat the display in groups of 4 horizontal pixels, which gives an effective resolution of 140x192 in 16 colours (=2^4). These 140 pixel colours can be chosen independently, which makes this model easy to think about and to work with (e.g. when creating images by hand). However the resulting images will exhibit (sometimes severe) colour interference/fringing effects when two colours are next to one another, because the underlying hardware does not actually work this way. See below for an example image conversion, showing the unwanted colour fringing that results.
### Simplest realistic model: 560 pixels, 4-pixel colour
A more complete model for thinking about DHGR comes from looking at how the NTSC signal produces colour on the display.
The [NTSC chrominance subcarrier](https://en.wikipedia.org/wiki/Chrominance_subcarrier) completes one full phase cycle in the time taken to draw 4 horizontal dots. The colours produced are due to the interaction of the pixel luminosity (on/off) with this NTSC chroma phase.
What this means is that the colour of each of the 560 horizontal pixels is determined by the current pixel value (on/off), the current X-coordinate modulo 4 (X coordinate relative to NTSC phase), as well as the on-off status of the pixels to the left of it.
The simplest approximation is to only look at the current pixel value and the 3 pixels to the left, i.e. to consider a sliding window of 4 horizontal pixels moving across the screen from left to right. Within this window, we have one pixel for each of the 4 values of NTSC phase (x % 4, ranging from 0 .. 3). The on-off values for these 4 values of NTSC phase determine the colour of the pixel. See [here](https://docs.google.com/presentation/d/1_eqBknG-4-llQw3oAOmPO3FlawUeWCeRPYpr_mh2iRU/edit) for more details.
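As a concrete sketch of this sliding-window model (hypothetical code, not taken from the ][-pix source; the bit ordering is an assumption for illustration):

```python
def window_palette_index(dots: list[int], x: int) -> int:
    """Return the 4-bit colour index for dot x from the window of dots [x-3 .. x].

    Each window bit is placed according to its NTSC phase (position % 4), so the
    same on/off pattern shifted by one dot produces a different colour index.
    """
    index = 0
    for dx in range(4):
        pos = x - dx
        bit = dots[pos] if pos >= 0 else 0  # dots off the left edge read as off
        index |= bit << (pos % 4)           # bit position = NTSC phase of that dot
    return index
```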
This model allows us to understand and predict the interference behaviour when two "140px" colours are next to each other, and to go beyond this "140px" model to take more advantage of the true 560px horizontal resolution.
If we imagine drawing pixels from left to right across the screen, at each pixel we only have *two* accessible choices of colour: those resulting from turning the current pixel on, or off. Which two particular colours are produced is determined by the pixels already drawn to the left (the immediate 3 neighbours, in our model). One of these possibilities will always be the same as the pixel colour to the left (the on/off pixel choice corresponding to the value that just "fell off the left side" of the sliding window), and the other choice is some other colour from our palette of 16.
This can be summarized in a chart, showing the possible colour transitions depending on the colour of the pixel to the immediate left, and the value of x%4.
![Double hi-res colour transitions](docs/Double_Hi-Res_colour_transitions.png)
So, if we want to transition from one colour to a particular new colour, it may take up to 4 horizontal pixels before we are able to achieve it (e.g. transitioning all the way from black (0000) to white (1111)). In the meantime we have to transition through up to 2 other colours. Depending on the details of the image we are aiming for, this may either produce unwanted visual noise or actually be beneficial (e.g. if the colour we want is available immediately at the next pixel).
These constraints are difficult to work with when constructing DHGR graphics "by hand", but we can account for them programmatically in our image conversion to take full advantage of the "true" 560px resolution while accounting for colour interference effects.
#### Limitations of this colour model
In practice, the above description of the Apple II colour model is still only an approximation. On real hardware, the video signal is a continuous analogue signal, and colour is continuously modulated rather than producing discretely-coloured pixels with fixed colour values.
More importantly, in an NTSC video signal the colour (chroma) signal has a lower bandwidth than the luma (brightness) signal ([Chroma sub-sampling](https://en.wikipedia.org/wiki/Chroma_subsampling)), which means that colours will tend to bleed across more than 4 pixels. However our simple "4-pixel chroma bleed" model already produces good results, and exactly matches the implementation behaviour of some emulators, e.g. Virtual II.
### NTSC emulation and 8-pixel colour
By simulating the NTSC (Y'UV) signal directly we are able to recover the Apple II colour output from "first principles". Here are the 16 "basic" DHGR colours, obtained using saturation/hue parameters tuned to match OpenEmulator's NTSC implementation, and allowing chroma to bleed across 4 pixels.
![NTSC colours with 4 pixel chroma bleed](docs/ntsc-colours-chroma-bleed-4.png)
However in real NTSC, chroma bleeds over more than 4 pixels, which means that we actually have more than 2^4 colours available to work with.
This means that **when viewed on a composite colour display, Double Hi-Res graphics is not just a 16-colour graphics mode!**
If we allow the NTSC chroma signal to bleed over 8 pixels instead of 4, then the resulting colour is determined by sequences of 8 pixels instead of 4 pixels, i.e. there are 2^8 = 256 possibilities. In practice many of these result in the same output colour, and (with this approximation) there are only 85 unique colours available. However this is still a marked improvement on the 16 "basic" DHGR colours:
![NTSC colours with 8 pixel chroma bleed](docs/ntsc-colours-chroma-bleed-8.png)
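To make the combinatorics concrete, here is a hedged sketch of counting the distinct colours produced by all 2^8 window patterns. The Y'UV demodulation is a deliberately crude stand-in (square-wave carrier, untuned gains) for the tuned NTSC parameters ][-pix uses, so the count it prints will not be exactly the 85 quoted above:

```python
import itertools
import math

def approx_yuv(bits):
    """Crude Y'UV for an 8-dot window at a fixed NTSC phase (illustrative only)."""
    y = sum(bits[-4:]) / 4  # luma: 4-dot sliding average
    u = sum(b * math.cos(math.pi * i / 2) for i, b in enumerate(bits)) / 8
    v = sum(b * math.sin(math.pi * i / 2) for i, b in enumerate(bits)) / 8
    return (round(y, 4), round(u, 4), round(v, 4))

unique = {approx_yuv(bits) for bits in itertools.product((0, 1), repeat=8)}
print(f"{len(unique)} distinct colours from 256 bit patterns")  # many collapse
```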
The "extra" DHGR colours are only available on real hardware, or an emulator that implements NTSC chroma sub-sampling (such as OpenEmulator). But the result is that on such targets a much larger range of colours is available for use in image conversion. However the restriction still exists that any given pixel only has a choice of 2 colours available (as determined by the on/off state of pixels to the left).
In practice this gives much better image quality, especially when shading areas of similar colour. The Apple II is still unable to directly modulate the luma (brightness) NTSC signal component, so areas of low or high brightness still tend to be heavily dithered. This is because there are more bit sequences that have the number of '1' bits close to the average than there are at the extremes, so there are correspondingly few available colours that are very bright or very dark.
These 85 unique double hi-res colours produced by the ][-pix NTSC emulation are not the definitive story - though they're closer to it than the usual story that double hi-res is a 16-colour graphics mode. The implementation used by ][-pix is the simplest one: the Y'UV signal is averaged with a sliding window of 4 pixels for the Y' (luma) component and 8 pixels for the UV (chroma) component.
The choice of 8 pixels is not strictly correct - e.g. the chroma bandwidth (~0.6MHz) is much less than half of the luma bandwidth (~2MHz), so the signal bleeds over more than twice as many pixels, and it also decays in a more complex way than the simple step-function sliding window chosen here. In practice using 8 pixels is a good compromise between ease of implementation, runtime performance and fidelity.
By contrast, OpenEmulator uses more complex (and realistic) band-pass filtering to produce its colour output, which presumably allows even more possible colours (physical hardware will also produce its own unique results, depending on the hardware implementation of the signal decoding, and other physical characteristics). I expect that most of these will be small variations on the above though; and in practice the ][-pix NTSC implementation already produces a close colour match for the OpenEmulator behaviour.
#### Examples of NTSC images
(Source: [Reinhold Möller](https://commons.wikimedia.org/wiki/File:Nymphaea_caerulea-20091014-RM-115245.jpg), [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons)
![Nymphaea](examples/nymphaea-original.png)
OpenEmulator screenshot of image produced with `--palette=openemulator --lookahead=8`. The distorted background colour compared to the original is particularly noticeable.
![Nymphaea](examples/nymphaea-iipix-openemulator-openemulator.png)
OpenEmulator screenshot of image produced with `--palette=ntsc --lookahead=8`. Not only is the background colour a much better match, the image shading and detail is markedly improved.
![Nymphaea](examples/nymphaea-iipix-ntsc-openemulator.png)
Rendering the same .dhr image with 4-pixel colour shows the reason for the difference. For example the background shading is due to pixel sequences that appear (with this simpler and less hardware-accurate rendering scheme) as sequences of grey and dark green, with a lot of blue and red sprinkled in. In NTSC these pixel sequences combine to produce various shades of green.
![Nymphaea](examples/nymphaea-iipix-ntsc-preview-openemulator.png)
# Dithering and Double Hi-Res
[Dithering](https://en.wikipedia.org/wiki/Dither) an image to produce an approximation with fewer image colours is a well-known technique. The basic idea is to pick a "best colour match" for a pixel from our limited palette, then to compute the difference between the true and selected colour values and diffuse this error to nearby pixels (using some pattern).
In the particular case of DHGR this algorithm runs into difficulties, because each pixel only has two possible colour choices (from a total of 16+). If we only consider the two possibilities for the immediate next pixel then neither may be a particularly good match. However it may be more beneficial to make a suboptimal choice now (deliberately introduce more error), if it allows us access to a better colour for a subsequent pixel. "Classical" dithering algorithms do not account for these palette constraints, and produce suboptimal image quality for DHGR conversions.
We can deal with this by looking ahead N pixels (8 by default) for each image position (x,y), and computing the effect of choosing all 2^N combinations of these N-pixel states on the dithered source image.
Specifically, for a fixed choice of one of these N pixel sequences, we tentatively perform the error diffusion as normal on a copy of the image, and compute the total mean squared distance from the (fixed) N-pixel sequence to the error-diffused source image. To compute the perceptual difference between colours we convert to the perceptually uniform [CAM16-UCS](https://en.wikipedia.org/wiki/Color_appearance_model#CAM16) colour space in which perceptual distance is Euclidean.
Finally, we pick the N-pixel sequence with the lowest total error, and select the first pixel of this N-pixel sequence for position (x,y). We then perform error diffusion as usual for this single pixel, and proceed to x+1.
This allows us to "look beyond" local minima to find cases where it is better to make a suboptimal choice now to allow better overall image quality in subsequent pixels. Since we will sometimes find that our choice of 2 next-pixel colours actually includes (or comes close to) the "ideal" choice, this means we can take maximal advantage of the 560-pixel horizontal resolution.
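A simplified, greyscale sketch of this search is shown below. The real implementation evaluates CAM16-UCS colour distances in optimized Cython; all names here are illustrative assumptions, not ][-pix's actual API:

```python
import itertools

def best_first_pixel(row, x, reachable_colour, diffuse, lookahead=8):
    """Evaluate all 2^lookahead on/off sequences for pixels x..x+lookahead-1 and
    return the first bit of the sequence with the lowest total squared error.

    reachable_colour(scratch, x, bit) gives the colour produced by turning pixel
    x on or off given the pixels already fixed to its left; diffuse(scratch, x,
    err) tentatively error-diffuses within the scratch copy. Assumes
    x + lookahead <= len(row).
    """
    best_error, best_first = None, 0
    for bits in itertools.product((0, 1), repeat=lookahead):
        scratch = list(row)  # dither a scratch copy, never the real image
        total = 0.0
        for dx, bit in enumerate(bits):
            colour = reachable_colour(scratch, x + dx, bit)
            err = scratch[x + dx] - colour
            total += err * err
            diffuse(scratch, x + dx, err)
        if best_error is None or total < best_error:
            best_error, best_first = total, bits[0]
    return best_first  # commit only this pixel's value, then repeat at x+1
```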
## Gamma correction
Most digital images are encoded using the [sRGB colour space](https://en.wikipedia.org/wiki/SRGB), which means that the stored RGB values do not map linearly onto the rendered colour intensities. In order to work with linearized RGB values the source image needs to be gamma corrected. Otherwise, the process of dithering an un-gamma-corrected image tends to result in an output that does not match the brightness of the input. In particular shadows and highlights tend to get blown out/over-exposed.
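For reference, here is a minimal sketch of the standard sRGB-to-linear conversion (][-pix exposes a tunable exponent via `--gamma-correct`; this shows the textbook piecewise curve):

```python
import numpy as np

def srgb_to_linear(rgb: np.ndarray) -> np.ndarray:
    """Convert sRGB values in [0, 1] to linear light intensities in [0, 1]."""
    return np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
```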
## Dither pattern
The process of (error-diffusion) dithering involves distributing the "quantization error" (mismatch between the colour of the source image and chosen output pixels) across neighbouring pixels, according to some pattern. [Floyd-Steinberg](https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering) and [Jarvis-Judice-Ninke](https://en.wikipedia.org/wiki/Error_diffusion#minimized_average_error) ("Jarvis") are two common patterns, though there are many others, which have slightly different characteristics.
Since it uses a small dither pattern, Floyd-Steinberg dithering retains more of the image detail than larger kernels. On the other hand, it sometimes produces image artifacts that are highly structured (e.g. runs of a single colour, checkerboard patterns). This seems to be especially common with 4-pixel colours.
In part this may be because these "classical" dither patterns only propagate errors to a small number of neighbouring pixels, e.g. 1 pixel in the forward direction for Floyd-Steinberg, and 2 pixels for Jarvis. However for double hi-res colours we know that it might take up to 4 pixels before a given colour can be selected for output (e.g. to alternate between black and white, or any other pairs that are 4 steps away on the transition chart above).
In other words, given the results of error diffusion from our current pixel, there is one colour from our palette of 16 that best matches it - but it might only be possible to render this particular colour up to 4 pixels further on. If we only diffuse the errors by 1 or 2 pixels each time, the error will tend to have diffused away by the time we reach that position, and the opportunity will be lost. Combined with the small overall set of available colours this can result in image artifacts.
Modifying the Jarvis dither pattern to extend 4 pixels in the forward direction seems to give much better results for such images (e.g. when dithering large blocks of colour), although at the cost of reduced detail. This is presumably because we allow each quantization error to diffuse to each of the 4 subsequent pixels that might be best-placed to act on it.
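For illustration, error-diffusion kernels can be expressed as (dy, dx, weight) offsets relative to the current pixel, as in the sketch below. The Floyd-Steinberg weights are the standard ones; `JARVIS_MOD` is only a guess at the shape of the modification described above (a Jarvis-style kernel extended to 4 forward pixels), not the exact weights ][-pix ships:

```python
# (dy, dx, weight) offsets relative to the current pixel; weights sum to 1.
FLOYD = [
    (0, 1, 7 / 16),
    (1, -1, 3 / 16), (1, 0, 5 / 16), (1, 1, 1 / 16),
]

# Hypothetical Jarvis-style kernel extended to 4 forward pixels (illustrative
# weights only, normalized to sum to 1).
JARVIS_MOD = [
    (0, 1, 8 / 34), (0, 2, 6 / 34), (0, 3, 4 / 34), (0, 4, 2 / 34),
    (1, -1, 3 / 34), (1, 0, 5 / 34), (1, 1, 3 / 34),
    (2, 0, 3 / 34),
]
```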
The bottom line is that the choice of `--dither` argument is a tradeoff between image detail and handling of colour. If the default `--dither=floyd` algorithm does not give pleasing results, try other patterns such as `--dither=jarvis-mod`.
Further experimentation with other dithering patterns (and similar modifications to the above) may also produce interesting results.
## Palettes
Since Apple II graphics (prior to the //gs) are not based on RGB colour, we have to choose an (approximate) RGB colour palette when dithering an RGB image. There is no "true" choice for this palette, since it depends heavily on how the image is viewed:
1. Different emulators have made (often quite different) choices for the RGB colour palettes used to emulate Apple II graphics on an RGB display. This means that an image that looks good on one emulator may not look good on another (or on real hardware).
- For example, Virtual II (and the Apple //gs) uses two different RGB shades of grey for the two DHGR grey colours, whereas they are rendered identically in NTSC. That means that images not targeted for the Virtual II palette will look quite different when viewed there (and vice versa).
2. The actual display colours rendered by an Apple II are not fixed, but bleed into each other due to the behaviour of the (analogue) NTSC video signal. i.e. the entire notion of a "16-colour RGB palette" is a flawed one. Furthermore, the NTSC colours depend on the particular monitor/TV and its tuning (brightness/contrast/hue settings etc). "Never Twice the Same Colour" indeed. The "4-pixel colour" model described above where we can assign 2 from 16 fixed colours to each of 560 discrete pixels is only an approximation (though a useful one in practice).
Some emulators emulate the NTSC video signal more faithfully (e.g. OpenEmulator), in which case they do not have a true "RGB palette". The best we can do here is measure the colours that are produced by large blocks of colour, i.e. where there is no colour blending. Others use some discrete approximation (e.g. Virtual II seems to exactly match the colour model described above), so a fixed palette can be reconstructed.
To compute the emulator palettes used by ][-pix I measured the sRGB colour values produced by a full-screen Apple II colour image (using the colour picker tool of Mac OS X), using default emulator settings. I have not yet attempted to measure/estimate palettes of other emulators, or "real hardware".
Existing conversion tools (see below) tend to support a variety of RGB palette values sourced from various places (older tools, emulators, theoretical estimations etc). In practice, these only matter in a few ways:
1. If you are trying to target colour balance as accurately as possible for a particular viewing target (e.g. emulator), i.e. so that the rendered colour output looks as close as possible to the source image.
2. If you are targeting an emulator that has a "non-standard" colour model, e.g. Virtual II with its two distinct shades of grey.
3. Otherwise, choices of palette effectively amount to changing the colour balance of the source image. Some of these might produce better image quality for a particular image (e.g. if the source image contains large colour blocks that are difficult to approximate with a particular target palette), at the cost of changing the colour balance. i.e. it might look good on its own but not match the source image. You could also achieve similar results by tweaking the colour balance of the source image in an editor, e.g. GIMP or Photoshop.
## Precomputing the colour conversion matrix
The mapping from RGB colour space to CAM16-UCS is quite complex, so to avoid this runtime cost we precompute a matrix from all 256^3 integer RGB values to corresponding CAM16-UCS values. This 192MB matrix is generated by the `precompute_conversion.py` utility, and is loaded at runtime for efficient access.
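A hedged sketch of this precomputation follows (the general approach of `precompute_conversion.py`, though the script's details and output filename are assumptions here):

```python
import colour  # colour-science
import numpy as np

all_rgb24 = np.arange(2 ** 24, dtype=np.uint32)
# Unpack 0xRRGGBB integers into normalized (r, g, b) triples.
rgb = np.stack(
    [(all_rgb24 >> 16) & 0xFF, (all_rgb24 >> 8) & 0xFF, all_rgb24 & 0xFF],
    axis=-1).astype(np.float32) / 255
# 2^24 entries x 3 float32 components = 192MiB, matching the size quoted above.
# (In practice the conversion may need to be chunked to bound peak memory.)
rgb_to_cam16ucs = colour.convert(rgb, "sRGB", "CAM16UCS").astype(np.float32)
np.save("rgb_to_cam16ucs.npy", rgb_to_cam16ucs)  # output filename is illustrative
```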
# Comparison to other DHGR image converters
## bmp2dhr
* [bmp2dhr](http://www.appleoldies.ca/bmp2dhr/) (see [here](https://github.com/digarok/b2d) for a maintained code fork) supports additional graphics modes not yet supported by ][-pix, namely (double) lo-res, and hi-res. Support for the lores modes would be easy to add to ][-pix, although hi-res requires more work to accommodate the colour model. A similar lookahead strategy will likely work well though.
* supports additional image dither modes
* only supports BMP source images in a particular format.
* DHGR conversions are treated as simple 140x192x16 colour images without colour constraints, ignoring the colour fringing behaviour described above. The generated .bmp preview images also do not show fringing, but it is present when viewing the image on an Apple II or emulator that accounts for it. i.e. the preview images are sometimes not very representative of the actual results. See below for an example.
* Apart from ignoring DHGR colour interactions, the 140px converted images are also at lower than ideal resolution, since they do not make use of the ability to address all 560px independently.
* The perceptual colour distance metric used to match the best colour to an input pixel is a custom metric based on a weighted sum of Euclidean sRGB distance and Rec.601 luma value. It's not explained why this particular metric was chosen, and in practice it seems to often give much lower quality results than modern perceptually uniform colour spaces like CIE2000 or CAM16-UCS (though these are much slower to compute - which is why we precompute the conversion matrix ahead of time)
* It does not perform RGB colour space conversions before dithering, i.e. if the input image is in sRGB colour space (as most digital images will be) then the dithering is also performed in sRGB. Since sRGB is not a linear colour space, the effect of dithering is to distribute errors non-linearly, which distorts the brightness of the resulting image.
## a2bestpix
* Like ][-pix, [a2bestpix](http://lukazi.blogspot.com/2017/03/double-high-resolution-graphics-dhgr.html) only supports DHGR conversion. Overall quality is usually fairly good, although colours and brightness are slightly distorted (for reasons described below), and the generated preview images do not quite give a faithful representation of the native image rendering.
* Like ][-pix, and unlike bmp2dhr, a2bestpix does apply a model of the DHGR colour interactions, albeit an ad-hoc one based on rules and tables of 4-pixel "colour blocks" reconstructed from (AppleWin) emulator behaviour. This does allow it to make use of (closer to) full 560px resolution, although it still treats the screen as a sequence of 140 4-pixel colour blocks (with some constraints on the allowed arrangement of these blocks).
* supports additional (custom) dither modes (partly out of necessity due to the custom "colour block" model)
* Supports a variety of perceptual colour distance metrics including CIE2000 and the one bmp2dhr uses. In practice I'm not sure the others are useful since CIE2000 is the more recent refinement of much research on this topic, and is the most accurate of them.
* like bmp2dhr, only supports BMP source images in a particular format.
* Does not apply gamma correction before dithering (though sRGB conversion is done when computing CIE2000 distance), so errors are diffused non-linearly. The resulting images don't match the brightness of the original, e.g. shadows/highlights tend to be over-exposed.
* image conversion performs an optimization over groups of multiple pixels (via choice of "colour blocks"). From what I can tell this minimizes the total colour distance from a fixed list of colour blocks to a group of 4 target pixels, similar to --lookahead=4 for ][-pix (though I'm not sure it's evaluating all 2^4 pixel combinations). But since the image is (AFAICT) treated as a sequence of (non-overlapping) 4-pixel blocks this does not result in optimizing each output pixel independently.
* The list of "colour blocks" seems to contain colour sequences that cannot actually be rendered on the Apple II. For example compare the spacing of yellow and orange pixels on the parrot between the preview image (LHS) and openemulator (RHS):
![Detail of a2bestpix preview image](docs/a2bestbix-preview-crop.png)
![Detail of openemulator render](docs/a2bestpix-openemulator-crop.png)
* See below for another example where the output has major image discrepancies with the original - perhaps also due to bugs/omissions in the table of colour blocks.
* This means that (like bmp2dhr) the generated "preview" image may not closely match the native image, and the dithering algorithm is also optimizing over a slightly incorrect set of colour sequences, which presumably impacts image quality. Possibly these are transcription errors, or artifacts of the particular emulator (AppleWin) from which they were reconstructed.
## Image comparisons
These three images were converted using the same target (openemulator) palette, using ][-pix, bmp2dhr and a2bestpix (since this is supported by all three), and are shown as screenshots from openemulator.
### Original
![original source image](examples/paperclips-original.png)
(Source: [Purple Sherbet Photography from Worldwide!](https://commons.wikimedia.org/wiki/File:Colourful_assortment_of_paper_clips_(10421946796).jpg), [CC BY 2.0](https://creativecommons.org/licenses/by/2.0), via Wikimedia Commons)
The following images were all generated with a palette approximating OpenEmulator's colours (`--palette=openemulator` for ][-pix)
### ][-pix 4-pixel colour
Preview image and OpenEmulator screenshot
![ii-pix preview](examples/paperclips-iipix-openemulator-preview.png)
![ii-pix screenshot](examples/paperclips-iipix-openemulator-openemulator.png)
### ][-pix NTSC 8-pixel colour (Preview image)
Preview image and OpenEmulator screenshot
![ii-pix preview](examples/paperclips-iipix-ntsc-preview.png)
![ii-pix screenshot](examples/paperclips-iipix-ntsc-openemulator.png)
### bmp2dhr (OpenEmulator screenshot)
![bmp2dhr screenshot](examples/paperclips-bmp2dhr-openemulator.png)
Comparing bmp2dhr under openemulator is the scenario most favourable to it, since the 140px resolution and non-treatment of fringing are masked by the chroma blending. Colours are similar to ][-pix, but the 140px dithering and lack of gamma correction result in less detail, e.g. in highlights/shadows.
### a2bestpix (OpenEmulator screenshot)
![a2bestpix screenshot](examples/paperclips-a2bestpix-openemulator.png)
This a2bestpix image is actually atypical in quality, and shows some severe colour errors relating to the pixels that should be close to the orange/brown colours. These may be due to errors/omissions in the set of "colour blocks". The effects of not gamma-correcting the source image can also be seen.
## NTSC artifacts
The difference in treatment of NTSC artifacts is much more visible when using an emulator that doesn't perform chroma subsampling, e.g. Virtual II, which displays the full 560-pixel colour image without blending.
See [here](examples/shr/gallery.md) for more sample Super Hi-Res image conversions.
### Original
![original source image](examples/groundhog-original.png)
![European rabbit kitten](examples/shr/rabbit-kitten-original.jpg)
(Source: [Cephas](https://commons.wikimedia.org/wiki/File:Marmota_monax_UL_04.jpg), [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons)
The following images were generated with a palette matching the one used by Virtual II (`--palette=virtualii` for ][-pix)
(Source: [Alexis LOURS](https://commons.wikimedia.org/wiki/File:European_rabbit_(Oryctolagus_cuniculus)_kitten.jpg), Licensed under [Creative Commons Attribution 2.0 Generic](https://creativecommons.org/licenses/by/2.0/deed.en), via Wikimedia Commons)
### ][-pix
### ][-pix preview image
![original source image](examples/groundhog-original.png)
![ii-pix preview](examples/groundhog-iipix-virtualii-preview.png)
This image was generated using
### bmp2dhr
```bash
python convert.py shr examples/shr/rabbit-kitten-original.png examples/shr/rabbit-kitten-original.shr
```
![original source image](examples/groundhog-original.png)
![ii-pix screenshot](examples/groundhog-bmp2dhr-virtualii.png)
The image is heavily impacted by colour fringing, which bmp2dhr does not account for at all. The difference in brightness of the groundhog's flank is also because bmp2dhr does not gamma-correct the image, so shadows/highlights tend to get blown out.
### bmp2dhr (OpenEmulator)
![original source image](examples/groundhog-original.png)
![ii-pix screenshot](examples/groundhog-bmp2dhr-openemulator.png)
This bmp2dhr image was generated using a palette approximating OpenEmulator's colours (`--palette=openemulator` for ][-pix), i.e. not the same image files as above.
On OpenEmulator, which simulates NTSC chroma sub-sampling, the fringing is not pronounced but changes the colour balance of the image, e.g. creates a greenish tinge.
### ][-pix, 4-pixel colour (OpenEmulator)
![original source image](examples/groundhog-original.png)
![ii-pix screenshot](examples/groundhog-iipix-openemulator-openemulator.png)
Colour balance here is also slightly distorted due to not fully accounting for chroma blending.
### ][-pix, NTSC 8-pixel colour (OpenEmulator)
![original source image](examples/groundhog-original.png)
![ii-pix screenshot](examples/groundhog-iipix-ntsc-openemulator.png)
Detail and colour balance is much improved.
![European rabbit kitten](examples/shr/rabbit-kitten-iipix.png)
# Future work
* Supporting lo-res and double lo-res graphics modes would be straightforward.
* Supporting lo-res and double lo-res graphics modes, and super hi-res 3200 modes would be straightforward.
* Hi-res will require more care, since the 560-pixel display is not individually dot-addressable. In particular the behaviour of the "palette bit" (which shifts a group of 7 dots to the right by 1) is another optimization constraint. In practice a similar lookahead algorithm should work well though.
* Super hi-res 640 mode would also likely require some investigation, since it is a more highly constrained optimization problem than 320 mode.
* I would like to be able to find an ordered dithering algorithm that works well for Apple II graphics. Ordered dithering specifically avoids diffusing errors arbitrarily across the image, which produces visual noise (and unnecessary deltas) when combined with animation. For example such a thing may work well with my [II-Vision](https://github.com/KrisKennaway/ii-vision) video streamer. However the properties of NTSC artifact colour seem to be in conflict with these requirements, i.e. pixel changes *always* propagate colour to some extent.
# Version history
## v1.0 (2021-03-15)
## v2.2 (2023-02-03)
Initial release
* Added support for HGR colour conversions
## v2.1 (2023-01-21)
* Added support for DHGR mono conversions
* Fixed compatibility with python 3.10
## v2.0 (2022-07-16)
* Added support for Super Hi-Res 320x200 image conversions
## v1.1 (2021-11-05)
@@ -362,4 +217,8 @@ Initial release
* Switch default to --dither=floyd, which seems to produce the best results with --palette=ntsc
* Various internal code simplifications and cleanups
![me](examples/kris-iipix-openemulator.png)
## v1.0 (2021-03-15)
Initial release
![me](examples/dhr/kris-iipix-openemulator.png)

common.pxd (new file, 10 lines)

@@ -0,0 +1,10 @@
cdef float clip(float a, float min_value, float max_value) nogil

# This is used to avoid passing around float[::1] memoryviews in the critical path. These seem to
# require reference counting which has a large performance overhead.
cdef packed struct float3:
    float[3] data

cdef float3 convert_rgb_to_cam16ucs(float[:, ::1] rgb_to_cam16ucs, float r, float g, float b) nogil

cdef float colour_distance_squared(float[3] colour1, float[3] colour2) nogil

common.pyx (new file, 31 lines)

@@ -0,0 +1,31 @@
# cython: infer_types=True
# cython: profile=False
# cython: boundscheck=False
# cython: wraparound=False

cdef inline float clip(float a, float min_value, float max_value) noexcept nogil:
    """Clip a value between min_value and max_value inclusive."""
    return min(max(a, min_value), max_value)

cdef inline float3 convert_rgb_to_cam16ucs(float[:, ::1] rgb_to_cam16ucs, float r, float g, float b) noexcept nogil:
    """Converts a floating point (r,g,b) value to a 3-tuple in CAM16UCS colour space, via 24-bit RGB lookup matrix."""
    cdef unsigned int rgb_24bit = (<unsigned int>(r*255) << 16) + (<unsigned int>(g*255) << 8) + <unsigned int>(b*255)
    cdef float3 res
    cdef int i
    for i in range(3):
        res.data[i] = rgb_to_cam16ucs[rgb_24bit][i]
    return res

cdef inline float colour_distance_squared(float[3] colour1, float[3] colour2) noexcept nogil:
    """Computes Euclidean squared distance between two floating-point colour 3-tuples."""
    return (
        (colour1[0] - colour2[0]) * (colour1[0] - colour2[0]) +
        (colour1[1] - colour2[1]) * (colour1[1] - colour2[1]) +
        (colour1[2] - colour2[2]) * (colour1[2] - colour2[2])
    )

convert.py

@@ -1,14 +1,11 @@
"""Image converter to Apple II Double Hi-Res format."""
import argparse
import os.path
import time
import colour
from PIL import Image
import numpy as np
import dither as dither_pyx
import convert_hgr as convert_hgr_py
import convert_dhr as convert_dhr_py
import convert_shr as convert_shr_py
import dither_pattern
import image as image_py
import palette as palette_py
@ -16,30 +13,43 @@ import screen as screen_py
# TODO:
# - support additional graphics modes (easiest --> hardest):
# - LR/DLR
# - SHR 3200
# - SHR 640
# - HGR
def add_common_args(parser):
parser.add_argument("input", type=str, help="Input image file to process.")
parser.add_argument("output", type=str, help="Output file for converted "
"Apple II image.")
parser.add_argument(
'--show-input', action=argparse.BooleanOptionalAction, default=False,
help="Whether to show the input image before conversion.")
parser.add_argument(
'--show-output', action=argparse.BooleanOptionalAction, default=True,
help="Whether to show the output image after conversion.")
parser.add_argument(
'--save-preview', action=argparse.BooleanOptionalAction, default=True,
help='Whether to save a .PNG rendering of the output image (default: '
'True)'
)
parser.add_argument(
'--verbose', action=argparse.BooleanOptionalAction,
default=False, help="Show progress during conversion")
parser.add_argument(
'--gamma-correct', type=float, default=2.4,
help='Gamma-correct image by this value (default: 2.4)'
)
def add_dhr_hgr_args(parser):
parser.add_argument(
'--dither', type=str, choices=list(dither_pattern.PATTERNS.keys()),
default=dither_pattern.DEFAULT_PATTERN,
help="Error distribution pattern to apply when dithering (default: "
+ dither_pattern.DEFAULT_PATTERN + ")")
parser.add_argument(
'--palette', type=str, choices=list(set(palette_py.PALETTES.keys())),
default=palette_py.DEFAULT_PALETTE,
@ -52,60 +62,110 @@ def main():
'--show-palette', type=str, choices=list(palette_py.PALETTES.keys()),
help="RGB colour palette to use when --show_output (default: "
"value of --palette)")
def validate_lookahead(arg: int) -> int:
try:
int_arg = int(arg)
except Exception:
raise argparse.ArgumentTypeError("--lookahead must be an integer")
if int_arg < 1:
raise argparse.ArgumentTypeError("--lookahead must be at least 1")
return int_arg
def main():
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(required=True)
# Hi-res
hgr_parser = subparsers.add_parser("hgr")
add_common_args(hgr_parser)
add_dhr_hgr_args(hgr_parser)
hgr_parser.add_argument(
'--error_fraction', type=float, default = 0.7,
help="Fraction of quantization error to distribute to neighbouring "
"pixels according to dither pattern"
)
hgr_parser.set_defaults(func=convert_hgr)
# Double Hi-res
dhr_parser = subparsers.add_parser("dhr")
add_common_args(dhr_parser)
add_dhr_hgr_args(dhr_parser)
dhr_parser.add_argument(
"--lookahead", type=validate_lookahead, default=8,
help=("How many pixels to look ahead to compensate for NTSC colour "
"artifacts (default: 8)"))
dhr_parser.set_defaults(func=convert_dhr)
# Double Hi-Res mono
dhr_mono_parser = subparsers.add_parser("dhr_mono")
add_common_args(dhr_mono_parser)
dhr_mono_parser.set_defaults(func=convert_dhr_mono)
# Super Hi-Res 320x200
shr_parser = subparsers.add_parser("shr")
add_common_args(shr_parser)
shr_parser.add_argument(
'--fixed-colours', type=int, default=0,
help='How many colours to fix as identical across all 16 SHR palettes '
'(default: 0)'
)
shr_parser.add_argument(
'--show-final-score', action=argparse.BooleanOptionalAction,
default=False, help='Whether to output the final image quality score '
'(default: False)'
)
shr_parser.add_argument(
'--save-intermediate', action=argparse.BooleanOptionalAction,
default=False, help='Whether to save each intermediate iteration, '
'or just the final image (default: False)'
)
shr_parser.set_defaults(func=convert_shr)
args = parser.parse_args()
args.func(args)
def prepare_image(image_filename: str, show_input: bool, screen,
gamma_correct: float) -> np.ndarray:
# Open and resize source image
image = image_py.open(image_filename)
if show_input:
image_py.resize(image, screen.X_RES, screen.Y_RES * 2,
srgb_output=True).show()
return image_py.resize(image, screen.X_RES, screen.Y_RES,
gamma=gamma_correct)
def convert_hgr(args):
palette = palette_py.PALETTES[args.palette]()
screen = screen_py.HGRNTSCScreen(palette)
image = prepare_image(args.input, args.show_input, screen,
args.gamma_correct)
convert_hgr_py.convert(screen, image, args)
def convert_dhr(args):
palette = palette_py.PALETTES[args.palette]()
screen = screen_py.DHGRNTSCScreen(palette)
image = prepare_image(args.input, args.show_input, screen,
args.gamma_correct)
convert_dhr_py.convert(screen, image, args)
def convert_dhr_mono(args):
screen = screen_py.DHGRScreen()
image = prepare_image(args.input, args.show_input, screen,
args.gamma_correct)
convert_dhr_py.convert_mono(screen, image, args)
def convert_shr(args):
screen = screen_py.SHR320Screen()
image = prepare_image(args.input, args.show_input, screen,
args.gamma_correct)
convert_shr_py.convert(screen, image, args)
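# Example invocations (assuming this module is convert.py; file names are
# illustrative):
#   python convert.py dhr input.png output.dhr --palette=ntsc --lookahead=8
#   python convert.py shr input.png output.shr --fixed-colours=4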
if __name__ == "__main__":
    main()

convert_dhr.py (new file, 70 lines)

@ -0,0 +1,70 @@
import os.path
from PIL import Image
import numpy as np
import dither_dhr as dither_dhr_pyx
import dither_pattern
import palette as palette_py
import screen as screen_py
import image as image_py
def _output(out_image: Image, args):
if args.show_output:
out_image.show()
if args.save_preview:
# Save Double hi-res image
outfile = os.path.join(
os.path.splitext(args.output)[0] + "-preview.png")
out_image.save(outfile, "PNG")
def _write(screen: screen_py.DHGRScreen, bitmap: np.ndarray, args):
screen.pack(bitmap)
with open(args.output, "wb") as f:
f.write(bytes(screen.aux))
f.write(bytes(screen.main))
# TODO: unify with convert_hgr.convert()
def convert(screen: screen_py.DHGRNTSCScreen, image: Image, args):
rgb = np.array(image).astype(np.float32) / 255
# Conversion matrix from RGB to CAM16UCS colour values. Indexed by
# 24-bit RGB value
base_dir = os.path.dirname(__file__)
rgb24_to_cam16ucs = np.load(
os.path.join(base_dir, "data/rgb24_to_cam16ucs.npy"))
dither = dither_pattern.PATTERNS[args.dither]()
bitmap, _ = dither_dhr_pyx.dither_image(
screen, rgb, dither, args.lookahead, args.verbose, rgb24_to_cam16ucs)
# Show output image by rendering in target palette
output_palette_name = args.show_palette or args.palette
output_palette = palette_py.PALETTES[output_palette_name]()
output_screen = screen_py.DHGRNTSCScreen(output_palette)
if output_palette_name == "ntsc":
output_srgb = output_screen.bitmap_to_image_ntsc(bitmap)
else:
output_srgb = image_py.linear_to_srgb(
output_screen.bitmap_to_image_rgb(bitmap)).astype(np.uint8)
out_image = image_py.resize(
Image.fromarray(output_srgb), screen.X_RES, screen.Y_RES * 2,
srgb_output=True)
_output(out_image, args)
_write(screen, bitmap, args)
def convert_mono(screen: screen_py.DHGRScreen, image: Image, args):
image = image.convert("1")
out_image = Image.fromarray((np.array(image) * 255).astype(np.uint8))
out_image = image_py.resize(
out_image, screen.X_RES, screen.Y_RES * 2, srgb_output=True)
_output(out_image, args)
_write(screen, np.array(image).astype(bool), args)

convert_hgr.py (new file, 59 lines)

@ -0,0 +1,59 @@
import os.path
from PIL import Image
import numpy as np
import dither_dhr as dither_dhr_pyx
import dither_pattern
import palette as palette_py
import screen as screen_py
import image as image_py
def _output(out_image: Image, args):
if args.show_output:
out_image.show()
if args.save_preview:
# Save Hi-res image
outfile = os.path.join(
os.path.splitext(args.output)[0] + "-preview.png")
out_image.save(outfile, "PNG")
def _write(screen: screen_py.HGRNTSCScreen, linear_bytemap: np.ndarray, args):
screen.pack_bytes(linear_bytemap)
with open(args.output, "wb") as f:
f.write(bytes(screen.main))
# TODO: unify with convert_dhr.convert()
def convert(screen: screen_py.HGRNTSCScreen, image: Image, args):
rgb = np.array(image).astype(np.float32) / 255
# Conversion matrix from RGB to CAM16UCS colour values. Indexed by
# 24-bit RGB value
base_dir = os.path.dirname(__file__)
rgb24_to_cam16ucs = np.load(
os.path.join(base_dir, "data/rgb24_to_cam16ucs.npy"))
dither = dither_pattern.PATTERNS[args.dither](
error_fraction = args.error_fraction)
bitmap, linear_bytemap = dither_dhr_pyx.dither_image(
screen, rgb, dither, 8, args.verbose, rgb24_to_cam16ucs)
# Show output image by rendering in target palette
output_palette_name = args.show_palette or args.palette
output_palette = palette_py.PALETTES[output_palette_name]()
output_screen = screen_py.HGRNTSCScreen(output_palette)
if output_palette_name == "ntsc":
output_srgb = output_screen.bitmap_to_image_ntsc(bitmap)
else:
output_srgb = image_py.linear_to_srgb(
output_screen.bitmap_to_image_rgb(bitmap)).astype(np.uint8)
out_image = image_py.resize(
Image.fromarray(output_srgb), screen.X_RES, screen.Y_RES * 2,
srgb_output=True)
_output(out_image, args)
_write(screen, linear_bytemap, args)

convert_shr.py (new file, 459 lines)

@ -0,0 +1,459 @@
from collections import defaultdict
import os.path
import random
from typing import Tuple
from PIL import Image
import colour
import numpy as np
from sklearn import cluster
from os import environ
environ['PYGAME_HIDE_SUPPORT_PROMPT'] = '1'
import pygame
import dither_shr as dither_shr_pyx
import image as image_py
class ClusterPalette:
def __init__(
self, image: np.ndarray, rgb12_iigs_to_cam16ucs, rgb24_to_cam16ucs,
fixed_colours=0):
# Conversion matrix from 12-bit //gs RGB colour space to CAM16UCS
# colour space
self._rgb12_iigs_to_cam16ucs = rgb12_iigs_to_cam16ucs
# Conversion matrix from 24-bit linear RGB colour space to CAM16UCS
# colour space
self._rgb24_to_cam16ucs = rgb24_to_cam16ucs
# Preprocessed source image in 24-bit linear RGB colour space. We
# first dither the source image using the full 12-bit //gs RGB colour
# palette, ignoring SHR palette limitations (i.e. 4096 independent
# colours for each pixel). This gives much better results for e.g.
# solid blocks of colour, which would be dithered inconsistently if
# targeting the source image directly.
self._image_rgb = self._perfect_dither(image)
# Preprocessed source image in CAM16UCS colour space
self._colours_cam = self._image_colours_cam(self._image_rgb)
# We fit a 16-colour palette against the entire image which is used
# as starting values for fitting the reserved colours in the 16 SHR
# palettes.
self._global_palette = np.empty((16, 3), dtype=np.uint8)
# How many image colours to fix identically across all 16 SHR
# palettes. These are taken to be the most prevalent colours from
# _global_palette.
self._fixed_colours = fixed_colours
# 16 SHR palettes each of 16 colours, in CAM16UCS colour space
self._palettes_cam = np.empty((16, 16, 3), dtype=np.float32)
# 16 SHR palettes each of 16 colours, in //gs 4-bit RGB colour space
self._palettes_rgb = np.empty((16, 16, 3), dtype=np.uint8)
# defaultdict(list) mapping palette index to the lines that use this
# palette
self._palette_lines = self._init_palette_lines()
@staticmethod
def _image_colours_cam(image: Image):
colours_rgb = np.asarray(image) # .reshape((-1, 3))
with colour.utilities.suppress_warnings(colour_usage_warnings=True):
colours_cam = colour.convert(colours_rgb, "RGB",
"CAM16UCS").astype(np.float32)
return colours_cam
def _init_palette_lines(self, init_random=False):
palette_lines = defaultdict(list)
if init_random:
lines = list(range(200))
random.shuffle(lines)
idx = 0
while lines:
palette_lines[idx].append(lines.pop())
idx += 1
else:
palette_splits = self._equal_palette_splits()
for i, lh in enumerate(palette_splits):
l, h = lh
palette_lines[i].extend(list(range(l, h)))
return palette_lines
@staticmethod
def _equal_palette_splits(palette_height=35):
# The 16 palettes are striped across consecutive (overlapping) line
# ranges. Since nearby lines tend to have similar colours, this has
# the effect of smoothing out the colour transitions across palettes.
# If we want to overlap 16 palettes in 200 lines, where each palette
# has height H and overlaps the previous one by L lines, then the
# boundaries are at lines:
# (0, H), (H-L, 2H-L), (2H-2L, 3H-2L), ..., (15H-15L, 16H - 15L)
# i.e. 16H - 15L = 200, so for a given palette height H we need to
# overlap by:
# L = (16H - 200)/15
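# For the default palette_height=35 this gives L = (16*35 - 200)/15 = 24,
# i.e. each 35-line palette overlaps its neighbour by 24 lines, and palette
# start positions advance by 35 - 24 = 11 lines.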
palette_overlap = (16 * palette_height - 200) / 15
palette_ranges = []
for palette_idx in range(16):
palette_lower = palette_idx * (palette_height - palette_overlap)
palette_upper = palette_lower + palette_height
palette_ranges.append((int(np.round(palette_lower)),
int(np.round(palette_upper))))
return palette_ranges
def _perfect_dither(self, source_image: np.ndarray):
"""Dither a "perfect" image using the full 12-bit //gs RGB colour
palette, ignoring restrictions."""
# Suppress divide by zero warning,
# https://github.com/colour-science/colour/issues/900
with colour.utilities.suppress_warnings(python_warnings=True):
full_palette_linear_rgb = colour.convert(
self._rgb12_iigs_to_cam16ucs, "CAM16UCS", "RGB").astype(
np.float32)
total_image_error, image_rgb = dither_shr_pyx.dither_shr_perfect(
source_image, self._rgb12_iigs_to_cam16ucs, full_palette_linear_rgb,
self._rgb24_to_cam16ucs)
# print("Perfect image error:", total_image_error)
return image_rgb
def _dither_image(self, palettes_cam):
# Suppress divide by zero warning,
# https://github.com/colour-science/colour/issues/900
with colour.utilities.suppress_warnings(python_warnings=True):
palettes_linear_rgb = colour.convert(
palettes_cam, "CAM16UCS", "RGB").astype(np.float32)
output_4bit, line_to_palette, total_image_error, palette_line_errors = \
dither_shr_pyx.dither_shr(
self._image_rgb, palettes_cam, palettes_linear_rgb,
self._rgb24_to_cam16ucs)
# Update map of palettes to image lines for which the palette was the
# best match
palette_lines = defaultdict(list)
for line, palette in enumerate(line_to_palette):
palette_lines[palette].append(line)
self._palette_lines = palette_lines
self._palette_line_errors = palette_line_errors
return (output_4bit, line_to_palette, palettes_linear_rgb,
total_image_error)
def iterate(self, max_inner_iterations: int,
max_outer_iterations: int):
total_image_error = 1e9
outer_iterations_since_improvement = 0
while outer_iterations_since_improvement < max_outer_iterations:
inner_iterations_since_improvement = 0
self._palette_lines = self._init_palette_lines()
while inner_iterations_since_improvement < max_inner_iterations:
# print("Iterations %d" % inner_iterations_since_improvement)
new_palettes_cam, new_palettes_rgb12_iigs = (
self._fit_shr_palettes())
# Recompute image with proposed palettes and check whether it
# has lower total image error than our previous best.
(output_4bit, line_to_palette, palettes_linear_rgb,
new_total_image_error) = self._dither_image(new_palettes_cam)
self._reassign_unused_palettes(
line_to_palette, new_palettes_rgb12_iigs)
if new_total_image_error >= total_image_error:
inner_iterations_since_improvement += 1
continue
# We found a globally better set of palettes, so restart the
# clocks
inner_iterations_since_improvement = 0
outer_iterations_since_improvement = -1
total_image_error = new_total_image_error
self._palettes_cam = new_palettes_cam
self._palettes_rgb = new_palettes_rgb12_iigs
yield (new_total_image_error, output_4bit, line_to_palette,
new_palettes_rgb12_iigs, palettes_linear_rgb)
outer_iterations_since_improvement += 1
def _fit_shr_palettes(self) -> Tuple[np.ndarray, np.ndarray]:
"""Attempt to find new palettes that locally improve image quality.
Re-fit a set of 16 palettes from (overlapping) line ranges of the
source image, using k-means clustering in CAM16-UCS colour space.
We maintain the total image error for the pixels on which the 16
palettes are clustered. A new palette that increases this local
image error is rejected.
New palettes that reduce local error cannot be applied immediately
though, because they may cause an increase in *global* image error
when dithering. i.e. they would reduce the overall image quality.
The current (locally) best palettes are returned and can be applied
using accept_palettes()
XXX update
"""
new_palettes_cam = np.empty_like(self._palettes_cam)
new_palettes_rgb12_iigs = np.empty_like(self._palettes_rgb)
# Compute a new 16-colour global palette for the entire image,
# used as the starting center positions for k-means clustering of the
# individual palettes
self._fit_global_palette()
for palette_idx in range(16):
palette_pixels = (
self._colours_cam[self._palette_lines[
palette_idx], :, :].reshape(-1, 3))
# Fix reserved colours from the global palette.
initial_centroids = np.copy(self._global_palette)
pixels_rgb_iigs = dither_shr_pyx.convert_cam16ucs_to_rgb12_iigs(
palette_pixels)
seen_colours = set()
for i in range(self._fixed_colours):
seen_colours.add(tuple(initial_centroids[i, :]))
# Pick unique random colours from the sample points for the
# remaining initial centroids.
for i in range(self._fixed_colours, 16):
choice = np.random.randint(0, pixels_rgb_iigs.shape[0])
new_colour = pixels_rgb_iigs[choice, :]
if tuple(new_colour) in seen_colours:
continue
seen_colours.add(tuple(new_colour))
initial_centroids[i, :] = new_colour
# If there are any single colours in our source //gs RGB pixels that
# represent more than fixed_colour_fraction_threshold of the total,
# then fix these colours for the palette instead of clustering
# them. This reduces artifacting on blocks of colour.
fixed_colour_fraction_threshold = 0.1
most_frequent_colours = sorted(list(zip(
*np.unique(pixels_rgb_iigs, return_counts=True, axis=0))),
key=lambda kv: kv[1], reverse=True)
fixed_colours = self._fixed_colours
for palette_colour, freq in most_frequent_colours:
if (freq < (palette_pixels.shape[0] *
fixed_colour_fraction_threshold)) or (
fixed_colours == 16):
break
if tuple(palette_colour) not in seen_colours:
seen_colours.add(tuple(palette_colour))
initial_centroids[fixed_colours, :] = palette_colour
fixed_colours += 1
palette_rgb12_iigs = dither_shr_pyx.k_means_with_fixed_centroids(
n_clusters=16, n_fixed=fixed_colours,
samples=palette_pixels,
initial_centroids=initial_centroids,
max_iterations=1000,
rgb12_iigs_to_cam16ucs=self._rgb12_iigs_to_cam16ucs)
# If the k-means clustering returned fewer than 16 unique colours,
# fill out the remainder with the most common pixels colours that
# have not yet been used.
#
# TODO: this seems like an opportunity to do something better -
# e.g. forcibly split clusters and iterate the clustering
palette_rgb12_iigs = self._fill_short_palette(
palette_rgb12_iigs, most_frequent_colours)
for i in range(16):
new_palettes_cam[palette_idx, i, :] = (
np.array(dither_shr_pyx.convert_rgb12_iigs_to_cam(
self._rgb12_iigs_to_cam16ucs, palette_rgb12_iigs[
i]), dtype=np.float32))
new_palettes_rgb12_iigs[palette_idx, :, :] = palette_rgb12_iigs
self._palettes_accepted = False
return new_palettes_cam, new_palettes_rgb12_iigs
def _fit_global_palette(self):
"""Compute a 16-colour palette for the entire image to use as
starting point for the sub-palettes. This should help when the image
has large blocks of colour since the sub-palettes will tend to pick the
same colours."""
clusters = cluster.MiniBatchKMeans(n_clusters=16, max_iter=10000)
clusters.fit_predict(self._colours_cam.reshape(-1, 3))
# Dict of {palette idx : frequency count}
palette_freq = {idx: 0 for idx in range(16)}
for idx, freq in zip(*np.unique(clusters.labels_, return_counts=True)):
palette_freq[idx] = freq
frequency_order = [
k for k, v in sorted(
list(palette_freq.items()), key=lambda kv: kv[1], reverse=True)]
self._global_palette = (
dither_shr_pyx.convert_cam16ucs_to_rgb12_iigs(
clusters.cluster_centers_[frequency_order].astype(
np.float32)))
@staticmethod
def _fill_short_palette(palette_iigs_rgb, most_frequent_colours):
"""Fill out the palette to 16 unique entries."""
# We want to maintain order of insertion so that we respect the
# ordering of fixed colours in the palette. Python doesn't have an
# orderedset but dicts preserve insertion order.
palette_set = {}
for palette_entry in palette_iigs_rgb:
palette_set[tuple(palette_entry)] = True
if len(palette_set) == 16:
return palette_iigs_rgb
# Add most frequent image colours that are not yet in the palette
for palette_colour, freq in most_frequent_colours:
if tuple(palette_colour) in palette_set:
continue
palette_set[tuple(palette_colour)] = True
if len(palette_set) == 16:
break
# We couldn't find any more unique colours, fill out with random ones.
while len(palette_set) < 16:
palette_set[
tuple(np.random.randint(0, 16, size=3, dtype=np.uint8))] = True
return np.array(tuple(palette_set.keys()), dtype=np.uint8)
def _reassign_unused_palettes(self, line_to_palette, palettes_iigs_rgb):
palettes_used = [False] * 16
for palette in line_to_palette:
palettes_used[palette] = True
best_palette_lines = [v for k, v in sorted(list(zip(
self._palette_line_errors, range(200))))]
all_palettes = set()
for palette_idx, palette_iigs_rgb in enumerate(palettes_iigs_rgb):
palette_set = set()
for palette_entry in palette_iigs_rgb:
palette_set.add(tuple(palette_entry))
palette_set = frozenset(palette_set)
if palette_set in all_palettes:
print("Duplicate palette", palette_idx, palette_set)
palettes_used[palette_idx] = False
all_palettes.add(palette_set)  # record palette so later duplicates are detected
for palette_idx, palette_used in enumerate(palettes_used):
if palette_used:
continue
# TODO: also remove from old entry
worst_line = best_palette_lines.pop()
self._palette_lines[palette_idx] = [worst_line]
def convert(screen, image: Image, args):
rgb = np.array(image).astype(np.float32) / 255
# Conversion matrix from RGB to CAM16UCS colour values. Indexed by
# 24-bit RGB value
base_dir = os.path.dirname(__file__)
rgb24_to_cam16ucs = np.load(
os.path.join(base_dir, "data/rgb24_to_cam16ucs.npy"))
rgb12_iigs_to_cam16ucs = np.load(
os.path.join(base_dir, "data/rgb12_iigs_to_cam16ucs.npy"))
# TODO: flags
inner_iterations = 10
outer_iterations = 20
if args.show_output:
pygame.init()
canvas = pygame.display.set_mode((640, 400))
canvas.fill((0, 0, 0))
pygame.display.set_caption("][-Pix image preview")
pygame.event.pump() # Update caption
pygame.display.flip()
total_image_error = None
cluster_palette = ClusterPalette(
rgb, fixed_colours=args.fixed_colours,
rgb12_iigs_to_cam16ucs=rgb12_iigs_to_cam16ucs,
rgb24_to_cam16ucs=rgb24_to_cam16ucs)
output_base, output_ext = os.path.splitext(args.output)
seq = 0
for (
new_total_image_error, output_4bit, line_to_palette,
palettes_rgb12_iigs,
palettes_linear_rgb
) in cluster_palette.iterate(inner_iterations, outer_iterations):
if args.verbose and total_image_error is not None:
print("Improved quality +%f%% (%f)" % (
(1 - new_total_image_error / total_image_error) * 100,
new_total_image_error))
total_image_error = new_total_image_error
for i in range(16):
screen.set_palette(i, palettes_rgb12_iigs[i, :, :])
# Recompute current screen RGB image
screen.set_pixels(output_4bit)
output_rgb = np.empty((200, 320, 3), dtype=np.uint8)
for i in range(200):
screen.line_palette[i] = line_to_palette[i]
output_rgb[i, :, :] = (
palettes_linear_rgb[line_to_palette[i]][
output_4bit[i, :]] * 255
).astype(np.uint8)
output_srgb = (image_py.linear_to_srgb(output_rgb)).astype(np.uint8)
out_image = image_py.resize(
Image.fromarray(output_srgb), screen.X_RES * 2, screen.Y_RES * 2,
srgb_output=True)
if args.show_output:
surface = pygame.surfarray.make_surface(
np.asarray(out_image).transpose((1, 0, 2))) # flip y/x axes
canvas.blit(surface, (0, 0))
pygame.display.set_caption("][-Pix image preview [Iteration %d]"
% seq)
pygame.event.pump() # Update caption
pygame.display.flip()
unique_colours = np.unique(
palettes_rgb12_iigs.reshape(-1, 3), axis=0).shape[0]
if args.verbose:
print("%d unique colours" % unique_colours)
if args.save_preview:
# Save super hi-res image
if args.save_intermediate:
outfile = "%s-%d-preview.png" % (output_base, seq)
else:
outfile = "%s-preview.png" % output_base
out_image.save(outfile, "PNG")
screen.pack()
if args.save_intermediate:
outfile = "%s-%d%s" % (output_base, seq, output_ext)
else:
outfile = "%s%s" % (output_base, output_ext)
with open(outfile, "wb") as f:
f.write(bytes(screen.memory))
seq += 1
if args.show_final_score:
print("FINAL_SCORE:", total_image_error)

dither_dhr.pyx

@ -1,10 +1,16 @@
# cython: infer_types=True
# cython: profile=False
# cython: boundscheck=False
# cython: wraparound=False
cimport cython
import numpy as np
from libc.stdlib cimport malloc, free
cimport common
import screen as screen_py
# TODO: use a cdef class
# C representation of dither_pattern.DitherPattern data, for efficient access.
@ -17,26 +23,22 @@ cdef struct Dither:
int y_origin
# Compute left-hand bounding box for dithering at horizontal position x.
cdef inline int dither_bounds_xl(Dither *dither, int x) nogil:
cdef int el = max(dither.x_origin - x, 0)
cdef int xl = x - dither.x_origin + el
return xl
# Compute right-hand bounding box for dithering at horizontal position x.
cdef inline int dither_bounds_xr(Dither *dither, int x_res, int x) nogil:
cdef int er = min(dither.x_shape, x_res - x)
cdef int xr = x - dither.x_origin + er
return xr
# Compute upper bounding box for dithering at vertical position y.
cdef inline int dither_bounds_yt(Dither *dither, int y) nogil:
cdef int et = max(dither.y_origin - y, 0)
cdef int yt = y - dither.y_origin + et
@ -44,7 +46,7 @@ cdef int dither_bounds_yt(Dither *dither, int y) nogil:
# Compute lower bounding box for dithering at vertical position y.
cdef inline int dither_bounds_yb(Dither *dither, int y_res, int y) nogil:
cdef int eb = min(dither.y_shape, y_res - y)
cdef int yb = y - dither.y_origin + eb
return yb
@ -75,6 +77,42 @@ cdef inline unsigned char shift_pixel_window(
return ((last_pixels >> shift_right_by) | shifted_next_pixels) & window_mask
# Given a byte to store on the hi-res screen, compute the sequence of 560-resolution pixels that will be displayed.
# Hi-res graphics works like this:
# - Each of the low 7 bits in screen_byte results in enabling or disabling two sequential 560-resolution pixels.
# - pixel screen order is from LSB to MSB
# - if bit 8 (the "palette bit") is set then the 14-pixel sequence is shifted one position to the right, and the
# left-most pixel is filled in by duplicating the right-most pixel controlled by the previous screen byte (i.e. bit 7)
# - this gives a 15 or 14 pixel sequence depending on whether or not the palette bit is set.
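# Example: screen_byte=0x81 (bit 0 and the palette bit set) with last_pixels=0x80
# produces 0b111: the doubled bit-0 dots shift right one position and the previous
# byte's final dot (bit 7 of last_pixels) fills the vacated left position.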
cdef unsigned int compute_fat_pixels(unsigned int screen_byte, unsigned char last_pixels) nogil:
cdef int i, bit, fat_bit
cdef unsigned int result = 0
for i in range(7):
bit = (screen_byte >> i) & 0b1
fat_bit = bit << 1 | bit
result |= (fat_bit) << (2 * i)
if screen_byte & 0x80:
# Palette bit shifts to the right
result <<= 1
result |= (last_pixels >> 7)
return result
# Context parametrizes the differences between DHGR and HGR image optimization
cdef struct Context:
# How many bit positions to lookahead when optimizing
unsigned char bit_lookahead
# How many screen pixels produced by bit_lookahead. This is 1:1 for DHGR but for HGR 8 bits in memory produce
# 14 or 15 screen pixels (see compute_fat_pixels above)
unsigned char pixel_lookahead
# HGR has a NTSC phase shift relative to DHGR which rotates the effective mappings from screen pixels to colours
unsigned char phase_shift
# Non-zero for HGR optimization
unsigned char is_hgr
# Look ahead a number of pixels and compute choice for next pixel with lowest total squared error after dithering.
#
# Args:
@ -90,25 +128,26 @@ cdef inline unsigned char shift_pixel_window(
#
# Returns: index from 0 .. 2**lookahead into options_nbit representing best available choice for position (x,y)
#
@cython.cdivision(True)
cdef int dither_lookahead(Dither* dither, unsigned char palette_depth, float[:, :, ::1] palette_cam16,
float[:, :, ::1] palette_rgb, float[:, :, ::1] image_rgb, int x, int y, unsigned char last_pixels,
int x_res, float[:,::1] rgb_to_cam16ucs, Context context) nogil:
cdef int candidate, next_pixels, i, j
cdef float[3] quant_error
cdef int best
cdef float best_error = 2**31-1
cdef float total_error
cdef unsigned char current_pixels
cdef int phase
cdef common.float3 lah_cam16ucs
cdef float[3] cam
# Don't bother dithering past the lookahead horizon or edge of screen.
cdef int xxr = min(x + context.pixel_lookahead, x_res)
cdef int lah_shape1 = xxr - x
cdef int lah_shape2 = 3
# TODO: try again with memoryview - does it actually have overhead here?
cdef float *lah_image_rgb = <float *> malloc(lah_shape1 * lah_shape2 * sizeof(float))
# For each 2**lookahead possibilities for the on/off state of the next lookahead pixels, apply error diffusion
@ -116,34 +155,45 @@ cdef int dither_lookahead(Dither* dither, float[:, :, ::1] palette_cam16, float[
# given pixel (dependent on the state already chosen for pixels to the left), we need to look beyond local minima.
# i.e. it might be better to make a sub-optimal choice for this pixel if it allows access to much better pixel
# colours at later positions.
for candidate in range(1 << context.bit_lookahead):
# Working copy of input pixels
for i in range(xxr - x):
for j in range(3):
lah_image_rgb[i * lah_shape2 + j] = image_rgb[y, x+i, j]
total_error = 0
if context.is_hgr:
# A HGR screen byte controls 14 or 15 screen pixels
next_pixels = compute_fat_pixels(candidate, last_pixels)
else:
# DHGR pixels are 1:1 with memory bits
next_pixels = candidate
# Apply dithering to lookahead horizon or edge of screen
for i in range(xxr - x):
xl = dither_bounds_xl(dither, i)
xr = dither_bounds_xr(dither, xxr - x, i)
phase = (x + i + context.phase_shift) % 4
current_pixels = shift_pixel_window(
last_pixels, next_pixels=next_pixels, shift_right_by=i+1, window_width=palette_depth)
# We don't update the input at position x (since we've already chosen fixed outputs), but we do propagate
# quantization errors to positions >x so we can compensate for how good/bad these choices were. i.e. the
# current_pixels choices are fixed, but we can still distribute quantization error from having made these
# choices, in order to compute the total error.
for j in range(3):
quant_error[j] = lah_image_rgb[i * lah_shape2 + j] - palette_rgb[current_pixels, phase, j]
apply_one_line(dither, xl, xr, i, lah_image_rgb, lah_shape2, quant_error)
# Accumulate error distance from pixel colour to target colour in CAM16UCS colour space
lah_cam16ucs = common.convert_rgb_to_cam16ucs(
rgb_to_cam16ucs, lah_image_rgb[i*lah_shape2], lah_image_rgb[i*lah_shape2+1],
lah_image_rgb[i*lah_shape2+2])
for j in range(3):
cam[j] = palette_cam16[current_pixels, phase, j]
total_error += common.colour_distance_squared(lah_cam16ucs.data, cam)
if total_error >= best_error:
# No need to continue
@ -151,25 +201,12 @@ cdef int dither_lookahead(Dither* dither, float[:, :, ::1] palette_cam16, float[
if total_error < best_error:
best_error = total_error
best = candidate
free(lah_image_rgb)
return best
# Perform error diffusion to a single image row.
#
# Args:
@ -181,8 +218,8 @@ cdef inline float colour_distance_squared(float[::1] colour1, float[::1] colour2
# image_shape1: horizontal dimension of image
# quant_error: RGB quantization error to be diffused
#
cdef inline void apply_one_line(Dither* dither, int xl, int xr, int x, float[] image, int image_shape1,
float[] quant_error) noexcept nogil:
cdef int i, j
cdef float error_fraction
@ -190,7 +227,7 @@ cdef void apply_one_line(Dither* dither, int xl, int xr, int x, float[] image, i
for i in range(xl, xr):
error_fraction = dither.pattern[i - x + dither.x_origin]
for j in range(3):
image[i * image_shape1 + j] = common.clip(image[i * image_shape1 + j] + error_fraction * quant_error[j], 0, 1)
# Perform error diffusion across multiple image rows.
@ -204,9 +241,7 @@ cdef void apply_one_line(Dither* dither, int xl, int xr, int x, float[] image, i
# image: RGB pixel data, to be mutated
# quant_error: RGB quantization error to be diffused
#
cdef void apply(Dither* dither, int x_res, int y_res, int x, int y, float[:,:,::1] image, float[] quant_error) noexcept nogil:
cdef int i, j, k
@ -220,11 +255,9 @@ cdef void apply(Dither* dither, int x_res, int y_res, int x, int y, float[:,:,::
for j in range(xl, xr):
error_fraction = dither.pattern[(i - y) * dither.x_shape + j - x + dither.x_origin]
for k in range(3):
image[i,j,k] = common.clip(image[i,j,k] + error_fraction * quant_error[k], 0, 1)
cdef image_nbit_to_bitmap(
(unsigned char)[:, ::1] image_nbit, unsigned int x_res, unsigned int y_res, unsigned char palette_depth):
cdef unsigned int x, y
@ -247,16 +280,14 @@ cdef image_nbit_to_bitmap(
#
# Returns: tuple of n-bit output image array and RGB output image array
#
@cython.cdivision(True)
def dither_image(
screen, float[:, :, ::1] image_rgb, dither, int lookahead, unsigned char verbose, float[:, ::1] rgb_to_cam16ucs):
cdef int y, x
cdef unsigned char i, j, pixels_nbit, phase
# cdef float[3] input_pixel_rgb
cdef float[3] quant_error
cdef unsigned char output_pixel_nbit
cdef unsigned int next_pixels
cdef float[3] output_pixel_rgb
# Hoist some python attribute accesses into C variables for efficient access during the main loop
@ -294,22 +325,52 @@ def dither_image(
# dot positions are used to determine the colour of a given pixel.
cdef (unsigned char)[:, ::1] image_nbit = np.empty((image_rgb.shape[0], image_rgb.shape[1]), dtype=np.uint8)
cdef Context context
if screen.MODE == screen_py.Mode.HI_RES:
context.is_hgr = 1
context.bit_lookahead = 8
context.pixel_lookahead = 15
# HGR and DHGR have a timing phase shift which rotates the effective mappings from screen dots to colours
context.phase_shift = 3
else:
context.is_hgr = 0
context.bit_lookahead = lookahead
context.pixel_lookahead = lookahead
context.phase_shift = 0
cdef (unsigned char)[:, ::1] linear_bytemap = np.zeros((192, 40), dtype=np.uint8)
# After performing lookahead, move ahead this many pixels at once.
cdef int apply_batch_size
if context.is_hgr:
# For HGR we have to apply an entire screen byte at a time, which controls 14 or 15 pixels (see
# compute_fat_pixels above). This is because the high bit shifts this entire group of 14 pixels at once,
# so we have to make a single decision about whether or not to enable it.
apply_batch_size = 14
else:
# For DHGR we can choose each pixel state independently, so we get better results if we apply one pixel at
# a time.
apply_batch_size = 1
for y in range(yres):
if verbose:
print("%d/%d" % (y, yres))
output_pixel_nbit = 0
for x in range(xres):
if x % apply_batch_size == 0:
# Compute all possible 2**N choices of n-bit pixel colours for positions x .. x + lookahead
# Apply error diffusion for each of these 2**N choices, and compute which produces the closest match
# to the source image over the succeeding N pixels
next_pixels = dither_lookahead(
&cdither, palette_depth, palette_cam16, palette_rgb, image_rgb, x, y, output_pixel_nbit, xres,
rgb_to_cam16ucs, context)
if context.is_hgr:
linear_bytemap[y, x // 14] = next_pixels
next_pixels = compute_fat_pixels(next_pixels, output_pixel_nbit)
# Apply best choice for next 1 pixel
output_pixel_nbit = shift_pixel_window(
output_pixel_nbit, next_pixels, shift_right_by=x % apply_batch_size + 1, window_width=palette_depth)
# Apply error diffusion from chosen output pixel value
for i in range(3):
output_pixel_rgb[i] = palette_rgb[output_pixel_nbit, x % 4, i]
@ -322,4 +383,4 @@ def dither_image(
image_rgb[y, x, i] = output_pixel_rgb[i]
free(cdither.pattern)
return image_nbit_to_bitmap(image_nbit, xres, yres, palette_depth), linear_bytemap

dither_pattern.py

@ -7,11 +7,14 @@ class DitherPattern:
PATTERN = None
ORIGIN = None
def __init__(self, error_fraction=1.0):
self.PATTERN *= error_fraction
class NoDither(DitherPattern):
"""No dithering."""
PATTERN = np.array(((0, 0), (0, 0)),
dtype=np.float32).reshape(2, 2) / np.float32(16)
ORIGIN = (0, 1)
@ -20,7 +23,7 @@ class FloydSteinbergDither(DitherPattern):
# 0 * 7
# 3 5 1
PATTERN = np.array(((0, 0, 7), (3, 5, 1)),
dtype=np.float32).reshape(2, 3) / np.float32(16)
ORIGIN = (0, 1)
@ -31,7 +34,7 @@ class FloydSteinbergDither2(DitherPattern):
PATTERN = np.array(
((0, 0, 0, 0, 0, 7),
(3, 5, 1, 0, 0, 0)),
dtype=np.float32).reshape(2, 6) / np.float32(16)
ORIGIN = (0, 2)
@ -84,7 +87,7 @@ PATTERNS = {
'buckels': BuckelsDither,
'jarvis': JarvisDither,
'jarvis-mod': JarvisModifiedDither,
'none': NoDither,
}
DEFAULT_PATTERN = 'floyd'

dither_shr.pyx (new file, 431 lines)

@ -0,0 +1,431 @@
# cython: infer_types=True
# cython: profile=False
# cython: boundscheck=False
# cython: wraparound=False
cimport cython
import colour
import numpy as np
cimport common
def dither_shr_perfect(
float[:, :, ::1] input_rgb, float[:, ::1] full_palette_cam, float[:, ::1] full_palette_rgb,
float[:,::1] rgb_to_cam16ucs):
cdef int y, x, idx, best_colour_idx, i, j
cdef double best_distance, distance, total_image_error
cdef float[::1] best_colour_rgb
cdef float quant_error
cdef float[:, ::1] palette_rgb, palette_cam
cdef float[:, :, ::1] working_image = np.copy(input_rgb)
cdef float[:, ::1] line_cam = np.zeros((320, 3), dtype=np.float32)
cdef int palette_size = full_palette_rgb.shape[0]
cdef float decay = 0.5
cdef int floyd_steinberg = 1
cdef common.float3 cam, pixel_cam
total_image_error = 0.0
for y in range(200):
for x in range(320):
cam = common.convert_rgb_to_cam16ucs(
rgb_to_cam16ucs, working_image[y,x,0], working_image[y,x,1], working_image[y,x,2])
for j in range(3):
line_cam[x, j] = cam.data[j]
for x in range(320):
pixel_cam = common.convert_rgb_to_cam16ucs(
rgb_to_cam16ucs, working_image[y, x, 0], working_image[y, x, 1], working_image[y, x, 2])
best_distance = 1e9
best_colour_idx = -1
for idx in range(palette_size):
for j in range(3):
cam.data[j] = full_palette_cam[idx,j]
distance = common.colour_distance_squared(pixel_cam.data, cam.data)
if distance < best_distance:
best_distance = distance
best_colour_idx = idx
best_colour_rgb = full_palette_rgb[best_colour_idx, :]
total_image_error += best_distance
for i in range(3):
quant_error = working_image[y, x, i] - best_colour_rgb[i]
working_image[y, x, i] = best_colour_rgb[i]
if floyd_steinberg:
# Floyd-Steinberg dither
# 0 * 7
# 3 5 1
if x < 319:
working_image[y, x + 1, i] = common.clip(
working_image[y, x + 1, i] + quant_error * (7 / 16), 0, 1)
if y < 199:
if x > 0:
working_image[y + 1, x - 1, i] = common.clip(
working_image[y + 1, x - 1, i] + decay * quant_error * (3 / 16), 0, 1)
working_image[y + 1, x, i] = common.clip(
working_image[y + 1, x, i] + decay * quant_error * (5 / 16), 0, 1)
if x < 319:
working_image[y + 1, x + 1, i] = common.clip(
working_image[y + 1, x + 1, i] + decay * quant_error * (1 / 16), 0, 1)
else:
# Jarvis
# 0 0 X 7 5
# 3 5 7 5 3
# 1 3 5 3 1
if x < 319:
working_image[y, x + 1, i] = common.clip(
working_image[y, x + 1, i] + quant_error * (7 / 48), 0, 1)
if x < 318:
working_image[y, x + 2, i] = common.clip(
working_image[y, x + 2, i] + quant_error * (5 / 48), 0, 1)
if y < 199:
if x > 1:
working_image[y + 1, x - 2, i] = common.clip(
working_image[y + 1, x - 2, i] + decay * quant_error * (3 / 48), 0,
1)
if x > 0:
working_image[y + 1, x - 1, i] = common.clip(
working_image[y + 1, x - 1, i] + decay * quant_error * (5 / 48), 0,
1)
working_image[y + 1, x, i] = common.clip(
working_image[y + 1, x, i] + decay * quant_error * (7 / 48), 0, 1)
if x < 319:
working_image[y + 1, x + 1, i] = common.clip(
working_image[y + 1, x + 1, i] + decay * quant_error * (5 / 48),
0, 1)
if x < 318:
working_image[y + 1, x + 2, i] = common.clip(
working_image[y + 1, x + 2, i] + decay * quant_error * (3 / 48),
0, 1)
if y < 198:
if x > 1:
working_image[y + 2, x - 2, i] = common.clip(
working_image[y + 2, x - 2, i] + decay * decay * quant_error * (1 / 48), 0,
1)
if x > 0:
working_image[y + 2, x - 1, i] = common.clip(
working_image[y + 2, x - 1, i] + decay * decay * quant_error * (3 / 48), 0,
1)
working_image[y + 2, x, i] = common.clip(
working_image[y + 2, x, i] + decay * decay * quant_error * (5 / 48), 0, 1)
if x < 319:
working_image[y + 2, x + 1, i] = common.clip(
working_image[y + 2, x + 1, i] + decay * decay * quant_error * (3 / 48),
0, 1)
if x < 318:
working_image[y + 2, x + 2, i] = common.clip(
working_image[y + 2, x + 2, i] + decay * decay * quant_error * (1 / 48),
0, 1)
return total_image_error, working_image
def dither_shr(
float[:, :, ::1] input_rgb, float[:, :, ::1] palettes_cam, float[:, :, ::1] palettes_rgb,
float[:,::1] rgb_to_cam16ucs):
cdef int y, x, idx, best_colour_idx, best_palette, i, j
cdef double best_distance, distance, total_image_error
cdef float[::1] best_colour_rgb
cdef float quant_error
cdef float[:, ::1] palette_rgb, palette_cam
cdef (unsigned char)[:, ::1] output_4bit = np.zeros((200, 320), dtype=np.uint8)
cdef float[:, :, ::1] working_image = np.copy(input_rgb)
cdef float[:, ::1] line_cam = np.zeros((320, 3), dtype=np.float32)
cdef int[::1] line_to_palette = np.zeros(200, dtype=np.int32)
cdef double[::1] palette_line_errors = np.zeros(200, dtype=np.float64)
cdef PaletteSelection palette_line
cdef float decay = 0.5
cdef int floyd_steinberg = 1
cdef common.float3 pixel_cam, cam
best_palette = -1
total_image_error = 0.0
for y in range(200):
for x in range(320):
pixel_cam = common.convert_rgb_to_cam16ucs(
rgb_to_cam16ucs, working_image[y,x,0], working_image[y,x,1], working_image[y,x,2])
for j in range(3):
line_cam[x, j] = pixel_cam.data[j]
palette_line = best_palette_for_line(line_cam, palettes_cam, best_palette)
best_palette = palette_line.palette_idx
palette_line_errors[y] = palette_line.total_error
palette_rgb = palettes_rgb[best_palette, :, :]
palette_cam = palettes_cam[best_palette, :, :]
line_to_palette[y] = best_palette
for x in range(320):
pixel_cam = common.convert_rgb_to_cam16ucs(
rgb_to_cam16ucs, working_image[y, x, 0], working_image[y, x, 1], working_image[y, x, 2])
best_distance = 1e9
best_colour_idx = -1
for idx in range(16):
for j in range(3):
cam.data[j] = palette_cam[idx, j]
distance = common.colour_distance_squared(pixel_cam.data, cam.data)
if distance < best_distance:
best_distance = distance
best_colour_idx = idx
best_colour_rgb = palette_rgb[best_colour_idx]
output_4bit[y, x] = best_colour_idx
total_image_error += best_distance
for i in range(3):
quant_error = working_image[y, x, i] - best_colour_rgb[i]
working_image[y, x, i] = best_colour_rgb[i]
if floyd_steinberg:
# Floyd-Steinberg dither
# 0 * 7
# 3 5 1
if x < 319:
working_image[y, x + 1, i] = common.clip(
working_image[y, x + 1, i] + quant_error * (7 / 16), 0, 1)
if y < 199:
if x > 0:
working_image[y + 1, x - 1, i] = common.clip(
working_image[y + 1, x - 1, i] + decay * quant_error * (3 / 16), 0, 1)
working_image[y + 1, x, i] = common.clip(
working_image[y + 1, x, i] + decay * quant_error * (5 / 16), 0, 1)
if x < 319:
working_image[y + 1, x + 1, i] = common.clip(
working_image[y + 1, x + 1, i] + decay * quant_error * (1 / 16), 0, 1)
else:
# Jarvis
# 0 0 X 7 5
# 3 5 7 5 3
# 1 3 5 3 1
if x < 319:
working_image[y, x + 1, i] = common.clip(
working_image[y, x + 1, i] + quant_error * (7 / 48), 0, 1)
if x < 318:
working_image[y, x + 2, i] = common.clip(
working_image[y, x + 2, i] + quant_error * (5 / 48), 0, 1)
if y < 199:
if x > 1:
working_image[y + 1, x - 2, i] = common.clip(
working_image[y + 1, x - 2, i] + decay * quant_error * (3 / 48), 0,
1)
if x > 0:
working_image[y + 1, x - 1, i] = common.clip(
working_image[y + 1, x - 1, i] + decay * quant_error * (5 / 48), 0,
1)
working_image[y + 1, x, i] = common.clip(
working_image[y + 1, x, i] + decay * quant_error * (7 / 48), 0, 1)
if x < 319:
working_image[y + 1, x + 1, i] = common.clip(
working_image[y + 1, x + 1, i] + decay * quant_error * (5 / 48),
0, 1)
if x < 318:
working_image[y + 1, x + 2, i] = common.clip(
working_image[y + 1, x + 2, i] + decay * quant_error * (3 / 48),
0, 1)
if y < 198:
if x > 1:
working_image[y + 2, x - 2, i] = common.clip(
working_image[y + 2, x - 2, i] + decay * decay * quant_error * (1 / 48), 0,
1)
if x > 0:
working_image[y + 2, x - 1, i] = common.clip(
working_image[y + 2, x - 1, i] + decay * decay * quant_error * (3 / 48), 0,
1)
working_image[y + 2, x, i] = common.clip(
working_image[y + 2, x, i] + decay * decay * quant_error * (5 / 48), 0, 1)
if x < 319:
working_image[y + 2, x + 1, i] = common.clip(
working_image[y + 2, x + 1, i] + decay * decay * quant_error * (3 / 48),
0, 1)
if x < 318:
working_image[y + 2, x + 2, i] = common.clip(
working_image[y + 2, x + 2, i] + decay * decay * quant_error * (1 / 48),
0, 1)
return (
np.array(output_4bit, dtype=np.uint8), line_to_palette, total_image_error,
np.array(palette_line_errors, dtype=np.float64)
)
cdef struct PaletteSelection:
int palette_idx
double total_error
cdef PaletteSelection best_palette_for_line(
float [:, ::1] line_cam, float[:, :, ::1] palettes_cam, int last_palette_idx) nogil:
cdef int palette_idx, best_palette_idx, palette_entry_idx, pixel_idx
cdef double best_total_dist, total_dist, best_pixel_dist, pixel_dist
cdef float[:, ::1] palette_cam
cdef common.float3 pixel_cam, cam
cdef int j
best_total_dist = 1e9
best_palette_idx = -1
cdef int line_size = line_cam.shape[0]
for palette_idx in range(16):
palette_cam = palettes_cam[palette_idx, :, :]
total_dist = 0
for pixel_idx in range(line_size):
for j in range(3):
pixel_cam.data[j] = line_cam[pixel_idx, j]
best_pixel_dist = 1e9
for palette_entry_idx in range(16):
for j in range(3):
cam.data[j] = palette_cam[palette_entry_idx, j]
pixel_dist = common.colour_distance_squared(pixel_cam.data, cam.data)
if pixel_dist < best_pixel_dist:
best_pixel_dist = pixel_dist
total_dist += best_pixel_dist
if total_dist < best_total_dist:
best_total_dist = total_dist
best_palette_idx = palette_idx
cdef PaletteSelection res
res.palette_idx = best_palette_idx
res.total_error = best_total_dist
return res
cdef common.float3 _convert_rgb12_iigs_to_cam(float [:, ::1] rgb12_iigs_to_cam16ucs, (unsigned char)[::1] point_rgb12) nogil:
cdef int rgb12 = (point_rgb12[0] << 8) | (point_rgb12[1] << 4) | point_rgb12[2]
cdef int i
cdef common.float3 res
for i in range(3):
res.data[i] = rgb12_iigs_to_cam16ucs[rgb12, i]
return res
# Wrapper around _convert_rgb12_iigs_to_cam to allow calling from python while retaining fast path for cython calls.
def convert_rgb12_iigs_to_cam(float [:, ::1] rgb12_iigs_to_cam16ucs, (unsigned char)[::1] point_rgb12) -> float[::1]:
cdef common.float3 cam = _convert_rgb12_iigs_to_cam(rgb12_iigs_to_cam16ucs, point_rgb12)
cdef int i
cdef float[::1] res = np.empty((3), dtype=np.float32)
for i in range(3):
res[i] = cam.data[i]
return res
@cython.cdivision(True)
cdef float[:, ::1] linear_to_srgb_array(float[:, ::1] a, float gamma=2.4):
cdef int i, j
cdef float[:, ::1] res = np.empty_like(a, dtype=np.float32)
for i in range(res.shape[0]):
for j in range(3):
if a[i, j] <= 0.0031308:
res[i, j] = a[i, j] * 12.92
else:
res[i, j] = 1.055 * a[i, j] ** (1.0 / gamma) - 0.055
return res
# TODO: optimize
cdef (unsigned char)[:, ::1] _convert_cam16ucs_to_rgb12_iigs(float[:, ::1] point_cam):
cdef float[:, ::1] rgb
cdef (float)[:, ::1] rgb12_iigs
# Convert CAM16UCS input to RGB. Even though this dynamically constructs a path on the graph of colour conversions
# every time, in practice this seems to have a negligible overhead compared to the actual conversion functions.
with colour.utilities.suppress_warnings(python_warnings=True):
rgb = colour.convert(point_cam, "CAM16UCS", "RGB").astype(np.float32)
# TODO: precompute this conversion matrix since it's static. This accounts for about 10% of the CPU time here.
rgb12_iigs = np.ascontiguousarray(
np.clip(
# Convert to Rec.601 R'G'B'
colour.YCbCr_to_RGB(
# Gamma correct and convert Rec.709 R'G'B' to YCbCr
colour.RGB_to_YCbCr(
linear_to_srgb_array(rgb), K=colour.WEIGHTS_YCBCR['ITU-R BT.709']),
K=colour.WEIGHTS_YCBCR['ITU-R BT.601']), 0, 1)
).astype(np.float32) * 15
return np.round(rgb12_iigs).astype(np.uint8)
# Wrapper around _convert_cam16ucs_to_rgb12_iigs to allow calling from python while retaining fast path for cython
# calls.
def convert_cam16ucs_to_rgb12_iigs(float[:, ::1] point_cam):
return _convert_cam16ucs_to_rgb12_iigs(point_cam)
@cython.cdivision(True)
def k_means_with_fixed_centroids(
int n_clusters, int n_fixed, float[:, ::1] samples, (unsigned char)[:, ::1] initial_centroids, int max_iterations,
float [:, ::1] rgb12_iigs_to_cam16ucs):
cdef double error, best_error, total_error, last_total_error
cdef int centroid_idx, closest_centroid_idx, i, point_idx
cdef (unsigned char)[:, ::1] centroids_rgb12 = np.copy(initial_centroids)
cdef (unsigned char)[:, ::1] new_centroids_rgb12
cdef common.float3 point_cam
cdef float[:, ::1] new_centroids_cam = np.empty((n_clusters - n_fixed, 3), dtype=np.float32)
cdef float[:, ::1] centroid_cam_sample_positions_total
cdef int[::1] centroid_sample_counts
last_total_error = 1e9
for iteration in range(max_iterations):
total_error = 0.0
centroid_cam_sample_positions_total = np.zeros((16, 3), dtype=np.float32)
centroid_sample_counts = np.zeros(16, dtype=np.int32)
# For each sample, associate it to the closest centroid. We want to compute the mean of all associated samples
# but we do this by accumulating the (coordinate vector) total and number of associated samples.
#
# Centroid positions are tracked in 4-bit //gs RGB colour space with distances measured in CAM16UCS colour
# space.
for point_idx in range(samples.shape[0]):
for j in range(3):
point_cam.data[j] = samples[point_idx, j]
best_error = 1e9
closest_centroid_idx = 0
for centroid_idx in range(n_clusters):
error = common.colour_distance_squared(
_convert_rgb12_iigs_to_cam(rgb12_iigs_to_cam16ucs, centroids_rgb12[centroid_idx, :]).data,
point_cam.data)
if error < best_error:
best_error = error
closest_centroid_idx = centroid_idx
for i in range(3):
centroid_cam_sample_positions_total[closest_centroid_idx, i] += point_cam.data[i]
centroid_sample_counts[closest_centroid_idx] += 1
total_error += best_error
# Since the allowed centroid positions are discrete (and not uniformly spaced in CAM16UCS colour space), we
# can't rely on measuring total centroid movement as a termination condition. e.g. sometimes the nearest
# available point to an intended next centroid position will increase the total distance, or centroids may
# oscillate between two neighbouring positions. Instead, we terminate when the total error stops decreasing.
if total_error >= last_total_error:
break
last_total_error = total_error
# Compute new centroid positions in CAM16UCS colour space
for centroid_idx in range(n_fixed, n_clusters):
if centroid_sample_counts[centroid_idx]:
for i in range(3):
new_centroids_cam[centroid_idx - n_fixed, i] = (
centroid_cam_sample_positions_total[centroid_idx, i] / centroid_sample_counts[centroid_idx])
# Convert all new centroids back to //gs RGB colour space (done as a single matrix since
# _convert_cam16ucs_to_rgb12_iigs has nontrivial overhead)
new_centroids_rgb12 = _convert_cam16ucs_to_rgb12_iigs(new_centroids_cam)
# Update positions for non-fixed centroids
for centroid_idx in range(n_clusters - n_fixed):
for i in range(3):
if centroids_rgb12[centroid_idx + n_fixed, i] != new_centroids_rgb12[centroid_idx, i]:
centroids_rgb12[centroid_idx + n_fixed, i] = new_centroids_rgb12[centroid_idx, i]
return centroids_rgb12

docs/dhr.md (new file, 266 lines)

@ -0,0 +1,266 @@
# Double Hi-Res image conversion
## Some background on Apple II Double Hi-Res graphics
Like other (pre-//gs) Apple II graphics modes, Double Hi-Res relies on [NTSC Artifact Colour](https://en.wikipedia.org/wiki/Composite_artifact_colors), which means that the colour of a pixel is entirely determined by its horizontal position on the screen, and the on/off status of preceding horizontal pixels.
In Double Hi-Res mode, the 560 horizontal pixels per line are individually addressable. This is an improvement over the (single) Hi-Res mode, which also has 560 horizontal pixels, but which can only be addressed in groups of two (with an option to shift blocks of 7 pixels each by one dot). See _Assembly Lines: The Complete Book_ (Wagner) for a detailed introduction to this, or _Understanding the Apple IIe_ (Sather) for a deep technical discussion.
Double Hi-Res is usually characterized as being capable of producing 16 display colours, but with heavy restrictions on how these colours can be arranged horizontally.
### Naive model: 140x192x16
One simple model for Double Hi-Res graphics is to only treat the display in groups of 4 horizontal pixels, which gives an effective resolution of 140x192 in 16 colours (=2^4). These 140 pixel colours can be chosen independently, which makes this model easy to think about and to work with (e.g. when creating images by hand). However the resulting images will exhibit (sometimes severe) colour interference/fringing effects when two colours are next to one another, because the underlying hardware does not actually work this way. See below for an example image conversion, showing the unwanted colour fringing that results.
### Simplest realistic model: 560 pixels, 4-pixel colour
A more complete model for thinking about DHGR comes from looking at how the NTSC signal produces colour on the display.
The [NTSC chrominance subcarrier](https://en.wikipedia.org/wiki/Chrominance_subcarrier) completes one full phase cycle in the time taken to draw 4 horizontal dots. The colours produced are due to the interaction of the pixel luminosity (on/off) with this NTSC chroma phase.
What this means is that the colour of each of the 560 horizontal pixels is determined by the current pixel value (on/off), the current X-coordinate modulo 4 (X coordinate relative to NTSC phase), as well as the on-off status of the pixels to the left of it.
The simplest approximation is to only look at the current pixel value and the 3 pixels to the left, i.e. to consider a sliding window of 4 horizontal pixels moving across the screen from left to right. Within this window, we have one pixel for each of the 4 values of NTSC phase (x % 4, ranging from 0 .. 3). The on-off values for these 4 values of NTSC phase determine the colour of the pixel. See [here](https://docs.google.com/presentation/d/1_eqBknG-4-llQw3oAOmPO3FlawUeWCeRPYpr_mh2iRU/edit) for more details.
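As a minimal sketch of this sliding-window model (the rotation convention relative to NTSC phase is an assumption here; emulators differ on the phase origin):

```python
def dot_colour(window, x):
    """Colour index (0-15) of the dot at screen position x.

    window: the 4 on/off dot values at positions x-3 .. x, oldest first.
    Simplified model: the 4-bit window value, rotated to align with the
    NTSC phase (x % 4).  The phase origin is an assumption.
    """
    value = sum(bit << i for i, bit in enumerate(window))
    phase = x % 4
    return ((value << phase) | (value >> (4 - phase))) & 0xF
```

Note that moving one dot to the right shifts one bit out of the window and one bit in, which is why only two colours (current dot on, or off) are reachable at each step.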
This model allows us to understand and predict the interference behaviour when two "140px" colours are next to each other, and to go beyond this "140px" model to take more advantage of the true 560px horizontal resolution.
If we imagine drawing pixels from left to right across the screen, at each pixel we only have *two* accessible choices of colour: those resulting from turning the current pixel on, or off. Which two colours these are is determined by the pixels already drawn to the left (the immediate 3 neighbours, in our model). One of them will always be the same as the colour of the pixel to the left (the on/off choice corresponding to the value that just "fell off the left side" of the sliding window), and the other is some other colour from our palette of 16.
This can be summarized in a chart, showing the possible colour transitions depending on the colour of the pixel to the immediate left, and the value of x%4.
![Double hi-res colour transitions](Double_Hi-Res_colour_transitions.png)
So, if we want to transition from one colour to a particular new colour, it may take up to 4 horizontal pixels before we are able to achieve it (e.g. transitioning all the way from black (0000) to white (1111)). In the meantime we have to transition through up to 2 other colours. Depending on the details of the image we are aiming for, this may either produce unwanted visual noise or actually be beneficial (e.g. if the colour we want is available immediately at the next pixel).
These constraints are difficult to work with when constructing DHGR graphics "by hand", but we can account for them programmatically in our image conversion to take full advantage of the "true" 560px resolution while accounting for colour interference effects.
#### Limitations of this colour model
In practice the above description of the Apple II colour model is still only an approximation. On real hardware, the video signal is a continuous analogue signal, and colour is modulated continuously rather than as discretely-coloured pixels with fixed colour values.
More importantly, in an NTSC video signal the colour (chroma) signal has a lower bandwidth than the luma (brightness) signal ([Chroma sub-sampling](https://en.wikipedia.org/wiki/Chroma_subsampling)), which means that colours will tend to bleed across more than 4 pixels. However our simple "4-pixel chroma bleed" model already produces good results, and exactly matches the implementation behaviour of some emulators, e.g. Virtual II.
### NTSC emulation and 8-pixel colour
By simulating the NTSC (Y'UV) signal directly we are able to recover the Apple II colour output from "first principles". Here are the 16 "basic" DHGR colours, obtained using saturation/hue parameters tuned to match OpenEmulator's NTSC implementation, and allowing chroma to bleed across 4 pixels.
![NTSC colours with 4 pixel chroma bleed](ntsc-colours-chroma-bleed-4.png)
However in real NTSC, chroma bleeds over more than 4 pixels, which means that we actually have more than 2^4 colours available to work with.
This means that **when viewed on a composite colour display, Double Hi-Res graphics is not just a 16-colour graphics mode!**
If we allow the NTSC chroma signal to bleed over 8 pixels instead of 4, then the resulting colour is determined by sequences of 8 pixels instead of 4 pixels, i.e. there are 2^8 = 256 possibilities. In practice many of these result in the same output colour, and (with this approximation) there are only 85 unique colours available. However this is still a marked improvement on the 16 "basic" DHGR colours:
![NTSC colours with 8 pixel chroma bleed](ntsc-colours-chroma-bleed-8.png)
The "extra" DHGR colours are only available on real hardware, or an emulator that implements NTSC chroma sub-sampling (such as OpenEmulator). But the result is that on such targets a much larger range of colours is available for use in image conversion. However the restriction still exists that any given pixel only has a choice of 2 colours available (as determined by the on/off state of pixels to the left).
In practise this gives much better image quality, especially when shading areas of similar colour. The Apple II is still unable to directly modulate the luma (brightness) NTSC signal component, so areas of low or high brightness still tend to be heavily dithered. This is because there are more bit sequences that have the number of '1' bits close to the average than there are at the extremes, so there are correspondingly few available colours that are very bright or very dark.
These 85 unique double hi-res colours produced by the ][-pix NTSC emulation are not the definitive story - though they're closer to it than the usual story that double hi-res is a 16-colour graphics mode. The implementation used by ][-pix is the simplest one: the Y'UV signal is averaged with a sliding window of 4 pixels for the Y' (luma) component and 8 pixels for the UV (chroma) component.
The choice of 8 pixels is not strictly correct - e.g. the chroma bandwidth (~0.6MHz) is much less than half of luma bandwidth (~2Mhz) so the signal bleeds over more than twice as many pixels; but also decays in a more complex way than the simple step function sliding window chosen here. In practise using 8 pixels is a good compromise between ease of implementation, runtime performance and fidelity.
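As an illustrative toy model of this scheme (not ][-pix's actual implementation; the phase origin and boxcar windows are the simplifying assumptions described above), the following counts the distinct colours produced by all 2^8 trailing dot patterns:

```python
import itertools
import math

def yuv(pattern):
    """Demodulate an 8-dot pattern into (Y, U, V) at the final dot.

    Luma Y is a boxcar average over the last 4 dots; chroma U/V are boxcar
    averages over all 8 dots of the signal multiplied by the quadrature
    components of the subcarrier (one cycle per 4 dots).
    """
    y = sum(pattern[-4:]) / 4
    u = sum(p * math.sin(2 * math.pi * i / 4) for i, p in enumerate(pattern)) / 8
    v = sum(p * math.cos(2 * math.pi * i / 4) for i, p in enumerate(pattern)) / 8
    return (round(y, 6), round(u, 6), round(v, 6))

colours = {yuv(bits) for bits in itertools.product((0, 1), repeat=8)}
print(len(colours))  # 85: the unique colour count described above
```

(Narrowing the chroma window to 4 dots in this toy model gives 15 distinct colours: the 16 basic patterns minus the two greys, which NTSC renders identically, as noted below.)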
By contrast, OpenEmulator uses more complex (and realistic) band-pass filtering to produce its colour output, which presumably allows even more possible colours (physical hardware will also produce its own unique results, depending on the hardware implementation of the signal decoding and other physical characteristics). I expect that most of these will be small variations on the above though, and in practice the ][-pix NTSC implementation already produces a close colour match for the OpenEmulator behaviour.
#### Examples of NTSC images
(Source: [Reinhold Möller](https://commons.wikimedia.org/wiki/File:Nymphaea_caerulea-20091014-RM-115245.jpg), [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons)
![Nymphaea](../examples/dhr/nymphaea-original.png)
OpenEmulator screenshot of image produced with `--palette=openemulator --lookahead=8`. The distorted background colour compared to the original is particularly noticeable.
![Nymphaea](../examples/dhr/nymphaea-iipix-openemulator-openemulator.png)
OpenEmulator screenshot of image produced with `--palette=ntsc --lookahead=8`. Not only is the background colour a much better match, the image shading and detail is markedly improved.
![Nymphaea](../examples/dhr/nymphaea-iipix-ntsc-openemulator.png)
Rendering the same .dhr image with 4-pixel colour shows the reason for the difference. For example the background shading is due to pixel sequences that appear (with this simpler and less hardware-accurate rendering scheme) as sequences of grey and dark green, with a lot of blue and red sprinkled in. In NTSC these pixel sequences combine to produce various shades of green.
![Nymphaea](../examples/dhr/nymphaea-iipix-ntsc-preview-openemulator.png)
# Dithering and Double Hi-Res
[Dithering](https://en.wikipedia.org/wiki/Dither) an image to produce an approximation with fewer image colours is a well-known technique. The basic idea is to pick a "best colour match" for a pixel from our limited palette, then to compute the difference between the true and selected colour values and diffuse this error to nearby pixels (using some pattern).
In the particular case of DHGR this algorithm runs into difficulties, because each pixel only has two possible colour choices (from a total of 16+). If we only consider the two possibilities for the immediate next pixel then neither may be a particularly good match. However it may be more beneficial to make a suboptimal choice now (deliberately introduce more error), if it allows us access to a better colour for a subsequent pixel. "Classical" dithering algorithms do not account for these palette constraints, and produce suboptimal image quality for DHGR conversions.
We can deal with this by looking ahead N pixels (8 by default) for each image position (x,y), and computing the effect of choosing all 2^N combinations of these N-pixel states on the dithered source image.
Specifically, for a fixed choice of one of these N pixel sequences, we tentatively perform the error diffusion as normal on a copy of the image, and compute the total mean squared distance from the (fixed) N-pixel sequence to the error-diffused source image. To compute the perceptual difference between colours we convert to the perceptually uniform [CAM16-UCS](https://en.wikipedia.org/wiki/Color_appearance_model#CAM16) colour space in which perceptual distance is Euclidean.
Finally, we pick the N-pixel sequence with the lowest total error, and select the first pixel of this N-pixel sequence for position (x,y). We then perform error diffusion as usual for this single pixel, and proceed to x+1.
This allows us to "look beyond" local minima to find cases where it is better to make a suboptimal choice now to allow better overall image quality in subsequent pixels. Since we will sometimes find that our choice of 2 next-pixel colours actually includes (or comes close to) the "ideal" choice, this means we can take maximal advantage of the 560-pixel horizontal resolution.
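Schematically, the per-pixel search looks something like the following (the helper names are hypothetical; the real implementation is heavily optimized Cython):

```python
import copy

def choose_pixel(image, x, y, n, candidate_sequences, colour_of, diffuse_error):
    """Pick the on/off value for pixel (x, y) by exhaustive n-pixel lookahead.

    image: working image (rows of CAM16UCS triples) with errors diffused so far.
    candidate_sequences(x): the n-bit sequences reachable from the current
        screen state (hypothetical helper).
    colour_of(bits, x): CAM16UCS colour of the dot at x given its trailing bit
        sequence (hypothetical helper).
    diffuse_error(img, x, y, err): apply the dither kernel (hypothetical helper).
    """
    best_bits, best_error = None, float("inf")
    for bits in candidate_sequences(x):       # up to 2**n candidates
        trial = copy.deepcopy(image)          # tentative error diffusion
        total = 0.0
        for i in range(n):
            target = trial[y][x + i]
            output = colour_of(bits[:i + 1], x + i)
            err = [t - o for t, o in zip(target, output)]
            total += sum(e * e for e in err)  # squared CAM16UCS distance
            diffuse_error(trial, x + i, y, err)
        if total < best_error:
            best_bits, best_error = bits, total
    return best_bits[0]  # commit only the first pixel, then move on to x+1
```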
## Gamma correction
Most digital images are encoded using the [sRGB colour space](https://en.wikipedia.org/wiki/SRGB), which means that the stored RGB values do not map linearly onto the rendered colour intensities. In order to work with linearized RGB values the source image needs to be gamma corrected. Otherwise, the process of dithering an un-gamma-corrected image tends to result in an output that does not match the brightness of the input. In particular shadows and highlights tend to get blown out/over-exposed.
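For reference, the standard sRGB transfer functions (the dithering arithmetic should operate on the linearized values, converting back only for display):

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB-encoded values (floats in 0..1) to linear light."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Encode linear-light values (floats in 0..1) back to sRGB."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)
```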
## Dither pattern
The process of (error-diffusion) dithering involves distributing the "quantization error" (mismatch between the colour of the source image and chosen output pixels) across neighbouring pixels, according to some pattern. [Floyd-Steinberg](https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering) and [Jarvis-Judice-Ninke](https://en.wikipedia.org/wiki/Error_diffusion#minimized_average_error) ("Jarvis") are two common patterns, though there are many others, which have slightly different characteristics.
Since it uses a small dither pattern, Floyd-Steinberg dithering retains more of the image detail than larger kernels. On the other hand, it sometimes produces image artifacts that are highly structured (e.g. runs of a single colour, checkerboard patterns). This seems to be especially common with 4-pixel colours.
In part this may be because these "classical" dither patterns only propagate errors to a small number of neighbouring pixels, e.g. 1 pixel in the forward direction for Floyd-Steinberg, and 2 pixels for Jarvis. However for double hi-res colours we know that it might take up to 4 pixels before a given colour can be selected for output (e.g. to alternate between black and white, or any other pair that is 4 steps apart on the transition chart above).
In other words, given the results of error diffusion from our current pixel, there is one colour from our palette of 16 that is the best match for it - but it might only be possible to render this particular colour up to 4 pixels further on. If we only diffuse the errors by 1 or 2 pixels each time, they will tend to have diffused away by the time we reach that position, and the opportunity will be lost. Combined with the small overall set of available colours, this can result in image artifacts.
Modifying the Jarvis dither pattern to extend 4 pixels in the forward direction seems to give much better results for such images (e.g. when dithering large blocks of colour), although at the cost of reduced detail. This is presumably because we allow each quantization error to diffuse to each of the 4 subsequent pixels that might be best-placed to act on it.
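To make the idea concrete, here is the standard Jarvis kernel next to an illustrative kernel of the same family extended to reach 4 pixels forward. The weights in the extended kernel are hypothetical (][-pix's actual `jarvis-mod` coefficients live in its source); the shape is what matters:

```python
import numpy as np

# Standard Jarvis-Judice-Ninke kernel; the current pixel sits at column
# index 4, and errors reach only 2 pixels forward on the current row.
JARVIS = np.array([
    [0, 0, 0, 0, 0, 7, 5, 0, 0],
    [0, 0, 3, 5, 7, 5, 3, 0, 0],
    [0, 0, 1, 3, 5, 3, 1, 0, 0],
]) / 48

# Illustrative "jarvis-mod"-style kernel: the forward reach on the current
# row is extended to 4 pixels, so a quantization error can still influence
# the position (up to 4 dots away) where the desired colour first becomes
# selectable.  These weights are hypothetical, chosen only to normalize.
JARVIS_MOD = np.array([
    [0, 0, 0, 0, 0, 15, 11, 7, 5],
    [0, 0, 3, 5, 7,  5,  3, 0, 0],
    [0, 0, 1, 3, 5,  3,  1, 0, 0],
]) / 74
```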
The bottom line is that the choice of `--dither` argument is a tradeoff between image detail and handling of colour. If the default `--dither=floyd` algorithm does not give pleasing results, try other patterns such as `--dither=jarvis-mod`.
Further experimentation with other dithering patterns (and similar modifications to the above) may also produce interesting results.
## Palettes
Since the Apple II graphics (prior to //gs) are not based on RGB colour, we have to choose an (approximate) RGB colour palette when dithering an RGB image. There is no "true" choice for this palette, since it depends heavily on how the image is viewed:
1. Different emulators have made (often quite different) choices for the RGB colour palettes used to emulate Apple II graphics on an RGB display. This means that an image that looks good on one emulator may not look good on another (or on real hardware).
- For example, Virtual II (and the Apple //gs) uses two different RGB shades of grey for the two DHGR grey colours, whereas they are rendered identically in NTSC. That means that images not targeted for the Virtual II palette will look quite different when viewed there (and vice versa).
2. The actual display colours rendered by an Apple II are not fixed, but bleed into each other due to the behaviour of the (analogue) NTSC video signal, i.e. the entire notion of a "16-colour RGB palette" is a flawed one. Furthermore, the NTSC colours depend on the particular monitor/TV and its tuning (brightness/contrast/hue settings etc). "Never Twice the Same Colour" indeed. The "4-pixel colour" model described above, where we can assign 2 from 16 fixed colours to each of 560 discrete pixels, is only an approximation (though a useful one in practice).
Some emulators emulate the NTSC video signal more faithfully (e.g. OpenEmulator), in which case they do not have a true "RGB palette". The best we can do here is measure the colours that are produced by large blocks of colour, i.e. where there is no colour blending. Others use some discrete approximation (e.g. Virtual II seems to exactly match the colour model described above), so a fixed palette can be reconstructed.
To compute the emulator palettes used by ][-pix I measured the sRGB colour values produced by a full-screen Apple II colour image (using the colour picker tool of Mac OS X), with default emulator settings. I have not yet attempted to measure/estimate the palettes of other emulators, or of "real hardware".
Existing conversion tools (see below) tend to support a variety of RGB palette values sourced from various places (older tools, emulators, theoretical estimations etc). In practice, these only matter in a few ways:
1. If you are trying to target colour balance as accurately as possible for a particular viewing target (e.g. emulator), i.e. so that the rendered colour output looks as close as possible to the source image.
2. If you are targeting an emulator that has a "non-standard" colour model, e.g. Virtual II with its two distinct shades of grey.
3. Otherwise, choices of palette effectively amount to changing the colour balance of the source image. Some of these might produce better image quality for a particular image (e.g. if the source image contains large colour blocks that are difficult to approximate with a particular target palette), at the cost of changing the colour balance, i.e. it might look good on its own but not match the source image. You could also achieve similar results by tweaking the colour balance of the source image in an editor, e.g. GIMP or Photoshop.
## Precomputing distance matrix
The mapping from RGB colour space to CAM16-UCS is quite complex, so to avoid this runtime cost we precompute a matrix from all 256^3 integer RGB values to corresponding CAM16-UCS values. This 192MB matrix is generated by the `precompute_conversion.py` utility, and is loaded at runtime for efficient access.
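Schematically, the precomputation looks like this (with a hypothetical `rgb_to_cam16ucs` function standing in for the actual colour-appearance math, and a hypothetical output path):

```python
import numpy as np

def precompute_matrix(rgb_to_cam16ucs, path="rgb_to_cam16ucs.npy"):
    """Tabulate CAM16-UCS coordinates for every 24-bit RGB value.

    rgb_to_cam16ucs: hypothetical vectorized converter, mapping an (N, 3)
    array of RGB values in 0..1 to an (N, 3) array of CAM16-UCS values.
    """
    levels = np.arange(256, dtype=np.float32)
    # All 256^3 RGB triples, with R varying slowest.
    r, g, b = np.meshgrid(levels, levels, levels, indexing="ij")
    rgb = np.stack([r, g, b], axis=-1).reshape(-1, 3) / 255
    cam = rgb_to_cam16ucs(rgb).astype(np.float32)
    # 256^3 entries x 3 components x 4 bytes = 192 MiB, as described above.
    np.save(path, cam.reshape(256, 256, 256, 3))

# Runtime lookup is then a single array index per RGB value:
#   matrix = np.load("rgb_to_cam16ucs.npy", mmap_mode="r")
#   cam = matrix[r, g, b]
```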
# Comparison to other DHGR image converters
## bmp2dhr
* [bmp2dhr](http://www.appleoldies.ca/bmp2dhr/) (see [here](https://github.com/digarok/b2d) for a maintained code fork) supports additional graphics modes not yet supported by ][-pix, namely (double) lo-res, and hi-res. Support for the lo-res modes would be easy to add to ][-pix, although hi-res requires more work to accommodate the colour model. A similar lookahead strategy will likely work well though.
* supports additional image dither modes
* only supports BMP source images in a particular format.
* DHGR conversions treat the image as a simple 140x192x16 colour image, without colour constraints, and ignore the colour fringing behaviour described above. The generated .bmp preview images also do not show fringing, but it is present when viewing the image on an Apple II or an emulator that accounts for it, i.e. the preview images are sometimes not very representative of the actual results. See below for an example.
* Apart from ignoring DHGR colour interactions, the 140px converted images are also lower than ideal resolution since they do not make use of the ability to address all 560px independently.
* The perceptual colour distance metric used to match the best colour to an input pixel is a custom metric based on a weighted sum of Euclidean sRGB distance and Rec.601 luma value (see the sketch after this list). It's not explained why this particular metric was chosen, and in practice it seems to often give much lower quality results than modern perceptually uniform colour spaces like CIE2000 or CAM16-UCS (though these are much slower to compute - which is why we precompute the conversion matrix ahead of time).
* It does not perform RGB colour space conversions before dithering, i.e. if the input image is in sRGB colour space (as most digital images will be) then the dithering is also performed in sRGB. Since sRGB is not a linear colour space, the effect of dithering is to distribute errors non-linearly, which distorts the brightness of the resulting image.
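A sketch of the general shape of such a metric (the weights, and the exact way bmp2dhr combines the two terms, are not documented; these are placeholders):

```python
def hybrid_distance(rgb1, rgb2, w_rgb=1.0, w_luma=1.0):
    """Weighted sum of squared sRGB distance and Rec.601 luma difference.

    rgb1, rgb2: gamma-encoded (R, G, B) triples in 0..1.  The weights are
    hypothetical placeholders, not bmp2dhr's actual values.
    """
    def luma(rgb):  # Rec.601 luma, computed from gamma-encoded RGB
        r, g, b = rgb
        return 0.299 * r + 0.587 * g + 0.114 * b

    rgb_term = sum((a - b) ** 2 for a, b in zip(rgb1, rgb2))
    return w_rgb * rgb_term + w_luma * (luma(rgb1) - luma(rgb2)) ** 2
```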
## a2bestpix
* Like ][-pix, [a2bestpix](http://lukazi.blogspot.com/2017/03/double-high-resolution-graphics-dhgr.html) only supports DHGR conversion. Overall quality is usually fairly good, although colours and brightness are slightly distorted (for reasons described below), and the generated preview images do not quite give a faithful representation of the native image rendering.
* Like ][-pix, and unlike bmp2dhr, a2bestpix does apply a model of the DHGR colour interactions, albeit an ad-hoc one based on rules and tables of 4-pixel "colour blocks" reconstructed from (AppleWin) emulator behaviour. This does allow it to make use of (closer to) full 560px resolution, although it still treats the screen as a sequence of 140 4-pixel colour blocks (with some constraints on the allowed arrangement of these blocks).
* supports additional (custom) dither modes (partly out of necessity due to the custom "colour block" model)
* Supports a variety of perceptual colour distance metrics, including CIE2000 and the one used by bmp2dhr. In practice I'm not sure the others are useful, since CIE2000 is the most recent refinement of much research on this topic and the most accurate of them.
* like bmp2dhr, only supports BMP source images in a particular format.
* Does not apply gamma correction before dithering (though sRGB conversion is done when computing CIE2000 distance), so errors are diffused non-linearly. The resulting images don't match the brightness of the original, e.g. shadows/highlights tend to be over-exposed.
* image conversion performs an optimization over groups of multiple pixels (via choice of "colour blocks"). From what I can tell this minimizes the total colour distance from a fixed list of colour blocks to a group of 4 target pixels, similar to `--lookahead=4` for ][-pix (though I'm not sure it's evaluating all 2^4 pixel combinations). But since the image is (AFAICT) treated as a sequence of (non-overlapping) 4-pixel blocks, this does not result in optimizing each output pixel independently.
* The list of "colour blocks" seems to contain colour sequences that cannot actually be rendered on the Apple II. For example, compare the spacing of yellow and orange pixels on the parrot between the preview image (LHS) and OpenEmulator (RHS):
![Detail of a2bestpix preview image](a2bestbix-preview-crop.png)
![Detail of openemulator render](a2bestpix-openemulator-crop.png)
* See below for another example where the output has major image discrepancies with the original - perhaps also due to bugs/omissions in the table of colour blocks.
* This means that (like bmp2dhr) the generated "preview" image may not closely match the native image, and the dithering algorithm is also optimizing over a slightly incorrect set of colour sequences, which presumably impacts image quality. Possibly these are transcription errors, or artifacts of the particular emulator (AppleWin) from which they were reconstructed.
## Image comparisons
These three images were converted using the same target (OpenEmulator) palette, using ][-pix, bmp2dhr and a2bestpix (since this is supported by all three), and are shown as screenshots from OpenEmulator.
### Original
![original source image](../examples/dhr/paperclips-original.png)
(Source: [Purple Sherbet Photography from Worldwide!](https://commons.wikimedia.org/wiki/File:Colourful_assortment_of_paper_clips_(10421946796).jpg), [CC BY 2.0](https://creativecommons.org/licenses/by/2.0), via Wikimedia Commons)
The following images were all generated with a palette approximating OpenEmulator's colours (`--palette=openemulator` for ][-pix).
### ][-pix 4-pixel colour
Preview image and OpenEmulator screenshot
![ii-pix preview](../examples/dhr/paperclips-iipix-openemulator-preview.png)
![ii-pix screenshot](../examples/dhr/paperclips-iipix-openemulator-openemulator.png)
### ][-pix NTSC 8-pixel colour
Preview image and OpenEmulator screenshot
![ii-pix preview](../examples/dhr/paperclips-iipix-ntsc-preview.png)
![ii-pix screenshot](../examples/dhr/paperclips-iipix-ntsc-openemulator.png)
### bmp2dhr (OpenEmulator screenshot)
![bmp2dhr screenshot](../examples/dhr/paperclips-bmp2dhr-openemulator.png)
Comparing bmp2dhr under OpenEmulator is the scenario most favourable to it, since the 140px resolution and non-treatment of fringing are masked by the chroma blending. Colours are similar to ][-pix, but the 140px dithering and lack of gamma correction result in less detail, e.g. in highlights/shadows.
### a2bestpix (OpenEmulator screenshot)
![a2bestpix screenshot](../examples/dhr/paperclips-a2bestpix-openemulator.png)
This a2bestpix image is actually atypical in quality, and shows some severe colour errors relating to the pixels that should be close to the orange/brown colours. These may be due to errors/omissions in the set of "colour blocks". The effects of not gamma-correcting the source image can also be seen.
## NTSC artifacts
The difference in treatment of NTSC artifacts is much more visible when using an emulator that doesn't perform chroma subsampling, e.g. Virtual II, which displays the full 560-pixel colour image without blending.
### Original
![original source image](../examples/dhr/groundhog-original.png)
(Source: [Cephas](https://commons.wikimedia.org/wiki/File:Marmota_monax_UL_04.jpg), [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons)
The following images were generated with a palette matching the one used by Virtual II (`--palette=virtualii` for ][-pix).
### ][-pix
![original source image](../examples/dhr/groundhog-original.png)
![ii-pix preview](../examples/dhr/groundhog-iipix-virtualii-preview.png)
### bmp2dhr
![original source image](../examples/dhr/groundhog-original.png)
![bmp2dhr screenshot](../examples/dhr/groundhog-bmp2dhr-virtualii.png)
The image is heavily impacted by colour fringing, which bmp2dhr does not account for at all. The difference in brightness of the groundhog's flank is also because bmp2dhr does not gamma-correct the image, so shadows/highlights tend to get blown out.
### bmp2dhr (OpenEmulator)
![original source image](../examples/dhr/groundhog-original.png)
![bmp2dhr screenshot](../examples/dhr/groundhog-bmp2dhr-openemulator.png)
This bmp2dhr image was generated using a palette approximating OpenEmulator's colours (`--palette=openemulator` for ][-pix), i.e. not the same image files as above.
On OpenEmulator, which simulates NTSC chroma sub-sampling, the fringing is less pronounced, but it changes the colour balance of the image, e.g. creating a greenish tinge.
### ][-pix, 4-pixel colour (OpenEmulator)
![original source image](../examples/dhr/groundhog-original.png)
![ii-pix screenshot](../examples/dhr/groundhog-iipix-openemulator-openemulator.png)
Colour balance here is also slightly distorted due to not fully accounting for chroma blending.
### ][-pix, NTSC 8-pixel colour (OpenEmulator)
![original source image](../examples/dhr/groundhog-original.png)
![ii-pix screenshot](../examples/dhr/groundhog-iipix-ntsc-openemulator.png)
Detail and colour balance are much improved.

@@ -1,6 +1,6 @@
-# Gallery of images
+# Gallery of Double Hi-Res images
-Here are some more images converted with ][-pix.
+Here are some more Double Hi-Res images converted with ][-pix.
 * (top-left) original image
 * (top-right) image converted for display in Virtual II emulator (converted with `--palette virtualii --lookahead 8 --dither jarvis-mod`)
