Add a new DHGRBitmap class that efficiently represents the
DHGR interleaving of the (aux, main) MemoryMap as a sequence of
28-bit integers.
This allows for easily extracting the 8-bit and 12-bit subsequences
representing the DHGR pixels that are influenced when storing a byte
at offsets 0..3 within the interleaved (aux, main, aux, main)
sequence.
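A minimal sketch of the packing idea, for illustration only. It relies on the fact that each DHGR screen byte carries 7 pixel bits, so four bytes fit in 28 bits. The function names (pack28, extract_bits) and the choice to place the first aux byte in the least-significant bits are assumptions, not necessarily the layout DHGRBitmap uses.

```python
def pack28(aux0: int, main0: int, aux1: int, main1: int) -> int:
    """Pack four screen bytes into one 28-bit integer, 7 bits per byte.

    Assumed ordering: aux0 occupies bits 0..6, main0 bits 7..13, and so on.
    The high bit of each screen byte is dropped.
    """
    packed = 0
    for i, byte in enumerate((aux0, main0, aux1, main1)):
        packed |= (byte & 0x7F) << (7 * i)
    return packed


def extract_bits(packed: int, start: int, width: int) -> int:
    """Return `width` bits of `packed`, starting at bit position `start`."""
    return (packed >> start) & ((1 << width) - 1)
```

With this layout, an 8-bit or 12-bit window is a single shift and mask, e.g. extract_bits(packed, 0, 8) or extract_bits(packed, 4, 12). Which windows correspond to which byte offset is not shown here.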
Since we have precomputed all of the pairwise differences between
these 8- and 12-bit values, we can efficiently compute the edit
distances between pairs of screen bytes (and/or arrays), i.e. between
the (3-pixel) sequences that may be modified when storing bytes to the
DHGR display.
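A rough sketch of the table-lookup approach, assuming a stand-in metric. The real precomputed differences are perceptual edit distances; here Hamming distance (count of differing bits) is used purely as a placeholder, and the names build_distance_table, DIST8 and total_distance are hypothetical. Only the 8-bit table is built, since a pure-Python 12-bit table would be impractically large for an example.

```python
def build_distance_table(bits: int = 8):
    """Precompute a (2**bits x 2**bits) table of pairwise distances.

    Placeholder metric: Hamming distance between the two values.
    """
    size = 1 << bits
    return [[bin(a ^ b).count("1") for b in range(size)] for a in range(size)]


DIST8 = build_distance_table(8)


def total_distance(old, new, table=DIST8):
    """Sum table lookups over two equal-length sequences of values."""
    return sum(table[a][b] for a, b in zip(old, new))
```

Once the table exists, comparing two values costs one lookup, which is what makes scoring many candidate stores per frame affordable.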
This relies on producing an efficient linear representation of the
DHGR framebuffer in terms of a packed 28-bit representation of (Aux,
Main, Aux, Main) screen bytes.
DHGR playback modes. Actually the only difference is whether to
initialize the (D)HGR display, since everything else is steered by the video
stream. 6 of these bytes are currently unused, but it is convenient
to pad to the same length as the TICK opcode so that it does not
complicate the ACK stream framing.
Move towards the convention of using upper-case labels for system use
(soft switches etc) and lower-case for program use.
- Use a safe ZP address instead of $00
- Use dec instead of hex for IP address bytes
- Remove some unused Uthernet/W5100 defines
Optimize the socket buffer management
- Since we're guaranteeing 2K frame padding, the low byte is always 0
- Remove some vestiges of the Uthernet TCP demo code. AFAICT there
isn't a need to compare the high and low bytes of S0RXRSR; this
was just being used as a (slightly risky) check that they were not
both equal (presumably to 0)
- (h/t Oliver Schmidt <ol.sc@web.de>) it turns out that the W5100
automatically wraps the address pointer at the end of the 8k RX/TX
buffer space, so since we're using 8k buffers we don't need any of the
pointer/mask arithmetic to make sure we don't stray outside this range
- Instead, we can just save the W5100 address pointer before we start
doing the stream buffer management and restore it when we're ready
to read from the stream again.
- Moreover, since we know the low-byte is 0 we don't even need to
save it.
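The address arithmetic behind these points can be checked with a small model. This is only a sketch: it assumes socket 0's 8k RX buffer occupies $6000-$7FFF and that the read pointer always advances by whole 2K frames; the constant and function names are hypothetical. The modulo here stands in for the wrap that the W5100 performs by itself.

```python
RX_BASE = 0x6000     # assumed start of the 8k RX buffer
RX_SIZE = 0x2000     # 8k
FRAME_SIZE = 0x0800  # 2K frame padding


def frame_addresses(num_frames):
    """Yield the buffer address at the start of each successive 2K frame."""
    offset = 0
    for _ in range(num_frames):
        yield RX_BASE + offset
        # Model of the hardware wrap at the end of the 8k buffer.
        offset = (offset + FRAME_SIZE) % RX_SIZE
```

Every address this yields has a low byte of 0, so only the high byte carries information and only the high byte needs saving.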
This gives us enough free cycles to implement a keypress check. For
now, any key pauses the video and any subsequent key resumes it.
We still have a whole 16 cycles left over while maintaining the 36/37
cycle tick cadence.
We've saved 73 cycles of "dead time" though, i.e. the
op_ack + CHECKRECV + op_nop "slow path" now takes 2*73 rather than 3*73
cycles. This should result in better audio quality.
- Every time we process an ACK opcode, toggle page 1/page 2 soft
switches to steer subsequent writes between MAIN and AUX memory
- while I'm here, squeeze out some unnecessary operations from the
buffer management
On the player side, this is implemented by maintaining two screen
memory maps, and alternating between opcode streams for each of them.
This is using entirely the wrong colour model for errors, but
surprisingly it already works pretty well in practice (and the frame
rate is acceptable on test videos).
DHGR/HGR could be made runtime selectable by adding a header byte that
determines whether to set the DHGR soft switches before initiating
the decode loop.
While I'm in here, fix op_terminate to clear keyboard strobe before
waiting.
- can't emit Terminate opcode in the middle of the bytestream
- pad the TCP stream to next 2k boundary when emitting terminate opcode,
since the player will block until receiving this much data.
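A minimal sketch of the padding calculation, assuming the stream is held as a bytes object. The helper names and the choice of 0x00 as the filler byte are assumptions for illustration; the actual filler may well be a no-op opcode.

```python
FRAME_SIZE = 2048  # the player blocks until this much data has arrived


def padding_needed(stream_len: int, frame_size: int = FRAME_SIZE) -> int:
    """Number of bytes required to reach the next frame boundary."""
    return -stream_len % frame_size


def pad_stream(data: bytes, pad_byte: int = 0x00) -> bytes:
    """Append filler bytes so len(result) is a multiple of FRAME_SIZE."""
    return data + bytes([pad_byte]) * padding_needed(len(data))
```

A stream that already ends on a boundary gets no padding, so the result is never longer than necessary.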
Clean up naming in edit_distance
In the video encoder, when we emit additional offsets as part of an opcode,
reinsert each offset into the priority heap if its new edit distance is
nonzero, in case we get the chance to fix it up later in the frame.
Also make sure to zero out the diff_weights and content_deltas
so we don't consider the offset again as a side-effect of some other
opcode.
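The bookkeeping described above might look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, diff_weights and content_deltas are modelled as plain indexable sequences, and the new edit distance is used directly as the heap priority (negated, because heapq is a min-heap).

```python
import heapq


def requeue_side_effect(heap, offset, new_edit_distance,
                        diff_weights, content_deltas):
    """Record that `offset` was written as a side effect of another opcode."""
    # Zero these so the offset is not picked again as a side effect.
    diff_weights[offset] = 0
    content_deltas[offset] = 0
    # Only requeue if there is still something left to fix.
    if new_edit_distance > 0:
        heapq.heappush(heap, (-new_edit_distance, offset))
```

Offsets that ended up exactly right (distance 0) simply drop out of consideration for the rest of the frame.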
Instead of prioritizing side-effect offsets by their previous update
priority, prioritize by those with the lowest (error - edit) delta i.e.
not introducing too much error relative to their edit distance.
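As a sketch of this ordering, assuming each candidate is a hypothetical (offset, error, edit) tuple:

```python
def rank_side_effect_offsets(candidates):
    """Order candidate offsets by ascending (error - edit) delta.

    `candidates` is an iterable of (offset, error, edit) tuples.
    """
    ranked = sorted(candidates, key=lambda c: c[1] - c[2])
    return [offset for offset, _error, _edit in ranked]
```

For example, candidates with deltas of 4, 0 and 2 come back in the order 0, 2, 4, so the offset that adds the least error relative to its edit distance is tried first.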
Introduce an attempt at post-processing the colour artefacting that
results in coalescing adjacent '1' bits into white pixels. This is
an incomplete model even of this artefact, let alone the other
various fringing weirdness that happens for e.g. NTSC rendering (which
is not faithfully reproduced by the Apple //GS RGB display, so it is hard
for me to test further).
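A minimal sketch of the coalescing rule only, assuming the display data is available as a flat sequence of 0/1 bits in screen order. The function name is hypothetical, and this ignores every other artefact mentioned above.

```python
def white_mask(bits):
    """Mark positions that belong to a run of two or more adjacent 1 bits."""
    mask = [False] * len(bits)
    for i in range(len(bits) - 1):
        if bits[i] and bits[i + 1]:
            mask[i] = mask[i + 1] = True
    return mask
```

An isolated 1 bit is left unmarked (it keeps its artefact colour), while any neighbouring pair of 1 bits is marked as white.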
- Commented-out experimentation with using the hitherdither library to
apply Yliluoma dithering (see
https://bisqwit.iki.fi/story/howto/dither/jy/) instead of the
error-diffusion-based BMP2DHR, which introduces a lot of noise between
frames since it is easily perturbed.
Unfortunately, apart from being extremely slow, it also doesn't give
good results, even for the (simulated) DHGR palette. There's a lot of
banding, and for HGR the available colours are just too far apart in
colour space.
This is even without (somehow) applying the HGR colour constraints.
- Also return the priority from _compute_error as preparation for
reinserting the offset back into the priority heap, in case we can do
a better job later. In order to do this properly we need to compute
both the error edit distance and the "true" edit distance and only
insert the priority of the latter.