- Use a safe ZP address instead of $00
- Use dec instead of hex for IP address bytes
- Remove some unused Uthernet/W5100 defines
Optimize the socket buffer management
- Since we're guaranteeing 2K frame padding, the low byte is always 0
- Remove some vestiges of the Uthernet TCP demo code - AFAICT there
isn't a need to compare high and low bytes of the S0RXRSR, this
was just being used as a (slightly risky) check that they were both
not equal (presumably to 0)
- (h/t Oliver Schmidt <ol.sc@web.de>) it turns out that the W5100
automatically wraps the address pointer at the end of the 8k RX/TX
buffer space, so since we're using 8k buffers we don't need any of the
pointer/mask arithmetic to make sure we don't stray outside this range
- Instead, we can just save the W5100 address pointer before we start
doing the stream buffer management and restore it when we're ready
to read from the stream again.
- Moreover, since we know the low-byte is 0 we don't even need to
save it.
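
A rough model of why the pointer arithmetic is unnecessary, as a sketch only. The constants assume the whole 8K RX region starts at $6000 and is assigned to one socket; the function names are made up for illustration.

```python
RX_BASE = 0x6000     # assumed start of the W5100 RX buffer memory
RX_SIZE = 0x2000     # 8K, assumed to be assigned entirely to socket 0
FRAME_SIZE = 0x0800  # 2K frame padding


def advance(pointer: int, count: int) -> int:
    """Advance the address pointer, wrapping at the end of the 8K region."""
    return RX_BASE + (pointer - RX_BASE + count) % RX_SIZE


def save(pointer: int) -> int:
    """Save only the high byte; valid because the low byte is always 0."""
    assert pointer & 0xFF == 0, "pointer is not on a page boundary"
    return pointer >> 8


def restore(high_byte: int) -> int:
    """Rebuild the full pointer from the saved high byte."""
    return high_byte << 8
```

Stepping through four 2K frames from RX_BASE visits $6800, $7000, $7800 and then wraps to $6000, so the low byte never changes.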
This gives us enough free cycles to implement a keypress check. For
now, any key pauses the video and any subsequent key resumes it.
We still have a whole 16 cycles left over while maintaining the 36/37
cycle tick cadence.
We've saved 73 cycles of "dead time" though, i.e. the
op_ack + CHECKRECV + op_nop "slow path" now takes 2*73 rather than 3*73
cycles. This should result in better audio quality.
- Every time we process an ACK opcode, toggle page 1/page 2 soft
switches to steer subsequent writes between MAIN and AUX memory
- while I'm here, squeeze out some unnecessary operations from the
buffer management
On the player side, this is implemented by maintaining two screen
memory maps, and alternating between opcode streams for each of them.
This is using entirely the wrong colour model for errors, but
surprisingly it already works pretty well in practice (and the frame
rate is acceptable on test videos).
DHGR/HGR could be made runtime selectable by adding a header byte that
determines whether to set the DHGR soft switches before initiating
the decode loop.
While I'm in here, fix op_terminate to clear keyboard strobe before
waiting.
- can't emit Terminate opcode in the middle of the bytestream
- pad the TCP stream to next 2k boundary when emitting terminate opcode,
since the player will block until receiving this much data.
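
A minimal sketch of the padding step on the encoder side. The filler byte value and the function name are assumptions for illustration, not what the encoder actually emits.

```python
FRAME_SIZE = 2048  # the player blocks until it has received this much data
PAD_BYTE = 0x00    # hypothetical filler value


def pad_to_frame(stream: bytes, frame_size: int = FRAME_SIZE) -> bytes:
    """Pad the stream with filler bytes up to the next frame boundary."""
    remainder = len(stream) % frame_size
    if remainder == 0:
        return stream  # already aligned, nothing to add
    return stream + bytes([PAD_BYTE]) * (frame_size - remainder)
```

For example, a 2049-byte stream grows to 4096 bytes, while a stream that is already a multiple of 2048 bytes is returned unchanged.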
Clean up naming in edit_distance
In the video encoder, when we emit additional offsets as part of an
opcode, reinsert each offset into the priority heap (heapq) if its new
edit distance is nonzero, in case we get the chance to fix it up later
in the frame.
Also make sure to zero out the diff_weights and content_deltas
so we don't consider the offset again as a side-effect of some other
opcode.
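
Roughly, the bookkeeping for one side-effect offset looks like this. This is a sketch only: it assumes a heap of (negated priority, offset) tuples and flat per-offset lists, which may not match the real data layout.

```python
import heapq


def apply_side_effect(heap, diff_weights, content_deltas, offset, new_edit_distance):
    """Record that `offset` was written as a side effect of another opcode."""
    # Zero the cached values so this offset is not picked up again
    # as a side effect of some other opcode.
    diff_weights[offset] = 0
    content_deltas[offset] = 0
    # If the write left residual error, queue the offset for a later fix-up.
    # heapq is a min-heap, so the priority is negated (assumed convention).
    if new_edit_distance > 0:
        heapq.heappush(heap, (-new_edit_distance, offset))
```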
Instead of prioritizing side-effect offsets by their previous update
priority, prioritize by those with the lowest (error - edit) delta i.e.
not introducing too much error relative to their edit distance.
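
A sketch of that ordering, assuming each candidate is an (offset, error, edit) tuple. The tuple layout and the function name are hypothetical.

```python
import heapq


def pick_side_effect_offsets(candidates, count):
    """Return up to `count` offsets with the lowest (error - edit) delta."""
    best = heapq.nsmallest(count, candidates, key=lambda c: c[1] - c[2])
    return [offset for offset, _error, _edit in best]
```

With candidates (10, 5, 1), (11, 3, 3) and (12, 9, 2) the deltas are 4, 0 and 7, so asking for two offsets yields 11 and then 10.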
Introduce an attempt at post-processing the colour artefacting that
results in coalescing adjacent '1' bits into white pixels. This is
an incomplete model even of this artefact, let alone the other
various fringing weirdness that happens for e.g. NTSC rendering (which
is not faithfully reproduced by the Apple //GS RGB display, so it is
hard for me to test further).
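
The coalescing rule on its own can be sketched as below. This only marks which set bits would turn white; it ignores byte boundaries, palette bits and everything else about real HGR rendering.

```python
def coalesce_white(bits):
    """Return a mask that is True where a set bit has a set neighbour."""
    n = len(bits)
    mask = []
    for i, bit in enumerate(bits):
        left = i > 0 and bits[i - 1] == 1
        right = i + 1 < n and bits[i + 1] == 1
        # An isolated '1' keeps its colour; a run of '1's renders as white.
        mask.append(bit == 1 and (left or right))
    return mask
```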
- Commented-out experimentation with using the hitherdither library to
use yliluoma dithering (see
https://bisqwit.iki.fi/story/howto/dither/jy/) instead of the
error-diffusion-based BMP2DHR, which introduces a lot of noise between
frames since it is easily perturbed.
Unfortunately, apart from being extremely slow, it also doesn't give
good results, even for a (simulated) DHGR palette. There's a lot of
banding, and for HGR the available colours are just too far apart in
colour space.
This is even without (somehow) applying the HGR colour constraints.
- Also return the priority from _compute_error as preparation for
reinserting the offset back into the priority heap, in case we can do
a better job later. In order to do this properly we need to compute
both the error edit distance and the "true" edit distance and only
insert the priority of the latter.
- Change ACK code to perform two dummy stream reads rather than relying
on a preceding NOP to pad the TCP frame to 2K. This fixes the timing
issue that was causing most of the low-frequency ticks.
- Ticks still aren't perfectly aligned during the ACK slow path but
it's almost good enough, i.e. probably no need to actually bother
optimizing the slow path more.
of all possible weights. We encode the two 8-bit inputs into a single
16 bit value instead of dealing with an array of tuples.
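
A sketch of the packing, assuming the first input goes in the high byte (the real bit order may differ). The table contents are a stand-in: a plain bit-difference count rather than the real weights.

```python
def pack(a: int, b: int) -> int:
    """Combine two 8-bit values into one 16-bit key."""
    return (a << 8) | b


def unpack(key: int):
    """Split a 16-bit key into its two 8-bit values."""
    return key >> 8, key & 0xFF


# Flat table indexed by the packed key, in place of a mapping keyed by tuples.
# The stored value (number of differing bits) is a placeholder weight.
WEIGHTS = [bin(a ^ b).count("1") for a in range(256) for b in range(256)]
```

A lookup is then a single index operation, e.g. WEIGHTS[pack(old, new)].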
Fix an important bug in _compute_delta: old and is_odd were transposed
so we weren't actually subtracting the old deltas! Surprisingly,
when I accidentally fixed this bug in the vectorized version, the video
encoding was much worse! This turned out to be because the edit
distance metric allowed reducing diffs by turning on pixels, which
meant it would tend to do this when "minimizing error" in a way that
was visually unappealing.
To remedy this, introduce a separate notion of substitution cost for
errors, and weight pixel colour changes more highly to discourage them
unless absolutely necessary. This gives very good quality results!
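
To illustrate the idea of a separate substitution cost, here is a plain Levenshtein distance with a pluggable cost function. This is a stand-in, not the metric actually used; the pixel strings, cost functions and weights are made up.

```python
def edit_distance(src, dst, substitute_cost, indel_cost=1):
    """Levenshtein distance with a caller-supplied substitution cost."""
    prev = [j * indel_cost for j in range(len(dst) + 1)]
    for i, s in enumerate(src, 1):
        cur = [i * indel_cost]
        for j, d in enumerate(dst, 1):
            sub = prev[j - 1] + (0 if s == d else substitute_cost(s, d))
            cur.append(min(sub, prev[j] + indel_cost, cur[j - 1] + indel_cost))
        prev = cur
    return prev[-1]


def flat_cost(old, new):
    """Every substitution costs the same (hypothetical baseline)."""
    return 1


def error_cost(old, new):
    """Colour changes are weighted more highly (hypothetical weight)."""
    return 5
```

With pixel strings "KK" and "KG", flat_cost gives a distance of 1, while error_cost with indel_cost=3 gives 5, so a colour change is only chosen when nothing cheaper is available.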
Also vectorize the selection of page offsets and priorities having
a negative error delta, instead of heapifying the entire page.
Also it turns out to be a bit faster to compute (and memoize) the delta
between a proposed content byte and the entire target screen at once,
since we'll end up recomputing the same content diffs multiple times.
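
The memoization can be sketched with functools.lru_cache. The per-byte delta here is a simple count of differing bits, standing in for the real edit distance; make_delta_fn is a hypothetical name.

```python
from functools import lru_cache


def make_delta_fn(target: bytes):
    """Return a cached function mapping a content byte to per-offset deltas."""

    @lru_cache(maxsize=256)  # at most 256 distinct content bytes
    def delta(content: int) -> tuple:
        # Compare the proposed byte against every byte of the target screen.
        return tuple(bin(content ^ t).count("1") for t in target)

    return delta
```

The first call for a given content byte scans the whole target; repeat calls for the same byte are served from the cache.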
(Committing messy version in case I want to revisit some of those
interim versions)