73 Commits

Author SHA1 Message Date
kris
4c168721bb - Don't hardcode clock speed
- Commented-out experimentation with using the hitherdither library to
apply Yliluoma dithering (see
https://bisqwit.iki.fi/story/howto/dither/jy/) instead of the
error-diffusion-based BMP2DHR, which introduces a lot of noise between
frames since it is easily perturbed.

Unfortunately, apart from being extremely slow, it also doesn't give
good results, even for the (simulated) DHGR palette.  There's a lot of
banding, and for HGR the available colours are just too far apart in
colour space.

This is even without (somehow) applying the HGR colour constraints.

- Also return the priority from _compute_error as preparation for
reinserting the offset back into the priority heap, in case we can do
a better job later.  In order to do this properly we need to compute
both the error edit distance and the "true" edit distance and only
insert the priority of the latter.
2019-03-21 15:57:09 +00:00
kris
8f6aa019b6 Remove old player 2019-03-21 15:43:09 +00:00
kris
065d9dddf8 Move server file into subdir 2019-03-21 15:37:51 +00:00
kris
0732e2c067 Remove unneeded assert 2019-03-21 15:36:29 +00:00
kris
c4ed5f3d0a Remove files no longer in use. 2019-03-21 15:35:55 +00:00
kris
c8942ba138 Add cc65's LOADER.SYSTEM to the base disk image so we can use it to
bootstrap the player.
2019-03-21 15:28:41 +00:00
kris
341c645c12 Reorder a bit to put the configurable bits at the top. 2019-03-21 15:27:24 +00:00
kris
c53cf76df0 Switch to argparse and parametrize some flags 2019-03-21 15:26:00 +00:00
kris
0a9e83fa22 Listen on all interfaces, switch to port 1977 and use argparse instead
of sys.argv.
2019-03-21 15:25:06 +00:00
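A minimal sketch of what argparse-based flags for such a server could look like. The flag names, help strings and the `parse_args` helper are assumptions for illustration; only the port number 1977 and "listen on all interfaces" come from the commit message.

```python
import argparse


def parse_args(argv=None):
    """Parse server flags. All flag names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Serve an encoded movie file")
    parser.add_argument("filename", help="encoded movie file to serve")
    parser.add_argument(
        "--host", default="0.0.0.0",
        help="address to bind; 0.0.0.0 means all IPv4 interfaces")
    parser.add_argument(
        "--port", type=int, default=1977,
        help="TCP port to listen on")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("would serve %s on %s:%d" % (args.filename, args.host, args.port))
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`.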
kris
92e1247c63 Add more comments and clean up timings for op_ack. Magically,
this all fits in exact multiples of 36/37 ticks!
2019-03-15 22:42:33 +00:00
kris
b3ba069b2d - Improve comments and clean up a bit
- un-orphan the code that sets up 5 connection retries

- Pad CHECKRECV loop to 36,37,36,37 tick cycles
2019-03-15 22:18:03 +00:00
kris
fffd05f4d1 Don't schedule a NOP for TCP frame padding; instead just have the ACK
include the 2 dummy reads.
2019-03-15 21:09:45 +00:00
kris
7343aa39ed Fix 2019-03-15 21:08:31 +00:00
kris
2092ef0926 - Add some comments
- Change ACK code to perform two dummy stream reads rather than relying
  on a preceding NOP to pad the TCP frame to 2K.  This fixes the timing
  issue that was causing most of the low-frequency ticks.

- Ticks still aren't perfectly aligned during the ACK slow path but
  it's almost good enough, i.e. probably no need to actually bother
  optimizing the slow path more.
2019-03-15 21:08:06 +00:00
kris
4cb45efb2c Remove old video-only opcode support. 2019-03-14 23:05:47 +00:00
kris
d90b865b16 Style cleanups 2019-03-14 23:05:15 +00:00
kris
01ffd034eb Move edit distance functions into separate module and clean up
partially.

Slight optimization: instead of calling heappush() many times, build a
regular list and then heapify it.
2019-03-14 22:32:52 +00:00
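The heappush()/heapify trade-off mentioned above, sketched with the standard `heapq` module. The helper names and the `(-priority, offset)` entry layout are illustrative assumptions, not taken from the project.

```python
import heapq


def heap_by_pushing(entries):
    """Build a heap with one heappush() per entry: O(n log n) overall."""
    heap = []
    for entry in entries:
        heapq.heappush(heap, entry)
    return heap


def heap_by_heapify(entries):
    """Collect into a regular list, then heapify once: O(n) overall."""
    heap = list(entries)
    heapq.heapify(heap)
    return heap


def drain(heap):
    """Pop everything, smallest entry first."""
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    # Hypothetical (-priority, offset) pairs; negation makes the largest
    # priority pop first from Python's min-heap.
    entries = [(-((offset * 37) % 11), offset) for offset in range(256)]
    assert drain(heap_by_pushing(entries)) == drain(heap_by_heapify(entries))
```

Both constructions yield the same pop order; heapify just gets there with less work when all entries are known up front.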
kris
e0ac37fe4a Style 2019-03-14 22:31:07 +00:00
kris
976e26f159 Vectorize the computation of diff weights, by precomputing a map
of all possible weights.  We encode the two 8-bit inputs into a single
16 bit value instead of dealing with an array of tuples.

Fix an important bug in _compute_delta: old and is_odd were transposed
so we weren't actually subtracting the old deltas!  Surprisingly,
when I accidentally fixed this bug in the vectorized version, the video
encoding was much worse!  This turned out to be because the edit
distance metric allowed reducing diffs by turning on pixels, which
meant it would tend to do this when "minimizing error" in a way that
was visually unappealing.

To remedy this, introduce a separate notion of substitution cost for
errors, and weight pixel colour changes more highly to discourage them
unless absolutely necessary.  This gives very good quality results!

Also vectorize the selection of page offsets and priorities having
a negative error delta, instead of heapifying the entire page.

Also it turns out to be a bit faster to compute (and memoize) the delta
between a proposed content byte and the entire target screen at once,
since we'll end up recomputing the same content diffs multiple times.

(Committing messy version in case I want to revisit some of those
interim versions)
2019-03-14 22:08:50 +00:00
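A rough sketch of the "two 8-bit inputs packed into one 16-bit key" lookup described above. The weight function is a stand-in (a plain bit count), not the edit distance metric the encoder uses, and all names are hypothetical. It is written with plain lists to stay dependency-free; with NumPy the same table could be indexed by an array of keys in one step.

```python
def toy_weight(old, new):
    """Stand-in weight: number of differing bits (NOT the project's metric)."""
    return bin(old ^ new).count("1")


# Precompute the weight of every (old, new) byte pair once.
# The pair is encoded as a single 16-bit key: (old << 8) | new.
WEIGHTS = [toy_weight(key >> 8, key & 0xFF) for key in range(1 << 16)]


def diff_weights(old_bytes, new_bytes):
    """Look up the weight of each (old, new) pair via its 16-bit key."""
    return [WEIGHTS[(old << 8) | new] for old, new in zip(old_bytes, new_bytes)]


if __name__ == "__main__":
    print(diff_weights([0x00, 0xFF, 0x2A], [0x00, 0x00, 0x2B]))
```

The single integer key avoids handling an array of tuples, which is the point made in the commit message.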
kris
ede453d292 Remove vestige of standalone audio encoder. 2019-03-14 21:52:57 +00:00
kris
19ffa000f9 fix 2019-03-14 21:45:40 +00:00
kris
ea58c2f5b9 style 2019-03-14 21:45:28 +00:00
kris
cd17dce267 Normalize audio by tasting the first 10M of the audio stream and
computing the 2.5%ile and 97.5%ile values, i.e. so that <2.5% of
audio samples will clip.
2019-03-14 21:40:09 +00:00
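One way the percentile-based normalization could be sketched. This is an assumption-laden illustration: the function names, the nearest-rank percentile, the [-1, 1] output range and reading "10M" as a sample count are all guesses, not the encoder's actual code.

```python
def percentile(sorted_values, fraction):
    """Nearest-rank style percentile of an already sorted list."""
    index = int(round(fraction * (len(sorted_values) - 1)))
    return sorted_values[index]


def normalize(samples, head=10_000_000, low=0.025, high=0.975):
    """Scale samples to [-1, 1] using percentiles of the first `head` samples.

    Samples outside the low/high percentiles are clipped to the range limits.
    """
    probe = sorted(samples[:head])
    lo, hi = percentile(probe, low), percentile(probe, high)
    centre = (hi + lo) / 2
    half_range = (hi - lo) / 2 or 1.0  # avoid dividing by zero on flat input
    return [max(-1.0, min(1.0, (s - centre) / half_range)) for s in samples]


if __name__ == "__main__":
    out = normalize(list(range(-100, 101)))
    print(out[0], out[100], out[-1])
```

Measuring only the head of the stream keeps the pass cheap, at the cost of assuming the rest of the audio has a similar level.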
kris
2d410a4b13 Take filename to serve from argv. 2019-03-14 21:38:12 +00:00
kris
9c68a6a369 Take input and output filenames from argv. 2019-03-14 21:37:43 +00:00
kris
718dc15cf2 Add some tests for edit_weight and byte_to_colour_string 2019-03-10 23:07:44 +00:00
kris
7db5c1c444 Read video frame rate and encode a new frame when the cycle count
has ticked past the appropriate time.

- optimize the frame encoding a bit
- use int64 consistently to avoid casting

Fix a bug - when retiring an offset, also update our memory map with the
new content, oops

If we run out of changes to index, keep emitting stores for content
at page=32,offset=0 forever

Switch to a weighted D-L implementation so we can weight e.g. different
substitutions differently (e.g. weighting diffs to/from black pixels
differently than color errors)
2019-03-10 22:42:31 +00:00
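A small sketch of a weighted Damerau-Levenshtein distance of the kind described above, using the restricted (optimal string alignment) variant. The cost values, the `sub_cost` helper and the single-letter colour strings are illustrative assumptions; the encoder's real weights are not given in the log.

```python
def weighted_edit_distance(a, b, sub_cost, indel=3, transpose=1):
    """Optimal-string-alignment distance with a per-pair substitution cost."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            cost = 0 if same else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(
                d[i - 1][j] + indel,      # deletion
                d[i][j - 1] + indel,      # insertion
                d[i - 1][j - 1] + cost,   # match or substitution
            )
            # Adjacent transposition, e.g. "G0" <-> "0G".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose)
    return d[-1][-1]


def sub_cost(x, y):
    """Hypothetical weights: to/from black ("0") is cheap, colour swaps are not."""
    return 1 if "0" in (x, y) else 5


if __name__ == "__main__":
    print(weighted_edit_distance("G0", "0G", sub_cost))  # transposed pixels
    print(weighted_edit_distance("G0", "V0", sub_cost))  # colour error
```

Keeping the substitution cost as a callable makes it easy to try different weightings without touching the distance routine.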
kris
aed439c0b3 Optimize some more to fit in memory! 2019-03-10 21:57:05 +00:00
kris
6b969476a0 Try to even out tick timings during ACK/slow path 2019-03-10 21:04:20 +00:00
kris
4598709a7d Add custom cc65 config that makes space for a LOWCODE segment from
0x800-0x2000

Place some of the tick opcodes there.  This gives enough room for all
but 2 of the op_tick_*_page_n opcodes!

It may be possible to fit the remaining ones into unused RAM in the
language card, but this will require some finesse to get the code in
there.  Or maybe I can optimize enough bytes...

0x300 is used by the loader.system, but there is also still 0x400..0x800
if I don't mind messing up the text page, and 0x200 if I can
get away with using the keyboard buffer.

Something is broken with RESET now though, maybe the reset vector is
pointing somewhere orphaned.
2019-03-09 22:35:56 +00:00
kris
4310034993 Unmodified version of Apple2BuildPipeline 2019-03-09 22:31:56 +00:00
kris
3fa9d510b5 Construct opcodes and classes for all of the TICK_x_PAGE_y
Use these in the audio encoder to generate random video stores to
validate.
2019-03-07 23:08:01 +00:00
kris
c00300147e Integrated audio + video player!
- Introduce a new Movie() class that multiplexes audio and video.

- Every N audio frames we grab a new video frame and begin pulling
  opcodes from the audio and video streams

- Grab frames from the input video using bmp2dhr if the .BIN file does
  not already exist.  Run bmp2dhr in a background thread to not block
  encoding

- move the output byte streaming from Video to Movie

- For now, manually clip updates to pages > 56 since the client doesn't
  support them yet

The way we encode video is now:
- iterate in descending order over update_priority
- begin a new (page, content) opcode
- for all of the other offset bytes in that page, compute the error
  between the candidate content byte and the target content byte
- iterate over offsets in order of increasing error and decreasing
  update_priority to fill out the remaining opcode
2019-03-07 23:07:24 +00:00
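The encoding steps listed above can be sketched roughly as follows. Everything here is an assumption made for illustration: the dict-based data layout, the `next_opcode` name, the bit-count error metric, and the choice of 4 stores per opcode (taken from the earlier "fat" opcode description).

```python
def bit_error(a, b):
    """Stand-in error metric: number of differing bits."""
    return bin(a ^ b).count("1")


def next_opcode(target, priority, stores=4, error=bit_error):
    """Pick one (page, content) opcode and the offsets it should store to.

    target:   dict mapping (page, offset) -> desired content byte
    priority: dict mapping (page, offset) -> update priority
    """
    # The highest-priority location chooses the page and the content byte.
    page, first = max(priority, key=lambda key: (priority[key], key))
    content = target[(page, first)]
    # Candidate offsets: others in the same page that still want an update.
    others = [off for (pg, off), prio in priority.items()
              if pg == page and off != first and prio > 0]
    # Increasing error against the chosen content, then decreasing priority.
    others.sort(key=lambda off: (error(content, target[(page, off)]),
                                 -priority[(page, off)]))
    return page, content, [first] + others[:stores - 1]


if __name__ == "__main__":
    target = {(32, 0): 0x2A, (32, 1): 0x2A, (32, 2): 0x2B,
              (32, 3): 0x55, (32, 4): 0x2A, (33, 0): 0x7F}
    priority = {(32, 0): 9, (32, 1): 2, (32, 2): 5,
                (32, 3): 7, (32, 4): 4, (33, 0): 8}
    print(next_opcode(target, priority))
```

A full encoder would repeat this per opcode and retire the chosen offsets after each step; that bookkeeping is left out here.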
kris
f133bb0008 Can fit in 7 more pages 2019-03-07 16:04:01 +00:00
kris
99c7f6db34 Construct opcodes and classes for all of the TICK_x_PAGE_y
Use these in the audio encoder to generate random video stores to
validate.
2019-03-07 15:56:04 +00:00
kris
318a64ad56 Move to 0x4000 for now
Generate opcodes for pages 32-49 which fit in memory without effort.
2019-03-07 15:53:35 +00:00
kris
5bd0352491 Parametrize opcodes as macros 2019-03-07 15:20:43 +00:00
kris
7832333b27 Build as SYSTEM file and load at 0x800. Need to work out how to
avoid loading over screen page.
2019-03-05 23:41:06 +00:00
kris
df25fce067 Fix up all but two of the off-by-one tick counts. This uses the
trick of temporarily violating the X=0 invariant (which is only
required in the tick_6 opcode tail path to steal an extra cycle)
to reorder a STA $2000,Y outside of the tick loop.

The cost of this is that we don't have enough pad cycles left to JMP
to the common opcode tail, but I think this still (barely) fits in
main RAM.
2019-03-05 23:20:17 +00:00
kris
8c23824aa6 Optimize more. Should now fit in main memory?! 2019-03-05 22:44:39 +00:00
kris
3cd44cd891 Optimize down to about 42k required for full set of page opcodes
- enough to fit in AUX RAM but still room to go, hopefully will be
  able to fit in MAIN?

Fix some of the off-by-one cycle counts introduced when switching from
STA tick (which is wrong since it accesses twice) to BIT tick.
Hopefully can fix others by reordering?
2019-03-05 22:22:35 +00:00
kris
12d48b664a Prototype audio-only player, which uses a different strategy for
scheduling audio + video.

Use a combined "fat" audio + video opcode that combines several
features:
- constant cycle count of 73 cycles/opcode (=14364 Hz)
- page and content are controlled per opcode
- each opcode does 4 offset stores (hence 57456 stores/sec)
- tick speaker twice per opcode, with varying duty cycles 4 .. 70 in
  units of 2 cycles
  - thus 32 opcodes, or 5-bit audio @ 14364 Hz

The price for this is that we need per-page variants of the opcodes,
and at 53 bytes/opcode they won't (quite) all fit even in AUX RAM.

The good news is that with some further work it should be possible
to reduce this footprint by having opcodes share implementation by
JMPing into a common tail sequence.

Also introduce some ticks in approximately correct places during the
ACK slow path, as a proof of concept that this does mitigate the
clicking.

This works and gives reasonable quality audio!
2019-03-05 21:05:41 +00:00
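The rates quoted above follow from simple arithmetic. The clock value below is an assumption: 1,048,576 Hz is the figure that reproduces the 14364 Hz in the commit message, and it is kept as a parameter because the nominal Apple II CPU clock is closer to 1.02 MHz.

```python
CLOCK_HZ = 1_048_576        # assumption: inferred from the quoted 14364 Hz
CYCLES_PER_OPCODE = 73      # from the commit message
STORES_PER_OPCODE = 4       # from the commit message
AUDIO_BITS = 5              # from the commit message

opcode_rate = CLOCK_HZ // CYCLES_PER_OPCODE    # opcodes (audio samples) per second
store_rate = opcode_rate * STORES_PER_OPCODE   # video byte stores per second
audio_levels = 1 << AUDIO_BITS                 # distinct duty-cycle levels

print(opcode_rate, store_rate, audio_levels)   # 14364 57456 32
```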
kris
340a3005d8 Support old-style opcodes that use relative branch addressing, and new
cycle-counted tick opcodes that use absolute addressing.

For now switch to the experimental audio-only player.
2019-03-05 20:51:05 +00:00
kris
2f12407d3c Extract audio channel from movie file and emit 5-bit audio opcodes
at 14 kHz.
2019-03-05 20:47:34 +00:00
kris
6e2c83c1e5 Introduce a more general notion of update priority, used to increase
the weight of diffs that persist across multiple frames.

For each frame, zero out update priority of bytes that no longer have
a pending diff, and add the edit distance of the remaining diffs.

Zero these out as opcodes are retired.

Replace Hamming distance with Damerau-Levenshtein distance of the
encoded pixel colours in the byte, e.g. 0x2A --> GGG0 (taking into
account the half-pixel)

This has several benefits over Hamming distance of the bit patterns:
- transposed pixels are weighted less (edit distance 1, not 2+ for
  Hamming)
- coloured pixels are weighted equally with white pixels (not half as
  much)
- changes in the palette bit that flip multiple pixel colours are
  weighted as such

While I'm here, the RLE opcode should emit run_length - 1 so that we
can encode runs of 256 bytes.
2019-03-04 23:09:00 +00:00
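A minimal sketch of the per-frame update priority bookkeeping described above. The dict layout, the function name and the stand-in distance are assumptions; the commit uses a Damerau-Levenshtein distance over pixel colours instead.

```python
def bit_distance(a, b):
    """Stand-in distance: number of differing bits."""
    return bin(a ^ b).count("1")


def update_priorities(priority, current, target, distance=bit_distance):
    """Age the update priorities for one new target frame, in place.

    Locations with no pending diff are zeroed; locations that still differ
    gain the distance of their diff, so persistent diffs grow in weight.
    """
    for key, wanted in target.items():
        if current[key] == wanted:
            priority[key] = 0
        else:
            priority[key] = priority.get(key, 0) + distance(current[key], wanted)
    return priority


if __name__ == "__main__":
    current = {(32, 0): 0x00, (32, 1): 0x2A}
    target = {(32, 0): 0x03, (32, 1): 0x2A}
    priority = {(32, 0): 0, (32, 1): 4}
    update_priorities(priority, current, target)
    update_priorities(priority, current, target)  # same diff, next frame
    print(priority)
```

Calling it twice with an unchanged diff shows the accumulation: the unresolved location doubles its priority while the resolved one drops to zero.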
kris
d3522c817f Randomize tie-breaker when pages etc have the same weight, so we
don't consistently prefer larger numbers.

There still seems to be a bug somewhere causing some screen regions to
be consistently not updated, but perhaps I'll find it when working on
the logic to penalize persistent diffs.
2019-03-03 23:25:10 +00:00
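A randomized tie-breaker of this kind can be sketched by putting a random number between the weight and the payload in each heap entry. The function names and the tuple layout are hypothetical.

```python
import heapq
import random


def push_page(heap, weight, page, rng=random):
    """Push a page keyed by (-weight, random tie-breaker, page).

    Without the middle element, equal weights would always be ordered by
    page number, which systematically favours one end of the range.
    """
    heapq.heappush(heap, (-weight, rng.random(), page))


def pop_page(heap):
    """Pop the page with the largest weight; ties come out in random order."""
    _neg_weight, _tie, page = heapq.heappop(heap)
    return page


if __name__ == "__main__":
    heap = []
    for page in range(32, 40):
        push_page(heap, 5, page)
    print([pop_page(heap) for _ in range(8)])  # order varies between runs
```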
kris
a6f32886cd Refactor the various representations of screen memory (bitmap, (x,y)
bytemap, (page,offset) memory map)
- add a FlatMemoryMap that is a linear 8K array
- add converter methods and default constructors that allow converting
  between them
- use MemoryMap as the central representation used by the video encoder
2019-03-03 22:21:28 +00:00
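For reference, a sketch of the conversions between the representations named above, based on the standard Apple II hi-res page 1 layout starting at 0x2000. The function names are hypothetical and do not reflect the actual MemoryMap or FlatMemoryMap APIs.

```python
HGR_BASE = 0x2000  # start of hi-res page 1 (memory page 32)


def xy_to_page_offset(x_byte, y):
    """Map byte column x_byte (0..39) and row y (0..191) to (page, offset)."""
    addr = (HGR_BASE
            + (y & 7) * 0x400          # row within a group of 8
            + ((y >> 3) & 7) * 0x80    # group within a third of the screen
            + (y >> 6) * 0x28          # which third of the screen
            + x_byte)
    return addr >> 8, addr & 0xFF


def page_offset_to_flat(page, offset):
    """Index into a linear 8K array covering pages 32..63."""
    return (page - (HGR_BASE >> 8)) * 256 + offset


if __name__ == "__main__":
    print(xy_to_page_offset(0, 0), xy_to_page_offset(0, 1))
    print(page_offset_to_flat(*xy_to_page_offset(39, 191)))
```

The interleaving means vertically adjacent rows usually land in different memory pages, which is why a (page, offset) view and an (x, y) view are both useful to have.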
kris
80402f25a5 - Allow HGR ROM entry point
- Don't trap unexpected entrypoint when crossing between regions via RTS
- Implement TICK handler
- Improve status printing in CPU loop
2019-02-27 22:46:53 +00:00
kris
90f696b8e4 Bare-bones py65-based simulator for Apple //e with Uthernet (i.e.
simulating the W5100).  This will hopefully be useful for
troubleshooting and testing player behaviour more precisely, e.g.

- trapping read/write access to unexpected memory areas
- asserting invariants on the processor state across loops
- measuring cycle timing
- tracing program execution

This already gets as far as negotiating the TCP connect.  The major
remaining piece seems to be the TCP buffer management on the W5100 side.
2019-02-27 22:26:35 +00:00
kris
2b3343f374 Encode audio file into cycle timings and emit tick opcodes. Amazingly,
even the naive opcode implementation works!

The main issue is that when we ACK, the speaker cone is allowed to tick
fully.  Maybe optimizing the ACK codepath to be fast enough will help
with this?
2019-02-27 14:49:21 +00:00