ii-vision

mirror of https://github.com/KrisKennaway/ii-vision.git synced 2024-12-21 05:30:20 +00:00

Author	SHA1	Message	Date
kris	fffd05f4d1	Don't schedule a NOP for TCP frame padding; instead just have the ACK include the 2 dummy reads.	2019-03-15 21:09:45 +00:00
kris	7343aa39ed	Fix	2019-03-15 21:08:31 +00:00
kris	2092ef0926	- Add some comments - Change ACK code to perform two dummy stream reads rather than relying on a preceding NOP to pad the TCP frame to 2K. This fixes the timing issue that was causing most of the low-frequency ticks. - Ticks still aren't perfectly aligned during the ACK slow path but it's almost good enough, i.e. probably no need to actually bother optimizing the slow path more.	2019-03-15 21:08:06 +00:00
kris	4cb45efb2c	Remove old video-only opcode support.	2019-03-14 23:05:47 +00:00
kris	d90b865b16	Style cleanups	2019-03-14 23:05:15 +00:00
kris	01ffd034eb	Move edit distance functions into separate module and clean up partially. Slight optimization to not heappush() many times, instead build a regular list and then heapify.	2019-03-14 22:32:52 +00:00
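A minimal sketch of the heappush-versus-heapify trade-off mentioned in this commit, using Python's standard `heapq` module. The candidate tuples and both function names are hypothetical, chosen only for illustration; the log does not show the encoder's actual data layout.

```python
import heapq

def build_heap_incrementally(items):
    """Push items one at a time: O(n log n) overall."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return heap

def build_heap_at_once(items):
    """Build a plain list first, then heapify it in place: O(n) overall."""
    heap = list(items)
    heapq.heapify(heap)
    return heap

def drain(heap):
    """Pop everything off the heap, smallest first."""
    return [heapq.heappop(heap) for _ in range(len(heap))]

# Hypothetical (negated weight, page, offset) candidates; negating the weight
# makes the min-heap yield the heaviest candidate first.
candidates = [(-3, 32, 10), (-7, 40, 2), (-1, 33, 200), (-7, 35, 17)]
```

Both builders produce heaps that drain in the same order; only the construction cost differs.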
kris	e0ac37fe4a	Style	2019-03-14 22:31:07 +00:00
kris	976e26f159	Vectorize the computation of diff weights, by precomputing a map of all possible weights. We encode the two 8-bit inputs into a single 16 bit value instead of dealing with an array of tuples. Fix an important bug in _compute_delta: old and is_odd were transposed so we weren't actually subtracting the old deltas! Surprisingly, when I accidentally fixed this bug in the vectorized version, the video encoding was much worse! This turned out to be because the edit distance metric allowed reducing diffs by turning on pixels, which meant it would tend to do this when "minimizing error" in a way that was visually unappealing. To remedy this, introduce a separate notion of substitution cost for errors, and weight pixel colour changes more highly to discourage them unless absolutely necessary. This gives very good quality results! Also vectorize the selection of page offsets and priorities having a negative error delta, instead of heapifying the entire page. Also it turns out to be a bit faster to compute (and memoize) the delta between a proposed content byte and the entire target screen at once, since we'll end up recomputing the same content diffs multiple times. (Committing messy version in case I want to revisit some of those interim versions)	2019-03-14 22:08:50 +00:00
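One way to picture the precomputed weight map described above, as a sketch only: pack the two 8-bit inputs into a 16-bit key and look the weight up in a 65536-entry numpy table. `toy_weight` is a placeholder bit-difference count standing in for the real edit-distance metric, and `WEIGHT_TABLE` and `diff_weights` are hypothetical names.

```python
import numpy as np

def toy_weight(old: int, new: int) -> int:
    """Placeholder metric (number of differing bits), NOT the real edit weight."""
    return bin(old ^ new).count("1")

# One entry per (old, new) byte pair, indexed by the key (old << 8) | new.
WEIGHT_TABLE = np.array(
    [toy_weight(key >> 8, key & 0xFF) for key in range(1 << 16)],
    dtype=np.int64,
)

def diff_weights(old_bytes, new_bytes):
    """Vectorized lookup: one table index per pair of bytes."""
    old = np.asarray(old_bytes, dtype=np.uint16)
    new = np.asarray(new_bytes, dtype=np.uint16)
    keys = (old << 8) | new
    return WEIGHT_TABLE[keys]
```

The table costs 65536 metric evaluations up front, after which weighting a whole page is a single indexing operation.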
kris	ede453d292	Remove vestige of standalone audio encoder.	2019-03-14 21:52:57 +00:00
kris	19ffa000f9	fix	2019-03-14 21:45:40 +00:00
kris	ea58c2f5b9	style	2019-03-14 21:45:28 +00:00
kris	cd17dce267	Normalize audio by tasting the first 10M of the audio stream and computing the 2.5%ile and 97.5%ile values, i.e. so that <2.5% of audio samples will clip.	2019-03-14 21:40:09 +00:00
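A sketch of percentile-based normalization along the lines this commit describes. The mapping of the two percentiles onto [-1, 1], the `prefix` parameter (counted in samples here), and the function name are assumptions; the log only states which percentiles are computed and that the start of the stream is sampled.

```python
import numpy as np

def normalize(samples, prefix=10_000_000, lo_pct=2.5, hi_pct=97.5):
    """Scale audio so the chosen percentiles of a leading prefix map to -1 and +1.

    Samples outside that range are clipped.
    """
    samples = np.asarray(samples, dtype=np.float64)
    lo, hi = np.percentile(samples[:prefix], [lo_pct, hi_pct])
    centre = (lo + hi) / 2.0
    half_range = max((hi - lo) / 2.0, 1e-12)  # guard against silent input
    return np.clip((samples - centre) / half_range, -1.0, 1.0)
```

Estimating the range from a prefix avoids a full pass over the audio before encoding can start, at the cost of a less accurate estimate if the opening is unrepresentative.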
kris	2d410a4b13	Take filename to serve from argv.	2019-03-14 21:38:12 +00:00
kris	9c68a6a369	Take input and output filenames from argv.	2019-03-14 21:37:43 +00:00
kris	718dc15cf2	Add some tests for edit_weight and byte_to_colour_string	2019-03-10 23:07:44 +00:00
kris	7db5c1c444	Read video frame rate and encode a new frame when the cycle count has ticked past the appropriate time. - optimize the frame encoding a bit - use int64 consistently to avoid casting. Fix a bug: when retiring an offset, also update our memory map with the new content, oops. If we run out of changes to index, keep emitting stores for content at page=32,offset=0 forever. Switch to a weighted D-L implementation so we can weight different substitutions differently (e.g. weighting diffs to/from black pixels differently than color errors).	2019-03-10 22:42:31 +00:00
kris	aed439c0b3	Optimize some more to fit in memory!	2019-03-10 21:57:05 +00:00
kris	6b969476a0	Try to even out tick timings during ACK/slow path	2019-03-10 21:04:20 +00:00
kris	4598709a7d	Add custom cc65 config that makes space for a LOWCODE segment from 0x800-0x2000. Place some of the tick opcodes there. This gives enough room for all but 2 of the op_tick_*_page_n opcodes! It may be possible to fit the remaining ones into unused RAM in the language card, but this will require some finesse to get the code in there. Or maybe I can optimize enough bytes... 0x300 is used by the loader.system, but there is also still 0x400..0x800 if I don't mind messing up the text page, and 0x200 if I can get away with using the keyboard buffer. Something is broken with RESET now though, maybe the reset vector is pointing somewhere orphaned.	2019-03-09 22:35:56 +00:00
kris	4310034993	Unmodified version of Apple2BuildPipeline	2019-03-09 22:31:56 +00:00
kris	3fa9d510b5	Construct opcodes and classes for all of the TICK_x_PAGE_y. Use these in the audio encoder to generate random video stores to validate.	2019-03-07 23:08:01 +00:00
kris	c00300147e	Integrated audio + video player! - Introduce a new Movie() class that multiplexes audio and video. - Every N audio frames we grab a new video frame and begin pulling opcodes from the audio and video streams - Grab frames from the input video using bmp2dhr if the .BIN file does not already exist. Run bmp2dhr in a background thread to not block encoding - move the output byte streaming from Video to Movie - For now, manually clip updates to pages > 56 since the client doesn't support them yet. The way we encode video is now: - iterate in descending order over update_priority - begin a new (page, content) opcode - for all of the other offset bytes in that page, compute the error between the candidate content byte and the target content byte - iterate over offsets in order of increasing error and decreasing update_priority to fill out the remaining opcode.	2019-03-07 23:07:24 +00:00
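The encoding steps listed in this commit could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `toy_error` is a placeholder bit-difference metric, the array shapes (one row of 256 offsets per page) and the padding rule for opcodes with too few candidates are guesses, and `next_opcode` is a hypothetical name. The four stores per opcode come from the "4 offset stores" figure elsewhere in this log.

```python
import numpy as np

STORES_PER_OPCODE = 4  # taken from the "4 offset stores" figure in this log

def toy_error(content, row):
    """Placeholder error: differing bits per byte (NOT the real metric)."""
    diff = np.uint8(content) ^ row
    return np.unpackbits(diff[:, None], axis=1).sum(axis=1)

def next_opcode(update_priority, target, error_fn=toy_error):
    """Pick one (page, content) opcode and the offsets it should store to.

    update_priority: int64 array, shape (pages, 256); 0 means no pending diff.
    target:          uint8 array, same shape, the desired screen contents.
    """
    # Start from the highest-priority byte; it fixes the page and content.
    page, first = np.unravel_index(np.argmax(update_priority),
                                   update_priority.shape)
    content = target[page, first]
    # Error of writing this content byte at every offset in the page.
    errors = error_fn(content, target[page])
    # Sort by increasing error, breaking ties by decreasing priority
    # (np.lexsort treats the LAST key as the primary one).
    order = np.lexsort((-update_priority[page], errors))
    offsets = [int(first)]
    for off in order:
        if len(offsets) == STORES_PER_OPCODE:
            break
        if off != first and update_priority[page, off] > 0:
            offsets.append(int(off))
    # Assumption: pad with repeats of the first offset if too few candidates.
    while len(offsets) < STORES_PER_OPCODE:
        offsets.append(int(first))
    return int(page), int(content), offsets
```

A caller would then zero the priority of the retired offsets and repeat until the frame's cycle budget runs out.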
kris	f133bb0008	Can fit in 7 more pages	2019-03-07 16:04:01 +00:00
kris	99c7f6db34	Construct opcodes and classes for all of the TICK_x_PAGE_y. Use these in the audio encoder to generate random video stores to validate.	2019-03-07 15:56:04 +00:00
kris	318a64ad56	Move to 0x4000 for now. Generate opcodes for pages 32-49, which fit in memory without effort.	2019-03-07 15:53:35 +00:00
kris	5bd0352491	Parametrize opcodes as macros	2019-03-07 15:20:43 +00:00
kris	7832333b27	Build as SYSTEM file and load at 0x800. Need to work out how to avoid loading over screen page.	2019-03-05 23:41:06 +00:00
kris	df25fce067	Fix up all but two of the off-by-one tick counts. This uses the trick of temporarily violating the X=0 invariant (which is only required in the tick_6 opcode tail path to steal an extra cycle) to reorder a STA $2000,Y outside of the tick loop. The cost of this is that we don't have enough pad cycles left to JMP to the common opcode tail, but I think this still (barely) fits in main RAM.	2019-03-05 23:20:17 +00:00
kris	8c23824aa6	Optimize more. Should now fit in main memory?!	2019-03-05 22:44:39 +00:00
kris	3cd44cd891	Optimize down to about 42k required for full set of page opcodes - enough to fit in AUX RAM but still room to go, hopefully will be able to fit in MAIN? Fix some of the off-by-one cycle counts introduced when switching from STA tick (which is wrong since it accesses twice) to BIT tick. Hopefully can fix others by reordering?	2019-03-05 22:22:35 +00:00
kris	12d48b664a	Prototype audio-only player, which uses a different strategy for scheduling audio + video. Use a combined "fat" audio + video opcode that combines several features: - constant cycle count of 73 cycles/opcode (=14364 Hz) - page and content are controlled per opcode - each opcode does 4 offset stores (hence 57456 stores/sec) - tick speaker twice per opcode, with varying duty cycles 4 .. 70 in units of 2 cycles - thus 32 opcodes, or 5-bit audio @ 14364 Hz. The price for this is that we need per-page variants of the opcodes, and at 53 bytes/opcode they won't (quite) all fit even in AUX RAM. The good news is that with some further work it should be possible to reduce this footprint by having opcodes share implementation by JMPing into a common tail sequence. Also introduce some ticks in approximately correct places during the ACK slow path, as a proof of concept that this does mitigate the clicking. This works and gives reasonable quality audio!	2019-03-05 21:05:41 +00:00
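The figures in this commit can be cross-checked with a little arithmetic. The constants below are copied from the message, except `HGR_PAGES = 32`, which is an assumption (an 8K hi-res screen split into 256-byte pages); the implied clock rate is likewise inferred from the quoted numbers rather than stated in the log.

```python
CYCLES_PER_OPCODE = 73    # from the commit message
OPCODE_RATE_HZ = 14364    # from the commit message
STORES_PER_OPCODE = 4     # from the commit message
TICK_OPCODES = 32         # from the commit message
BYTES_PER_OPCODE = 53     # from the commit message
HGR_PAGES = 32            # ASSUMPTION: 8192-byte screen / 256 bytes per page

# 4 stores per opcode at 14364 opcodes/sec.
stores_per_second = OPCODE_RATE_HZ * STORES_PER_OPCODE

# 32 distinct duty cycles correspond to 5 bits of audio resolution.
audio_bits = TICK_OPCODES.bit_length() - 1

# Clock rate implied by 73 cycles/opcode at 14364 Hz (inferred, not stated).
implied_clock_hz = CYCLES_PER_OPCODE * OPCODE_RATE_HZ

# Code size if every tick opcode needs a separate variant for every page.
footprint_bytes = HGR_PAGES * TICK_OPCODES * BYTES_PER_OPCODE

print(stores_per_second, audio_bits, implied_clock_hz, footprint_bytes)
```

Under the 32-page assumption the full opcode set needs about 53 KB of code, which gives a sense of why the later commits in this log focus on shrinking the opcodes.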
kris	340a3005d8	Support old-style opcodes that use relative branch addressing, and new cycle-counted tick opcodes that use absolute addressing. For now switch to the experimental audio-only player.	2019-03-05 20:51:05 +00:00
kris	2f12407d3c	Extract audio channel from movie file and emit 5-bit audio opcodes at 14KHz.	2019-03-05 20:47:34 +00:00
kris	6e2c83c1e5	Introduce a more general notion of update priority used to increase weight of diffs that persist across multiple frames. For each frame, zero out update priority of bytes that no longer have a pending diff, and add the edit distance of the remaining diffs. Zero these out as opcodes are retired. Replace Hamming distance with Damerau-Levenshtein distance of the encoded pixel colours in the byte, e.g. 0x2A --> GGG0 (taking into account the half-pixel). This has a couple of benefits over Hamming distance of the bit patterns: - transposed pixels are weighted less (edit distance 1, not 2+ for Hamming) - coloured pixels are weighted equally as white pixels (not half as much) - weighting changes in palette bit that flip multiple pixel colours. While I'm here, the RLE opcode should emit run_length - 1 so that we can encode runs of 256 bytes.	2019-03-04 23:09:00 +00:00
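To illustrate why an edit distance with transpositions treats shifted pixels more gently than a position-by-position comparison, here is a sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance over strings of colour letters. The pixel strings are made-up examples, the unit costs are an assumption, and the Hamming helper here compares colour letters rather than raw bit patterns.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance with unit costs.

    Allows insertions, deletions, substitutions, and transpositions of
    adjacent symbols.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# A green pixel shifted by one position: one transposition, two mismatches.
print(osa_distance("0G00", "00G0"), hamming_distance("0G00", "00G0"))
```

A weighted variant would replace the unit costs with per-symbol costs, for example charging more for a colour change than for a change to or from black.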
kris	d3522c817f	Randomize tie-breaker when pages etc have the same weight, so we don't consistently prefer larger numbers. There still seems to be a bug somewhere causing some screen regions to be consistently not updated, but perhaps I'll find it when working on the logic to penalize persistent diffs.	2019-03-03 23:25:10 +00:00
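A small sketch of a randomized tie-breaker of the kind this commit describes: sort by weight and use a random number as the secondary key, so that equally weighted pages are not always visited in the same order. `rank_pages` and the weight dictionary are hypothetical.

```python
import random

def rank_pages(page_weights, rng):
    """Order pages by decreasing weight, breaking ties at random.

    page_weights: dict mapping page number to weight.
    rng:          a random.Random instance (seedable for reproducible tests).
    """
    return sorted(page_weights,
                  key=lambda page: (-page_weights[page], rng.random()))

# Pages 32 and 33 tie; either may come first, but 34 always comes last.
example = rank_pages({32: 5, 33: 5, 34: 1}, random.Random())
```

Without the random component, a plain sort on `(weight, page)` would always favour the same page among equals.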
kris	a6f32886cd	Refactor the various representations of screen memory (bitmap, (x,y) bytemap, (page,offset) memory map) - add a FlatMemoryMap that is a linear 8K array - add converter methods and default constructors that allow converting between them - use MemoryMap as the central representation used by the video encoder	2019-03-03 22:21:28 +00:00
kris	80402f25a5	- Allow HGR ROM entry point - Don't trap unexpected entrypoint when crossing between regions via RTS - Implement TICK handler - Improve status printing in CPU loop	2019-02-27 22:46:53 +00:00
kris	90f696b8e4	Bare-bones py65-based simulator for Apple //e with Uthernet (i.e. simulating the W5100). This will hopefully be useful for troubleshooting and testing player behaviour more precisely, e.g. - trapping read/write access to unexpected memory areas - asserting invariants on the processor state across loops - measuring cycle timing - tracing program execution. This already gets as far as negotiating the TCP connect. The major remaining piece seems to be the TCP buffer management on the W5100 side.	2019-02-27 22:26:35 +00:00
kris	2b3343f374	Encode audio file into cycle timings and emit tick opcodes. Amazingly, even the naive opcode implementation works! The main issue is that when we ACK, the speaker cone is allowed to tick fully. Maybe optimizing the ACK codepath to be fast enough will help with this?	2019-02-27 14:49:21 +00:00
kris	9d4edc6c4a	Compute median frame similarity. This turns out not to be a great metric though, because it doesn't penalize artifacts like colour fringing, or diffs that persist across many frames.	2019-02-27 14:10:39 +00:00
kris	4840efc41e	In HeuristicPageFirstScheduler, don't use a deterministic ordering of pages and content, since we may never get around to some of them across many frames. Instead weight by total xor weight for the page, (page, content) tuple and offset list. Add some other scheduler variants - prefer content first, then page. This turns out to introduce a lot of colour fringing since we may not ever get back to fix up the hanging bit.	2019-02-27 14:09:42 +00:00
kris	0ac905a7aa	Use decoder symbol table to populate start/end addresses for opcodes.	2019-02-27 12:10:56 +00:00
kris	c139e8bf1b	Write symbol table to .dbg file when assembling player. Add explicit end_{opcode} labels to mark (1 byte past) end of opcode. Rename op_done to op_terminate to match opcode name in encoder. Extract symbol table in encoder and use this to populate the opcode start/end addresses.	2019-02-27 12:10:14 +00:00
kris	86066fec61	Makefile	2019-02-24 00:04:13 +00:00
kris	9da18f0ecc	Initial working version of video player. The basic strategy is that we remove as much conditional evaluation as possible from the inner decode loop. e.g. rather than doing opcode dispatch by some kind of table lookup (etc), this is precomputed on the server side. The next opcode in the stream is encoded as a branch offset to that opcode's first instruction, and we modify the BRA instruction in place to dispatch there. TCP buffer management is also offloaded to the server side; we rely on the server to explicitly schedule an ACK opcode every 2048 bytes to drop us into a slow path where we move the W5100 read pointer, send the TCP ACK, and block until the read socket has enough data to continue with. This outer loop is overly conservative (e.g. since we're performing exactly known read sizes we can omit a lot of duplicate bookkeeping), i.e. there is a lot of room for optimizing this. Experimental (i.e. not working yet) support for audio delay loop; we should be able to leverage the way we do offset-based dispatch to implement variable-delay loops with some level of cycle resolution.	2019-02-24 00:03:36 +00:00
kris	1b54c9c864	Video() is now aware of target frame rate, and will continue to emit opcodes until the cycle budget for the frame is exhausted. Output stream is also now aware of TCP framing, and schedules an ACK opcode every 2048 output bytes to instruct the client to perform TCP ACK and buffer management. Fixes several serious bugs in RLE encoding, including: - we were emitting the RLE opcode with the next content byte after the run completed! - we were looking at the wrong field for the start offset! - handle the case where the entire page is a single run - stop trying to allow accumulating error when RLE -- this does not respect the Apple II colour encoding, i.e. may introduce colour fringing. - also because of this we're unlikely to actually be able to find many runs because odd and even columns are encoded differently. In a followup we should start encoding odd and even columns separately. Optimize after profiling -- encoder is now about 2x faster. Add tests.	2019-02-23 23:52:25 +00:00
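A rough sketch of server-side framing in the spirit of this commit: opcodes are appended to the output, and an ACK opcode is scheduled so that one lands at the end of every 2048-byte block. The byte values of `ACK` and `NOP`, the one-byte opcode sizes, and the padding strategy are all hypothetical; the real stream format is not shown in this log.

```python
FRAME_SIZE = 2048   # from the commit message
ACK = b"\xfe"       # hypothetical one-byte ACK opcode
NOP = b"\xff"       # hypothetical one-byte padding opcode

def frame_stream(opcodes):
    """Concatenate opcodes, ending every 2048-byte frame with an ACK.

    Each opcode is a bytes object shorter than FRAME_SIZE. If the next
    opcode would not leave room for the ACK, pad the frame with NOPs first.
    """
    out = bytearray()
    for op in opcodes:
        used = len(out) % FRAME_SIZE
        if used + len(op) + len(ACK) > FRAME_SIZE:
            out += NOP * (FRAME_SIZE - len(ACK) - used)
            out += ACK
        out += op
    return bytes(out)

# 2000 three-byte opcodes: 682 fit per frame, leaving one pad byte plus ACK.
stream = frame_stream([bytes([i % 200]) * 3 for i in range(2000)])
```

Because the server decides where the ACK falls, the client never has to count bytes itself; it simply executes whatever opcode arrives next.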
kris	cc6c92335d	Implement a much more efficient mechanism for mapping an array between (x, y) indexing and (page, offset) indexing. This uses numpy to construct a new array by indexing into the old one. In benchmarking this is something like 100x faster.	2019-02-23 23:44:29 +00:00
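A sketch of the numpy indexing technique this commit describes: build an index array once, then convert between (x, y) and (page, offset) layouts with a single indexing operation. The row formula is the standard Apple II hi-res interleave, which the log itself does not spell out; all names are hypothetical, and pages are counted relative to the start of the screen buffer here rather than from absolute page 32.

```python
import numpy as np

ROWS, COLS = 192, 40        # hi-res screen: 192 rows of 40 bytes
SCREEN_BYTES = 8192         # 32 pages of 256 bytes

def row_offset(y):
    """Byte offset of screen row y within the 8K hi-res buffer."""
    return (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28

# FLAT_INDEX[y, x] is the position of screen byte (x, y) in linear memory.
Y, X = np.mgrid[0:ROWS, 0:COLS]
FLAT_INDEX = row_offset(Y) + X

def xy_to_memory(bytemap):
    """(192, 40) bytemap -> (32, 256) array indexed by (page, offset)."""
    flat = np.zeros(SCREEN_BYTES, dtype=np.uint8)
    flat[FLAT_INDEX] = bytemap      # scatter every byte in one operation
    return flat.reshape(32, 256)

def memory_to_xy(memory):
    """(32, 256) memory map -> (192, 40) bytemap, via one gather."""
    return memory.reshape(-1)[FLAT_INDEX]
```

The unused "screen hole" bytes in the 8K buffer are simply left at zero, since no (x, y) position maps to them.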
kris	4178c191db	Update cycle timing from working ethernet player. Add _START and _END addresses that are used by the byte stream to vector the program counter to the next opcode in the stream. Support equality testing of opcodes and add tests. Add an ACK opcode for instructing the client to ACK the TCP stream. Tick opcode now accepts a cycle argument, for experimenting with audio support.	2019-02-23 23:38:14 +00:00
kris	e0ab30d074	Fix deprecation warning on newer numpy. Similarity metric should be a float.	2019-02-23 23:33:18 +00:00
kris	e4174ed10b	Extract out input video decoding into separate module. Prototype a threaded version of the decoder but this doesn't seem to be necessary as it's not the bottleneck. Opcode stream is now aware of frame cycle budget and will keep emitting until budget runs out -- so no need for fullness estimate.	2019-02-23 23:32:07 +00:00


112 Commits