Prototype a threaded version of the decoder, although this doesn't seem
to be necessary since decoding is not the bottleneck.
The opcode stream is now aware of the frame cycle budget and keeps
emitting until the budget runs out, so a fullness estimate is no longer
needed.
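
A minimal sketch of budget-driven emission, assuming each opcode is
represented as a (name, cycles) pair. The function name and the cycle
costs are hypothetical, not taken from the encoder itself.

```python
from typing import Iterable, Iterator, Tuple


def emit_until_budget(
    opcodes: Iterable[Tuple[str, int]], cycle_budget: int
) -> Iterator[str]:
    """Yield opcode names until the next one would exceed the budget."""
    remaining = cycle_budget
    for name, cycles in opcodes:
        if cycles > remaining:
            # The frame's cycle budget is exhausted; stop emitting.
            break
        remaining -= cycles
        yield name
```

Because the loop stops on the budget itself, the caller never has to
guess in advance how full the frame will be.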
for runs of N >= 4.
Also fix a bug in the decoder that was apparently allowing opcodes to
fall through. Replace BVC with BRA (i.e. assume 65C02) until I can work
out what is going on.
solver to minimize the cycle cost to visit all changes in our estimated
list.
This is fortunately a tractable (though slow) computation, and it
improves on the previous heuristic by ~6% in throughput.
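
To illustrate what is being minimized, here is a brute-force sketch over
a tiny change list. It stands in for a real TSP solver and is only
feasible for a handful of changes. Each change is reduced to a
(page, content) pair, and the switch costs are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical cycle costs for switching page or content byte.
PAGE_SWITCH_CYCLES = 6
CONTENT_SWITCH_CYCLES = 2


def transition_cost(a, b):
    """Cycle cost of moving from change a to change b."""
    cost = 0
    if a[0] != b[0]:
        cost += PAGE_SWITCH_CYCLES
    if a[1] != b[1]:
        cost += CONTENT_SWITCH_CYCLES
    return cost


def path_cost(order):
    """Total switching cost of visiting the changes in this order."""
    return sum(transition_cost(a, b) for a, b in zip(order, order[1:]))


def best_order(changes):
    """Exhaustively find the cheapest visiting order (open path)."""
    return min(permutations(changes), key=path_cost)
```

The search is factorial in the number of changes, which is why a
dedicated solver (or a fast heuristic) is needed at realistic sizes.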
The TSP opcode schedule tends to group by page and vary over content, so
implement a fast heuristic that does the same. The heuristic scheduler
is within 2% of the TSP solution.
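
A sketch of a group-by-page heuristic, assuming changes are given as
(page, offset, content) tuples. The opcode names SET_PAGE, SET_CONTENT
and STORE are hypothetical labels, not the decoder's real opcodes.

```python
from itertools import groupby


def schedule_by_page(changes):
    """Group changes by page, varying the content byte within each page."""
    ops = []
    current_content = None
    # Sort by page first, then content, so equal values become adjacent.
    ordered = sorted(changes, key=lambda c: (c[0], c[2], c[1]))
    for page, page_changes in groupby(ordered, key=lambda c: c[0]):
        ops.append(("SET_PAGE", page))
        for _, offset, content in page_changes:
            if content != current_content:
                ops.append(("SET_CONTENT", content))
                current_content = content
            ops.append(("STORE", offset))
    return ops
```

Sorting costs O(n log n), so this runs far faster than an exact solver
while still avoiding redundant page and content switches.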
bonus we now maintain much better tracking of our target frame rate.
Maintain a running estimate of the opcode scheduling overhead, i.e.
how many opcodes we end up scheduling for each content byte written.
Use this to select an estimated number of screen changes to fill the
cycle budget, ordered by Hamming weight of the delta. Group these
by content byte and then page as before.
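
One way the running estimate could look, as a sketch. The exponential
moving average, the class and method names, and the uniform
cycles-per-opcode figure are all assumptions made for illustration.

```python
import heapq


class OverheadEstimator:
    """Running estimate of opcodes scheduled per content byte written."""

    def __init__(self, initial_ratio=2.0, smoothing=0.1):
        self.ratio = initial_ratio
        self.smoothing = smoothing

    def update(self, opcodes_scheduled, bytes_written):
        """Fold one frame's observed ratio into the running estimate."""
        if bytes_written == 0:
            return
        observed = opcodes_scheduled / bytes_written
        self.ratio += self.smoothing * (observed - self.ratio)

    def changes_for_budget(self, cycle_budget, cycles_per_opcode):
        """Estimate how many screen changes fit in the cycle budget."""
        return int(cycle_budget / (self.ratio * cycles_per_opcode))


def heaviest_changes(weights, count):
    """Pick the `count` changes with the largest Hamming weight.

    `weights` maps (page, offset) to the Hamming weight of the delta.
    """
    return heapq.nlargest(count, weights, key=weights.get)
```

The selected changes would then be grouped by content byte and page
before being turned into opcodes.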
weight of the xor of old and new frames, and switch to setting the
new byte directly instead of xor'ing, to improve efficiency of decoder.
Instead of iterating in a fixed order by target byte then page, at
each step compute the next change to make that would maximize
pixels changed per cycle, including the cost of switching page and/or
content byte.
This is unfortunately much slower to encode at present, but can
hopefully be optimized sufficiently.
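
A sketch of the greedy step, reading the criterion as "most pixels
changed per cycle spent". Candidates are (page, offset, content, pixels)
tuples and the cycle costs are hypothetical placeholders.

```python
# Hypothetical cycle costs; the real decoder's timings may differ.
STORE_CYCLES = 4
PAGE_SWITCH_CYCLES = 6
CONTENT_SWITCH_CYCLES = 2


def change_cost(change, page, content):
    """Cycles to apply a change given the current page and content byte."""
    cost = STORE_CYCLES
    if change[0] != page:
        cost += PAGE_SWITCH_CYCLES
    if change[2] != content:
        cost += CONTENT_SWITCH_CYCLES
    return cost


def greedy_schedule(candidates, cycle_budget):
    """Repeatedly pick the change with the best pixels-per-cycle ratio."""
    pending = list(candidates)
    page = content = None
    schedule = []
    while pending:
        best = max(pending, key=lambda c: c[3] / change_cost(c, page, content))
        cost = change_cost(best, page, content)
        if cost > cycle_budget:
            # Stop once the best-scoring candidate no longer fits.
            break
        cycle_budget -= cost
        pending.remove(best)
        schedule.append(best)
        page, content = best[0], best[2]
    return schedule
```

Every step rescans all pending candidates, so the loop is quadratic in
the number of changes, which is consistent with the slow encode times
noted above.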
bytestream by prioritizing bytes to be XOR'ed that have the highest
Hamming weight, i.e. will result in the largest number of pixel
transitions on the screen.
Not especially optimized yet (either runtime or byte stream).
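
A sketch of the prioritization, assuming the old and new frames are
equal-length byte strings. The function names are hypothetical.

```python
def hamming_weight(value: int) -> int:
    """Number of set bits, i.e. pixel transitions caused by this XOR."""
    return bin(value).count("1")


def prioritized_xor_deltas(old: bytes, new: bytes):
    """Return (offset, xor_byte) pairs, heaviest deltas first."""
    deltas = [
        (offset, o ^ n)
        for offset, (o, n) in enumerate(zip(old, new))
        if o != n
    ]
    # sorted() is stable, so equal-weight deltas keep their offset order.
    return sorted(deltas, key=lambda d: hamming_weight(d[1]), reverse=True)
```

For example, with old = 00 FF 10 55 and new = 01 00 10 50 (hex), the
delta at offset 1 (XOR byte FF, weight 8) is scheduled first.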