The hope was that this would reduce the amount of copying and bit
shifting required by the frontend to get the data on screen, but
it doesn't seem to offer much advantage, surprisingly. I'll leave
it in though. There are a few other minor tweaks included here to
try to improve the performance a bit
It turns out to not be too much of a performance issue to allocate
a new frame each time one is produced, so to reduce lock contention
I added a queue where frames are added to and taken from without
locking the frame for the whole update. I'm hoping this will give
more flexibility to frontend implementations, which can simply
skip or repeat frames if needed.
I wanted to make this a bit more modular, so it's easier in theory to
write external crates that can reuse bits, and selectively compile in
bits, such as adding new systems or new cpu implementations