Add explicit end_{opcode} labels marking the address 1 byte past the end
of each opcode.
Rename op_done to op_terminate to match opcode name in encoder.
Extract the symbol table in the encoder and use it to populate the
opcode start/end addresses.
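A minimal sketch of the symbol-table step. It assumes a hypothetical
listing format of one "label = $ADDR" entry per line and an op_NAME /
end_NAME naming scheme; the real assembler output and label names may
differ, and parse_symbols / opcode_ranges are illustrative names only.

```python
import re

# Hypothetical symbol-table format: one "label = $ADDR" entry per line.
SYMBOL_RE = re.compile(r"^\s*(\w+)\s*=\s*\$([0-9A-Fa-f]+)\s*$")


def parse_symbols(text):
    """Return {label: address} for every line matching SYMBOL_RE."""
    symbols = {}
    for line in text.splitlines():
        m = SYMBOL_RE.match(line)
        if m:
            symbols[m.group(1)] = int(m.group(2), 16)
    return symbols


def opcode_ranges(symbols):
    """Pair each op_NAME label with its end_NAME label (assumed naming).

    Returns {NAME: (start, end)}, where end is 1 byte past the opcode.
    """
    ranges = {}
    for label, start in symbols.items():
        if label.startswith("op_"):
            name = label[len("op_"):]
            end = symbols.get("end_" + name)
            if end is not None:
                ranges[name] = (start, end)
    return ranges
```

Because each end label is 1 byte past the opcode, end - start gives the
opcode's length in bytes directly.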
The basic strategy is that we remove as much conditional evaluation as
possible from the inner decode loop.
For example, rather than dispatching opcodes through some kind of table
lookup in the player, dispatch is precomputed on the server side: the
next opcode in the stream is encoded as a branch offset to that opcode's
first instruction, and we modify the BRA instruction in place to
dispatch there.
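A sketch of how the server side might compute that offset, assuming the
byte placed in the stream is exactly the BRA operand. BRA is a 2-byte
instruction whose signed 8-bit operand is relative to the address of the
following instruction; bra_operand is a hypothetical helper name.

```python
def bra_operand(bra_addr, target_addr):
    """Return the 1-byte operand that makes a BRA at bra_addr reach target_addr.

    The offset is measured from the instruction after the 2-byte BRA and
    must fit in a signed 8-bit value.
    """
    offset = target_addr - (bra_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError("target out of BRA range: offset %d" % offset)
    # Encode as an unsigned byte (two's complement for negative offsets).
    return offset & 0xFF
```

The range check matters here: every opcode entry point has to sit within
-128..+127 bytes of the instruction following the BRA for this scheme to
work with a single operand byte.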
TCP buffer management is also offloaded to the server side; we rely on
the server to explicitly schedule an ACK opcode every 2048 bytes to
drop us into a slow path where we move the W5100 read pointer, send
the TCP ACK, and block until the read socket has enough data to
continue.
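One way the server-side scheduling could look, as a sketch only. It
assumes each 2048-byte frame must end with the ACK opcode and that a
1-byte filler opcode exists to pad a frame when the next opcode does not
fit; that filler, and the schedule_acks name, are hypothetical.

```python
FRAME_SIZE = 2048  # bytes between ACK opcodes, per the description above


def schedule_acks(opcodes, ack, pad):
    """Pack encoded opcodes into FRAME_SIZE-byte frames, each ending in ack.

    opcodes is an iterable of bytes objects, ack is the encoded ACK opcode,
    and pad is a hypothetical 1-byte filler opcode used when the next
    opcode would not fit before the ACK.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    budget = FRAME_SIZE - len(ack)  # room for non-ACK opcodes per frame
    stream = bytearray()
    used = 0
    for op in opcodes:
        if len(op) > budget:
            raise ValueError("opcode larger than a frame")
        if used + len(op) > budget:
            # Fill the rest of the frame, then close it with the ACK.
            stream += pad * (budget - used) + ack
            used = 0
        stream += op
        used += len(op)
    # Close the final, possibly partial, frame.
    stream += pad * (budget - used) + ack
    return bytes(stream)
```

With fixed-size frames the player never has to count bytes itself; it
only reaches the slow path when the stream tells it to.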
This outer loop is overly conservative (for example, since we perform
reads of exactly known size, we can omit a lot of duplicate
bookkeeping), so there is a lot of room for optimizing it.
Add experimental (i.e. not yet working) support for an audio delay loop;
we should be able to leverage the way we do offset-based dispatch to
implement variable-delay loops with some level of cycle resolution.
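To illustrate the idea (not the implementation in this commit, which is
not working yet): if the delay code were a run of NOPs, branching into
it at different offsets would give delays in 2-cycle steps, since a 6502
NOP is 1 byte and takes 2 cycles. The NOP-sled layout and the
sled_entry_offset name are assumptions made for this sketch.

```python
NOP_CYCLES = 2  # a 6502 NOP occupies 1 byte and takes 2 cycles


def sled_entry_offset(sled_len, delay_cycles):
    """Return the byte offset into a sled of sled_len NOPs that yields
    delay_cycles of delay before execution falls out the end of the sled."""
    if delay_cycles % NOP_CYCLES:
        raise ValueError("delay must be a multiple of %d cycles" % NOP_CYCLES)
    nops = delay_cycles // NOP_CYCLES
    if not 0 <= nops <= sled_len:
        raise ValueError("delay out of range for this sled")
    # Entering later in the sled executes fewer NOPs.
    return sled_len - nops
```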