This is a rewrite of LZSA1JMP.ASM to use a 256-element jumptable, which
allows the code to handle all of the hot paths (common cases) without
any branching. This not only reduces branches (which are very costly on
x86) to a bare minimum, but also tells each decode path in advance
which steps it can skip.
The new code is 12.7% faster than the old code, and assembles to less
than 3K of object code and data.
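
As a rough illustration of the dispatch idea only (not the assembly in this
commit), the C sketch below indexes a 256-entry table of function pointers by
the token byte. The token layout (literal count in bits 6-4, with 7 meaning
"extended") and the one-byte extended length are simplifying assumptions, and
every name here is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Each handler is one specialized decode path. Because the table picks the
   path from the token byte, no handler re-tests the literal field. */
typedef size_t (*literal_path)(uint8_t token, const uint8_t **src);

/* Literal field is 0: the literal copy is skipped entirely. */
static size_t path_no_literals(uint8_t token, const uint8_t **src) {
    (void)token;
    (void)src;
    return 0;
}

/* Literal field is 1..6: the count comes straight from the token. */
static size_t path_short_literals(uint8_t token, const uint8_t **src) {
    (void)src;
    return (size_t)((token >> 4) & 7);
}

/* Literal field is 7: ASSUMED simplified encoding, one extra byte added
   to a base of 7. The real format has further cases. */
static size_t path_extended_literals(uint8_t token, const uint8_t **src) {
    (void)token;
    return 7u + *(*src)++;
}

static literal_path jump_table[256];

/* Fill all 256 slots once, so decoding needs a single indexed call. */
static void build_jump_table(void) {
    for (int t = 0; t < 256; t++) {
        int lit = (t >> 4) & 7;
        jump_table[t] = (lit == 0) ? path_no_literals
                      : (lit == 7) ? path_extended_literals
                                   : path_short_literals;
    }
}

/* Read one token and return how many literal bytes follow it. */
static size_t literal_count(const uint8_t **src) {
    uint8_t token = *(*src)++;
    return jump_table[token](token, src);
}
```

A decoder built this way would call build_jump_table() once and then
literal_count() per token; a real table would presumably also specialize on
the offset-size and match-length fields so those tests disappear as well.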
This commit provides a time-efficient LZSA2 decompressor for
8088 (and later) CPUs. Decompression speed is roughly 50% faster
than ZX7 on the same hardware.
Because the 8086's BIU is more efficient and can read a word in
a single operation, we can make LZSA1 decompression faster on the 8086
by using a jump table to eliminate two comparisons.