37 Commits

Author SHA1 Message Date
Emmanuel Marty
668204d953
Merge optimizations by Pavel Zagrebin
Manually merge PR #44
2020-04-04 13:29:25 +02:00
mobygamer
30192238ea Rewrite 8088 jumptable decompressor for maximum speed
This is a rewrite of LZSA1JMP.ASM to use a 256-element jumptable, which
allows the code to handle all of the hot paths (common cases) without
any branching.  This not only reduces branches (which are very costly on
x86) to a bare minimum, but also grants us foreknowledge in a decode
path of what steps can be skipped.

The new code is 12.7% faster than the old code, and assembles to less
than 3K of object code and data.
2019-10-26 23:34:24 -05:00
Emmanuel Marty
710d7e05d6
Fix comments 2019-07-14 10:13:16 +02:00
Emmanuel Marty
0b540431fc
Fix comments 2019-07-14 10:12:32 +02:00
Emmanuel Marty
981b1d5925
NASM versions of Jim Leonard's speed-optimized depackers 2019-07-14 10:11:16 +02:00
Emmanuel Marty
19e8bc0468
Merge pull request #17 from MobyGamer/decompressor/8086_speed
Time-efficient LZSA2 decompressor
2019-07-14 09:28:40 +02:00
mobygamer
c38b582e73 Time-efficient LZSA2 decompressor
This commit provides a time-efficient LZSA2 decompressor for
the 8088 (and higher) CPU.  Decompression speed is roughly 50% faster
than ZX7 on the same hardware.
2019-07-13 22:35:28 -05:00
Emmanuel Marty
6445d9ff6f
Merge pull request #16 from peterferrie/master - thanks!
smaller
2019-07-12 10:34:58 +02:00
Peter Ferrie
6797fe6268 smaller 2019-07-11 17:20:38 -07:00
Emmanuel Marty
7195104c73
Merge pull request #15 from MobyGamer/decompressor/8086_speed
Submit 8086-optimized decompressor using jump tables
2019-07-12 00:59:41 +02:00
mobygamer
f123a6d9df Submit 8086-optimized decompressor using jump tables
Because the 8086's BIU is more efficient and can read a word in
a single operation, we can skew LZSA1 decompression faster on 8086
by using a jump table to eliminate 2 comparisons.
2019-07-11 17:22:29 -05:00
Emmanuel Marty
4d7d90b893
Merge pull request #14 from MobyGamer/decompressor/8086_speed
Additional minor speedups
2019-07-11 09:01:14 +02:00
mobygamer
638c33b432 Additional minor speedups 2019-07-11 00:58:46 -05:00
Peter Ferrie
68339ad961 smaller 2019-07-09 18:44:37 -07:00
Peter Ferrie
15c9059adf smaller 2019-07-09 18:30:29 -07:00
Peter Ferrie
e85cb8a5bb smaller 2019-07-09 16:49:01 -07:00
Emmanuel Marty
467cd1970f
Merge pull request #9 from MobyGamer/decompressor/8086_speed
Additional 1% speedup from deferring work
2019-07-09 20:42:17 +02:00
mobygamer
9a180a59f9 Additional 1% speedup due to introspec suggestions 2019-07-09 13:05:10 -05:00
Emmanuel Marty
b7b3ca1907
Merge pull request #8 from MobyGamer/decompressor/8086_speed - thank you!
8088-speed-optimized LZSA1 decompression
2019-07-08 09:17:51 +02:00
mobygamer
b0c9d61aab 8088-speed-optimized LZSA1 decompression
This commit contributes LZSA1 raw block format decompression code,
optimized for speed for the 8088 and higher processors.  With
appropriate compression options (-f1 -m5), decompression speed is
comparable to LZ4.
2019-07-07 20:25:50 -05:00
Peter Ferrie
111403c052 a little more 2019-07-07 12:05:02 -07:00
Emmanuel Marty
fe0dcf107f
Use db rather than .byte 2019-07-04 18:09:29 +02:00
Peter Ferrie
32b98f3085 a bit faster, a bit smaller 2019-07-03 19:38:27 -07:00
Emmanuel Marty
8d8f50a509
Don't use 80186 opcodes. Issue #3 2019-06-27 18:32:49 +02:00
Emmanuel Marty
79ed7bf91e
Further update LZSA2 format; avoid name conflicts 2019-06-08 13:35:03 +02:00
Emmanuel Marty
272f2e7a29
Update LZSA2 6502 and 8088 depackers 2019-06-07 23:22:34 +02:00
emmanuel-marty
d70830b525 Simplify nibble handling in LZSA2 8088 depacker 2019-05-11 11:42:00 +02:00
emmanuel-marty
8b7b4a2b4f Check in LZSA2 implementation (ratio competitive with ZX7, faster decompression) 2019-05-09 16:51:29 +02:00
emmanuel-marty
2b9780bd65 Finalize lzsa1 compressed format, speed up and simplify decompression 2019-04-24 09:47:40 +02:00
emmanuel-marty
02592cfe3b Fix typo in 8088 decompressor comments 2019-04-10 17:30:13 +02:00
emmanuel-marty
e24320b23b Save 1 byte in 8088 decompressor 2019-04-06 00:21:15 +02:00
emmanuel-marty
a785010448 Revert token to O|LLL|MMMM; revert to always shifting the match offset by 1; set raw block end marker as a large zero-size match 2019-04-05 23:16:05 +02:00
emmanuel-marty
1ef1ad8111 Reorganize token byte for faster decoding on 8-bit CPUs, without affecting the compression ratio 2019-04-05 11:58:44 +02:00
emmanuel-marty
c7692cf688 Store 16-bit lengths and match offsets directly, to simplify decompression on 8-bit CPUs without affecting the compression ratio 2019-04-05 10:42:06 +02:00
emmanuel-marty
0744ec99de Unpack raw blocks in 8088 decompressor 2019-04-03 13:05:32 +02:00
emmanuel-marty
06396f5ba6 Save 2 bytes in 8088 decompressor 2019-04-02 13:21:45 +02:00
marty-emmanuel
e216b0c544 Initial checkin 2019-04-01 18:04:56 +02:00
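
Note on the jump-table commits (30192238ea, f123a6d9df) and the O|LLL|MMMM token layout (a785010448): the idea of dispatching on the whole token byte through a 256-entry table can be sketched as follows. This is only an illustration in Python, not code from the repository. The bit positions are an assumption read off the O|LLL|MMMM notation (O in bit 7, LLL in bits 4-6, MMMM in bits 0-3), and the names split_token, make_handler, JUMP_TABLE and dispatch are hypothetical.

```python
# Illustrative sketch only; not taken from the repository's decompressors.
# Assumed token layout, read off the "O|LLL|MMMM" notation:
#   O    (bit 7)    : match-offset size flag
#   LLL  (bits 4-6) : literal-run length field
#   MMMM (bits 0-3) : match-length field


def split_token(token):
    """Split one token byte into its (O, LLL, MMMM) fields."""
    return (token >> 7) & 0x1, (token >> 4) & 0x7, token & 0xF


def make_handler(o, lll, mmmm):
    """Hypothetical handler with the token fields baked in ahead of time.

    A real decoder would copy literals and matches here; this one only
    returns the fields it was specialized for.
    """
    def handler():
        return (o, lll, mmmm)
    return handler


# One entry per possible token byte, built once up front.
JUMP_TABLE = [make_handler(*split_token(t)) for t in range(256)]


def dispatch(token):
    """Select the handler with a single table lookup, no per-field tests."""
    return JUMP_TABLE[token]()
```

Because every entry already knows its own field values, the decode path needs no comparisons on the token bits at run time. This is presumably what the commit message means by gaining "foreknowledge in a decode path of what steps can be skipped".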