mirror of
https://github.com/emmanuel-marty/lzsa.git
synced 2024-11-15 20:09:39 +00:00
Clarify the encoding of matches, fix some broken formatting.
parent e216b0c544
commit fde853e095
24
README.md
@@ -66,12 +66,12 @@ Each frame contains a 3-byte length followed by block data that expands to up to

 LZSA blocks are composed from consecutive commands. Each command follows this format:

-* <token: O|LLL|MMMM>
-* <optional extra literal length>
-* <literal values>
-* <match offset low>
-* <optional match offset high>
-* <optional extra match length>
+* token: <O|LLL|MMMM>
+* optional extra literal length
+* literal values
+* match offset low
+* optional match offset high
+* optional extra encoded match length

 **token**
@@ -82,7 +82,7 @@ The token byte is broken down into three parts:

 * O: set for a 2-byte match offset, clear for a 1-byte match offset
 * L: 3-bit literals length (0-6, or 7 if extended). If the number of literals for this command is 0 to 6, the length is encoded in the token and no extra bytes are required. Otherwise, a value of 7 is encoded and extra bytes follow as 'optional extra literal length'.
-* M: 4-bit match length (0-14, or 15 if extended). Likewise, if the match length for this command is 0 to 14, it is directly encoded, otherwise 15 is stored and extra bytes follow as 'optional extra match length'.
+* M: 4-bit encoded match length (0-14, or 15 if extended). Likewise, if the encoded match length for this command is 0 to 14, it is stored directly; otherwise 15 is stored and extra bytes follow as 'optional extra encoded match length'. Except for the last command in a block, a command always contains a match, so the encoded match length is the actual match length minus the minimum match length, which is 3 bytes. For instance, an actual match length of 10 bytes is encoded as 7.
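The token layout above can be unpacked with a couple of shifts and masks. The Python sketch below is illustrative only: it assumes the `O|LLL|MMMM` notation lists bits from most to least significant (the text confirms only that O is bit 7), and `split_token` and `MIN_MATCH_LENGTH` are hypothetical names, not identifiers from the LZSA sources.

```python
MIN_MATCH_LENGTH = 3  # minimum match length stated in the text


def split_token(token):
    """Split a token byte into its (O, L, M) fields.

    Assumes the O|LLL|MMMM notation is written MSB-first:
    O = bit 7, L = bits 6-4, M = bits 3-0.
    """
    if not 0 <= token <= 0xFF:
        raise ValueError("token must be a single byte")
    o_bit = (token >> 7) & 0x01        # 1: 2-byte match offset, 0: 1-byte
    literals_length = (token >> 4) & 0x07  # 7 means extra length bytes follow
    encoded_match_length = token & 0x0F    # 15 means extra length bytes follow
    return o_bit, literals_length, encoded_match_length


o_bit, literals_length, encoded_match_length = split_token(0b1_010_0111)
actual_match_length = encoded_match_length + MIN_MATCH_LENGTH
```

With the token `0xA7`, the M field is 7, so the actual match length is 10 bytes, which matches the example in the M bullet. The addition only applies when the command actually contains a match, that is, for every command except the last one in a block.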

 **optional extra literal length**
@@ -96,6 +96,8 @@ If the literals length is 7 or more, the 'L' bits in the token form the value 7,

 Literal bytes, whose number is specified by the literals length, follow here. There can be zero literals in a command.

+Important note: the last command in a block ends here, as it always contains literals only.

 **match offset low**

 The low 8 bits of the match offset follow.
@@ -108,13 +110,13 @@ If the 'O' bit (bit 7) is set in the token, the high 8 bits of the match offset

 Note that the match offset is *off by 1*: a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2), and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
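As a rough illustration of the off-by-1 rule, the Python sketch below turns the stored offset bytes into a backwards distance. `match_distance` is a hypothetical helper written for this example; it takes the low byte, plus the high byte when the token's O bit is set.

```python
from typing import Optional


def match_distance(offset_low: int, offset_high: Optional[int] = None) -> int:
    """Return how many bytes before the current output index the match starts.

    offset_high is None when the token's O bit is clear (1-byte offset).
    """
    stored = offset_low
    if offset_high is not None:
        # High 8 bits of the match offset, present only when O is set.
        stored |= offset_high << 8
    # A stored value of 0 means the previous byte (N-1), so add 1.
    return stored + 1


output = bytearray(b"abcd")
distance = match_distance(0x01)        # stored 1 -> two bytes back (N-2)
source_index = len(output) - distance  # position of b"c" in the output so far
```

A single stored byte of 255 yields a distance of 256, which is why offsets up to 256 fit in the 1-byte form.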

-**optional extra match length**
+**optional extra encoded match length**

-If the match length is 15 or more, the 'M' bits in the token form the value 15, and an extra byte follows here, with three possible types of value.
+If the encoded match length is 15 or more, the 'M' bits in the token form the value 15, and an extra byte follows here, with three possible types of value.

 * 0-253: the value is added to the 15 stored in the token.
-* 254: a second byte follows. The final match length is 15 + 254 + the second byte.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final match length is 15 + 255 + that 16-bit value.
+* 254: a second byte follows. The final encoded match length is 15 + 254 + the second byte, which gives an actual match length of 3 + 15 + 254 + the second byte.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is 15 + 255 + that 16-bit value, which gives an actual match length of 3 + 15 + 255 + that 16-bit value.
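The three cases above can be sketched in Python as follows. This is a minimal illustration, not the LZSA decoder: `decode_match_length` is a hypothetical name, and the sketch assumes that a first extra byte of 254 or 255 always selects the multi-byte form, with every smaller value added directly.

```python
MIN_MATCH_LENGTH = 3  # minimum match length stated in the text


def decode_match_length(m_bits, extra):
    """Return (actual_match_length, extra_bytes_consumed).

    m_bits is the 4-bit M field of the token; extra holds the bytes that
    follow the match offset in the command.
    """
    if m_bits < 15:
        # Short form: the encoded length is fully stored in the token.
        return MIN_MATCH_LENGTH + m_bits, 0
    first = extra[0]
    if first == 255:
        # Second and third byte form a little-endian 16-bit value.
        value16 = extra[1] | (extra[2] << 8)
        return MIN_MATCH_LENGTH + 15 + 255 + value16, 3
    if first == 254:
        # A single second byte follows.
        return MIN_MATCH_LENGTH + 15 + 254 + extra[1], 2
    # Any other value is added to the 15 stored in the token.
    return MIN_MATCH_LENGTH + 15 + first, 1


length, used = decode_match_length(15, bytes([254, 10]))
```

Returning the number of consumed bytes lets a caller advance its read position without repeating the case analysis.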

 # Footer format