mirror of
https://github.com/emmanuel-marty/lzsa.git
synced 2025-01-26 17:32:00 +00:00
Update format description
parent
c7692cf688
commit
33b62c004a
10
README.md
@@ -99,7 +99,7 @@ If the literals length is 7 or more, the 'L' bits in the token form the value 7,
* 0-253: the value is added to the 7 stored in the token, to compose the final literals length. For instance a length of 206 will be stored as 7 in the token + a single byte with the value of 199, as 7 + 199 = 206.
* 254: a second byte follows. The final literals value is 7 + 254 + the second byte. For instance, a literals length of 499 is encoded as 7 in the token, a byte with the value of 254, and a final byte with the value of 238, as 7 + 254 + 238 = 499.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is 7 + 255 + that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 250 and 2, as 7 + 255 + 250 + (2 * 256) = 1024.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 0 and 4, as (4 * 256) = 1024.
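The updated literals-length rules can be sketched as a small decoder. This is only an illustration of the bullets above, not the project's code: the function name `read_literals_length` and its parameters are hypothetical, and the caller is assumed to have already extracted the 'L' bits from the token.

```python
from typing import Tuple


def read_literals_length(l_bits: int, data: bytes, pos: int) -> Tuple[int, int]:
    """Return (literals_length, new_pos). Hypothetical helper, for illustration."""
    if l_bits < 7:
        # Short lengths are assumed to live entirely in the token.
        return l_bits, pos
    first = data[pos]
    pos += 1
    if first <= 253:
        # 0-253: added to the 7 stored in the token.
        return 7 + first, pos
    if first == 254:
        # 254: one more byte follows, added to 7 + 254.
        return 7 + 254 + data[pos], pos + 1
    # 255: a little-endian 16-bit value follows and is used as-is.
    return int.from_bytes(data[pos:pos + 2], "little"), pos + 2


# The three worked examples from the text:
print(read_literals_length(7, bytes([199]), 0)[0])        # 206
print(read_literals_length(7, bytes([254, 238]), 0)[0])   # 499
print(read_literals_length(7, bytes([255, 0, 4]), 0)[0])  # 1024
```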
**literal values**
@@ -115,9 +115,11 @@ The low 8 bits of the match offset follow.
If the 'O' bit (bit 7) is set in the token, the high 8 bits of the match offset follow, otherwise they are understood to be all set to 0.
-**important note regarding match offsets: off by 1**
+**important note regarding short match offsets: off by 1**
-Note that the match offset is *off by 1*: a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
+Note that the match offset is *off by 1* when encoded as a single byte (the O bit in the token is set to 0): a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
+
+When match offsets are encoded as two bytes (the O bit in the token is set to 1), they are stored directly: a value of 1 refers to the byte preceding the current output index (N-1), and so on.
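To make the difference between short and long offsets concrete, here is a minimal sketch of how a decoder might turn the stored bytes into a backward distance, as described in the updated text. The helper name `read_match_distance` is hypothetical. The 'O' bit is taken to be bit 7 of the token, and the low byte is assumed to come first in the stream.

```python
from typing import Tuple


def read_match_distance(token: int, data: bytes, pos: int) -> Tuple[int, int]:
    """Return (distance_back_from_N, new_pos). Hypothetical helper, for illustration."""
    low = data[pos]
    pos += 1
    if token & 0x80:
        # O bit set: the high 8 bits follow and the 16-bit value is used
        # directly, so a stored value of 1 means the byte at N-1.
        high = data[pos]
        return (high << 8) | low, pos + 1
    # O bit clear: single byte, off by 1, so a stored 0 means the byte at N-1.
    return low + 1, pos
```

With this reading, a single stored byte of 255 reaches 256 bytes back, which is the "offsets up to 256" case from the note, while the two-byte form needs no adjustment.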
**optional extra encoded match length**
@@ -125,7 +127,7 @@ If the encoded match length is 15 or more, the 'M' bits in the token form the va
* 0-253: the value is added to the 15 stored in the token.
* 254: a second byte follows. The final encoded match length is 15 + 254 + the second byte, which gives an actual match length of 3 + 15 + 254 + the second byte.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is 15 + 255 + that 16-bit value, which gives an actual match length of 3 + 15 + 255 + that 16-bit value.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is that 16-bit value.
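The match-length extension follows the same pattern as the literals length, with 15 as the base. The sketch below returns only the *encoded* match length. The updated 255 bullet does not say how the 16-bit value maps to the actual match length, so this sketch does not convert it. `read_encoded_match_length` and its parameters are hypothetical names, and the 'M' bits are assumed to be already extracted from the token.

```python
from typing import Tuple


def read_encoded_match_length(m_bits: int, data: bytes, pos: int) -> Tuple[int, int]:
    """Return (encoded_match_length, new_pos). Hypothetical helper, for illustration."""
    if m_bits < 15:
        # Assumed: values below 15 are complete as stored in the token.
        return m_bits, pos
    first = data[pos]
    pos += 1
    if first <= 253:
        # 0-253: added to the 15 stored in the token.
        return 15 + first, pos
    if first == 254:
        # 254: one more byte follows, added to 15 + 254.
        return 15 + 254 + data[pos], pos + 1
    # 255: the little-endian 16-bit value is the encoded match length itself.
    return int.from_bytes(data[pos:pos + 2], "little"), pos + 2
```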
# Footer format