Update format description

This commit is contained in:
Emmanuel Marty 2019-04-05 10:46:24 +02:00 committed by GitHub
parent c7692cf688
commit 33b62c004a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -99,7 +99,7 @@ If the literals length is 7 or more, the 'L' bits in the token form the value 7,
* 0-253: the value is added to the 7 stored in the token, to compose the final literals length. For instance a length of 206 will be stored as 7 in the token + a single byte with the value of 199, as 7 + 199 = 206.
* 254: a second byte follows. The final literals value is 7 + 254 + the second byte. For instance, a literals length of 499 is encoded as 7 in the token, a byte with the value of 254, and a final byte with teh value of 238, as 7 + 254 + 238 = 499.
* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is 7 + 255 + that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 250 and 2, as 7 + 255 + 250 + (2 * 256) = 1024.
* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 0 and 4, as (4 * 256) = 1024.
**literal values**
@ -115,9 +115,11 @@ The low 8 bits of the match offset follows.
If the 'O' bit (bit 7) is set in the token, the high 8 bits of the match offset follow, otherwise they are understood to be all set to 0.
**important note regarding match offsets: off by 1**
**important note regarding short match offsets: off by 1**
Note that the match offset is *off by 1*: a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
Note that the match offset is *off by 1* when encoded as a single byte (the O bit in the token is set to 0): a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
When match offsets are encoded as two bytes (the O bit in the token is set to 1), they are stored directly: a value of 1 refers to the byte preceding the current output index (N-1), and so on.
**optional extra encoded match length**
@ -125,7 +127,7 @@ If the encoded match length is 15 or more, the 'M' bits in the token form the va
* 0-253: the value is added to the 15 stored in the token.
* 254: a second byte follows. The final encoded match length is 15 + 254 + the second byte, which gives an actual match length of 3 + 15 + 254 + the second byte.
* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is 15 + 255 + that 16-bit value, which gives an actual match length of 3 + 15 + 255 + that 16-bit value.
* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is that 16-bit value.
# Footer format