Update format description

2025-04-18 22:38:22 +00:00 · 2019-04-05 10:46:24 +02:00 · 33b62c004a
commit 33b62c004a
parent c7692cf688
1 changed file with 6 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ If the literals length is 7 or more, the 'L' bits in the token form the value 7,

 * 0-253: the value is added to the 7 stored in the token, to compose the final literals length. For instance a length of 206 will be stored as 7 in the token + a single byte with the value of 199, as 7 + 199 = 206.
 * 254: a second byte follows. The final literals value is 7 + 254 + the second byte. For instance, a literals length of 499 is encoded as 7 in the token, a byte with the value of 254, and a final byte with the value of 238, as 7 + 254 + 238 = 499.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is 7 + 255 + that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 250 and 2, as 7 + 255 + 250 + (2 * 256) = 1024.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final literals value is that 16-bit value. For instance, a literals length of 1024 is stored as 7 in the token, then byte values of 255, 0 and 4, as (4 * 256) = 1024.

 **literal values**

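The literals length rules in the hunk above, as they read after this commit, can be sketched in Python as follows. This is an illustrative sketch rather than the project's decoder: the function name, the `l_bits` parameter (the value of the token's 'L' bits) and the `extra` parameter (the bytes following the token) are assumptions made for the example.

```python
def decode_literals_length(l_bits, extra):
    """Return (literals_length, extra_bytes_consumed).

    l_bits: value of the 'L' bits in the token (hypothetical parameter name).
    extra:  the bytes that follow the token (hypothetical parameter name).
    """
    if l_bits < 7:
        # Short lengths are stored entirely in the token.
        return l_bits, 0
    first = extra[0]
    if first <= 253:
        # 0-253: added to the 7 stored in the token.
        return 7 + first, 1
    if first == 254:
        # 254: a second byte follows; length is 7 + 254 + that byte.
        return 7 + 254 + extra[1], 2
    # 255: a little-endian 16-bit value follows and is the length itself.
    return extra[1] | (extra[2] << 8), 3
```

The three worked examples from the text map to `decode_literals_length(7, bytes([199]))`, `decode_literals_length(7, bytes([254, 238]))` and `decode_literals_length(7, bytes([255, 0, 4]))`.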
@@ -115,9 +115,11 @@ The low 8 bits of the match offset follows.

 If the 'O' bit (bit 7) is set in the token, the high 8 bits of the match offset follow, otherwise they are understood to be all set to 0.

-**important note regarding match offsets: off by 1**
+**important note regarding short match offsets: off by 1**

-Note that the match offset is *off by 1*: a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
+Note that the match offset is *off by 1* when encoded as a single byte (the O bit in the token is set to 0): a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
+
+When match offsets are encoded as two bytes (the O bit in the token is set to 1), they are stored directly: a value of 1 refers to the byte preceding the current output index (N-1), and so on.

 **optional extra encoded match length**

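The difference between one-byte and two-byte match offsets described in the hunk above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name and the `o_bit` and `extra` parameters are hypothetical, and the result is expressed as a distance back from the current output index N (1 means N-1).

```python
def decode_match_distance(o_bit, extra):
    """Return (distance_back_from_N, bytes_consumed).

    o_bit: the 'O' bit (bit 7) of the token, 0 or 1 (hypothetical name).
    extra: the offset bytes following the token (hypothetical name).
    """
    low = extra[0]  # the low 8 bits always come first
    if not o_bit:
        # Single byte, off by 1: a stored 0 means N-1, a stored 1 means N-2.
        return low + 1, 1
    # Two bytes, stored directly: a stored 1 means N-1.
    return low | (extra[1] << 8), 2
```

With this reading, a single byte covers distances 1 to 256, which matches the stated reason for the off-by-1 encoding.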
@@ -125,7 +127,7 @@ If the encoded match length is 15 or more, the 'M' bits in the token form the va

 * 0-253: the value is added to the 15 stored in the token.
 * 254: a second byte follows. The final encoded match length is 15 + 254 + the second byte, which gives an actual match length of 3 + 15 + 254 + the second byte.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is 15 + 255 + that 16-bit value, which gives an actual match length of 3 + 15 + 255 + that 16-bit value.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is that 16-bit value.

 # Footer format
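
The extra match length rules in the last hunk above, as they read after this commit, can be sketched in the same way. The function and parameter names are assumptions for illustration. The sketch returns only the *encoded* match length: the text gives the actual length as 3 + the encoded length for the 254 case, but no longer spells that relationship out for the 255 case, so that step is left to the caller.

```python
def decode_encoded_match_length(m_bits, extra):
    """Return (encoded_match_length, extra_bytes_consumed).

    m_bits: value of the 'M' bits in the token (hypothetical parameter name).
    extra:  the bytes that follow the match offset (hypothetical name).
    """
    if m_bits < 15:
        # Short encoded lengths are stored entirely in the token.
        return m_bits, 0
    first = extra[0]
    if first <= 253:
        # 0-253: added to the 15 stored in the token.
        return 15 + first, 1
    if first == 254:
        # 254: a second byte follows; encoded length is 15 + 254 + that byte.
        return 15 + 254 + extra[1], 2
    # 255: a little-endian 16-bit value follows and is the encoded length.
    return extra[1] | (extra[2] << 8), 3
```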