diff --git a/README.md b/README.md
index d6003e1..2a43be2 100755
--- a/README.md
+++ b/README.md
@@ -66,12 +66,12 @@ Each frame contains a 3-byte length followed by block data that expands to up to
 
 LZSA blocks are composed from consecutive commands. Each command follows this format:
 
-*
-*
-*
-*
-*
-*
+* token:
+* optional extra literal length
+* literal values
+* match offset low
+* optional match offset high
+* optional extra encoded match length
 
 **token**
 
@@ -82,7 +82,7 @@ The token byte is broken down into three parts:
 
 * O: set for a 2-byte match offset, clear for a 1-byte match offset
 * L: 3-bit literals length (0-6, or 7 if extended). If the number of literals for this command is 0 to 6, the length is encoded in the token and no extra bytes are required. Otherwise, a value of 7 is encoded and extra bytes follow as 'optional extra literal length'
-* M: 4-bit match length (0-14, or 15 if extended). Likewise, if the match length for this command is 0 to 14, it is directly encoded, otherwise 15 is stored and extra bytes follow as 'optional extra match length'.
+* M: 4-bit encoded match length (0-14, or 15 if extended). Likewise, if the encoded match length for this command is 0 to 14, it is directly stored, otherwise 15 is stored and extra bytes follow as 'optional extra encoded match length'. Except for the last command in a block, a command always contains a match, so the encoded match length is the actual match length offset by the minimum, which is 3 bytes. For instance, an actual match length of 10 bytes is encoded as 7.
 
 **optional extra literal length**
 
@@ -96,6 +96,8 @@ If the literals length is 7 or more, the 'L' bits in the token form the value 7,
 
 Literal bytes, whose number is specified by the literals length, follow here. There can be zero literals in a command.
 
+Important note: the last command in a block ends here, as it always contains literals only.
+
 **match offset low**
 
 The low 8 bits of the match offset follow.
@@ -108,13 +110,13 @@ If the 'O' bit (bit 7) is set in the token, the high 8 bits of the match offset
 
 Note that the match offset is *off by 1*: a value of 0 refers to the byte preceding the current output index (N-1). A value of 1 refers to two bytes before the current output index (N-2) and so on. This is so that match offsets up to 256 can be encoded as a single byte, for extra compression.
 
-**optional extra match length**
+**optional extra encoded match length**
 
-If the match length is 15 or more, the 'M' bits in the token form the value 15, and an extra byte follows here, with three possible types of value.
+If the encoded match length is 15 or more, the 'M' bits in the token form the value 15, and an extra byte follows here, with three possible types of value.
 
 * 0-253: the value is added to the 15 stored in the token.
-* 254: a second byte follows. The final match length is 15 + 254 + the second byte.
-* 255: a second and third byte follow, forming a little-endian 16-bit value. The final match length is 15 + 255 + that 16-bit value.
+* 254: a second byte follows. The final encoded match length is 15 + 254 + the second byte, which gives an actual match length of 3 + 15 + 254 + the second byte.
+* 255: a second and third byte follow, forming a little-endian 16-bit value. The final encoded match length is 15 + 255 + that 16-bit value, which gives an actual match length of 3 + 15 + 255 + that 16-bit value.
 
 # Footer format
 
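The encoded-versus-actual match length rule this change introduces can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the reference decoder. The names `MIN_MATCH`, `split_token`, and `read_match_length` are hypothetical. The text only pins the O bit to bit 7, so placing the L bits in bits 4-6 and the M bits in bits 0-3 is an assumption. The sketch also assumes that first-byte values 0-253 are added directly, since 254 and 255 are listed as markers.

```python
MIN_MATCH = 3  # minimum match length given in the format description


def split_token(token):
    """Split a token byte into (O, L, M).

    Assumed layout: O in bit 7, L in bits 4-6, M in bits 0-3.
    Only the position of O is stated explicitly in the README text.
    """
    o_bit = (token >> 7) & 0x01      # set: 2-byte match offset
    literals_len = (token >> 4) & 0x07  # 7 means extra length bytes follow
    encoded_match_len = token & 0x0F    # 15 means extra length bytes follow
    return o_bit, literals_len, encoded_match_len


def read_match_length(m_bits, data, pos):
    """Return (actual match length, new position in data).

    m_bits is the 4-bit M field; data[pos:] holds any extra length bytes.
    """
    encoded = m_bits
    if m_bits == 15:
        first = data[pos]
        pos += 1
        if first == 254:
            # One more byte follows.
            encoded = 15 + 254 + data[pos]
            pos += 1
        elif first == 255:
            # A little-endian 16-bit value follows.
            encoded = 15 + 255 + int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
        else:
            # Assumed direct range 0-253: added to the 15 in the token.
            encoded = 15 + first
    # The actual length is the encoded length plus the 3-byte minimum.
    return encoded + MIN_MATCH, pos
```

With these assumptions, an M field of 7 and no extra bytes gives an actual match length of 10, which agrees with the example in the updated M bullet.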