diff --git a/docs/BytecodeFormat.html b/docs/BytecodeFormat.html new file mode 100644 index 00000000000..3169f162eef --- /dev/null +++ b/docs/BytecodeFormat.html @@ -0,0 +1,248 @@ + + +
+Written by Reid Spencer +and Chris Lattner
++
This document is an (after the fact) specification of the LLVM bytecode +file format. It documents the binary encoding rules of the bytecode file format +so that equivalent systems can encode bytecode files correctly. The LLVM +bytecode representation is used to store the intermediate representation on +disk in compacted form. +
+This section describes the general concepts of the bytecode file format +without getting into bit and byte level specifics.
+LLVM bytecode files consist simply of a sequence of blocks of bytes. +Each block begins with an identification value that determines the type of +the next block. The possible types of blocks are described below in the section +Block Types. The block identifier is used because +it is possible for entire blocks to be omitted from the file if they are +empty. The block identifier helps the reader determine which kind of block is +next in the file.
++Except for the Header Block, all blocks are variable +length. They consume just enough bytes to express their contents.
+Most blocks are constructed of lists of information. Lists can be constructed +of other lists, etc. This decomposition of information follows the containment +hierarchy of the LLVM Intermediate Representation. For example, a function is +composed of a list of basic blocks. Each basic block is composed of a set of +instructions. This nesting of lists within lists is maintained in the +bytecode file.
+A list is encoded into the file simply by encoding the number of entries as +an integer followed by each of the entries. The reader knows when the list is +done because it will have filled the list with the required number of entries. +
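The count-then-entries scheme described above can be sketched in a few lines of Python. This is only an illustration: the text here does not say how the count integer is encoded or how wide an entry is, so the 32-bit little-endian count, the fixed entry size, and the names `encode_list` and `decode_list` are all assumptions made for the example.

```python
import struct

def encode_list(entries):
    """Write the entry count, then each entry's bytes in order.

    Assumption: the count is a 32-bit little-endian unsigned integer.
    The real bytecode format may encode integers differently.
    """
    out = bytearray(struct.pack("<I", len(entries)))
    for entry in entries:
        out += entry
    return bytes(out)

def decode_list(data, entry_size):
    """Read the count, then exactly that many fixed-size entries.

    Assumption: every entry is `entry_size` bytes long.
    Returns the entries and the offset just past the list.
    """
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    entries = []
    for _ in range(count):
        entries.append(data[offset:offset + entry_size])
        offset += entry_size
    return entries, offset
```

The reader needs no terminator: it stops as soon as it has read `count` entries, which is the property the paragraph above relies on.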
+Fields are units of information that LLVM knows how to write atomically. +Most fields have a uniform length or some kind of length indication built into +their encoding. For example, a constant string (array of SByte or UByte) is +written simply as the length followed by the characters. Although this is +similar to a list, constant strings are treated atomically and are thus +fields.
+Fields use a condensed bit format specific to the type of information +they must contain. As few bits as possible are written for each field. The +sections that follow will provide the details on how these fields are +written and how the bits are to be interpreted.
+To support cross-platform differences, the bytecode file is aligned on +certain boundaries. This means that a small amount of padding (at most 3 bytes) +will be added to ensure that the next entry is aligned to a 32-bit boundary. +
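As a rough illustration of the alignment rule, the amount of padding needed to reach the next 32-bit boundary can be computed from the current byte offset. The helper names `padding_for` and `align` are hypothetical, and filling the padding with zero bytes is an assumption, since the text does not state what the padding bytes contain.

```python
def padding_for(offset, alignment=4):
    """Number of pad bytes needed to move `offset` up to the next boundary."""
    return (-offset) % alignment

def align(buf, alignment=4):
    """Append pad bytes so the buffer length is a multiple of `alignment`.

    Assumption: padding bytes are zero.
    """
    buf += b"\x00" * padding_for(len(buf), alignment)
    return buf
```

With a 4-byte alignment the result of `padding_for` is always between 0 and 3, which matches the "at most 3 bytes" bound stated above.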
+This section provides the detailed layout of the LLVM bytecode file format, + including bit and byte level specifics.
+The descriptions of the bytecode format that follow describe the bit + fields in detail. These descriptions are provided in tabular form. Each table + has four columns that specify:
+The bytecode format encodes the intermediate representation into groups + of bytes known as blocks. The blocks are written sequentially to the file in + the following order:
+The Header Block occurs in every LLVM bytecode file and is always first. It +simply provides a few bytes of data to identify the file, its format, and the +bytecode version. This block is fixed length and always eight bytes, as follows: +
Byte(s) | +Bit(s) | +Align? | +Field Description | +
---|---|---|---|
00 | 00-07 | No | +Constant "l" | +
01 | 00-07 | No | +Constant "l" | +
02 | 00-07 | No | +Constant "v" | +
03 | 00-07 | No | +Constant "m" | +
04-07 | 00 | No | +Target is big endian | +
04-07 | 01 | No | +Target has long pointers | +
04-07 | 02 | No | +Target has no endianness | +
04-07 | 03 | No | +Target has no pointer size | +
04-07 | 04-31 | Yes | +The LLVM bytecode format version number | +
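The header layout in the table above could be read along the following lines. This is a sketch under stated assumptions, not the LLVM reader: the table does not give the byte order of the 32-bit word in bytes 04-07, nor say whether bit 00 is the least significant bit, so the code assumes a little-endian word with bit 00 as the least significant bit. The function name `parse_header` and the dictionary keys are hypothetical.

```python
import struct

def parse_header(data):
    """Decode the eight-byte Header Block into its fields.

    Assumptions: bytes 04-07 form a little-endian 32-bit word,
    and bit 00 is the least significant bit of that word.
    """
    if len(data) < 8:
        raise ValueError("header block must be eight bytes")
    if data[:4] != b"llvm":
        raise ValueError("missing 'llvm' signature")
    (word,) = struct.unpack_from("<I", data, 4)
    return {
        "big_endian": bool(word & 0x1),       # bit 00
        "long_pointers": bool(word & 0x2),    # bit 01
        "no_endianness": bool(word & 0x4),    # bit 02
        "no_pointer_size": bool(word & 0x8),  # bit 03
        "version": word >> 4,                 # bits 04-31
    }
```

For example, under the same assumptions, a file starting with `b"llvm"` followed by the packed word `(1 << 4) | 0x3` would decode as version 1 for a big-endian target with long pointers.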