diff --git a/docs/BytecodeFormat.html b/docs/BytecodeFormat.html
new file mode 100644
index 00000000000..3169f162eef
--- /dev/null
+++ b/docs/BytecodeFormat.html
@@ -0,0 +1,248 @@
LLVM Bytecode File Format
+
  1. Abstract
  2. General Concepts
    1. Blocks
    2. Lists
    3. Fields
    4. Alignment
  3. Detailed Layout
    1. Notation
    2. Block Types
    3. Header Block
    4. Global Type Pool
    5. Module Info Block
    6. Global Constant Pool
    7. Function Blocks
    8. Module Symbol Table
+
+

Written by Reid Spencer and Chris Lattner

+

+
+ +
Abstract
+ +
+

This document is an after-the-fact specification of the LLVM bytecode file format. It documents the binary encoding rules of the format so that compatible systems can encode bytecode files correctly. The LLVM bytecode representation is used to store the intermediate representation on disk in a compact form.

+
+ +
General Concepts
+ +
+

This section describes the general concepts of the bytecode file format without getting into bit and byte level specifics.

+
+ +
Blocks
+
+

LLVM bytecode files consist simply of a sequence of blocks of bytes. Each block begins with an identification value that tells the reader what type of block follows. The possible block types are described below in the section Block Types. The identifier is necessary because an entire block may be omitted from the file when it is empty, so the reader cannot rely on position alone to know which kind of block comes next.

+

Except for the Header Block, all blocks are variable length: each one consumes just enough bytes to express its contents.
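Because empty blocks may be omitted, a reader has to dispatch on each block's identifier rather than on its position. The minimal Python sketch below illustrates this for the variable-length blocks that follow the Header Block. Its framing (a 32-bit little-endian block ID followed by a 32-bit little-endian byte count) and the helper names `write_blocks` and `read_blocks` are hypothetical; this section does not specify how the identifier or the block size is encoded.

```python
import struct
from typing import List, Tuple

# Hypothetical framing, not taken from the specification: a 32-bit
# little-endian block ID, a 32-bit little-endian body size, then the body.
_FRAME = struct.Struct("<II")


def write_blocks(blocks: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (block_id, body) pairs, omitting empty blocks entirely."""
    out = bytearray()
    for block_id, body in blocks:
        if not body:
            continue  # an empty block is simply not written
        out += _FRAME.pack(block_id, len(body))
        out += body
    return bytes(out)


def read_blocks(data: bytes) -> List[Tuple[int, bytes]]:
    """Parse blocks in file order; the ID, not the position, says what each is."""
    blocks = []
    offset = 0
    while offset < len(data):
        if offset + _FRAME.size > len(data):
            raise ValueError("truncated block frame")
        block_id, size = _FRAME.unpack_from(data, offset)
        offset += _FRAME.size
        if offset + size > len(data):
            raise ValueError("truncated block body")
        blocks.append((block_id, data[offset:offset + size]))
        offset += size
    return blocks
```

For example, `read_blocks(write_blocks([(1, b"ab"), (2, b""), (3, b"xyz")]))` yields `[(1, b"ab"), (3, b"xyz")]`: block 2 was empty and never reached the file, yet the reader still identifies block 3 correctly.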

+
+ +
Lists
+
+

Most blocks are constructed of lists of information, and lists may themselves contain other lists. This decomposition follows the containment hierarchy of the LLVM intermediate representation. For example, a function is composed of a list of basic blocks, and each basic block is composed of a list of instructions. This nesting of lists is preserved in the bytecode file.

+

A list is encoded into the file simply by writing the number of entries as an integer, followed by each of the entries. The reader knows the list is complete once it has read the required number of entries.
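A minimal Python sketch of this count-then-entries scheme follows. The specification only says the count is "an integer"; writing it as a 32-bit little-endian unsigned value, and the helper names `write_list`, `read_list`, `write_u16` and `read_u16`, are assumptions made for illustration. Because each entry is encoded by a caller-supplied function, the same helpers also handle lists nested inside lists.

```python
import struct
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumption: the entry count is a 32-bit little-endian unsigned integer.
_COUNT = struct.Struct("<I")


def write_list(items: Sequence[T], write_entry: Callable[[T], bytes]) -> bytes:
    """Write the number of entries, then each entry in order."""
    out = bytearray(_COUNT.pack(len(items)))
    for item in items:
        out += write_entry(item)
    return bytes(out)


def read_list(
    data: bytes, offset: int, read_entry: Callable[[bytes, int], Tuple[T, int]]
) -> Tuple[List[T], int]:
    """Read a count, then exactly that many entries; return (items, new offset)."""
    (count,) = _COUNT.unpack_from(data, offset)
    offset += _COUNT.size
    items: List[T] = []
    for _ in range(count):  # the list is done once `count` entries are read
        item, offset = read_entry(data, offset)
        items.append(item)
    return items, offset


# Hypothetical entry codec: 16-bit little-endian unsigned integers.
def write_u16(value: int) -> bytes:
    return struct.pack("<H", value)


def read_u16(data: bytes, offset: int) -> Tuple[int, int]:
    return struct.unpack_from("<H", data, offset)[0], offset + 2
```

For a list of lists, pass `lambda inner: write_list(inner, write_u16)` as the entry writer and `lambda d, o: read_list(d, o, read_u16)` as the entry reader.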

+
+ +
Fields
+
+

Fields are units of information that LLVM knows how to write atomically. Most fields either have a fixed length or carry a length indication in their encoding. For example, a constant string (an array of SByte or UByte) is written simply as its length followed by its characters. Although this resembles a list, a constant string is treated as a single atomic unit and is therefore a field.

+

Fields use a condensed bit format specific to the type of information they contain, and as few bits as possible are written for each field. The sections that follow provide the details on how these fields are written and how their bits are to be interpreted.
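This section does not yet say how the condensed encodings work. As one illustration of writing "as few bits as possible", the Python sketch below uses a 7-bits-per-byte variable-length unsigned integer (the high bit of each byte flags that more bytes follow, similar to LEB128) and uses it for the length prefix of a constant string field. Both the integer scheme and the helper names are assumptions, not the format's confirmed encoding.

```python
from typing import Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode an unsigned integer 7 bits per byte, least significant group first.

    Assumed scheme: the high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("only unsigned values are supported")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte: high bit clear
            return bytes(out)


def decode_uvarint(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode an integer written by encode_uvarint; return (value, new offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated integer")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_string(chars: bytes) -> bytes:
    """A constant string field: its length followed by its characters."""
    return encode_uvarint(len(chars)) + chars


def decode_string(data: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a length-prefixed string; return (characters, new offset)."""
    length, offset = decode_uvarint(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated string")
    return data[offset:end], end
```

Small values cost a single byte: `encode_uvarint(127)` is `b"\x7f"`, while `encode_uvarint(300)` needs two bytes, `b"\xac\x02"`.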

+
+ +
Alignment
+
+

To accommodate cross-platform differences, the bytecode file is aligned on certain boundaries: where alignment is required, a small amount of padding (at most 3 bytes) is added so that the next entry starts on a 32-bit boundary.
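The padding rule reduces to modular arithmetic: the number of pad bytes needed at offset n is (-n) mod 4, which always lies between 0 and 3. The Python sketch below assumes zero-valued pad bytes, which this section does not specify; `padding_needed` and `align` are hypothetical helper names.

```python
def padding_needed(offset: int, alignment: int = 4) -> int:
    """Bytes of padding so the next entry starts on an `alignment`-byte boundary."""
    return (-offset) % alignment  # Python's % is non-negative for a positive divisor


def align(buf: bytearray, alignment: int = 4) -> None:
    """Append padding to buf in place; zero pad bytes are an assumption."""
    buf.extend(b"\x00" * padding_needed(len(buf), alignment))
```

For example, `padding_needed(5)` is 3 and `padding_needed(8)` is 0.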

+
+ +
Detailed Layout
+ +
+

This section provides the detailed layout of the LLVM bytecode file format, down to bit and byte level specifics.

+
+ +
Notation
+
+

The descriptions of the bytecode format that follow describe the bit fields in detail. These descriptions are provided in tabular form. Each table has four columns that specify:

+
  1. Byte(s). The offset in bytes of the field from the start of its container (block, list, or other field).
  2. Bit(s). The offset in bits of the field from the start of the byte field. Bits are always numbered little endian: bit 0 is the least significant bit, so bit n has the value 2^n (2^0 is at bit 0, 2^1 at bit 1, etc.).
  3. Align?. Indicates whether this field is followed by alignment padding, so that the next field starts on a 32-bit boundary.
  4. Description. Describes the contents of the field.
+
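Under this numbering, a field occupying bits lo through hi of a value is recovered by shifting right by lo and keeping the low hi - lo + 1 bits. A small Python sketch (the name `extract_bits` is hypothetical):

```python
def extract_bits(word: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of word, where bit n has the value 2**n."""
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)
```

With the value 0x35 (binary 110101), bit 0 is 1, bit 1 is 0, and bits 4-5 together give 3.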
+ +
Block Types
+
+

The bytecode format encodes the intermediate representation into groups of bytes known as blocks. The blocks are written sequentially to the file in the following order:

+
  1. Header. This block contains the file signature (magic number), machine description and file format version (not LLVM version).
  2. Global Type Pool. This block contains all the global (module) level types.
  3. Module Info. This block contains the types of the global variables and functions in the module as well as the constant initializers for the global variables.
  4. Constants. This block contains all the global constants except function arguments, global values and constant strings.
  5. Functions. One function block is written for each function in the module.
  6. Symbol Table. The module level symbol table, which provides names for the various other entries in the file, is the final block written.
+
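The ordering rule, combined with the earlier note that empty blocks may be omitted, can be checked mechanically. In the Python sketch below, the `BlockKind` values are hypothetical: they only encode the order listed above, not the identifiers actually written to the file. The check assumes the Header Block is always present and first, and that only Function blocks repeat (one per function).

```python
from enum import IntEnum
from typing import Sequence


class BlockKind(IntEnum):
    # Hypothetical numbering that mirrors the documented order only.
    HEADER = 0
    GLOBAL_TYPE_POOL = 1
    MODULE_INFO = 2
    CONSTANTS = 3
    FUNCTION = 4
    SYMBOL_TABLE = 5


def check_block_order(kinds: Sequence[BlockKind]) -> bool:
    """Return True if the block sequence follows the documented order.

    Blocks other than the header may be omitted; only Function blocks repeat.
    """
    if not kinds or kinds[0] != BlockKind.HEADER:
        return False  # the Header Block is mandatory and always first
    prev = BlockKind.HEADER
    for kind in kinds[1:]:
        if kind < prev:
            return False  # a block appeared earlier than its documented position
        if kind == prev and kind != BlockKind.FUNCTION:
            return False  # only Function blocks may appear more than once
        prev = kind
    return True
```

A module with no global constants, for instance, simply skips from Module Info to its Function blocks, and the check still passes.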
+ +
Header Block
+
+

The Header Block occurs in every LLVM bytecode file and is always first. It simply provides a few bytes of data to identify the file, its format, and the bytecode version. This block is fixed length and always eight bytes, as follows:
Byte(s) | Bit(s) | Align? | Field Description
00 | 00-07 | No | Constant "l"
01 | 00-07 | No | Constant "l"
02 | 00-07 | No | Constant "v"
03 | 00-07 | No | Constant "m"
04-07 | 00 | No | Target is big endian
04-07 | 01 | No | Target has long pointers
04-07 | 02 | No | Target has no endianness
04-07 | 03 | No | Target has no pointer size
04-07 | 04-31 | Yes | The LLVM bytecode format version number
+
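The Python sketch below decodes the eight-byte Header Block according to the table above. One detail is an assumption: the table numbers the bits of bytes 04-07 but does not state the byte order of that 32-bit word, so the sketch reads it as little-endian. `HeaderBlock` and `parse_header` are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass
class HeaderBlock:
    big_endian: bool
    long_pointers: bool
    no_endianness: bool
    no_pointer_size: bool
    version: int


def parse_header(data: bytes) -> HeaderBlock:
    """Decode the fixed eight-byte Header Block at the start of a bytecode file."""
    if len(data) < 8:
        raise ValueError("the Header Block is eight bytes long")
    if data[0:4] != b"llvm":  # bytes 00-03: the constant signature
        raise ValueError("missing 'llvm' signature")
    # Assumption: bytes 04-07 form a little-endian 32-bit word.
    (word,) = struct.unpack_from("<I", data, 4)
    return HeaderBlock(
        big_endian=bool(word & (1 << 0)),       # bit 00
        long_pointers=bool(word & (1 << 1)),    # bit 01
        no_endianness=bool(word & (1 << 2)),    # bit 02
        no_pointer_size=bool(word & (1 << 3)),  # bit 03
        version=word >> 4,                      # bits 04-31
    )
```

For instance, a header built as `b"llvm" + struct.pack("<I", (3 << 4) | 0b0011)` decodes to a big-endian, long-pointer target with format version 3.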

+ +
Global Type Pool
+
+
+ +
Module Info
+
+
+ +
Constants
+
+
+ +
Functions
+
+
+ +
Module Symbol Table
+
+
+ + +
+
Reid Spencer and Chris Lattner
+ The LLVM Compiler Infrastructure
+ Last modified: $Date$ +
diff --git a/docs/index.html b/docs/index.html
index 32664b27209..2429b86a30a 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -114,6 +114,7 @@ useful for those writing front-ends.
  • Command Line Library
  • Extending LLVM
  • Coding Standards
  • +
  • LLVM Bytecode File Format