From 51f31e07f636003388cdac5299ac82ce63c6be15 Mon Sep 17 00:00:00 2001 From: Reid Spencer Date: Mon, 5 Jul 2004 22:28:02 +0000 Subject: [PATCH] First draft completed. All sections written. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14633 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/BytecodeFormat.html | 428 ++++++++++++++++++++++++++++----------- 1 file changed, 312 insertions(+), 116 deletions(-) diff --git a/docs/BytecodeFormat.html b/docs/BytecodeFormat.html index d9657750cea..9b85a378fd0 100644 --- a/docs/BytecodeFormat.html +++ b/docs/BytecodeFormat.html @@ -26,15 +26,9 @@
  • Slots
  • -
  • General Layout +
  • General Structure
  • +
  • Block Definitions
      -
    1. Structure
    2. -
    -
  • -
  • Detailed Layout -
      -
    1. Notation
    2. -
    3. Block Types
    4. Signature Block
    5. Module Block
    6. Global Type Pool
    7. @@ -58,9 +52,6 @@

      Written by Reid Spencer

      -
      -

      Warning: This is a work in progress.

      -
      @@ -203,7 +194,7 @@ variable bit rate encoding as described above.

      ordering. That is, bits 2^0 through 2^7 are in the byte with the lowest file offset (little endian). - uint_vbr + uint32_vbr A 32-bit unsigned integer that occupies from one to five bytes using variable bit rate encoding. @@ -222,7 +213,7 @@ variable bit rate encoding as described above.

      A single bit within some larger integer field. string - A uint_vbr indicating the type of the character string + A uint32_vbr indicating the type of the constant string which also includes its length, immediately followed by the characters of the string. There is no terminating null byte in the string. @@ -282,25 +273,17 @@ This is exactly what the compaction table does.
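A minimal sketch of reading a uint32_vbr may help here. It assumes the common variable bit rate scheme in which each byte carries seven payload bits, least significant group first, and a set high bit means another byte follows. That scheme is described earlier in the full document and is not restated in this hunk, so treat it as an assumption; `read_uint32_vbr` is a hypothetical helper name, not part of LLVM.

```python
from typing import Tuple


def read_uint32_vbr(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one uint32_vbr starting at data[pos].

    Assumed encoding: 7 payload bits per byte, low-order group first,
    high bit set on every byte except the last. Returns (value, next_pos).
    """
    value = 0
    shift = 0
    for _ in range(5):  # a 32-bit value needs at most five bytes
        if pos >= len(data):
            raise ValueError("truncated uint32_vbr")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            if value > 0xFFFFFFFF:
                raise ValueError("value does not fit in 32 bits")
            return value, pos
    raise ValueError("uint32_vbr longer than five bytes")
```

Returning the next position alongside the value lets a caller read consecutive fields without tracking byte counts separately.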

      - +
      -

      This section provides the general layout of the LLVM bytecode file format. - The detailed layout can be found in the next section. -

      -
      - - - -
      -

      The bytecode file format requires blocks to be in a certain order and -nested in a particular way so that an LLVM module can be constructed -efficiently from the contents of the file. This ordering defines a general -structure for bytecode files as shown below. The table below shows the order -in which all block types may appear. Please note that some of the blocks are -optional and some may be repeated. The structure is fairly loose because -optional blocks, if empty, are completely omitted from the file. -

      +

      This section provides the general structure of the LLVM bytecode file + format. The bytecode file format requires blocks to be in a certain order and + nested in a particular way so that an LLVM module can be constructed + efficiently from the contents of the file. This ordering defines a general + structure for bytecode files as shown below. The table below shows the order + in which all block types may appear. Please note that some of the blocks are + optional and some may be repeated. The structure is fairly loose because + optional blocks, if empty, are completely omitted from the file.

      @@ -309,48 +292,68 @@ optional blocks, if empty, are completely omitted from the file. + + + - + + - + + - + + - + + - + + - + + - + + - + + - + +
      IDRepeated? Level Block TypeDescription
      N/AFileNoNo0 SignatureThis contains the file signature (magic number) + that identifies the file as LLVM bytecode.
      0x01FileNoNo0 ModuleThis is the top level block in a bytecode file. It + contains all the other blocks.
      0x15ModuleNoNo1    - Global Type Pool   Global Type PoolThis block contains all the global (module) level + types.
      0x14ModuleNoNo1    - Module Globals Info   Module Globals InfoThis block contains the type, constness, and linkage + for each of the global variables in the module. It also contains the + type of the functions and the constant initializers.
      0x12ModuleYesNo1    - Module Constant Pool   Module Constant PoolThis block contains all the global constants + except function arguments, global values and constant strings.
      0x11ModuleYesYes1    - Function Definitions   Function DefinitionsOne function block is written for each function in + the module. The function block contains the instructions, compaction + table, type constant pool, and symbol table for the function.
      0x12FunctionYesNo2       - Function Constant Pool      Function Constant PoolAny constants (including types) used solely + within the function are emitted here in the function constant pool. +
      0x33FunctionYesNo2       - Compaction Table      Compaction TableThis table reduces bytecode size by providing a + function-local mapping of type and value slot numbers to their + global slot numbers.
      0x32FunctionNoNo2       - Instruction List      Instruction ListThis block contains all the instructions of the + function. The basic blocks are inferred by terminating instructions. +
      0x13FunctionYesNo2       - Function Symbol Table      Function Symbol TableThis symbol table provides the names for the + function specific values used (basic block labels mostly).
      0x13ModuleYesNo1    - Module Symbol Table   Module Symbol TableThis symbol table provides the names for the various + entries in the file that are not function specific (global vars, and + functions mostly).

      Use the links in the table or see Block Types for @@ -358,59 +361,13 @@ details about the contents of each of the block types.
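The identifiers and nesting levels in the table above can be captured in a small lookup, sketched here in Python. The numeric values and levels are copied from the table; the names `BlockID`, `ALLOWED_LEVELS`, and `may_appear` are hypothetical and chosen only for illustration. The signature has no identifier (N/A) and is left out.

```python
from enum import IntEnum


class BlockID(IntEnum):
    """Block identifiers as listed in the structure table."""
    MODULE = 0x01
    FUNCTION = 0x11
    CONSTANT_POOL = 0x12        # used at module and function level
    SYMBOL_TABLE = 0x13         # used at module and function level
    MODULE_GLOBALS_INFO = 0x14
    GLOBAL_TYPE_POOL = 0x15
    INSTRUCTION_LIST = 0x32
    COMPACTION_TABLE = 0x33


# Nesting levels at which each block appears, per the "Level" column.
ALLOWED_LEVELS = {
    BlockID.MODULE: {0},
    BlockID.GLOBAL_TYPE_POOL: {1},
    BlockID.MODULE_GLOBALS_INFO: {1},
    BlockID.CONSTANT_POOL: {1, 2},
    BlockID.FUNCTION: {1},
    BlockID.COMPACTION_TABLE: {2},
    BlockID.INSTRUCTION_LIST: {2},
    BlockID.SYMBOL_TABLE: {1, 2},
}


def may_appear(block_id: int, level: int) -> bool:
    """Return True if the table lists this block ID at this nesting level."""
    try:
        block = BlockID(block_id)
    except ValueError:
        return False  # unknown identifier
    return level in ALLOWED_LEVELS[block]
```

Because 0x12 and 0x13 are reused at two levels, a reader has to know which block it is inside to tell a module constant pool from a function constant pool.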

      - +
      -

      This section provides the detailed layout of the LLVM bytecode file format. -

      -
      - - -
      -

      The descriptions of the bytecode format that follow describe the order, type -and bit fields in detail. These descriptions are provided in tabular form. -Each table has four columns that specify:

      -
        -
      1. Byte(s): The offset in bytes of the field from the start of - its container (block, list, other field).
      2. -
      3. Bit(s): The offset in bits of the field from the start of - the byte field. Bits are always little endian. That is, bit addresses with - smaller values have smaller address (i.e. 2^0 is at bit 0, - 2^1 at 1, etc.)
      4. -
      5. Align?: Indicates if this field is aligned to 32 bits or not. - This indicates where the next field starts, always on a 32 bit - boundary.
      6. -
      7. Type: The basic type of information contained in the field.
      8. -
      9. Description: Describes the contents of the field.
      10. -
      -
      - - -
      -

      The bytecode format encodes the intermediate representation into groups - of bytes known as blocks. The blocks are written sequentially to the file in - the following order:

      -
        -
      1. Signature: This contains the file signature - (magic number) that identifies the file as LLVM bytecode and the bytecode - version number.
      2. -
      3. Module Block: This is the top level block in a - bytecode file. It contains all the other blocks.
      4. -
      5. Global Type Pool: This block contains all the - global (module) level types.
      6. -
      7. Module Info: This block contains the types of the - global variables and functions in the module as well as the constant - initializers for the global variables
      8. -
      9. Constants: This block contains all the global - constants except function arguments, global values and constant strings.
      10. -
      11. Functions: One function block is written for - each function in the module.
      12. -
      13. Symbol Table: The module level symbol table that - provides names for the various other entries in the file is the final block - written.
      14. -
      +

      This section provides the detailed layout of the individual block types + in the LLVM bytecode file format.

      +
      @@ -866,9 +823,44 @@ Notes:
    - +

    To be determined.

    + + + + + + + + + + + + + + + + + + + + +
    Type | Field Description
    uint32_vbr | The linkage type of the function: 0=External, 1=Weak, + 2=Appending, 3=Internal, 4=LinkOnce (note 1)
    constant pool | The constant pool block for this function. + (note 2) +
    compaction table | The compaction table block for the function. + (note 2) +
    instruction list | The list of instructions in the function.
    symbol table | The function's slot table containing only those + symbols pertinent to the function (mostly block labels). +
    + Notes:
      +
    1. If the linkage type is "External" then none of the other + fields will be present, as the function is defined elsewhere. +
    2. Only one of the constant pool or compaction table will be + written. Compaction tables are only written if they will actually save + bytecode space. If not, then a regular constant pool is written. +
    @@ -929,8 +921,168 @@ Notes:
    -

    To be determined.

    +

    The instructions in a function are written as a simple list. Basic blocks + are inferred by the terminating instruction types. The format of the block + is given in the following table.

    + + + + + + + + + + + + + + +
    Type | Field Description
    unsigned | Instruction list identifier (0x32).
    unsigned | Size in bytes of the instruction list.
    instruction | An instruction (note 1).
    + Notes: +
      +
    1. A repeated field with a variety of formats. See + Instructions for details. +
    + + + +
    +

    For brevity, instructions are written in one of four formats, depending on + the number of operands to the instruction. Each instruction begins with a + uint32_vbr that encodes the format and opcode of the instruction + and, in formats 1 through 3, its type and operand slot numbers. The tables that follow describe the format of this + first word of each instruction.

    +

    Instruction Format 0

    +

    This format is used for a few instructions that can't easily be optimized + because they have large numbers of operands (e.g. PHI Node or getelementptr). + The opcode, type, and operand fields are written as successive fields.

    + + + + + + + + + + + + + + + + + +
    Type | Field Description
    uint32_vbr | Specifies the opcode of the instruction. Note that for + compatibility with the other instruction formats, the opcode is shifted + left by 2 bits. Bits 0 and 1 must have value zero for this format.
    uint32_vbr | Provides the slot number of the result type of the + instruction.
    uint32_vbr | The number of operands that follow.
    uint32_vbr | The slot number of the value for the operand(s). + (notes 1, 2)
    + Notes:
      +
    1. Repeatable field (limit given by previous field). +
    2. If the instruction is a getelementptr and the type of the + operand is a sequential type (array or pointer) then the slot number is + shifted up two bits and the low order bits will encode the type of index + used, as follows: 0=uint, 1=int, 2=ulong, 3=long. +
    +
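As a rough illustration of format 0, the sketch below decodes an instruction from a sequence of integers that have already been read as uint32_vbr fields. The names `Format0`, `decode_format0`, and `split_gep_index` are hypothetical, and the sketch assumes that bit 0 is the least significant bit of the first field.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Format0(NamedTuple):
    opcode: int
    type_slot: int
    operands: List[int]


def decode_format0(fields: Sequence[int]) -> Format0:
    """Decode opcode, type slot, and operand slots from successive fields."""
    if len(fields) < 3:
        raise ValueError("format 0 needs opcode, type, and operand count")
    first, type_slot, count = fields[0], fields[1], fields[2]
    if first & 0x3:
        raise ValueError("bits 0 and 1 must be zero for format 0")
    operands = list(fields[3:3 + count])
    if len(operands) != count:
        raise ValueError("fewer operands than the count field promises")
    return Format0(first >> 2, type_slot, operands)  # opcode is shifted left by 2


GEP_INDEX_TYPES = ("uint", "int", "ulong", "long")


def split_gep_index(field: int) -> Tuple[int, str]:
    """Split a getelementptr operand on a sequential type (note 2)."""
    return field >> 2, GEP_INDEX_TYPES[field & 0x3]
```

The caller decides when `split_gep_index` applies, since only getelementptr operands that index a sequential type carry the extra two bits.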

    Instruction Format 1

    +

    This format encodes the opcode, type and a single operand into a single + uint32_vbr as follows:

    + + + + + + + + + + + + + + + + + + + +
    Bits | Type | Field Description
    0-1 | constant "1" | These two bits must be the value 1 which identifies + this as an instruction of format 1.
    2-7 | opcode | Specifies the opcode of the instruction. Note that + the maximum opcode value is 63.
    8-19 | unsigned | Specifies the slot number of the type for this + instruction. Maximum slot number is 2^12-1=4095.
    20-31 | unsigned | Specifies the slot number of the value for the + first operand. Maximum slot number is 2^12-1=4095. Note + that the value 2^12-1 denotes zero operands.
    +
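The bit layout of format 1 can be unpacked with shifts and masks. This is a sketch only: it assumes the uint32_vbr has already been decoded to an integer and that bit 0 is the least significant bit. `decode_format1` is a hypothetical name.

```python
from typing import List, Tuple

NO_OPERAND = (1 << 12) - 1  # 4095 in the operand field denotes zero operands


def decode_format1(word: int) -> Tuple[int, int, List[int]]:
    """Return (opcode, type_slot, operands) for a format 1 instruction word."""
    if word & 0x3 != 1:
        raise ValueError("not a format 1 instruction")
    opcode = (word >> 2) & 0x3F        # bits 2-7
    type_slot = (word >> 8) & 0xFFF    # bits 8-19
    operand = (word >> 20) & 0xFFF     # bits 20-31
    operands = [] if operand == NO_OPERAND else [operand]
    return opcode, type_slot, operands
```

Since 4095 is taken by the zero-operand marker, the largest operand slot a format 1 word can actually name is 4094.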

    Instruction Format 2

    +

    This format encodes the opcode, type and two operands into a single + uint32_vbr as follows:

    + + + + + + + + + + + + + + + + + + + + + + +
    Bits | Type | Field Description
    0-1 | constant "2" | These two bits must be the value 2 which identifies + this as an instruction of format 2.
    2-7 | opcode | Specifies the opcode of the instruction. Note that + the maximum opcode value is 63.
    8-15 | unsigned | Specifies the slot number of the type for this + instruction. Maximum slot number is 2^8-1=255.
    16-23 | unsigned | Specifies the slot number of the value for the + first operand. Maximum slot number is 2^8-1=255.
    24-31 | unsigned | Specifies the slot number of the value for the + second operand. Maximum slot number is 2^8-1=255.
    +

    Instruction Format 3

    +

    This format encodes the opcode, type and three operands into a single + uint32_vbr as follows:

    + + + + + + + + + + + + + + + + + + + + + + + + + +
    Bits | Type | Field Description
    0-1 | constant "3" | These two bits must be the value 3 which identifies + this as an instruction of format 3.
    2-7 | opcode | Specifies the opcode of the instruction. Note that + the maximum opcode value is 63.
    8-13 | unsigned | Specifies the slot number of the type for this + instruction. Maximum slot number is 2^6-1=63.
    14-19 | unsigned | Specifies the slot number of the value for the + first operand. Maximum slot number is 2^6-1=63.
    20-25 | unsigned | Specifies the slot number of the value for the + second operand. Maximum slot number is 2^6-1=63.
    26-31 | unsigned | Specifies the slot number of the value for the + third operand. Maximum slot number is 2^6-1=63.
    +
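Formats 1 through 3 share the same shape: two format bits, six opcode bits, then a type slot and operand slots of equal width. The sketch below drives both packing and unpacking from one table of field widths. The names `FIELD_WIDTHS`, `pack_instruction`, and `unpack_instruction` are hypothetical, bit 0 is assumed to be the least significant bit, and the format 1 zero-operand marker is deliberately left to the caller.

```python
from typing import Dict, List, Sequence, Tuple

# Widths of the type slot followed by each operand slot, per format.
FIELD_WIDTHS: Dict[int, Tuple[int, ...]] = {
    1: (12, 12),
    2: (8, 8, 8),
    3: (6, 6, 6, 6),
}


def unpack_instruction(word: int) -> Tuple[int, int, int, List[int]]:
    """Return (format, opcode, type_slot, operand_slots) for formats 1-3."""
    fmt = word & 0x3
    if fmt == 0:
        raise ValueError("format 0 is not packed into a single word")
    opcode = (word >> 2) & 0x3F
    shift = 8  # fields start right after the format and opcode bits
    slots = []
    for width in FIELD_WIDTHS[fmt]:
        slots.append((word >> shift) & ((1 << width) - 1))
        shift += width
    return fmt, opcode, slots[0], slots[1:]


def pack_instruction(fmt: int, opcode: int, type_slot: int,
                     operands: Sequence[int]) -> int:
    """Build the 32-bit word; raises if any value exceeds its field."""
    widths = FIELD_WIDTHS[fmt]
    values = [type_slot, *operands]
    if len(values) != len(widths):
        raise ValueError("wrong operand count for this format")
    if not 0 <= opcode <= 63:
        raise ValueError("opcode must fit in six bits")
    word = fmt | (opcode << 2)
    shift = 8
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("slot number too large for this format")
        word |= value << shift
        shift += width
    return word
```

In every format the field widths add up to 24 bits, which together with the 8 bits of format and opcode fills the 32-bit word exactly.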
    +
    @@ -942,38 +1094,81 @@ number of the value and the name associated with that value are written. The format is given in the table below.

    - - - - + - + - + - - + + - - - - - - + +
    Byte(s)Bit(s)Align? Type Field Description
    00-03-Nounsignedunsigned Symbol Table Identifier (0x13)
    04-07-Nounsignedunsigned Size in bytes of the symbol table block.
    08-111-Nouint32_vbruint32_vbr Number of entries in type plane
    12-151-Nouint32_vbrType plane index for following entriessymtab_entryProvides the slot number of the type and its name. + 1
    16-191,2-Nouint32_vbrSlot number of a value.
    variable1,2-NostringName of the value in the symbol table.
    symtab_planeA type plane containing value slot number and name + for all values of the same type.1
    Notes:
      -
    1. Maximum length shown, may be smaller
    2. Repeated field.
    + + + +
    +

    A symbol table plane provides the symbol table entries for all values of + a common type. The encoding is given in the following table:

    + + + + + + + + + + + + + + +
    Type | Field Description
    uint32_vbr | Number of entries in this plane.
    uint32_vbr | Slot number of type for this plane.
    symtab_entry | The symbol table entries for this plane (repeated).
    +
    + + + +
    +

    A symbol table entry provides the association between a type or value's + slot number and the name given to that type or value. The format is given + in the following table:

    + + + + + + + + + + + + + + +
    Type | Field Description
    uint32_vbr | Slot number of the type or value being given a name. +
    uint32_vbr | Length of the character array that follows.
    char | The characters of the name (repeated).
    +
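Putting the plane and entry tables together, a reader for one symbol table plane might look like the sketch below. Two things are assumptions rather than statements from this document: the variable bit rate scheme (seven payload bits per byte, low-order group first, high bit set when more bytes follow) and the decoding of names as one byte per character. `read_symtab_plane` and `_read_vbr` are hypothetical names.

```python
from typing import List, Tuple


def _read_vbr(data: bytes, pos: int) -> Tuple[int, int]:
    """Assumed uint32_vbr decoding: 7 bits per byte, high bit = continue."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data) or shift > 28:
            raise ValueError("malformed uint32_vbr")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_symtab_plane(data: bytes,
                      pos: int = 0) -> Tuple[int, List[Tuple[int, str]], int]:
    """Return (type_slot, [(slot, name), ...], next_pos) for one plane."""
    count, pos = _read_vbr(data, pos)      # number of entries in this plane
    type_slot, pos = _read_vbr(data, pos)  # slot number of the plane's type
    entries = []
    for _ in range(count):
        slot, pos = _read_vbr(data, pos)    # slot being given a name
        length, pos = _read_vbr(data, pos)  # length of the character array
        raw = data[pos:pos + length]
        if len(raw) != length:
            raise ValueError("truncated symbol name")
        pos += length
        entries.append((slot, raw.decode("latin-1")))  # assumed 1 byte per char
    return type_slot, entries, pos
```

Names carry an explicit length and no terminating null byte, so the reader advances by exactly that many bytes after each entry.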
    + @@ -984,6 +1179,7 @@ current version is as documented in the previous sections. Each section here describes the differences between that version and the one that follows.

    + @@ -1037,7 +1233,7 @@ describes the differences between that version and the one that follows. +Version 1.0 Differences From 1.1

    None. Version 1.0 and 1.1 bytecode formats are identical.