From 69b3e40196f837d5dbc484cd708b6e0489085787 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Sun, 13 May 2007 01:39:44 +0000
Subject: [PATCH] add initial description of llvm top-level stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37017 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/BitCodeFormat.html | 96 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 93 insertions(+), 3 deletions(-)

diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html
index 70f0ec27492..7194c7a6d34 100644
--- a/docs/BitCodeFormat.html
+++ b/docs/BitCodeFormat.html
@@ -22,7 +22,11 @@
  • Standard Blocks
  • -
  • LLVM IR Encoding
  • +
  • LLVM IR Encoding +
      +
    1. Basics
    2. +
    +
  • Written by Chris Lattner. @@ -114,7 +118,8 @@ is used by a reader to know what is contained in the file.

    A bitstream literally consists of a stream of bits. This stream is made up of a -number of primitive values that encode a stream of integer values. These +number of primitive values that encode a stream of unsigned integer values. +These integers are encoded in two ways: either as Fixed Width Integers or as Variable Width Integers. @@ -505,7 +510,92 @@ abbreviation.

    -

    +

    LLVM IR is encoded into a bitstream by defining blocks and records. It uses +blocks for things like constant pools, functions, symbol tables, etc. It uses +records for things like instructions, global variable descriptors, type +descriptions, etc. This document does not describe the set of abbreviations +that the writer uses, as these are fully self-described in the file, and the +reader is not allowed to build in any knowledge of this.

    + +
    + + + + + + + +
    + +

    +The magic number for LLVM IR files is: +

    + +

    ['B'_8, 'C'_8, 0x0_4, 0xC_4, +0xE_4, 0xD_4]

    + +

    When viewed as bytes, this is "BC 0xC0DE".
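
The byte view can be checked with a minimal sketch. It reads the magic number as two 8-bit characters followed by four 4-bit fields, and it assumes that fixed-width fields are packed least-significant bit first, so the first 4-bit field lands in the low nibble of its byte. The helper name `pack_fixed_fields` is hypothetical and is not part of LLVM.

```python
def pack_fixed_fields(fields):
    """Pack (value, width_in_bits) pairs into bytes, low bits first.

    Assumption: each field is appended above the bits already written,
    and the accumulated bits are stored as little-endian bytes.
    """
    acc = 0
    nbits = 0
    for value, width in fields:
        acc |= (value & ((1 << width) - 1)) << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


# Two 8-bit characters followed by four 4-bit fields.
MAGIC_FIELDS = [
    (ord("B"), 8),
    (ord("C"), 8),
    (0x0, 4),
    (0xC, 4),
    (0xE, 4),
    (0xD, 4),
]

magic = pack_fixed_fields(MAGIC_FIELDS)
print(magic.hex(" "))
```

Under that packing assumption, the two nibbles 0x0 and 0xC combine into the byte 0xC0, and 0xE and 0xD combine into 0xDE, which is where the "0xC0DE" reading comes from.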

    + +
    + + + + +
    + +

    +Variable Width Integers are an efficient way to +encode arbitrarily sized unsigned values, but are an extremely inefficient way to +encode signed values (as signed values are otherwise treated as maximally large +unsigned values).

    + +

    As such, signed vbr values of a specific width are emitted as follows:

    + +
      +
    • Positive values are emitted as vbrs of the specified width, but with their + value shifted left by one.
    • +
    • Negative values are emitted as vbrs of the specified width, but the negated + value is shifted left by one, and the low bit is set.
    • +
    + +

    With this encoding, small positive and small negative values can both be +emitted efficiently.
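
The mapping described above can be sketched as follows. The function names are hypothetical. The chunking helper assumes the usual VBR layout, in which each chunk of a given width carries width - 1 payload bits and uses its high bit to signal that another chunk follows.

```python
def encode_signed_vbr_value(value):
    """Map a signed integer to the unsigned value that gets emitted as a VBR."""
    if value >= 0:
        return value << 1          # non-negative: shift left, low bit clear
    return ((-value) << 1) | 1     # negative: shift the negated value, set low bit


def decode_signed_vbr_value(encoded):
    """Invert encode_signed_vbr_value."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


def vbr_chunks(value, width):
    """Split an unsigned value into VBR chunks (assumed layout, see lead-in)."""
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    chunks = []
    while True:
        chunk = value & mask
        value >>= payload_bits
        if value:
            chunks.append(chunk | (1 << payload_bits))  # continuation bit set
        else:
            chunks.append(chunk)
            return chunks


# -1 emitted as a signed vbr6 needs a single chunk ...
signed_chunks = vbr_chunks(encode_signed_vbr_value(-1), 6)
# ... whereas -1 treated as a 64-bit unsigned value needs many.
unsigned_chunks = vbr_chunks(2**64 - 1, 6)
print(len(signed_chunks), len(unsigned_chunks))
```

The comparison at the end illustrates why the sign is moved into the low bit: the magnitude stays small, so the number of chunks stays small as well.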

    + +
    + + + + + +
    + +

    +LLVM IR is defined with the following blocks: +

    + +
      +
    • 8 - MODULE_BLOCK - This is the top-level block that contains the + entire module, and describes a variety of per-module information.
    • +
    • 9 - PARAMATTR_BLOCK - This enumerates the parameter attributes.
    • +
    • 10 - TYPE_BLOCK - This describes all of the types in the module.
    • +
    • 11 - CONSTANTS_BLOCK - This describes constants for a module or + function.
    • +
    • 12 - FUNCTION_BLOCK - This describes a function body.
    • +
    • 13 - TYPE_SYMTAB_BLOCK - This describes the type symbol table.
    • +
    • 14 - VALUE_SYMTAB_BLOCK - This describes a value symbol table.
    • +
    + +
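
For illustration, the block IDs listed above could be captured in a small enumeration. This is a sketch only: the class name `BlockID` and the helper `describe_block` are hypothetical and do not mirror LLVM's own headers. The numeric values are taken directly from the list.

```python
from enum import IntEnum


class BlockID(IntEnum):
    """Block IDs for LLVM IR, as listed in this section."""
    MODULE_BLOCK = 8
    PARAMATTR_BLOCK = 9
    TYPE_BLOCK = 10
    CONSTANTS_BLOCK = 11
    FUNCTION_BLOCK = 12
    TYPE_SYMTAB_BLOCK = 13
    VALUE_SYMTAB_BLOCK = 14


def describe_block(block_id):
    """Return the symbolic name for a block ID, or a placeholder if unknown."""
    try:
        return BlockID(block_id).name
    except ValueError:
        return "UNKNOWN_BLOCK_%d" % block_id


print(describe_block(8))
print(describe_block(42))
```

Returning a placeholder for unknown IDs, rather than raising, is a design choice made for this sketch and is not something the text above prescribes.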
    + + + + +
    + +

    +