diff --git a/docs/LangRef.html b/docs/LangRef.html index cf514490a8a..86041a4d49f 100644 --- a/docs/LangRef.html +++ b/docs/LangRef.html @@ -28,6 +28,7 @@
  • High Level Structure
    1. Module Structure +
    2. Global Variables
    3. Function Structure
  • Instruction Reference @@ -65,9 +66,9 @@
  • 'malloc' Instruction
  • 'free' Instruction
  • 'alloca' Instruction -
  • 'getelementptr' Instruction
  • 'load' Instruction
  • 'store' Instruction +
  • 'getelementptr' Instruction
  • Other Operations
      @@ -76,25 +77,14 @@
    1. 'icall' Instruction
    2. 'phi' Instruction
    -
  • Builtin Functions - -
  • TODO List -
      -
    1. Exception Handling Instructions -
    2. Synchronization Instructions -
    -
  • Possible Extensions -
      -
    1. 'tailcall' Instruction -
    2. Global Variables -
    3. Explicit Parrellelism
  • Related Work -

    +

    +
    Abstract

      @@ -102,7 +92,7 @@
      This document describes the LLVM assembly language. LLVM is an SSA based representation that is a useful midlevel IR, providing type safety, low level - operations, flexibility, and the capability to represent 'all' high level + operations, flexibility, and the capability of representing 'all' high level languages cleanly.
      @@ -110,7 +100,8 @@ -
    + +
    Introduction
      @@ -150,16 +141,16 @@ syntactically okay, but not well formed:

      The LLVM api provides a verification pass (created by the createVerifierPass function) that may be used to verify that an LLVM module is well formed. This pass is automatically run by the parser after -parsing input assembly, and by the optimizer before it outputs bytecode. Often, +parsing input assembly, and by the optimizer before it outputs bytecode. The violations pointed out by the verifier pass indicate bugs in transformation -passes.

      - +passes or input to the parser.

      Describe the typesetting conventions here. -

    + +
    Identifiers
      @@ -167,7 +158,7 @@ Describe the typesetting conventions here. LLVM uses three different forms of identifiers, for different purposes:

        -
      1. Numeric constants are represented as you would expect: 12, -3 123.421, etc. +
      2. Numeric constants are represented as you would expect: 12, -3, 123.421, etc. Floating point constants have an optional hexadecimal notation.
      3. Named values are represented as a string of characters with a '%' prefix. For example, %foo, %DivisionByZero, %a.really.long.identifier. The actual regular expression used is '%[a-zA-Z$._][a-zA-Z$._0-9]*'.
      4. Unnamed values are represented as an unsigned numeric value with a '%' prefix. For example, %12, %2, %44.

      @@ -191,19 +182,19 @@ by 8:

      The easy way:

      -  %result = mul int %X, 8
      +  %result = mul uint %X, 8
       
      After strength reduction:
      -  %result = shl int %X, ubyte 3
      +  %result = shl uint %X, ubyte 3
       
      And the hard way:
      -  add int %X, %X           ; yields {int}:%0
      -  add int %0, %0           ; yields {int}:%1
      -  %result = add int %1, %1
      +  add uint %X, %X           ; yields {uint}:%0
      +  add uint %0, %0           ; yields {uint}:%1
      +  %result = add uint %1, %1
       
      This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:

      @@ -220,28 +211,41 @@ demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced. Comments are shown in italic text.
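A minimal sketch, not part of the patch, that checks the three ways of multiplying by 8 shown earlier agree. It models 'uint' as a 32-bit unsigned integer, which is an assumption of this sketch; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 'uint' is modeled as a 32-bit unsigned value


def mul_by_8(x: int) -> int:
    """The easy way: mul uint %X, 8."""
    return (x * 8) & MASK32


def shl_by_3(x: int) -> int:
    """After strength reduction: shl uint %X, ubyte 3."""
    return (x << 3) & MASK32


def add_chain(x: int) -> int:
    """The hard way: three successive additions."""
    t0 = (x + x) & MASK32    # plays the role of %0
    t1 = (t0 + t0) & MASK32  # plays the role of %1
    return (t1 + t1) & MASK32  # plays the role of %result


if __name__ == "__main__":
    for x in (0, 1, 7, 12345, MASK32):
        assert mul_by_8(x) == shl_by_3(x) == add_chain(x)
```

All three forms wrap around identically on overflow, which is why the strength-reduced form is a valid replacement in this model.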

      +The one unintuitive notation for constants is the optional hexadecimal form of +floating point constants. For example, the form 'double +0x432ff973cafa8000' is equivalent to (but harder to read than) 'double +4.5e+15' which is also supported by the parser. The only time hexadecimal +floating point constants are useful (and the only time that they are generated +by the disassembler) is when an FP constant has to be emitted that is not +representable exactly as a decimal floating point number. For example, NaNs, +infinities, and other special cases are represented in their IEEE hexadecimal +format so that assembly and disassembly do not cause any bits to change in the +constants.
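The equivalence stated above can be checked outside of LLVM. The following is a minimal sketch, assuming the hexadecimal form is the raw 64-bit IEEE 754 bit pattern of the double; the helper names are hypothetical and are not part of any LLVM tool.

```python
import struct


def double_from_hex_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


def hex_bits_from_double(value: float) -> int:
    """Return the 64-bit IEEE 754 bit pattern of a double as an integer."""
    return int.from_bytes(struct.pack(">d", value), "big")


if __name__ == "__main__":
    # The example from the text: both spellings denote the same double.
    print(double_from_hex_bits(0x432FF973CAFA8000) == 4.5e15)  # True
    # Infinity has no exact decimal spelling, so the bit pattern is used.
    print(hex(hex_bits_from_double(float("inf"))))  # 0x7ff0000000000000
```

Emitting the bit pattern rather than a decimal string is what lets special values survive a round trip through assembly and disassembly unchanged.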

      -

    Overview:
    -The structure type is used to represent a collection of data members together in memory. Although the runtime is allowed to lay out the data members any way that it would like, they are guaranteed to be "close" to each other.

    +The structure type is used to represent a collection of data members together in +memory. Although the runtime is allowed to lay out the data members any way +that it would like, they are guaranteed to be "close" to each other.

    -Structures are accessed using 'load and 'store' by getting a pointer to a field with the 'getelementptr' instruction.

    +Structures are accessed using 'load' and 'store' by getting a pointer to a field with the 'getelementptr' instruction.

    Syntax:
    @@ -450,52 +458,130 @@ Packed types should be 'nonsaturated' because standard data types are not satura
     
     
     
    -
    + +
    Type System
      -The LLVM type system is critical to the overall usefulness of the language and -runtime. Being strongly typed enables a number of optimizations to be performed -on the IR directly, without having to do extra analyses on the side before the -transformation. A strong type system makes it easier to read the generated code -and enables novel analyses and transformations that are not feasible to perform -on normal three address code representations.

      +The LLVM type system is one of the most important features of the intermediate +representation. Being strongly typed enables a number of optimizations to be +performed on the IR directly, without having to do extra analyses on the side +before the transformation. A strong type system makes it easier to read the +generated code and enables novel analyses and transformations that are not +feasible to perform on normal three address code representations.

      -The assembly language form for the type system was heavily influenced by the -type problems in the C language1.

      +The written form for the type system was heavily influenced by the syntactic +problems with types in the C language1.

      -

    - +
       + +
       Primitive Types
      @@ -285,7 +289,7 @@ These different primitive types fall into a few useful classifications:

    unsignedubyte, ushort, uint, ulong
    integralubyte, sbyte, ushort, short, uint, int, ulong, long
    floating pointfloat, double
    first classbool, ubyte, sbyte, ushort, short,
    uint, int, ulong, long, float, double
    first classbool, ubyte, sbyte, ushort, short,
    uint, int, ulong, long, float, double, pointer

    @@ -318,7 +322,7 @@ underlying data type.

    [<# elements> x <elementtype>] -The number of elements is a constant integer value, elementtype may be any time +The number of elements is a constant integer value, elementtype may be any type with a size.

    Examples:
    @@ -386,9 +390,13 @@ LLVM.
    + +
    High Level Structure
      -
       + +
       Module Structure
       + +
       +Global Variables +
      + +Global variables define regions of memory allocated at compilation time instead +of runtime. Global variables may optionally be initialized. A variable may be +defined as a global "constant", which indicates that the contents of the +variable will never be modified (opening options for optimization). Constants +must always have an initial value.

      + +As SSA values, global variables define pointer values that are in scope in +(i.e. they dominate) all basic blocks in the program. Global variables always +define a pointer to their "content" type because they describe a region of +memory, and all memory objects in LLVM are accessed through pointers.

      + + + + +

    +
       Function Structure
      +LLVM function definitions are composed of a (possibly empty) argument list, an +opening curly brace, a list of basic blocks, and a closing curly brace. LLVM +function declarations are defined with the "declare" keyword, a +function name and a function signature.

      -talk about the optional constant pool

      -talk about how basic blocks delinate labels

      -talk about how basic blocks end with terminators

      +A function definition contains a list of basic blocks, forming the CFG for the +function. Each basic block may optionally start with a label (giving the basic +block a symbol table entry), contains a list of instructions, and ends with a terminator instruction (such as a branch or function +return).

      + +The first basic block in a function is special in two ways: it is immediately +executed on entrance to the function, and it is not allowed to have predecessor +basic blocks (i.e. there cannot be any branches to the entry block of a +function).
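The entry block rule above lends itself to a simple check. This is a minimal sketch, not the verifier pass itself: it assumes a function's CFG is given as a mapping from block label to the labels its terminator can branch to, and the function name and representation are hypothetical.

```python
def entry_block_predecessors(successors: dict, entry: str) -> list:
    """Return the labels of blocks that branch to the entry block.

    A well-formed function, per the rule above, yields an empty list.
    """
    return sorted(
        label for label, targets in successors.items() if entry in targets
    )


if __name__ == "__main__":
    # Hypothetical CFG: the entry block falls into a loop that can exit.
    good = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    # Hypothetical malformed CFG: the loop branches back to the entry block.
    bad = {"entry": ["loop"], "loop": ["entry", "exit"], "exit": []}
    print(entry_block_predecessors(good, "entry"))  # []
    print(entry_block_predecessors(bad, "entry"))   # ['loop']
```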

      -

    + +
    Instruction Reference
       + +
       Terminator Instructions
      - - -As was mentioned previously, every basic block -in a program ends with a "Terminator" instruction. All of these terminator -instructions yield a 'void' value: they produce control flow, not -values.

      +As mentioned previously, every basic block in a +program ends with a "Terminator" instruction, which indicates where control flow +should go now that this basic block has been completely executed. These +terminator instructions typically yield a 'void' value: they produce +control flow, not values (the one exception being the 'invoke' instruction).

      There are four different terminator instructions: the 'ret' instruction, the 'br' instruction, the 'switch' instruction, and the 'invoke' instruction.

      Overview:
      - The 'ret' instruction is used to return control flow (and optionally a - value) from a function, back to the caller.

      +The 'ret' instruction is used to return control flow (and a value) from +a function, back to the caller.

      There are two forms of the 'ret' instruction: one that returns a value and then causes control flow, and one that just causes control flow to @@ -578,9 +664,9 @@ Test: %cond = seteq int %a, %b br bool %cond, label %IfEqual, label %IfUnequal IfEqual: - ret bool true + ret int 1 IfUnequal: - ret bool false + ret int 0 @@ -611,9 +697,9 @@ generally useful if the values to switch on are spread far apart, where index branching is useful if the values to switch on are generally dense.

      The two different forms of the 'switch' statement are simple hints to -the underlying virtual machine implementation. For example, a virtual machine -may choose to implement a small indirect branch table as a series of predicated -comparisons: if it is faster for the target architecture.

      +the underlying implementation. For example, the compiler may choose to +implement a small indirect branch table as a series of predicated comparisons +if it is faster for the target architecture.

      Arguments:
      @@ -648,16 +734,7 @@ provided as part of the constant values type.

      ; Emulate an unconditional br instruction switch uint 0, label %dest, [ 0 x label] [ ] - ; Implement a jump table using the constant pool: - void "testmeth"(int %arg0) - %switchdests = [3 x label] [ label %onzero, label %onone, label %ontwo ] - begin - ... - switch uint %val, label %otherwise, [3 x label] %switchdests... - ... - end - - ; Implement the equivilent jump table directly: + ; Implement a jump table: switch uint %val, label %otherwise, [3 x label] [ label %onzero, label %onone, label %ontwo ] @@ -689,7 +766,7 @@ This instruction requires several arguments:

      1. 'ptr to function ty': shall be the signature of the pointer to -function value being invoked. In most cases, this is a direct method +function value being invoked. In most cases, this is a direct function invocation, but indirect invokes are just as possible, branching off an arbitrary pointer to function value.

        @@ -707,7 +784,14 @@ a 'ret' instruction.

        Semantics:
        -This instruction is designed to operate as a standard 'call' instruction in most regards. The primary difference is that it assiciates a label with the function invocation that may be accessed via the runtime library provided by the execution environment. This instruction is used in languages with destructors to ensure that proper cleanup is performed in the case of either a longjmp or a thrown exception. Additionally, this is important for implementation of 'catch' clauses in high-level languages that support them.

        +This instruction is designed to operate as a standard 'call' instruction in most regards. The primary +difference is that it associates a label with the function invocation that may +be accessed via the runtime library provided by the execution environment. This +instruction is used in languages with destructors to ensure that proper cleanup +is performed in the case of either a longjmp or a thrown exception. +Additionally, this is important for implementation of 'catch' clauses +in high-level languages that support them.

        For a more comprehensive explanation of this instruction look in the llvm/docs/2001-05-18-ExceptionHandling.txt document.

        @@ -720,7 +804,8 @@ For a more comprehensive explanation of this instruction look in the llvm/docs/2 -

       + +
       Unary Operations
       + +
       Bitwise Binary Operations
      -Bitwise binary operators are used to do various forms of bit-twiddling in a program. They are generally very efficient instructions, and can commonly be strength reduced from other instructions. They require two operands, execute an operation on them, and produce a single value. The resulting value of the bitwise binary operators is always the same type as its first operand.

      +Bitwise binary operators are used to do various forms of bit-twiddling in a +program. They are generally very efficient instructions, and can commonly be +strength reduced from other instructions. They require two operands, execute an +operation on them, and produce a single value. The resulting value of the +bitwise binary operators is always the same type as its first operand.


    'and' Instruction

       + +
       Memory Access Operations
      @@ -1215,6 +1307,67 @@ address available, as well as spilled variables.

      + +


    'load' Instruction


    'store' Instruction


    'getelementptr' Instruction

      @@ -1253,84 +1406,9 @@ TODO. - -


    'load' Instruction


    'store' Instruction

       + +
       Other Operations
      @@ -1423,105 +1501,10 @@ Example: - -
       -Builtin Functions -
      - -Notice: Preliminary idea!

      - -Builtin functions are very similar to normal functions, except they are defined by the implementation. Invocations of these functions are very similar to function invocations, except that the syntax is a little less verbose.

      - -Builtin functions are useful to implement semi-high level ideas like a 'min' or 'max' operation that can have important properties when doing program analysis. For example: - -

        -
      • Some optimizations can make use of identities defined over the functions, - for example a parrallelizing compiler could make use of 'min' - identities to parrellelize a loop. -
      • Builtin functions would have polymorphic types, where normal function calls - may only have a single type. -
      • Builtin functions would be known to not have side effects, simplifying - analysis over straight function calls. -
      • The syntax of the builtin are cleaner than the syntax of the - 'call' instruction (very minor point). -
      - -Because these invocations are explicit in the representation, the runtime can choose to implement these builtin functions any way that they want, including: - -
        -
      • Inlining the code directly into the invocation -
      • Implementing the functions in some sort of Runtime class, convert invocation - to a standard function call. -
      • Implementing the functions in some sort of Runtime class, and perform - standard inlining optimizations on it. -
      - -Note that these builtins do not use quoted identifiers: the name of the builtin effectively becomes an identifier in the language.

      - -Example: -

      -  ; Example of a normal function call
      -  %maximum = call int %maximum(int %arg1, int %arg2)   ; yields {int}:%maximum
      -
      -  ; Examples of potential builtin functions
      -  %max = max(int %arg1, int %arg2)                     ; yields {int}:%max
      -  %min = min(int %arg1, int %arg2)                     ; yields {int}:%min
      -  %sin = sin(double %arg)                              ; yields {double}:%sin
      -  %cos = cos(double %arg)                              ; yields {double}:%cos
      -
      -  ; Show that builtin's are polymorphic, like instructions
      -  %max = max(float %arg1, float %arg2)                 ; yields {float}:%max
      -  %cos = cos(float %arg)                               ; yields {float}:%cos
      -
      - -The 'maximum' vs 'max' example illustrates the difference in calling semantics between a 'call' instruction and a builtin function invocation. Notice that the 'maximum' example assumes that the function is defined local to the caller.

      - - - -

    -TODO List -
      - - -This list of random topics includes things that will need to be addressed before the llvm may be used to implement a java like langauge. Right now, it is pretty much useless for any language, given to unavailable of structure types

      - - -


    Synchronization Instructions

      - -We will need some type of synchronization instructions to be able to implement stuff in Java well. The way I currently envision doing this is to introduce a 'lock' type, and then add two (builtin or instructions) operations to lock and unlock the lock.

      - - - -

    -Possible Extensions -


    'tailcall' Instruction


    Global Variables

      - -In order to represent programs written in languages like C, we need to be able to support variables at the module (global) scope. Perhaps they should be written outside of the module definition even. Maybe global functions should be handled like this as well.

      - - - -


    Explicit Parrellelism

      - -With the rise of massively parrellel architectures (like the IA64 architecture, multithreaded CPU cores, and SIMD data sets) it is becoming increasingly more important to extract all of the ILP from a code stream possible. It would be interesting to research encoding functions that can explicitly represent this. One straightforward way to do this would be to introduce a "stop" instruction that is equilivent to the IA64 stop bit.

      - - - - -

    + +
    Related Work
      @@ -1590,7 +1573,7 @@ more...
      Chris Lattner
      -Last modified: Sun Apr 14 01:12:55 CDT 2002 +Last modified: Fri May 3 14:39:52 CDT 2002