llvm Assembly Language Reference Manual

Abstract
Introduction
Identifiers
Type System
  1. Primitive Types
    1. Type Classifications
  2. Derived Types
High Level Structure
  1. Module Structure
  2. Method Structure
Instruction Reference
TODO List
  1. Exception Handling Instructions
  2. Synchronization Instructions
Possible Extensions
Related Work

Abstract

This document describes the LLVM assembly language IR/VM. LLVM is an SSA-based representation that attempts to be a useful mid-level IR by providing type safety, low level operations, flexibility, and the capability to represent 'all' high level languages cleanly.

Introduction

This dual nature leads to three different representations of LLVM: the human readable assembly representation, the compact bytecode representation, and the in-memory, pointer-based representation. This document describes the human readable representation and notation.

The LLVM representation aims to be lightweight and low level while being expressive, type safe, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high level ideas may be cleanly mapped to it. By providing type safety, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function, allowing it to be promoted to a simple SSA value instead of a memory location.

Well Formedness

It is important to note that this document describes 'well formed' llvm assembly language. There is a difference between what the parser accepts and what is considered 'well formed'. For example, the following instruction is syntactically okay, but not well formed:

  %x = add int 1, %x

phi

verify

Describe the typesetting conventions here.

Identifiers

Numeric constants are represented as you would expect: 12, -3, 123.421, etc.
Named values are represented as a string of characters with a '%' prefix. For example, %foo, %DivisionByZero, %a.really.long.identifier. The actual regular expression used is '%[a-zA-Z$._][a-zA-Z$._0-9]*'.
Unnamed values are represented as an unsigned numeric value with a '%' prefix. For example, %12, %2, %44.
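
The identifier rules above can be checked mechanically. The following Python sketch (a model for illustration, not part of LLVM) uses the regular expression quoted for named values; the pattern for unnamed values and the name `classify` are assumptions made for this example.

```python
import re

# Named values: the regular expression quoted in the text above.
NAMED = re.compile(r"%[a-zA-Z$._][a-zA-Z$._0-9]*")
# Unnamed values: '%' followed by an unsigned number (assumed pattern).
UNNAMED = re.compile(r"%[0-9]+")


def classify(token: str) -> str:
    """Classify a token as a named value, an unnamed value, or neither."""
    if NAMED.fullmatch(token):
        return "named"
    if UNNAMED.fullmatch(token):
        return "unnamed"
    return "not an identifier"


for tok in ["%foo", "%DivisionByZero", "%a.really.long.identifier", "%12", "add"]:
    print(tok, "->", classify(tok))
```

Reserved words such as 'add' never match either pattern, because neither pattern accepts a token without the leading '%'.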

LLVM requires the values start with a '%' sign for two reasons: Compilers don't need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.

Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', etc...), and others. These reserved words cannot conflict with variable names, because none of them may start with a '%' character.

Here is an example of LLVM code to multiply the integer variable '%X' by 8:

The easy way:

  %result = mul int %X, 8

Using a shift instead:

  %result = shl int %X, ubyte 3

The hard way, using repeated addition:

  add int %X, %X           ; yields {int}:%0
  add int %0, %0           ; yields {int}:%1
  %result = add int %1, %1

This last way of multiplying '%X' by 8 illustrates several important lexical features of LLVM:

Comments are delimited with a ';' and go until the end of line.
Unnamed temporaries are created when the result of a computation is not assigned to a named value.
Unnamed temporaries are numbered sequentially.

...and it also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced. Comments are shown in italic text.
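
The multiplication sequences shown earlier in this section (a 'mul', a 'shl', and a chain of 'add' instructions) are meant to produce the same value. A small Python model can confirm this. It is only a sketch: it assumes that 'int' is a 32 bit two's complement value that wraps on overflow, which this document does not state, and all function names are hypothetical.

```python
def wrap_int(value: int) -> int:
    """Wrap a Python integer to a signed 32 bit value (assumed 'int' width)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def times8_mul(x: int) -> int:
    # %result = mul int %X, 8
    return wrap_int(x * 8)


def times8_shl(x: int) -> int:
    # %result = shl int %X, ubyte 3
    return wrap_int(x << 3)


def times8_add(x: int) -> int:
    t0 = wrap_int(x + x)      # add int %X, %X   ; {int}:%0
    t1 = wrap_int(t0 + t0)    # add int %0, %0   ; {int}:%1
    return wrap_int(t1 + t1)  # %result = add int %1, %1


for x in (0, 1, -5, 12345):
    assert times8_mul(x) == times8_shl(x) == times8_add(x)
    print(x, "* 8 =", times8_mul(x))
```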

Type System

The assembly language form for the type system was heavily influenced by the type problems in the C language¹.

Primitive Types

`void`	No value
`ubyte`	Unsigned 8 bit value
`ushort`	Unsigned 16 bit value
`uint`	Unsigned 32 bit value
`ulong`	Unsigned 64 bit value
`float`	32 bit floating point value
`label`	Branch destination
`bool`	True or False value
`sbyte`	Signed 8 bit value
`short`	Signed 16 bit value
`int`	Signed 32 bit value
`long`	Signed 64 bit value
`double`	64 bit floating point value
`lock`	Recursive mutex value

Type Classifications

signed	`sbyte, short, int, long, float, double`
unsigned	`ubyte, ushort, uint, ulong`
integral	`ubyte, sbyte, ushort, short, uint, int, ulong, long`
floating point	`float, double`
first class	`bool, ubyte, sbyte, ushort, short, uint, int, ulong, long, float, double, lock`
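
The classifications are plain sets of primitive type names, so they can be written down and queried directly. The Python sketch below is a lookup table built from the lists above; the names `CLASSIFICATIONS` and `classifications_of` are hypothetical and exist only for this illustration.

```python
SIGNED = {"sbyte", "short", "int", "long", "float", "double"}
UNSIGNED = {"ubyte", "ushort", "uint", "ulong"}
INTEGRAL = {"ubyte", "sbyte", "ushort", "short", "uint", "int", "ulong", "long"}
FLOATING_POINT = {"float", "double"}
FIRST_CLASS = {"bool", "lock"} | INTEGRAL | FLOATING_POINT

CLASSIFICATIONS = {
    "signed": SIGNED,
    "unsigned": UNSIGNED,
    "integral": INTEGRAL,
    "floating point": FLOATING_POINT,
    "first class": FIRST_CLASS,
}


def classifications_of(type_name: str) -> list:
    """Return the sorted names of every classification containing type_name."""
    return sorted(
        name for name, members in CLASSIFICATIONS.items() if type_name in members
    )


for ty in ("uint", "float", "bool", "void"):
    print(ty, "->", classifications_of(ty))
```

Note that 'void' and 'label' belong to none of the classifications, which is why instructions that require a first class type cannot operate on them.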

Derived Types

Array Type

Overview:

The array type is a very simple derived type. It arranges elements sequentially in memory. There are two different forms of the array type:

Fixed size array type:
The simplest form of the array type has a size hard coded in as part of the type. Thus these are three distinct type qualifiers:
[40 x int ]: Array of 40 integer values.
[41 x int ]: Array of 41 integer values.
[40 x uint]: Array of 40 unsigned integer values.
Fixed sized arrays are very useful for compiler optimization passes and for representing analysis results. Additionally, multidimensional arrays must have fixed sizes for all dimensions except the outer-most dimension.
Dynamically sized array type:
The dynamically sized arrays are very similar to the fixed size arrays, except that the size of the array is calculated at runtime by the virtual machine. This is useful for representing generic methods that take any size array as an argument, or when representing Java style arrays.

Here are some examples of multidimensional arrays:

`[3 x [4 x int]]`	: 3x4 array of integer values.
`[[10 x int]]`	: Nx10 array of integer values.
`[2 x [3 x [4 x uint]]]`	: 2x3x4 array of unsigned integer values.

Method Type

Overview:

Syntax:

  <returntype> (<parameter list>)

Where '<parameter list>' is a comma-separated list of type specifiers.

Examples:

`int (int)`	: method taking an `int`, returning an `int`
`float (int, int *) *`	: Pointer to a method that takes an `int` and a pointer to `int`, returning `float`.

Structure Type

Overview:

The structure type is used to represent a collection of data members together in memory. Although the runtime is allowed to lay out the data members any way that it would like, they are guaranteed to be "close" to each other.

Structures are accessed using 'load' and 'store' by getting a pointer to a field with the 'getelementptr' instruction.

Syntax:

  { <type list> }

Examples:

`{ int, int, int }`	: a triple of three `int` values
`{ float, int (int) * }`	: A pair, where the first element is a `float` and the second element is a pointer to a method that takes an `int`, returning an `int`.

Pointer Type

Packed Type

Packed types should be 'nonsaturated' because standard data types are not saturated. Maybe have a saturated packed type?

High Level Structure

Module Structure

Method Structure

talk about how basic blocks delineate labels

talk about how basic blocks end with terminators

Instruction Reference

List all of the instructions, list valid types that they accept. Tell what they do and stuff also.

Terminator Instructions

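As was mentioned previously, every basic block in a program ends with a terminator instruction. Terminator instructions yield a 'void' value: they produce control flow, not values.

There are three different terminator instructions: the 'ret' instruction, the 'br' instruction, and the 'switch' instruction.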

'`ret`' Instruction

Syntax:

  ret <type> <value>       ; Return a value from a non-void method
  ret void                 ; Return from void method

Overview:

The 'ret' instruction is used to return control flow (and optionally a value) from a method, back to the caller.

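There are two forms of the 'ret' instruction: one that returns a value and then causes control flow, and one that just causes control flow to occur.

Arguments:

The 'ret' instruction may return any 'first class' type. Note that a method is not 'well formed' if it contains a 'ret' instruction whose returned value does not match the return type of the method.

Semantics:

When the 'ret' instruction is executed, control flow returns to the calling method's context. If the instruction returns a value, that value is passed back to the caller.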

Example:

  ret int 5                       ; Return an integer value of 5
  ret void                        ; Return from a void method

'`br`' Instruction

Syntax:

  br bool <cond>, label <iftrue>, label <iffalse>
  br label <dest>          ; Unconditional branch

Overview:

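The 'br' instruction is used to cause control flow to transfer to a different basic block in the current method. There are two forms of this instruction, corresponding to a conditional branch and an unconditional branch. The 'br' instruction is a (useful) special case of the 'switch' instruction.

Arguments:

The conditional branch form of the 'br' instruction takes a single 'bool' value and two 'label' values. The unconditional form of the 'br' instruction takes a single 'label' value as a target.

Semantics:

Upon execution of a conditional 'br' instruction, the 'bool' argument is evaluated. If the value is true, control flows to the 'iftrue' 'label' argument. If the value is false, control flows to the 'iffalse' 'label' argument.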

Example:

Test:
  %cond = seteq int %a, %b
  br bool %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret bool true
IfUnequal:
  ret bool false

'`switch`' Instruction

Syntax:

  ; Definitions for lookup indirect branch
  %switchtype = type [<anysize> x { uint, label }]

  ; Lookup indirect branch
  switch uint <value>, label <defaultdest>, %switchtype <switchtable>

  ; Indexed indirect branch
  switch uint <idxvalue>, label <defaultdest>, [<anysize> x label] <desttable>

Overview:

The 'switch' instruction is used to transfer control flow to one of several different places. It is a simple generalization of the 'br' instruction, and supports a strict superset of its functionality.

The 'switch' statement supports two different styles of indirect branching: lookup branching and indexed branching. Lookup branching is generally useful if the values to switch on are spread far apart, whereas indexed branching is useful if the values to switch on are generally dense.

The two different forms of the 'switch' statement are simple hints to the underlying virtual machine implementation. For example, a virtual machine may choose to implement a small indirect branch table as a series of predicated comparisons, if that is faster for the target architecture.

Arguments:

The lookup form of the 'switch' instruction uses three parameters: a 'uint' comparison value 'value', a default 'label' destination, and a sized array of pairs of comparison value constants and 'label's. The sized array must be a constant value.

The indexed form of the 'switch' instruction uses three parameters: a 'uint' index value, a default 'label', and a sized array of 'label's. The 'desttable' array must be a constant array.

Semantics:

The lookup style switch statement specifies a table of values and destinations. When the 'switch' instruction is executed, this table is searched for the given value. If the value is found, the corresponding destination is branched to; otherwise, control transfers to the default destination.

The index branch form simply looks up a label element directly in a table and branches to it.

In either case, the compiler knows the static size of the array, because it is provided as part of the constant value's type.
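
The two forms can be modeled in a few lines of Python. This is a sketch of the semantics described here, not an implementation: labels are represented as strings, the function names are hypothetical, and sending an out-of-range index to the default label is an assumption inferred from the way the 'br' emulation examples in this section rely on the default destination.

```python
def switch_lookup(value, default, table):
    """Lookup form: search a table of (constant, label) pairs for value."""
    for constant, label in table:
        if constant == value:
            return label
    return default  # value not present in the table


def switch_indexed(index, default, dests):
    """Indexed form: use index directly as a position in the label table."""
    if 0 <= index < len(dests):
        return dests[index]
    return default  # assumed behavior for an index outside the table


# Sparse case values suit the lookup form.
sparse = [(10, "onten"), (2000, "ontwothousand")]
print(switch_lookup(2000, "otherwise", sparse))

# Dense case values suit the indexed form.
dense = ["onzero", "onone", "ontwo"]
print(switch_indexed(1, "otherwise", dense))

# Emulating a conditional 'br': false (0) selects the table entry,
# true (1) falls outside the one-element table and takes the default.
print(switch_indexed(0, "truedest", ["falsedest"]))
print(switch_indexed(1, "truedest", ["falsedest"]))
```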

Example:

  ; Emulate a conditional br instruction
  %Val = cast bool %value to uint
  switch uint %Val, label %truedest, [1 x label] [label %falsedest ]

  ; Emulate an unconditional br instruction
  switch uint 0, label %dest, [ 0 x label] [ ]

  ; Implement a jump table using the constant pool:
  void "testmeth"(int %arg0)
    %switchdests = [3 x label] [ label %onzero, label %onone, label %ontwo ]
  {
  ...
    switch uint %val, label %otherwise, [3 x label] %switchdests...
  ...
  }

  ; Implement the equivalent jump table directly:
  switch uint %val, label %otherwise, [3 x label] [ label %onzero, 
                                                    label %onone, 
                                                    label %ontwo ]

'`call .. with`' Instruction

Syntax:

  <result> = call <method ty> %<method name>(<method args>) with label <break label>

Overview:

The 'call .. with' instruction is used to cause control flow to transfer to a specified method, with the possibility of control flow transfer to the 'break label' label, in addition to the possibility of fallthrough to the next basic block. See also the 'call' instruction.

TODO: icall .. with needs to be defined as well for an indirect call.

Arguments:

'method ty': shall be the signature of the named method being invoked. This must be a method type.
'method name': method name to be invoked.
'method args': argument list whose types match the method signature argument types.
'break label': a label that specifies the break label associated with this call.

Semantics:

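This instruction is designed to operate as a standard 'call' instruction in most regards. The primary difference is that it associates a break label with the call. This allows proper cleanup to be performed in the case of either a 'longjmp' or a thrown exception, and it supports the implementation of 'catch' clauses in high level languages that have them.

For a more comprehensive explanation of this instruction look in the llvm/docs/2001-05-18-ExceptionHandling.txt document.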

Example:

  %retval = call int (int) %Test(int 15) with label %TestCleanup     ; {int}:retval set

Unary Operations

There is only one unary operator: the 'not' instruction.

'`not`' Instruction

Syntax:

  <result> = not <ty> <var>       ; yields {ty}:result

Overview:

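The 'not' instruction returns the logical inverse of its operand.

Arguments:

The single argument to 'not' must be of integral type.

Semantics:

The 'not' instruction returns the logical inverse of an integral value.

Note that the 'not' instruction is not defined over the 'bool' type. To invert a boolean value, the recommended method is to use: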

  <result> = xor bool true, <var> ; yields {bool}:result

Example:

  %x = not int 1                  ; {int}:x is now equal to 0
  %y = xor bool true, true        ; {bool}:y is now equal to false

Binary Operations

There are several different binary operators:

'`add`' Instruction

Syntax:

  <result> = add <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'add' instruction returns the sum of its two operands.

Arguments:

The two arguments to the 'add' instruction must be either integral or floating point values.

Semantics:

Example:

  <result> = add int 4, %var          ; yields {int}:result = 4 + %var

'`sub`' Instruction

Syntax:

  <result> = sub <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'sub' instruction returns the difference of its two operands.

Note that the 'sub' instruction is also the canonical way to represent the 'neg' operation.

Arguments:

The two arguments to the 'sub' instruction must be either integral or floating point values.

Semantics:

Example:

  <result> = sub int 4, %var          ; yields {int}:result = 4 - %var
  <result> = sub int 0, %var          ; yields {int}:result = -%var

'`mul`' Instruction

Syntax:

  <result> = mul <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'mul' instruction returns the product of its two operands.

Arguments:

The two arguments to the 'mul' instruction must be either integral or floating point values.

Semantics:

There is no signed vs unsigned multiplication. The appropriate action is taken based on the type of the operand.

Example:

  <result> = mul int 4, %var          ; yields {int}:result = 4 * %var

'`div`' Instruction

Syntax:

  <result> = div <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'div' instruction returns the quotient of its two operands.

Arguments:

The two arguments to the 'div' instruction must be either integral or floating point values.

Semantics:

Example:

  <result> = div int 4, %var          ; yields {int}:result = 4 / %var

'`rem`' Instruction

Syntax:

  <result> = rem <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'rem' instruction returns the remainder from the division of its two operands.

Arguments:

The two arguments to the 'rem' instruction must be either integral or floating point values.

Semantics:

...

Example:

  <result> = rem int 4, %var          ; yields {int}:result = 4 % %var

'`setcc`' Instructions

Syntax:

  <result> = seteq <ty> <var1>, <var2>   ; yields {bool}:result
  <result> = setne <ty> <var1>, <var2>   ; yields {bool}:result
  <result> = setlt <ty> <var1>, <var2>   ; yields {bool}:result
  <result> = setgt <ty> <var1>, <var2>   ; yields {bool}:result
  <result> = setle <ty> <var1>, <var2>   ; yields {bool}:result
  <result> = setge <ty> <var1>, <var2>   ; yields {bool}:result

Overview:

The 'setcc' family of instructions returns a boolean value based on a comparison of their two operands.

Arguments:

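The two arguments to the 'setcc' instructions must be of first class or derived type (it is not possible to compare 'label' or 'void' values).

The 'setlt', 'setgt', 'setle', and 'setge' instructions do not operate on 'bool' typed arguments.

Semantics:

The 'seteq' instruction yields a true 'bool' value if both operands are equal.
The 'setne' instruction yields a true 'bool' value if both operands are unequal.
The 'setlt' instruction yields a true 'bool' value if the first operand is less than the second operand.
The 'setgt' instruction yields a true 'bool' value if the first operand is greater than the second operand.
The 'setle' instruction yields a true 'bool' value if the first operand is less than or equal to the second operand.
The 'setge' instruction yields a true 'bool' value if the first operand is greater than or equal to the second operand.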

Example:

  <result> = seteq int   4, 5        ; yields {bool}:result = false
  <result> = setne float 4, 5        ; yields {bool}:result = true
  <result> = setlt uint  4, 5        ; yields {bool}:result = true
  <result> = setgt sbyte 4, 5        ; yields {bool}:result = false
  <result> = setle sbyte 4, 5        ; yields {bool}:result = true
  <result> = setge sbyte 4, 5        ; yields {bool}:result = false

Bitwise Binary Operations

'`and`' Instruction

Syntax:

  <result> = and <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'and' instruction returns the bitwise logical and of its two operands.

Arguments:

The two arguments to the 'and' instruction must be either integral or bool values.

Semantics:

Example:

  <result> = and int 4, %var         ; yields {int}:result = 4 & %var
  <result> = and int 15, 40          ; yields {int}:result = 8
  <result> = and int 4, 8            ; yields {int}:result = 0

'`or`' Instruction

Syntax:

  <result> = or <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'or' instruction returns the bitwise logical inclusive or of its two operands.

Arguments:

The two arguments to the 'or' instruction must be either integral or bool values.

Semantics:

Example:

  <result> = or int 4, %var         ; yields {int}:result = 4 | %var
  <result> = or int 15, 40          ; yields {int}:result = 47
  <result> = or int 4, 8            ; yields {int}:result = 12

'`xor`' Instruction

Syntax:

  <result> = xor <ty> <var1>, <var2>   ; yields {ty}:result

Overview:

The 'xor' instruction returns the bitwise logical exclusive or of its two operands.

Arguments:

The two arguments to the 'xor' instruction must be either integral or bool values.

Semantics:
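
The constant results quoted in the 'and', 'or', and 'xor' example sections can be reproduced with ordinary bitwise arithmetic. The Python sketch below does so; it is a model for checking the arithmetic rather than LLVM code, and the table and function names are hypothetical.

```python
import operator

# Map each instruction name to the matching Python bitwise operator.
OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

# (instruction, first operand, second operand, result quoted in this manual)
EXAMPLES = [
    ("and", 15, 40, 8),
    ("and", 4, 8, 0),
    ("or", 15, 40, 47),
    ("or", 4, 8, 12),
    ("xor", 15, 40, 39),
    ("xor", 4, 8, 12),
]


def evaluate(op: str, a, b):
    """Apply the named bitwise instruction to two operands."""
    return OPS[op](a, b)


for op, a, b, expected in EXAMPLES:
    result = evaluate(op, a, b)
    assert result == expected
    print(f"{op} int {a}, {b} = {result} (binary {result:06b})")

# 'xor' with true inverts a boolean value, as recommended for 'bool'.
print(evaluate("xor", True, True), evaluate("xor", True, False))
```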

Example:

  <result> = xor int 4, %var         ; yields {int}:result = 4 ^ %var
  <result> = xor int 15, 40          ; yields {int}:result = 39
  <result> = xor int 4, 8            ; yields {int}:result = 12

'`shl`' Instruction

Syntax:

  <result> = shl <ty> <var1>, ubyte <var2>   ; yields {ty}:result

Overview:

The 'shl' instruction returns the first operand shifted to the left a specified number of bits.

Arguments:

The first argument to the 'shl' instruction must be an integral type. The second argument must be a 'ubyte' type.

Semantics:

Example:

  <result> = shl int 4, ubyte %var   ; yields {int}:result = 4 << %var
  <result> = shl int 4, ubyte 2      ; yields {int}:result = 16
  <result> = shl int 1, ubyte 10     ; yields {int}:result = 1024

'`shr`' Instruction

Syntax:

  <result> = shr <ty> <var1>, ubyte <var2>   ; yields {ty}:result

Overview:

The 'shr' instruction returns the first operand shifted to the right a specified number of bits.

Arguments:

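The first argument to the 'shr' instruction must be an integral type. The second argument must be a 'ubyte' type.

Semantics:

If the first argument is a signed type, the most significant bit is duplicated in the newly freed bit positions. If the first argument is unsigned, zeros fill the empty positions.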
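
A right shift has to decide what to place in the vacated high bits. The Python sketch below models one plausible reading, stated here as an assumption: signed operands are shifted arithmetically (the sign bit is copied) and unsigned operands are shifted logically (zeros are shifted in). The function name and the `bits` and `signed` parameters are hypothetical.

```python
def shr(value: int, amount: int, bits: int = 32, signed: bool = True) -> int:
    """Model of 'shr' on a fixed width integer (assumed semantics)."""
    mask = (1 << bits) - 1
    raw = value & mask                      # the operand's bit pattern
    if signed and raw & (1 << (bits - 1)):
        raw -= 1 << bits                    # reinterpret as a negative number
    # Python's >> copies the sign bit for negative integers (arithmetic
    # shift) and shifts in zeros for non-negative integers (logical shift).
    return raw >> amount


print(shr(4, 1), shr(4, 2), shr(4, 3))          # the 'int' examples
print(shr(-8, 1))                               # signed: sign bit is kept
print(hex(shr(0xFFFFFFF8, 1, signed=False)))    # unsigned: zero is shifted in
```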

Example:

  <result> = shr int 4, ubyte %var   ; yields {int}:result = 4 >> %var
  <result> = shr int 4, ubyte 1      ; yields {int}:result = 2
  <result> = shr int 4, ubyte 2      ; yields {int}:result = 1
  <result> = shr int 4, ubyte 3      ; yields {int}:result = 0

Memory Access Operations

'`malloc`' Instruction

Syntax:

  <result> = malloc  <type>                        ; yields {type*}:result
  <result> = malloc [<type>], uint <NumElements>   ; yields {[type]*}:result

Overview:

The 'malloc' instruction allocates memory from the system heap and returns a pointer to it.

Arguments:

There are two forms of the 'malloc' instruction, one for allocating a variable of a fixed type, and one for allocating an array. The array form is used to allocate an array whose upper bound is not known until run time. If the upper bound is known at compile time, it is recommended that the first form be used with a sized array type.

'type' may be any type except for an unsized array type.

Semantics:

Example:

  %array  = malloc [4 x ubyte ]                    ; yields {[4 x ubyte]*}:array

  %size   = add uint 2, 2                          ; yields {uint}:size = uint 4
  %array1 = malloc [ubyte], uint 4                 ; yields {[ubyte]*}:array1
  %array2 = malloc [ubyte], uint %size             ; yields {[ubyte]*}:array2

'`free`' Instruction

Syntax:

  free <type> <value>                              ; yields {void}

Overview:

The 'free' instruction returns memory back to the unused memory heap, to be reallocated in the future.

Arguments:

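'value' shall be a pointer value that points to a value that was allocated with the 'malloc' instruction.

Semantics:

After this instruction executes, the memory is available for reuse, and the contents of the memory pointed to by 'value' are undefined.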

Example:

  %array  = malloc [4 x ubyte]                    ; yields {[4 x ubyte]*}:array
            free   [4 x ubyte]* %array

'`alloca`' Instruction

Syntax:

  <result> = alloca  <type>                       ; yields {type*}:result
  <result> = alloca [<type>], uint <NumElements>  ; yields {[type]*}:result

Overview:

The 'alloca' instruction allocates memory on the stack frame of the currently executing method. The memory remains live until that method returns.

Arguments:

There are two forms of the 'alloca' instruction, one for allocating a variable of a fixed type, and one for allocating an array. The array form is used to allocate an array whose upper bound is not known until run time. If the upper bound is known at compile time, it is recommended that the first form be used with a sized array type.

'type' may be any type except for an unsized array type.

Note that a virtual machine may generate more efficient native code for a method if all of the fixed size 'alloca' instructions live in the first basic block of that method.

Semantics:

Memory is allocated and a pointer to it is returned. Memory allocated with 'alloca' is automatically released when the method returns.

Example:

  %ptr = alloca int                              ; yields {int*}:ptr
  %ptr = alloca [int], uint 4                    ; yields {[int]*}:ptr

'`load`' Instruction

Syntax:

  <result> = load <ty>* <pointer>                 ; yields {ty}:result
  <result> = load <ty>* <arrayptr>{, uint <idx>}+    ; yields {ty}:result
  <result> = load <ty>* <structptr>{, ubyte <idx>}+     ; yields field type

Overview:

The 'load' instruction is used to read from memory.

Arguments:

There are three forms of the 'load' instruction: one for reading from a general pointer, one for reading from a pointer to an array, and one for reading from a pointer to a structure.

In the first form, '<pointer>' must point to a simple type (a primitive type or another pointer).

In the second form, '<arrayptr>' must be a pointer to an array, and a list of one or more indices is provided to index into the (possibly multidimensional) array. No bounds checking is performed on array reads.

In the third form, the pointer must point to a (possibly nested) structure. There shall be one ubyte argument for each level of dereferencing involved.

Semantics:

...

Examples:

  %ptr = alloca int                               ; yields {int*}:ptr
  store int 3, int* %ptr                          ; yields {void}
  %val = load int* %ptr                           ; yields {int}:val = int 3

  %array = malloc [4 x ubyte]                     ; yields {[4 x ubyte]*}:array
  store ubyte 124, [4 x ubyte]* %array, uint 3
  %val   = load [4 x ubyte]* %array, uint 3       ; yields {ubyte}:val = ubyte 124
  %val   = load {{int, float}}* %stptr, ubyte 0, ubyte 1   ; yields {float}:val

'`store`' Instruction

Syntax:

  store <ty> <value>, <ty>* <pointer>                   ; yields {void}
  store <ty> <value>, <ty>* <arrayptr>{, uint <idx>}+   ; yields {void}
  store <ty> <value>, <ty>* <structptr>{, ubyte <idx>}+ ; yields {void}

Overview:

The 'store' instruction is used to write to memory.

Arguments:

There are three forms of the 'store' instruction: one for writing through a general pointer, one for writing through a pointer to a (possibly multidimensional) array, and one for writing to an element of a (potentially nested) structure.

The semantics of this instruction closely match that of the load instruction, except that memory is written to, not read from.

Semantics:

Example:

  %ptr = alloca int                               ; yields {int*}:ptr
  store int 3, int* %ptr                          ; yields {void}
  %val = load int* %ptr                           ; yields {int}:val = int 3

  %array = malloc [4 x ubyte]                     ; yields {[4 x ubyte]*}:array
  store ubyte 124, [4 x ubyte]* %array, uint 3
  %val   = load [4 x ubyte]* %array, uint 3       ; yields {ubyte}:val = ubyte 124
  %val   = load {{int, float}}* %stptr, ubyte 0, ubyte 1   ; yields {float}:val

'`getelementptr`' Instruction

Syntax:

  <result> = getelementptr <ty>* <arrayptr>{, uint <idx>}+    ; yields {ty*}:result
  <result> = getelementptr <ty>* <structptr>{, ubyte <idx>}+     ; yields field type*

Overview:

'getelementptr' performs all of the same work that a 'load' instruction does, except for the actual memory fetch. Instead, 'getelementptr' simply performs the addressing arithmetic to get to the element in question, and returns the resulting pointer. This is useful for indexing into a bimodal structure.

Arguments:

Semantics:

Example:

  %aptr = getelementptr {int, [12 x ubyte]}* %sptr, ubyte 1   ; yields {[12 x ubyte]*}:aptr
  %ub   = load [12 x ubyte]* %aptr, uint 4                    ; yields {ubyte}:ub
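
The relationship between 'getelementptr', 'load', and 'store' can be illustrated with a small Python model. It is a sketch under loud assumptions: aggregates are modeled as nested Python lists, a pointer is modeled as a (root, path) pair, and real memory layout is ignored entirely. None of these names or representations come from LLVM.

```python
def getelementptr(ptr, *indices):
    """Extend the address path; no element is read or written."""
    root, path = ptr
    return (root, path + indices)


def load(ptr, *indices):
    """Follow the path (plus any extra indices) and read the element."""
    root, path = ptr
    element = root
    for index in path + indices:
        element = element[index]
    return element


def store(value, ptr, *indices):
    """Follow the path (plus any extra indices) and write the element."""
    root, path = ptr
    full = path + indices
    container = root
    for index in full[:-1]:
        container = container[index]
    container[full[-1]] = value


# Model of a {int, [12 x ubyte]} structure and a pointer to it.
structure = [7, [0] * 12]
sptr = (structure, ())

store(99, sptr, 1, 4)            # write element 4 of field 1
aptr = getelementptr(sptr, 1)    # pointer to the [12 x ubyte] field
print(load(aptr, 4))             # same element, reached in two steps
print(load(sptr, 1, 4))          # same element, reached in one step
```

In this model a 'load' with indices behaves like a 'getelementptr' with the same indices followed by a plain 'load', which mirrors the description given for 'getelementptr'.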

Other Operations

The instructions in this category are the "miscellaneous" functions that defy better classification.

'`cast .. to`' Instruction

TODO: Talk about what is considered true or false for integrals.

Syntax:

Overview:

Arguments:

Semantics:

Example:

'`call`' Instruction

Syntax:

Overview:

Arguments:

Semantics:

Example:

  %retval = call int %test(int %argc)

'`icall`' Instruction

Indirect calls are desperately needed to implement virtual function tables (C++, Java) and function pointers (C, C++, ...).

A new instruction icall or similar should be introduced to represent an indirect call.

Example:

  %retval = icall int %funcptr(int %arg1)          ; yields {int}:%retval

'`phi`' Instruction

Syntax:

Overview:

Arguments:

Semantics:

Example:

Builtin Functions

Notice: Preliminary idea!

Builtin functions are very similar to normal functions, except they are defined by the implementation. Invocations of these functions are very similar to method invocations, except that the syntax is a little less verbose.

Builtin functions are useful to implement semi-high level ideas like a 'min' or 'max' operation that can have important properties when doing program analysis. For example:

Some optimizations can make use of identities defined over the functions, for example a parallelizing compiler could make use of 'min' identities to parallelize a loop.
Builtin functions would have polymorphic types, where normal method calls may only have a single type.
Builtin functions would be known to not have side effects, simplifying analysis over straight method calls.
The syntax of the builtins is cleaner than the syntax of the 'call' instruction (very minor point).

Because these invocations are explicit in the representation, the runtime can choose to implement these builtin functions any way that it wants, including:

Inlining the code directly into the invocation.
Implementing the functions in some sort of Runtime class, and converting the invocation to a standard method call.
Implementing the functions in some sort of Runtime class, and performing standard inlining optimizations on it.

Note that these builtins do not use quoted identifiers: the name of the builtin effectively becomes an identifier in the language.

Example:

  ; Example of a normal method call
  %maximum = call int %maximum(int %arg1, int %arg2)   ; yields {int}:%maximum

  ; Examples of potential builtin functions
  %max = max(int %arg1, int %arg2)                     ; yields {int}:%max
  %min = min(int %arg1, int %arg2)                     ; yields {int}:%min
  %sin = sin(double %arg)                              ; yields {double}:%sin
  %cos = cos(double %arg)                              ; yields {double}:%cos

  ; Show that builtins are polymorphic, like instructions
  %max = max(float %arg1, float %arg2)                 ; yields {float}:%max
  %cos = cos(float %arg)                               ; yields {float}:%cos

The 'maximum' vs 'max' example illustrates the difference in calling semantics between a 'call' instruction and a builtin function invocation. Notice that the 'maximum' example assumes that the method is defined local to the caller.

TODO List

This list of random topics includes things that will need to be addressed before LLVM may be used to implement a Java-like language. Right now, it is pretty much useless for any language, given the unavailability of structure types.

Synchronization Instructions

We will need some type of synchronization instructions to be able to implement stuff in Java well. The way I currently envision doing this is to introduce a 'lock' type, and then add two operations (builtins or instructions) to lock and unlock the lock.

Possible Extensions

These extensions are distinct from the TODO list, as they are mostly "interesting" ideas that could be implemented in the future by someone so motivated. They are not directly required to get Java-like languages working.

'`tailcall`' Instruction

This could be useful. Who knows. '.net' does it, but is the optimization really worth the extra hassle? Using strong typing would make this trivial to implement, and a runtime could always fall back to downconverting this to a normal 'call' instruction.

Global Variables

In order to represent programs written in languages like C, we need to be able to support variables at the module (global) scope. Perhaps they should be written outside of the module definition even. Maybe global functions should be handled like this as well.

Explicit Parallelism

With the rise of massively parallel architectures (like the IA64 architecture, multithreaded CPU cores, and SIMD data sets) it is becoming increasingly important to extract as much ILP as possible from a code stream. It would be interesting to research encoding methods that can explicitly represent this. One straightforward way to do this would be to introduce a "stop" instruction that is equivalent to the IA64 stop bit.

Related Work

Codesigned virtual machines.

SafeTSA: Description here
Java: Description here
Microsoft .net: Description here
GNU RTL Intermediate Representation: Description here
IA64 Architecture & Instruction Set: Description here
MMIX Instruction Set: Description here
"Interview With Bjarne Stroustrup": This interview influenced the design and thought process behind LLVM in several ways, most notably the way that derived types are written in text format. See the question that starts with "you defined the C declarator syntax as an experiment that failed".

Vectorized Architectures

Intel MMX, MMX2, SSE, SSE2: Description here
AMD 3Dnow!, 3Dnow! 2: Description here
Sun VIS ISA: Description here

more...

Chris Lattner
Last modified: Sun Jul 8 19:25:56 CDT 2001
