diff --git a/docs/LangRef.html b/docs/LangRef.html
index 03235ecaa40..fc1f9615c2a 100644
--- a/docs/LangRef.html
+++ b/docs/LangRef.html
@@ -1,132 +1,136 @@
-
-
LLVM Language Reference Manual | -
Written by Chris Lattner and Vikram Adve
- +
Written by Chris Lattner and Vikram Adve
+
-Abstract - |
- This document is a reference manual for the LLVM assembly language. LLVM is - an SSA based representation that provides type safety, low-level operations, - flexibility, and the capability of representing 'all' high-level languages - cleanly. It is the common code representation used throughout all phases of - the LLVM compilation strategy. -- +
This document is a reference manual for the LLVM assembly language. LLVM is +an SSA based representation that provides type safety, low-level operations, +flexibility, and the capability of representing 'all' high-level languages +cleanly. It is the common code representation used throughout all phases of the +LLVM compilation strategy.
+-Introduction - |
The LLVM code representation is designed to be used in three different forms: +as an in-memory compiler IR, as an on-disk bytecode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. This document describes the human -readable representation and notation.
+readable representation and notation.
-The LLVM representation aims to be a light-weight and low-level while being +The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are "universal IR's", @@ -134,103 +138,121 @@ allowing many source languages to be mapped to them). By providing type information, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function... allowing it to be promoted to a -simple SSA value instead of a memory location.
+simple SSA value instead of a memory location.
+ ++
It is important to note that this document describes 'well formed' LLVM +assembly language. There is a difference between what the parser accepts and +what is considered 'well formed'. For example, the following instruction is +syntactically okay, but not well formed:
%x = add int 1, %x-...because the definition of %x does not dominate all of its uses. The -LLVM infrastructure provides a verification pass that may be used to verify that -an LLVM module is well formed. This pass is automatically run by the parser -after parsing input assembly, and by the optimizer before it outputs bytecode. -The violations pointed out by the verifier pass indicate bugs in transformation -passes or input to the parser.
+
...because the definition of %x does not dominate all of its uses. +The LLVM infrastructure provides a verification pass that may be used to verify +that an LLVM module is well formed. This pass is automatically run by the +parser after parsing input assembly, and by the optimizer before it outputs +bytecode. The violations pointed out by the verifier pass indicate bugs in +transformation passes or input to the parser.
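A small sketch can illustrate the dominance rule behind this example. Within a single basic block, a value defined by one instruction is available only to the instructions that follow it. Checking straight-line code therefore reduces to checking that each '%' operand is either a function argument or was defined earlier. The tuple-based instruction encoding and the function `undominated_uses` are hypothetical and are not LLVM's verifier. PHI nodes and multi-block control flow are deliberately left out.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical toy encoding, not LLVM's in-memory IR:
# (result name or None, opcode, operands).
Instr = Tuple[Optional[str], str, List[str]]

def undominated_uses(block: Iterable[Instr],
                     args: Iterable[str] = ()) -> List[Tuple[int, str]]:
    """Return (instruction index, operand) for each use not preceded by its definition.

    Straight-line code in one basic block only; PHI nodes are not modeled.
    """
    defined = set(args)  # function arguments dominate every instruction
    errors = []
    for index, (result, _opcode, operands) in enumerate(block):
        for operand in operands:
            # '%'-prefixed operands are values; anything else is a constant.
            if operand.startswith("%") and operand not in defined:
                errors.append((index, operand))
        # A result becomes usable only after the instruction that defines it.
        if result is not None:
            defined.add(result)
    return errors

# The example from the text: %x is used by its own definition.
print(undominated_uses([("%x", "add", ["1", "%x"])]))  # [(0, '%x')]

# Well formed: the use of %x follows its definition.
print(undominated_uses([("%x", "add", ["1", "2"]),
                        ("%y", "add", ["1", "%x"])]))  # []
```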
+-Identifiers - |
+
LLVM uses three different forms of identifiers, for different purposes:
+
+
LLVM requires that values start with a '%' sign for two reasons: compilers +don't need to worry about name clashes with reserved words, and the set of +reserved words may be expanded in the future without penalty. Additionally, +unnamed identifiers allow a compiler to quickly come up with a temporary +variable without having to avoid symbol table conflicts.
+ +Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc...), for primitive type names ('void', 'uint', etc...), and others. These reserved words cannot conflict with variable names, because none of them start with a '%' -character.
+character.
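A minimal sketch of why the '%' prefix rules out clashes: a tokenizer can decide whether a token is a value name before it ever consults the keyword table, so growing that table cannot change how existing names are read. The regular expressions assume that named values look like `%foo` and unnamed values like `%12`. These patterns, the `classify` helper, and the keyword subset (taken from words used in this document) are illustrative assumptions, not the parser's actual grammar.

```python
import re

# Assumed token shapes (illustrative, not the parser's grammar):
# named values such as %foo and unnamed values such as %12.
NAMED_VALUE = re.compile(r"%[a-zA-Z$._][a-zA-Z$._0-9]*")
UNNAMED_VALUE = re.compile(r"%[0-9]+")

# A small subset of reserved words used in this document.
KEYWORDS = {"add", "mul", "shl", "cast", "ret", "void", "int", "uint", "ubyte"}

def classify(token: str) -> str:
    # Value names are recognized by their '%' prefix first, so adding a
    # word to KEYWORDS can never change how an existing name is read.
    if NAMED_VALUE.fullmatch(token):
        return "named value"
    if UNNAMED_VALUE.fullmatch(token):
        return "unnamed value"
    if token in KEYWORDS:
        return "keyword"
    return "other"

for token in ["%add", "add", "%12", "uint", "%X", "8"]:
    print(f"{token!r}: {classify(token)}")
```

Because no reserved word begins with '%', the order of the checks does not matter in practice. Putting the value patterns first only makes that independence explicit.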
-Here is an example of LLVM code to multiply the integer variable '%X' -by 8:+
Here is an example of LLVM code to multiply the integer variable +'%X' by 8:
+ +The easy way:
-The easy way:%result = mul uint %X, 8-After strength reduction: +
After strength reduction:
+%result = shl uint %X, ubyte 3-And the hard way: +
And the hard way:
+add uint %X, %X           ; yields {uint}:%0
 add uint %0, %0           ; yields {uint}:%1
 %result = add uint %1, %1
-This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:
+
This last way of multiplying %X by 8 illustrates several important +lexical features of LLVM:
+
...and it also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of the value produced. Comments are shown in italic -text.
+text.
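As a quick check on the multiply-by-8 example, the sketch below models each of the three instruction sequences in Python and confirms that they agree, including when the product overflows. It assumes `uint` is a 32-bit unsigned type whose arithmetic wraps modulo 2^32. The function names are hypothetical.

```python
# Assumption: 'uint' is 32 bits wide and wraps modulo 2**32.
MASK = (1 << 32) - 1

def easy(x: int) -> int:
    # %result = mul uint %X, 8
    return (x * 8) & MASK

def strength_reduced(x: int) -> int:
    # %result = shl uint %X, ubyte 3
    return (x << 3) & MASK

def hard(x: int) -> int:
    t0 = (x + x) & MASK      # add uint %X, %X  -> %0
    t1 = (t0 + t0) & MASK    # add uint %0, %0  -> %1
    return (t1 + t1) & MASK  # %result = add uint %1, %1

# Sample inputs, including values whose product exceeds 32 bits.
for x in [0, 1, 7, 0x1FFFFFFF, 0x20000000, MASK]:
    assert easy(x) == strength_reduced(x) == hard(x)
print("all three sequences agree")
```

Shifting left by 3 multiplies by 2^3 = 8, and three doublings do the same, so all three forms agree for every input once the result is reduced modulo 2^32.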
-The one non-intuitive notation for constants is the optional hexidecimal form of -floating point constants. For example, the form 'double +The one non-intuitive notation for constants is the optional hexadecimal form +of floating point constants. For example, the form 'double 0x432ff973cafa8000' is equivalent to (but harder to read than) 'double 4.5e+15', which is also supported by the parser. The only time hexadecimal floating point constants are useful (and the only time that they are generated @@ -238,41 +260,47 @@ by the disassembler) is when an FP constant has to be emitted that is not representable as a decimal floating point number exactly. For example, NaNs, infinities, and other special cases are represented in their IEEE hexadecimal format so that assembly and disassembly do not cause any bits to change in the -constants.
+constants.
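The hexadecimal form can be reproduced with Python's standard `struct` module: packing a double in big-endian order yields the 64 IEEE 754 bits that the 16 hex digits spell out. The helper names are hypothetical, and the sketch assumes only that the LLVM form is `0x` followed by those bits, which matches the 'double 0x432ff973cafa8000' example. It recovers that pattern for 4.5e15 and also encodes an infinity, which has no finite decimal spelling.

```python
import struct

def to_hex_double(value: float) -> str:
    """Return '0x' plus the 16 hex digits of the IEEE 754 bit pattern."""
    return "0x" + struct.pack(">d", value).hex()

def from_hex_double(text: str) -> float:
    """Reinterpret the 64-bit pattern after '0x' as a double."""
    return struct.unpack(">d", bytes.fromhex(text[2:]))[0]

print(to_hex_double(4.5e15))                  # 0x432ff973cafa8000
print(from_hex_double("0x432ff973cafa8000"))  # 4500000000000000.0
print(to_hex_double(float("inf")))            # 0x7ff0000000000000

# Round-tripping through the bit pattern leaves every bit unchanged.
bits = "0x432ff973cafa8000"
assert to_hex_double(from_hex_double(bits)) == bits
```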
+-Type System - |
+
The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of optimizations to +be performed on the IR directly, without having to do extra analyses on the side +before the transformation. A strong type system makes it easier to read the +generated code and enables novel analyses and transformations that are not +feasible to perform on normal three address code representations.
- +-Primitive Types - |
+