diff --git a/docs/tutorial/LangImpl7.html b/docs/tutorial/LangImpl7.html new file mode 100644 index 00000000000..f80c0673fdd --- /dev/null +++ b/docs/tutorial/LangImpl7.html @@ -0,0 +1,298 @@ + + + +
+Welcome to Part 7 of the "Implementing a language with +LLVM" tutorial. In parts 1 through 6, we've built a very respectable, +albeit simple, functional +programming language. In our journey, we learned some parsing techniques, +how to build and represent an AST, how to build LLVM IR, and how to optimize +the resultant code and JIT compile it.
+ +While Kaleidoscope is interesting as a functional language, being functional also makes it +"too easy" to generate LLVM IR for it. In particular, a functional language +makes it very easy to build LLVM IR directly in SSA form. +Since LLVM requires that the input code be in SSA form, this is a very nice +property, and it is often unclear to newcomers how to generate code for an +imperative language with mutable variables.
+ +The short (and happy) summary of this chapter is that there is no need for +your front-end to build SSA form: LLVM provides highly tuned and well tested +support for this, though the way it works is a bit unexpected for some.
+ ++To understand why mutable variables cause complexities in SSA construction, +consider this extremely simple C example: +
+ ++int G, H; +int test(_Bool Condition) { + int X; + if (Condition) + X = G; + else + X = H; + return X; +} ++
In this case, we have the variable "X", whose value depends on the path +executed in the program. Because there are two different possible values for X +before the return instruction, a PHI node is inserted to merge the two values. +The LLVM IR that we want for this example looks like this:
+ ++@G = weak global i32 0 ; type of @G is i32* +@H = weak global i32 0 ; type of @H is i32* + +define i32 @test(i1 %Condition) { +entry: + br i1 %Condition, label %cond_true, label %cond_false + +cond_true: + %X.0 = load i32* @G + br label %cond_next + +cond_false: + %X.1 = load i32* @H + br label %cond_next + +cond_next: + %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.2 +} ++
In this example, the loads from the G and H global variables are explicit in +the LLVM IR, and they live in the then/else branches of the if statement +(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node +in the cond_next block selects the right value to use based on where control +flow is coming from: if control flow comes from the cond_false block, X.2 gets +the value of X.1. Alternatively, if control flow comes from cond_true, it gets +the value of X.0. The intent of this chapter is not to explain the details of +SSA form. For more information, see one of the many online +references.
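For readers who find C easier to follow than IR, the behavior of the X.2 phi node can be modeled as a function of the predecessor block. This is only an illustrative sketch: the `Pred` enum and the `phi_X2` and `test_ssa` names are hypothetical, the initial values of G and H are chosen just for the example, and a real phi node is not a function call.

```c
/* Sketch: modeling the X.2 phi node in plain C. All names are illustrative. */

int G = 10, H = 20;   /* illustrative initial values, not from the tutorial */

/* Which block did control flow arrive from? */
enum Pred { FROM_COND_TRUE, FROM_COND_FALSE };

/* Models: %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] */
static int phi_X2(enum Pred pred, int x0, int x1) {
    return pred == FROM_COND_TRUE ? x0 : x1;
}

int test_ssa(int condition) {
    /* C wants initial values here; in the IR, the value from the path
       that was not taken simply does not exist. */
    int x0 = 0, x1 = 0;
    enum Pred pred;

    if (condition) {
        x0 = G;                    /* cond_true:  %X.0 = load i32* @G */
        pred = FROM_COND_TRUE;
    } else {
        x1 = H;                    /* cond_false: %X.1 = load i32* @H */
        pred = FROM_COND_FALSE;
    }

    /* cond_next: pick the value that matches the incoming edge. */
    return phi_X2(pred, x0, x1);
}
```

Calling `test_ssa(1)` yields the current value of G and `test_ssa(0)` yields the current value of H, which is the merge the phi node performs.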
+ +The question for this article is "who places phi nodes when lowering +assignments to mutable variables?". The issue here is that LLVM +requires that its IR be in SSA form: there is no "non-SSA" mode for it. +However, SSA construction requires non-trivial algorithms and data structures, +so it is inconvenient and wasteful for every front-end to have to reproduce this +logic.
+ +The 'trick' here is that while LLVM does require all register values to be +in SSA form, it does not require (or permit) memory objects to be in SSA form. +In the example above, note that the loads from G and H are direct accesses to +G and H: they are not renamed or versioned. This differs from some other +compiler systems, which do try to version memory objects. LLVM does not +encode dataflow analysis of memory into the LLVM IR; instead, it is handled by Analysis Passes, which are computed on +demand.
+ ++With this in mind, the high-level idea is that we want to make a stack variable +(which lives in memory, because it is on the stack) for each mutable object in +a function. To take advantage of this trick, we need to talk about how LLVM +represents stack variables. +
+ +In LLVM, all memory accesses are explicit with load/store instructions, and +it is carefully designed to not have (or need) an "address-of" operator. Notice +how the type of the @G/@H global variables is actually "i32*" even though the +variable is defined as "i32". What this means is that @G defines space +for an i32 in the global data area, but its name actually refers to the +address for that space. Stack variables work the same way, but instead of being +declared with global variable definitions, they are declared with the +LLVM alloca instruction:
+ ++define i32 @test(i1 %Condition) { +entry: + %X = alloca i32 ; type of %X is i32*. + ... + %tmp = load i32* %X ; load the stack value %X from the stack. + %tmp2 = add i32 %tmp, 1 ; increment it + store i32 %tmp2, i32* %X ; store it back + ... ++
This code shows an example of how you can declare and manipulate a stack +variable in the LLVM IR. Stack memory allocated with the alloca instruction is +fully general: you can pass the address of the stack slot to functions, you can +store it in other variables, etc. We could rewrite our earlier +example to use the alloca technique and avoid using a PHI node:
+ ++@G = weak global i32 0 ; type of @G is i32* +@H = weak global i32 0 ; type of @H is i32* + +define i32 @test(i1 %Condition) { +entry: + %X = alloca i32 ; type of %X is i32*. + br i1 %Condition, label %cond_true, label %cond_false + +cond_true: + %X.0 = load i32* @G + store i32 %X.0, i32* %X ; Update X + br label %cond_next + +cond_false: + %X.1 = load i32* @H + store i32 %X.1, i32* %X ; Update X + br label %cond_next + +cond_next: + %X.2 = load i32* %X ; Read X + ret i32 %X.2 +} ++
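With this, we have discovered a way to handle arbitrary mutable variables +without the need to create Phi nodes at all: each mutable variable becomes a +stack allocation, each read of the variable becomes a load from the stack, and +each update of the variable becomes a store to the stack.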
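In C terms, this recipe resembles a local variable that is only ever touched through its address. The sketch below mirrors the alloca version of the example; `load_i32`, `store_i32`, and `test_alloca` are hypothetical names used for illustration, and the initial values of G and H are assumptions.

```c
/* Sketch: the alloca/load/store recipe written out in C. Names are illustrative. */

int G = 10, H = 20;   /* illustrative initial values, not from the tutorial */

/* Models: load i32* %p */
static int load_i32(const int *p) { return *p; }

/* Models: store i32 %v, i32* %p */
static void store_i32(int v, int *p) { *p = v; }

int test_alloca(int condition) {
    int slot;          /* entry: %X = alloca i32 (reserves stack space) */
    int *X = &slot;    /* like %X in the IR, the name stands for the address */

    if (condition)
        store_i32(load_i32(&G), X);   /* cond_true:  update X */
    else
        store_i32(load_i32(&H), X);   /* cond_false: update X */

    return load_i32(X);               /* cond_next: read X, no phi needed */
}
```

Both branches write to the same slot, so the join point needs only a plain load; the merging that a phi node would do happens implicitly through memory.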
+ +While this solution has solved our immediate problem, it created another +one: we have now apparently introduced a lot of stack traffic for very simple +and common operations, a major performance problem. Fortunately for us, the +LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles +this case, promoting allocas like this into SSA registers, inserting Phi nodes +as appropriate. If you run this example through the pass, you'll +get:
+ ++$ llvm-as < example.ll | opt -mem2reg | llvm-dis +@G = weak global i32 0 +@H = weak global i32 0 + +define i32 @test(i1 %Condition) { +entry: + br i1 %Condition, label %cond_true, label %cond_false + +cond_true: + %X.0 = load i32* @G + br label %cond_next + +cond_false: + %X.1 = load i32* @H + br label %cond_next + +cond_next: + %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.01 +} ++ +
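To build some intuition for what promotion involves, here is a toy sketch of the easiest case only: forwarding stored values to later loads inside a single basic block that uses one private stack slot. It is not LLVM's implementation and it places no phi nodes; the instruction set, the `Inst` struct, and `promote_single_block` are all hypothetical, and value ids are assumed to be smaller than `MAX_VALUES`.

```c
#include <stddef.h>

#define MAX_VALUES 64   /* assumption: all value ids are in [0, MAX_VALUES) */

/* Toy instructions for ONE basic block operating on ONE stack slot. */
enum Op { OP_STORE, OP_LOAD, OP_ADD1 };

struct Inst {
    enum Op op;
    int src;   /* value id read by STORE and ADD1; unused by LOAD */
    int dst;   /* value id defined by LOAD and ADD1; unused by STORE */
};

/*
 * Rewrites `code` in place so that the slot is no longer needed.
 * `initial` is the value id the slot holds on entry to the block.
 * Returns the new instruction count.
 */
size_t promote_single_block(struct Inst *code, size_t n, int initial) {
    int alias[MAX_VALUES];   /* alias[v] = value that replaces v */
    for (int i = 0; i < MAX_VALUES; ++i)
        alias[i] = i;

    int current = initial;   /* value id currently held in the slot */
    size_t out = 0;

    for (size_t i = 0; i < n; ++i) {
        struct Inst in = code[i];
        switch (in.op) {
        case OP_STORE:
            current = alias[in.src];   /* remember the value, drop the store */
            break;
        case OP_LOAD:
            alias[in.dst] = current;   /* the load's result IS that value */
            break;                     /* drop the load */
        case OP_ADD1:
            in.src = alias[in.src];    /* use the forwarded value directly */
            code[out++] = in;
            break;
        }
    }
    return out;
}
```

Given the block "load v1; v2 = v1 + 1; store v2; load v3; v4 = v3 + 1" with entry value v0, the sketch leaves just "v2 = v0 + 1; v4 = v2 + 1". Handling values that flow across block boundaries is where phi nodes come in, which this sketch deliberately leaves out.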
The mem2reg pass is guaranteed to work, and + +which cases. +
+ +The final question you may be asking is: should I bother with this nonsense +for my front-end? Wouldn't it be better if I just did SSA construction +directly, avoiding use of the mem2reg optimization pass? + +Proven, well tested, debug info, etc. +
++Here is the complete code listing for our running example, enhanced with the +if/then/else and for expressions. To build this example, use: +
+ ++ # Compile + g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + # Run + ./toy ++
Here is the code:
+ +++