mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-01-17 21:35:07 +00:00
Many typos, grammaro, and wording fixes. Patch by Kelly Wilson, thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44043 91177308-0d34-0410-b5e6-96231b3b80d8
parent 0f8d9c04d9
commit 41fcea3bdb
@ -41,7 +41,7 @@ Support</li>
|
||||
|
||||
<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
|
||||
with LLVM</a>" tutorial. This chapter shows you how to transform the <a
|
||||
href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
|
||||
href="LangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into LLVM IR.
|
||||
This will teach you a little bit about how LLVM does things, as well as
|
||||
demonstrate how easy it is to use. It's much more work to build a lexer and
|
||||
parser than it is to generate LLVM IR code. :)
|
||||
@ -79,14 +79,14 @@ public:
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>The Codegen() method says to emit IR for that AST node and all things it
|
||||
<p>The Codegen() method says to emit IR for that AST node along with all the things it
|
||||
depends on, and they all return an LLVM Value object.
|
||||
"Value" is the class used to represent a "<a
|
||||
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
|
||||
Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
|
||||
of SSA values is that their value is computed as the related instruction
|
||||
executes, and it does not get a new value until (and if) the instruction
|
||||
re-executes. In order words, there is no way to "change" an SSA value. For
|
||||
re-executes. In other words, there is no way to "change" an SSA value. For
|
||||
more information, please read up on <a
|
||||
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
|
||||
Assignment</a> - the concepts are really quite natural once you grok them.</p>
|
||||
@ -97,7 +97,7 @@ this. Again, this tutorial won't dwell on good software engineering practices:
|
||||
for our purposes, adding a virtual method is simplest.</p>
|
||||
|
||||
<p>The
|
||||
second thing we want is an "Error" method like we used for parser, which will
|
||||
second thing we want is an "Error" method like we used for the parser, which will
|
||||
be used to report errors found during code generation (for example, use of an
|
||||
undeclared parameter):</p>
|
||||
|
||||
@ -144,7 +144,7 @@ has already been done, and we'll just use it to emit code.
|
||||
|
||||
<div class="doc_text">
|
||||
|
||||
<p>Generating LLVM code for expression nodes is very straight-forward: less
|
||||
<p>Generating LLVM code for expression nodes is very straightforward: less
|
||||
than 45 lines of commented code for all four of our expression nodes. First,
|
||||
we'll do numeric literals:</p>
|
||||
|
||||
@ -174,7 +174,7 @@ Value *VariableExprAST::Codegen() {
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>References to variables are also quite simple here. In the simple version
|
||||
<p>References to variables are also quite simple using LLVM. In the simple version
|
||||
of Kaleidoscope, we assume that the variable has already been emited somewhere
|
||||
and its value is available. In practice, the only values that can be in the
|
||||
<tt>NamedValues</tt> map are function arguments. This
|
||||
@ -211,9 +211,9 @@ right-hand side, then we compute the result of the binary expression. In this
|
||||
code, we do a simple switch on the opcode to create the right LLVM instruction.
|
||||
</p>
|
||||
|
||||
<p>In this example, the LLVM builder class is starting to show its value.
|
||||
Because it knows where to insert the newly created instruction, you just have to
|
||||
specify what instruction to create (e.g. with <tt>CreateAdd</tt>), which
|
||||
<p>In the example above, the LLVM builder class is starting to show its value.
|
||||
LLVMBuilder knows where to insert the newly created instruction, all you have to
|
||||
do is specify what instruction to create (e.g. with <tt>CreateAdd</tt>), which
|
||||
operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
|
||||
for the generated instruction. One nice thing about LLVM is that the name is
|
||||
just a hint: if there are multiple additions in a single function, the first
|
||||
@ -221,17 +221,16 @@ will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
|
||||
giving it a name like "addtmp42". Local value names for instructions are purely
|
||||
optional, but it makes it much easier to read the IR dumps.</p>
|
||||
|
||||
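The "name is just a hint" behavior described above can be pictured with a small, self-contained C++ sketch. This is not LLVM's implementation: `NameTable` and `uniquify` are hypothetical names, and the suffix LLVM actually picks may differ. The sketch only assumes that a clashing name gets a numeric suffix appended.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a per-function symbol table.
class NameTable {
  std::set<std::string> Used;

public:
  // Returns Hint if it is free; otherwise appends the smallest numeric
  // suffix that makes the name unique, then records the result.
  std::string uniquify(const std::string &Hint) {
    std::string Name = Hint;
    for (unsigned Suffix = 1; Used.count(Name) != 0; ++Suffix)
      Name = Hint + std::to_string(Suffix);
    Used.insert(Name);
    return Name;
  }
};
```

Requesting "addtmp" repeatedly from one `NameTable` yields a fresh name each time, so the caller never has to track which names are taken.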
<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained with
|
||||
<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by
|
||||
strict rules: for example, the Left and Right operators of
|
||||
an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
|
||||
type, and that the result type of the add must match the operand types. Because
|
||||
an <a href="../LangRef.html#i_add">add instruction</a> must have the same
|
||||
type, and the result type of the add must match the operand types. Because
|
||||
all values in Kaleidoscope are doubles, this makes for very simple code for add,
|
||||
sub and mul.</p>
|
||||
|
||||
<p>On the other hand, LLVM specifies that the <a
|
||||
href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
|
||||
(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
|
||||
value. In order to get these semantics, we combine the fcmp instruction with
|
||||
(a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with
|
||||
a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
|
||||
converts its input integer into a floating point value by treating the input
|
||||
as an unsigned value. In contrast, if we used the <a
|
||||
@ -261,8 +260,8 @@ Value *CallExprAST::Codegen() {
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>Code generation for function calls is quite straight-forward with LLVM. The
|
||||
code above first looks the name of the function up in the LLVM Module's symbol
|
||||
<p>Code generation for function calls is quite straightforward with LLVM. The
|
||||
code above initially does a function name lookup in the LLVM Module's symbol
|
||||
table. Recall that the LLVM Module is the container that holds all of the
|
||||
functions we are JIT'ing. By giving each function the same name as what the
|
||||
user specifies, we can use the LLVM symbol table to resolve function names for
|
||||
@ -271,8 +270,8 @@ us.</p>
|
||||
<p>Once we have the function to call, we recursively codegen each argument that
|
||||
is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
|
||||
instruction</a>. Note that LLVM uses the native C calling conventions by
|
||||
default, allowing these calls to call into standard library functions like
|
||||
"sin" and "cos" with no additional effort.</p>
|
||||
default, allowing these calls to also call into standard library functions like
|
||||
"sin" and "cos", with no additional effort.</p>
|
||||
|
||||
<p>This wraps up our handling of the four basic expressions that we have so far
|
||||
in Kaleidoscope. Feel free to go in and add some more. For example, by
|
||||
@ -321,7 +320,7 @@ this). Note that Types in LLVM are uniqued just like Constants are, so you
|
||||
don't "new" a type, you "get" it.</p>
|
||||
|
||||
<p>The final line above actually creates the function that the prototype will
|
||||
correspond to. This indicates which type, linkage, and name to use, and which
|
||||
correspond to. This indicates the type, linkage and name to use, as well as which
|
||||
module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
|
||||
means that the function may be defined outside the current module and/or that it
|
||||
is callable by functions outside the module. The Name passed in is the name the
|
||||
@ -343,7 +342,7 @@ above.</p>
|
||||
<p>The Module symbol table works just like the Function symbol table when it
|
||||
comes to name conflicts: if a new function is created with a name was previously
|
||||
added to the symbol table, it will get implicitly renamed when added to the
|
||||
Module. The code above exploits this fact to tell if there was a previous
|
||||
Module. The code above exploits this fact to determine if there was a previous
|
||||
definition of this function.</p>
|
||||
|
||||
<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
|
||||
@ -403,7 +402,7 @@ definition and this one match up. If not, we emit an error.</p>
|
||||
</div>
|
||||
|
||||
<p>The last bit of code for prototypes loops over all of the arguments in the
|
||||
function, setting the name of the LLVM Argument objects to match and registering
|
||||
function, setting the name of the LLVM Argument objects to match, and registering
|
||||
the arguments in the <tt>NamedValues</tt> map for future use by the
|
||||
<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
|
||||
object to the caller. Note that we don't check for conflicting
|
||||
@ -421,8 +420,8 @@ Function *FunctionAST::Codegen() {
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>Code generation for function definitions starts out simply enough: first we
|
||||
codegen the prototype (Proto) and verify that it is ok. We also clear out the
|
||||
<p>Code generation for function definitions starts out simply enough: we just
|
||||
codegen the prototype (Proto) and verify that it is ok. We then clear out the
|
||||
<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
|
||||
last function we compiled. Code generation of the prototype ensures that there
|
||||
is an LLVM Function object that is ready to go for us.</p>
|
||||
@ -445,7 +444,7 @@ the end of the new basic block. Basic blocks in LLVM are an important part
|
||||
of functions that define the <a
|
||||
href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
|
||||
Since we don't have any control flow, our functions will only contain one
|
||||
block so far. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p>
|
||||
block at this point. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p>
|
||||
|
||||
<div class="doc_code">
|
||||
<pre>
|
||||
@ -465,7 +464,7 @@ the root expression of the function. If no error happens, this emits code to
|
||||
compute the expression into the entry block and returns the value that was
|
||||
computed. Assuming no error, we then create an LLVM <a
|
||||
href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
|
||||
Once the function is built, we call the <tt>verifyFunction</tt> function, which
|
||||
Once the function is built, we call <tt>verifyFunction</tt>, which
|
||||
is provided by LLVM. This function does a variety of consistency checks on the
|
||||
generated code, to determine if our compiler is doing everything right. Using
|
||||
this is important: it can catch a lot of bugs. Once the function is finished
|
||||
@ -481,13 +480,13 @@ and validated, we return it.</p>
|
||||
</div>
|
||||
|
||||
<p>The only piece left here is handling of the error case. For simplicity, we
|
||||
simply handle this by deleting the function we produced with the
|
||||
handle this by merely deleting the function we produced with the
|
||||
<tt>eraseFromParent</tt> method. This allows the user to redefine a function
|
||||
that they incorrectly typed in before: if we didn't delete it, it would live in
|
||||
the symbol table, with a body, preventing future redefinition.</p>
|
||||
|
||||
<p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
|
||||
can return a previously defined forward declaration, this can actually delete
|
||||
<p>This code does have a bug, though. Since the <tt>PrototypeAST::Codegen</tt>
|
||||
can return a previously defined forward declaration, our code can actually delete
|
||||
a forward declaration. There are a number of ways to fix this bug, see what you
|
||||
can come up with! Here is a testcase:</p>
|
||||
|
||||
@ -571,7 +570,7 @@ entry:
|
||||
|
||||
<p>This shows some function calls. Note that this function will take a long
|
||||
time to execute if you call it. In the future we'll add conditional control
|
||||
flow to make recursion actually be useful :).</p>
|
||||
flow to actually make recursion useful :).</p>
|
||||
|
||||
<div class="doc_code">
|
||||
<pre>
|
||||
@ -636,7 +635,7 @@ entry:
|
||||
generated. Here you can see the big picture with all the functions referencing
|
||||
each other.</p>
|
||||
|
||||
<p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
|
||||
<p>This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we'll
|
||||
describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
|
||||
support</a> to this so we can actually start running code!</p>
|
||||
|
||||
|
@ -42,8 +42,8 @@ Flow</li>
|
||||
with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple
|
||||
language and added support for generating LLVM IR. This chapter describes
|
||||
two new techniques: adding optimizer support to your language, and adding JIT
|
||||
compiler support. This shows how to get nice efficient code for your
|
||||
language.</p>
|
||||
compiler support. These additions will demonstrate how to get nice, efficient code
|
||||
for the Kaleidoscope language.</p>
|
||||
|
||||
</div>
|
||||
|
||||
@ -72,14 +72,13 @@ entry:
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>This code is a very very literal transcription of the AST built by parsing
|
||||
our code, and as such, lacks optimizations like constant folding (we'd like to
|
||||
get "<tt>add x, 3.0</tt>" in the example above) as well as other more important
|
||||
optimizations. Constant folding in particular is a very common and very
|
||||
<p>This code is a very, very literal transcription of the AST built by parsing
|
||||
the input. As such, this transcription lacks optimizations like constant folding (we'd like to get "<tt>add x, 3.0</tt>" in the example above) as well as other more important
|
||||
optimizations. Constant folding, in particular, is a very common and very
|
||||
important optimization: so much so that many language implementors implement
|
||||
constant folding support in their AST representation.</p>
|
||||
|
||||
<p>With LLVM, you don't need to. Since all calls to build LLVM IR go through
|
||||
<p>With LLVM, you don't need this support in the AST. Since all calls to build LLVM IR go through
|
||||
the LLVM builder, it would be nice if the builder itself checked to see if there
|
||||
was a constant folding opportunity when you call it. If so, it could just do
|
||||
the constant fold and return the constant instead of creating an instruction.
|
||||
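The idea of a builder that folds constants as instructions are requested can be sketched in plain C++. Everything here is an assumption made for illustration: `Value`, `FoldingBuilder`, `getConstant`, `getArgument` and `createAdd` are hypothetical and far simpler than the real LLVM classes. Only the folding decision itself is modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified value model: a value is either a
// constant double or something that must be computed at run time.
struct Value {
  bool IsConstant;
  double Constant;  // meaningful only when IsConstant is true
  std::string Name; // meaningful only for non-constant values
};

class FoldingBuilder {
  std::vector<std::unique_ptr<Value>> Owned; // keeps created values alive

  Value *make(Value V) {
    Owned.push_back(std::unique_ptr<Value>(new Value(std::move(V))));
    return Owned.back().get();
  }

public:
  unsigned InstructionsEmitted = 0;

  Value *getConstant(double D) { return make({true, D, ""}); }
  Value *getArgument(const std::string &Name) {
    return make({false, 0.0, Name});
  }

  // Folds when both operands are constants; otherwise emits an instruction.
  Value *createAdd(Value *L, Value *R, const std::string &Name) {
    if (L->IsConstant && R->IsConstant)
      return getConstant(L->Constant + R->Constant);
    ++InstructionsEmitted;
    return make({false, 0.0, Name});
  }
};
```

With this sketch, adding the constants 1.0 and 2.0 produces the constant 3.0 without emitting anything, and only the later addition with the argument "x" counts as an emitted instruction.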
@ -93,9 +92,9 @@ static LLVMFoldingBuilder Builder;
|
||||
</div>
|
||||
|
||||
<p>All we did was switch from <tt>LLVMBuilder</tt> to
|
||||
<tt>LLVMFoldingBuilder</tt>. Though we change no other code, now all of our
|
||||
instructions are implicitly constant folded without us having to do anything
|
||||
about it. For example, our example above now compiles to:</p>
|
||||
<tt>LLVMFoldingBuilder</tt>. Though we change no other code, we now have all of our
|
||||
instructions implicitly constant folded without us having to do anything
|
||||
about it. For example, the input above now compiles to:</p>
|
||||
|
||||
<div class="doc_code">
|
||||
<pre>
|
||||
@ -153,7 +152,7 @@ range of optimizations that you can use, in the form of "passes".</p>
|
||||
|
||||
<div class="doc_text">
|
||||
|
||||
<p>LLVM provides many optimization passes which do many different sorts of
|
||||
<p>LLVM provides many optimization passes, which do many different sorts of
|
||||
things and have different tradeoffs. Unlike other systems, LLVM doesn't hold
|
||||
to the mistaken notion that one set of optimizations is right for all languages
|
||||
and for all situations. LLVM allows a compiler implementor to make complete
|
||||
@ -165,7 +164,7 @@ across as large of body of code as they can (often a whole file, but if run
|
||||
at link time, this can be a substantial portion of the whole program). It also
|
||||
supports and includes "per-function" passes which just operate on a single
|
||||
function at a time, without looking at other functions. For more information
|
||||
on passes and how the get run, see the <a href="../WritingAnLLVMPass.html">How
|
||||
on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How
|
||||
to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM
|
||||
Passes</a>.</p>
|
||||
|
||||
@ -207,13 +206,13 @@ add a set of optimizations to run. The code looks like this:</p>
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>This code defines two objects, a <tt>ExistingModuleProvider</tt> and a
|
||||
<p>This code defines two objects, an <tt>ExistingModuleProvider</tt> and a
|
||||
<tt>FunctionPassManager</tt>. The former is basically a wrapper around our
|
||||
<tt>Module</tt> that the PassManager requires. It provides certain flexibility
|
||||
that we're not going to take advantage of here, so I won't dive into what it is
|
||||
all about.</p>
|
||||
that we're not going to take advantage of here, so I won't dive into any details
|
||||
about it.</p>
|
||||
|
||||
<p>The meat of the matter is the definition of "<tt>OurFPM</tt>". It
|
||||
<p>The meat of the matter here, is the definition of "<tt>OurFPM</tt>". It
|
||||
requires a pointer to the <tt>Module</tt> (through the <tt>ModuleProvider</tt>)
|
||||
to construct itself. Once it is set up, we use a series of "add" calls to add
|
||||
a bunch of LLVM passes. The first pass is basically boilerplate, it adds a pass
|
||||
@ -223,7 +222,7 @@ which we will get to in the next section.</p>
|
||||
|
||||
<p>In this case, we choose to add 4 optimization passes. The passes we chose
|
||||
here are a pretty standard set of "cleanup" optimizations that are useful for
|
||||
a wide variety of code. I won't delve into what they do, but believe me that
|
||||
a wide variety of code. I won't delve into what they do but, believe me,
|
||||
they are a good starting place :).</p>
|
||||
|
||||
<p>Once the PassManager is set up, we need to make use of it. We do this by
|
||||
@ -247,7 +246,7 @@ running it after our newly created function is constructed (in
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>As you can see, this is pretty straight-forward. The
|
||||
<p>As you can see, this is pretty straightforward. The
|
||||
<tt>FunctionPassManager</tt> optimizes and updates the LLVM Function* in place,
|
||||
improving (hopefully) its body. With this in place, we can try our test above
|
||||
again:</p>
|
||||
@ -271,7 +270,7 @@ add instruction from every execution of this function.</p>
|
||||
<p>LLVM provides a wide variety of optimizations that can be used in certain
|
||||
circumstances. Some <a href="../Passes.html">documentation about the various
|
||||
passes</a> is available, but it isn't very complete. Another good source of
|
||||
ideas is to look at the passes that <tt>llvm-gcc</tt> or
|
||||
ideas can come from looking at the passes that <tt>llvm-gcc</tt> or
|
||||
<tt>llvm-ld</tt> run to get started. The "<tt>opt</tt>" tool allows you to
|
||||
experiment with passes from the command line, so you can see if they do
|
||||
anything.</p>
|
||||
@ -324,7 +323,7 @@ for you if one is available for your platform, otherwise it will fall back to
|
||||
the interpreter.</p>
|
||||
|
||||
<p>Once the <tt>ExecutionEngine</tt> is created, the JIT is ready to be used.
|
||||
There are a variety of APIs that are useful, but the most simple one is the
|
||||
There are a variety of APIs that are useful, but the simplest one is the
|
||||
"<tt>getPointerToFunction(F)</tt>" method. This method JIT compiles the
|
||||
specified LLVM Function and returns a function pointer to the generated machine
|
||||
code. In our case, this means that we can change the code that parses a
|
||||
@ -353,7 +352,7 @@ static void HandleTopLevelExpression() {
|
||||
function that takes no arguments and returns the computed double. Because the
|
||||
LLVM JIT compiler matches the native platform ABI, this means that you can just
|
||||
cast the result pointer to a function pointer of that type and call it directly.
|
||||
As such, there is no difference between JIT compiled code and native machine
|
||||
This means, there is no difference between JIT compiled code and native machine
|
||||
code that is statically linked into your application.</p>
|
||||
|
||||
<p>With just these two changes, lets see how Kaleidoscope works now!</p>
|
||||
@ -372,7 +371,7 @@ entry:
|
||||
|
||||
<p>Well this looks like it is basically working. The dump of the function
|
||||
shows the "no argument function that always returns double" that we synthesize
|
||||
for each top level expression that is typed it. This demonstrates very basic
|
||||
for each top level expression that is typed in. This demonstrates very basic
|
||||
functionality, but can we do more?</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -397,19 +396,19 @@ entry:
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>This illustrates that we can now call user code, but it is a bit subtle what
|
||||
is going on here. Note that we only invoke the JIT on the anonymous functions
|
||||
that <em>calls testfunc</em>, but we never invoked it on <em>testfunc
|
||||
itself</em>.</p>
|
||||
<p>This illustrates that we can now call user code, but there is something a bit subtle
|
||||
going on here. Note that we only invoke the JIT on the anonymous functions
|
||||
that <em>call testfunc</em>, but we never invoked it on <em>testfunc
|
||||
</em>itself.</p>
|
||||
|
||||
<p>What actually happened here is that the anonymous function is
|
||||
<p>What actually happened here is that the anonymous function was
|
||||
JIT'd when requested. When the Kaleidoscope app calls through the function
|
||||
pointer that is returned, the anonymous function starts executing. It ends up
|
||||
making the call to the "testfunc" function, and ends up in a stub that invokes
|
||||
the JIT, lazily, on testfunc. Once the JIT finishes lazily compiling testfunc,
|
||||
it returns and the code re-executes the call.</p>
|
||||
|
||||
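The stub mechanism described above can be modeled roughly in plain C++. This is a sketch under stated assumptions, not the LLVM JIT: `LazyJIT`, `define`, `call` and `CompileCount` are hypothetical, "compiling" is reduced to copying a callable, and unlike the real flow the anonymous function is also compiled on first call rather than on request.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of lazy compilation: a function is "compiled" the
// first time it is called, and later calls reuse the compiled body.
class LazyJIT {
  std::map<std::string, std::function<double()>> Sources;  // not yet compiled
  std::map<std::string, std::function<double()>> Compiled; // ready to run

public:
  unsigned CompileCount = 0;

  void define(const std::string &Name, std::function<double()> Body) {
    Sources[Name] = std::move(Body);
  }

  // Plays the role of the stub: compile on demand, then run the call.
  double call(const std::string &Name) {
    auto It = Compiled.find(Name);
    if (It == Compiled.end()) {
      ++CompileCount; // stands in for real machine-code generation
      It = Compiled.emplace(Name, Sources.at(Name)).first;
    }
    return It->second();
  }
};
```

If an "anon" body calls "testfunc" through `call`, the first invocation of "anon" triggers two compilations and every later invocation triggers none.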
<p>In summary, the JIT will lazily JIT code on the fly as it is needed. The
|
||||
<p>In summary, the JIT will lazily JIT code, on the fly, as it is needed. The
|
||||
JIT provides a number of other more advanced interfaces for things like freeing
|
||||
allocated machine code, rejit'ing functions to update them, etc. However, even
|
||||
with this simple code, we get some surprisingly powerful capabilities - check
|
||||
|
@ -55,7 +55,7 @@ User-defined Operators</li>
|
||||
|
||||
<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
|
||||
with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
|
||||
Kaleidoscope language and included support for generating LLVM IR, following by
|
||||
Kaleidoscope language and included support for generating LLVM IR, followed by
|
||||
optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
|
||||
mostly useless: it has no control flow other than call and return. This means
|
||||
that you can't have conditional branches in the code, significantly limiting its
|
||||
@ -71,13 +71,13 @@ have an if/then/else expression plus a simple 'for' loop.</p>
|
||||
<div class="doc_text">
|
||||
|
||||
<p>
|
||||
Extending Kaleidoscope to support if/then/else is quite straight-forward. It
|
||||
Extending Kaleidoscope to support if/then/else is quite straightforward. It
|
||||
basically requires adding lexer support for this "new" concept to the lexer,
|
||||
parser, AST, and LLVM code emitter. This example is nice, because it shows how
|
||||
easy it is to "grow" a language over time, incrementally extending it as new
|
||||
ideas are discovered.</p>
|
||||
|
||||
<p>Before we get going on "how" we do this extension, lets talk about what we
|
||||
<p>Before we get going on "how" we add this extension, lets talk about "what" we
|
||||
want. The basic idea is that we want to be able to write this sort of thing:
|
||||
</p>
|
||||
|
||||
@ -97,7 +97,7 @@ Since we're using a mostly functional form, we'll have it evaluate its
|
||||
conditional, then return the 'then' or 'else' value based on how the condition
|
||||
was resolved. This is very similar to the C "?:" expression.</p>
|
||||
|
||||
<p>The semantics of the if/then/else expression is that it first evaluates the
|
||||
<p>The semantics of the if/then/else expression is that it evaluates the
|
||||
condition to a boolean equality value: 0.0 is considered to be false and
|
||||
everything else is considered to be true.
|
||||
If the condition is true, the first subexpression is evaluated and returned, if
|
||||
@ -105,7 +105,7 @@ the condition is false, the second subexpression is evaluated and returned.
|
||||
Since Kaleidoscope allows side-effects, this behavior is important to nail down.
|
||||
</p>
|
||||
|
||||
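The semantics just described can be written down as a tiny C++ sketch. `evalIf` is a hypothetical helper, not part of the tutorial code, and it deliberately ignores NaN subtleties. It shows the two points that matter: 0.0 selects the second subexpression, and the subexpression that is not selected is never evaluated.

```cpp
#include <cassert>
#include <functional>

// Evaluates a Kaleidoscope-style if/then/else: 0.0 is false, anything else
// is true, and only the selected branch is ever evaluated.
double evalIf(double Cond, const std::function<double()> &Then,
              const std::function<double()> &Else) {
  return Cond != 0.0 ? Then() : Else();
}
```

Passing the branches as callables rather than as already computed doubles is what keeps any side effects of the unselected branch from happening.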
<p>Now that we know what we want, lets break this down into its constituent
|
||||
<p>Now that we know what we "want", lets break this down into its constituent
|
||||
pieces.</p>
|
||||
|
||||
</div>
|
||||
@ -118,7 +118,7 @@ If/Then/Else</a></div>
|
||||
|
||||
<div class="doc_text">
|
||||
|
||||
<p>The lexer extensions are straight-forward. First we add new enum values
|
||||
<p>The lexer extensions are straightforward. First we add new enum values
|
||||
for the relevant tokens:</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -128,7 +128,7 @@ for the relevant tokens:</p>
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>Once we have that, we recognize the new keywords in the lexer, pretty simple
|
||||
<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple
|
||||
stuff:</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -179,7 +179,7 @@ If/Then/Else</a></div>
|
||||
<div class="doc_text">
|
||||
|
||||
<p>Now that we have the relevant tokens coming from the lexer and we have the
|
||||
AST node to build, our parsing logic is relatively straight-forward. First we
|
||||
AST node to build, our parsing logic is relatively straightforward. First we
|
||||
define a new parsing function:</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -296,14 +296,14 @@ height="315"></center>
|
||||
inserting actual calls into the code and recompiling or by calling these in the
|
||||
debugger. LLVM has many nice features for visualizing various graphs.</p>
|
||||
|
||||
<p>Coming back to the generated code, it is fairly simple: the entry block
|
||||
<p>Getting back to the generated code, it is fairly simple: the entry block
|
||||
evaluates the conditional expression ("x" in our case here) and compares the
|
||||
result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
|
||||
instruction ('one' is "Ordered and Not Equal"). Based on the result of this
|
||||
expression, the code jumps to either the "then" or "else" blocks, which contain
|
||||
the expressions for the true/false cases.</p>
|
||||
|
||||
<p>Once the then/else blocks is finished executing, they both branch back to the
|
||||
<p>Once the then/else blocks are finished executing, they both branch back to the
|
||||
'ifcont' block to execute the code that happens after the if/then/else. In this
|
||||
case the only thing left to do is to return to the caller of the function. The
|
||||
question then becomes: how does the code know which expression to return?</p>
|
||||
@ -320,25 +320,25 @@ block. In this case, if control comes in from the "then" block, it gets the
|
||||
value of "calltmp". If control comes from the "else" block, it gets the value
|
||||
of "calltmp1".</p>
|
||||
|
||||
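One way to picture how a Phi node selects its value is the following C++ sketch. It is only a model: this `PhiNode` struct and its `resolve` method are hypothetical, blocks are identified by plain strings, and values are doubles rather than LLVM values.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a Phi node: a list of (predecessor block, value)
// pairs, one per block that can branch into the merge block.
struct PhiNode {
  std::vector<std::pair<std::string, double>> Incoming;

  void addIncoming(double V, const std::string &Block) {
    Incoming.emplace_back(Block, V);
  }

  // Selects the value paired with the block that control arrived from.
  double resolve(const std::string &CameFrom) const {
    for (const auto &Entry : Incoming)
      if (Entry.first == CameFrom)
        return Entry.second;
    throw std::logic_error("no Phi entry for predecessor " + CameFrom);
  }
};
```

Registering one value for "then" and one for "else" and resolving against either name returns the matching value, while resolving against a block with no entry is treated as an error in this sketch.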
<p>At this point, you are probably starting to think "oh no! this means my
|
||||
<p>At this point, you are probably starting to think "Oh no! This means my
|
||||
simple and elegant front-end will have to start generating SSA form in order to
|
||||
use LLVM!". Fortunately, this is not the case, and we strongly advise
|
||||
<em>not</em> implementing an SSA construction algorithm in your front-end
|
||||
unless there is an amazingly good reason to do so. In practice, there are two
|
||||
sorts of values that float around in code written in your average imperative
|
||||
sorts of values that float around in code written for your average imperative
|
||||
programming language that might need Phi nodes:</p>
|
||||
|
||||
<ol>
|
||||
<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
|
||||
<li>Values that are implicit in the structure of your AST, such as the phi node
|
||||
<li>Values that are implicit in the structure of your AST, such as the Phi node
|
||||
in this case.</li>
|
||||
</ol>
|
||||
|
||||
<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
|
||||
variables"), we'll talk about #1
|
||||
in depth. For now, just believe me that you don't need SSA construction to
|
||||
handle them. For #2, you have the choice of using the techniques that we will
|
||||
describe for #1, or you can insert Phi nodes directly if convenient. In this
|
||||
handle this case. For #2, you have the choice of using the techniques that we will
|
||||
describe for #1, or you can insert Phi nodes directly, if convenient. In this
|
||||
case, it is really really easy to generate the Phi node, so we choose to do it
|
||||
directly.</p>
|
||||
|
||||
@ -369,7 +369,7 @@ Value *IfExprAST::Codegen() {
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<p>This code is straight-forward and similar to what we saw before. We emit the
|
||||
<p>This code is straightforward and similar to what we saw before. We emit the
|
||||
expression for the condition, then compare that value to zero to get a truth
|
||||
value as a 1-bit (bool) value.</p>
|
||||
|
||||
@ -395,7 +395,7 @@ block for its "parent" (the function it is currently embedded into).</p>
|
||||
|
||||
<p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
|
||||
into the constructor for the "then" block. This causes the constructor to
|
||||
automatically insert the new block onto the end of the specified function. The
|
||||
automatically insert the new block into the end of the specified function. The
|
||||
other two blocks are created, but aren't yet inserted into the function.</p>
|
||||
|
||||
<p>Once the blocks are created, we can emit the conditional branch that chooses
|
||||
@ -427,7 +427,7 @@ insertion point to be at the end of the specified block. However, since the
|
||||
block. :)</p>
|
||||
|
||||
<p>Once the insertion point is set, we recursively codegen the "then" expression
|
||||
from the AST. To finish off the then block, we create an unconditional branch
|
||||
from the AST. To finish off the "then" block, we create an unconditional branch
|
||||
to the merge block. One interesting (and very important) aspect of the LLVM IR
|
||||
is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
|
||||
to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
|
||||
@ -439,7 +439,7 @@ violate this rule, the verifier will emit an error.</p>
|
||||
is that when we create the Phi node in the merge block, we need to set up the
|
||||
block/value pairs that indicate how the Phi will work. Importantly, the Phi
|
||||
node expects to have an entry for each predecessor of the block in the CFG. Why
|
||||
then are we getting the current block when we just set it to ThenBB 5 lines
|
||||
then, are we getting the current block when we just set it to ThenBB 5 lines
|
||||
above? The problem is that the "Then" expression may actually itself change the
|
||||
block that the Builder is emitting into if, for example, it contains a nested
|
||||
"if/then/else" expression. Because calling Codegen recursively could
|
||||
@ -492,7 +492,7 @@ the if/then/else expression. In our example above, this returned value will
|
||||
feed into the code for the top-level function, which will create the return
|
||||
instruction.</p>
|
||||
|
||||
<p>Overall, we now have the ability to execution conditional code in
|
||||
<p>Overall, we now have the ability to execute conditional code in
|
||||
Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
|
||||
that can calculate a wide variety of numeric functions. Next up we'll add
|
||||
another useful expression that is familiar from non-functional languages...</p>
|
||||
@ -571,7 +571,7 @@ the 'for' Loop</a></div>
|
||||
|
||||
<div class="doc_text">
|
||||
|
||||
<p>The AST node is similarly simple. It basically boils down to capturing
|
||||
<p>The AST node is just as simple. It basically boils down to capturing
|
||||
the variable name and the constituent expressions in the node.</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -704,7 +704,7 @@ the 'for' Loop</a></div>
|
||||
|
||||
<div class="doc_text">
|
||||
|
||||
<p>The first part of codegen is very simple: we just output the start expression
|
||||
<p>The first part of Codegen is very simple: we just output the start expression
|
||||
for the loop value:</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -804,7 +804,7 @@ references to it will naturally find it in the symbol table.</p>
|
||||
</div>
|
||||
|
||||
<p>Now that the body is emitted, we compute the next value of the iteration
|
||||
variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
|
||||
variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>'
|
||||
will be the value of the loop variable on the next iteration of the loop.</p>
|
||||
|
||||
<div class="doc_code">
|
||||
@ -839,8 +839,7 @@ statement.</p>
|
||||
</div>
|
||||
|
||||
<p>With the code for the body of the loop complete, we just need to finish up
|
||||
the control flow for it. This remembers the end block (for the phi node), then
|
||||
creates the block for the loop exit ("afterloop"). Based on the value of the
|
||||
the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the
|
||||
exit condition, it creates a conditional branch that chooses between executing
|
||||
the loop again and exiting the loop. Any future code is emitted in the
|
||||
"afterloop" block, so it sets the insertion position to it.</p>
|
||||
@ -869,8 +868,7 @@ the for loop. Finally, code generation of the for loop always returns 0.0, so
|
||||
that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
|
||||
|
||||
<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
|
||||
the tutorial. We added two control flow constructs, and used them to motivate
|
||||
a couple of aspects of the LLVM IR that are important for front-end implementors
|
||||
the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors
|
||||
to know. In the next chapter of our saga, we will get a bit crazier and add
|
||||
<a href="LangImpl6.html">user-defined operators</a> to our poor innocent
|
||||
language.</p>
|
||||
|