docs: Sphinxify docs/tutorial/

Sorry for the massive commit, but I just wanted to knock this one down and it is really straightforward. There are still a couple trivial (i.e. not related to the content) things left to fix: - Use of raw HTML links where :doc:`...` and :ref:`...` could be used instead. If you are a newbie and want to help fix this it would make for some good bite-sized patches; more experienced developers should be focusing on adding new content (to this tutorial or elsewhere, but please _do not_ waste your time on formatting when there is such dire need for documentation (see docs/SphinxQuickstartTemplate.rst to get started writing)). - Highlighting of the kaleidoscope code blocks (currently left as bare `::`). I will be working on writing a custom Pygments highlighter for this, mostly as training for maintaining the `llvm` code-block's lexer in-tree. I want to do this because I am extremely unhappy with how it just "gives up" on the slightest deviation from the expected syntax and leaves the whole code-block un-highlighted. More generally I am looking at writing some Sphinx extensions and keeping them in-tree as well, to support common use cases that currently have no good solution (like "monospace text inside a link"). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169343 91177308-0d34-0410-b5e6-96231b3b80d8
2025-01-14 16:33:28 +00:00 · 2012-12-05 00:26:32 +00:00 · 2012-12-05 00:26:32 +00:00 · ee47edfd8e
commit ee47edfd8e
parent 4e54480531
34 changed files with 17107 additions and 19075 deletions
--- a/docs/Makefile.sphinx
+++ b/docs/Makefile.sphinx
@ -50,8 +50,6 @@ html:
 	@# Kind of a hack, but HTML-formatted docs are on the way out anyway.
 	@echo "Copying legacy HTML-formatted docs into $(BUILDDIR)/html"
 	@cp -a *.html $(BUILDDIR)/html
 	@mkdir -p $(BUILDDIR)/html/tutorial
 	@cp tutorial/*.html tutorial/*.png $(BUILDDIR)/html/tutorial
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 dirhtml:
--- a/docs/tutorial/LangImpl1.html
+++ b/docs/tutorial/LangImpl1.html
@ -1,348 +0,0 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
  <title>Kaleidoscope: Tutorial Introduction and the Lexer</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <link rel="stylesheet" href="../_static/llvm.css" type="text/css">
 </head>
 <body>
 <h1>Kaleidoscope: Tutorial Introduction and the Lexer</h1>
 <ul>
 <li><a href="index.html">Up to Tutorial Index</a></li>
 <li>Chapter 1
  <ol>
    <li><a href="#intro">Tutorial Introduction</a></li>
    <li><a href="#language">The Basic Language</a></li>
    <li><a href="#lexer">The Lexer</a></li>
  </ol>
 </li>
 <li><a href="LangImpl2.html">Chapter 2</a>: Implementing a Parser and AST</li>
 </ul>
 <div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="intro">Tutorial Introduction</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>Welcome to the "Implementing a language with LLVM" tutorial.  This tutorial
 runs through the implementation of a simple language, showing how fun and
 easy it can be.  This tutorial will get you up and started as well as help to
 build a framework you can extend to other languages.  The code in this tutorial
 can also be used as a playground to hack on other LLVM specific things.
 </p>
 <p>
 The goal of this tutorial is to progressively unveil our language, describing
 how it is built up over time.  This will let us cover a fairly broad range of
 language design and LLVM-specific usage issues, showing and explaining the code
 for it all along the way, without overwhelming you with tons of details up
 front.</p>
 <p>It is useful to point out ahead of time that this tutorial is really about
 teaching compiler techniques and LLVM specifically, <em>not</em> about teaching
 modern and sane software engineering principles.  In practice, this means that
 we'll take a number of shortcuts to simplify the exposition.  For example, the
 code leaks memory, uses global variables all over the place, doesn't use nice
 design patterns like <a
 href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it
 is very simple.  If you dig in and use the code as a basis for future projects,
 fixing these deficiencies shouldn't be hard.</p>
 <p>I've tried to put this tutorial together in a way that makes chapters easy to
 skip over if you are already familiar with or are uninterested in the various
 pieces.  The structure of the tutorial is:
 </p>
 <ul>
 <li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope
 language, and the definition of its Lexer</b> - This shows where we are going
 and the basic functionality that we want it to do.  In order to make this
 tutorial maximally understandable and hackable, we choose to implement 
 everything in C++ instead of using lexer and parser generators.  LLVM obviously
 works just fine with such tools, feel free to use one if you prefer.</li>
 <li><b><a href="LangImpl2.html">Chapter #2</a>: Implementing a Parser and
 AST</b> - With the lexer in place, we can talk about parsing techniques and
 basic AST construction.  This tutorial describes recursive descent parsing and
 operator precedence parsing.  Nothing in Chapters 1 or 2 is LLVM-specific,
 the code doesn't even link in LLVM at this point. :)</li>
 <li><b><a href="LangImpl3.html">Chapter #3</a>: Code generation to LLVM IR</b> -
 With the AST ready, we can show off how easy generation of LLVM IR really 
 is.</li>
 <li><b><a href="LangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer
 Support</b> - Because a lot of people are interested in using LLVM as a JIT,
 we'll dive right into it and show you the 3 lines it takes to add JIT support.
 LLVM is also useful in many other ways, but this is one simple and "sexy" way
 to shows off its power. :)</li>
 <li><b><a href="LangImpl5.html">Chapter #5</a>: Extending the Language: Control
 Flow</b> - With the language up and running, we show how to extend it with
 control flow operations (if/then/else and a 'for' loop).  This gives us a chance
 to talk about simple SSA construction and control flow.</li>
 <li><b><a href="LangImpl6.html">Chapter #6</a>: Extending the Language: 
 User-defined Operators</b> - This is a silly but fun chapter that talks about
 extending the language to let the user program define their own arbitrary
 unary and binary operators (with assignable precedence!).  This lets us build a
 significant piece of the "language" as library routines.</li>
 <li><b><a href="LangImpl7.html">Chapter #7</a>: Extending the Language: Mutable
 Variables</b> - This chapter talks about adding user-defined local variables
 along with an assignment operator.  The interesting part about this is how
 easy and trivial it is to construct SSA form in LLVM: no, LLVM does <em>not</em>
 require your front-end to construct SSA form!</li>
 <li><b><a href="LangImpl8.html">Chapter #8</a>: Conclusion and other useful LLVM
 tidbits</b> - This chapter wraps up the series by talking about potential
 ways to extend the language, but also includes a bunch of pointers to info about
 "special topics" like adding garbage collection support, exceptions, debugging,
 support for "spaghetti stacks", and a bunch of other tips and tricks.</li>
 </ul>
 <p>By the end of the tutorial, we'll have written a bit less than 700 lines of 
 non-comment, non-blank, lines of code.  With this small amount of code, we'll
 have built up a very reasonable compiler for a non-trivial language including
 a hand-written lexer, parser, AST, as well as code generation support with a JIT
 compiler.  While other systems may have interesting "hello world" tutorials,
 I think the breadth of this tutorial is a great testament to the strengths of
 LLVM and why you should consider it if you're interested in language or compiler
 design.</p>
 <p>A note about this tutorial: we expect you to extend the language and play
 with it on your own.  Take the code and go crazy hacking away at it, compilers
 don't need to be scary creatures - it can be a lot of fun to play with
 languages!</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="language">The Basic Language</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>This tutorial will be illustrated with a toy language that we'll call
 "<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived 
 from "meaning beautiful, form, and view").
 Kaleidoscope is a procedural language that allows you to define functions, use
 conditionals, math, etc.  Over the course of the tutorial, we'll extend
 Kaleidoscope to support the if/then/else construct, a for loop, user defined
 operators, JIT compilation with a simple command line interface, etc.</p>
 <p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
 64-bit floating point type (aka 'double' in C parlance).  As such, all values
 are implicitly double precision and the language doesn't require type
 declarations.  This gives the language a very nice and simple syntax.  For
 example, the following simple example computes <a 
 href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p>
 <div class="doc_code">
 <pre>
 # Compute the x'th fibonacci number.
 def fib(x)
  if x &lt; 3 then
    1
  else
    fib(x-1)+fib(x-2)
 # This expression will compute the 40th number.
 fib(40)
 </pre>
 </div>
 <p>We also allow Kaleidoscope to call into standard library functions (the LLVM
 JIT makes this completely trivial).  This means that you can use the 'extern'
 keyword to define a function before you use it (this is also useful for mutually
 recursive functions).  For example:</p>
 <div class="doc_code">
 <pre>
 extern sin(arg);
 extern cos(arg);
 extern atan2(arg1 arg2);
 atan2(sin(.4), cos(42))
 </pre>
 </div>
 <p>A more interesting example is included in Chapter 6 where we write a little
 Kaleidoscope application that <a href="LangImpl6.html#example">displays 
 a Mandelbrot Set</a> at various levels of magnification.</p>
 <p>Lets dive into the implementation of this language!</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="lexer">The Lexer</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>When it comes to implementing a language, the first thing needed is
 the ability to process a text file and recognize what it says.  The traditional
 way to do this is to use a "<a 
 href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
 to break the input up into "tokens".  Each token returned by the lexer includes
 a token code and potentially some metadata (e.g. the numeric value of a number).
 First, we define the possibilities:
 </p>
 <div class="doc_code">
 <pre>
 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 // of these for known things.
 enum Token {
  tok_eof = -1,
  // commands
  tok_def = -2, tok_extern = -3,
  // primary
  tok_identifier = -4, tok_number = -5,
 };
 static std::string IdentifierStr;  // Filled in if tok_identifier
 static double NumVal;              // Filled in if tok_number
 </pre>
 </div>
 <p>Each token returned by our lexer will either be one of the Token enum values
 or it will be an 'unknown' character like '+', which is returned as its ASCII
 value.  If the current token is an identifier, the <tt>IdentifierStr</tt>
 global variable holds the name of the identifier.  If the current token is a
 numeric literal (like 1.0), <tt>NumVal</tt> holds its value.  Note that we use
 global variables for simplicity, this is not the best choice for a real language
 implementation :).
 </p>
 <p>The actual implementation of the lexer is a single function named
 <tt>gettok</tt>. The <tt>gettok</tt> function is called to return the next token
 from standard input.  Its definition starts as:</p>
 <div class="doc_code">
 <pre>
 /// gettok - Return the next token from standard input.
 static int gettok() {
  static int LastChar = ' ';
  // Skip any whitespace.
  while (isspace(LastChar))
    LastChar = getchar();
 </pre>
 </div>
 <p>
 <tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
 characters one at a time from standard input.  It eats them as it recognizes
 them and stores the last character read, but not processed, in LastChar.  The
 first thing that it has to do is ignore whitespace between tokens.  This is 
 accomplished with the loop above.</p>
 <p>The next thing <tt>gettok</tt> needs to do is recognize identifiers and
 specific keywords like "def".  Kaleidoscope does this with this simple loop:</p>
 <div class="doc_code">
 <pre>
  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
    IdentifierStr = LastChar;
    while (isalnum((LastChar = getchar())))
      IdentifierStr += LastChar;
    if (IdentifierStr == "def") return tok_def;
    if (IdentifierStr == "extern") return tok_extern;
    return tok_identifier;
  }
 </pre>
 </div>
 <p>Note that this code sets the '<tt>IdentifierStr</tt>' global whenever it
 lexes an identifier.  Also, since language keywords are matched by the same
 loop, we handle them here inline.  Numeric values are similar:</p>
 <div class="doc_code">
 <pre>
  if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
    std::string NumStr;
    do {
      NumStr += LastChar;
      LastChar = getchar();
    } while (isdigit(LastChar) || LastChar == '.');
    NumVal = strtod(NumStr.c_str(), 0);
    return tok_number;
  }
 </pre>
 </div>
 <p>This is all pretty straight-forward code for processing input.  When reading
 a numeric value from input, we use the C <tt>strtod</tt> function to convert it
 to a numeric value that we store in <tt>NumVal</tt>.  Note that this isn't doing
 sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as
 if you typed in "1.23".  Feel free to extend it :).  Next we handle comments:
 </p>
 <div class="doc_code">
 <pre>
  if (LastChar == '#') {
    // Comment until end of line.
    do LastChar = getchar();
    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
    if (LastChar != EOF)
      return gettok();
  }
 </pre>
 </div>
 <p>We handle comments by skipping to the end of the line and then return the
 next token.  Finally, if the input doesn't match one of the above cases, it is
 either an operator character like '+' or the end of the file.  These are handled
 with this code:</p>
 <div class="doc_code">
 <pre>
  // Check for end of file.  Don't eat the EOF.
  if (LastChar == EOF)
    return tok_eof;
  // Otherwise, just return the character as its ascii value.
  int ThisChar = LastChar;
  LastChar = getchar();
  return ThisChar;
 }
 </pre>
 </div>
 <p>With this, we have the complete lexer for the basic Kaleidoscope language
 (the <a href="LangImpl2.html#code">full code listing</a> for the Lexer is
 available in the <a href="LangImpl2.html">next chapter</a> of the tutorial).
 Next we'll <a href="LangImpl2.html">build a simple parser that uses this to 
 build an Abstract Syntax Tree</a>.  When we have that, we'll include a driver
 so that you can use the lexer and parser together.
 </p>
 <a href="LangImpl2.html">Next: Implementing a Parser and AST</a>
 </div>
 <!-- *********************************************************************** -->
 <hr>
 <address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
 </address>
 </body>
 </html>
--- a/docs/tutorial/LangImpl1.rst
+++ b/docs/tutorial/LangImpl1.rst
@ -0,0 +1,280 @@
 =================================================
 Kaleidoscope: Tutorial Introduction and the Lexer
 =================================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_
 Tutorial Introduction
 =====================
 Welcome to the "Implementing a language with LLVM" tutorial. This
 tutorial runs through the implementation of a simple language, showing
 how fun and easy it can be. This tutorial will get you up and started as
 well as help to build a framework you can extend to other languages. The
 code in this tutorial can also be used as a playground to hack on other
 LLVM specific things.
 The goal of this tutorial is to progressively unveil our language,
 describing how it is built up over time. This will let us cover a fairly
 broad range of language design and LLVM-specific usage issues, showing
 and explaining the code for it all along the way, without overwhelming
 you with tons of details up front.
 It is useful to point out ahead of time that this tutorial is really
 about teaching compiler techniques and LLVM specifically, *not* about
 teaching modern and sane software engineering principles. In practice,
 this means that we'll take a number of shortcuts to simplify the
 exposition. For example, the code leaks memory, uses global variables
 all over the place, doesn't use nice design patterns like
 `visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
 it is very simple. If you dig in and use the code as a basis for future
 projects, fixing these deficiencies shouldn't be hard.
 I've tried to put this tutorial together in a way that makes chapters
 easy to skip over if you are already familiar with or are uninterested
 in the various pieces. The structure of the tutorial is:
 -  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
   language, and the definition of its Lexer - This shows where we are
   going and the basic functionality that we want it to do. In order to
   make this tutorial maximally understandable and hackable, we choose
   to implement everything in C++ instead of using lexer and parser
   generators. LLVM obviously works just fine with such tools, feel free
   to use one if you prefer.
 -  `Chapter #2 <LangImpl2.html>`_: Implementing a Parser and AST -
   With the lexer in place, we can talk about parsing techniques and
   basic AST construction. This tutorial describes recursive descent
   parsing and operator precedence parsing. Nothing in Chapters 1 or 2
   is LLVM-specific, the code doesn't even link in LLVM at this point.
   :)
 -  `Chapter #3 <LangImpl3.html>`_: Code generation to LLVM IR - With
   the AST ready, we can show off how easy generation of LLVM IR really
   is.
 -  `Chapter #4 <LangImpl4.html>`_: Adding JIT and Optimizer Support
   - Because a lot of people are interested in using LLVM as a JIT,
   we'll dive right into it and show you the 3 lines it takes to add JIT
   support. LLVM is also useful in many other ways, but this is one
   simple and "sexy" way to shows off its power. :)
 -  `Chapter #5 <LangImpl5.html>`_: Extending the Language: Control
   Flow - With the language up and running, we show how to extend it
   with control flow operations (if/then/else and a 'for' loop). This
   gives us a chance to talk about simple SSA construction and control
   flow.
 -  `Chapter #6 <LangImpl6.html>`_: Extending the Language:
   User-defined Operators - This is a silly but fun chapter that talks
   about extending the language to let the user program define their own
   arbitrary unary and binary operators (with assignable precedence!).
   This lets us build a significant piece of the "language" as library
   routines.
 -  `Chapter #7 <LangImpl7.html>`_: Extending the Language: Mutable
   Variables - This chapter talks about adding user-defined local
   variables along with an assignment operator. The interesting part
   about this is how easy and trivial it is to construct SSA form in
   LLVM: no, LLVM does *not* require your front-end to construct SSA
   form!
 -  `Chapter #8 <LangImpl8.html>`_: Conclusion and other useful LLVM
   tidbits - This chapter wraps up the series by talking about
   potential ways to extend the language, but also includes a bunch of
   pointers to info about "special topics" like adding garbage
   collection support, exceptions, debugging, support for "spaghetti
   stacks", and a bunch of other tips and tricks.
 By the end of the tutorial, we'll have written a bit less than 700 lines
 of non-comment, non-blank, lines of code. With this small amount of
 code, we'll have built up a very reasonable compiler for a non-trivial
 language including a hand-written lexer, parser, AST, as well as code
 generation support with a JIT compiler. While other systems may have
 interesting "hello world" tutorials, I think the breadth of this
 tutorial is a great testament to the strengths of LLVM and why you
 should consider it if you're interested in language or compiler design.
 A note about this tutorial: we expect you to extend the language and
 play with it on your own. Take the code and go crazy hacking away at it,
 compilers don't need to be scary creatures - it can be a lot of fun to
 play with languages!
 The Basic Language
 ==================
 This tutorial will be illustrated with a toy language that we'll call
 "`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
 from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
 language that allows you to define functions, use conditionals, math,
 etc. Over the course of the tutorial, we'll extend Kaleidoscope to
 support the if/then/else construct, a for loop, user defined operators,
 JIT compilation with a simple command line interface, etc.
 Because we want to keep things simple, the only datatype in Kaleidoscope
 is a 64-bit floating point type (aka 'double' in C parlance). As such,
 all values are implicitly double precision and the language doesn't
 require type declarations. This gives the language a very nice and
 simple syntax. For example, the following simple example computes
 `Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_
 ::
    # Compute the x'th fibonacci number.
    def fib(x)
      if x < 3 then
        1
      else
        fib(x-1)+fib(x-2)
    # This expression will compute the 40th number.
    fib(40)
 We also allow Kaleidoscope to call into standard library functions (the
 LLVM JIT makes this completely trivial). This means that you can use the
 'extern' keyword to define a function before you use it (this is also
 useful for mutually recursive functions). For example:
 ::
    extern sin(arg);
    extern cos(arg);
    extern atan2(arg1 arg2);
    atan2(sin(.4), cos(42))
 A more interesting example is included in Chapter 6 where we write a
 little Kaleidoscope application that `displays a Mandelbrot
 Set <LangImpl6.html#example>`_ at various levels of magnification.
 Lets dive into the implementation of this language!
 The Lexer
 =========
 When it comes to implementing a language, the first thing needed is the
 ability to process a text file and recognize what it says. The
 traditional way to do this is to use a
 "`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
 'scanner') to break the input up into "tokens". Each token returned by
 the lexer includes a token code and potentially some metadata (e.g. the
 numeric value of a number). First, we define the possibilities:
 .. code-block:: c++
    // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
    // of these for known things.
    enum Token {
      tok_eof = -1,
      // commands
      tok_def = -2, tok_extern = -3,
      // primary
      tok_identifier = -4, tok_number = -5,
    };
    static std::string IdentifierStr;  // Filled in if tok_identifier
    static double NumVal;              // Filled in if tok_number
 Each token returned by our lexer will either be one of the Token enum
 values or it will be an 'unknown' character like '+', which is returned
 as its ASCII value. If the current token is an identifier, the
 ``IdentifierStr`` global variable holds the name of the identifier. If
 the current token is a numeric literal (like 1.0), ``NumVal`` holds its
 value. Note that we use global variables for simplicity, this is not the
 best choice for a real language implementation :).
 The actual implementation of the lexer is a single function named
 ``gettok``. The ``gettok`` function is called to return the next token
 from standard input. Its definition starts as:
 .. code-block:: c++
    /// gettok - Return the next token from standard input.
    static int gettok() {
      static int LastChar = ' ';
      // Skip any whitespace.
      while (isspace(LastChar))
        LastChar = getchar();
 ``gettok`` works by calling the C ``getchar()`` function to read
 characters one at a time from standard input. It eats them as it
 recognizes them and stores the last character read, but not processed,
 in LastChar. The first thing that it has to do is ignore whitespace
 between tokens. This is accomplished with the loop above.
 The next thing ``gettok`` needs to do is recognize identifiers and
 specific keywords like "def". Kaleidoscope does this with this simple
 loop:
 .. code-block:: c++
      if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
        IdentifierStr = LastChar;
        while (isalnum((LastChar = getchar())))
          IdentifierStr += LastChar;
        if (IdentifierStr == "def") return tok_def;
        if (IdentifierStr == "extern") return tok_extern;
        return tok_identifier;
      }
 Note that this code sets the '``IdentifierStr``' global whenever it
 lexes an identifier. Also, since language keywords are matched by the
 same loop, we handle them here inline. Numeric values are similar:
 .. code-block:: c++
      if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
        std::string NumStr;
        do {
          NumStr += LastChar;
          LastChar = getchar();
        } while (isdigit(LastChar) || LastChar == '.');
        NumVal = strtod(NumStr.c_str(), 0);
        return tok_number;
      }
 This is all pretty straight-forward code for processing input. When
 reading a numeric value from input, we use the C ``strtod`` function to
 convert it to a numeric value that we store in ``NumVal``. Note that
 this isn't doing sufficient error checking: it will incorrectly read
 "1.23.45.67" and handle it as if you typed in "1.23". Feel free to
 extend it :). Next we handle comments:
 .. code-block:: c++
      if (LastChar == '#') {
        // Comment until end of line.
        do LastChar = getchar();
        while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
        if (LastChar != EOF)
          return gettok();
      }
 We handle comments by skipping to the end of the line and then return
 the next token. Finally, if the input doesn't match one of the above
 cases, it is either an operator character like '+' or the end of the
 file. These are handled with this code:
 .. code-block:: c++
      // Check for end of file.  Don't eat the EOF.
      if (LastChar == EOF)
        return tok_eof;
      // Otherwise, just return the character as its ascii value.
      int ThisChar = LastChar;
      LastChar = getchar();
      return ThisChar;
    }
 With this, we have the complete lexer for the basic Kaleidoscope
 language (the `full code listing <LangImpl2.html#code>`_ for the Lexer
 is available in the `next chapter <LangImpl2.html>`_ of the tutorial).
 Next we'll `build a simple parser that uses this to build an Abstract
 Syntax Tree <LangImpl2.html>`_. When we have that, we'll include a
 driver so that you can use the lexer and parser together.
 `Next: Implementing a Parser and AST <LangImpl2.html>`_
--- a/docs/tutorial/LangImpl2.html
+++ b/docs/tutorial/LangImpl2.html
--- a/docs/tutorial/LangImpl2.rst
+++ b/docs/tutorial/LangImpl2.rst
--- a/docs/tutorial/LangImpl3.html
+++ b/docs/tutorial/LangImpl3.html
--- a/docs/tutorial/LangImpl3.rst
+++ b/docs/tutorial/LangImpl3.rst
--- a/docs/tutorial/LangImpl4.html
+++ b/docs/tutorial/LangImpl4.html
--- a/docs/tutorial/LangImpl4.rst
+++ b/docs/tutorial/LangImpl4.rst
--- a/docs/tutorial/LangImpl5.html
+++ b/docs/tutorial/LangImpl5.html
--- a/docs/tutorial/LangImpl5.rst
+++ b/docs/tutorial/LangImpl5.rst
--- a/docs/tutorial/LangImpl6.html
+++ b/docs/tutorial/LangImpl6.html
--- a/docs/tutorial/LangImpl6.rst
+++ b/docs/tutorial/LangImpl6.rst
--- a/docs/tutorial/LangImpl7.html
+++ b/docs/tutorial/LangImpl7.html
--- a/docs/tutorial/LangImpl7.rst
+++ b/docs/tutorial/LangImpl7.rst
--- a/docs/tutorial/LangImpl8.html
+++ b/docs/tutorial/LangImpl8.html
@ -1,359 +0,0 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
  <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <link rel="stylesheet" href="../_static/llvm.css" type="text/css">
 </head>
 <body>
 <h1>Kaleidoscope: Conclusion and other useful LLVM tidbits</h1>
 <ul>
 <li><a href="index.html">Up to Tutorial Index</a></li>
 <li>Chapter 8
  <ol>
    <li><a href="#conclusion">Tutorial Conclusion</a></li>
    <li><a href="#llvmirproperties">Properties of LLVM IR</a>
    <ul>
      <li><a href="#targetindep">Target Independence</a></li>
      <li><a href="#safety">Safety Guarantees</a></li>
      <li><a href="#langspecific">Language-Specific Optimizations</a></li>
    </ul>
    </li>
    <li><a href="#tipsandtricks">Tips and Tricks</a>
    <ul>
      <li><a href="#offsetofsizeof">Implementing portable 
                                    offsetof/sizeof</a></li>
      <li><a href="#gcstack">Garbage Collected Stack Frames</a></li>
    </ul>
    </li>
  </ol>
 </li>
 </ul>
 <div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="conclusion">Tutorial Conclusion</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>Welcome to the final chapter of the "<a href="index.html">Implementing a
 language with LLVM</a>" tutorial.  In the course of this tutorial, we have grown
 our little Kaleidoscope language from being a useless toy, to being a
 semi-interesting (but probably still useless) toy. :)</p>
 <p>It is interesting to see how far we've come, and how little code it has
 taken.  We built the entire lexer, parser, AST, code generator, and an 
 interactive run-loop (with a JIT!) by-hand in under 700 lines of
 (non-comment/non-blank) code.</p>
 <p>Our little language supports a couple of interesting features: it supports
 user defined binary and unary operators, it uses JIT compilation for immediate
 evaluation, and it supports a few control flow constructs with SSA construction.
 </p>
 <p>Part of the idea of this tutorial was to show you how easy and fun it can be
 to define, build, and play with languages.  Building a compiler need not be a
 scary or mystical process!  Now that you've seen some of the basics, I strongly
 encourage you to take the code and hack on it.  For example, try adding:</p>
 <ul>
 <li><b>global variables</b> - While global variables have questional value in
 modern software engineering, they are often useful when putting together quick
 little hacks like the Kaleidoscope compiler itself.  Fortunately, our current
 setup makes it very easy to add global variables: just have value lookup check
 to see if an unresolved variable is in the global variable symbol table before
 rejecting it.  To create a new global variable, make an instance of the LLVM
 <tt>GlobalVariable</tt> class.</li>
 <li><b>typed variables</b> - Kaleidoscope currently only supports variables of
 type double.  This gives the language a very nice elegance, because only
 supporting one type means that you never have to specify types.  Different
 languages have different ways of handling this.  The easiest way is to require
 the user to specify types for every variable definition, and record the type
 of the variable in the symbol table along with its Value*.</li>
 <li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start
 extending the type system in all sorts of interesting ways.  Simple arrays are
 very easy and are quite useful for many different applications.  Adding them is
 mostly an exercise in learning how the LLVM <a 
 href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it
 is so nifty/unconventional, it <a 
 href="../GetElementPtr.html">has its own FAQ</a>!  If you add support
 for recursive types (e.g. linked lists), make sure to read the <a 
 href="../ProgrammersManual.html#TypeResolve">section in the LLVM
 Programmer's Manual</a> that describes how to construct them.</li>
 <li><b>standard runtime</b> - Our current language allows the user to access
 arbitrary external functions, and we use it for things like "printd" and
 "putchard".  As you extend the language to add higher-level constructs, often
 these constructs make the most sense if they are lowered to calls into a
 language-supplied runtime.  For example, if you add hash tables to the language,
 it would probably make sense to add the routines to a runtime, instead of 
 inlining them all the way.</li>
 <li><b>memory management</b> - Currently we can only access the stack in
 Kaleidoscope.  It would also be useful to be able to allocate heap memory,
 either with calls to the standard libc malloc/free interface or with a garbage
 collector.  If you would like to use garbage collection, note that LLVM fully
 supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a>
 including algorithms that move objects and need to scan/update the stack.</li>
 <li><b>debugger support</b> - LLVM supports generation of <a 
 href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by
 common debuggers like GDB.  Adding support for debug info is fairly 
 straightforward.  The best way to understand it is to compile some C/C++ code
 with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li>
 <li><b>exception handling support</b> - LLVM supports generation of <a 
 href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate
 with code compiled in other languages.  You could also generate code by
 implicitly making every function return an error value and checking it.  You 
 could also make explicit use of setjmp/longjmp.  There are many different ways
 to go here.</li>
 <li><b>object orientation, generics, database access, complex numbers,
 geometric programming, ...</b> - Really, there is
 no end of crazy features that you can add to the language.</li>
 <li><b>unusual domains</b> - We've been talking about applying LLVM to a domain
 that many people are interested in: building a compiler for a specific language.
 However, there are many other domains that can use compiler technology that are
 not typically considered.  For example, LLVM has been used to implement OpenGL
 graphics acceleration, translate C++ code to ActionScript, and many other
 cute and clever things.  Maybe you will be the first to JIT compile a regular
 expression interpreter into native code with LLVM?</li>
 </ul>
 <p>
 Have fun - try doing something crazy and unusual.  Building a language like
 everyone else always has, is much less fun than trying something a little crazy
 or off the wall and seeing how it turns out.  If you get stuck or want to talk
 about it, feel free to email the <a 
 href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing 
 list</a>: it has lots of people who are interested in languages and are often
 willing to help out.
 </p>
 <p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating
 LLVM IR.  These are some of the more subtle things that may not be obvious, but
 are very useful if you want to take advantage of LLVM's capabilities.</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="llvmirproperties">Properties of the LLVM IR</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>We have a couple common questions about code in the LLVM IR form - lets just
 get these out of the way right now, shall we?</p>
 <!-- ======================================================================= -->
 <h4><a name="targetindep">Target Independence</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Kaleidoscope is an example of a "portable language": any program written in
 Kaleidoscope will work the same way on any target that it runs on.  Many other
 languages have this property, e.g. lisp, java, haskell, javascript, python, etc
 (note that while these languages are portable, not all their libraries are).</p>
 <p>One nice aspect of LLVM is that it is often capable of preserving target
 independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled 
 program and run it on any target that LLVM supports, even emitting C code and
 compiling that on targets that LLVM doesn't support natively.  You can trivially
 tell that the Kaleidoscope compiler generates target-independent code because it
 never queries for any target-specific information when generating code.</p>
 <p>The fact that LLVM provides a compact, target-independent, representation for
 code gets a lot of people excited.  Unfortunately, these people are usually
 thinking about C or a language from the C family when they are asking questions
 about language portability.  I say "unfortunately", because there is really no
 way to make (fully general) C code portable, other than shipping the source code
 around (and of course, C source code is not actually portable in general
 either - ever port a really old application from 32- to 64-bits?).</p>
 <p>The problem with C (again, in its full generality) is that it is heavily
 laden with target specific assumptions.  As one simple example, the preprocessor
 often destructively removes target-independence from the code when it processes
 the input text:</p>
 <div class="doc_code">
 <pre>
 #ifdef __i386__
  int X = 1;
 #else
  int X = 42;
 #endif
 </pre>
 </div>
 <p>While it is possible to engineer more and more complex solutions to problems
 like this, it cannot be solved in full generality in a way that is better than shipping
 the actual source code.</p>
 <p>That said, there are interesting subsets of C that can be made portable.  If
 you are willing to fix primitive types to a fixed size (say int = 32-bits, 
 and long = 64-bits), don't care about ABI compatibility with existing binaries,
 and are willing to give up some other minor features, you can have portable
 code.  This can make sense for specialized domains such as an
 in-kernel language.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="safety">Safety Guarantees</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Many of the languages above are also "safe" languages: it is impossible for
 a program written in Java to corrupt its address space and crash the process
 (assuming the JVM has no bugs).
 Safety is an interesting property that requires a combination of language
 design, runtime support, and often operating system support.</p>
 <p>It is certainly possible to implement a safe language in LLVM, but LLVM IR
 does not itself guarantee safety.  The LLVM IR allows unsafe pointer casts,
 use after free bugs, buffer over-runs, and a variety of other problems.  Safety
 needs to be implemented as a layer on top of LLVM and, conveniently, several
 groups have investigated this.  Ask on the <a 
 href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing 
 list</a> if you are interested in more details.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="langspecific">Language-Specific Optimizations</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>One thing about LLVM that turns off many people is that it does not solve all
 the world's problems in one system (sorry 'world hunger', someone else will have
 to solve you some other day).  One specific complaint is that people perceive
 LLVM as being incapable of performing high-level language-specific optimization:
 LLVM "loses too much information".</p>
 <p>Unfortunately, this is really not the place to give you a full and unified
 version of "Chris Lattner's theory of compiler design".  Instead, I'll make a
 few observations:</p>
 <p>First, you're right that LLVM does lose information.  For example, as of this
 writing, there is no way to distinguish in the LLVM IR whether an SSA-value came
 from a C "int" or a C "long" on an ILP32 machine (other than debug info).  Both
 get compiled down to an 'i32' value and the information about what it came from
 is lost.  The more general issue here, is that the LLVM type system uses
 "structural equivalence" instead of "name equivalence".  Another place this
 surprises people is if you have two types in a high-level language that have the
 same structure (e.g. two different structs that have a single int field): these
 types will compile down into a single LLVM type and it will be impossible to
 tell what it came from.</p>
 <p>Second, while LLVM does lose information, LLVM is not a fixed target: we 
 continue to enhance and improve it in many different ways.  In addition to
 adding new features (LLVM did not always support exceptions or debug info), we
 also extend the IR to capture important information for optimization (e.g.
 whether an argument is sign or zero extended, information about pointers
 aliasing, etc).  Many of the enhancements are user-driven: people want LLVM to
 include some specific feature, so they go ahead and extend it.</p>
 <p>Third, it is <em>possible and easy</em> to add language-specific
 optimizations, and you have a number of choices in how to do it.  As one trivial
 example, it is easy to add language-specific optimization passes that
 "know" things about code compiled for a language.  In the case of the C family,
 there is an optimization pass that "knows" about the standard C library
 functions.  If you call "exit(0)" in main(), it knows that it is safe to
 optimize that into "return 0;" because C specifies what the 'exit'
 function does.</p>
 <p>In addition to simple library knowledge, it is possible to embed a variety of
 other language-specific information into the LLVM IR.  If you have a specific
 need and run into a wall, please bring the topic up on the llvmdev list.  At the
 very worst, you can always treat LLVM as if it were a "dumb code generator" and
 implement the high-level optimizations you desire in your front-end, on the
 language-specific AST.
 </p>
 </div>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="tipsandtricks">Tips and Tricks</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>There is a variety of useful tips and tricks that you come to know after
 working on/with LLVM that aren't obvious at first glance.  Instead of letting
 everyone rediscover them, this section talks about some of these issues.</p>
 <!-- ======================================================================= -->
 <h4><a name="offsetofsizeof">Implementing portable offsetof/sizeof</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>One interesting thing that comes up, if you are trying to keep the code 
 generated by your compiler "target independent", is that you often need to know
 the size of some LLVM type or the offset of some field in an llvm structure.
 For example, you might need to pass the size of a type into a function that
 allocates memory.</p>
 <p>Unfortunately, this can vary widely across targets: for example the width of
 a pointer is trivially target-specific.  However, there is a <a 
 href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever
 way to use the getelementptr instruction</a> that allows you to compute this
 in a portable way.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="gcstack">Garbage Collected Stack Frames</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Some languages want to explicitly manage their stack frames, often so that
 they are garbage collected or to allow easy implementation of closures.  There
 are often better ways to implement these features than explicit stack frames,
 but <a 
 href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM
 does support them,</a> if you want.  It requires your front-end to convert the
 code into <a 
 href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation
 Passing Style</a> and the use of tail calls (which LLVM also supports).</p>
 </div>
 </div>
 <!-- *********************************************************************** -->
 <hr>
 <address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
 </address>
 </body>
 </html>
--- a/docs/tutorial/LangImpl8.rst
+++ b/docs/tutorial/LangImpl8.rst
@ -0,0 +1,269 @@
 ======================================================
 Kaleidoscope: Conclusion and other useful LLVM tidbits
 ======================================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_
 Tutorial Conclusion
 ===================
 Welcome to the final chapter of the "`Implementing a language with
 LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
 grown our little Kaleidoscope language from being a useless toy, to
 being a semi-interesting (but probably still useless) toy. :)
 It is interesting to see how far we've come, and how little code it has
 taken. We built the entire lexer, parser, AST, code generator, and an
 interactive run-loop (with a JIT!) by-hand in under 700 lines of
 (non-comment/non-blank) code.
 Our little language supports a couple of interesting features: it
 supports user defined binary and unary operators, it uses JIT
 compilation for immediate evaluation, and it supports a few control flow
 constructs with SSA construction.
 Part of the idea of this tutorial was to show you how easy and fun it
 can be to define, build, and play with languages. Building a compiler
 need not be a scary or mystical process! Now that you've seen some of
 the basics, I strongly encourage you to take the code and hack on it.
 For example, try adding:
 -  **global variables** - While global variables have questional value
   in modern software engineering, they are often useful when putting
   together quick little hacks like the Kaleidoscope compiler itself.
   Fortunately, our current setup makes it very easy to add global
   variables: just have value lookup check to see if an unresolved
   variable is in the global variable symbol table before rejecting it.
   To create a new global variable, make an instance of the LLVM
   ``GlobalVariable`` class.
 -  **typed variables** - Kaleidoscope currently only supports variables
   of type double. This gives the language a very nice elegance, because
   only supporting one type means that you never have to specify types.
   Different languages have different ways of handling this. The easiest
   way is to require the user to specify types for every variable
   definition, and record the type of the variable in the symbol table
   along with its Value\*.
 -  **arrays, structs, vectors, etc** - Once you add types, you can start
   extending the type system in all sorts of interesting ways. Simple
   arrays are very easy and are quite useful for many different
   applications. Adding them is mostly an exercise in learning how the
   LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
   works: it is so nifty/unconventional, it `has its own
   FAQ <../GetElementPtr.html>`_! If you add support for recursive types
   (e.g. linked lists), make sure to read the `section in the LLVM
   Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
   describes how to construct them.
 -  **standard runtime** - Our current language allows the user to access
   arbitrary external functions, and we use it for things like "printd"
   and "putchard". As you extend the language to add higher-level
   constructs, often these constructs make the most sense if they are
   lowered to calls into a language-supplied runtime. For example, if
   you add hash tables to the language, it would probably make sense to
   add the routines to a runtime, instead of inlining them all the way.
 -  **memory management** - Currently we can only access the stack in
   Kaleidoscope. It would also be useful to be able to allocate heap
   memory, either with calls to the standard libc malloc/free interface
   or with a garbage collector. If you would like to use garbage
   collection, note that LLVM fully supports `Accurate Garbage
   Collection <../GarbageCollection.html>`_ including algorithms that
   move objects and need to scan/update the stack.
 -  **debugger support** - LLVM supports generation of `DWARF Debug
   info <../SourceLevelDebugging.html>`_ which is understood by common
   debuggers like GDB. Adding support for debug info is fairly
   straightforward. The best way to understand it is to compile some
   C/C++ code with "``llvm-gcc -g -O0``" and taking a look at what it
   produces.
 -  **exception handling support** - LLVM supports generation of `zero
   cost exceptions <../ExceptionHandling.html>`_ which interoperate with
   code compiled in other languages. You could also generate code by
   implicitly making every function return an error value and checking
   it. You could also make explicit use of setjmp/longjmp. There are
   many different ways to go here.
 -  **object orientation, generics, database access, complex numbers,
   geometric programming, ...** - Really, there is no end of crazy
   features that you can add to the language.
 -  **unusual domains** - We've been talking about applying LLVM to a
   domain that many people are interested in: building a compiler for a
   specific language. However, there are many other domains that can use
   compiler technology that are not typically considered. For example,
   LLVM has been used to implement OpenGL graphics acceleration,
   translate C++ code to ActionScript, and many other cute and clever
   things. Maybe you will be the first to JIT compile a regular
   expression interpreter into native code with LLVM?
 Have fun - try doing something crazy and unusual. Building a language
 like everyone else always has, is much less fun than trying something a
 little crazy or off the wall and seeing how it turns out. If you get
 stuck or want to talk about it, feel free to email the `llvmdev mailing
 list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots
 of people who are interested in languages and are often willing to help
 out.
 Before we end this tutorial, I want to talk about some "tips and tricks"
 for generating LLVM IR. These are some of the more subtle things that
 may not be obvious, but are very useful if you want to take advantage of
 LLVM's capabilities.
 Properties of the LLVM IR
 =========================
 We have a couple common questions about code in the LLVM IR form - lets
 just get these out of the way right now, shall we?
 Target Independence
 -------------------
 Kaleidoscope is an example of a "portable language": any program written
 in Kaleidoscope will work the same way on any target that it runs on.
 Many other languages have this property, e.g. lisp, java, haskell,
 javascript, python, etc (note that while these languages are portable,
 not all their libraries are).
 One nice aspect of LLVM is that it is often capable of preserving target
 independence in the IR: you can take the LLVM IR for a
 Kaleidoscope-compiled program and run it on any target that LLVM
 supports, even emitting C code and compiling that on targets that LLVM
 doesn't support natively. You can trivially tell that the Kaleidoscope
 compiler generates target-independent code because it never queries for
 any target-specific information when generating code.
 The fact that LLVM provides a compact, target-independent,
 representation for code gets a lot of people excited. Unfortunately,
 these people are usually thinking about C or a language from the C
 family when they are asking questions about language portability. I say
 "unfortunately", because there is really no way to make (fully general)
 C code portable, other than shipping the source code around (and of
 course, C source code is not actually portable in general either - ever
 port a really old application from 32- to 64-bits?).
 The problem with C (again, in its full generality) is that it is heavily
 laden with target specific assumptions. As one simple example, the
 preprocessor often destructively removes target-independence from the
 code when it processes the input text:
 .. code-block:: c
    #ifdef __i386__
      int X = 1;
    #else
      int X = 42;
    #endif
 While it is possible to engineer more and more complex solutions to
 problems like this, it cannot be solved in full generality in a way that
 is better than shipping the actual source code.
 That said, there are interesting subsets of C that can be made portable.
 If you are willing to fix primitive types to a fixed size (say int =
 32-bits, and long = 64-bits), don't care about ABI compatibility with
 existing binaries, and are willing to give up some other minor features,
 you can have portable code. This can make sense for specialized domains
 such as an in-kernel language.
 Safety Guarantees
 -----------------
 Many of the languages above are also "safe" languages: it is impossible
 for a program written in Java to corrupt its address space and crash the
 process (assuming the JVM has no bugs). Safety is an interesting
 property that requires a combination of language design, runtime
 support, and often operating system support.
 It is certainly possible to implement a safe language in LLVM, but LLVM
 IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
 casts, use after free bugs, buffer over-runs, and a variety of other
 problems. Safety needs to be implemented as a layer on top of LLVM and,
 conveniently, several groups have investigated this. Ask on the `llvmdev
 mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if
 you are interested in more details.
 Language-Specific Optimizations
 -------------------------------
 One thing about LLVM that turns off many people is that it does not
 solve all the world's problems in one system (sorry 'world hunger',
 someone else will have to solve you some other day). One specific
 complaint is that people perceive LLVM as being incapable of performing
 high-level language-specific optimization: LLVM "loses too much
 information".
 Unfortunately, this is really not the place to give you a full and
 unified version of "Chris Lattner's theory of compiler design". Instead,
 I'll make a few observations:
 First, you're right that LLVM does lose information. For example, as of
 this writing, there is no way to distinguish in the LLVM IR whether an
 SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
 than debug info). Both get compiled down to an 'i32' value and the
 information about what it came from is lost. The more general issue
 here, is that the LLVM type system uses "structural equivalence" instead
 of "name equivalence". Another place this surprises people is if you
 have two types in a high-level language that have the same structure
 (e.g. two different structs that have a single int field): these types
 will compile down into a single LLVM type and it will be impossible to
 tell what it came from.
 Second, while LLVM does lose information, LLVM is not a fixed target: we
 continue to enhance and improve it in many different ways. In addition
 to adding new features (LLVM did not always support exceptions or debug
 info), we also extend the IR to capture important information for
 optimization (e.g. whether an argument is sign or zero extended,
 information about pointers aliasing, etc). Many of the enhancements are
 user-driven: people want LLVM to include some specific feature, so they
 go ahead and extend it.
 Third, it is *possible and easy* to add language-specific optimizations,
 and you have a number of choices in how to do it. As one trivial
 example, it is easy to add language-specific optimization passes that
 "know" things about code compiled for a language. In the case of the C
 family, there is an optimization pass that "knows" about the standard C
 library functions. If you call "exit(0)" in main(), it knows that it is
 safe to optimize that into "return 0;" because C specifies what the
 'exit' function does.
 In addition to simple library knowledge, it is possible to embed a
 variety of other language-specific information into the LLVM IR. If you
 have a specific need and run into a wall, please bring the topic up on
 the llvmdev list. At the very worst, you can always treat LLVM as if it
 were a "dumb code generator" and implement the high-level optimizations
 you desire in your front-end, on the language-specific AST.
 Tips and Tricks
 ===============
 There is a variety of useful tips and tricks that you come to know after
 working on/with LLVM that aren't obvious at first glance. Instead of
 letting everyone rediscover them, this section talks about some of these
 issues.
 Implementing portable offsetof/sizeof
 -------------------------------------
 One interesting thing that comes up, if you are trying to keep the code
 generated by your compiler "target independent", is that you often need
 to know the size of some LLVM type or the offset of some field in an
 llvm structure. For example, you might need to pass the size of a type
 into a function that allocates memory.
 Unfortunately, this can vary widely across targets: for example the
 width of a pointer is trivially target-specific. However, there is a
 `clever way to use the getelementptr
 instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
 that allows you to compute this in a portable way.
 Garbage Collected Stack Frames
 ------------------------------
 Some languages want to explicitly manage their stack frames, often so
 that they are garbage collected or to allow easy implementation of
 closures. There are often better ways to implement these features than
 explicit stack frames, but `LLVM does support
 them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
 if you want. It requires your front-end to convert the code into
 `Continuation Passing
 Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
 the use of tail calls (which LLVM also supports).
--- a/docs/tutorial/OCamlLangImpl1.html
+++ b/docs/tutorial/OCamlLangImpl1.html
@ -1,365 +0,0 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
  <title>Kaleidoscope: Tutorial Introduction and the Lexer</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <meta name="author" content="Erick Tryzelaar">
  <link rel="stylesheet" href="../_static/llvm.css" type="text/css">
 </head>
 <body>
 <h1>Kaleidoscope: Tutorial Introduction and the Lexer</h1>
 <ul>
 <li><a href="index.html">Up to Tutorial Index</a></li>
 <li>Chapter 1
  <ol>
    <li><a href="#intro">Tutorial Introduction</a></li>
    <li><a href="#language">The Basic Language</a></li>
    <li><a href="#lexer">The Lexer</a></li>
  </ol>
 </li>
 <li><a href="OCamlLangImpl2.html">Chapter 2</a>: Implementing a Parser and
 AST</li>
 </ul>
 <div class="doc_author">
 	<p>
 		Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
 		and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
 	</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="intro">Tutorial Introduction</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>Welcome to the "Implementing a language with LLVM" tutorial.  This tutorial
 runs through the implementation of a simple language, showing how fun and
 easy it can be.  This tutorial will get you up and started as well as help to
 build a framework you can extend to other languages.  The code in this tutorial
 can also be used as a playground to hack on other LLVM specific things.
 </p>
 <p>
 The goal of this tutorial is to progressively unveil our language, describing
 how it is built up over time.  This will let us cover a fairly broad range of
 language design and LLVM-specific usage issues, showing and explaining the code
 for it all along the way, without overwhelming you with tons of details up
 front.</p>
 <p>It is useful to point out ahead of time that this tutorial is really about
 teaching compiler techniques and LLVM specifically, <em>not</em> about teaching
 modern and sane software engineering principles.  In practice, this means that
 we'll take a number of shortcuts to simplify the exposition.  For example, the
 code leaks memory, uses global variables all over the place, doesn't use nice
 design patterns like <a
 href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it
 is very simple.  If you dig in and use the code as a basis for future projects,
 fixing these deficiencies shouldn't be hard.</p>
 <p>I've tried to put this tutorial together in a way that makes chapters easy to
 skip over if you are already familiar with or are uninterested in the various
 pieces.  The structure of the tutorial is:
 </p>
 <ul>
 <li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope
 language, and the definition of its Lexer</b> - This shows where we are going
 and the basic functionality that we want it to do.  In order to make this
 tutorial maximally understandable and hackable, we choose to implement
 everything in Objective Caml instead of using lexer and parser generators.
 LLVM obviously works just fine with such tools, feel free to use one if you
 prefer.</li>
 <li><b><a href="OCamlLangImpl2.html">Chapter #2</a>: Implementing a Parser and
 AST</b> - With the lexer in place, we can talk about parsing techniques and
 basic AST construction.  This tutorial describes recursive descent parsing and
 operator precedence parsing.  Nothing in Chapters 1 or 2 is LLVM-specific,
 the code doesn't even link in LLVM at this point. :)</li>
 <li><b><a href="OCamlLangImpl3.html">Chapter #3</a>: Code generation to LLVM
 IR</b> - With the AST ready, we can show off how easy generation of LLVM IR
 really is.</li>
 <li><b><a href="OCamlLangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer
 Support</b> - Because a lot of people are interested in using LLVM as a JIT,
 we'll dive right into it and show you the 3 lines it takes to add JIT support.
 LLVM is also useful in many other ways, but this is one simple and "sexy" way
 to shows off its power. :)</li>
 <li><b><a href="OCamlLangImpl5.html">Chapter #5</a>: Extending the Language:
 Control Flow</b> - With the language up and running, we show how to extend it
 with control flow operations (if/then/else and a 'for' loop).  This gives us a
 chance to talk about simple SSA construction and control flow.</li>
 <li><b><a href="OCamlLangImpl6.html">Chapter #6</a>: Extending the Language:
 User-defined Operators</b> - This is a silly but fun chapter that talks about
 extending the language to let the user program define their own arbitrary
 unary and binary operators (with assignable precedence!).  This lets us build a
 significant piece of the "language" as library routines.</li>
 <li><b><a href="OCamlLangImpl7.html">Chapter #7</a>: Extending the Language:
 Mutable Variables</b> - This chapter talks about adding user-defined local
 variables along with an assignment operator.  The interesting part about this
 is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does
 <em>not</em> require your front-end to construct SSA form!</li>
 <li><b><a href="OCamlLangImpl8.html">Chapter #8</a>: Conclusion and other
 useful LLVM tidbits</b> - This chapter wraps up the series by talking about
 potential ways to extend the language, but also includes a bunch of pointers to
 info about "special topics" like adding garbage collection support, exceptions,
 debugging, support for "spaghetti stacks", and a bunch of other tips and
 tricks.</li>
 </ul>
 <p>By the end of the tutorial, we'll have written a bit less than 700 lines of
 non-comment, non-blank, lines of code.  With this small amount of code, we'll
 have built up a very reasonable compiler for a non-trivial language including
 a hand-written lexer, parser, AST, as well as code generation support with a JIT
 compiler.  While other systems may have interesting "hello world" tutorials,
 I think the breadth of this tutorial is a great testament to the strengths of
 LLVM and why you should consider it if you're interested in language or compiler
 design.</p>
 <p>A note about this tutorial: we expect you to extend the language and play
 with it on your own.  Take the code and go crazy hacking away at it, compilers
 don't need to be scary creatures - it can be a lot of fun to play with
 languages!</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="language">The Basic Language</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>This tutorial will be illustrated with a toy language that we'll call
 "<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived
 from "meaning beautiful, form, and view").
 Kaleidoscope is a procedural language that allows you to define functions, use
 conditionals, math, etc.  Over the course of the tutorial, we'll extend
 Kaleidoscope to support the if/then/else construct, a for loop, user defined
 operators, JIT compilation with a simple command line interface, etc.</p>
 <p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
 64-bit floating point type (aka 'float' in O'Caml parlance).  As such, all
 values are implicitly double precision and the language doesn't require type
 declarations.  This gives the language a very nice and simple syntax.  For
 example, the following simple example computes <a
 href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p>
 <div class="doc_code">
 <pre>
 # Compute the x'th fibonacci number.
 def fib(x)
  if x &lt; 3 then
    1
  else
    fib(x-1)+fib(x-2)
 # This expression will compute the 40th number.
 fib(40)
 </pre>
 </div>
 <p>We also allow Kaleidoscope to call into standard library functions (the LLVM
 JIT makes this completely trivial).  This means that you can use the 'extern'
 keyword to define a function before you use it (this is also useful for mutually
 recursive functions).  For example:</p>
 <div class="doc_code">
 <pre>
 extern sin(arg);
 extern cos(arg);
 extern atan2(arg1 arg2);
 atan2(sin(.4), cos(42))
 </pre>
 </div>
 <p>A more interesting example is included in Chapter 6 where we write a little
 Kaleidoscope application that <a href="OCamlLangImpl6.html#example">displays
 a Mandelbrot Set</a> at various levels of magnification.</p>
 <p>Lets dive into the implementation of this language!</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="lexer">The Lexer</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>When it comes to implementing a language, the first thing needed is
 the ability to process a text file and recognize what it says.  The traditional
 way to do this is to use a "<a
 href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
 to break the input up into "tokens".  Each token returned by the lexer includes
 a token code and potentially some metadata (e.g. the numeric value of a number).
 First, we define the possibilities:
 </p>
 <div class="doc_code">
 <pre>
 (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
 * these others for known things. *)
 type token =
  (* commands *)
  | Def | Extern
  (* primary *)
  | Ident of string | Number of float
  (* unknown *)
  | Kwd of char
 </pre>
 </div>
 <p>Each token returned by our lexer will be one of the token variant values.
 An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>.  If
 the curr token is an identifier, the value will be <tt>Token.Ident s</tt>.  If
 the current token is a numeric literal (like 1.0), the value will be
 <tt>Token.Number 1.0</tt>.
 </p>
 <p>The actual implementation of the lexer is a collection of functions driven
 by a function named <tt>Lexer.lex</tt>.  The <tt>Lexer.lex</tt> function is
 called to return the next token from standard input.  We will use
 <a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a>
 to simplify the tokenization of the standard input.  Its definition starts
 as:</p>
 <div class="doc_code">
 <pre>
 (*===----------------------------------------------------------------------===
 * Lexer
 *===----------------------------------------------------------------------===*)
 let rec lex = parser
  (* Skip any whitespace. *)
  | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
 </pre>
 </div>
 <p>
 <tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
 characters one at a time from the standard input.  It eats them as it recognizes
 them and stores them in in a <tt>Token.token</tt> variant.  The first thing that
 it has to do is ignore whitespace between tokens.  This is accomplished with the
 recursive call above.</p>
 <p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and
 specific keywords like "def".  Kaleidoscope does this with a pattern match
 and a helper function.<p>
 <div class="doc_code">
 <pre>
  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
  | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_ident buffer stream
 ...
 and lex_ident buffer = parser
  | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
      Buffer.add_char buffer c;
      lex_ident buffer stream
  | [&lt; stream=lex &gt;] -&gt;
      match Buffer.contents buffer with
      | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
      | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
      | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
 </pre>
 </div>
 <p>Numeric values are similar:</p>
 <div class="doc_code">
 <pre>
  (* number: [0-9.]+ *)
  | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
      let buffer = Buffer.create 1 in
      Buffer.add_char buffer c;
      lex_number buffer stream
 ...
 and lex_number buffer = parser
  | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
      Buffer.add_char buffer c;
      lex_number buffer stream
  | [&lt; stream=lex &gt;] -&gt;
      [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
 </pre>
 </div>
 <p>This is all pretty straight-forward code for processing input.  When reading
 a numeric value from input, we use the ocaml <tt>float_of_string</tt> function
 to convert it to a numeric value that we store in <tt>Token.Number</tt>.  Note
 that this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
 if the string "1.23.45.67".  Feel free to extend it :).  Next we handle
 comments:
 </p>
 <div class="doc_code">
 <pre>
  (* Comment until end of line. *)
  | [&lt; ' ('#'); stream &gt;] -&gt;
      lex_comment stream
 ...
 and lex_comment = parser
  | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
  | [&lt; 'c; e=lex_comment &gt;] -&gt; e
  | [&lt; &gt;] -&gt; [&lt; &gt;]
 </pre>
 </div>
 <p>We handle comments by skipping to the end of the line and then return the
 next token.  Finally, if the input doesn't match one of the above cases, it is
 either an operator character like '+' or the end of the file.  These are handled
 with this code:</p>
 <div class="doc_code">
 <pre>
  (* Otherwise, just return the character as its ascii value. *)
  | [&lt; 'c; stream &gt;] -&gt;
      [&lt; 'Token.Kwd c; lex stream &gt;]
  (* end of stream. *)
  | [&lt; &gt;] -&gt; [&lt; &gt;]
 </pre>
 </div>
 <p>With this, we have the complete lexer for the basic Kaleidoscope language
 (the <a href="OCamlLangImpl2.html#code">full code listing</a> for the Lexer is
 available in the <a href="OCamlLangImpl2.html">next chapter</a> of the
 tutorial).  Next we'll <a href="OCamlLangImpl2.html">build a simple parser that
 uses this to build an Abstract Syntax Tree</a>.  When we have that, we'll
 include a driver so that you can use the lexer and parser together.
 </p>
 <a href="OCamlLangImpl2.html">Next: Implementing a Parser and AST</a>
 </div>
 <!-- *********************************************************************** -->
 <hr>
 <address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
 </address>
 </body>
 </html>
--- a/docs/tutorial/OCamlLangImpl1.rst
+++ b/docs/tutorial/OCamlLangImpl1.rst
@ -0,0 +1,288 @@
 =================================================
 Kaleidoscope: Tutorial Introduction and the Lexer
 =================================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick
 Tryzelaar <mailto:idadesub@users.sourceforge.net>`_
 Tutorial Introduction
 =====================
 Welcome to the "Implementing a language with LLVM" tutorial. This
 tutorial runs through the implementation of a simple language, showing
 how fun and easy it can be. This tutorial will get you up and started as
 well as help to build a framework you can extend to other languages. The
 code in this tutorial can also be used as a playground to hack on other
 LLVM specific things.
 The goal of this tutorial is to progressively unveil our language,
 describing how it is built up over time. This will let us cover a fairly
 broad range of language design and LLVM-specific usage issues, showing
 and explaining the code for it all along the way, without overwhelming
 you with tons of details up front.
 It is useful to point out ahead of time that this tutorial is really
 about teaching compiler techniques and LLVM specifically, *not* about
 teaching modern and sane software engineering principles. In practice,
 this means that we'll take a number of shortcuts to simplify the
 exposition. For example, the code leaks memory, uses global variables
 all over the place, doesn't use nice design patterns like
 `visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
 it is very simple. If you dig in and use the code as a basis for future
 projects, fixing these deficiencies shouldn't be hard.
 I've tried to put this tutorial together in a way that makes chapters
 easy to skip over if you are already familiar with or are uninterested
 in the various pieces. The structure of the tutorial is:
 -  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
   language, and the definition of its Lexer - This shows where we are
   going and the basic functionality that we want it to do. In order to
   make this tutorial maximally understandable and hackable, we choose
   to implement everything in Objective Caml instead of using lexer and
   parser generators. LLVM obviously works just fine with such tools,
   feel free to use one if you prefer.
 -  `Chapter #2 <OCamlLangImpl2.html>`_: Implementing a Parser and
   AST - With the lexer in place, we can talk about parsing techniques
   and basic AST construction. This tutorial describes recursive descent
   parsing and operator precedence parsing. Nothing in Chapters 1 or 2
   is LLVM-specific, the code doesn't even link in LLVM at this point.
   :)
 -  `Chapter #3 <OCamlLangImpl3.html>`_: Code generation to LLVM IR -
   With the AST ready, we can show off how easy generation of LLVM IR
   really is.
 -  `Chapter #4 <OCamlLangImpl4.html>`_: Adding JIT and Optimizer
   Support - Because a lot of people are interested in using LLVM as a
   JIT, we'll dive right into it and show you the 3 lines it takes to
   add JIT support. LLVM is also useful in many other ways, but this is
   one simple and "sexy" way to shows off its power. :)
 -  `Chapter #5 <OCamlLangImpl5.html>`_: Extending the Language:
   Control Flow - With the language up and running, we show how to
   extend it with control flow operations (if/then/else and a 'for'
   loop). This gives us a chance to talk about simple SSA construction
   and control flow.
 -  `Chapter #6 <OCamlLangImpl6.html>`_: Extending the Language:
   User-defined Operators - This is a silly but fun chapter that talks
   about extending the language to let the user program define their own
   arbitrary unary and binary operators (with assignable precedence!).
   This lets us build a significant piece of the "language" as library
   routines.
 -  `Chapter #7 <OCamlLangImpl7.html>`_: Extending the Language:
   Mutable Variables - This chapter talks about adding user-defined
   local variables along with an assignment operator. The interesting
   part about this is how easy and trivial it is to construct SSA form
   in LLVM: no, LLVM does *not* require your front-end to construct SSA
   form!
 -  `Chapter #8 <OCamlLangImpl8.html>`_: Conclusion and other useful
   LLVM tidbits - This chapter wraps up the series by talking about
   potential ways to extend the language, but also includes a bunch of
   pointers to info about "special topics" like adding garbage
   collection support, exceptions, debugging, support for "spaghetti
   stacks", and a bunch of other tips and tricks.
 By the end of the tutorial, we'll have written a bit less than 700 lines
 of non-comment, non-blank, lines of code. With this small amount of
 code, we'll have built up a very reasonable compiler for a non-trivial
 language including a hand-written lexer, parser, AST, as well as code
 generation support with a JIT compiler. While other systems may have
 interesting "hello world" tutorials, I think the breadth of this
 tutorial is a great testament to the strengths of LLVM and why you
 should consider it if you're interested in language or compiler design.
 A note about this tutorial: we expect you to extend the language and
 play with it on your own. Take the code and go crazy hacking away at it,
 compilers don't need to be scary creatures - it can be a lot of fun to
 play with languages!
 The Basic Language
 ==================
 This tutorial will be illustrated with a toy language that we'll call
 "`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
 from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
 language that allows you to define functions, use conditionals, math,
 etc. Over the course of the tutorial, we'll extend Kaleidoscope to
 support the if/then/else construct, a for loop, user defined operators,
 JIT compilation with a simple command line interface, etc.
 Because we want to keep things simple, the only datatype in Kaleidoscope
 is a 64-bit floating point type (aka 'float' in O'Caml parlance). As
 such, all values are implicitly double precision and the language
 doesn't require type declarations. This gives the language a very nice
 and simple syntax. For example, the following simple example computes
 `Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_
 ::
    # Compute the x'th fibonacci number.
    def fib(x)
      if x < 3 then
        1
      else
        fib(x-1)+fib(x-2)
    # This expression will compute the 40th number.
    fib(40)
 We also allow Kaleidoscope to call into standard library functions (the
 LLVM JIT makes this completely trivial). This means that you can use the
 'extern' keyword to define a function before you use it (this is also
 useful for mutually recursive functions). For example:
 ::
    extern sin(arg);
    extern cos(arg);
    extern atan2(arg1 arg2);
    atan2(sin(.4), cos(42))
 A more interesting example is included in Chapter 6 where we write a
 little Kaleidoscope application that `displays a Mandelbrot
 Set <OCamlLangImpl6.html#example>`_ at various levels of magnification.
 Lets dive into the implementation of this language!
 The Lexer
 =========
 When it comes to implementing a language, the first thing needed is the
 ability to process a text file and recognize what it says. The
 traditional way to do this is to use a
 "`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
 'scanner') to break the input up into "tokens". Each token returned by
 the lexer includes a token code and potentially some metadata (e.g. the
 numeric value of a number). First, we define the possibilities:
 .. code-block:: ocaml
    (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
     * these others for known things. *)
    type token =
      (* commands *)
      | Def | Extern
      (* primary *)
      | Ident of string | Number of float
      (* unknown *)
      | Kwd of char
 Each token returned by our lexer will be one of the token variant
 values. An unknown character like '+' will be returned as
 ``Token.Kwd '+'``. If the curr token is an identifier, the value will be
 ``Token.Ident s``. If the current token is a numeric literal (like 1.0),
 the value will be ``Token.Number 1.0``.
 The actual implementation of the lexer is a collection of functions
 driven by a function named ``Lexer.lex``. The ``Lexer.lex`` function is
 called to return the next token from standard input. We will use
 `Camlp4 <http://caml.inria.fr/pub/docs/manual-camlp4/index.html>`_ to
 simplify the tokenization of the standard input. Its definition starts
 as:
 .. code-block:: ocaml
    (*===----------------------------------------------------------------------===
     * Lexer
     *===----------------------------------------------------------------------===*)
    let rec lex = parser
      (* Skip any whitespace. *)
      | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
 ``Lexer.lex`` works by recursing over a ``char Stream.t`` to read
 characters one at a time from the standard input. It eats them as it
 recognizes them and stores them in in a ``Token.token`` variant. The
 first thing that it has to do is ignore whitespace between tokens. This
 is accomplished with the recursive call above.
 The next thing ``Lexer.lex`` needs to do is recognize identifiers and
 specific keywords like "def". Kaleidoscope does this with a pattern
 match and a helper function.
 .. code-block:: ocaml
      (* identifier: [a-zA-Z][a-zA-Z0-9] *)
      | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
          let buffer = Buffer.create 1 in
          Buffer.add_char buffer c;
          lex_ident buffer stream
    ...
    and lex_ident buffer = parser
      | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
          Buffer.add_char buffer c;
          lex_ident buffer stream
      | [< stream=lex >] ->
          match Buffer.contents buffer with
          | "def" -> [< 'Token.Def; stream >]
          | "extern" -> [< 'Token.Extern; stream >]
          | id -> [< 'Token.Ident id; stream >]
 Numeric values are similar:
 .. code-block:: ocaml
      (* number: [0-9.]+ *)
      | [< ' ('0' .. '9' as c); stream >] ->
          let buffer = Buffer.create 1 in
          Buffer.add_char buffer c;
          lex_number buffer stream
    ...
    and lex_number buffer = parser
      | [< ' ('0' .. '9' | '.' as c); stream >] ->
          Buffer.add_char buffer c;
          lex_number buffer stream
      | [< stream=lex >] ->
          [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
 This is all pretty straight-forward code for processing input. When
 reading a numeric value from input, we use the ocaml ``float_of_string``
 function to convert it to a numeric value that we store in
 ``Token.Number``. Note that this isn't doing sufficient error checking:
 it will raise ``Failure`` if the string "1.23.45.67". Feel free to
 extend it :). Next we handle comments:
 .. code-block:: ocaml
      (* Comment until end of line. *)
      | [< ' ('#'); stream >] ->
          lex_comment stream
    ...
    and lex_comment = parser
      | [< ' ('\n'); stream=lex >] -> stream
      | [< 'c; e=lex_comment >] -> e
      | [< >] -> [< >]
 We handle comments by skipping to the end of the line and then return
 the next token. Finally, if the input doesn't match one of the above
 cases, it is either an operator character like '+' or the end of the
 file. These are handled with this code:
 .. code-block:: ocaml
      (* Otherwise, just return the character as its ascii value. *)
      | [< 'c; stream >] ->
          [< 'Token.Kwd c; lex stream >]
      (* end of stream. *)
      | [< >] -> [< >]
 With this, we have the complete lexer for the basic Kaleidoscope
 language (the `full code listing <OCamlLangImpl2.html#code>`_ for the
 Lexer is available in the `next chapter <OCamlLangImpl2.html>`_ of the
 tutorial). Next we'll `build a simple parser that uses this to build an
 Abstract Syntax Tree <OCamlLangImpl2.html>`_. When we have that, we'll
 include a driver so that you can use the lexer and parser together.
 `Next: Implementing a Parser and AST <OCamlLangImpl2.html>`_
--- a/docs/tutorial/OCamlLangImpl2.html
+++ b/docs/tutorial/OCamlLangImpl2.html
--- a/docs/tutorial/OCamlLangImpl2.rst
+++ b/docs/tutorial/OCamlLangImpl2.rst
@ -0,0 +1,899 @@
 ===========================================
 Kaleidoscope: Implementing a Parser and AST
 ===========================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick
 Tryzelaar <mailto:idadesub@users.sourceforge.net>`_
 Chapter 2 Introduction
 ======================
 Welcome to Chapter 2 of the "`Implementing a language with LLVM in
 Objective Caml <index.html>`_" tutorial. This chapter shows you how to
 use the lexer, built in `Chapter 1 <OCamlLangImpl1.html>`_, to build a
 full `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our
 Kaleidoscope language. Once we have a parser, we'll define and build an
 `Abstract Syntax
 Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
 The parser we will build uses a combination of `Recursive Descent
 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
 `Operator-Precedence
 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
 parse the Kaleidoscope language (the latter for binary expressions and
 the former for everything else). Before we get to parsing though, lets
 talk about the output of the parser: the Abstract Syntax Tree.
 The Abstract Syntax Tree (AST)
 ==============================
 The AST for a program captures its behavior in such a way that it is
 easy for later stages of the compiler (e.g. code generation) to
 interpret. We basically want one object for each construct in the
 language, and the AST should closely model the language. In
 Kaleidoscope, we have expressions, a prototype, and a function object.
 We'll start with expressions first:
 .. code-block:: ocaml
    (* expr - Base type for all expression nodes. *)
    type expr =
      (* variant for numeric literals like "1.0". *)
      | Number of float
 The code above shows the definition of the base ExprAST class and one
 subclass which we use for numeric literals. The important thing to note
 about this code is that the Number variant captures the numeric value of
 the literal as an instance variable. This allows later phases of the
 compiler to know what the stored numeric value is.
 Right now we only create the AST, so there are no useful functions on
 them. It would be very easy to add a function to pretty print the code,
 for example. Here are the other expression AST node definitions that
 we'll use in the basic form of the Kaleidoscope language:
 .. code-block:: ocaml
      (* variant for referencing a variable, like "a". *)
      | Variable of string
      (* variant for a binary operator. *)
      | Binary of char * expr * expr
      (* variant for function calls. *)
      | Call of string * expr array
 This is all (intentionally) rather straight-forward: variables capture
 the variable name, binary operators capture their opcode (e.g. '+'), and
 calls capture a function name as well as a list of any argument
 expressions. One thing that is nice about our AST is that it captures
 the language features without talking about the syntax of the language.
 Note that there is no discussion about precedence of binary operators,
 lexical structure, etc.
 For our basic language, these are all of the expression nodes we'll
 define. Because it doesn't have conditional control flow, it isn't
 Turing-complete; we'll fix that in a later installment. The two things
 we need next are a way to talk about the interface to a function, and a
 way to talk about functions themselves:
 .. code-block:: ocaml
    (* proto - This type represents the "prototype" for a function, which captures
     * its name, and its argument names (thus implicitly the number of arguments the
     * function takes). *)
    type proto = Prototype of string * string array
    (* func - This type represents a function definition itself. *)
    type func = Function of proto * expr
 In Kaleidoscope, functions are typed with just a count of their
 arguments. Since all values are double precision floating point, the
 type of each argument doesn't need to be stored anywhere. In a more
 aggressive and realistic language, the "expr" variants would probably
 have a type field.
 With this scaffolding, we can now talk about parsing expressions and
 function bodies in Kaleidoscope.
 Parser Basics
 =============
 Now that we have an AST to build, we need to define the parser code to
 build it. The idea here is that we want to parse something like "x+y"
 (which is returned as three tokens by the lexer) into an AST that could
 be generated with calls like this:
 .. code-block:: ocaml
      let x = Variable "x" in
      let y = Variable "y" in
      let result = Binary ('+', x, y) in
      ...
 The error handling routines make use of the builtin ``Stream.Failure``
 and ``Stream.Error``s. ``Stream.Failure`` is raised when the parser is
 unable to find any matching token in the first position of a pattern.
 ``Stream.Error`` is raised when the first token matches, but the rest do
 not. The error recovery in our parser will not be the best and is not
 particular user-friendly, but it will be enough for our tutorial. These
 exceptions make it easier to handle errors in routines that have various
 return types.
 With these basic types and exceptions, we can implement the first piece
 of our grammar: numeric literals.
 Basic Expression Parsing
 ========================
 We start with numeric literals, because they are the simplest to
 process. For each production in our grammar, we'll define a function
 which parses that production. We call this class of expressions
 "primary" expressions, for reasons that will become more clear `later in
 the tutorial <OCamlLangImpl6.html#unary>`_. In order to parse an
 arbitrary primary expression, we need to determine what sort of
 expression it is. For numeric literals, we have:
 .. code-block:: ocaml
    (* primary
     *   ::= identifier
     *   ::= numberexpr
     *   ::= parenexpr *)
    parse_primary = parser
      (* numberexpr ::= number *)
      | [< 'Token.Number n >] -> Ast.Number n
 This routine is very simple: it expects to be called when the current
 token is a ``Token.Number`` token. It takes the current number value,
 creates a ``Ast.Number`` node, advances the lexer to the next token, and
 finally returns.
 There are some interesting aspects to this. The most important one is
 that this routine eats all of the tokens that correspond to the
 production and returns the lexer buffer with the next token (which is
 not part of the grammar production) ready to go. This is a fairly
 standard way to go for recursive descent parsers. For a better example,
 the parenthesis operator is defined like this:
 .. code-block:: ocaml
      (* parenexpr ::= '(' expression ')' *)
      | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
 This function illustrates a number of interesting things about the
 parser:
 1) It shows how we use the ``Stream.Error`` exception. When called, this
 function expects that the current token is a '(' token, but after
 parsing the subexpression, it is possible that there is no ')' waiting.
 For example, if the user types in "(4 x" instead of "(4)", the parser
 should emit an error. Because errors can occur, the parser needs a way
 to indicate that they happened. In our parser, we use the camlp4
 shortcut syntax ``token ?? "parse error"``, where if the token before
 the ``??`` does not match, then ``Stream.Error "parse error"`` will be
 raised.
 2) Another interesting aspect of this function is that it uses recursion
 by calling ``Parser.parse_primary`` (we will soon see that
 ``Parser.parse_primary`` can call ``Parser.parse_primary``). This is
 powerful because it allows us to handle recursive grammars, and keeps
 each production very simple. Note that parentheses do not cause
 construction of AST nodes themselves. While we could do it this way, the
 most important role of parentheses are to guide the parser and provide
 grouping. Once the parser constructs the AST, parentheses are not
 needed.
 The next simple production is for handling variable references and
 function calls:
 .. code-block:: ocaml
      (* identifierexpr
       *   ::= identifier
       *   ::= identifier '(' argumentexpr ')' *)
      | [< 'Token.Ident id; stream >] ->
          let rec parse_args accumulator = parser
            | [< e=parse_expr; stream >] ->
                begin parser
                  | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
                  | [< >] -> e :: accumulator
                end stream
            | [< >] -> accumulator
          in
          let rec parse_ident id = parser
            (* Call. *)
            | [< 'Token.Kwd '(';
                 args=parse_args [];
                 'Token.Kwd ')' ?? "expected ')'">] ->
                Ast.Call (id, Array.of_list (List.rev args))
            (* Simple variable ref. *)
            | [< >] -> Ast.Variable id
          in
          parse_ident id stream
 This routine follows the same style as the other routines. (It expects
 to be called if the current token is a ``Token.Ident`` token). It also
 has recursion and error handling. One interesting aspect of this is that
 it uses *look-ahead* to determine if the current identifier is a stand
 alone variable reference or if it is a function call expression. It
 handles this by checking to see if the token after the identifier is a
 '(' token, constructing either a ``Ast.Variable`` or ``Ast.Call`` node
 as appropriate.
 We finish up by raising an exception if we received a token we didn't
 expect:
 .. code-block:: ocaml
      | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
 Now that basic expressions are handled, we need to handle binary
 expressions. They are a bit more complex.
 Binary Expression Parsing
 =========================
 Binary expressions are significantly harder to parse because they are
 often ambiguous. For example, when given the string "x+y\*z", the parser
 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
 definitions from mathematics, we expect the later parse, because "\*"
 (multiplication) has higher *precedence* than "+" (addition).
 There are many ways to handle this, but an elegant and efficient way is
 to use `Operator-Precedence
 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
 This parsing technique uses the precedence of binary operators to guide
 recursion. To start with, we need a table of precedences:
 .. code-block:: ocaml
    (* binop_precedence - This holds the precedence for each binary operator that is
     * defined *)
    let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
    (* precedence - Get the precedence of the pending binary operator token. *)
    let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
    ...
    let main () =
      (* Install standard binary operators.
       * 1 is the lowest precedence. *)
      Hashtbl.add Parser.binop_precedence '<' 10;
      Hashtbl.add Parser.binop_precedence '+' 20;
      Hashtbl.add Parser.binop_precedence '-' 20;
      Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
      ...
 For the basic form of Kaleidoscope, we will only support 4 binary
 operators (this can obviously be extended by you, our brave and intrepid
 reader). The ``Parser.precedence`` function returns the precedence for
 the current token, or -1 if the token is not a binary operator. Having a
 ``Hashtbl.t`` makes it easy to add new operators and makes it clear that
 the algorithm doesn't depend on the specific operators involved, but it
 would be easy enough to eliminate the ``Hashtbl.t`` and do the
 comparisons in the ``Parser.precedence`` function. (Or just use a
 fixed-size array).
 With the helper above defined, we can now start parsing binary
 expressions. The basic idea of operator precedence parsing is to break
 down an expression with potentially ambiguous binary operators into
 pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g".
 Operator precedence parsing considers this as a stream of primary
 expressions separated by binary operators. As such, it will first parse
 the leading primary expression "a", then it will see the pairs [+, b]
 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
 primary expressions, the binary expression parser doesn't need to worry
 about nested subexpressions like (c+d) at all.
 To start, an expression is a primary expression potentially followed by
 a sequence of [binop,primaryexpr] pairs:
 .. code-block:: ocaml
    (* expression
     *   ::= primary binoprhs *)
    and parse_expr = parser
      | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
 ``Parser.parse_bin_rhs`` is the function that parses the sequence of
 pairs for us. It takes a precedence and a pointer to an expression for
 the part that has been parsed so far. Note that "x" is a perfectly valid
 expression: As such, "binoprhs" is allowed to be empty, in which case it
 returns the expression that is passed into it. In our example above, the
 code passes the expression for "a" into ``Parser.parse_bin_rhs`` and the
 current token is "+".
 The precedence value passed into ``Parser.parse_bin_rhs`` indicates the
 *minimal operator precedence* that the function is allowed to eat. For
 example, if the current pair stream is [+, x] and
 ``Parser.parse_bin_rhs`` is passed in a precedence of 40, it will not
 consume any tokens (because the precedence of '+' is only 20). With this
 in mind, ``Parser.parse_bin_rhs`` starts with:
 .. code-block:: ocaml
    (* binoprhs
     *   ::= ('+' primary)* *)
    and parse_bin_rhs expr_prec lhs stream =
      match Stream.peek stream with
      (* If this is a binop, find its precedence. *)
      | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
          let token_prec = precedence c in
          (* If this is a binop that binds at least as tightly as the current binop,
           * consume it, otherwise we are done. *)
          if token_prec < expr_prec then lhs else begin
 This code gets the precedence of the current token and checks to see if
 if is too low. Because we defined invalid tokens to have a precedence of
 -1, this check implicitly knows that the pair-stream ends when the token
 stream runs out of binary operators. If this check succeeds, we know
 that the token is a binary operator and that it will be included in this
 expression:
 .. code-block:: ocaml
            (* Eat the binop. *)
            Stream.junk stream;
            (* Okay, we know this is a binop. *)
            let rhs =
              match Stream.peek stream with
              | Some (Token.Kwd c2) ->
 As such, this code eats (and remembers) the binary operator and then
 parses the primary expression that follows. This builds up the whole
 pair, the first of which is [+, b] for the running example.
 Now that we parsed the left-hand side of an expression and one pair of
 the RHS sequence, we have to decide which way the expression associates.
 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
 unparsed)". To determine this, we look ahead at "binop" to determine its
 precedence and compare it to BinOp's precedence (which is '+' in this
 case):
 .. code-block:: ocaml
                  (* If BinOp binds less tightly with rhs than the operator after
                   * rhs, let the pending operator take rhs as its lhs. *)
                  let next_prec = precedence c2 in
                  if token_prec < next_prec
 If the precedence of the binop to the right of "RHS" is lower or equal
 to the precedence of our current operator, then we know that the
 parentheses associate as "(a+b) binop ...". In our example, the current
 operator is "+" and the next operator is "+", we know that they have the
 same precedence. In this case we'll create the AST node for "a+b", and
 then continue parsing:
 .. code-block:: ocaml
              ... if body omitted ...
            in
            (* Merge lhs/rhs. *)
            let lhs = Ast.Binary (c, lhs, rhs) in
            parse_bin_rhs expr_prec lhs stream
          end
 In our example above, this will turn "a+b+" into "(a+b)" and execute the
 next iteration of the loop, with "+" as the current token. The code
 above will eat, remember, and parse "(c+d)" as the primary expression,
 which makes the current pair equal to [+, (c+d)]. It will then evaluate
 the 'if' conditional above with "\*" as the binop to the right of the
 primary. In this case, the precedence of "\*" is higher than the
 precedence of "+" so the if condition will be entered.
 The critical question left here is "how can the if condition parse the
 right hand side in full"? In particular, to build the AST correctly for
 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
 variable. The code to do this is surprisingly simple (code from the
 above two blocks duplicated for context):
 .. code-block:: ocaml
              match Stream.peek stream with
              | Some (Token.Kwd c2) ->
                  (* If BinOp binds less tightly with rhs than the operator after
                   * rhs, let the pending operator take rhs as its lhs. *)
                  if token_prec < precedence c2
                  then parse_bin_rhs (token_prec + 1) rhs stream
                  else rhs
              | _ -> rhs
            in
            (* Merge lhs/rhs. *)
            let lhs = Ast.Binary (c, lhs, rhs) in
            parse_bin_rhs expr_prec lhs stream
          end
 At this point, we know that the binary operator to the RHS of our
 primary has higher precedence than the binop we are currently parsing.
 As such, we know that any sequence of pairs whose operators are all
 higher precedence than "+" should be parsed together and returned as
 "RHS". To do this, we recursively invoke the ``Parser.parse_bin_rhs``
 function specifying "token\_prec+1" as the minimum precedence required
 for it to continue. In our example above, this will cause it to return
 the AST node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of
 the '+' expression.
 Finally, on the next iteration of the while loop, the "+g" piece is
 parsed and added to the AST. With this little bit of code (14
 non-trivial lines), we correctly handle fully general binary expression
 parsing in a very elegant way. This was a whirlwind tour of this code,
 and it is somewhat subtle. I recommend running through it with a few
 tough examples to see how it works.
 This wraps up handling of expressions. At this point, we can point the
 parser at an arbitrary token stream and build an expression from it,
 stopping at the first token that is not part of the expression. Next up
 we need to handle function definitions, etc.
 Parsing the Rest
 ================
 The next thing missing is handling of function prototypes. In
 Kaleidoscope, these are used both for 'extern' function declarations as
 well as function body definitions. The code to do this is
 straight-forward and not very interesting (once you've survived
 expressions):
 .. code-block:: ocaml
    (* prototype
     *   ::= id '(' id* ')' *)
    let parse_prototype =
      let rec parse_args accumulator = parser
        | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
        | [< >] -> accumulator
      in
      parser
      | [< 'Token.Ident id;
           'Token.Kwd '(' ?? "expected '(' in prototype";
           args=parse_args [];
           'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
          (* success. *)
          Ast.Prototype (id, Array.of_list (List.rev args))
      | [< >] ->
          raise (Stream.Error "expected function name in prototype")
 Given this, a function definition is very simple, just a prototype plus
 an expression to implement the body:
 .. code-block:: ocaml
    (* definition ::= 'def' prototype expression *)
    let parse_definition = parser
      | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
          Ast.Function (p, e)
 In addition, we support 'extern' to declare functions like 'sin' and
 'cos' as well as to support forward declaration of user functions. These
 'extern's are just prototypes with no body:
 .. code-block:: ocaml
    (*  external ::= 'extern' prototype *)
    let parse_extern = parser
      | [< 'Token.Extern; e=parse_prototype >] -> e
 Finally, we'll also let the user type in arbitrary top-level expressions
 and evaluate them on the fly. We will handle this by defining anonymous
 nullary (zero argument) functions for them:
 .. code-block:: ocaml
    (* toplevelexpr ::= expression *)
    let parse_toplevel = parser
      | [< e=parse_expr >] ->
          (* Make an anonymous proto. *)
          Ast.Function (Ast.Prototype ("", [||]), e)
 Now that we have all the pieces, let's build a little driver that will
 let us actually *execute* this code we've built!
 The Driver
 ==========
 The driver for this simply invokes all of the parsing pieces with a
 top-level dispatch loop. There isn't much interesting here, so I'll just
 include the top-level loop. See `below <#code>`_ for full code in the
 "Top-Level Parsing" section.
 .. code-block:: ocaml
    (* top ::= definition | external | expression | ';' *)
    let rec main_loop stream =
      match Stream.peek stream with
      | None -> ()
      (* ignore top-level semicolons. *)
      | Some (Token.Kwd ';') ->
          Stream.junk stream;
          main_loop stream
      | Some token ->
          begin
            try match token with
            | Token.Def ->
                ignore(Parser.parse_definition stream);
                print_endline "parsed a function definition.";
            | Token.Extern ->
                ignore(Parser.parse_extern stream);
                print_endline "parsed an extern.";
            | _ ->
                (* Evaluate a top-level expression into an anonymous function. *)
                ignore(Parser.parse_toplevel stream);
                print_endline "parsed a top-level expr";
            with Stream.Error s ->
              (* Skip token for error recovery. *)
              Stream.junk stream;
              print_endline s;
          end;
          print_string "ready> "; flush stdout;
          main_loop stream
 The most interesting part of this is that we ignore top-level
 semicolons. Why is this, you ask? The basic reason is that if you type
 "4 + 5" at the command line, the parser doesn't know whether that is the
 end of what you will type or not. For example, on the next line you
 could type "def foo..." in which case 4+5 is the end of a top-level
 expression. Alternatively you could type "\* 6", which would continue
 the expression. Having top-level semicolons allows you to type "4+5;",
 and the parser will know you are done.
 Conclusions
 ===========
 With just under 300 lines of commented code (240 lines of non-comment,
 non-blank code), we fully defined our minimal language, including a
 lexer, parser, and AST builder. With this done, the executable will
 validate Kaleidoscope code and tell us if it is grammatically invalid.
 For example, here is a sample interaction:
 .. code-block:: bash
    $ ./toy.byte
    ready> def foo(x y) x+foo(y, 4.0);
    Parsed a function definition.
    ready> def foo(x y) x+y y;
    Parsed a function definition.
    Parsed a top-level expr
    ready> def foo(x y) x+y );
    Parsed a function definition.
    Error: unknown token when expecting an expression
    ready> extern sin(a);
    ready> Parsed an extern
    ready> ^D
    $
 There is a lot of room for extension here. You can define new AST nodes,
 extend the language in many ways, etc. In the `next
 installment <OCamlLangImpl3.html>`_, we will describe how to generate
 LLVM Intermediate Representation (IR) from the AST.
 Full Code Listing
 =================
 Here is the complete code listing for this and the previous chapter.
 Note that it is fully self-contained: you don't need LLVM or any
 external libraries at all for this. (Besides the ocaml standard
 libraries, of course.) To build this, just compile with:
 .. code-block:: bash
    # Compile
    ocamlbuild toy.byte
    # Run
    ./toy.byte
 Here is the code:
 \_tags:
    ::
        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
 token.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer Tokens
         *===----------------------------------------------------------------------===*)
        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
         * these others for known things. *)
        type token =
          (* commands *)
          | Def | Extern
          (* primary *)
          | Ident of string | Number of float
          (* unknown *)
          | Kwd of char
 lexer.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer
         *===----------------------------------------------------------------------===*)
        let rec lex = parser
          (* Skip any whitespace. *)
          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_ident buffer stream
          (* number: [0-9.]+ *)
          | [< ' ('0' .. '9' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_number buffer stream
          (* Comment until end of line. *)
          | [< ' ('#'); stream >] ->
              lex_comment stream
          (* Otherwise, just return the character as its ascii value. *)
          | [< 'c; stream >] ->
              [< 'Token.Kwd c; lex stream >]
          (* end of stream. *)
          | [< >] -> [< >]
        and lex_number buffer = parser
          | [< ' ('0' .. '9' | '.' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_number buffer stream
          | [< stream=lex >] ->
              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
        and lex_ident buffer = parser
          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_ident buffer stream
          | [< stream=lex >] ->
              match Buffer.contents buffer with
              | "def" -> [< 'Token.Def; stream >]
              | "extern" -> [< 'Token.Extern; stream >]
              | id -> [< 'Token.Ident id; stream >]
        and lex_comment = parser
          | [< ' ('\n'); stream=lex >] -> stream
          | [< 'c; e=lex_comment >] -> e
          | [< >] -> [< >]
 ast.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Abstract Syntax Tree (aka Parse Tree)
         *===----------------------------------------------------------------------===*)
        (* expr - Base type for all expression nodes. *)
        type expr =
          (* variant for numeric literals like "1.0". *)
          | Number of float
          (* variant for referencing a variable, like "a". *)
          | Variable of string
          (* variant for a binary operator. *)
          | Binary of char * expr * expr
          (* variant for function calls. *)
          | Call of string * expr array
        (* proto - This type represents the "prototype" for a function, which captures
         * its name, and its argument names (thus implicitly the number of arguments the
         * function takes). *)
        type proto = Prototype of string * string array
        (* func - This type represents a function definition itself. *)
        type func = Function of proto * expr
 parser.ml:
    .. code-block:: ocaml
        (*===---------------------------------------------------------------------===
         * Parser
         *===---------------------------------------------------------------------===*)
        (* binop_precedence - This holds the precedence for each binary operator that is
         * defined *)
        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
        (* precedence - Get the precedence of the pending binary operator token. *)
        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
        (* primary
         *   ::= identifier
         *   ::= numberexpr
         *   ::= parenexpr *)
        let rec parse_primary = parser
          (* numberexpr ::= number *)
          | [< 'Token.Number n >] -> Ast.Number n
          (* parenexpr ::= '(' expression ')' *)
          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
          (* identifierexpr
           *   ::= identifier
           *   ::= identifier '(' argumentexpr ')' *)
          | [< 'Token.Ident id; stream >] ->
              let rec parse_args accumulator = parser
                | [< e=parse_expr; stream >] ->
                    begin parser
                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
                      | [< >] -> e :: accumulator
                    end stream
                | [< >] -> accumulator
              in
              let rec parse_ident id = parser
                (* Call. *)
                | [< 'Token.Kwd '(';
                     args=parse_args [];
                     'Token.Kwd ')' ?? "expected ')'">] ->
                    Ast.Call (id, Array.of_list (List.rev args))
                (* Simple variable ref. *)
                | [< >] -> Ast.Variable id
              in
              parse_ident id stream
          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
        (* binoprhs
         *   ::= ('+' primary)* *)
        and parse_bin_rhs expr_prec lhs stream =
          match Stream.peek stream with
          (* If this is a binop, find its precedence. *)
          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
              let token_prec = precedence c in
              (* If this is a binop that binds at least as tightly as the current binop,
               * consume it, otherwise we are done. *)
              if token_prec < expr_prec then lhs else begin
                (* Eat the binop. *)
                Stream.junk stream;
                (* Parse the primary expression after the binary operator. *)
                let rhs = parse_primary stream in
                (* Okay, we know this is a binop. *)
                let rhs =
                  match Stream.peek stream with
                  | Some (Token.Kwd c2) ->
                      (* If BinOp binds less tightly with rhs than the operator after
                       * rhs, let the pending operator take rhs as its lhs. *)
                      let next_prec = precedence c2 in
                      if token_prec < next_prec
                      then parse_bin_rhs (token_prec + 1) rhs stream
                      else rhs
                  | _ -> rhs
                in
                (* Merge lhs/rhs. *)
                let lhs = Ast.Binary (c, lhs, rhs) in
                parse_bin_rhs expr_prec lhs stream
              end
          | _ -> lhs
        (* expression
         *   ::= primary binoprhs *)
        and parse_expr = parser
          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
        (* prototype
         *   ::= id '(' id* ')' *)
        let parse_prototype =
          let rec parse_args accumulator = parser
            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
            | [< >] -> accumulator
          in
          parser
          | [< 'Token.Ident id;
               'Token.Kwd '(' ?? "expected '(' in prototype";
               args=parse_args [];
               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
              (* success. *)
              Ast.Prototype (id, Array.of_list (List.rev args))
          | [< >] ->
              raise (Stream.Error "expected function name in prototype")
        (* definition ::= 'def' prototype expression *)
        let parse_definition = parser
          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
              Ast.Function (p, e)
        (* toplevelexpr ::= expression *)
        let parse_toplevel = parser
          | [< e=parse_expr >] ->
              (* Make an anonymous proto. *)
              Ast.Function (Ast.Prototype ("", [||]), e)
        (*  external ::= 'extern' prototype *)
        let parse_extern = parser
          | [< 'Token.Extern; e=parse_prototype >] -> e
 toplevel.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Top-Level parsing and JIT Driver
         *===----------------------------------------------------------------------===*)
        (* top ::= definition | external | expression | ';' *)
        let rec main_loop stream =
          match Stream.peek stream with
          | None -> ()
          (* ignore top-level semicolons. *)
          | Some (Token.Kwd ';') ->
              Stream.junk stream;
              main_loop stream
          | Some token ->
              begin
                try match token with
                | Token.Def ->
                    ignore(Parser.parse_definition stream);
                    print_endline "parsed a function definition.";
                | Token.Extern ->
                    ignore(Parser.parse_extern stream);
                    print_endline "parsed an extern.";
                | _ ->
                    (* Evaluate a top-level expression into an anonymous function. *)
                    ignore(Parser.parse_toplevel stream);
                    print_endline "parsed a top-level expr";
                with Stream.Error s ->
                  (* Skip token for error recovery. *)
                  Stream.junk stream;
                  print_endline s;
              end;
              print_string "ready> "; flush stdout;
              main_loop stream
 toy.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Main driver code.
         *===----------------------------------------------------------------------===*)
        let main () =
          (* Install standard binary operators.
           * 1 is the lowest precedence. *)
          Hashtbl.add Parser.binop_precedence '<' 10;
          Hashtbl.add Parser.binop_precedence '+' 20;
          Hashtbl.add Parser.binop_precedence '-' 20;
          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
          (* Prime the first token. *)
          print_string "ready> "; flush stdout;
          let stream = Lexer.lex (Stream.of_channel stdin) in
          (* Run the main "interpreter loop" now. *)
          Toplevel.main_loop stream;
        ;;
        main ()
 `Next: Implementing Code Generation to LLVM IR <OCamlLangImpl3.html>`_
--- a/docs/tutorial/OCamlLangImpl3.html
+++ b/docs/tutorial/OCamlLangImpl3.html
--- a/docs/tutorial/OCamlLangImpl3.rst
+++ b/docs/tutorial/OCamlLangImpl3.rst
@ -0,0 +1,964 @@
 ========================================
 Kaleidoscope: Code generation to LLVM IR
 ========================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick
 Tryzelaar <mailto:idadesub@users.sourceforge.net>`_
 Chapter 3 Introduction
 ======================
 Welcome to Chapter 3 of the "`Implementing a language with
 LLVM <index.html>`_" tutorial. This chapter shows you how to transform
 the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2,
 into LLVM IR. This will teach you a little bit about how LLVM does
 things, as well as demonstrate how easy it is to use. It's much more
 work to build a lexer and parser than it is to generate LLVM IR code. :)
 **Please note**: the code in this chapter and later require LLVM 2.3 or
 LLVM SVN to work. LLVM 2.2 and before will not work with it.
 Code Generation Setup
 =====================
 In order to generate LLVM IR, we want some simple setup to get started.
 First we define virtual code generation (codegen) methods in each AST
 class:
 .. code-block:: ocaml
    let rec codegen_expr = function
      | Ast.Number n -> ...
      | Ast.Variable name -> ...
 The ``Codegen.codegen_expr`` function says to emit IR for that AST node
 along with all the things it depends on, and they all return an LLVM
 Value object. "Value" is the class used to represent a "`Static Single
 Assignment
 (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
 register" or "SSA value" in LLVM. The most distinct aspect of SSA values
 is that their value is computed as the related instruction executes, and
 it does not get a new value until (and if) the instruction re-executes.
 In other words, there is no way to "change" an SSA value. For more
 information, please read up on `Static Single
 Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
 - the concepts are really quite natural once you grok them.
 The second thing we want is an "Error" exception like we used for the
 parser, which will be used to report errors found during code generation
 (for example, use of an undeclared parameter):
 .. code-block:: ocaml
    exception Error of string
    let context = global_context ()
    let the_module = create_module context "my cool jit"
    let builder = builder context
    let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
    let double_type = double_type context
 The static variables will be used during code generation.
 ``Codgen.the_module`` is the LLVM construct that contains all of the
 functions and global variables in a chunk of code. In many ways, it is
 the top-level structure that the LLVM IR uses to contain code.
 The ``Codegen.builder`` object is a helper object that makes it easy to
 generate LLVM instructions. Instances of the
 ```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
 class keep track of the current place to insert instructions and has
 methods to create new instructions.
 The ``Codegen.named_values`` map keeps track of which values are defined
 in the current scope and what their LLVM representation is. (In other
 words, it is a symbol table for the code). In this form of Kaleidoscope,
 the only things that can be referenced are function parameters. As such,
 function parameters will be in this map when generating code for their
 function body.
 With these basics in place, we can start talking about how to generate
 code for each expression. Note that this assumes that the
 ``Codgen.builder`` has been set up to generate code *into* something.
 For now, we'll assume that this has already been done, and we'll just
 use it to emit code.
 Expression Code Generation
 ==========================
 Generating LLVM code for expression nodes is very straightforward: less
 than 30 lines of commented code for all four of our expression nodes.
 First we'll do numeric literals:
 .. code-block:: ocaml
      | Ast.Number n -> const_float double_type n
 In the LLVM IR, numeric constants are represented with the
 ``ConstantFP`` class, which holds the numeric value in an ``APFloat``
 internally (``APFloat`` has the capability of holding floating point
 constants of Arbitrary Precision). This code basically just creates
 and returns a ``ConstantFP``. Note that in the LLVM IR that constants
 are all uniqued together and shared. For this reason, the API uses "the
 foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
 .. code-block:: ocaml
      | Ast.Variable name ->
          (try Hashtbl.find named_values name with
            | Not_found -> raise (Error "unknown variable name"))
 References to variables are also quite simple using LLVM. In the simple
 version of Kaleidoscope, we assume that the variable has already been
 emitted somewhere and its value is available. In practice, the only
 values that can be in the ``Codegen.named_values`` map are function
 arguments. This code simply checks to see that the specified name is in
 the map (if not, an unknown variable is being referenced) and returns
 the value for it. In future chapters, we'll add support for `loop
 induction variables <LangImpl5.html#for>`_ in the symbol table, and for
 `local variables <LangImpl7.html#localvars>`_.
 .. code-block:: ocaml
      | Ast.Binary (op, lhs, rhs) ->
          let lhs_val = codegen_expr lhs in
          let rhs_val = codegen_expr rhs in
          begin
            match op with
            | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
            | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
            | '*' -> build_fmul lhs_val rhs_val "multmp" builder
            | '<' ->
                (* Convert bool 0/1 to double 0.0 or 1.0 *)
                let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
                build_uitofp i double_type "booltmp" builder
            | _ -> raise (Error "invalid binary operator")
          end
 Binary operators start to get more interesting. The basic idea here is
 that we recursively emit code for the left-hand side of the expression,
 then the right-hand side, then we compute the result of the binary
 expression. In this code, we do a simple switch on the opcode to create
 the right LLVM instruction.
 In the example above, the LLVM builder class is starting to show its
 value. IRBuilder knows where to insert the newly created instruction,
 all you have to do is specify what instruction to create (e.g. with
 ``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here)
 and optionally provide a name for the generated instruction.
 One nice thing about LLVM is that the name is just a hint. For instance,
 if the code above emits multiple "addtmp" variables, LLVM will
 automatically provide each one with an increasing, unique numeric
 suffix. Local value names for instructions are purely optional, but it
 makes it much easier to read the IR dumps.
 `LLVM instructions <../LangRef.html#instref>`_ are constrained by strict
 rules: for example, the Left and Right operators of an `add
 instruction <../LangRef.html#i_add>`_ must have the same type, and the
 result type of the add must match the operand types. Because all values
 in Kaleidoscope are doubles, this makes for very simple code for add,
 sub and mul.
 On the other hand, LLVM specifies that the `fcmp
 instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a
 one bit integer). The problem with this is that Kaleidoscope wants the
 value to be a 0.0 or 1.0 value. In order to get these semantics, we
 combine the fcmp instruction with a `uitofp
 instruction <../LangRef.html#i_uitofp>`_. This instruction converts its
 input integer into a floating point value by treating the input as an
 unsigned value. In contrast, if we used the `sitofp
 instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator
 would return 0.0 and -1.0, depending on the input value.
 .. code-block:: ocaml
      | Ast.Call (callee, args) ->
          (* Look up the name in the module table. *)
          let callee =
            match lookup_function callee the_module with
            | Some callee -> callee
            | None -> raise (Error "unknown function referenced")
          in
          let params = params callee in
          (* If argument mismatch error. *)
          if Array.length params == Array.length args then () else
            raise (Error "incorrect # arguments passed");
          let args = Array.map codegen_expr args in
          build_call callee args "calltmp" builder
 Code generation for function calls is quite straightforward with LLVM.
 The code above initially does a function name lookup in the LLVM
 Module's symbol table. Recall that the LLVM Module is the container that
 holds all of the functions we are JIT'ing. By giving each function the
 same name as what the user specifies, we can use the LLVM symbol table
 to resolve function names for us.
 Once we have the function to call, we recursively codegen each argument
 that is to be passed in, and create an LLVM `call
 instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
 calling conventions by default, allowing these calls to also call into
 standard library functions like "sin" and "cos", with no additional
 effort.
 This wraps up our handling of the four basic expressions that we have so
 far in Kaleidoscope. Feel free to go in and add some more. For example,
 by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
 several other interesting instructions that are really easy to plug into
 our basic framework.
 Function Code Generation
 ========================
 Code generation for prototypes and functions must handle a number of
 details, which make their code less beautiful than expression code
 generation, but allows us to illustrate some important points. First,
 lets talk about code generation for prototypes: they are used both for
 function bodies and external function declarations. The code starts
 with:
 .. code-block:: ocaml
    let codegen_proto = function
      | Ast.Prototype (name, args) ->
          (* Make the function type: double(double,double) etc. *)
          let doubles = Array.make (Array.length args) double_type in
          let ft = function_type double_type doubles in
          let f =
            match lookup_function name the_module with
 This code packs a lot of power into a few lines. Note first that this
 function returns a "Function\*" instead of a "Value\*" (although at the
 moment they both are modeled by ``llvalue`` in ocaml). Because a
 "prototype" really talks about the external interface for a function
 (not the value computed by an expression), it makes sense for it to
 return the LLVM Function it corresponds to when codegen'd.
 The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that
 should be used for a given Prototype. Since all function arguments in
 Kaleidoscope are of type double, the first line creates a vector of "N"
 LLVM double types. It then uses the ``Llvm.function_type`` method to
 create a function type that takes "N" doubles as arguments, returns one
 double as a result, and that is not vararg (that uses the function
 ``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
 just like ``Constant``'s are, so you don't "new" a type, you "get" it.
 The final line above checks if the function has already been defined in
 ``Codegen.the_module``. If not, we will create it.
 .. code-block:: ocaml
            | None -> declare_function name ft the_module
 This indicates the type and name to use, as well as which module to
 insert into. By default we assume a function has
 ``Llvm.Linkage.ExternalLinkage``. "`external
 linkage <LangRef.html#linkage>`_" means that the function may be defined
 outside the current module and/or that it is callable by functions
 outside the module. The "``name``" passed in is the name the user
 specified: this name is registered in "``Codegen.the_module``"s symbol
 table, which is used by the function call code above.
 In Kaleidoscope, I choose to allow redefinitions of functions in two
 cases: first, we want to allow 'extern'ing a function more than once, as
 long as the prototypes for the externs match (since all arguments have
 the same type, we just have to check that the number of arguments
 match). Second, we want to allow 'extern'ing a function and then
 defining a body for it. This is useful when defining mutually recursive
 functions.
 .. code-block:: ocaml
            (* If 'f' conflicted, there was already something named 'name'. If it
             * has a body, don't allow redefinition or reextern. *)
            | Some f ->
                (* If 'f' already has a body, reject this. *)
                if Array.length (basic_blocks f) == 0 then () else
                  raise (Error "redefinition of function");
                (* If 'f' took a different number of arguments, reject. *)
                if Array.length (params f) == Array.length args then () else
                  raise (Error "redefinition of function with different # args");
                f
          in
 In order to verify the logic above, we first check to see if the
 pre-existing function is "empty". In this case, empty means that it has
 no basic blocks in it, which means it has no body. If it has no body, it
 is a forward declaration. Since we don't allow anything after a full
 definition of the function, the code rejects this case. If the previous
 reference to a function was an 'extern', we simply verify that the
 number of arguments for that definition and this one match up. If not,
 we emit an error.
 .. code-block:: ocaml
          (* Set names for all arguments. *)
          Array.iteri (fun i a ->
            let n = args.(i) in
            set_value_name n a;
            Hashtbl.add named_values n a;
          ) (params f);
          f
 The last bit of code for prototypes loops over all of the arguments in
 the function, setting the name of the LLVM Argument objects to match,
 and registering the arguments in the ``Codegen.named_values`` map for
 future use by the ``Ast.Variable`` variant. Once this is set up, it
 returns the Function object to the caller. Note that we don't check for
 conflicting argument names here (e.g. "extern foo(a b a)"). Doing so
 would be very straight-forward with the mechanics we have already used
 above.
 .. code-block:: ocaml
    let codegen_func = function
      | Ast.Function (proto, body) ->
          Hashtbl.clear named_values;
          let the_function = codegen_proto proto in
 Code generation for function definitions starts out simply enough: we
 just codegen the prototype (Proto) and verify that it is ok. We then
 clear out the ``Codegen.named_values`` map to make sure that there isn't
 anything in it from the last function we compiled. Code generation of
 the prototype ensures that there is an LLVM Function object that is
 ready to go for us.
 .. code-block:: ocaml
          (* Create a new basic block to start insertion into. *)
          let bb = append_block context "entry" the_function in
          position_at_end bb builder;
          try
            let ret_val = codegen_expr body in
 Now we get to the point where the ``Codegen.builder`` is set up. The
 first line creates a new `basic
 block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
 which is inserted into ``the_function``. The second line then tells the
 builder that new instructions should be inserted into the end of the new
 basic block. Basic blocks in LLVM are an important part of functions
 that define the `Control Flow
 Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
 don't have any control flow, our functions will only contain one block
 at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
 .. code-block:: ocaml
            let ret_val = codegen_expr body in
            (* Finish off the function. *)
            let _ = build_ret ret_val builder in
            (* Validate the generated code, checking for consistency. *)
            Llvm_analysis.assert_valid_function the_function;
            the_function
 Once the insertion point is set up, we call the ``Codegen.codegen_func``
 method for the root expression of the function. If no error happens,
 this emits code to compute the expression into the entry block and
 returns the value that was computed. Assuming no error, we then create
 an LLVM `ret instruction <../LangRef.html#i_ret>`_, which completes the
 function. Once the function is built, we call
 ``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This
 function does a variety of consistency checks on the generated code, to
 determine if our compiler is doing everything right. Using this is
 important: it can catch a lot of bugs. Once the function is finished and
 validated, we return it.
 .. code-block:: ocaml
          with e ->
            delete_function the_function;
            raise e
 The only piece left here is handling of the error case. For simplicity,
 we handle this by merely deleting the function we produced with the
 ``Llvm.delete_function`` method. This allows the user to redefine a
 function that they incorrectly typed in before: if we didn't delete it,
 it would live in the symbol table, with a body, preventing future
 redefinition.
 This code does have a bug, though. Since the ``Codegen.codegen_proto``
 can return a previously defined forward declaration, our code can
 actually delete a forward declaration. There are a number of ways to fix
 this bug, see what you can come up with! Here is a testcase:
 ::
    extern foo(a b);     # ok, defines foo.
    def foo(a b) c;      # error, 'c' is invalid.
    def bar() foo(1, 2); # error, unknown function "foo"
 Driver Changes and Closing Thoughts
 ===================================
 For now, code generation to LLVM doesn't really get us much, except that
 we can look at the pretty IR calls. The sample code inserts calls to
 Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM
 IR. This gives a nice way to look at the LLVM IR for simple functions.
 For example:
 ::
    ready> 4+5;
    Read top-level expression:
    define double @""() {
    entry:
            %addtmp = fadd double 4.000000e+00, 5.000000e+00
            ret double %addtmp
    }
 Note how the parser turns the top-level expression into anonymous
 functions for us. This will be handy when we add `JIT
 support <OCamlLangImpl4.html#jit>`_ in the next chapter. Also note that
 the code is very literally transcribed, no optimizations are being
 performed. We will `add
 optimizations <OCamlLangImpl4.html#trivialconstfold>`_ explicitly in the
 next chapter.
 ::
    ready> def foo(a b) a*a + 2*a*b + b*b;
    Read function definition:
    define double @foo(double %a, double %b) {
    entry:
            %multmp = fmul double %a, %a
            %multmp1 = fmul double 2.000000e+00, %a
            %multmp2 = fmul double %multmp1, %b
            %addtmp = fadd double %multmp, %multmp2
            %multmp3 = fmul double %b, %b
            %addtmp4 = fadd double %addtmp, %multmp3
            ret double %addtmp4
    }
 This shows some simple arithmetic. Notice the striking similarity to the
 LLVM builder calls that we use to create the instructions.
 ::
    ready> def bar(a) foo(a, 4.0) + bar(31337);
    Read function definition:
    define double @bar(double %a) {
    entry:
            %calltmp = call double @foo(double %a, double 4.000000e+00)
            %calltmp1 = call double @bar(double 3.133700e+04)
            %addtmp = fadd double %calltmp, %calltmp1
            ret double %addtmp
    }
 This shows some function calls. Note that this function will take a long
 time to execute if you call it. In the future we'll add conditional
 control flow to actually make recursion useful :).
 ::
    ready> extern cos(x);
    Read extern:
    declare double @cos(double)
    ready> cos(1.234);
    Read top-level expression:
    define double @""() {
    entry:
            %calltmp = call double @cos(double 1.234000e+00)
            ret double %calltmp
    }
 This shows an extern for the libm "cos" function, and a call to it.
 ::
    ready> ^D
    ; ModuleID = 'my cool jit'
    define double @""() {
    entry:
            %addtmp = fadd double 4.000000e+00, 5.000000e+00
            ret double %addtmp
    }
    define double @foo(double %a, double %b) {
    entry:
            %multmp = fmul double %a, %a
            %multmp1 = fmul double 2.000000e+00, %a
            %multmp2 = fmul double %multmp1, %b
            %addtmp = fadd double %multmp, %multmp2
            %multmp3 = fmul double %b, %b
            %addtmp4 = fadd double %addtmp, %multmp3
            ret double %addtmp4
    }
    define double @bar(double %a) {
    entry:
            %calltmp = call double @foo(double %a, double 4.000000e+00)
            %calltmp1 = call double @bar(double 3.133700e+04)
            %addtmp = fadd double %calltmp, %calltmp1
            ret double %addtmp
    }
    declare double @cos(double)
    define double @""() {
    entry:
            %calltmp = call double @cos(double 1.234000e+00)
            ret double %calltmp
    }
 When you quit the current demo, it dumps out the IR for the entire
 module generated. Here you can see the big picture with all the
 functions referencing each other.
 This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
 we'll describe how to `add JIT codegen and optimizer
 support <OCamlLangImpl4.html>`_ to this so we can actually start running
 code!
 Full Code Listing
 =================
 Here is the complete code listing for our running example, enhanced with
 the LLVM code generator. Because this uses the LLVM libraries, we need
 to link them in. To do this, we use the
 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
 our makefile/command line about which options to use:
 .. code-block:: bash
    # Compile
    ocamlbuild toy.byte
    # Run
    ./toy.byte
 Here is the code:
 \_tags:
    ::
        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
        <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
 myocamlbuild.ml:
    .. code-block:: ocaml
        open Ocamlbuild_plugin;;
        ocaml_lib ~extern:true "llvm";;
        ocaml_lib ~extern:true "llvm_analysis";;
        flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
 token.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer Tokens
         *===----------------------------------------------------------------------===*)
        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
         * these others for known things. *)
        type token =
          (* commands *)
          | Def | Extern
          (* primary *)
          | Ident of string | Number of float
          (* unknown *)
          | Kwd of char
 lexer.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer
         *===----------------------------------------------------------------------===*)
        let rec lex = parser
          (* Skip any whitespace. *)
          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_ident buffer stream
          (* number: [0-9.]+ *)
          | [< ' ('0' .. '9' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_number buffer stream
          (* Comment until end of line. *)
          | [< ' ('#'); stream >] ->
              lex_comment stream
          (* Otherwise, just return the character as its ascii value. *)
          | [< 'c; stream >] ->
              [< 'Token.Kwd c; lex stream >]
          (* end of stream. *)
          | [< >] -> [< >]
        and lex_number buffer = parser
          | [< ' ('0' .. '9' | '.' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_number buffer stream
          | [< stream=lex >] ->
              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
        and lex_ident buffer = parser
          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_ident buffer stream
          | [< stream=lex >] ->
              match Buffer.contents buffer with
              | "def" -> [< 'Token.Def; stream >]
              | "extern" -> [< 'Token.Extern; stream >]
              | id -> [< 'Token.Ident id; stream >]
        and lex_comment = parser
          | [< ' ('\n'); stream=lex >] -> stream
          | [< 'c; e=lex_comment >] -> e
          | [< >] -> [< >]
 ast.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Abstract Syntax Tree (aka Parse Tree)
         *===----------------------------------------------------------------------===*)
        (* expr - Base type for all expression nodes. *)
        type expr =
          (* variant for numeric literals like "1.0". *)
          | Number of float
          (* variant for referencing a variable, like "a". *)
          | Variable of string
          (* variant for a binary operator. *)
          | Binary of char * expr * expr
          (* variant for function calls. *)
          | Call of string * expr array
        (* proto - This type represents the "prototype" for a function, which captures
         * its name, and its argument names (thus implicitly the number of arguments the
         * function takes). *)
        type proto = Prototype of string * string array
        (* func - This type represents a function definition itself. *)
        type func = Function of proto * expr
 parser.ml:
    .. code-block:: ocaml
        (*===---------------------------------------------------------------------===
         * Parser
         *===---------------------------------------------------------------------===*)
        (* binop_precedence - This holds the precedence for each binary operator that is
         * defined *)
        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
        (* precedence - Get the precedence of the pending binary operator token. *)
        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
        (* primary
         *   ::= identifier
         *   ::= numberexpr
         *   ::= parenexpr *)
        let rec parse_primary = parser
          (* numberexpr ::= number *)
          | [< 'Token.Number n >] -> Ast.Number n
          (* parenexpr ::= '(' expression ')' *)
          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
          (* identifierexpr
           *   ::= identifier
           *   ::= identifier '(' argumentexpr ')' *)
          | [< 'Token.Ident id; stream >] ->
              let rec parse_args accumulator = parser
                | [< e=parse_expr; stream >] ->
                    begin parser
                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
                      | [< >] -> e :: accumulator
                    end stream
                | [< >] -> accumulator
              in
              let rec parse_ident id = parser
                (* Call. *)
                | [< 'Token.Kwd '(';
                     args=parse_args [];
                     'Token.Kwd ')' ?? "expected ')'">] ->
                    Ast.Call (id, Array.of_list (List.rev args))
                (* Simple variable ref. *)
                | [< >] -> Ast.Variable id
              in
              parse_ident id stream
          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
        (* binoprhs
         *   ::= ('+' primary)* *)
        and parse_bin_rhs expr_prec lhs stream =
          match Stream.peek stream with
          (* If this is a binop, find its precedence. *)
          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
              let token_prec = precedence c in
              (* If this is a binop that binds at least as tightly as the current binop,
               * consume it, otherwise we are done. *)
              if token_prec < expr_prec then lhs else begin
                (* Eat the binop. *)
                Stream.junk stream;
                (* Parse the primary expression after the binary operator. *)
                let rhs = parse_primary stream in
                (* Okay, we know this is a binop. *)
                let rhs =
                  match Stream.peek stream with
                  | Some (Token.Kwd c2) ->
                      (* If BinOp binds less tightly with rhs than the operator after
                       * rhs, let the pending operator take rhs as its lhs. *)
                      let next_prec = precedence c2 in
                      if token_prec < next_prec
                      then parse_bin_rhs (token_prec + 1) rhs stream
                      else rhs
                  | _ -> rhs
                in
                (* Merge lhs/rhs. *)
                let lhs = Ast.Binary (c, lhs, rhs) in
                parse_bin_rhs expr_prec lhs stream
              end
          | _ -> lhs
        (* expression
         *   ::= primary binoprhs *)
        and parse_expr = parser
          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
        (* prototype
         *   ::= id '(' id* ')' *)
        let parse_prototype =
          let rec parse_args accumulator = parser
            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
            | [< >] -> accumulator
          in
          parser
          | [< 'Token.Ident id;
               'Token.Kwd '(' ?? "expected '(' in prototype";
               args=parse_args [];
               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
              (* success. *)
              Ast.Prototype (id, Array.of_list (List.rev args))
          | [< >] ->
              raise (Stream.Error "expected function name in prototype")
        (* definition ::= 'def' prototype expression *)
        let parse_definition = parser
          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
              Ast.Function (p, e)
        (* toplevelexpr ::= expression *)
        let parse_toplevel = parser
          | [< e=parse_expr >] ->
              (* Make an anonymous proto. *)
              Ast.Function (Ast.Prototype ("", [||]), e)
        (*  external ::= 'extern' prototype *)
        let parse_extern = parser
          | [< 'Token.Extern; e=parse_prototype >] -> e
 codegen.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Code Generation
         *===----------------------------------------------------------------------===*)
        open Llvm
        exception Error of string
        let context = global_context ()
        let the_module = create_module context "my cool jit"
        let builder = builder context
        let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
        let double_type = double_type context
        let rec codegen_expr = function
          | Ast.Number n -> const_float double_type n
          | Ast.Variable name ->
              (try Hashtbl.find named_values name with
                | Not_found -> raise (Error "unknown variable name"))
          | Ast.Binary (op, lhs, rhs) ->
              let lhs_val = codegen_expr lhs in
              let rhs_val = codegen_expr rhs in
              begin
                match op with
                | '+' -> build_add lhs_val rhs_val "addtmp" builder
                | '-' -> build_sub lhs_val rhs_val "subtmp" builder
                | '*' -> build_mul lhs_val rhs_val "multmp" builder
                | '<' ->
                    (* Convert bool 0/1 to double 0.0 or 1.0 *)
                    let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
                    build_uitofp i double_type "booltmp" builder
                | _ -> raise (Error "invalid binary operator")
              end
          | Ast.Call (callee, args) ->
              (* Look up the name in the module table. *)
              let callee =
                match lookup_function callee the_module with
                | Some callee -> callee
                | None -> raise (Error "unknown function referenced")
              in
              let params = params callee in
              (* If argument mismatch error. *)
              if Array.length params == Array.length args then () else
                raise (Error "incorrect # arguments passed");
              let args = Array.map codegen_expr args in
              build_call callee args "calltmp" builder
        let codegen_proto = function
          | Ast.Prototype (name, args) ->
              (* Make the function type: double(double,double) etc. *)
              let doubles = Array.make (Array.length args) double_type in
              let ft = function_type double_type doubles in
              let f =
                match lookup_function name the_module with
                | None -> declare_function name ft the_module
                (* If 'f' conflicted, there was already something named 'name'. If it
                 * has a body, don't allow redefinition or reextern. *)
                | Some f ->
                    (* If 'f' already has a body, reject this. *)
                    if block_begin f <> At_end f then
                      raise (Error "redefinition of function");
                    (* If 'f' took a different number of arguments, reject. *)
                    if element_type (type_of f) <> ft then
                      raise (Error "redefinition of function with different # args");
                    f
              in
              (* Set names for all arguments. *)
              Array.iteri (fun i a ->
                let n = args.(i) in
                set_value_name n a;
                Hashtbl.add named_values n a;
              ) (params f);
              f
        let codegen_func = function
          | Ast.Function (proto, body) ->
              Hashtbl.clear named_values;
              let the_function = codegen_proto proto in
              (* Create a new basic block to start insertion into. *)
              let bb = append_block context "entry" the_function in
              position_at_end bb builder;
              try
                let ret_val = codegen_expr body in
                (* Finish off the function. *)
                let _ = build_ret ret_val builder in
                (* Validate the generated code, checking for consistency. *)
                Llvm_analysis.assert_valid_function the_function;
                the_function
              with e ->
                delete_function the_function;
                raise e
 toplevel.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Top-Level parsing and JIT Driver
         *===----------------------------------------------------------------------===*)
        open Llvm
        (* top ::= definition | external | expression | ';' *)
        let rec main_loop stream =
          match Stream.peek stream with
          | None -> ()
          (* ignore top-level semicolons. *)
          | Some (Token.Kwd ';') ->
              Stream.junk stream;
              main_loop stream
          | Some token ->
              begin
                try match token with
                | Token.Def ->
                    let e = Parser.parse_definition stream in
                    print_endline "parsed a function definition.";
                    dump_value (Codegen.codegen_func e);
                | Token.Extern ->
                    let e = Parser.parse_extern stream in
                    print_endline "parsed an extern.";
                    dump_value (Codegen.codegen_proto e);
                | _ ->
                    (* Evaluate a top-level expression into an anonymous function. *)
                    let e = Parser.parse_toplevel stream in
                    print_endline "parsed a top-level expr";
                    dump_value (Codegen.codegen_func e);
                with Stream.Error s | Codegen.Error s ->
                  (* Skip token for error recovery. *)
                  Stream.junk stream;
                  print_endline s;
              end;
              print_string "ready> "; flush stdout;
              main_loop stream
 toy.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Main driver code.
         *===----------------------------------------------------------------------===*)
        open Llvm
        let main () =
          (* Install standard binary operators.
           * 1 is the lowest precedence. *)
          Hashtbl.add Parser.binop_precedence '<' 10;
          Hashtbl.add Parser.binop_precedence '+' 20;
          Hashtbl.add Parser.binop_precedence '-' 20;
          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
          (* Prime the first token. *)
          print_string "ready> "; flush stdout;
          let stream = Lexer.lex (Stream.of_channel stdin) in
          (* Run the main "interpreter loop" now. *)
          Toplevel.main_loop stream;
          (* Print out all the generated code. *)
          dump_module Codegen.the_module
        ;;
        main ()
 `Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_
--- a/docs/tutorial/OCamlLangImpl4.html
+++ b/docs/tutorial/OCamlLangImpl4.html
--- a/docs/tutorial/OCamlLangImpl4.rst
+++ b/docs/tutorial/OCamlLangImpl4.rst
@ -0,0 +1,918 @@
 ==============================================
 Kaleidoscope: Adding JIT and Optimizer Support
 ==============================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick
 Tryzelaar <mailto:idadesub@users.sourceforge.net>`_
 Chapter 4 Introduction
 ======================
 Welcome to Chapter 4 of the "`Implementing a language with
 LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
 of a simple language and added support for generating LLVM IR. This
 chapter describes two new techniques: adding optimizer support to your
 language, and adding JIT compiler support. These additions will
 demonstrate how to get nice, efficient code for the Kaleidoscope
 language.
 Trivial Constant Folding
 ========================
 **Note:** the default ``IRBuilder`` now always includes the constant
 folding optimisations below.
 Our demonstration for Chapter 3 is elegant and easy to extend.
 Unfortunately, it does not produce wonderful code. For example, when
 compiling simple code, we don't get obvious optimizations:
 ::
    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 1.000000e+00, 2.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }
 This code is a very, very literal transcription of the AST built by
 parsing the input. As such, this transcription lacks optimizations like
 constant folding (we'd like to get "``add x, 3.0``" in the example
 above) as well as other more important optimizations. Constant folding,
 in particular, is a very common and very important optimization: so much
 so that many language implementors implement constant folding support in
 their AST representation.
 With LLVM, you don't need this support in the AST. Since all calls to
 build LLVM IR go through the LLVM builder, it would be nice if the
 builder itself checked to see if there was a constant folding
 opportunity when you call it. If so, it could just do the constant fold
 and return the constant instead of creating an instruction. This is
 exactly what the ``LLVMFoldingBuilder`` class does.
 All we did was switch from ``LLVMBuilder`` to ``LLVMFoldingBuilder``.
 Though we change no other code, we now have all of our instructions
 implicitly constant folded without us having to do anything about it.
 For example, the input above now compiles to:
 ::
    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }
 Well, that was easy :). In practice, we recommend always using
 ``LLVMFoldingBuilder`` when generating code like this. It has no
 "syntactic overhead" for its use (you don't have to uglify your compiler
 with constant checks everywhere) and it can dramatically reduce the
 amount of LLVM IR that is generated in some cases (particular for
 languages with a macro preprocessor or that use a lot of constants).
 On the other hand, the ``LLVMFoldingBuilder`` is limited by the fact
 that it does all of its analysis inline with the code as it is built. If
 you take a slightly more complex example:
 ::
    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }
 In this case, the LHS and RHS of the multiplication are the same value.
 We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
 instead of computing "``x*3``" twice.
 Unfortunately, no amount of local analysis will be able to detect and
 correct this. This requires two transformations: reassociation of
 expressions (to make the add's lexically identical) and Common
 Subexpression Elimination (CSE) to delete the redundant add instruction.
 Fortunately, LLVM provides a broad range of optimizations that you can
 use, in the form of "passes".
 LLVM Optimization Passes
 ========================
 LLVM provides many optimization passes, which do many different sorts of
 things and have different tradeoffs. Unlike other systems, LLVM doesn't
 hold to the mistaken notion that one set of optimizations is right for
 all languages and for all situations. LLVM allows a compiler implementor
 to make complete decisions about what optimizations to use, in which
 order, and in what situation.
 As a concrete example, LLVM supports both "whole module" passes, which
 look across as large of body of code as they can (often a whole file,
 but if run at link time, this can be a substantial portion of the whole
 program). It also supports and includes "per-function" passes which just
 operate on a single function at a time, without looking at other
 functions. For more information on passes and how they are run, see the
 `How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
 `List of LLVM Passes <../Passes.html>`_.
 For Kaleidoscope, we are currently generating functions on the fly, one
 at a time, as the user types them in. We aren't shooting for the
 ultimate optimization experience in this setting, but we also want to
 catch the easy and quick stuff where possible. As such, we will choose
 to run a few per-function optimizations as the user types the function
 in. If we wanted to make a "static Kaleidoscope compiler", we would use
 exactly the code we have now, except that we would defer running the
 optimizer until the entire file has been parsed.
 In order to get per-function optimizations going, we need to set up a
 `Llvm.PassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold and
 organize the LLVM optimizations that we want to run. Once we have that,
 we can add a set of optimizations to run. The code looks like this:
 .. code-block:: ocaml
      (* Create the JIT. *)
      let the_execution_engine = ExecutionEngine.create Codegen.the_module in
      let the_fpm = PassManager.create_function Codegen.the_module in
      (* Set up the optimizer pipeline.  Start with registering info about how the
       * target lays out data structures. *)
      DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
      (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
      add_instruction_combining the_fpm;
      (* reassociate expressions. *)
      add_reassociation the_fpm;
      (* Eliminate Common SubExpressions. *)
      add_gvn the_fpm;
      (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
      add_cfg_simplification the_fpm;
      ignore (PassManager.initialize the_fpm);
      (* Run the main "interpreter loop" now. *)
      Toplevel.main_loop the_fpm the_execution_engine stream;
 The meat of the matter here, is the definition of "``the_fpm``". It
 requires a pointer to the ``the_module`` to construct itself. Once it is
 set up, we use a series of "add" calls to add a bunch of LLVM passes.
 The first pass is basically boilerplate, it adds a pass so that later
 optimizations know how the data structures in the program are laid out.
 The "``the_execution_engine``" variable is related to the JIT, which we
 will get to in the next section.
 In this case, we choose to add 4 optimization passes. The passes we
 chose here are a pretty standard set of "cleanup" optimizations that are
 useful for a wide variety of code. I won't delve into what they do but,
 believe me, they are a good starting place :).
 Once the ``Llvm.PassManager.`` is set up, we need to make use of it. We
 do this by running it after our newly created function is constructed
 (in ``Codegen.codegen_func``), but before it is returned to the client:
 .. code-block:: ocaml
    let codegen_func the_fpm = function
          ...
          try
            let ret_val = codegen_expr body in
            (* Finish off the function. *)
            let _ = build_ret ret_val builder in
            (* Validate the generated code, checking for consistency. *)
            Llvm_analysis.assert_valid_function the_function;
            (* Optimize the function. *)
            let _ = PassManager.run_function the_function the_fpm in
            the_function
 As you can see, this is pretty straightforward. The ``the_fpm``
 optimizes and updates the LLVM Function\* in place, improving
 (hopefully) its body. With this in place, we can try our test above
 again:
 ::
    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }
 As expected, we now get our nicely optimized code, saving a floating
 point add instruction from every execution of this function.
 LLVM provides a wide variety of optimizations that can be used in
 certain circumstances. Some `documentation about the various
 passes <../Passes.html>`_ is available, but it isn't very complete.
 Another good source of ideas can come from looking at the passes that
 ``Clang`` runs to get started. The "``opt``" tool allows you to
 experiment with passes from the command line, so you can see if they do
 anything.
 Now that we have reasonable code coming out of our front-end, lets talk
 about executing it!
 Adding a JIT Compiler
 =====================
 Code that is available in LLVM IR can have a wide variety of tools
 applied to it. For example, you can run optimizations on it (as we did
 above), you can dump it out in textual or binary forms, you can compile
 the code to an assembly file (.s) for some target, or you can JIT
 compile it. The nice thing about the LLVM IR representation is that it
 is the "common currency" between many different parts of the compiler.
 In this section, we'll add JIT compiler support to our interpreter. The
 basic idea that we want for Kaleidoscope is to have the user enter
 function bodies as they do now, but immediately evaluate the top-level
 expressions they type in. For example, if they type in "1 + 2;", we
 should evaluate and print out 3. If they define a function, they should
 be able to call it from the command line.
 In order to do this, we first declare and initialize the JIT. This is
 done by adding a global variable and a call in ``main``:
 .. code-block:: ocaml
    ...
    let main () =
      ...
      (* Create the JIT. *)
      let the_execution_engine = ExecutionEngine.create Codegen.the_module in
      ...
 This creates an abstract "Execution Engine" which can be either a JIT
 compiler or the LLVM interpreter. LLVM will automatically pick a JIT
 compiler for you if one is available for your platform, otherwise it
 will fall back to the interpreter.
 Once the ``Llvm_executionengine.ExecutionEngine.t`` is created, the JIT
 is ready to be used. There are a variety of APIs that are useful, but
 the simplest one is the
 "``Llvm_executionengine.ExecutionEngine.run_function``" function. This
 method JIT compiles the specified LLVM Function and returns a function
 pointer to the generated machine code. In our case, this means that we
 can change the code that parses a top-level expression to look like
 this:
 .. code-block:: ocaml
                (* Evaluate a top-level expression into an anonymous function. *)
                let e = Parser.parse_toplevel stream in
                print_endline "parsed a top-level expr";
                let the_function = Codegen.codegen_func the_fpm e in
                dump_value the_function;
                (* JIT the function, returning a function pointer. *)
                let result = ExecutionEngine.run_function the_function [||]
                  the_execution_engine in
                print_string "Evaluated to ";
                print_float (GenericValue.as_float Codegen.double_type result);
                print_newline ();
 Recall that we compile top-level expressions into a self-contained LLVM
 function that takes no arguments and returns the computed double.
 Because the LLVM JIT compiler matches the native platform ABI, this
 means that you can just cast the result pointer to a function pointer of
 that type and call it directly. This means, there is no difference
 between JIT compiled code and native machine code that is statically
 linked into your application.
 With just these two changes, lets see how Kaleidoscope works now!
 ::
    ready> 4+5;
    define double @""() {
    entry:
            ret double 9.000000e+00
    }
    Evaluated to 9.000000
 Well this looks like it is basically working. The dump of the function
 shows the "no argument function that always returns double" that we
 synthesize for each top level expression that is typed in. This
 demonstrates very basic functionality, but can we do more?
 ::
    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
            %multmp = fmul double %y, 2.000000e+00
            %addtmp = fadd double %multmp, %x
            ret double %addtmp
    }
    ready> testfunc(4, 10);
    define double @""() {
    entry:
            %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
            ret double %calltmp
    }
    Evaluated to 24.000000
 This illustrates that we can now call user code, but there is something
 a bit subtle going on here. Note that we only invoke the JIT on the
 anonymous functions that *call testfunc*, but we never invoked it on
 *testfunc* itself. What actually happened here is that the JIT scanned
 for all non-JIT'd functions transitively called from the anonymous
 function and compiled all of them before returning from
 ``run_function``.
 The JIT provides a number of other more advanced interfaces for things
 like freeing allocated machine code, rejit'ing functions to update them,
 etc. However, even with this simple code, we get some surprisingly
 powerful capabilities - check this out (I removed the dump of the
 anonymous functions, you should get the idea by now :) :
 ::
    ready> extern sin(x);
    Read extern:
    declare double @sin(double)
    ready> extern cos(x);
    Read extern:
    declare double @cos(double)
    ready> sin(1.0);
    Evaluated to 0.841471
    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
            %calltmp = call double @sin(double %x)
            %multmp = fmul double %calltmp, %calltmp
            %calltmp2 = call double @cos(double %x)
            %multmp4 = fmul double %calltmp2, %calltmp2
            %addtmp = fadd double %multmp, %multmp4
            ret double %addtmp
    }
    ready> foo(4.0);
    Evaluated to 1.000000
 Whoa, how does the JIT know about sin and cos? The answer is
 surprisingly simple: in this example, the JIT started execution of a
 function and got to a function call. It realized that the function was
 not yet JIT compiled and invoked the standard set of routines to resolve
 the function. In this case, there is no body defined for the function,
 so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope
 process itself. Since "``sin``" is defined within the JIT's address
 space, it simply patches up calls in the module to call the libm version
 of ``sin`` directly.
 The LLVM JIT provides a number of interfaces (look in the
 ``llvm_executionengine.mli`` file) for controlling how unknown functions
 get resolved. It allows you to establish explicit mappings between IR
 objects and addresses (useful for LLVM global variables that you want to
 map to static tables, for example), allows you to dynamically decide on
 the fly based on the function name, and even allows you to have the JIT
 compile functions lazily the first time they're called.
 One interesting application of this is that we can now extend the
 language by writing arbitrary C code to implement operations. For
 example, if we add:
 .. code-block:: c++
    /* putchard - putchar that takes a double and returns 0. */
    extern "C"
    double putchard(double X) {
      putchar((char)X);
      return 0;
    }
 Now we can produce simple output to the console by using things like:
 "``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
 on the console (120 is the ASCII code for 'x'). Similar code could be
 used to implement file I/O, console input, and many other capabilities
 in Kaleidoscope.
 This completes the JIT and optimizer chapter of the Kaleidoscope
 tutorial. At this point, we can compile a non-Turing-complete
 programming language, optimize and JIT compile it in a user-driven way.
 Next up we'll look into `extending the language with control flow
 constructs <OCamlLangImpl5.html>`_, tackling some interesting LLVM IR
 issues along the way.
 Full Code Listing
 =================
 Here is the complete code listing for our running example, enhanced with
 the LLVM JIT and optimizer. To build this example, use:
 .. code-block:: bash
    # Compile
    ocamlbuild toy.byte
    # Run
    ./toy.byte
 Here is the code:
 \_tags:
    ::
        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
        <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
        <*.{byte,native}>: use_llvm_executionengine, use_llvm_target
        <*.{byte,native}>: use_llvm_scalar_opts, use_bindings
 myocamlbuild.ml:
    .. code-block:: ocaml
        open Ocamlbuild_plugin;;
        ocaml_lib ~extern:true "llvm";;
        ocaml_lib ~extern:true "llvm_analysis";;
        ocaml_lib ~extern:true "llvm_executionengine";;
        ocaml_lib ~extern:true "llvm_target";;
        ocaml_lib ~extern:true "llvm_scalar_opts";;
        flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
        dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
 token.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer Tokens
         *===----------------------------------------------------------------------===*)
        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
         * these others for known things. *)
        type token =
          (* commands *)
          | Def | Extern
          (* primary *)
          | Ident of string | Number of float
          (* unknown *)
          | Kwd of char
 lexer.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Lexer
         *===----------------------------------------------------------------------===*)
        let rec lex = parser
          (* Skip any whitespace. *)
          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_ident buffer stream
          (* number: [0-9.]+ *)
          | [< ' ('0' .. '9' as c); stream >] ->
              let buffer = Buffer.create 1 in
              Buffer.add_char buffer c;
              lex_number buffer stream
          (* Comment until end of line. *)
          | [< ' ('#'); stream >] ->
              lex_comment stream
          (* Otherwise, just return the character as its ascii value. *)
          | [< 'c; stream >] ->
              [< 'Token.Kwd c; lex stream >]
          (* end of stream. *)
          | [< >] -> [< >]
        and lex_number buffer = parser
          | [< ' ('0' .. '9' | '.' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_number buffer stream
          | [< stream=lex >] ->
              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
        and lex_ident buffer = parser
          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
              Buffer.add_char buffer c;
              lex_ident buffer stream
          | [< stream=lex >] ->
              match Buffer.contents buffer with
              | "def" -> [< 'Token.Def; stream >]
              | "extern" -> [< 'Token.Extern; stream >]
              | id -> [< 'Token.Ident id; stream >]
        and lex_comment = parser
          | [< ' ('\n'); stream=lex >] -> stream
          | [< 'c; e=lex_comment >] -> e
          | [< >] -> [< >]
 ast.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Abstract Syntax Tree (aka Parse Tree)
         *===----------------------------------------------------------------------===*)
        (* expr - Base type for all expression nodes. *)
        type expr =
          (* variant for numeric literals like "1.0". *)
          | Number of float
          (* variant for referencing a variable, like "a". *)
          | Variable of string
          (* variant for a binary operator. *)
          | Binary of char * expr * expr
          (* variant for function calls. *)
          | Call of string * expr array
        (* proto - This type represents the "prototype" for a function, which captures
         * its name, and its argument names (thus implicitly the number of arguments the
         * function takes). *)
        type proto = Prototype of string * string array
        (* func - This type represents a function definition itself. *)
        type func = Function of proto * expr
 parser.ml:
    .. code-block:: ocaml
        (*===---------------------------------------------------------------------===
         * Parser
         *===---------------------------------------------------------------------===*)
        (* binop_precedence - This holds the precedence for each binary operator that is
         * defined *)
        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
        (* precedence - Get the precedence of the pending binary operator token. *)
        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
        (* primary
         *   ::= identifier
         *   ::= numberexpr
         *   ::= parenexpr *)
        let rec parse_primary = parser
          (* numberexpr ::= number *)
          | [< 'Token.Number n >] -> Ast.Number n
          (* parenexpr ::= '(' expression ')' *)
          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
          (* identifierexpr
           *   ::= identifier
           *   ::= identifier '(' argumentexpr ')' *)
          | [< 'Token.Ident id; stream >] ->
              let rec parse_args accumulator = parser
                | [< e=parse_expr; stream >] ->
                    begin parser
                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
                      | [< >] -> e :: accumulator
                    end stream
                | [< >] -> accumulator
              in
              let rec parse_ident id = parser
                (* Call. *)
                | [< 'Token.Kwd '(';
                     args=parse_args [];
                     'Token.Kwd ')' ?? "expected ')'">] ->
                    Ast.Call (id, Array.of_list (List.rev args))
                (* Simple variable ref. *)
                | [< >] -> Ast.Variable id
              in
              parse_ident id stream
          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
        (* binoprhs
         *   ::= ('+' primary)* *)
        and parse_bin_rhs expr_prec lhs stream =
          match Stream.peek stream with
          (* If this is a binop, find its precedence. *)
          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
              let token_prec = precedence c in
              (* If this is a binop that binds at least as tightly as the current binop,
               * consume it, otherwise we are done. *)
              if token_prec < expr_prec then lhs else begin
                (* Eat the binop. *)
                Stream.junk stream;
                (* Parse the primary expression after the binary operator. *)
                let rhs = parse_primary stream in
                (* Okay, we know this is a binop. *)
                let rhs =
                  match Stream.peek stream with
                  | Some (Token.Kwd c2) ->
                      (* If BinOp binds less tightly with rhs than the operator after
                       * rhs, let the pending operator take rhs as its lhs. *)
                      let next_prec = precedence c2 in
                      if token_prec < next_prec
                      then parse_bin_rhs (token_prec + 1) rhs stream
                      else rhs
                  | _ -> rhs
                in
                (* Merge lhs/rhs. *)
                let lhs = Ast.Binary (c, lhs, rhs) in
                parse_bin_rhs expr_prec lhs stream
              end
          | _ -> lhs
        (* expression
         *   ::= primary binoprhs *)
        and parse_expr = parser
          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
        (* prototype
         *   ::= id '(' id* ')' *)
        let parse_prototype =
          let rec parse_args accumulator = parser
            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
            | [< >] -> accumulator
          in
          parser
          | [< 'Token.Ident id;
               'Token.Kwd '(' ?? "expected '(' in prototype";
               args=parse_args [];
               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
              (* success. *)
              Ast.Prototype (id, Array.of_list (List.rev args))
          | [< >] ->
              raise (Stream.Error "expected function name in prototype")
        (* definition ::= 'def' prototype expression *)
        let parse_definition = parser
          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
              Ast.Function (p, e)
        (* toplevelexpr ::= expression *)
        let parse_toplevel = parser
          | [< e=parse_expr >] ->
              (* Make an anonymous proto. *)
              Ast.Function (Ast.Prototype ("", [||]), e)
        (*  external ::= 'extern' prototype *)
        let parse_extern = parser
          | [< 'Token.Extern; e=parse_prototype >] -> e
 codegen.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Code Generation
         *===----------------------------------------------------------------------===*)
        open Llvm
        exception Error of string
        let context = global_context ()
        let the_module = create_module context "my cool jit"
        let builder = builder context
        let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
        let double_type = double_type context
        let rec codegen_expr = function
          | Ast.Number n -> const_float double_type n
          | Ast.Variable name ->
              (try Hashtbl.find named_values name with
                | Not_found -> raise (Error "unknown variable name"))
          | Ast.Binary (op, lhs, rhs) ->
              let lhs_val = codegen_expr lhs in
              let rhs_val = codegen_expr rhs in
              begin
                match op with
                | '+' -> build_add lhs_val rhs_val "addtmp" builder
                | '-' -> build_sub lhs_val rhs_val "subtmp" builder
                | '*' -> build_mul lhs_val rhs_val "multmp" builder
                | '<' ->
                    (* Convert bool 0/1 to double 0.0 or 1.0 *)
                    let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
                    build_uitofp i double_type "booltmp" builder
                | _ -> raise (Error "invalid binary operator")
              end
          | Ast.Call (callee, args) ->
              (* Look up the name in the module table. *)
              let callee =
                match lookup_function callee the_module with
                | Some callee -> callee
                | None -> raise (Error "unknown function referenced")
              in
              let params = params callee in
              (* If argument mismatch error. *)
              if Array.length params == Array.length args then () else
                raise (Error "incorrect # arguments passed");
              let args = Array.map codegen_expr args in
              build_call callee args "calltmp" builder
        let codegen_proto = function
          | Ast.Prototype (name, args) ->
              (* Make the function type: double(double,double) etc. *)
              let doubles = Array.make (Array.length args) double_type in
              let ft = function_type double_type doubles in
              let f =
                match lookup_function name the_module with
                | None -> declare_function name ft the_module
                (* If 'f' conflicted, there was already something named 'name'. If it
                 * has a body, don't allow redefinition or reextern. *)
                | Some f ->
                    (* If 'f' already has a body, reject this. *)
                    if block_begin f <> At_end f then
                      raise (Error "redefinition of function");
                    (* If 'f' took a different number of arguments, reject. *)
                    if element_type (type_of f) <> ft then
                      raise (Error "redefinition of function with different # args");
                    f
              in
              (* Set names for all arguments. *)
              Array.iteri (fun i a ->
                let n = args.(i) in
                set_value_name n a;
                Hashtbl.add named_values n a;
              ) (params f);
              f
        let codegen_func the_fpm = function
          | Ast.Function (proto, body) ->
              Hashtbl.clear named_values;
              let the_function = codegen_proto proto in
              (* Create a new basic block to start insertion into. *)
              let bb = append_block context "entry" the_function in
              position_at_end bb builder;
              try
                let ret_val = codegen_expr body in
                (* Finish off the function. *)
                let _ = build_ret ret_val builder in
                (* Validate the generated code, checking for consistency. *)
                Llvm_analysis.assert_valid_function the_function;
                (* Optimize the function. *)
                let _ = PassManager.run_function the_function the_fpm in
                the_function
              with e ->
                delete_function the_function;
                raise e
 toplevel.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Top-Level parsing and JIT Driver
         *===----------------------------------------------------------------------===*)
        open Llvm
        open Llvm_executionengine
        (* top ::= definition | external | expression | ';' *)
        let rec main_loop the_fpm the_execution_engine stream =
          match Stream.peek stream with
          | None -> ()
          (* ignore top-level semicolons. *)
          | Some (Token.Kwd ';') ->
              Stream.junk stream;
              main_loop the_fpm the_execution_engine stream
          | Some token ->
              begin
                try match token with
                | Token.Def ->
                    let e = Parser.parse_definition stream in
                    print_endline "parsed a function definition.";
                    dump_value (Codegen.codegen_func the_fpm e);
                | Token.Extern ->
                    let e = Parser.parse_extern stream in
                    print_endline "parsed an extern.";
                    dump_value (Codegen.codegen_proto e);
                | _ ->
                    (* Evaluate a top-level expression into an anonymous function. *)
                    let e = Parser.parse_toplevel stream in
                    print_endline "parsed a top-level expr";
                    let the_function = Codegen.codegen_func the_fpm e in
                    dump_value the_function;
                    (* JIT the function, returning a function pointer. *)
                    let result = ExecutionEngine.run_function the_function [||]
                      the_execution_engine in
                    print_string "Evaluated to ";
                    print_float (GenericValue.as_float Codegen.double_type result);
                    print_newline ();
                with Stream.Error s | Codegen.Error s ->
                  (* Skip token for error recovery. *)
                  Stream.junk stream;
                  print_endline s;
              end;
              print_string "ready> "; flush stdout;
              main_loop the_fpm the_execution_engine stream
 toy.ml:
    .. code-block:: ocaml
        (*===----------------------------------------------------------------------===
         * Main driver code.
         *===----------------------------------------------------------------------===*)
        open Llvm
        open Llvm_executionengine
        open Llvm_target
        open Llvm_scalar_opts
        let main () =
          ignore (initialize_native_target ());
          (* Install standard binary operators.
           * 1 is the lowest precedence. *)
          Hashtbl.add Parser.binop_precedence '<' 10;
          Hashtbl.add Parser.binop_precedence '+' 20;
          Hashtbl.add Parser.binop_precedence '-' 20;
          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
          (* Prime the first token. *)
          print_string "ready> "; flush stdout;
          let stream = Lexer.lex (Stream.of_channel stdin) in
          (* Create the JIT. *)
          let the_execution_engine = ExecutionEngine.create Codegen.the_module in
          let the_fpm = PassManager.create_function Codegen.the_module in
          (* Set up the optimizer pipeline.  Start with registering info about how the
           * target lays out data structures. *)
          DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
          (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
          add_instruction_combination the_fpm;
          (* reassociate expressions. *)
          add_reassociation the_fpm;
          (* Eliminate Common SubExpressions. *)
          add_gvn the_fpm;
          (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
          add_cfg_simplification the_fpm;
          ignore (PassManager.initialize the_fpm);
          (* Run the main "interpreter loop" now. *)
          Toplevel.main_loop the_fpm the_execution_engine stream;
          (* Print out all the generated code. *)
          dump_module Codegen.the_module
        ;;
        main ()
 bindings.c
    .. code-block:: c
        #include <stdio.h>
        /* putchard - putchar that takes a double and returns 0. */
        extern double putchard(double X) {
          putchar((char)X);
          return 0;
        }
 `Next: Extending the language: control flow <OCamlLangImpl5.html>`_
--- a/docs/tutorial/OCamlLangImpl5.html
+++ b/docs/tutorial/OCamlLangImpl5.html
--- a/docs/tutorial/OCamlLangImpl5.rst
+++ b/docs/tutorial/OCamlLangImpl5.rst
--- a/docs/tutorial/OCamlLangImpl6.html
+++ b/docs/tutorial/OCamlLangImpl6.html
--- a/docs/tutorial/OCamlLangImpl6.rst
+++ b/docs/tutorial/OCamlLangImpl6.rst
--- a/docs/tutorial/OCamlLangImpl7.html
+++ b/docs/tutorial/OCamlLangImpl7.html
--- a/docs/tutorial/OCamlLangImpl7.rst
+++ b/docs/tutorial/OCamlLangImpl7.rst
--- a/docs/tutorial/OCamlLangImpl8.html
+++ b/docs/tutorial/OCamlLangImpl8.html
@ -1,359 +0,0 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
  <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <link rel="stylesheet" href="../_static/llvm.css" type="text/css">
 </head>
 <body>
 <h1>Kaleidoscope: Conclusion and other useful LLVM tidbits</h1>
 <ul>
 <li><a href="index.html">Up to Tutorial Index</a></li>
 <li>Chapter 8
  <ol>
    <li><a href="#conclusion">Tutorial Conclusion</a></li>
    <li><a href="#llvmirproperties">Properties of LLVM IR</a>
    <ul>
      <li><a href="#targetindep">Target Independence</a></li>
      <li><a href="#safety">Safety Guarantees</a></li>
      <li><a href="#langspecific">Language-Specific Optimizations</a></li>
    </ul>
    </li>
    <li><a href="#tipsandtricks">Tips and Tricks</a>
    <ul>
      <li><a href="#offsetofsizeof">Implementing portable 
                                    offsetof/sizeof</a></li>
      <li><a href="#gcstack">Garbage Collected Stack Frames</a></li>
    </ul>
    </li>
  </ol>
 </li>
 </ul>
 <div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="conclusion">Tutorial Conclusion</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>Welcome to the final chapter of the "<a href="index.html">Implementing a
 language with LLVM</a>" tutorial.  In the course of this tutorial, we have grown
 our little Kaleidoscope language from being a useless toy, to being a
 semi-interesting (but probably still useless) toy. :)</p>
 <p>It is interesting to see how far we've come, and how little code it has
 taken.  We built the entire lexer, parser, AST, code generator, and an 
 interactive run-loop (with a JIT!) by-hand in under 700 lines of
 (non-comment/non-blank) code.</p>
 <p>Our little language supports a couple of interesting features: it supports
 user defined binary and unary operators, it uses JIT compilation for immediate
 evaluation, and it supports a few control flow constructs with SSA construction.
 </p>
 <p>Part of the idea of this tutorial was to show you how easy and fun it can be
 to define, build, and play with languages.  Building a compiler need not be a
 scary or mystical process!  Now that you've seen some of the basics, I strongly
 encourage you to take the code and hack on it.  For example, try adding:</p>
 <ul>
 <li><b>global variables</b> - While global variables have questional value in
 modern software engineering, they are often useful when putting together quick
 little hacks like the Kaleidoscope compiler itself.  Fortunately, our current
 setup makes it very easy to add global variables: just have value lookup check
 to see if an unresolved variable is in the global variable symbol table before
 rejecting it.  To create a new global variable, make an instance of the LLVM
 <tt>GlobalVariable</tt> class.</li>
 <li><b>typed variables</b> - Kaleidoscope currently only supports variables of
 type double.  This gives the language a very nice elegance, because only
 supporting one type means that you never have to specify types.  Different
 languages have different ways of handling this.  The easiest way is to require
 the user to specify types for every variable definition, and record the type
 of the variable in the symbol table along with its Value*.</li>
 <li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start
 extending the type system in all sorts of interesting ways.  Simple arrays are
 very easy and are quite useful for many different applications.  Adding them is
 mostly an exercise in learning how the LLVM <a 
 href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it
 is so nifty/unconventional, it <a 
 href="../GetElementPtr.html">has its own FAQ</a>!  If you add support
 for recursive types (e.g. linked lists), make sure to read the <a 
 href="../ProgrammersManual.html#TypeResolve">section in the LLVM
 Programmer's Manual</a> that describes how to construct them.</li>
 <li><b>standard runtime</b> - Our current language allows the user to access
 arbitrary external functions, and we use it for things like "printd" and
 "putchard".  As you extend the language to add higher-level constructs, often
 these constructs make the most sense if they are lowered to calls into a
 language-supplied runtime.  For example, if you add hash tables to the language,
 it would probably make sense to add the routines to a runtime, instead of 
 inlining them all the way.</li>
 <li><b>memory management</b> - Currently we can only access the stack in
 Kaleidoscope.  It would also be useful to be able to allocate heap memory,
 either with calls to the standard libc malloc/free interface or with a garbage
 collector.  If you would like to use garbage collection, note that LLVM fully
 supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a>
 including algorithms that move objects and need to scan/update the stack.</li>
 <li><b>debugger support</b> - LLVM supports generation of <a 
 href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by
 common debuggers like GDB.  Adding support for debug info is fairly 
 straightforward.  The best way to understand it is to compile some C/C++ code
 with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li>
 <li><b>exception handling support</b> - LLVM supports generation of <a 
 href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate
 with code compiled in other languages.  You could also generate code by
 implicitly making every function return an error value and checking it.  You 
 could also make explicit use of setjmp/longjmp.  There are many different ways
 to go here.</li>
 <li><b>object orientation, generics, database access, complex numbers,
 geometric programming, ...</b> - Really, there is
 no end of crazy features that you can add to the language.</li>
 <li><b>unusual domains</b> - We've been talking about applying LLVM to a domain
 that many people are interested in: building a compiler for a specific language.
 However, there are many other domains that can use compiler technology that are
 not typically considered.  For example, LLVM has been used to implement OpenGL
 graphics acceleration, translate C++ code to ActionScript, and many other
 cute and clever things.  Maybe you will be the first to JIT compile a regular
 expression interpreter into native code with LLVM?</li>
 </ul>
 <p>
 Have fun - try doing something crazy and unusual.  Building a language like
 everyone else always has, is much less fun than trying something a little crazy
 or off the wall and seeing how it turns out.  If you get stuck or want to talk
 about it, feel free to email the <a 
 href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing 
 list</a>: it has lots of people who are interested in languages and are often
 willing to help out.
 </p>
 <p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating
 LLVM IR.  These are some of the more subtle things that may not be obvious, but
 are very useful if you want to take advantage of LLVM's capabilities.</p>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="llvmirproperties">Properties of the LLVM IR</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>We have a couple common questions about code in the LLVM IR form - lets just
 get these out of the way right now, shall we?</p>
 <!-- ======================================================================= -->
 <h4><a name="targetindep">Target Independence</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Kaleidoscope is an example of a "portable language": any program written in
 Kaleidoscope will work the same way on any target that it runs on.  Many other
 languages have this property, e.g. lisp, java, haskell, javascript, python, etc
 (note that while these languages are portable, not all their libraries are).</p>
 <p>One nice aspect of LLVM is that it is often capable of preserving target
 independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled 
 program and run it on any target that LLVM supports, even emitting C code and
 compiling that on targets that LLVM doesn't support natively.  You can trivially
 tell that the Kaleidoscope compiler generates target-independent code because it
 never queries for any target-specific information when generating code.</p>
 <p>The fact that LLVM provides a compact, target-independent, representation for
 code gets a lot of people excited.  Unfortunately, these people are usually
 thinking about C or a language from the C family when they are asking questions
 about language portability.  I say "unfortunately", because there is really no
 way to make (fully general) C code portable, other than shipping the source code
 around (and of course, C source code is not actually portable in general
 either - ever port a really old application from 32- to 64-bits?).</p>
 <p>The problem with C (again, in its full generality) is that it is heavily
 laden with target specific assumptions.  As one simple example, the preprocessor
 often destructively removes target-independence from the code when it processes
 the input text:</p>
 <div class="doc_code">
 <pre>
 #ifdef __i386__
  int X = 1;
 #else
  int X = 42;
 #endif
 </pre>
 </div>
 <p>While it is possible to engineer more and more complex solutions to problems
 like this, it cannot be solved in full generality in a way that is better than shipping
 the actual source code.</p>
 <p>That said, there are interesting subsets of C that can be made portable.  If
 you are willing to fix primitive types to a fixed size (say int = 32-bits, 
 and long = 64-bits), don't care about ABI compatibility with existing binaries,
 and are willing to give up some other minor features, you can have portable
 code.  This can make sense for specialized domains such as an
 in-kernel language.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="safety">Safety Guarantees</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Many of the languages above are also "safe" languages: it is impossible for
 a program written in Java to corrupt its address space and crash the process
 (assuming the JVM has no bugs).
 Safety is an interesting property that requires a combination of language
 design, runtime support, and often operating system support.</p>
 <p>It is certainly possible to implement a safe language in LLVM, but LLVM IR
 does not itself guarantee safety.  The LLVM IR allows unsafe pointer casts,
 use after free bugs, buffer over-runs, and a variety of other problems.  Safety
 needs to be implemented as a layer on top of LLVM and, conveniently, several
 groups have investigated this.  Ask on the <a 
 href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing 
 list</a> if you are interested in more details.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="langspecific">Language-Specific Optimizations</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>One thing about LLVM that turns off many people is that it does not solve all
 the world's problems in one system (sorry 'world hunger', someone else will have
 to solve you some other day).  One specific complaint is that people perceive
 LLVM as being incapable of performing high-level language-specific optimization:
 LLVM "loses too much information".</p>
 <p>Unfortunately, this is really not the place to give you a full and unified
 version of "Chris Lattner's theory of compiler design".  Instead, I'll make a
 few observations:</p>
 <p>First, you're right that LLVM does lose information.  For example, as of this
 writing, there is no way to distinguish in the LLVM IR whether an SSA-value came
 from a C "int" or a C "long" on an ILP32 machine (other than debug info).  Both
 get compiled down to an 'i32' value and the information about what it came from
 is lost.  The more general issue here, is that the LLVM type system uses
 "structural equivalence" instead of "name equivalence".  Another place this
 surprises people is if you have two types in a high-level language that have the
 same structure (e.g. two different structs that have a single int field): these
 types will compile down into a single LLVM type and it will be impossible to
 tell what it came from.</p>
 <p>Second, while LLVM does lose information, LLVM is not a fixed target: we 
 continue to enhance and improve it in many different ways.  In addition to
 adding new features (LLVM did not always support exceptions or debug info), we
 also extend the IR to capture important information for optimization (e.g.
 whether an argument is sign or zero extended, information about pointers
 aliasing, etc).  Many of the enhancements are user-driven: people want LLVM to
 include some specific feature, so they go ahead and extend it.</p>
 <p>Third, it is <em>possible and easy</em> to add language-specific
 optimizations, and you have a number of choices in how to do it.  As one trivial
 example, it is easy to add language-specific optimization passes that
 "know" things about code compiled for a language.  In the case of the C family,
 there is an optimization pass that "knows" about the standard C library
 functions.  If you call "exit(0)" in main(), it knows that it is safe to
 optimize that into "return 0;" because C specifies what the 'exit'
 function does.</p>
 <p>In addition to simple library knowledge, it is possible to embed a variety of
 other language-specific information into the LLVM IR.  If you have a specific
 need and run into a wall, please bring the topic up on the llvmdev list.  At the
 very worst, you can always treat LLVM as if it were a "dumb code generator" and
 implement the high-level optimizations you desire in your front-end, on the
 language-specific AST.
 </p>
 </div>
 </div>
 <!-- *********************************************************************** -->
 <h2><a name="tipsandtricks">Tips and Tricks</a></h2>
 <!-- *********************************************************************** -->
 <div>
 <p>There is a variety of useful tips and tricks that you come to know after
 working on/with LLVM that aren't obvious at first glance.  Instead of letting
 everyone rediscover them, this section talks about some of these issues.</p>
 <!-- ======================================================================= -->
 <h4><a name="offsetofsizeof">Implementing portable offsetof/sizeof</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>One interesting thing that comes up, if you are trying to keep the code 
 generated by your compiler "target independent", is that you often need to know
 the size of some LLVM type or the offset of some field in an llvm structure.
 For example, you might need to pass the size of a type into a function that
 allocates memory.</p>
 <p>Unfortunately, this can vary widely across targets: for example the width of
 a pointer is trivially target-specific.  However, there is a <a 
 href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever
 way to use the getelementptr instruction</a> that allows you to compute this
 in a portable way.</p>
 </div>
 <!-- ======================================================================= -->
 <h4><a name="gcstack">Garbage Collected Stack Frames</a></h4>
 <!-- ======================================================================= -->
 <div>
 <p>Some languages want to explicitly manage their stack frames, often so that
 they are garbage collected or to allow easy implementation of closures.  There
 are often better ways to implement these features than explicit stack frames,
 but <a 
 href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM
 does support them,</a> if you want.  It requires your front-end to convert the
 code into <a 
 href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation
 Passing Style</a> and the use of tail calls (which LLVM also supports).</p>
 </div>
 </div>
 <!-- *********************************************************************** -->
 <hr>
 <address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
 </address>
 </body>
 </html>
--- a/docs/tutorial/OCamlLangImpl8.rst
+++ b/docs/tutorial/OCamlLangImpl8.rst
@ -0,0 +1,269 @@
 ======================================================
 Kaleidoscope: Conclusion and other useful LLVM tidbits
 ======================================================
 .. contents::
   :local:
 Written by `Chris Lattner <mailto:sabre@nondot.org>`_
 Tutorial Conclusion
 ===================
 Welcome to the final chapter of the "`Implementing a language with
 LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
 grown our little Kaleidoscope language from being a useless toy, to
 being a semi-interesting (but probably still useless) toy. :)
 It is interesting to see how far we've come, and how little code it has
 taken. We built the entire lexer, parser, AST, code generator, and an
 interactive run-loop (with a JIT!) by-hand in under 700 lines of
 (non-comment/non-blank) code.
 Our little language supports a couple of interesting features: it
 supports user defined binary and unary operators, it uses JIT
 compilation for immediate evaluation, and it supports a few control flow
 constructs with SSA construction.
 Part of the idea of this tutorial was to show you how easy and fun it
 can be to define, build, and play with languages. Building a compiler
 need not be a scary or mystical process! Now that you've seen some of
 the basics, I strongly encourage you to take the code and hack on it.
 For example, try adding:
 -  **global variables** - While global variables have questional value
   in modern software engineering, they are often useful when putting
   together quick little hacks like the Kaleidoscope compiler itself.
   Fortunately, our current setup makes it very easy to add global
   variables: just have value lookup check to see if an unresolved
   variable is in the global variable symbol table before rejecting it.
   To create a new global variable, make an instance of the LLVM
   ``GlobalVariable`` class.
 -  **typed variables** - Kaleidoscope currently only supports variables
   of type double. This gives the language a very nice elegance, because
   only supporting one type means that you never have to specify types.
   Different languages have different ways of handling this. The easiest
   way is to require the user to specify types for every variable
   definition, and record the type of the variable in the symbol table
   along with its Value\*.
 -  **arrays, structs, vectors, etc** - Once you add types, you can start
   extending the type system in all sorts of interesting ways. Simple
   arrays are very easy and are quite useful for many different
   applications. Adding them is mostly an exercise in learning how the
   LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
   works: it is so nifty/unconventional, it `has its own
   FAQ <../GetElementPtr.html>`_! If you add support for recursive types
   (e.g. linked lists), make sure to read the `section in the LLVM
   Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
   describes how to construct them.
 -  **standard runtime** - Our current language allows the user to access
   arbitrary external functions, and we use it for things like "printd"
   and "putchard". As you extend the language to add higher-level
   constructs, often these constructs make the most sense if they are
   lowered to calls into a language-supplied runtime. For example, if
   you add hash tables to the language, it would probably make sense to
   add the routines to a runtime, instead of inlining them all the way.
 -  **memory management** - Currently we can only access the stack in
   Kaleidoscope. It would also be useful to be able to allocate heap
   memory, either with calls to the standard libc malloc/free interface
   or with a garbage collector. If you would like to use garbage
   collection, note that LLVM fully supports `Accurate Garbage
   Collection <../GarbageCollection.html>`_ including algorithms that
   move objects and need to scan/update the stack.
 -  **debugger support** - LLVM supports generation of `DWARF Debug
   info <../SourceLevelDebugging.html>`_ which is understood by common
   debuggers like GDB. Adding support for debug info is fairly
   straightforward. The best way to understand it is to compile some
   C/C++ code with "``llvm-gcc -g -O0``" and taking a look at what it
   produces.
 -  **exception handling support** - LLVM supports generation of `zero
   cost exceptions <../ExceptionHandling.html>`_ which interoperate with
   code compiled in other languages. You could also generate code by
   implicitly making every function return an error value and checking
   it. You could also make explicit use of setjmp/longjmp. There are
   many different ways to go here.
 -  **object orientation, generics, database access, complex numbers,
   geometric programming, ...** - Really, there is no end of crazy
   features that you can add to the language.
 -  **unusual domains** - We've been talking about applying LLVM to a
   domain that many people are interested in: building a compiler for a
   specific language. However, there are many other domains that can use
   compiler technology that are not typically considered. For example,
   LLVM has been used to implement OpenGL graphics acceleration,
   translate C++ code to ActionScript, and many other cute and clever
   things. Maybe you will be the first to JIT compile a regular
   expression interpreter into native code with LLVM?
 Have fun - try doing something crazy and unusual. Building a language
 like everyone else always has, is much less fun than trying something a
 little crazy or off the wall and seeing how it turns out. If you get
 stuck or want to talk about it, feel free to email the `llvmdev mailing
 list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots
 of people who are interested in languages and are often willing to help
 out.
 Before we end this tutorial, I want to talk about some "tips and tricks"
 for generating LLVM IR. These are some of the more subtle things that
 may not be obvious, but are very useful if you want to take advantage of
 LLVM's capabilities.
 Properties of the LLVM IR
 =========================
 We have a couple common questions about code in the LLVM IR form - lets
 just get these out of the way right now, shall we?
 Target Independence
 -------------------
 Kaleidoscope is an example of a "portable language": any program written
 in Kaleidoscope will work the same way on any target that it runs on.
 Many other languages have this property, e.g. lisp, java, haskell,
 javascript, python, etc (note that while these languages are portable,
 not all their libraries are).
 One nice aspect of LLVM is that it is often capable of preserving target
 independence in the IR: you can take the LLVM IR for a
 Kaleidoscope-compiled program and run it on any target that LLVM
 supports, even emitting C code and compiling that on targets that LLVM
 doesn't support natively. You can trivially tell that the Kaleidoscope
 compiler generates target-independent code because it never queries for
 any target-specific information when generating code.
 The fact that LLVM provides a compact, target-independent,
 representation for code gets a lot of people excited. Unfortunately,
 these people are usually thinking about C or a language from the C
 family when they are asking questions about language portability. I say
 "unfortunately", because there is really no way to make (fully general)
 C code portable, other than shipping the source code around (and of
 course, C source code is not actually portable in general either - ever
 port a really old application from 32- to 64-bits?).
 The problem with C (again, in its full generality) is that it is heavily
 laden with target specific assumptions. As one simple example, the
 preprocessor often destructively removes target-independence from the
 code when it processes the input text:
 .. code-block:: c
    #ifdef __i386__
      int X = 1;
    #else
      int X = 42;
    #endif
 While it is possible to engineer more and more complex solutions to
 problems like this, it cannot be solved in full generality in a way that
 is better than shipping the actual source code.
 That said, there are interesting subsets of C that can be made portable.
 If you are willing to fix primitive types to a fixed size (say int =
 32-bits, and long = 64-bits), don't care about ABI compatibility with
 existing binaries, and are willing to give up some other minor features,
 you can have portable code. This can make sense for specialized domains
 such as an in-kernel language.
 Safety Guarantees
 -----------------
 Many of the languages above are also "safe" languages: it is impossible
 for a program written in Java to corrupt its address space and crash the
 process (assuming the JVM has no bugs). Safety is an interesting
 property that requires a combination of language design, runtime
 support, and often operating system support.
 It is certainly possible to implement a safe language in LLVM, but LLVM
 IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
 casts, use after free bugs, buffer over-runs, and a variety of other
 problems. Safety needs to be implemented as a layer on top of LLVM and,
 conveniently, several groups have investigated this. Ask on the `llvmdev
 mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if
 you are interested in more details.
 Language-Specific Optimizations
 -------------------------------
 One thing about LLVM that turns off many people is that it does not
 solve all the world's problems in one system (sorry 'world hunger',
 someone else will have to solve you some other day). One specific
 complaint is that people perceive LLVM as being incapable of performing
 high-level language-specific optimization: LLVM "loses too much
 information".
 Unfortunately, this is really not the place to give you a full and
 unified version of "Chris Lattner's theory of compiler design". Instead,
 I'll make a few observations:
 First, you're right that LLVM does lose information. For example, as of
 this writing, there is no way to distinguish in the LLVM IR whether an
 SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
 than debug info). Both get compiled down to an 'i32' value and the
 information about what it came from is lost. The more general issue
 here, is that the LLVM type system uses "structural equivalence" instead
 of "name equivalence". Another place this surprises people is if you
 have two types in a high-level language that have the same structure
 (e.g. two different structs that have a single int field): these types
 will compile down into a single LLVM type and it will be impossible to
 tell what it came from.
 Second, while LLVM does lose information, LLVM is not a fixed target: we
 continue to enhance and improve it in many different ways. In addition
 to adding new features (LLVM did not always support exceptions or debug
 info), we also extend the IR to capture important information for
 optimization (e.g. whether an argument is sign or zero extended,
 information about pointers aliasing, etc). Many of the enhancements are
 user-driven: people want LLVM to include some specific feature, so they
 go ahead and extend it.
 Third, it is *possible and easy* to add language-specific optimizations,
 and you have a number of choices in how to do it. As one trivial
 example, it is easy to add language-specific optimization passes that
 "know" things about code compiled for a language. In the case of the C
 family, there is an optimization pass that "knows" about the standard C
 library functions. If you call "exit(0)" in main(), it knows that it is
 safe to optimize that into "return 0;" because C specifies what the
 'exit' function does.
 In addition to simple library knowledge, it is possible to embed a
 variety of other language-specific information into the LLVM IR. If you
 have a specific need and run into a wall, please bring the topic up on
 the llvmdev list. At the very worst, you can always treat LLVM as if it
 were a "dumb code generator" and implement the high-level optimizations
 you desire in your front-end, on the language-specific AST.
 Tips and Tricks
 ===============
 There is a variety of useful tips and tricks that you come to know after
 working on/with LLVM that aren't obvious at first glance. Instead of
 letting everyone rediscover them, this section talks about some of these
 issues.
 Implementing portable offsetof/sizeof
 -------------------------------------
 One interesting thing that comes up, if you are trying to keep the code
 generated by your compiler "target independent", is that you often need
 to know the size of some LLVM type or the offset of some field in an
 llvm structure. For example, you might need to pass the size of a type
 into a function that allocates memory.
 Unfortunately, this can vary widely across targets: for example the
 width of a pointer is trivially target-specific. However, there is a
 `clever way to use the getelementptr
 instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
 that allows you to compute this in a portable way.
 Garbage Collected Stack Frames
 ------------------------------
 Some languages want to explicitly manage their stack frames, often so
 that they are garbage collected or to allow easy implementation of
 closures. There are often better ways to implement these features than
 explicit stack frames, but `LLVM does support
 them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
 if you want. It requires your front-end to convert the code into
 `Continuation Passing
 Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
 the use of tail calls (which LLVM also supports).
--- a/docs/tutorial/index.rst
+++ b/docs/tutorial/index.rst
@ -1,36 +1,30 @@
 ================================
 LLVM Tutorial: Table of Contents
 ================================
-.. TODO:: Use Sphinx toctree once all of these pages are converted.
+Kaleidoscope: Implementing a Language with LLVM
 ===============================================
-#. Kaleidoscope: Implementing a Language with LLVM
+.. toctree::
   :titlesonly:
   :glob:
   :numbered:
-   #. `Tutorial Introduction and the Lexer <LangImpl1.html>`__
+   LangImpl*
   #. `Implementing a Parser and AST <LangImpl2.html>`__
   #. `Implementing Code Generation to LLVM IR <LangImpl3.html>`__
   #. `Adding JIT and Optimizer Support <LangImpl4.html>`__
   #. `Extending the language: control flow <LangImpl5.html>`__
   #. `Extending the language: user-defined operators <LangImpl6.html>`__
   #. `Extending the language: mutable variables / SSA
      construction <LangImpl7.html>`__
   #. `Conclusion and other useful LLVM tidbits <LangImpl8.html>`__
-#. Kaleidoscope: Implementing a Language with LLVM in Objective Caml
+Kaleidoscope: Implementing a Language with LLVM in Objective Caml
 =================================================================
-   #. `Tutorial Introduction and the Lexer <OCamlLangImpl1.html>`__
+.. toctree::
-   #. `Implementing a Parser and AST <OCamlLangImpl2.html>`__
+   :titlesonly:
-   #. `Implementing Code Generation to LLVM IR <OCamlLangImpl3.html>`__
+   :glob:
-   #. `Adding JIT and Optimizer Support <OCamlLangImpl4.html>`__
+   :numbered:
   #. `Extending the language: control flow <OCamlLangImpl5.html>`__
   #. `Extending the language: user-defined
      operators <OCamlLangImpl6.html>`__
   #. `Extending the language: mutable variables / SSA
      construction <OCamlLangImpl7.html>`__
   #. `Conclusion and other useful LLVM tidbits <OCamlLangImpl8.html>`__
-#. Advanced Topics
+   OCamlLangImpl*
   #. `Writing an Optimization for
      LLVM <http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html>`_
 Advanced Topics
 ===============
 #. `Writing an Optimization for LLVM <http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html>`_