Fix some documentation for the tutorial.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48966 91177308-0d34-0410-b5e6-96231b3b80d8
Erick Tryzelaar 2008-03-30 19:14:31 +00:00
parent 8dd6505868
commit d564686dff
2 changed files with 40 additions and 40 deletions


@ -219,15 +219,15 @@ type token =
</div>
<p>Each token returned by our lexer will be one of the token variant values.
An unknown character like '+' will be returned as <tt>Kwd '+'</tt>. If the
current token is an identifier, the value will be <tt>Ident s</tt>. If the
current token is a numeric literal (like 1.0), the value will be
<tt>Number 1.0</tt>.
An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>. If
the current token is an identifier, the value will be <tt>Token.Ident s</tt>. If
the current token is a numeric literal (like 1.0), the value will be
<tt>Token.Number 1.0</tt>.
</p>
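<p>As a rough illustration of the variant values described above, the sketch
below declares a token type with the three constructors named in the text. The
<tt>Def</tt> and <tt>Extern</tt> constructors and the <tt>describe</tt> helper
are assumptions added for illustration; they are not taken from this diff.</p>

```ocaml
(* Sketch only: Ident, Number and Kwd come from the text above;
   Def, Extern and describe are assumptions made for this example. *)
module Token = struct
  type token =
    | Def | Extern        (* assumed keyword tokens *)
    | Ident of string     (* identifier, e.g. Ident "x" *)
    | Number of float     (* numeric literal, e.g. Number 1.0 *)
    | Kwd of char         (* any other character, e.g. Kwd '+' *)
end

(* Hypothetical helper that renders a token as a short description. *)
let describe = function
  | Token.Def -> "def"
  | Token.Extern -> "extern"
  | Token.Ident s -> "identifier " ^ s
  | Token.Number n -> Printf.sprintf "number %g" n
  | Token.Kwd c -> Printf.sprintf "character %c" c
```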
<p>The actual implementation of the lexer is a collection of functions driven
by a function named <tt>lex</tt>. The <tt>lex</tt> function is called to
return the next token from standard input. We will use
by a function named <tt>Lexer.lex</tt>. The <tt>Lexer.lex</tt> function is
called to return the next token from standard input. We will use
<a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a>
to simplify the tokenization of the standard input. Its definition starts
as:</p>
@ -245,13 +245,13 @@ let rec lex = parser
</div>
<p>
<tt>lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
<tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
characters one at a time from the standard input. It eats them as it recognizes
them and stores them in a <tt>token</tt> variant. The first thing that it
has to do is ignore whitespace between tokens. This is accomplished with the
them and stores them in a <tt>Token.token</tt> variant. The first thing that
it has to do is ignore whitespace between tokens. This is accomplished with the
recursive call above.</p>
<p>The next thing <tt>lex</tt> needs to do is recognize identifiers and
<p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and
specific keywords like "def". Kaleidoscope does this with a pattern match
and a helper function.</p>
@ -300,8 +300,8 @@ and lex_number buffer = parser
<p>This is all pretty straightforward code for processing input. When reading
a numeric value from input, we use the OCaml <tt>float_of_string</tt> function
to convert it to a numeric value that we store in <tt>NumVal</tt>. Note that
this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
to convert it to a numeric value that we store in <tt>Token.Number</tt>. Note
that this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
if given the string "1.23.45.67". Feel free to extend it :). Next we handle
comments:
</p>
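<p>One possible way to add the error checking mentioned above is to use the
standard library's <tt>float_of_string_opt</tt>, which returns <tt>None</tt>
instead of raising <tt>Failure</tt>. The <tt>parse_number</tt> helper below is
a hypothetical name used for this sketch and is not part of the tutorial
code.</p>

```ocaml
(* Sketch only: parse_number is a hypothetical helper, not tutorial code.
   float_of_string_opt (OCaml 4.05+) returns None on malformed input
   where float_of_string would raise Failure. *)
let parse_number (s : string) : float option =
  float_of_string_opt s

(* Turn a malformed literal into a readable message instead of an exception. *)
let number_or_error (s : string) : (float, string) result =
  match parse_number s with
  | Some f -> Ok f
  | None -> Error ("invalid numeric literal: " ^ s)
```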


@ -240,13 +240,13 @@ error"</tt>, where if the token before the <tt>??</tt> does not match, then
<tt>Stream.Error "parse error"</tt> will be raised.</p>
<p>2) Another interesting aspect of this function is that it uses recursion by
calling <tt>parse_primary</tt> (we will soon see that <tt>parse_primary</tt> can
call <tt>parse_primary</tt>). This is powerful because it allows us to handle
recursive grammars, and keeps each production very simple. Note that
parentheses do not cause construction of AST nodes themselves. While we could
do it this way, the most important role of parentheses is to guide the parser
and provide grouping. Once the parser constructs the AST, parentheses are not
needed.</p>
calling <tt>Parser.parse_primary</tt> (we will soon see that
<tt>Parser.parse_primary</tt> can call <tt>Parser.parse_primary</tt>). This is
powerful because it allows us to handle recursive grammars, and keeps each
production very simple. Note that parentheses do not cause construction of AST
nodes themselves. While we could do it this way, the most important role of
parentheses is to guide the parser and provide grouping. Once the parser
constructs the AST, parentheses are not needed.</p>
<p>The next simple production is for handling variable references and function
calls:</p>
@ -345,12 +345,12 @@ let main () =
<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
(this can obviously be extended by you, our brave and intrepid reader). The
<tt>precedence</tt> function returns the precedence for the current token,
or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt> makes
it easy to add new operators and makes it clear that the algorithm doesn't
<tt>Parser.precedence</tt> function returns the precedence for the current
token, or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt>
makes it easy to add new operators and makes it clear that the algorithm doesn't
depend on the specific operators involved, but it would be easy enough to
eliminate the <tt>Hashtbl.t</tt> and do the comparisons in the
<tt>precedence</tt> function. (Or just use a fixed-size array).</p>
<tt>Parser.precedence</tt> function. (Or just use a fixed-size array).</p>
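<p>A minimal sketch of such a table-driven precedence lookup might look like
the following. Only the value 20 for '+' is stated in this text; the other
three operators and their precedences are assumptions chosen for
illustration.</p>

```ocaml
(* Sketch only: the table name and all entries except '+' = 20 are
   assumptions made for this example. *)
let binop_precedence : (char, int) Hashtbl.t = Hashtbl.create 10

let () =
  List.iter
    (fun (op, prec) -> Hashtbl.add binop_precedence op prec)
    [ ('<', 10); ('+', 20); ('-', 20); ('*', 40) ]

(* Return the precedence of a binary operator, or -1 if the character
   is not a known binary operator. *)
let precedence (c : char) : int =
  try Hashtbl.find binop_precedence c with Not_found -> -1
```

<p>Adding an operator is then a single <tt>Hashtbl.add</tt> call; the lookup
function itself does not change.</p>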
<p>With the helper above defined, we can now start parsing binary expressions.
The basic idea of operator precedence parsing is to break down an expression
@ -376,19 +376,19 @@ and parse_expr = parser
</pre>
</div>
<p><tt>parse_bin_rhs</tt> is the function that parses the sequence of pairs for
us. It takes a precedence and the expression for the part that has been
parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
allowed to be empty, in which case it returns the expression that is passed into
it. In our example above, the code passes the expression for "a" into
<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
<p><tt>Parser.parse_bin_rhs</tt> is the function that parses the sequence of
pairs for us. It takes a precedence and the expression for the part
that has been parsed so far. Note that "x" is a perfectly valid expression: As
such, "binoprhs" is allowed to be empty, in which case it returns the expression
that is passed into it. In our example above, the code passes the expression for
"a" into <tt>Parser.parse_bin_rhs</tt> and the current token is "+".</p>
<p>The precedence value passed into <tt>parse_bin_rhs</tt> indicates the <em>
minimal operator precedence</em> that the function is allowed to eat. For
example, if the current pair stream is [+, x] and <tt>parse_bin_rhs</tt> is
passed in a precedence of 40, it will not consume any tokens (because the
precedence of '+' is only 20). With this in mind, <tt>parse_bin_rhs</tt> starts
with:</p>
<p>The precedence value passed into <tt>Parser.parse_bin_rhs</tt> indicates the
<em>minimal operator precedence</em> that the function is allowed to eat. For
example, if the current pair stream is [+, x] and <tt>Parser.parse_bin_rhs</tt>
is passed in a precedence of 40, it will not consume any tokens (because the
precedence of '+' is only 20). With this in mind, <tt>Parser.parse_bin_rhs</tt>
starts with:</p>
<div class="doc_code">
<pre>
@ -497,10 +497,10 @@ context):</p>
has higher precedence than the binop we are currently parsing. As such, we know
that any sequence of pairs whose operators are all higher precedence than "+"
should be parsed together and returned as "RHS". To do this, we recursively
invoke the <tt>parse_bin_rhs</tt> function specifying "token_prec+1" as the
minimum precedence required for it to continue. In our example above, this will
cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the
RHS of the '+' expression.</p>
invoke the <tt>Parser.parse_bin_rhs</tt> function specifying "token_prec+1" as
the minimum precedence required for it to continue. In our example above, this
will cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set
as the RHS of the '+' expression.</p>
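<p>The recursion described above can be sketched without Camlp4 by working on
a plain list of tokens. This is a simplified illustration, not the tutorial's
implementation: the types, the list-based interface, and the precedence values
other than 20 for '+' are assumptions, and primaries are limited to
identifiers.</p>

```ocaml
(* Sketch only: a list-based variant of operator-precedence parsing.
   All names and types here are assumptions made for this example. *)
type token = Ident of string | Kwd of char

type expr =
  | Variable of string
  | Binary of char * expr * expr

let precedence = function
  | '<' -> 10
  | '+' | '-' -> 20
  | '*' -> 40
  | _ -> -1

(* Primary expressions are only identifiers in this sketch. *)
let parse_primary = function
  | Ident s :: rest -> (Variable s, rest)
  | _ -> failwith "expected an identifier"

(* Consume [binop, primary] pairs whose operator precedence is at least
   expr_prec; return the expression built so far and the unread tokens. *)
let rec parse_bin_rhs expr_prec lhs tokens =
  match tokens with
  | Kwd op :: rest when precedence op >= expr_prec ->
      let token_prec = precedence op in
      let rhs, rest = parse_primary rest in
      let rhs, rest =
        match rest with
        | Kwd next :: _ when token_prec < precedence next ->
            (* The next operator binds tighter: let it take rhs first. *)
            parse_bin_rhs (token_prec + 1) rhs rest
        | _ -> (rhs, rest)
      in
      parse_bin_rhs expr_prec (Binary (op, lhs, rhs)) rest
  | _ -> (lhs, tokens)

let parse_expr tokens =
  let lhs, rest = parse_primary tokens in
  parse_bin_rhs 0 lhs rest
```

<p>With these assumed precedences, "a+b*c" groups the multiplication under the
addition, and a call with a minimum precedence of 40 on the pair stream [+, x]
returns its left-hand side without consuming any tokens.</p>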
<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
and added to the AST. With this little bit of code (14 non-trivial lines), we
@ -705,7 +705,7 @@ course.) To build this, just compile with:</p>
# Compile
ocamlbuild toy.byte
# Run
./toy
./toy.byte
</pre>
</div>