mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2024-12-13 04:30:23 +00:00
Fix some documentation for the tutorial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48966 91177308-0d34-0410-b5e6-96231b3b80d8
parent 8dd6505868
commit d564686dff
@@ -219,15 +219,15 @@ type token =
 </div>
 
 <p>Each token returned by our lexer will be one of the token variant values.
-An unknown character like '+' will be returned as <tt>Kwd '+'</tt>. If the
-current token is an identifier, the value will be <tt>Ident s</tt>. If the
-current token is a numeric literal (like 1.0), the value will be
-<tt>Number 1.0</tt>.
+An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>. If
+the current token is an identifier, the value will be <tt>Token.Ident s</tt>. If
+the current token is a numeric literal (like 1.0), the value will be
+<tt>Token.Number 1.0</tt>.
 </p>
 
 <p>The actual implementation of the lexer is a collection of functions driven
-by a function named <tt>lex</tt>. The <tt>lex</tt> function is called to
-return the next token from standard input. We will use
+by a function named <tt>Lexer.lex</tt>. The <tt>Lexer.lex</tt> function is
+called to return the next token from standard input. We will use
 <a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a>
 to simplify the tokenization of the standard input. Its definition starts
 as:</p>
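To illustrate the token variants this hunk describes, here is a minimal sketch in plain OCaml. It assumes the tutorial's token type has the five constructors shown; the inline Token module and the classify helper are hypothetical stand-ins, not the tutorial's code. The tutorial's Lexer.lex reads a char Stream.t through Camlp4, whereas this sketch classifies one already-separated lexeme so that it runs without Camlp4.

```ocaml
(* Assumed shape of the tutorial's token type, wrapped in a local module
   so the qualified names (Token.Kwd, Token.Ident, ...) can be used. *)
module Token = struct
  type token =
    | Def
    | Extern
    | Ident of string
    | Number of float
    | Kwd of char
end

let is_alpha = function 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false
let is_digit_or_dot = function '0' .. '9' | '.' -> true | _ -> false

(* Hypothetical helper: map a single, already-split lexeme to a token. *)
let classify (lexeme : string) : Token.token =
  match lexeme with
  | "def" -> Token.Def
  | "extern" -> Token.Extern
  | s when s <> "" && is_alpha s.[0] -> Token.Ident s
  | s when s <> "" && is_digit_or_dot s.[0] -> Token.Number (float_of_string s)
  | s when String.length s = 1 -> Token.Kwd s.[0]
  | _ -> failwith "classify: expected exactly one lexeme"
```

With these definitions, classify "+" yields Token.Kwd '+', classify "foo" yields Token.Ident "foo", and classify "1.0" yields Token.Number 1.0, matching the three cases named in the paragraph.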
@@ -245,13 +245,13 @@ let rec lex = parser
 </div>
 
 <p>
-<tt>lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
+<tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
 characters one at a time from the standard input. It eats them as it recognizes
-them and stores them in a <tt>token</tt> variant. The first thing that it
-has to do is ignore whitespace between tokens. This is accomplished with the
+them and stores them in a <tt>Token.token</tt> variant. The first thing that
+it has to do is ignore whitespace between tokens. This is accomplished with the
 recursive call above.</p>
 
-<p>The next thing <tt>lex</tt> needs to do is recognize identifiers and
+<p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and
 specific keywords like "def". Kaleidoscope does this with a pattern match
 and a helper function.</p>
 
@@ -300,8 +300,8 @@ and lex_number buffer = parser
 
 <p>This is all pretty straight-forward code for processing input. When reading
 a numeric value from input, we use the OCaml <tt>float_of_string</tt> function
-to convert it to a numeric value that we store in <tt>NumVal</tt>. Note that
-this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
+to convert it to a numeric value that we store in <tt>Token.Number</tt>. Note
+that this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
 if given the string "1.23.45.67". Feel free to extend it :). Next we handle
 comments:
 </p>
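On the error-checking caveat in the hunk above: the standard library's float_of_string raises Failure on malformed input such as "1.23.45.67", while float_of_string_opt (available since OCaml 4.05) returns None instead. The sketch below shows both behaviours. The helper names number_of_lexeme and raises_failure are hypothetical and are not part of the tutorial's lexer.

```ocaml
(* Hypothetical wrapper: convert a lexeme to a float without raising.
   Returns None when the text is not a valid float literal. *)
let number_of_lexeme (s : string) : float option = float_of_string_opt s

(* Hypothetical probe: true when float_of_string rejects the input. *)
let raises_failure (s : string) : bool =
  try
    ignore (float_of_string s);
    false
  with Failure _ -> true
```

A lexer extended along these lines could match on the option and report a lexing error for None, rather than letting the Failure exception escape.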
@@ -240,13 +240,13 @@ error"</tt>, where if the token before the <tt>??</tt> does not match, then
 <tt>Stream.Error "parse error"</tt> will be raised.</p>
 
 <p>2) Another interesting aspect of this function is that it uses recursion by
-calling <tt>parse_primary</tt> (we will soon see that <tt>parse_primary</tt> can
-call <tt>parse_primary</tt>). This is powerful because it allows us to handle
-recursive grammars, and keeps each production very simple. Note that
-parentheses do not cause construction of AST nodes themselves. While we could
-do it this way, the most important role of parentheses is to guide the parser
-and provide grouping. Once the parser constructs the AST, parentheses are not
-needed.</p>
+calling <tt>Parser.parse_primary</tt> (we will soon see that
+<tt>Parser.parse_primary</tt> can call <tt>Parser.parse_primary</tt>). This is
+powerful because it allows us to handle recursive grammars, and keeps each
+production very simple. Note that parentheses do not cause construction of AST
+nodes themselves. While we could do it this way, the most important role of
+parentheses is to guide the parser and provide grouping. Once the parser
+constructs the AST, parentheses are not needed.</p>
 
 <p>The next simple production is for handling variable references and function
 calls:</p>
@@ -345,12 +345,12 @@ let main () =
 
 <p>For the basic form of Kaleidoscope, we will only support 4 binary operators
 (this can obviously be extended by you, our brave and intrepid reader). The
-<tt>precedence</tt> function returns the precedence for the current token,
-or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt> makes
-it easy to add new operators and makes it clear that the algorithm doesn't
+<tt>Parser.precedence</tt> function returns the precedence for the current
+token, or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt>
+makes it easy to add new operators and makes it clear that the algorithm doesn't
 depend on the specific operators involved, but it would be easy enough to
 eliminate the <tt>Hashtbl.t</tt> and do the comparisons in the
-<tt>precedence</tt> function. (Or just use a fixed-size array).</p>
+<tt>Parser.precedence</tt> function. (Or just use a fixed-size array).</p>
 
 <p>With the helper above defined, we can now start parsing binary expressions.
 The basic idea of operator precedence parsing is to break down an expression
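The precedence lookup described in this hunk can be sketched as follows. This is a rough illustration, not the tutorial's exact code: the table name binop_precedence and the operator set are assumptions, and of the numeric values only '+' = 20 is stated in the text shown here (the others are chosen for illustration). The second function shows the table-free alternative the paragraph mentions.

```ocaml
(* Assumed name for the operator table: maps an operator character
   to its precedence. *)
let binop_precedence : (char, int) Hashtbl.t = Hashtbl.create 10

(* Return the precedence of [c], or -1 if [c] is not a binary operator. *)
let precedence (c : char) : int =
  try Hashtbl.find binop_precedence c with Not_found -> -1

(* Install four binary operators; values other than '+' are assumptions. *)
let () =
  Hashtbl.add binop_precedence '<' 10;
  Hashtbl.add binop_precedence '+' 20;
  Hashtbl.add binop_precedence '-' 20;
  Hashtbl.add binop_precedence '*' 40

(* Table-free alternative: do the comparisons directly in the function. *)
let precedence_by_match (c : char) : int =
  match c with
  | '<' -> 10
  | '+' | '-' -> 20
  | '*' -> 40
  | _ -> -1
```

The Hashtbl version lets new operators be registered at run time with one more Hashtbl.add call; the match version fixes the operator set at compile time.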
@@ -376,19 +376,19 @@ and parse_expr = parser
 </pre>
 </div>
 
-<p><tt>parse_bin_rhs</tt> is the function that parses the sequence of pairs for
-us. It takes a precedence and a pointer to an expression for the part that has been
-parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
-allowed to be empty, in which case it returns the expression that is passed into
-it. In our example above, the code passes the expression for "a" into
-<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
+<p><tt>Parser.parse_bin_rhs</tt> is the function that parses the sequence of
+pairs for us. It takes a precedence and a pointer to an expression for the part
+that has been parsed so far. Note that "x" is a perfectly valid expression: As
+such, "binoprhs" is allowed to be empty, in which case it returns the expression
+that is passed into it. In our example above, the code passes the expression for
+"a" into <tt>Parser.parse_bin_rhs</tt> and the current token is "+".</p>
 
-<p>The precedence value passed into <tt>parse_bin_rhs</tt> indicates the <em>
-minimal operator precedence</em> that the function is allowed to eat. For
-example, if the current pair stream is [+, x] and <tt>parse_bin_rhs</tt> is
-passed in a precedence of 40, it will not consume any tokens (because the
-precedence of '+' is only 20). With this in mind, <tt>parse_bin_rhs</tt> starts
-with:</p>
+<p>The precedence value passed into <tt>Parser.parse_bin_rhs</tt> indicates the
+<em>minimal operator precedence</em> that the function is allowed to eat. For
+example, if the current pair stream is [+, x] and <tt>Parser.parse_bin_rhs</tt>
+is passed in a precedence of 40, it will not consume any tokens (because the
+precedence of '+' is only 20). With this in mind, <tt>Parser.parse_bin_rhs</tt>
+starts with:</p>
 
 <div class="doc_code">
 <pre>
@@ -497,10 +497,10 @@ context):</p>
 has higher precedence than the binop we are currently parsing. As such, we know
 that any sequence of pairs whose operators are all higher precedence than "+"
 should be parsed together and returned as "RHS". To do this, we recursively
-invoke the <tt>parse_bin_rhs</tt> function specifying "token_prec+1" as the
-minimum precedence required for it to continue. In our example above, this will
-cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the
-RHS of the '+' expression.</p>
+invoke the <tt>Parser.parse_bin_rhs</tt> function specifying "token_prec+1" as
+the minimum precedence required for it to continue. In our example above, this
+will cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set
+as the RHS of the '+' expression.</p>
 
 <p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
 and added to the AST. With this little bit of code (14 non-trivial lines), we
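The recursion on "token_prec+1" described in this hunk can be sketched without Camlp4 by parsing a plain token list instead of a stream. Everything here is a simplified assumption made for illustration: the expr and tok types, the fixed precedence table, and a parse_primary that accepts only identifiers. It is not the tutorial's Parser module, and it returns the remaining tokens explicitly rather than consuming a stream.

```ocaml
type expr =
  | Variable of string
  | Binary of char * expr * expr

type tok = Ident of string | Kwd of char

(* Assumed precedences; -1 marks "not a binary operator". *)
let precedence = function
  | '<' -> 10
  | '+' | '-' -> 20
  | '*' -> 40
  | _ -> -1

(* Simplified primary: identifiers only. Returns the node and the rest. *)
let parse_primary = function
  | Ident s :: rest -> (Variable s, rest)
  | _ -> failwith "expected an identifier"

(* Consume [operator, primary] pairs whose operator binds at least as
   tightly as [expr_prec]; return the tree built so far and the
   unconsumed tokens. *)
let rec parse_bin_rhs expr_prec lhs tokens =
  match tokens with
  | Kwd c :: rest when precedence c >= 0 && precedence c >= expr_prec ->
      let token_prec = precedence c in
      let rhs, rest = parse_primary rest in
      (* If the next operator binds tighter, let it take rhs first. *)
      let rhs, rest =
        match rest with
        | Kwd c2 :: _ when token_prec < precedence c2 ->
            parse_bin_rhs (token_prec + 1) rhs rest
        | _ -> (rhs, rest)
      in
      parse_bin_rhs expr_prec (Binary (c, lhs, rhs)) rest
  | _ -> (lhs, tokens)

let parse_expr tokens =
  let lhs, rest = parse_primary tokens in
  parse_bin_rhs 0 lhs rest
```

Under these assumptions, parsing a + b * c groups the multiplication under the right-hand side of '+', and calling parse_bin_rhs with a minimum precedence of 40 on the pair stream [+, x] returns the left-hand side unchanged without consuming any tokens.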
@@ -705,7 +705,7 @@ course.) To build this, just compile with:</p>
 # Compile
 ocamlbuild toy.byte
 # Run
-./toy
+./toy.byte
 </pre>
 </div>
 