mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2026-04-23 22:23:00 +00:00
Various updates from Sam Bishop:

"I have been working my way through the JIT and Kaleidoscope tutorials in my (minuscule) spare time. Thanks again for writing them! I have attached a patch containing some minor changes, ranging from spelling and grammar fixes to adding a "Next: <next tutorial section>" hyperlink to the bottom of each page. Every page has been given the "next link" treatment, but otherwise I'm only half way through the Kaleidoscope tutorial. I will send a follow-on patch if time permits."

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46933 91177308-0d34-0410-b5e6-96231b3b80d8
@@ -98,7 +98,7 @@ know what the stored numeric value is.</p>
 <p>Right now we only create the AST, so there are no useful accessor methods on
 them. It would be very easy to add a virtual method to pretty print the code,
 for example. Here are the other expression AST node definitions that we'll use
-in the basic form of the Kaleidoscope language.
+in the basic form of the Kaleidoscope language:
 </p>
 
 <div class="doc_code">
@@ -130,7 +130,7 @@ public:
 </pre>
 </div>
 
-<p>This is all (intentially) rather straight-forward: variables capture the
+<p>This is all (intentionally) rather straight-forward: variables capture the
 variable name, binary operators capture their opcode (e.g. '+'), and calls
 capture a function name as well as a list of any argument expressions. One thing
 that is nice about our AST is that it captures the language features without
@@ -201,7 +201,7 @@ calls like this:</p>
 <div class="doc_code">
 <pre>
 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser it looking at. getNextToken reads another token from the
+/// token the parser is looking at. getNextToken reads another token from the
 /// lexer and updates CurTok with its results.
 static int CurTok;
 static int getNextToken() {
@@ -263,11 +263,11 @@ static ExprAST *ParseNumberExpr() {
 
 <p>This routine is very simple: it expects to be called when the current token
 is a <tt>tok_number</tt> token. It takes the current number value, creates
-a <tt>NumberExprAST</tt> node, advances the lexer to the next token and finally
+a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally
 returns.</p>
 
 <p>There are some interesting aspects to this. The most important one is that
-this routine eats all of the tokens that correspond to the production, and
+this routine eats all of the tokens that correspond to the production and
 returns the lexer buffer with the next token (which is not part of the grammar
 production) ready to go. This is a fairly standard way to go for recursive
 descent parsers. For a better example, the parenthesis operator is defined like
@@ -293,7 +293,7 @@ static ExprAST *ParseParenExpr() {
 parser:</p>
 
 <p>
-1) it shows how we use the Error routines. When called, this function expects
+1) It shows how we use the Error routines. When called, this function expects
 that the current token is a '(' token, but after parsing the subexpression, it
 is possible that there is no ')' waiting. For example, if the user types in
 "(4 x" instead of "(4)", the parser should emit an error. Because errors can
@@ -305,8 +305,8 @@ calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt>
 <tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
 recursive grammars, and keeps each production very simple. Note that
 parentheses do not cause construction of AST nodes themselves. While we could
-do it this way, the most important role of parens are to guide the parser and
-provide grouping. Once the parser constructs the AST, parens are not
+do it this way, the most important role of parentheses are to guide the parser
+and provide grouping. Once the parser constructs the AST, parentheses are not
 needed.</p>
 
 <p>The next simple production is for handling variable references and function
@@ -350,21 +350,21 @@ static ExprAST *ParseIdentifierExpr() {
 </pre>
 </div>
 
-<p>This routine follows the same style as the other routines (it expects to be
+<p>This routine follows the same style as the other routines. (It expects to be
 called if the current token is a <tt>tok_identifier</tt> token). It also has
 recursion and error handling. One interesting aspect of this is that it uses
 <em>look-ahead</em> to determine if the current identifier is a stand alone
 variable reference or if it is a function call expression. It handles this by
-checking to see if the token after the identifier is a '(' token, and constructs
+checking to see if the token after the identifier is a '(' token, constructing
 either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
 </p>
 
-<p>Now that we have all of our simple expression parsing logic in place, we can
-define a helper function to wrap it together into one entry-point. We call this
+<p>Now that we have all of our simple expression-parsing logic in place, we can
+define a helper function to wrap it together into one entry point. We call this
 class of expressions "primary" expressions, for reasons that will become more
 clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
 parse an arbitrary primary expression, we need to determine what sort of
-specific expression it is:</p>
+expression it is:</p>
 
 <div class="doc_code">
 <pre>
@@ -383,13 +383,13 @@ static ExprAST *ParsePrimary() {
 </pre>
 </div>
 
-<p>Now that you see the definition of this function, it makes it more obvious
-why we can assume the state of CurTok in the various functions. This uses
-look-ahead to determine which sort of expression is being inspected, and parses
-it with a function call.</p>
+<p>Now that you see the definition of this function, it is more obvious why we
+can assume the state of CurTok in the various functions. This uses look-ahead
+to determine which sort of expression is being inspected, and then parses it
+with a function call.</p>
 
-<p>Now that basic expressions are handled, we need to handle binary expressions,
-which are a bit more complex.</p>
+<p>Now that basic expressions are handled, we need to handle binary expressions.
+They are a bit more complex.</p>
 
 </div>
 
@@ -447,12 +447,12 @@ int main() {
 or -1 if the token is not a binary operator. Having a map makes it easy to add
 new operators and makes it clear that the algorithm doesn't depend on the
 specific operators involved, but it would be easy enough to eliminate the map
-and do the comparisons in the <tt>GetTokPrecedence</tt> function (or just use
+and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use
 a fixed-size array).</p>
 
 <p>With the helper above defined, we can now start parsing binary expressions.
 The basic idea of operator precedence parsing is to break down an expression
-with potentially ambiguous binary operators into pieces. Consider for example
+with potentially ambiguous binary operators into pieces. Consider ,for example,
 the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this
 as a stream of primary expressions separated by binary operators. As such,
 it will first parse the leading primary expression "a", then it will see the
@@ -708,7 +708,7 @@ static FunctionAST *ParseTopLevelExpr() {
 </pre>
 </div>
 
-<p>Now that we have all the pieces, lets build a little driver that will let us
+<p>Now that we have all the pieces, let's build a little driver that will let us
 actually <em>execute</em> this code we've built!</p>
 
 </div>
@@ -732,7 +732,7 @@ static void MainLoop() {
 fprintf(stderr, "ready> ");
 switch (CurTok) {
 case tok_eof: return;
-case ';': getNextToken(); break; // ignore top level semicolons.
+case ';': getNextToken(); break; // ignore top-level semicolons.
 case tok_def: HandleDefinition(); break;
 case tok_extern: HandleExtern(); break;
 default: HandleTopLevelExpression(); break;
@@ -742,13 +742,13 @@ static void MainLoop() {
 </pre>
 </div>
 
-<p>The most interesting part of this is that we ignore top-level semi colons.
+<p>The most interesting part of this is that we ignore top-level semicolons.
 Why is this, you ask? The basic reason is that if you type "4 + 5" at the
 command line, the parser doesn't know whether that is the end of what you will type
 or not. For example, on the next line you could type "def foo..." in which case
 4+5 is the end of a top-level expression. Alternatively you could type "* 6",
 which would continue the expression. Having top-level semicolons allows you to
-type "4+5;" and the parser will know you are done.</p>
+type "4+5;", and the parser will know you are done.</p>
 
 </div>
 
@@ -760,8 +760,8 @@ type "4+5;" and the parser will know you are done.</p>
 
 <p>With just under 400 lines of commented code (240 lines of non-comment,
 non-blank code), we fully defined our minimal language, including a lexer,
-parser and AST builder. With this done, the executable will validate
-Kaleidoscope code and tell us if it is gramatically invalid. For
+parser, and AST builder. With this done, the executable will validate
+Kaleidoscope code and tell us if it is grammatically invalid. For
 example, here is a sample interaction:</p>
 
 <div class="doc_code">
@@ -798,8 +798,8 @@ Representation (IR) from the AST.</p>
 <p>
 Here is the complete code listing for this and the previous chapter.
 Note that it is fully self-contained: you don't need LLVM or any external
-libraries at all for this (other than the C and C++ standard libraries of
-course). To build this, just compile with:</p>
+libraries at all for this. (Besides the C and C++ standard libraries, of
+course.) To build this, just compile with:</p>
 
 <div class="doc_code">
 <pre>
@@ -955,7 +955,7 @@ public:
 //===----------------------------------------------------------------------===//
 
 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser it looking at. getNextToken reads another token from the
+/// token the parser is looking at. getNextToken reads another token from the
 /// lexer and updates CurTok with its results.
 static int CurTok;
 static int getNextToken() {
@@ -1167,7 +1167,7 @@ static void HandleExtern() {
 }
 
 static void HandleTopLevelExpression() {
-// Evaluate a top level expression into an anonymous function.
+// Evaluate a top-level expression into an anonymous function.
 if (FunctionAST *F = ParseTopLevelExpr()) {
 fprintf(stderr, "Parsed a top-level expr\n");
 } else {
@@ -1182,7 +1182,7 @@ static void MainLoop() {
 fprintf(stderr, "ready> ");
 switch (CurTok) {
 case tok_eof: return;
-case ';': getNextToken(); break; // ignore top level semicolons.
+case ';': getNextToken(); break; // ignore top-level semicolons.
 case tok_def: HandleDefinition(); break;
 case tok_extern: HandleExtern(); break;
 default: HandleTopLevelExpression(); break;
@@ -1211,6 +1211,7 @@ int main() {
 }
 </pre>
 </div>
+<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a>
 </div>
 
 <!-- *********************************************************************** -->