Kaleidoscope: Extending the Language: Operator Overloading

Welcome to Part 6 of the "Implementing a language with LLVM" tutorial. At this point in our tutorial, we now have a fully functional language that is fairly minimal, but also useful. One big problem with it though is that it doesn't have many useful operators (like division, logical negation, or even any comparisons other than less-than.

This chapter of the tutorial takes a wild digression into adding operator overloading to the simple and beautiful Kaleidoscope language, giving us a simple and ugly language in some ways, but also a powerful one at the same time. One of the great things about creating your own language is that you get to decide what is good or bad. In this tutorial we'll assume that it is okay and use this as a way to show some interesting parsing techniques.

The operator overloading that we will add to Kaleidoscope is more general than languages like C++. In C++, you are only allowed to redefine existing operators: you can't programatically change the grammar, introduce new operators, change precedence levels, etc. In this chapter, we will add this capability to Kaleidoscope, which will allow us to round out the set of operators that are supported, culminating in a more interesting example app.

The point of going into operator overloading in a tutorial like this is to show the power and flexibility of using a hand-written parser. The parser we are using so far is using recursive descent for most parts of the grammar, and operator precedence parsing for the expressions. See Chapter 2 for details. Without using operator precedence parsing, it would be very difficult to allow the programmer to introduce new operators into the grammar: the grammar is dynamically extensible as the JIT runs.

The two specific features we'll add are programmable unary operators (right now, Kaleidoscope has no unary operators at all) as well as binary operators. An example of this is:

# Logical unary not.
def unary!(v)
  if v then
    0
  else
    1;

# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
  !(LHS < RHS);     # alternatively, could just use "RHS < LHS"

# Binary "logical or", (note that it does not "short circuit")
def binary| 5 (LHS RHS)
  if LHS then
    1
  else if RHS then
    1
  else
    0;

# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS)
  !(LHS < RHS | LHS > RHS);

Many languages aspire to being able to implement their standard runtime library in the language itself. In Kaleidoscope, we can implement significant parts of the language in the library!

We will break down implementation of these features into two parts: implementing support for overloading of binary operators and adding unary operators.

Adding support for overloaded binary operators is pretty simple with our current framework. We'll first add support for the unary/binary keywords:

enum Token {
  ...
  // operators
  tok_binary = -11, tok_unary = -12
};
...
static int gettok() {
...
    if (IdentifierStr == "for") return tok_for;
    if (IdentifierStr == "in") return tok_in;
    if (IdentifierStr == "binary") return tok_binary;
    if (IdentifierStr == "unary") return tok_unary;
    return tok_identifier;

This just adds lexer support for the unary and binary keywords, like we did in previous chapters. One nice thing about our current AST is that we represent binary operators fully generally with their ASCII code as the opcode. For our extended operators, we'll use the same representation, so we don't need any new AST or parser support.

On the other hand, we have to be able to represent the definitions of these new operators, in the "def binary| 5" part of the function definition. In the grammar so far, the "name" for the function definition is parsed as the "prototype" production and into the PrototypeAST AST node. To represent our new user-defined operators as prototypes, we have to extend the PrototypeAST AST node like this:

/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
  std::string Name;
  std::vector<std::string> Args;
  bool isOperator;
  unsigned Precedence;  // Precedence if a binary op.
public:
  PrototypeAST(const std::string &name, const std::vector<std::string> &args,
               bool isoperator = false, unsigned prec = 0)
  : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
  
  bool isUnaryOp() const { return isOperator && Args.size() == 1; }
  bool isBinaryOp() const { return isOperator && Args.size() == 2; }
  
  char getOperatorName() const {
    assert(isUnaryOp() || isBinaryOp());
    return Name[Name.size()-1];
  }
  
  unsigned getBinaryPrecedence() const { return Precedence; }
  
  Function *Codegen();
};

Basically, in addition to knowing a name for the prototype, we now keep track of whether it was an operator, and if it was, what precedence level the operator is at. The precedence is only used for binary operators.

...

Here is the complete code listing for our running example, enhanced with the if/then/else and for expressions.. To build this example, use:

   # Compile
   g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
   # Run
   ./toy

Here is the code: