======================================
Kaleidoscope: Adding Debug Information
======================================

.. contents::
   :local:

Chapter 8 Introduction
======================

Welcome to Chapter 8 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a
decent little programming language with functions and variables.
But what happens if something goes wrong? How do you debug your
program?

Source-level debugging uses formatted data that helps a debugger
translate from the binary and the state of the machine back to the
source that the programmer wrote. In LLVM we generally use a format
called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
that represents types, source locations, and variable locations.

The short summary of this chapter is that we'll go through the
various things you have to add to a programming language to
support debug info, and how you translate that into DWARF.

Caveat: For now we can't debug via the JIT, so we'll need to compile
our program down to something small and standalone. As part of this
we'll make a few modifications to how the language runs and how
programs are compiled. This means that we'll have a source file
with a simple program written in Kaleidoscope rather than the
interactive JIT. To reduce the number of changes necessary, we also
accept the limitation of only one "top level" command at a time.

Here's the sample program we'll be compiling:

.. code-block:: python

   def fib(x)
     if x < 3 then
       1
     else
       fib(x-1)+fib(x-2);

   fib(10)


Why is this a hard problem?
===========================

Debug information is a hard problem for a few different reasons, mostly
centered around optimized code. First, optimization makes keeping source
locations more difficult. In LLVM IR we keep the original source location
for each IR-level instruction on the instruction itself. Optimization passes
should keep the source locations for newly created instructions, but merged
instructions only get to keep a single location, which can cause jumping
around when stepping through optimized programs. Second, optimization
can leave variables optimized out entirely, sharing memory with other
variables, or otherwise difficult to track. For the purposes of this
tutorial we're going to avoid optimization (as you'll see with one of the
next sets of patches).

Ahead-of-Time Compilation Mode
==============================

To highlight only the aspects of adding debug information to a source
language, without needing to worry about the complexities of JIT debugging,
we're going to make a few changes to Kaleidoscope to support compiling
the IR emitted by the front end into a simple standalone program that
you can execute, debug, and see results from.

First we make the anonymous function that contains our top level
statement be our "main":

.. code-block:: udiff

  -    PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
  +    PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>());

just with the simple change of giving it a name.

Then we're going to remove the code that prints the interactive
command-line prompt wherever it exists:

.. code-block:: udiff

  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
   /// top ::= definition | external | expression | ';'
   static void MainLoop() {
     while (1) {
  -    fprintf(stderr, "ready> ");
       switch (CurTok) {
       case tok_eof:
         return;
  @@ -1184,7 +1183,6 @@ int main() {
     BinopPrecedence['*'] = 40; // highest.

     // Prime the first token.
  -  fprintf(stderr, "ready> ");
     getNextToken();

Lastly we're going to disable all of the optimization passes and the JIT so
that the only thing that happens after we're done parsing and generating
code is that the LLVM IR goes to standard error:

.. code-block:: udiff

  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
   static void HandleTopLevelExpression() {
     // Evaluate a top-level expression into an anonymous function.
     if (FunctionAST *F = ParseTopLevelExpr()) {
  -    if (Function *LF = F->Codegen()) {
  -      // We're just doing this to make sure it executes.
  -      TheExecutionEngine->finalizeObject();
  -      // JIT the function, returning a function pointer.
  -      void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
  -
  -      // Cast it to the right type (takes no arguments, returns a double) so we
  -      // can call it as a native function.
  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
  -      // Ignore the return value for this.
  -      (void)FP;
  +    if (!F->Codegen()) {
  +      fprintf(stderr, "Error generating code for top level expr");
       }
     } else {
       // Skip token for error recovery.
  @@ -1439,11 +1459,11 @@ int main() {
     // target lays out data structures.
     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
     OurFPM.add(new DataLayoutPass());
  +#if 0
     OurFPM.add(createBasicAliasAnalysisPass());
     // Promote allocas to registers.
     OurFPM.add(createPromoteMemoryToRegisterPass());
  @@ -1218,7 +1210,7 @@ int main() {
     OurFPM.add(createGVNPass());
     // Simplify the control flow graph (deleting unreachable blocks, etc).
     OurFPM.add(createCFGSimplificationPass());
  -
  +  #endif
     OurFPM.doInitialization();

     // Set the global so the code gen can use this.

This relatively small set of changes gets us to the point that we can compile
our piece of Kaleidoscope language down to an executable program via this
command line (``|&`` pipes standard error as well, which is where the IR is
written):

.. code-block:: bash

  Kaleidoscope-Ch8 < fib.ks |& clang -x ir -

which gives an a.out/a.exe in the current working directory.

Compile Unit
============

The top level container for a section of code in DWARF is a compile unit.
This contains the type and function data for an individual translation unit
(read: one file of source code). So the first thing we need to do is
construct one for our fib.ks file.

DWARF Emission Setup
====================

Similar to the ``IRBuilder`` class we have a
`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
that helps in constructing debug metadata for an LLVM IR file. It
corresponds 1:1 to the debug metadata in the same way that ``IRBuilder``
corresponds to LLVM IR, but with nicer names.
Using it does require that you be more familiar with DWARF terminology than
you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
read through the general documentation on the
`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
should be a little more clear. We'll be using this class to construct all
of our IR-level descriptions. Its constructor takes a module, so we
need to construct it shortly after we construct our module. We've left it
as a global static variable to make it a bit easier to use.

Next we're going to create a small container to cache some of our frequent
data. The first will be our compile unit, but we'll also write a bit of
code for our one type, since we won't have to worry about multiple typed
expressions:

.. code-block:: c++

  static DIBuilder *DBuilder;

  struct DebugInfo {
    DICompileUnit *TheCU;
    DIType *DblTy;

    DIType *getDoubleTy();
  } KSDbgInfo;

  DIType *DebugInfo::getDoubleTy() {
    // DblTy is a plain pointer, so a null check tells us whether it is cached.
    if (DblTy)
      return DblTy;

    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
    return DblTy;
  }

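The caching in ``getDoubleTy`` is a plain create-on-first-use pattern. A
minimal, LLVM-free sketch of that idea follows. ``TypeDesc``, ``TypeCache``
and the ``Created`` counter are hypothetical stand-ins invented for
illustration and are not part of ``DIBuilder``:

.. code-block:: c++

  #include <memory>
  #include <string>

  // Hypothetical stand-in for a debug type description.
  struct TypeDesc {
    std::string Name;
    unsigned SizeInBits;
  };

  // Caches the single type the language needs, creating it on first use.
  struct TypeCache {
    std::unique_ptr<TypeDesc> DblTy; // empty until first requested
    unsigned Created = 0;            // counts how many descriptions were built

    TypeDesc *getDoubleTy() {
      if (DblTy) // already built: reuse it
        return DblTy.get();
      DblTy.reset(new TypeDesc{"double", 64});
      ++Created;
      return DblTy.get();
    }
  };

Calling ``getDoubleTy()`` twice on the same ``TypeCache`` returns the same
pointer and builds the description only once.
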
And then later on in ``main`` when we're constructing our module:

.. code-block:: c++

  DBuilder = new DIBuilder(*TheModule);

  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);

There are a couple of things to note here. First, while we're producing a
compile unit for a language called Kaleidoscope we used the language
constant for C. This is because a debugger wouldn't necessarily understand
the calling conventions or default ABI for a language it doesn't recognize,
and we follow the C ABI in our LLVM code generation, so it's the closest
thing to accurate. This ensures we can actually call functions from the
debugger and have them execute. Second, you'll see the "fib.ks" in the
call to ``createCompileUnit``. This is a hard-coded default value, since
we're using shell redirection to put our source into the Kaleidoscope
compiler. In a usual front end you'd have an input file name and it would
go there.

One last thing as part of emitting debug information via ``DIBuilder`` is that
we need to "finalize" the debug information. The reasons are part of the
underlying API for ``DIBuilder``, but make sure you do this near the end of
main:

.. code-block:: c++

  DBuilder->finalize();

before you dump out the module.

Functions
=========

Now that we have our ``Compile Unit`` and our source locations, we can add
function definitions to the debug info. So in ``PrototypeAST::Codegen`` we
add a few lines of code to describe a context for our subprogram, in this
case the "File", and the actual definition of the function itself.

So the context:

.. code-block:: c++

  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());

giving us a ``DIFile`` and asking the ``Compile Unit`` we created above for the
directory and filename where we are currently. Then, for now, we use some
source locations of 0 (since our AST doesn't currently have source location
information) and construct our function definition:

.. code-block:: c++

  DIScope *FContext = Unit;
  unsigned LineNo = 0;
  unsigned ScopeLine = 0;
  DISubprogram *SP = DBuilder->createFunction(
      FContext, Name, StringRef(), Unit, LineNo,
      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
      true /* definition */, ScopeLine, DINode::FlagPrototyped, false, F);

and we now have a ``DISubprogram`` that contains a reference to all of our
metadata for the function.

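``CreateFunctionType`` is used above but not shown in this section. As a
rough model of what such a helper has to produce, the sketch below builds a
signature list for a Kaleidoscope function, assuming the convention that the
result type comes first and is followed by one entry per argument.
``makeSignature`` and the string type names are hypothetical and only stand
in for the real debug metadata:

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  // Builds a type list for a function taking NumArgs doubles and returning a
  // double. Assumed layout: result type first, then one entry per argument.
  std::vector<std::string> makeSignature(std::size_t NumArgs) {
    std::vector<std::string> EltTys;
    EltTys.push_back("double"); // result type
    for (std::size_t I = 0; I != NumArgs; ++I)
      EltTys.push_back("double"); // every Kaleidoscope argument is a double
    return EltTys;
  }

Because Kaleidoscope has a single type, the list depends only on the number
of arguments, which is why the call above passes just ``Args.size()``.
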
Source Locations
================

The most important thing for debug information is accurate source location -
this makes it possible to map the compiled code back to your source. We have
a problem though: Kaleidoscope really doesn't have any source location
information in the lexer or parser, so we'll need to add it.

.. code-block:: c++

   struct SourceLocation {
     int Line;
     int Col;
   };
   static SourceLocation CurLoc;
   static SourceLocation LexLoc = {1, 0};

   static int advance() {
     int LastChar = getchar();

     if (LastChar == '\n' || LastChar == '\r') {
       LexLoc.Line++;
       LexLoc.Col = 0;
     } else
       LexLoc.Col++;
     return LastChar;
   }

This code keeps track of the line and column of the "source file". As we lex
every token we set our current "lexical location" to the line and column of
the beginning of the token. We do this by replacing all of the previous calls
to ``getchar()`` with our new ``advance()``, which keeps track of the
information. We have then added a source location to all of our AST classes:

.. code-block:: c++

   class ExprAST {
     SourceLocation Loc;

     public:
       int getLine() const { return Loc.Line; }
       int getCol() const { return Loc.Col; }
       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
       virtual std::ostream &dump(std::ostream &out, int ind) {
         return out << ':' << getLine() << ':' << getCol() << '\n';
       }

that we pass down through when we create a new expression:

.. code-block:: c++

   LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);

giving us locations for each of our expressions and variables.

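To see the line and column bookkeeping in isolation, here is a small sketch
that applies the same rule as ``advance()`` to an in-memory string instead of
standard input and records where each whitespace-separated token starts.
``tokenStarts`` is a hypothetical helper written for this illustration, and
splitting on whitespace is a simplification of the real lexer:

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  struct SourceLocation {
    int Line;
    int Col;
  };

  // Records where each whitespace-separated token starts in Src, using the
  // same rule as advance(): a newline bumps the line and resets the column.
  std::vector<SourceLocation> tokenStarts(const std::string &Src) {
    std::vector<SourceLocation> Starts;
    SourceLocation LexLoc = {1, 0};
    bool InToken = false;
    for (std::size_t I = 0; I != Src.size(); ++I) {
      char C = Src[I];
      if (C == '\n' || C == '\r') {
        LexLoc.Line++;
        LexLoc.Col = 0;
      } else {
        LexLoc.Col++;
      }
      bool IsSpace = C == ' ' || C == '\t' || C == '\n' || C == '\r';
      if (!IsSpace && !InToken)
        Starts.push_back(LexLoc); // first character of a new token
      InToken = !IsSpace;
    }
    return Starts;
  }

For the input ``"def fib(x)\n  x"`` this yields the locations 1:1, 1:5 and
2:3. Note that, like ``advance()``, the sketch counts a ``\r\n`` pair as two
line breaks.
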
From this we can make sure to tell the ``IRBuilder`` when we're at a new
source location so it can use that when we generate the rest of our code and
make sure that each instruction has source location information. We do this
by constructing another small function:

.. code-block:: c++

  void DebugInfo::emitLocation(ExprAST *AST) {
    // A null AST clears the current location (used later for the prologue).
    if (!AST) {
      Builder.SetCurrentDebugLocation(DebugLoc());
      return;
    }
    DIScope *Scope;
    if (LexicalBlocks.empty())
      Scope = TheCU;
    else
      Scope = LexicalBlocks.back();
    Builder.SetCurrentDebugLocation(
        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
  }

which tells the main ``IRBuilder`` both where we are and what scope
we're in. Since we've just created a function above we can either be in
the main file scope (like when we created our function), or now we can be
in the function scope we just created. To represent this we create a stack
of scopes:

.. code-block:: c++

   std::vector<DIScope *> LexicalBlocks;
   std::map<const PrototypeAST *, DIScope *> FnScopeMap;

and keep a map of each function to the scope that it represents (a
``DISubprogram`` is also a ``DIScope``).

Then we make sure to:

.. code-block:: c++

   KSDbgInfo.emitLocation(this);

emit the location every time we start to generate code for a new AST, and
also:

.. code-block:: c++

  KSDbgInfo.FnScopeMap[this] = SP;

store the scope (function) when we create it and use it:

.. code-block:: c++

  KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);

when we start generating the code for each function.

Also, don't forget to pop the scope back off of your scope stack at the
end of the code generation for the function:

.. code-block:: c++

  // Pop off the lexical block for the function since we added it
  // unconditionally.
  KSDbgInfo.LexicalBlocks.pop_back();

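The push and pop above have to stay balanced on every path out of code
generation. One way to make that hard to get wrong is an RAII guard, which is
not what the tutorial code does but is sketched here as an alternative. The
sketch is LLVM-free: ``Scope``, ``ScopeStack`` and ``ScopeGuard`` are
hypothetical names, with ``Scope`` standing in for ``DIScope`` and
``UnitScope`` for the compile unit:

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical stand-in for a debug scope (compile unit, function, ...).
  struct Scope {
    std::string Name;
  };

  struct ScopeStack {
    Scope *UnitScope;
    std::vector<Scope *> LexicalBlocks;

    explicit ScopeStack(Scope *Unit) : UnitScope(Unit) {}

    // Innermost open scope, falling back to the unit when none is open.
    Scope *current() const {
      return LexicalBlocks.empty() ? UnitScope : LexicalBlocks.back();
    }
  };

  // Pushes a scope on construction and pops it on destruction, so an early
  // return from code generation cannot leave a stale scope behind.
  class ScopeGuard {
    ScopeStack &Stack;

  public:
    ScopeGuard(ScopeStack &S, Scope *Sc) : Stack(S) {
      Stack.LexicalBlocks.push_back(Sc);
    }
    ~ScopeGuard() { Stack.LexicalBlocks.pop_back(); }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;
  };

Declaring a ``ScopeGuard`` at the top of a function's code generation makes
``current()`` return the function scope until the guard goes out of scope,
after which it falls back to the unit again.
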
Variables
=========

Now that we have functions, we need to be able to print out the variables
we have in scope. Let's get our function arguments set up so we can get
decent backtraces and see how our functions are being called. It isn't
a lot of code, and we generally handle it when we're creating the
argument allocas in ``PrototypeAST::CreateArgumentAllocas``.

.. code-block:: c++

  DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());
  DILocalVariable *D = DBuilder->createLocalVariable(
      dwarf::DW_TAG_arg_variable, Scope, Args[Idx], Unit, Line,
      KSDbgInfo.getDoubleTy(), Idx);

  Instruction *Call = DBuilder->insertDeclare(
      Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock());
  Call->setDebugLoc(DebugLoc::get(Line, 0, Scope));

Here we're doing a few things. First, we're grabbing our current scope
for the variable so we can say what range of code our variable is valid
through. Second, we're creating the variable, giving it the scope,
the name, source location, type, and since it's an argument, the argument
index. Third, we create an ``llvm.dbg.declare`` call to indicate at the IR
level that we've got a variable in an alloca (and it gives a starting
location for the variable). Lastly, we set a source location for the
beginning of the scope on the declare.

One interesting thing to note at this point is that various debuggers have
assumptions based on how code and debug information was generated for them
in the past. In this case we need to do a little bit of a hack to avoid
generating line information for the function prologue so that the debugger
knows to skip over those instructions when setting a breakpoint. So in
``FunctionAST::Codegen`` we add a couple of lines:

.. code-block:: c++

  // Unset the location for the prologue emission (leading instructions with no
  // location in a function are considered part of the prologue and the debugger
  // will run past them when breaking on a function)
  KSDbgInfo.emitLocation(nullptr);

and then emit a new location when we actually start generating code for the
body of the function:

.. code-block:: c++

  KSDbgInfo.emitLocation(Body);

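The prologue rule described in the comment above can be modelled without
LLVM. The sketch below assumes a simplified view in which each emitted
instruction carries a line number and ``0`` means "no location". ``Inst`` and
``firstLocatedInst`` are hypothetical names, and how a real debugger picks its
breakpoint address is debugger-specific:

.. code-block:: c++

  #include <cstddef>
  #include <vector>

  // Hypothetical model of an emitted instruction: Line == 0 means the
  // instruction carries no source location.
  struct Inst {
    unsigned Line;
  };

  // Returns the index of the first instruction that has a location, i.e. the
  // first instruction after the location-free prologue. Returns Body.size()
  // if no instruction has a location.
  std::size_t firstLocatedInst(const std::vector<Inst> &Body) {
    std::size_t I = 0;
    while (I != Body.size() && Body[I].Line == 0)
      ++I;
    return I;
  }

Only the leading run of location-free instructions counts as prologue in this
model. An instruction without a location later in the body does not move the
result.
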
With this we have enough debug information to set breakpoints in functions,
print out argument variables, and call functions. Not too bad for just a
few simple lines of code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
debug information. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_