mirror of
				https://github.com/c64scene-ar/llvm-6502.git
				synced 2025-10-31 08:16:47 +00:00 
			
		
		
		
	Tobias Nurmiranta git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15095 91177308-0d34-0410-b5e6-96231b3b80d8
		
			
				
	
	
		
			534 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			534 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | |
|                       "http://www.w3.org/TR/html4/strict.dtd">
 | |
| <html>
 | |
| <head>
 | |
|   <title>Accurate Garbage Collection with LLVM</title>
 | |
|   <link rel="stylesheet" href="llvm.css" type="text/css">
 | |
| </head>
 | |
| <body>
 | |
| 
 | |
| <div class="doc_title">
 | |
|   Accurate Garbage Collection with LLVM
 | |
| </div>
 | |
| 
 | |
| <ol>
 | |
|   <li><a href="#introduction">Introduction</a>
 | |
|     <ul>
 | |
|     <li><a href="#feature">GC features provided and algorithms supported</a></li>
 | |
|     </ul>
 | |
|   </li>
 | |
| 
 | |
|   <li><a href="#interfaces">Interfaces for user programs</a>
 | |
|     <ul>
 | |
|     <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li>
 | |
|     <li><a href="#allocate">Allocating memory from the GC</a></li>
 | |
|     <li><a href="#barriers">Reading and writing references to the heap</a></li>
 | |
|     <li><a href="#explicit">Explicit invocation of the garbage collector</a></li>
 | |
|     </ul>
 | |
|   </li>
 | |
| 
 | |
|   <li><a href="#gcimpl">Implementing a garbage collector</a>
 | |
|     <ul>
 | |
|     <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li>
 | |
|     <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li>
 | |
|     </ul>
 | |
|   </li>
 | |
|   <li><a href="#gcimpls">GC implementations available</a>
 | |
|     <ul>
 | |
|     <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li>
 | |
|     </ul>
 | |
|   </li>
 | |
| 
 | |
| <!--
 | |
|   <li><a href="#codegen">Implementing GC support in a code generator</a></li>
 | |
| -->
 | |
| </ol>
 | |
| 
 | |
| <div class="doc_author">
 | |
|   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section">
 | |
|   <a name="introduction">Introduction</a>
 | |
| </div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>Garbage collection is a widely used technique that frees the programmer from
 | |
| having to know the life-times of heap objects, making software easier to produce
 | |
| and maintain.  Many programming languages rely on garbage collection for
 | |
| automatic memory management.  There are two primary forms of garbage collection:
 | |
| conservative and accurate.</p>
 | |
| 
 | |
| <p>Conservative garbage collection often does not require any special support
 | |
| from either the language or the compiler: it can handle non-type-safe
 | |
| programming languages (such as C/C++) and does not require any special
 | |
| information from the compiler.  The [LINK] Boehm collector is an example of a
 | |
| state-of-the-art conservative collector.</p>
 | |
| 
 | |
| <p>Accurate garbage collection requires the ability to identify all pointers in
 | |
| the program at run-time (which requires that the source-language be type-safe in
 | |
| most cases).  Identifying pointers at run-time requires compiler support to
 | |
| locate all places that hold live pointer variables at run-time, including the
 | |
| <a href="#roots">processor stack and registers</a>.</p>
 | |
| 
 | |
| <p>
 | |
| Conservative garbage collection is attractive because it does not require any
 | |
| special compiler support, but it does have problems.  In particular, because the
 | |
| conservative garbage collector cannot <i>know</i> that a particular word in the
 | |
| machine is a pointer, it cannot move live objects in the heap (preventing the
 | |
| use of compacting and generational GC algorithms) and it can occasionally suffer
 | |
| from memory leaks due to integer values that happen to point to objects in the
 | |
| program.  In addition, some aggressive compiler transformations can break
 | |
| conservative garbage collectors (though these seem rare in practice).
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| Accurate garbage collectors do not suffer from any of these problems, but they
 | |
| can suffer from degraded scalar optimization of the program.  In particular,
 | |
| because the runtime must be able to identify and update all pointers active in
 | |
| the program, some optimizations are less effective.  In practice, however, the
 | |
| locality and performance benefits of using aggressive garbage allocation
 | |
| techniques dominates any low-level losses.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| This document describes the mechanisms and interfaces provided by LLVM to
 | |
| support accurate garbage collection.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="feature">GC features provided and algorithms supported</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| LLVM provides support for a broad class of garbage collection algorithms,
 | |
| including compacting semi-space collectors, mark-sweep collectors, generational
 | |
| collectors, and even reference counting implementations.  It includes support
 | |
| for <a href="#barriers">read and write barriers</a>, and associating <a
 | |
| href="#roots">meta-data with stack objects</a> (used for tagless garbage
 | |
| collection).  All LLVM code generators support garbage collection, including the
 | |
| C backend.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| We hope that the primitive support built into LLVM is sufficient to support a
 | |
| broad class of garbage collected languages, including Scheme, ML, scripting
 | |
| languages, Java, C#, etc.  That said, the implemented garbage collectors may
 | |
| need to be extended to support language-specific features such as finalization,
 | |
| weak references, or other features.  As these needs are identified and
 | |
| implemented, they should be added to this specification.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| LLVM does not currently support garbage collection of multi-threaded programs or
 | |
| GC-safe points other than function calls, but these will be added in the future
 | |
| as there is interest.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section">
 | |
|   <a name="interfaces">Interfaces for user programs</a>
 | |
| </div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>This section describes the interfaces provided by LLVM and by the garbage
 | |
| collector run-time that should be used by user programs.  As such, this is the
 | |
| interface that front-end authors should generate code for.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <div class="doc_code"><tt>
 | |
|   void %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata)
 | |
| </tt></div>
 | |
| 
 | |
| <p>
 | |
| The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable
 | |
| on the stack.  The first argument contains the address of the variable on the
 | |
| stack, and the second contains a pointer to metadata that should be associated
 | |
| with the pointer (which <b>must</b> be a constant or global value address).  At
 | |
| runtime, the <tt>llvm.gcroot</tt> intrinsic stores a null pointer into the
 | |
| specified location to initialize the pointer.</p>
 | |
| 
 | |
| <p>
 | |
| Consider the following fragment of Java code:
 | |
| </p>
 | |
| 
 | |
| <pre>
 | |
|        {
 | |
|          Object X;   // A null-initialized reference to an object
 | |
|          ...
 | |
|        }
 | |
| </pre>
 | |
| 
 | |
| <p>
 | |
| This block (which may be located in the middle of a function or in a loop nest),
 | |
| could be compiled to this LLVM code:
 | |
| </p>
 | |
| 
 | |
| <pre>
 | |
| Entry:
 | |
|    ;; In the entry block for the function, allocate the
 | |
|    ;; stack space for X, which is an LLVM pointer.
 | |
|    %X = alloca %Object*
 | |
|    ...
 | |
| 
 | |
|    ;; "CodeBlock" is the block corresponding to the start
 | |
|    ;;  of the scope above.
 | |
| CodeBlock:
 | |
|    ;; Initialize the object, telling LLVM that it is now live.
 | |
|    ;; Java has type-tags on objects, so it doesn't need any
 | |
|    ;; metadata.
 | |
|    call void %llvm.gcroot(%Object** %X, sbyte* null)
 | |
|    ...
 | |
| 
 | |
|    ;; As the pointer goes out of scope, store a null value into
 | |
|    ;; it, to indicate that the value is no longer live.
 | |
|    store %Object* null, %Object** %X
 | |
|    ...
 | |
| </pre>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="allocate">Allocating memory from the GC</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <div class="doc_code"><tt>
 | |
|   sbyte *%llvm_gc_allocate(unsigned %Size)
 | |
| </tt></div>
 | |
| 
 | |
| <p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the
 | |
| garbage collector implementation to allocate memory.  It returns a
 | |
| zeroed-out block of memory of the appropriate size.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="barriers">Reading and writing references to the heap</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <div class="doc_code"><tt>
 | |
|   sbyte *%llvm.gcread(sbyte *, sbyte **)<br>
 | |
|   void %llvm.gcwrite(sbyte*, sbyte*, sbyte**)
 | |
| </tt></div>
 | |
| 
 | |
| <p>Several of the more interesting garbage collectors (e.g., generational
 | |
| collectors) need to be informed when the mutator (the program that needs garbage
 | |
| collection) reads or writes object references into the heap.  In the case of a
 | |
| generational collector, it needs to keep track of which "old" generation objects
 | |
| have references stored into them.  The amount of code that typically needs to be
 | |
| executed is usually quite small (and not on the critical path of any 
 | |
| computation), so the overall performance impact of the inserted code is 
 | |
| tolerable.</p>
 | |
| 
 | |
| <p>To support garbage collectors that use read or write barriers, LLVM provides
 | |
| the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics.  The first
 | |
| intrinsic has exactly the same semantics as a non-volatile LLVM load and the
 | |
| second has the same semantics as a non-volatile LLVM store, with the
 | |
| additions that they also take a pointer to the start of the memory
 | |
| object as an argument.  At code generation
 | |
| time, these intrinsics are replaced with calls into the garbage collector
 | |
| (<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a
 | |
| href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then
 | |
| inlined into the code.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| If you are writing a front-end for a garbage collected language, every load or
 | |
| store of a reference from or to the heap should use these intrinsics instead of
 | |
| normal LLVM loads/stores.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="initialize">Garbage collector startup and initialization</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <div class="doc_code"><tt>
 | |
|   void %llvm_gc_initialize(unsigned %InitialHeapSize)
 | |
| </tt></div>
 | |
| 
 | |
| <p>
 | |
| The <tt>llvm_gc_initialize</tt> function should be called once before any other
 | |
| garbage collection functions are called.  This gives the garbage collector the
 | |
| chance to initialize itself and allocate the heap spaces.  The initial heap size
 | |
| to allocate should be specified as an argument.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="explicit">Explicit invocation of the garbage collector</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <div class="doc_code"><tt>
 | |
|   void %llvm_gc_collect()
 | |
| </tt></div>
 | |
| 
 | |
| <p>
 | |
| The <tt>llvm_gc_collect</tt> function is exported by the garbage collector
 | |
| implementations to provide a full collection, even when the heap is not
 | |
| exhausted.  This can be used by end-user code as a hint, and may be ignored by
 | |
| the garbage collector.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section">
 | |
|   <a name="gcimpl">Implementing a garbage collector</a>
 | |
| </div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| Implementing a garbage collector for LLVM is fairly straight-forward.  The LLVM
 | |
| garbage collectors are provided in a form that makes them easy to link into the
 | |
| language-specific runtime that a language front-end would use.  They require
 | |
| functionality from the language-specific runtime to get information about <a
 | |
| href="#gcdescriptors">where pointers are located in heap objects</a>.
 | |
| </p>
 | |
| 
 | |
| <p>The
 | |
| implementation must include the <a
 | |
| href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a
 | |
| href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement
 | |
| the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well.  To
 | |
| do this, it will probably have to <a href="#traceroots">trace through the roots
 | |
| from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors
 | |
| for heap objects</a>.  Luckily, there are some <a href="#gcimpls">example
 | |
| implementations</a> available.
 | |
| </p>
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
|   <div class="doc_code"><tt>
 | |
|     void *llvm_gc_read(void*, void **)<br>
 | |
|     void llvm_gc_write(void*, void *, void**)
 | |
|  </tt></div>
 | |
| 
 | |
| <p>
 | |
| These functions <i>must</i> be implemented in every garbage collector, even if
 | |
| they do not need read/write barriers.  In this case, just load or store the
 | |
| pointer, then return.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| If an actual read or write barrier is needed, it should be straight-forward to
 | |
| implement it.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="callbacks">Callback functions used to implement the garbage collector</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| <p>
 | |
| Garbage collector implementations make use of call-back functions that are
 | |
| implemented by other parts of the LLVM system.
 | |
| </p>
 | |
| </div>
 | |
| 
 | |
| <!--_________________________________________________________________________-->
 | |
| <div class="doc_subsubsection">
 | |
|   <a name="traceroots">Tracing GC pointers from the program stack</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
|   <div class="doc_code"><tt>
 | |
|      void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta));
 | |
|   </tt></div>
 | |
| 
 | |
| <p>
 | |
| The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code
 | |
| generator that iterates through all of the GC roots on the stack, calling the
 | |
| specified function pointer with each record.  For each GC root, the address of
 | |
| the pointer and the meta-data (from the <a
 | |
| href="#gcroot"><tt>llvm.gcroot</tt></a> intrinsic) are provided.
 | |
| </p>
 | |
| </div>
 | |
| 
 | |
| <!--_________________________________________________________________________-->
 | |
| <div class="doc_subsubsection">
 | |
|   <a name="staticroots">Tracing GC pointers from static roots</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| TODO
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!--_________________________________________________________________________-->
 | |
| <div class="doc_subsubsection">
 | |
|   <a name="gcdescriptors">Tracing GC pointers from heap objects</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| <p>
 | |
| The three most common ways to keep track of where pointers live in heap objects
 | |
| are (listed in order of space overhead required):</p>
 | |
| 
 | |
| <ol>
 | |
| <li>In languages with polymorphic objects, pointers from an object header are
 | |
| usually used to identify the GC pointers in the heap object.  This is common for
 | |
| object-oriented languages like Self, Smalltalk, Java, or C#.</li>
 | |
| 
 | |
| <li>If heap objects are not polymorphic, often the "shape" of the heap can be
 | |
| determined from the roots of the heap or from some other meta-data [<a
 | |
| href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a
 | |
| href="#tolmach94">Tolmach94</a>].  In this case, the garbage collector can
 | |
| propagate the information around from meta data stored with the roots.  This
 | |
| often eliminates the need to have a header on objects in the heap.  This is
 | |
| common in the ML family.</li>
 | |
| 
 | |
| <li>If all heap objects have pointers in the same locations, or pointers can be
 | |
| distinguished just by looking at them (e.g., the low order bit is clear), no
 | |
| book-keeping is needed at all.  This is common for Lisp-like languages.</li>
 | |
| </ol>
 | |
| 
 | |
| <p>The LLVM garbage collectors are capable of supporting all of these styles of
 | |
| language, including ones that mix various implementations.  To do this, it
 | |
| allows the source-language to associate meta-data with the <a
 | |
| href="#roots">stack roots</a>, and the heap tracing routines can propagate the
 | |
| information.  In addition, LLVM allows the front-end to extract GC information
 | |
| from in any form from a specific object pointer (this supports situations #1 and
 | |
| #3).
 | |
| </p>
 | |
| 
 | |
| <p><b>Making this efficient</b></p>
 | |
| 
 | |
| 
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section">
 | |
|   <a name="gcimpls">GC implementations available</a>
 | |
| </div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| To make this more concrete, the currently implemented LLVM garbage collectors
 | |
| all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base.
 | |
| If you are interested in implementing an algorithm, there are many interesting
 | |
| possibilities (mark/sweep, a generational collector, a reference counting
 | |
| collector, etc), or you could choose to improve one of the existing algorithms.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection">
 | |
|   <a name="semispace">SemiSpace - A simple copying garbage collector</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| <p>
 | |
| SemiSpace is a very simple copying collector.  When it starts up, it allocates
 | |
| two blocks of memory for the heap.  It uses a simple bump-pointer allocator to
 | |
| allocate memory from the first block until it runs out of space.  When it runs
 | |
| out of space, it traces through all of the roots of the program, copying blocks
 | |
| to the other half of the memory space.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!--_________________________________________________________________________-->
 | |
| <div class="doc_subsubsection">
 | |
|   Possible Improvements
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| If a collection cycle happens and the heap is not compacted very much (say less
 | |
| than 25% of the allocated memory was freed), the memory regions should be
 | |
| doubled in size.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section">
 | |
|   <a name="references">References</a>
 | |
| </div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew
 | |
| W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p>
 | |
| 
 | |
| <p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for
 | |
| strongly typed programming languages.  Benjamin Goldberg. ACM SIGPLAN
 | |
| PLDI'91.</p>
 | |
| 
 | |
| <p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using
 | |
| explicit type parameters.  Andrew Tolmach.  Proceedings of the 1994 ACM
 | |
| conference on LISP and functional programming.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <hr>
 | |
| <address>
 | |
|   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
 | |
|   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
 | |
|   <a href="http://validator.w3.org/check/referer"><img
 | |
|   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
 | |
| 
 | |
|   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
 | |
|   <a href="http://llvm.cs.uiuc.edu">LLVM Compiler Infrastructure</a><br>
 | |
|   Last modified: $Date$
 | |
| </address>
 | |
| 
 | |
| </body>
 | |
| </html>
 |