			
		
		
		
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37017 91177308-0d34-0410-b5e6-96231b3b80d8
		
			
				
	
	
		
| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | |
|                       "http://www.w3.org/TR/html4/strict.dtd">
 | |
| <html>
 | |
| <head>
 | |
|   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 | |
|   <title>LLVM Bitcode File Format</title>
 | |
|   <link rel="stylesheet" href="llvm.css" type="text/css">
 | |
| </head>
 | |
| <body>
 | |
| <div class="doc_title"> LLVM Bitcode File Format </div>
 | |
| <ol>
 | |
|   <li><a href="#abstract">Abstract</a></li>
 | |
|   <li><a href="#overview">Overview</a></li>
 | |
|   <li><a href="#bitstream">Bitstream Format</a>
 | |
|     <ol>
 | |
|     <li><a href="#magic">Magic Numbers</a></li>
 | |
|     <li><a href="#primitives">Primitives</a></li>
 | |
|     <li><a href="#abbrevid">Abbreviation IDs</a></li>
 | |
|     <li><a href="#blocks">Blocks</a></li>
 | |
|     <li><a href="#datarecord">Data Records</a></li>
 | |
|     <li><a href="#abbreviations">Abbreviations</a></li>
 | |
|     <li><a href="#stdblocks">Standard Blocks</a></li>
 | |
|     </ol>
 | |
|   </li>
 | |
|   <li><a href="#llvmir">LLVM IR Encoding</a>
 | |
|     <ol>
 | |
|     <li><a href="#basics">Basics</a></li>
 | |
|     </ol>
 | |
|   </li>
 | |
| </ol>
 | |
| <div class="doc_author">
 | |
|   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>.
 | |
| </p>
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section"> <a name="abstract">Abstract</a></div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>This document describes the LLVM bitstream file format and the encoding of
 | |
| the LLVM IR into it.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section"> <a name="overview">Overview</a></div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| What is commonly known as the LLVM bitcode file format (also, sometimes
 | |
| anachronistically known as bytecode) is actually two things: a <a 
 | |
| href="#bitstream">bitstream container format</a>
 | |
| and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>
 | |
| 
 | |
| <p>
 | |
| The bitstream format is an abstract encoding of structured data, very
 | |
| similar to XML in some ways.  Like XML, bitstream files contain tags and nested
 | |
| structures, and you can parse the file without having to understand the tags.
 | |
| Unlike XML, the bitstream format is a binary encoding, and it
 | |
| provides a mechanism for the file to self-describe "abbreviations", which are
 | |
| effectively size optimizations for the content.</p>
 | |
| 
 | |
| <p>This document first describes the LLVM bitstream format, then describes the
 | |
| record structure used by LLVM IR files.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section"> <a name="bitstream">Bitstream Format</a></div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| The bitstream format is literally a stream of bits, with a very simple
 | |
| structure.  This structure consists of the following concepts:
 | |
| </p>
 | |
| 
 | |
| <ul>
 | |
| <li>A "<a href="#magic">magic number</a>" that identifies the contents of
 | |
|     the stream.</li>
 | |
| <li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
 | |
|     integers.</li> 
 | |
| <li><a href="#blocks">Blocks</a>, which define nested content.</li> 
 | |
| <li><a href="#datarecord">Data Records</a>, which describe entities within the
 | |
|     file.</li> 
 | |
| <li>Abbreviations, which specify compression optimizations for the file.</li> 
 | |
| </ul>
 | |
| 
 | |
| <p>Note that the <a 
 | |
| href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
 | |
| used to dump and inspect arbitrary bitstreams, which is very useful for
 | |
| understanding the encoding.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="magic">Magic Numbers</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>The first four bytes of the stream identify the encoding of the file.  This
 | |
| is used by a reader to know what is contained in the file.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="primitives">Primitives</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| A bitstream literally consists of a stream of bits.  This stream is made up of a
 | |
| number of primitive values that encode a stream of unsigned integer values.
 | |
| These
 | |
| integers are encoded in two ways: either as <a href="#fixedwidth">Fixed
 | |
| Width Integers</a> or as <a href="#variablewidth">Variable Width
 | |
| Integers</a>.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="fixedwidth">Fixed Width Integers</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>Fixed-width integer values have their low bits emitted directly to the file.
 | |
|    For example, a 3-bit integer value encodes 1 as 001.  Fixed-width integers
 | |
|    are used when a field has a well-known number of options.  For
 | |
|    example, boolean values are usually encoded with a 1-bit wide integer. 
 | |
| </p>
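<p>As a rough illustration of the rule above, the following Python sketch renders a fixed-width field as a bit string.  The helper name <tt>encode_fixed</tt> is hypothetical, and printing the most significant bit first is only a display convention chosen to match the "001" example; it makes no claim about bit order within the file.</p>

```python
def encode_fixed(value, width):
    """Return the low `width` bits of `value` as a bit string (MSB first, for display)."""
    if value < 0 or width < 1:
        raise ValueError("expected a non-negative value and a positive width")
    low_bits = value & ((1 << width) - 1)  # only the low bits are emitted
    return format(low_bits, "0{}b".format(width))


print(encode_fixed(1, 3))  # 001
print(encode_fixed(1, 1))  # 1 (a boolean 'true' in a 1-bit field)
```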
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="variablewidth">Variable Width
 | |
| Integers</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>Variable-width integer (VBR) values encode values of arbitrary size,
 | |
| optimizing for the case where the values are small.  Given a 4-bit VBR field,
 | |
| any 3-bit value (0 through 7) is encoded directly, with the high bit set to
 | |
| zero.  More generally, an N-bit VBR field carries N-1 data bits: a value that needs more than N-1 bits is emitted as a series of
 | |
| (N-1)-bit chunks, least significant chunk first, where all but the last set the high bit.</p>
 | |
| 
 | |
| <p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
 | |
| vbr4 value.  The first set of four bits indicates the value 3 (011) with a
 | |
| continuation piece (indicated by a high bit of 1).  The next four bits indicate a
 | |
| value of 24 (011 << 3) with no continuation.  The sum (3+24) yields the value
 | |
| 27.
 | |
| </p>
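<p>The chunking described above can be sketched in Python as follows.  The function names are hypothetical, and the chunks are modelled as a list of small integers rather than as packed bits, so the sketch deliberately leaves the on-disk bit order aside.</p>

```python
def encode_vbr(value, width):
    """Split a non-negative value into VBR chunks of `width` bits each."""
    if value < 0 or width < 2:
        raise ValueError("expected a non-negative value and a width of at least 2")
    data_bits = width - 1
    continuation = 1 << data_bits   # high bit of a chunk
    mask = continuation - 1         # low (width - 1) data bits
    chunks = []
    while value > mask:
        chunks.append(continuation | (value & mask))
        value >>= data_bits
    chunks.append(value)            # last chunk: high bit clear
    return chunks


def decode_vbr(chunks, width):
    """Reassemble a value from chunks produced by encode_vbr."""
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    value = 0
    for index, chunk in enumerate(chunks):
        value |= (chunk & mask) << (index * data_bits)
    return value


chunks = encode_vbr(27, 4)
print([format(c, "04b") for c in chunks])  # ['1011', '0011']
print(decode_vbr(chunks, 4))               # 27
```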
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="char6">6-bit characters</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>6-bit characters encode common characters into a fixed 6-bit field.  They
 | |
| represent the following characters with the following 6-bit values:</p>
 | |
| 
 | |
| <ul>
 | |
| <li>'a' .. 'z' - 0 .. 25</li>
 | |
| <li>'A' .. 'Z' - 26 .. 51</li>
 | |
| <li>'0' .. '9' - 52 .. 61</li>
 | |
| <li>'.' - 62</li>
 | |
| <li>'_' - 63</li>
 | |
| </ul>
 | |
| 
 | |
| <p>This encoding is only suitable for encoding characters and strings that
 | |
| consist only of the above characters.  It is completely incapable of encoding
 | |
| characters not in the set.</p>
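<p>A minimal Python sketch of this mapping is shown below.  The names <tt>CHAR6_ALPHABET</tt>, <tt>encode_char6</tt> and <tt>decode_char6</tt> are hypothetical.  Because the 64 characters fill the codes 0 through 63 without gaps, 'Z' maps to 51 and '0' maps to 52.</p>

```python
import string

# 26 lowercase + 26 uppercase + 10 digits + '.' + '_' = 64 characters.
CHAR6_ALPHABET = (string.ascii_lowercase + string.ascii_uppercase
                  + string.digits + "._")


def encode_char6(text):
    """Map each character to its 6-bit code; reject characters outside the set."""
    codes = []
    for ch in text:
        code = CHAR6_ALPHABET.find(ch)
        if code < 0:
            raise ValueError("character {!r} cannot be encoded as char6".format(ch))
        codes.append(code)
    return codes


def decode_char6(codes):
    """Inverse of encode_char6."""
    return "".join(CHAR6_ALPHABET[code] for code in codes)


print(encode_char6("abcd"))  # [0, 1, 2, 3]
print(encode_char6("Z0._"))  # [51, 52, 62, 63]
```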
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="wordalign">Word Alignment</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>Occasionally, it is useful to emit zero bits until the bitstream is a
 | |
| multiple of 32 bits.  This ensures that the bit position in the stream can be
 | |
| represented as a multiple of 32-bit words.</p>
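<p>The number of zero bits needed can be computed directly from the current bit position, as in this small Python sketch (the function name is hypothetical):</p>

```python
def padding_to_word(bit_position):
    """Number of zero bits to emit so that the position becomes a multiple of 32."""
    if bit_position < 0:
        raise ValueError("bit position must be non-negative")
    return (-bit_position) % 32


print(padding_to_word(37))  # 27, which brings the position to 64
print(padding_to_word(64))  # 0, already aligned
```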
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="abbrevid">Abbreviation IDs</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
 | |
| <a href="#datarecord">Data Records</a>.  Both of these start with an
 | |
| abbreviation ID encoded as a fixed-bitwidth field.  The width is specified by
 | |
| the current block, as described below.  The value of the abbreviation ID
 | |
| specifies either one of the builtin IDs (which have special meanings, defined below) or
 | |
| one of the abbreviation IDs defined by the stream itself.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| The set of builtin abbrev IDs is:
 | |
| </p>
 | |
| 
 | |
| <ul>
 | |
| <li>0 - <a href="#END_BLOCK">END_BLOCK</a> - This abbrev ID marks the end of the
 | |
|     current block.</li>
 | |
| <li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
 | |
|     beginning of a new block.</li>
 | |
| <li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
 | |
|     abbreviation.</li>
 | |
| <li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
 | |
|     definition of an unabbreviated record.</li>
 | |
| </ul>
 | |
| 
 | |
| <p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
 | |
| an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
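<p>A reader's dispatch on the abbreviation ID might look roughly like the Python sketch below.  The table and function names are hypothetical, and numbering stream-defined abbreviations from zero is an assumption made for illustration only.</p>

```python
BUILTIN_ABBREV_IDS = {
    0: "END_BLOCK",
    1: "ENTER_SUBBLOCK",
    2: "DEFINE_ABBREV",
    3: "UNABBREV_RECORD",
}
FIRST_STREAM_ABBREV_ID = 4


def classify_abbrev_id(abbrev_id):
    """Return the builtin name, or the index of a stream-defined abbreviation."""
    if abbrev_id < 0:
        raise ValueError("abbreviation IDs are unsigned")
    if abbrev_id in BUILTIN_ABBREV_IDS:
        return BUILTIN_ABBREV_IDS[abbrev_id]
    # Assumption: the first abbreviation defined by the stream gets index 0.
    return abbrev_id - FIRST_STREAM_ABBREV_ID


print(classify_abbrev_id(1))  # ENTER_SUBBLOCK
print(classify_abbrev_id(4))  # 0
```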
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="blocks">Blocks</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| Blocks in a bitstream denote nested regions of the stream, and are identified by
 | |
| a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
 | |
| function bodies).  Nested blocks capture the hierarchical structure of the data
 | |
| encoded in the stream, and various properties are associated with blocks as the file is
 | |
| parsed.  Block definitions allow the reader to skip blocks
 | |
| in constant time, which is useful if the reader wants only a summary of blocks or wants to
 | |
| skip data it does not understand.  The LLVM IR reader uses this
 | |
| mechanism to skip function bodies, lazily reading them on demand.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| When reading and encoding the stream, several properties are maintained for the
 | |
| block.  In particular, each block maintains:
 | |
| </p>
 | |
| 
 | |
| <ol>
 | |
| <li>A current abbrev id width.  This value starts at 2, and is set every time a
 | |
|     block record is entered.  The block entry specifies the abbrev id width for
 | |
|     the body of the block.</li>
 | |
| 
 | |
| <li>A set of abbreviations.  Abbreviations may be defined within a block, or
 | |
|     they may be associated with all blocks of a particular ID.
 | |
| </li>
 | |
| </ol>
 | |
| 
 | |
| <p>As sub-blocks are entered, these properties are saved and the new sub-block
 | |
| has its own set of abbreviations, and its own abbrev id width.  When a sub-block
 | |
| is exited, the saved values are restored.</p>
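<p>The save-and-restore behaviour can be sketched with a simple stack, as below.  The class <tt>BlockScopes</tt> and its methods are hypothetical, and the sketch ignores abbreviations that are associated with all blocks of a particular ID.</p>

```python
class BlockScopes:
    """Tracks the per-block reader state: abbrev id width and local abbreviations."""

    def __init__(self):
        self.abbrev_width = 2   # the width starts at 2
        self.abbrevs = []       # abbreviations defined in the current block
        self._saved = []        # state of the enclosing blocks

    def enter_block(self, new_abbrev_width):
        """Save the current state and start a sub-block with fresh state."""
        self._saved.append((self.abbrev_width, self.abbrevs))
        self.abbrev_width = new_abbrev_width
        self.abbrevs = []

    def define_abbrev(self, abbrev):
        self.abbrevs.append(abbrev)

    def exit_block(self):
        """Restore the state of the enclosing block."""
        if not self._saved:
            raise ValueError("no enclosing block to return to")
        self.abbrev_width, self.abbrevs = self._saved.pop()


scopes = BlockScopes()
scopes.enter_block(3)
scopes.define_abbrev("outer abbreviation")
scopes.enter_block(5)
print(scopes.abbrev_width, scopes.abbrevs)  # 5 []
scopes.exit_block()
print(scopes.abbrev_width, scopes.abbrevs)  # 3 ['outer abbreviation']
```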
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK
 | |
| Encoding</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
 | |
|      <align32bits>, blocklen<sub>32</sub>]</tt></p>
 | |
| 
 | |
| <p>
 | |
| The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record.
 | |
| The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and indicates
 | |
| the type of block being entered (which is application specific).  The
 | |
| <tt>newabbrevlen</tt> value is a 4-bit VBR which specifies the
 | |
| abbrev id width for the sub-block.  The <tt>blocklen</tt> is a 32-bit aligned
 | |
| value that specifies the size of the subblock, in 32-bit words.  This value
 | |
| allows the reader to skip over the entire block in one jump.
 | |
| </p>
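<p>Skipping a block then reduces to a single addition, as in the Python sketch below.  The function name is hypothetical, and the sketch assumes that <tt>blocklen</tt> counts the 32-bit words that follow the <tt>blocklen</tt> field itself.</p>

```python
def block_end_bit(body_start_bit, blocklen_words):
    """Bit position just past a block whose body starts at body_start_bit.

    Assumption: body_start_bit is the position right after the 32-bit
    blocklen field, which is word-aligned because of the preceding padding.
    """
    if body_start_bit % 32 != 0:
        raise ValueError("the block body is expected to start on a 32-bit boundary")
    return body_start_bit + 32 * blocklen_words


print(block_end_bit(64, 10))  # 384
```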
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="END_BLOCK">END_BLOCK
 | |
| Encoding</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><tt>[END_BLOCK, <align32bits>]</tt></p>
 | |
| 
 | |
| <p>
 | |
| The END_BLOCK abbreviation ID specifies the end of the current block record.
 | |
| Its end is aligned to 32 bits to ensure that the size of the block is a whole
 | |
| multiple of 32 bits.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="datarecord">Data Records</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| <p>
 | |
| Data records consist of a record code and a number of (up to) 64-bit integer
 | |
| values.  The interpretation of the code and values is application specific and
 | |
| there are multiple different ways to encode a record (with an unabbrev record
 | |
| or with an abbreviation).  In the LLVM IR format, for example, there is a record
 | |
| which encodes the target triple of a module.  The code is MODULE_CODE_TRIPLE,
 | |
| and the values of the record are the ASCII codes for the characters in the
 | |
| string.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
 | |
| Encoding</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
 | |
|        op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
 | |
| 
 | |
| <p>An UNABBREV_RECORD provides a default fallback encoding, which is both
 | |
| completely general and also extremely inefficient.  It can describe an arbitrary
 | |
| record, by emitting the code and operands as vbrs.</p>
 | |
| 
 | |
| <p>For example, emitting an LLVM IR target triple as an unabbreviated record
 | |
| requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
 | |
| MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
 | |
| the number of operands), and a vbr6 for each character.  Since there are no
 | |
| letters with value less than 32, each letter would need to be emitted as at
 | |
| least a two-part VBR, which means that each letter would require at least 12
 | |
| bits.  This is not an efficient encoding, but it is fully general.</p>
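<p>The cost estimate above can be checked with a short Python sketch.  The function names are hypothetical; the record code 2 and the abbrev id width of 3 are taken from the triple example used elsewhere in this document.</p>

```python
def vbr_bits(value, width):
    """Bits needed to emit a non-negative value as a VBR with `width`-bit chunks."""
    chunks = 1
    value >>= width - 1
    while value:
        chunks += 1
        value >>= width - 1
    return chunks * width


def unabbrev_record_bits(code, operands, abbrev_id_width):
    """Size of [UNABBREV_RECORD, code vbr6, numops vbr6, op0 vbr6, ...] in bits."""
    return (abbrev_id_width
            + vbr_bits(code, 6)
            + vbr_bits(len(operands), 6)
            + sum(vbr_bits(op, 6) for op in operands))


print(vbr_bits(ord("a"), 6))  # 12
print(unabbrev_record_bits(2, [ord(c) for c in "abcd"], 3))  # 63
```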
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
 | |
| Encoding</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><tt>[<abbrevid>, fields...]</tt></p>
 | |
| 
 | |
| <p>An abbreviated record is an abbreviation ID followed by a set of fields that
 | |
| are encoded according to the <a href="#abbreviations">abbreviation 
 | |
| definition</a>.  This allows records to be encoded significantly more densely
 | |
| than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
 | |
| type, and allows the abbreviation types to be specified in the stream itself,
 | |
| which allows the files to be completely self describing.  The actual encoding
 | |
| of abbreviations is defined below.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| <p>
 | |
| Abbreviations are an important form of compression for bitstreams.  The idea is
 | |
| to specify a dense encoding for a class of records once, then use that encoding
 | |
| to emit many records.  It takes space to emit the encoding into the file, but
 | |
| the space is recouped (hopefully plus some) when the records that use it are
 | |
| emitted.
 | |
| </p>
 | |
| 
 | |
| <p>
 | |
| Abbreviations can be determined dynamically per client, per file.  Since the
 | |
| abbreviations are stored in the bitstream itself, different streams of the same
 | |
| format can contain different sets of abbreviations, and a stream can omit any
 | |
| abbreviation it does not need.  As a concrete example, LLVM IR files usually emit an abbreviation
 | |
| for binary operators.  If a specific LLVM module contains few or no binary
 | |
| operators, the abbreviation does not need to be emitted.
 | |
| </p>
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
 | |
|  Encoding</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
 | |
|  ...]</tt></p>
 | |
| 
 | |
| <p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
 | |
| by a VBR that specifies the number of abbrev operands, then the abbrev
 | |
| operands themselves.  Abbreviation operands come in three forms.  They all start
 | |
| with a single bit that indicates whether the abbrev operand is a literal operand
 | |
| (when the bit is 1) or an encoding operand (when the bit is 0).</p>
 | |
| 
 | |
| <ol>
 | |
| <li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
 | |
| Literal operands specify that the value in the result
 | |
| is always a single specific value.  This specific value is emitted as a vbr8
 | |
| after the bit indicating that it is a literal operand.</li>
 | |
| <li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
 | |
|  - Operand encodings that do not have extra data are just emitted as their code.
 | |
| </li>
 | |
| <li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>, 
 | |
| value<sub>vbr5</sub>]</tt> - Operand encodings that do have extra data are
 | |
| emitted as their code, followed by the extra data.
 | |
| </li>
 | |
| </ol>
 | |
| 
 | |
| <p>The possible operand encodings are:</p>
 | |
| 
 | |
| <ul>
 | |
| <li>1 - Fixed - The field should be emitted as a <a 
 | |
|     href="#fixedwidth">fixed-width value</a>, whose width
 | |
|     is specified by the encoding operand.</li>
 | |
| <li>2 - VBR - The field should be emitted as a <a 
 | |
|     href="#variablewidth">variable-width value</a>, whose width
 | |
|     is specified by the encoding operand.</li>
 | |
| <li>3 - Array - This field is an array of values.  The element type of the array
 | |
|     is specified by the next encoding operand.</li>
 | |
| <li>4 - Char6 - This field should be emitted as a <a href="#char6">char6-encoded
 | |
|     value</a>.</li>
 | |
| </ul>
 | |
| 
 | |
| <p>For example, target triples in LLVM modules are encoded as a record of the
 | |
| form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>.  Consider if the bitstream emitted
 | |
| the following abbrev entry:</p>
 | |
| 
 | |
| <ul>
 | |
| <li><tt>[0, Fixed, 4]</tt></li>
 | |
| <li><tt>[0, Array]</tt></li>
 | |
| <li><tt>[0, Char6]</tt></li>
 | |
| </ul>
 | |
| 
 | |
| <p>When emitting a record with this abbreviation, the triple record above would be
 | |
| emitted as:</p>
 | |
| 
 | |
| <p><tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>,
 | |
|    0<sub>6</sub>, 1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt></p>
 | |
| 
 | |
| <p>These values are:</p>
 | |
| 
 | |
| <ol>
 | |
| <li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
 | |
| <li>The second value, 2, is the code for TRIPLE in LLVM IR files.</li>
 | |
| <li>The third value, 4, is the length of the array.</li>
 | |
| <li>The rest of the values are the char6 encoded values for "abcd".</li>
 | |
| </ol>
 | |
| 
 | |
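| <p>With this abbreviation, the triple is emitted with only 37 bits (3 for the abbreviation ID, 4 for the code, 6 for the array length, and 24 for the four characters, assuming an
 | |
| abbrev id width of 3).  Without the abbreviation, significantly more space would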
 | |
| be required to emit the target triple.  Also, since the TRIPLE value is not
 | |
| emitted as a literal in the abbreviation, the abbreviation can also be used for
 | |
| any other string value.
 | |
| </p>
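<p>The 37-bit figure can be reproduced by adding up the field widths of the emitted record, as in this Python sketch.  The function names are hypothetical, and the field list simply mirrors the example abbreviation: a 4-bit fixed code followed by an array of char6 values whose length is written as a vbr6.</p>

```python
def vbr_bits(value, width):
    """Bits needed to emit a non-negative value as a VBR with `width`-bit chunks."""
    chunks = 1
    value >>= width - 1
    while value:
        chunks += 1
        value >>= width - 1
    return chunks * width


def abbreviated_string_record_bits(text, abbrev_id_width):
    """Size in bits of a [code, chars...] record emitted with the example abbreviation."""
    fields = [
        ("abbreviation ID", abbrev_id_width),
        ("record code, Fixed(4)", 4),
        ("array length, vbr6", vbr_bits(len(text), 6)),
        ("char6 elements", 6 * len(text)),
    ]
    return sum(bits for _, bits in fields)


print(abbreviated_string_record_bits("abcd", 3))  # 37
```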
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="stdblocks">Standard Blocks</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| In addition to the basic block structure and record encodings, the bitstream
 | |
| also defines specific builtin block types.  These block types specify how the
 | |
| stream is to be decoded or other metadata.  In the future, new standard blocks
 | |
| may be added.
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"><a name="BLOCKINFO">#0 - BLOCKINFO
 | |
| Block</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>The BLOCKINFO block allows the description of metadata for other blocks.  The
 | |
|   currently specified records are:</p>
 | |
|  
 | |
| <ul>
 | |
| <li><tt>[SETBID (#1), blockid]</tt></li>
 | |
| <li><tt>[DEFINE_ABBREV, ...]</tt></li>
 | |
| </ul>
 | |
| 
 | |
| <p>
 | |
| The SETBID record indicates which block ID is being described.  The standard
 | |
| DEFINE_ABBREV record specifies an abbreviation.  The abbreviation is associated
 | |
| with the block ID named by the most recent SETBID record, and any block with a matching ID automatically gets the
 | |
| abbreviation. 
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <div class="doc_section"> <a name="llvmir">LLVM IR Encoding</a></div>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>LLVM IR is encoded into a bitstream by defining blocks and records.  It uses
 | |
| blocks for things like constant pools, functions, symbol tables, etc.  It uses
 | |
| records for things like instructions, global variable descriptors, type
 | |
| descriptions, etc.  This document does not describe the set of abbreviations
 | |
| that the writer uses, as these are fully self-described in the file, and the
 | |
| reader is not allowed to build in any knowledge of this.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="basics">Basics</a>
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"><a name="ir_magic">LLVM IR Magic Number</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| The magic number for LLVM IR files is:
 | |
| </p>
 | |
| 
 | |
| <p><tt>['B'<sub>8</sub>, 'C'<sub>8</sub>, 0x0<sub>4</sub>, 0xC<sub>4</sub>,
 | |
| 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt></p>
 | |
| 
 | |
| <p>When viewed as bytes, this is "BC 0xC0DE".</p>
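<p>A reader could test for this signature as sketched below in Python.  The names are hypothetical, and the nibble-packing helper assumes that consecutive fields fill each byte starting from its low bits, which is what makes the four 4-bit fields read as the bytes 0xC0 and 0xDE.</p>

```python
LLVM_IR_MAGIC = b"BC\xc0\xde"


def pack_nibbles_low_first(first, second):
    """Pack two 4-bit fields into one byte, the first field in the low bits."""
    return (first & 0xF) | ((second & 0xF) << 4)


def has_llvm_ir_magic(data):
    """True if the byte string starts with the LLVM IR magic number."""
    return data[:4] == LLVM_IR_MAGIC


print(hex(pack_nibbles_low_first(0x0, 0xC)))  # 0xc0
print(hex(pack_nibbles_low_first(0xE, 0xD)))  # 0xde
print(has_llvm_ir_magic(b"BC\xc0\xde\x00\x00"))  # True
```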
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"><a name="ir_signed_vbr">Signed VBRs</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| <a href="#variablewidth">Variable Width Integers</a> are an efficient way to
 | |
| encode arbitrary sized unsigned values, but are an extremely inefficient way to
 | |
| encode signed values (as negative values are otherwise treated as maximally large
 | |
| unsigned values).</p>
 | |
| 
 | |
| <p>As such, signed vbr values of a specific width are emitted as follows:</p>
 | |
| 
 | |
| <ul>
 | |
| <li>Non-negative values are emitted as vbrs of the specified width, but with their
 | |
|     value shifted left by one.</li>
 | |
| <li>Negative values are emitted as vbrs of the specified width, but the negated
 | |
|     value is shifted left by one, and the low bit is set.</li>
 | |
| </ul>
 | |
| 
 | |
| <p>With this encoding, small positive and small negative values can both be
 | |
| emitted efficiently.</p>
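<p>The two rules above amount to moving the sign into the low bit before the value is handed to the ordinary VBR encoder.  A minimal Python sketch, with hypothetical function names, follows.</p>

```python
def to_signed_vbr_payload(value):
    """Map a signed integer to the unsigned value that is then emitted as a VBR."""
    if value >= 0:
        return value << 1           # low bit clear
    return ((-value) << 1) | 1      # negated value, low bit set


def from_signed_vbr_payload(payload):
    """Inverse mapping: recover the signed integer from the unsigned payload."""
    magnitude = payload >> 1
    return -magnitude if payload & 1 else magnitude


print(to_signed_vbr_payload(3))    # 6
print(to_signed_vbr_payload(-3))   # 7
print(from_signed_vbr_payload(7))  # -3
```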
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!-- _______________________________________________________________________ -->
 | |
| <div class="doc_subsubsection"><a name="ir_blocks">LLVM IR Blocks</a></div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| LLVM IR is defined with the following blocks:
 | |
| </p>
 | |
| 
 | |
| <ul>
 | |
| <li>8  - MODULE_BLOCK - This is the top-level block that contains the
 | |
|     entire module, and describes a variety of per-module information.</li>
 | |
| <li>9  - PARAMATTR_BLOCK - This enumerates the parameter attributes.</li>
 | |
| <li>10 - TYPE_BLOCK - This describes all of the types in the module.</li>
 | |
| <li>11 - CONSTANTS_BLOCK - This describes constants for a module or
 | |
|     function.</li>
 | |
| <li>12 - FUNCTION_BLOCK - This describes a function body.</li>
 | |
| <li>13 - TYPE_SYMTAB_BLOCK - This describes the type symbol table.</li>
 | |
| <li>14 - VALUE_SYMTAB_BLOCK - This describes a value symbol table.</li>
 | |
| </ul>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- ======================================================================= -->
 | |
| <div class="doc_subsection"><a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
 | |
| </div>
 | |
| 
 | |
| <div class="doc_text">
 | |
| 
 | |
| <p>
 | |
| </p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <hr>
 | |
| <address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
 | |
|  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
 | |
| <a href="http://validator.w3.org/check/referer"><img
 | |
|  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
 | |
|  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
 | |
| <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
 | |
| Last modified: $Date$
 | |
| </address>
 | |
| </body>
 | |
| </html>
 |