2007-05-12 05:37:42 +00:00
|
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
|
|
|
"http://www.w3.org/TR/html4/strict.dtd">
|
2007-01-20 23:21:08 +00:00
|
|
|
<html>
|
|
|
|
<head>
|
|
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
|
|
<title>LLVM Bitcode File Format</title>
|
|
|
|
<link rel="stylesheet" href="llvm.css" type="text/css">
|
|
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<div class="doc_title"> LLVM Bitcode File Format </div>
|
|
|
|
<ol>
|
|
|
|
<li><a href="#abstract">Abstract</a></li>
|
2007-05-12 03:23:40 +00:00
|
|
|
<li><a href="#overview">Overview</a></li>
|
|
|
|
<li><a href="#bitstream">Bitstream Format</a>
|
|
|
|
<ol>
|
|
|
|
<li><a href="#magic">Magic Numbers</a></li>
|
2007-05-12 05:37:42 +00:00
|
|
|
<li><a href="#primitives">Primitives</a></li>
|
|
|
|
<li><a href="#abbrevid">Abbreviation IDs</a></li>
|
|
|
|
<li><a href="#blocks">Blocks</a></li>
|
|
|
|
<li><a href="#datarecord">Data Records</a></li>
|
2007-05-12 07:49:15 +00:00
|
|
|
<li><a href="#abbreviations">Abbreviations</a></li>
|
2007-05-13 00:59:52 +00:00
|
|
|
<li><a href="#stdblocks">Standard Blocks</a></li>
|
2007-05-12 03:23:40 +00:00
|
|
|
</ol>
|
|
|
|
</li>
|
2008-07-09 05:14:23 +00:00
|
|
|
<li><a href="#wrapper">Bitcode Wrapper Format</a>
|
|
|
|
</li>
|
2007-05-13 01:39:44 +00:00
|
|
|
<li><a href="#llvmir">LLVM IR Encoding</a>
|
|
|
|
<ol>
|
|
|
|
<li><a href="#basics">Basics</a></li>
|
|
|
|
</ol>
|
|
|
|
</li>
|
2007-01-20 23:21:08 +00:00
|
|
|
</ol>
|
|
|
|
<div class="doc_author">
|
2007-10-08 18:42:45 +00:00
|
|
|
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
|
|
|
|
and <a href="http://www.reverberate.org">Joshua Haberman</a>.
|
2007-01-20 23:21:08 +00:00
|
|
|
</p>
|
|
|
|
</div>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
<!-- *********************************************************************** -->
|
2007-05-12 03:23:40 +00:00
|
|
|
<div class="doc_section"> <a name="abstract">Abstract</a></div>
|
2007-01-20 23:21:08 +00:00
|
|
|
<!-- *********************************************************************** -->
|
2007-05-12 03:23:40 +00:00
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
<div class="doc_text">
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
<p>This document describes the LLVM bitstream file format and the encoding of
|
|
|
|
the LLVM IR into it.</p>
|
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
</div>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
<!-- *********************************************************************** -->
|
2007-05-12 03:23:40 +00:00
|
|
|
<div class="doc_section"> <a name="overview">Overview</a></div>
|
2007-01-20 23:21:08 +00:00
|
|
|
<!-- *********************************************************************** -->
|
2007-05-12 03:23:40 +00:00
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
<div class="doc_text">
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
<p>
|
|
|
|
What is commonly known as the LLVM bitcode file format (also, sometimes
|
|
|
|
anachronistically known as bytecode) is actually two things: a <a
|
|
|
|
href="#bitstream">bitstream container format</a>
|
|
|
|
and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>
|
|
|
|
|
|
|
|
<p>
|
2007-05-12 08:01:52 +00:00
|
|
|
The bitstream format is an abstract encoding of structured data, very
|
2007-05-12 03:23:40 +00:00
|
|
|
similar to XML in some ways. Like XML, bitstream files contain tags, and nested
|
|
|
|
structures, and you can parse the file without having to understand the tags.
|
|
|
|
Unlike XML, the bitstream format is a binary encoding, and unlike XML it
|
|
|
|
provides a mechanism for the file to self-describe "abbreviations", which are
|
|
|
|
effectively size optimizations for the content.</p>
|
|
|
|
|
2008-07-09 05:14:23 +00:00
|
|
|
<p>LLVM IR files may be optionally embedded into a <a
|
|
|
|
href="#wrapper">wrapper</a> structure that makes it easy to embed extra data
|
|
|
|
along with LLVM IR files.</p>
|
|
|
|
|
|
|
|
<p>This document first describes the LLVM bitstream format, describes the
|
|
|
|
wrapper format, then describes the record structure used by LLVM IR files.
|
2007-05-12 03:23:40 +00:00
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div class="doc_section"> <a name="bitstream">Bitstream Format</a></div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
The bitstream format is literally a stream of bits, with a very simple
|
|
|
|
structure. This structure consists of the following concepts:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<ul>
|
2007-05-12 05:37:42 +00:00
|
|
|
<li>A "<a href="#magic">magic number</a>" that identifies the contents of
|
|
|
|
the stream.</li>
|
|
|
|
<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
|
|
|
|
integers.</li>
|
|
|
|
<li><a href="#blocks">Blocks</a>, which define nested content.</li>
|
|
|
|
<li><a href="#datarecord">Data Records</a>, which describe entities within the
|
|
|
|
file.</li>
|
2007-05-12 03:23:40 +00:00
|
|
|
<li>Abbreviations, which specify compression optimizations for the file.</li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>Note that the <a
|
|
|
|
href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
|
|
|
|
used to dump and inspect arbitrary bitstreams, which is very useful for
|
|
|
|
understanding the encoding.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="magic">Magic Numbers</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
2007-10-08 18:42:45 +00:00
|
|
|
<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43).
|
|
|
|
The second two bytes are an application-specific magic number. Generic
|
|
|
|
bitcode tools can look at only the first two bytes to verify the file is
|
|
|
|
bitcode, while application-specific programs will want to look at all four.</p>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
</div>
|
|
|
|
|
2007-05-12 05:37:42 +00:00
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="primitives">Primitives</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
2007-10-08 18:42:45 +00:00
|
|
|
A bitstream literally consists of a stream of bits, which are read in order
|
|
|
|
starting with the least significant bit of each byte. The stream is made up of a
|
2007-05-13 01:39:44 +00:00
|
|
|
number of primitive values that encode a stream of unsigned integer values.
|
|
|
|
These
|
2007-05-12 05:37:42 +00:00
|
|
|
integers are are encoded in two ways: either as <a href="#fixedwidth">Fixed
|
|
|
|
Width Integers</a> or as <a href="#variablewidth">Variable Width
|
|
|
|
Integers</a>.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
2007-05-12 05:37:42 +00:00
|
|
|
<div class="doc_subsubsection"> <a name="fixedwidth">Fixed Width Integers</a>
|
|
|
|
</div>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
2007-05-12 05:37:42 +00:00
|
|
|
<p>Fixed-width integer values have their low bits emitted directly to the file.
|
|
|
|
For example, a 3-bit integer value encodes 1 as 001. Fixed width integers
|
|
|
|
are used when there are a well-known number of options for a field. For
|
|
|
|
example, boolean values are usually encoded with a 1-bit wide integer.
|
2007-05-12 03:23:40 +00:00
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
2007-05-12 05:37:42 +00:00
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="variablewidth">Variable Width
|
|
|
|
Integers</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>Variable-width integer (VBR) values encode values of arbitrary size,
|
|
|
|
optimizing for the case where the values are small. Given a 4-bit VBR field,
|
|
|
|
any 3-bit value (0 through 7) is encoded directly, with the high bit set to
|
|
|
|
zero. Values larger than N-1 bits emit their bits in a series of N-1 bit
|
|
|
|
chunks, where all but the last set the high bit.</p>
|
|
|
|
|
|
|
|
<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
|
|
|
|
vbr4 value. The first set of four bits indicates the value 3 (011) with a
|
|
|
|
continuation piece (indicated by a high bit of 1). The next word indicates a
|
|
|
|
value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value
|
|
|
|
27.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="char6">6-bit characters</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>6-bit characters encode common characters into a fixed 6-bit field. They
|
2007-05-12 07:50:14 +00:00
|
|
|
represent the following characters with the following 6-bit values:</p>
|
2007-05-12 05:37:42 +00:00
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li>'a' .. 'z' - 0 .. 25</li>
|
2007-10-08 18:42:45 +00:00
|
|
|
<li>'A' .. 'Z' - 26 .. 51</li>
|
|
|
|
<li>'0' .. '9' - 52 .. 61</li>
|
2007-05-12 05:37:42 +00:00
|
|
|
<li>'.' - 62</li>
|
|
|
|
<li>'_' - 63</li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>This encoding is only suitable for encoding characters and strings that
|
|
|
|
consist only of the above characters. It is completely incapable of encoding
|
|
|
|
characters not in the set.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="wordalign">Word Alignment</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>Occasionally, it is useful to emit zero bits until the bitstream is a
|
|
|
|
multiple of 32 bits. This ensures that the bit position in the stream can be
|
|
|
|
represented as a multiple of 32-bit words.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="abbrevid">Abbreviation IDs</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
|
|
|
|
<a href="#datarecord">Data Records</a>. Both of these start with an
|
|
|
|
abbreviation ID encoded as a fixed-bitwidth field. The width is specified by
|
|
|
|
the current block, as described below. The value of the abbreviation ID
|
|
|
|
specifies either a builtin ID (which have special meanings, defined below) or
|
|
|
|
one of the abbreviation IDs defined by the stream itself.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
The set of builtin abbrev IDs is:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li>0 - <a href="#END_BLOCK">END_BLOCK</a> - This abbrev ID marks the end of the
|
|
|
|
current block.</li>
|
|
|
|
<li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
|
|
|
|
beginning of a new block.</li>
|
2007-05-12 07:49:15 +00:00
|
|
|
<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
|
|
|
|
abbreviation.</li>
|
|
|
|
<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
|
|
|
|
definition of an unabbreviated record.</li>
|
2007-05-12 05:37:42 +00:00
|
|
|
</ul>
|
|
|
|
|
2007-05-12 07:49:15 +00:00
|
|
|
<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
|
|
|
|
an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
|
2007-05-12 05:37:42 +00:00
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="blocks">Blocks</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
Blocks in a bitstream denote nested regions of the stream, and are identified by
|
|
|
|
a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
|
2007-10-08 18:42:45 +00:00
|
|
|
function bodies). Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a>
|
|
|
|
whose meaning is defined by Bitcode; block IDs 8 and greater are
|
|
|
|
application specific. Nested blocks capture the hierachical structure of the data
|
2007-05-12 05:37:42 +00:00
|
|
|
encoded in it, and various properties are associated with blocks as the file is
|
|
|
|
parsed. Block definitions allow the reader to efficiently skip blocks
|
|
|
|
in constant time if the reader wants a summary of blocks, or if it wants to
|
|
|
|
efficiently skip data they do not understand. The LLVM IR reader uses this
|
|
|
|
mechanism to skip function bodies, lazily reading them on demand.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
When reading and encoding the stream, several properties are maintained for the
|
|
|
|
block. In particular, each block maintains:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>A current abbrev id width. This value starts at 2, and is set every time a
|
|
|
|
block record is entered. The block entry specifies the abbrev id width for
|
|
|
|
the body of the block.</li>
|
|
|
|
|
2007-10-08 18:42:45 +00:00
|
|
|
<li>A set of abbreviations. Abbreviations may be defined within a block, in
|
|
|
|
which case they are only defined in that block (neither subblocks nor
|
|
|
|
enclosing blocks see the abbreviation). Abbreviations can also be defined
|
|
|
|
inside a <a href="#BLOCKINFO">BLOCKINFO</a> block, in which case they are
|
|
|
|
defined in all blocks that match the ID that the BLOCKINFO block is describing.
|
2007-05-12 05:37:42 +00:00
|
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>As sub blocks are entered, these properties are saved and the new sub-block
|
|
|
|
has its own set of abbreviations, and its own abbrev id width. When a sub-block
|
|
|
|
is popped, the saved values are restored.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK
|
|
|
|
Encoding</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
|
|
|
|
<align32bits>, blocklen<sub>32</sub>]</tt></p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record.
|
|
|
|
The <tt>blockid</tt> value is encoded as a 8-bit VBR identifier, and indicates
|
2007-10-08 18:42:45 +00:00
|
|
|
the type of block being entered (which can be a <a href="#stdblocks">standard
|
|
|
|
block</a> or an application-specific block). The
|
2007-05-12 05:37:42 +00:00
|
|
|
<tt>newabbrevlen</tt> value is a 4-bit VBR which specifies the
|
|
|
|
abbrev id width for the sub-block. The <tt>blocklen</tt> is a 32-bit aligned
|
|
|
|
value that specifies the size of the subblock, in 32-bit words. This value
|
|
|
|
allows the reader to skip over the entire block in one jump.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="END_BLOCK">END_BLOCK
|
|
|
|
Encoding</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p><tt>[END_BLOCK, <align32bits>]</tt></p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
The END_BLOCK abbreviation ID specifies the end of the current block record.
|
|
|
|
Its end is aligned to 32-bits to ensure that the size of the block is an even
|
|
|
|
multiple of 32-bits.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="datarecord">Data Records</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
2007-05-12 07:49:15 +00:00
|
|
|
<p>
|
|
|
|
Data records consist of a record code and a number of (up to) 64-bit integer
|
|
|
|
values. The interpretation of the code and values is application specific and
|
|
|
|
there are multiple different ways to encode a record (with an unabbrev record
|
|
|
|
or with an abbreviation). In the LLVM IR format, for example, there is a record
|
|
|
|
which encodes the target triple of a module. The code is MODULE_CODE_TRIPLE,
|
|
|
|
and the values of the record are the ascii codes for the characters in the
|
|
|
|
string.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
|
|
|
|
Encoding</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
|
|
|
|
op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
|
|
|
|
|
|
|
|
<p>An UNABBREV_RECORD provides a default fallback encoding, which is both
|
|
|
|
completely general and also extremely inefficient. It can describe an arbitrary
|
|
|
|
record, by emitting the code and operands as vbrs.</p>
|
|
|
|
|
|
|
|
<p>For example, emitting an LLVM IR target triple as an unabbreviated record
|
|
|
|
requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
|
|
|
|
MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
|
|
|
|
the number of operands), and a vbr6 for each character. Since there are no
|
|
|
|
letters with value less than 32, each letter would need to be emitted as at
|
|
|
|
least a two-part VBR, which means that each letter would require at least 12
|
|
|
|
bits. This is not an efficient encoding, but it is fully general.</p>
|
2007-05-12 05:37:42 +00:00
|
|
|
|
2007-05-12 07:49:15 +00:00
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
|
|
|
|
Encoding</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p><tt>[<abbrevid>, fields...]</tt></p>
|
|
|
|
|
|
|
|
<p>An abbreviated record is a abbreviation id followed by a set of fields that
|
|
|
|
are encoded according to the <a href="#abbreviations">abbreviation
|
|
|
|
definition</a>. This allows records to be encoded significantly more densely
|
|
|
|
than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
|
|
|
|
type, and allows the abbreviation types to be specified in the stream itself,
|
|
|
|
which allows the files to be completely self describing. The actual encoding
|
|
|
|
of abbreviations is defined below.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
2007-05-12 05:37:42 +00:00
|
|
|
<p>
|
2007-05-12 07:49:15 +00:00
|
|
|
Abbreviations are an important form of compression for bitstreams. The idea is
|
|
|
|
to specify a dense encoding for a class of records once, then use that encoding
|
|
|
|
to emit many records. It takes space to emit the encoding into the file, but
|
|
|
|
the space is recouped (hopefully plus some) when the records that use it are
|
|
|
|
emitted.
|
2007-05-12 05:37:42 +00:00
|
|
|
</p>
|
|
|
|
|
2007-05-12 07:49:15 +00:00
|
|
|
<p>
|
|
|
|
Abbreviations can be determined dynamically per client, per file. Since the
|
|
|
|
abbreviations are stored in the bitstream itself, different streams of the same
|
|
|
|
format can contain different sets of abbreviations if the specific stream does
|
|
|
|
not need it. As a concrete example, LLVM IR files usually emit an abbreviation
|
|
|
|
for binary operators. If a specific LLVM module contained no or few binary
|
|
|
|
operators, the abbreviation does not need to be emitted.
|
|
|
|
</p>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
|
|
|
|
Encoding</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
|
|
|
|
...]</tt></p>
|
|
|
|
|
2007-10-08 18:42:45 +00:00
|
|
|
<p>A DEFINE_ABBREV record adds an abbreviation to the list of currently
|
|
|
|
defined abbreviations in the scope of this block. This definition only
|
|
|
|
exists inside this immediate block -- it is not visible in subblocks or
|
|
|
|
enclosing blocks.
|
|
|
|
Abbreviations are implicitly assigned IDs
|
|
|
|
sequentially starting from 4 (the first application-defined abbreviation ID).
|
|
|
|
Any abbreviations defined in a BLOCKINFO record receive IDs first, in order,
|
|
|
|
followed by any abbreviations defined within the block itself.
|
|
|
|
Abbreviated data records reference this ID to indicate what abbreviation
|
|
|
|
they are invoking.</p>
|
|
|
|
|
2007-05-12 07:49:15 +00:00
|
|
|
<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
|
|
|
|
by a VBR that specifies the number of abbrev operands, then the abbrev
|
|
|
|
operands themselves. Abbreviation operands come in three forms. They all start
|
|
|
|
with a single bit that indicates whether the abbrev operand is a literal operand
|
|
|
|
(when the bit is 1) or an encoding operand (when the bit is 0).</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
|
|
|
|
Literal operands specify that the value in the result
|
|
|
|
is always a single specific value. This specific value is emitted as a vbr8
|
|
|
|
after the bit indicating that it is a literal operand.</li>
|
|
|
|
<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
|
2007-05-13 00:59:52 +00:00
|
|
|
- Operand encodings that do not have extra data are just emitted as their code.
|
2007-05-12 07:49:15 +00:00
|
|
|
</li>
|
|
|
|
<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
|
2007-05-13 00:59:52 +00:00
|
|
|
value<sub>vbr5</sub>]</tt> - Operand encodings that do have extra data are
|
|
|
|
emitted as their code, followed by the extra data.
|
2007-05-12 07:49:15 +00:00
|
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
|
2007-05-13 00:59:52 +00:00
|
|
|
<p>The possible operand encodings are:</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li>1 - Fixed - The field should be emitted as a <a
|
|
|
|
href="#fixedwidth">fixed-width value</a>, whose width
|
2007-10-08 18:42:45 +00:00
|
|
|
is specified by the operand's extra data.</li>
|
2007-05-13 00:59:52 +00:00
|
|
|
<li>2 - VBR - The field should be emitted as a <a
|
|
|
|
href="#variablewidth">variable-width value</a>, whose width
|
2007-10-08 18:42:45 +00:00
|
|
|
is specified by the operand's extra data.</li>
|
|
|
|
<li>3 - Array - This field is an array of values. The array operand has no
|
|
|
|
extra data, but expects another operand to follow it which indicates the
|
|
|
|
element type of the array. When reading an array in an abbreviated record,
|
|
|
|
the first integer is a vbr6 that indicates the array length, followed by
|
|
|
|
the encoded elements of the array. An array may only occur as the last
|
|
|
|
operand of an abbreviation (except for the one final operand that gives
|
|
|
|
the array's type).</li>
|
2007-05-13 00:59:52 +00:00
|
|
|
<li>4 - Char6 - This field should be emitted as a <a href="#char6">char6-encoded
|
2007-10-08 18:42:45 +00:00
|
|
|
value</a>. This operand type takes no extra data.</li>
|
2007-05-13 00:59:52 +00:00
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>For example, target triples in LLVM modules are encoded as a record of the
|
|
|
|
form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>. Consider if the bitstream emitted
|
|
|
|
the following abbrev entry:</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li><tt>[0, Fixed, 4]</tt></li>
|
|
|
|
<li><tt>[0, Array]</tt></li>
|
|
|
|
<li><tt>[0, Char6]</tt></li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>When emitting a record with this abbreviation, the above entry would be
|
|
|
|
emitted as:</p>
|
|
|
|
|
|
|
|
<p><tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>,
|
|
|
|
0<sub>6</sub>, 1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt></p>
|
|
|
|
|
|
|
|
<p>These values are:</p>
|
|
|
|
|
|
|
|
<ol>
|
|
|
|
<li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
|
|
|
|
<li>The second value, 2, is the code for TRIPLE in LLVM IR files.</li>
|
|
|
|
<li>The third value, 4, is the length of the array.</li>
|
|
|
|
<li>The rest of the values are the char6 encoded values for "abcd".</li>
|
|
|
|
</ol>
|
|
|
|
|
|
|
|
<p>With this abbreviation, the triple is emitted with only 37 bits (assuming a
|
|
|
|
abbrev id width of 3). Without the abbreviation, significantly more space would
|
|
|
|
be required to emit the target triple. Also, since the TRIPLE value is not
|
|
|
|
emitted as a literal in the abbreviation, the abbreviation can also be used for
|
|
|
|
any other string value.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="stdblocks">Standard Blocks</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
In addition to the basic block structure and record encodings, the bitstream
|
|
|
|
also defines specific builtin block types. These block types specify how the
|
|
|
|
stream is to be decoded or other metadata. In the future, new standard blocks
|
2007-10-08 18:42:45 +00:00
|
|
|
may be added. Block IDs 0-7 are reserved for standard blocks.
|
2007-05-13 00:59:52 +00:00
|
|
|
</p>
|
|
|
|
|
2007-05-12 05:37:42 +00:00
|
|
|
</div>
|
|
|
|
|
2007-05-13 00:59:52 +00:00
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"><a name="BLOCKINFO">#0 - BLOCKINFO
|
|
|
|
Block</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>The BLOCKINFO block allows the description of metadata for other blocks. The
|
|
|
|
currently specified records are:</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li><tt>[SETBID (#1), blockid]</tt></li>
|
|
|
|
<li><tt>[DEFINE_ABBREV, ...]</tt></li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>
|
2007-10-08 18:42:45 +00:00
|
|
|
The SETBID record indicates which block ID is being described. SETBID
|
|
|
|
records can occur multiple times throughout the block to change which
|
|
|
|
block ID is being described. There must be a SETBID record prior to
|
|
|
|
any other records.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
Standard DEFINE_ABBREV records can occur inside BLOCKINFO blocks, but unlike
|
|
|
|
their occurrence in normal blocks, the abbreviation is defined for blocks
|
|
|
|
matching the block ID we are describing, <i>not</i> the BLOCKINFO block itself.
|
|
|
|
The abbreviations defined in BLOCKINFO blocks receive abbreviation ids
|
|
|
|
as described in <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a>.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
Note that although the data in BLOCKINFO blocks is described as "metadata," the
|
|
|
|
abbreviations they contain are essential for parsing records from the
|
|
|
|
corresponding blocks. It is not safe to skip them.
|
2007-05-13 00:59:52 +00:00
|
|
|
</p>
|
|
|
|
|
|
|
|
</div>
|
2007-05-12 05:37:42 +00:00
|
|
|
|
2008-07-09 05:14:23 +00:00
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div class="doc_section"> <a name="wrapper">Bitcode Wrapper Format</a></div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
|
|
|
|
structure. This structure contains a simple header that indicates the offset
|
|
|
|
and size of the embedded BC file. This allows additional information to be
|
|
|
|
stored alongside the BC file. The structure of this file header is:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
<pre>
|
|
|
|
[Magic<sub>32</sub>,
|
|
|
|
Version<sub>32</sub>,
|
|
|
|
Offset<sub>32</sub>,
|
|
|
|
Size<sub>32</sub>,
|
|
|
|
CPUType<sub>32</sub>]
|
|
|
|
</pre></p>
|
|
|
|
|
|
|
|
<p>Each of the fields are 32-bit fields stored in little endian form (as with
|
|
|
|
the rest of the bitcode file fields). The Magic number is always
|
|
|
|
<tt>0x0B17C0DE</tt> and the version is currently always <tt>0</tt>. The Offset
|
|
|
|
field is the offset in bytes to the start of the bitcode stream in the file, and
|
|
|
|
the Size field is a size in bytes of the stream. CPUType is a target-specific
|
|
|
|
value that can be used to encode the CPU of the target.
|
|
|
|
</div>
|
|
|
|
|
|
|
|
|
2007-05-12 03:23:40 +00:00
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div class="doc_section"> <a name="llvmir">LLVM IR Encoding</a></div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
2007-05-13 01:39:44 +00:00
|
|
|
<p>LLVM IR is encoded into a bitstream by defining blocks and records. It uses
|
|
|
|
blocks for things like constant pools, functions, symbol tables, etc. It uses
|
|
|
|
records for things like instructions, global variable descriptors, type
|
|
|
|
descriptions, etc. This document does not describe the set of abbreviations
|
|
|
|
that the writer uses, as these are fully self-described in the file, and the
|
|
|
|
reader is not allowed to build in any knowledge of this.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="basics">Basics</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"><a name="ir_magic">LLVM IR Magic Number</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
The magic number for LLVM IR files is:
|
|
|
|
</p>
|
|
|
|
|
2007-10-08 18:42:45 +00:00
|
|
|
<p><tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt></p>
|
2007-05-13 01:39:44 +00:00
|
|
|
|
2007-10-08 18:42:45 +00:00
|
|
|
<p>When combined with the bitcode magic number and viewed as bytes, this is "BC 0xC0DE".</p>
|
2007-05-13 01:39:44 +00:00
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"><a name="ir_signed_vbr">Signed VBRs</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
<a href="#variablewidth">Variable Width Integers</a> are an efficient way to
|
|
|
|
encode arbitrary sized unsigned values, but is an extremely inefficient way to
|
|
|
|
encode signed values (as signed values are otherwise treated as maximally large
|
|
|
|
unsigned values).</p>
|
|
|
|
|
|
|
|
<p>As such, signed vbr values of a specific width are emitted as follows:</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li>Positive values are emitted as vbrs of the specified width, but with their
|
|
|
|
value shifted left by one.</li>
|
|
|
|
<li>Negative values are emitted as vbrs of the specified width, but the negated
|
|
|
|
value is shifted left by one, and the low bit is set.</li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>With this encoding, small positive and small negative values can both be
|
|
|
|
emitted efficiently.</p>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
<!-- _______________________________________________________________________ -->
|
|
|
|
<div class="doc_subsubsection"><a name="ir_blocks">LLVM IR Blocks</a></div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
LLVM IR is defined with the following blocks:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li>8 - MODULE_BLOCK - This is the top-level block that contains the
|
|
|
|
entire module, and describes a variety of per-module information.</li>
|
|
|
|
<li>9 - PARAMATTR_BLOCK - This enumerates the parameter attributes.</li>
|
|
|
|
<li>10 - TYPE_BLOCK - This describes all of the types in the module.</li>
|
|
|
|
<li>11 - CONSTANTS_BLOCK - This describes constants for a module or
|
|
|
|
function.</li>
|
|
|
|
<li>12 - FUNCTION_BLOCK - This describes a function body.</li>
|
|
|
|
<li>13 - TYPE_SYMTAB_BLOCK - This describes the type symbol table.</li>
|
|
|
|
<li>14 - VALUE_SYMTAB_BLOCK - This describes a value symbol table.</li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
|
|
<div class="doc_subsection"><a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div class="doc_text">
|
|
|
|
|
|
|
|
<p>
|
|
|
|
</p>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
</div>
|
2007-05-12 03:23:40 +00:00
|
|
|
|
|
|
|
|
2007-01-20 23:21:08 +00:00
|
|
|
<!-- *********************************************************************** -->
|
|
|
|
<hr>
|
|
|
|
<address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
|
|
|
|
src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
|
|
|
|
<a href="http://validator.w3.org/check/referer"><img
|
|
|
|
src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
|
2007-05-12 03:23:40 +00:00
|
|
|
<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
|
2007-01-20 23:21:08 +00:00
|
|
|
<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
|
|
|
|
Last modified: $Date$
|
|
|
|
</address>
|
|
|
|
</body>
|
|
|
|
</html>
|