continued description

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37003 91177308-0d34-0410-b5e6-96231b3b80d8
2025-04-09 01:38:03 +00:00 · 2007-05-12 07:49:15 +00:00 · 2007-05-12 07:49:15 +00:00 · daeb63c220
commit daeb63c220
parent 3a1716db58
1 changed files with 108 additions and 5 deletions
--- a/docs/BitCodeFormat.html
+++ b/docs/BitCodeFormat.html
@@ -18,6 +18,7 @@
    <li><a href="#abbrevid">Abbreviation IDs</a></li>
    <li><a href="#blocks">Blocks</a></li>
    <li><a href="#datarecord">Data Records</a></li>
+    <li><a href="#abbreviations">Abbreviations</a></li>
    </ol>
  </li>
  <li><a href="#llvmir">LLVM IR Encoding</a></li>
@@ -213,12 +214,14 @@ The set of builtin abbrev IDs is:
    current block.</li>
 <li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
    beginning of a new block.</li>
-<li>2 - DEFINE_ABBREV - This defines a new abbreviation.</li>
-<li>3 - UNABBREV_RECORD - This ID specifies the definition of an unabbreviated
-    record.</li>
+<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
+    abbreviation.</li>
+<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
+    definition of an unabbreviated record.</li>
 </ul>

-<p>Abbreviation IDs 4 and above are defined by the stream itself.</p>
+<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
+an <a href="#abbrev_records">abbreviated record encoding</a>.</p>

 </div>

@@ -303,10 +306,110 @@ multiple of 32-bits.</p>
 </div>

 <div class="doc_text">
+<p>
+Data records consist of a record code and a number of integer values, each up
+to 64 bits wide.  The interpretation of the code and values is application
+specific, and there are multiple ways to encode a record (with an unabbreviated
+record or with an abbreviation).  In the LLVM IR format, for example, there is a
+record which encodes the target triple of a module.  The code is
+MODULE_CODE_TRIPLE, and the values of the record are the ASCII codes for the
+characters in the string.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
+       op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
+
+<p>An UNABBREV_RECORD provides a default fallback encoding, which is both
+completely general and extremely inefficient.  It can describe an arbitrary
+record by emitting the code and operands as VBRs.</p>
+
+<p>For example, emitting an LLVM IR target triple as an unabbreviated record
+requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
+MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
+the number of operands), and a vbr6 for each character.  A single vbr6 chunk
+carries only 5 bits of data, so it can hold values below 32.  Since no letter
+has a value less than 32, each letter needs at least a two-chunk VBR, or at
+least 12 bits.  This is not an efficient encoding, but it is fully general.</p>
+
+</div>
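The bit cost described above can be checked with a small sketch. It is not LLVM's writer: the helper names `vbr_chunks` and `unabbrev_record_bits` are hypothetical, the abbreviation ID width is passed in because it depends on the enclosing block, and the record code `2` used in the usage note is an assumed value for MODULE_CODE_TRIPLE. The sketch assumes the usual VBR layout, where each chunk of N bits holds N-1 data bits plus a continuation flag in its high bit.

```python
def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits.

    Each chunk holds width-1 data bits, least significant chunk first.
    The high bit of a chunk is set when more chunks follow.
    """
    if value < 0 or width < 2:
        raise ValueError("value must be >= 0 and width must be >= 2")
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= payload_bits
        if value:
            chunks.append(low | (1 << payload_bits))  # continuation bit set
        else:
            chunks.append(low)  # final chunk, continuation bit clear
            return chunks


def unabbrev_record_bits(code, operands, abbrev_id_width):
    """Count the bits in [UNABBREV_RECORD, code, numops, op0, op1, ...].

    `abbrev_id_width` is an assumption supplied by the caller: the width of
    the abbreviation ID field in the current block.
    """
    bits = abbrev_id_width                          # the abbrev ID itself
    bits += 6 * len(vbr_chunks(code, 6))            # record code as vbr6
    bits += 6 * len(vbr_chunks(len(operands), 6))   # numops as vbr6
    for op in operands:
        bits += 6 * len(vbr_chunks(op, 6))          # each operand as vbr6
    return bits
```

For instance, `unabbrev_record_bits(2, [ord(c) for c in "i686-pc-linux-gnu"], 2)` counts a 17-character triple with an assumed 2-bit abbreviation ID; every character takes two vbr6 chunks, which is the 12 bits per letter mentioned above.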
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>
+
+<p>An abbreviated record is an abbreviation ID followed by a set of fields that
+are encoded according to the <a href="#abbreviations">abbreviation 
+definition</a>.  This allows records to be encoded significantly more densely
+than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
+type.  Because the abbreviation types are specified in the stream itself,
+files are completely self-describing.  The actual encoding
+of abbreviations is defined below.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
+</div>
+
+<div class="doc_text">
+<p>
+Abbreviations are an important form of compression for bitstreams.  The idea is
+to specify a dense encoding for a class of records once, then use that encoding
+to emit many records.  It takes space to emit the encoding into the file, but
+the space is recouped (hopefully plus some) when the records that use it are
+emitted.
+</p>

 <p>
-blah
+Abbreviations can be determined dynamically per client, per file.  Since the
+abbreviations are stored in the bitstream itself, different streams of the same
+format can contain different sets of abbreviations, and a stream can omit any
+abbreviation it does not need.  As a concrete example, LLVM IR files usually
+emit an abbreviation for binary operators.  If a specific LLVM module contains
+few or no binary operators, the abbreviation does not need to be emitted.
 </p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
+ Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
+ ...]</tt></p>
+
+<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
+by a VBR that specifies the number of abbrev operands, then the abbrev
+operands themselves.  Abbreviation operands come in three forms.  They all start
+with a single bit that indicates whether the abbrev operand is a literal operand
+(when the bit is 1) or an encoding operand (when the bit is 0).</p>
+
+<ol>
+<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
+Literal operands specify that the value in the result
+is always a single specific value.  This specific value is emitted as a vbr8
+after the bit indicating that it is a literal operand.</li>
+<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
+ - blah
+</li>
+<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>, 
+value<sub>vbr5</sub>]</tt> -
+
+</li>
+</ol>

 </div>
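As a rough illustration of the DEFINE_ABBREV layout, the sketch below lists the fields of a definition made up only of literal operands, since the encoding operand forms are not yet described in this text. The names `vbr_chunks` and `define_abbrev_fields` are hypothetical, and the abbreviation ID width is an assumption supplied by the caller. Fields are returned as `(value, width)` pairs rather than packed bits, so the sketch makes no claim about bit order within the stream.

```python
DEFINE_ABBREV = 2  # builtin abbrev ID listed earlier in this document


def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits.

    Each chunk holds width-1 data bits, least significant chunk first.
    The high bit of a chunk is set when more chunks follow.
    """
    if value < 0 or width < 2:
        raise ValueError("value must be >= 0 and width must be >= 2")
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= payload_bits
        if value:
            chunks.append(low | (1 << payload_bits))
        else:
            chunks.append(low)
            return chunks


def define_abbrev_fields(literals, abbrev_id_width):
    """Return (value, width) fields for a literal-only abbreviation.

    Layout: [DEFINE_ABBREV, numabbrevops as vbr5, then per operand a
    1-bit literal flag followed by the literal value as vbr8].
    """
    fields = [(DEFINE_ABBREV, abbrev_id_width)]
    fields += [(chunk, 5) for chunk in vbr_chunks(len(literals), 5)]
    for value in literals:
        fields.append((1, 1))  # flag bit 1 marks a literal operand
        fields += [(chunk, 8) for chunk in vbr_chunks(value, 8)]
    return fields


def total_bits(fields):
    """Sum the widths of all fields."""
    return sum(width for _, width in fields)
```

With an assumed 2-bit abbreviation ID, `define_abbrev_fields([2], 2)` yields four fields: the abbrev ID, the operand count, the literal flag, and the literal value.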