Documented (Java properties file like) syntax of config file format

Added definitions for some of the configuration items.
Made the document HTML 4.01 Strict compliant.
Ran ispell on it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15877 91177308-0d34-0410-b5e6-96231b3b80d8
Reid Spencer 2004-08-17 09:18:37 +00:00
parent 2d1e01c795
commit aaa3da9665


@ -12,7 +12,7 @@
margin-right: 1em; margin-bottom: 1em; }
.td_left { border: 2px solid gray; text-align: left; }
</style>
<meta name="author" content="Reid Spencer" name="author">
<meta name="author" content="Reid Spencer">
<meta name="description"
content="A description of the use and design of the LLVM Compiler Driver.">
</head>
@ -86,8 +86,7 @@
interfaces need to be understood).</li>
<li>Supports source language translator invocation via both dynamically
loadable shared objects and invocation of an executable.</li>
</ol>
</p>
</ul>
</div>
<!-- _______________________________________________________________________ -->
@ -96,7 +95,7 @@
<p>At a high level, <tt>llvmc</tt> operation is very simple. The basic action
taken by <tt>llvmc</tt> is to simply invoke some tool or set of tools to fill
the user's request for compilation. Every execution of <tt>llvmc</tt> takes the
following sequence of steps:<br/>
following sequence of steps:</p>
<dl>
<dt><b>Collect Command Line Options</b></dt>
<dd>The command line options provide the marching orders to <tt>llvmc</tt>
@ -108,9 +107,10 @@
<dd>Based on the options and the suffixes of the filenames presented, a set
of configuration files are read to configure the actions <tt>llvmc</tt> will
take. Configuration files are provided by either LLVM or the front end
compiler tools that B<llvmc> invokes. These files determine what actions
<tt>llvmc</tt> will take in response to the user's request. See the section
on <a href="#configuration">configuration</a> for more details.</dd>
compiler tools that <tt>llvmc</tt> invokes. These files determine what
actions <tt>llvmc</tt> will take in response to the user's request. See
the section on <a href="#configuration">configuration</a> for more details.
</dd>
<dt><b>Determine Phases To Execute</b></dt>
<dd>Based on the command line options and configuration files,
<tt>llvmc</tt> determines the compilation <a href="#phases">phases</a> that
@ -132,18 +132,18 @@
<dd>If any action fails (returns a non-zero result code), <tt>llvmc</tt>
also fails and returns the result code from the failing action. If
everything succeeds, <tt>llvmc</tt> will return a zero result code.</dd>
</dl></p>
</dl>
<p><tt>llvmc</tt>'s operation must be simple, regular and predictable.
Developers need to be able to rely on it to take a consistent approach to
compilation. For example, the invocation:</p>
<tt><pre>
llvmc -O2 x.c y.c z.c -o xyz</pre></tt>
<code>
llvmc -O2 x.c y.c z.c -o xyz</code>
<p>must produce <i>exactly</i> the same results as:</p>
<tt><pre>
llvmc -O2 x.c
llvmc -O2 y.c
llvmc -O2 z.c
llvmc -O2 x.o y.o z.o -o xyz</pre></tt>
<code>
llvmc -O2 x.c
llvmc -O2 y.c
llvmc -O2 z.c
llvmc -O2 x.o y.o z.o -o xyz</code>
<p>To accomplish this, <tt>llvmc</tt> uses a very simple goal oriented
procedure to do its work. The overall goal is to produce a functioning
executable. To accomplish this, <tt>llvmc</tt> always attempts to execute a
@ -254,10 +254,11 @@
<p>An action, with regard to <tt>llvmc</tt>, is a basic operation that it takes
in order to fulfill the user's request. Each phase of compilation will invoke
zero or more actions in order to accomplish that phase.</p>
<p>Actions come in two forms:<ol>
<p>Actions come in two forms:</p>
<ul>
<li>Invokable Executables</li>
<li>Functions in a shared library</li>
</ul></p>
</ul>
</div>
<!-- *********************************************************************** -->
@ -274,9 +275,9 @@
<tt>llvmc</tt>. Configuration information is relatively static for a
given release of LLVM and a front end compiler. However, the details may
change from release to release of either. Users are encouraged to simply use
the various options of the B<llvmc> command and ignore the configuration of
the tool. These configuration files are for compiler writers and LLVM
developers. Those wishing to simply use B<llvmc> don't need to understand
the various options of the <tt>llvmc</tt> command and ignore the configuration
of the tool. These configuration files are for compiler writers and LLVM
developers. Those wishing to simply use <tt>llvmc</tt> don't need to understand
this section but it may be instructive on how the tool works.</p>
</div>
@ -300,9 +301,9 @@ were written for LLVM or not. Furthermore, not all compilation front ends will
have the same capabilities. Some front ends will simply generate LLVM assembly
code, others will be able to generate fully optimized byte code. In general,
<tt>llvmc</tt> doesn't make any assumptions about the capabilities or command
line options of a sub-tool. It simply uses the details found in the configuration
files and leaves it to the compiler writer to specify the configuration
correctly.</p>
line options of a sub-tool. It simply uses the details found in the
configuration files and leaves it to the compiler writer to specify the
configuration correctly.</p>
<p>This approach means that new compiler front ends can be up and working very
quickly. As a first cut, a front end can simply compile its source to raw
@ -336,15 +337,12 @@ optimization.</p>
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="filetypes"></a>Configuration Files</div>
<div class="doc_text">
<h3>File Types</h3>
<p>There are two types of configuration files: the master configuration file
and the language specific configuration file. The master configuration file
contains the general configuration of <tt>llvmc</tt> itself and is supplied
with the tool. It contains information that is source language agnostic.
Language specific configuration files tell <tt>llvmc</tt> how to invoke the
language's compiler for a variety of different tasks and what other tools
are needed to backfill the compiler's missing features (e.g.
optimization).</p>
<h3>File Contents</h3>
<p>Each configuration file provides the details for a single source language
that is to be compiled. This configuration information tells <tt>llvmc</tt>
how to invoke the language's pre-processor, translator, optimizer, assembler
and linker. Note that a given source language needn't provide all these tools
since many of them already exist in LLVM.</p>
<h3>Directory Search</h3>
<p><tt>llvmc</tt> always looks for files of a specific name. It uses the
@ -365,77 +363,192 @@ optimization.</p>
<li>If the configuration file sought still can't be found, <tt>llvmc</tt>
will print an error message and exit.</li>
</ol>
The first file found in this search will be used. Other files with the same
name will be ignored even if they exist in one of the subsequent search
<p>The first file found in this search will be used. Other files with the
same name will be ignored even if they exist in one of the subsequent search
locations.</p>
<h3>File Names</h3>
<p>In the directories searched, a file named <tt>master</tt> will be
recognized as the master configuration file for <tt>llvmc</tt>. Note that
users <i>may</i> override the master file with a copy in their home directory
but they are advised not to. This capability is only useful for compiler
implementers needing to alter the master configuration while developing
their compiler front end. When reading the configuration files, the master
files are always read first.</p>
<p>Language specific configuration files are given specific names to foster
faster lookup. The name of a given language specific configuration file is
the same as the suffix used to identify files containing source in that
language. For example, a configuration file for C++ source might be named
<tt>cpp</tt>, <tt>C</tt>, or <tt>cxx</tt>.</p>
<p>In the directories searched, each configuration file is given a specific
name to foster faster lookup (so llvmc doesn't have to do directory searches).
The name of a given language specific configuration file is simply the same
as the suffix used to identify files containing source in that language.
For example, a configuration file for C++ source might be named
<tt>cpp</tt>, <tt>C</tt>, or <tt>cxx</tt>. For languages that support multiple
file suffixes, multiple (probably identical) files (or symbolic links) will
need to be provided.</p>
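<p>The naming rule above can be sketched in a few lines of Python. This is only
an illustration, not <tt>llvmc</tt>'s implementation: the function names
<tt>config_name_for</tt> and <tt>find_config</tt> are hypothetical, and the
list of directories to search is passed in by the caller rather than assumed
here.</p>

```python
import os


def config_name_for(source_path):
    """Return the configuration file name for a source file.

    Per the rule above, this is simply the file's suffix without the dot,
    e.g. "x.cpp" -> "cpp". Case is preserved, so "y.C" -> "C".
    """
    suffix = os.path.splitext(source_path)[1]
    return suffix[1:]


def find_config(source_path, search_dirs):
    """Return the first matching configuration file, or None.

    search_dirs is a caller-supplied, ordered list of directories
    (hypothetical; the real search order is described elsewhere).
    """
    name = config_name_for(source_path)
    if not name:
        return None
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate  # first file found wins; later ones are ignored
    return None
```

<p>Because the lookup is by exact file name, a language with several suffixes
needs one file (or symbolic link) per suffix, as noted above.</p>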
<h3>What Gets Read</h3>
<p>The master configuration file is always read. Which language specific
configuration files are read depends on the command line options and the
suffixes of the file names provided on <tt>llvmc</tt>'s command line. Note
<p>Which configuration files are read depends on the command line options and
the suffixes of the file names provided on <tt>llvmc</tt>'s command line. Note
that the <tt>--x LANGUAGE</tt> option alters the language that <tt>llvmc</tt>
uses for the subsequent files on the command line. Only the language
specific configuration files actually needed to complete <tt>llvmc</tt>'s
task are read. Other language specific files will be ignored.</p>
uses for the subsequent files on the command line. Only the configuration
files actually needed to complete <tt>llvmc</tt>'s task are read. Other
language specific files will be ignored.</p>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="syntax"></a>Syntax</div>
<div class="doc_text">
<p>The syntax of the configuration files is yet to be determined. There are
two viable options remaining:<br/>
<p>The syntax of the configuration files is very simple and somewhat
compatible with Java's property files. Here are the syntax rules:</p>
<ul>
<li>XML DTD Specific To <tt>llvmc</tt></li>
<li>Windows .ini style file with numerous sections</li>
</ul></p>
<li>The file encoding is ASCII.</li>
<li>The file is line oriented. There should be one configuration item per
line. Lines are terminated by the newline character (0x0A).</li>
<li>A configuration item consists of a name, an <tt>=</tt> and a value.</li>
<li>A name consists of a sequence of identifiers separated by periods.</li>
<li>An identifier is a specific keyword made up of only lower case
and upper case letters (e.g. the <tt>lang</tt> and <tt>name</tt> in
<tt>lang.name</tt>).</li>
<li>Values come in four flavors: booleans, integers, commands and
strings.</li>
<li>Valid "false" boolean values are <tt>false False FALSE no No NO
off Off</tt> and <tt>OFF</tt>.</li>
<li>Valid "true" boolean values are <tt>true True TRUE yes Yes YES
on On</tt> and <tt>ON</tt>.</li>
<li>Integers are simply sequences of digits.</li>
<li>Commands start with a program name and are followed by a sequence of
words that are passed to that program as command line arguments. Program
arguments that begin and end with the <tt>@</tt> sign will have their value
substituted. Program names beginning with <tt>/</tt> are considered to be
absolute. Otherwise the <tt>PATH</tt> will be applied to find the program to
execute.</li>
<li>Strings are composed of multiple sequences of characters from the
character class <tt>[-A-Za-z0-9_:%+/\\|,]</tt> separated by white
space.</li>
<li>White space on a line is folded. Multiple blanks or tabs will be
reduced to a single blank.</li>
<li>White space before the configuration item's name is ignored.</li>
<li>White space on either side of the <tt>=</tt> is ignored.</li>
<li>White space in a string value is used to separate the individual
components of the string value but otherwise ignored.</li>
<li>Comments are introduced by the <tt>#</tt> character. Everything after a
<tt>#</tt> and before the end of line is ignored.</li>
</ul>
</div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="master_items">Configuration Items</a></div>
<div class="doc_subsection"><a name="items">Configuration Items</a></div>
<div class="doc_text">
<p>The following description of configuration items is syntax-less and simply
uses a naming hierarchy to describe the configuration items. Whatever
syntax is chosen will need to map the hierarchy to the given syntax.</p>
<p>The table below provides definitions of the allowed configuration items
that may appear in a configuration file. Every item has a default value and
does not need to appear in the configuration file. Missing items will have the
default value. Each identifier may appear as all lower case, first letter
capitalized or all upper case.</p>
<table>
<tr>
<th>Name</th>
<th>Value Type</th>
<th>Description</th>
<th>Default</th>
</tr>
<tr><td colspan="4"><h4>LANG ITEMS</h4></td></tr>
<tr>
<td><b>lang.name</b></td>
<td>string</td>
<td class="td_left">Provides the common name for a language definition.
For example "C++", "Pascal", "FORTRAN", etc.</td>
<td><i>blank</i></td>
</tr>
<tr>
<td><b>Capabilities.hasPreProcessor</b></td>
<td><b>lang.opt1</b></td>
<td>string</td>
<td class="td_left">Specifies the parameters to give the optimizer when <tt>-O1</tt> is
specified on the <tt>llvmc</tt> command line.</td>
<td><tt>-simplifycfg -instcombine -mem2reg</tt></td>
</tr>
<tr>
<td><b>lang.opt2</b></td>
<td>string</td>
<td class="td_left">Specifies the parameters to give the optimizer when <tt>-O2</tt> is
specified on the <tt>llvmc</tt> command line.</td>
<td><i>TBD</i></td>
</tr>
<tr>
<td><b>lang.opt3</b></td>
<td>string</td>
<td class="td_left">Specifies the parameters to give the optimizer when <tt>-O3</tt> is
specified on the <tt>llvmc</tt> command line.</td>
<td><i>TBD</i></td>
</tr>
<tr>
<td><b>lang.opt4</b></td>
<td>string</td>
<td class="td_left">Specifies the parameters to give the optimizer when <tt>-O4</tt> is
specified on the <tt>llvmc</tt> command line.</td>
<td><i>TBD</i></td>
</tr>
<tr>
<td><b>lang.opt5</b></td>
<td>string</td>
<td class="td_left">Specifies the parameters to give the optimizer when <tt>-O5</tt> is
specified on the <tt>llvmc</tt> command line.</td>
<td><i>TBD</i></td>
</tr>
<tr><td colspan="4"><h4>PREPROCESSOR ITEMS</h4></td></tr>
<tr>
<td><b>preprocessor.command</b></td>
<td>command</td>
<td class="td_left">This provides the command prototype that will be used
to run the preprocessor. Valid substitutions are <tt>@in@</tt> for the
input file and <tt>@out@</tt> for the output file. This is generally only
used with the <tt>-E</tt> option.</td>
<td>&lt;blank&gt;</td>
</tr>
<tr>
<td><b>preprocessor.required</b></td>
<td>boolean</td>
<td class="td_left">This item specifies whether the language has a
pre-processing phase or not. This controls whether the B<-E> option works
for the language or not.</td>
<td class="td_left">This item specifies whether the pre-processing phase
is required by the language. If the value is true, then the
<tt>preprocessor.command</tt> value must not be blank. With this option,
<tt>llvmc</tt> will always run the preprocessor as it assumes that the
translation and optimization phases don't know how to pre-process their
input.</td>
<td>false</td>
</tr>
<tr><td colspan="4"><h4>TRANSLATOR ITEMS</h4></td></tr>
<tr>
<td><b>translator.command</b></td>
<td>command</td>
<td class="td_left">This provides the command prototype that will be used
to run the translator. Valid substitutions are <tt>@in@</tt> for the
input file and <tt>@out@</tt> for the output file.</td>
<td>&lt;blank&gt;</td>
</tr>
<tr>
<td><b>Capabilities.outputFormat</b></td>
<td>"bc" or "ll"</td>
<td><b>translator.output</b></td>
<td><tt>native</tt>, <tt>bytecode</tt> or <tt>assembly</tt></td>
<td class="td_left">This item specifies the kind of output the language's
compiler generates. The choices are either bytecode (<tt>bc</tt>) or LLVM
assembly (<tt>ll</tt>).</td>
translator generates.</td>
<td><tt>bytecode</tt></td>
</tr>
<tr>
<td><b>Capabilities.understandsOptimization</b></td>
<td><b>translator.preprocesses</b></td>
<td>boolean</td>
<td>Indicates whether the compiler for this language understands the
<tt>-O</tt> options or not</td>
<td class="td_left">Indicates that the translator also preprocesses. If this is true, then
<tt>llvmc</tt> will skip the pre-processing phase whenever the final
phase is not pre-processing.</td>
<td><tt>false</tt></td>
</tr>
<tr>
<td><b>translator.optimizers</b></td>
<td>boolean</td>
<td class="td_left">Indicates that the translator also optimizes. If this is true, then
<tt>llvmc</tt> will skip the optimization phase whenever the final phase
is optimization or later.</td>
<td><tt>false</tt></td>
</tr>
<tr>
<td><b>translator.groks_dash_o</b></td>
<td>boolean</td>
<td class="td_left">Indicates that the translator understands the <i>intent</i> of the
various <tt>-O</tt><i>n</i> options to <tt>llvmc</tt>. This will cause the
<tt>-O</tt><i>n</i> option to be passed to the translator instead of the
equivalent options provided by <tt>lang.opt</tt><i>n</i>.</td>
<td><tt>false</tt></td>
</tr>
<tr><td colspan="4"><h4>OPTIMIZER ITEMS</h4></td></tr>
<tr><td colspan="4"><h4>ASSEMBLER ITEMS</h4></td></tr>
<tr><td colspan="4"><h4>LINKER ITEMS</h4></td></tr>
</table>
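<p>To make the <i>command</i> value type more concrete, the following Python
sketch expands a command prototype such as the value of
<tt>translator.command</tt> into an argument list. It is an illustration under
stated assumptions, not <tt>llvmc</tt>'s code: the function name
<tt>build_argv</tt> is hypothetical, only whole words of the form
<tt>@name@</tt> are substituted, and <tt>PATH</tt> lookup is delegated to
Python's <tt>shutil.which</tt>.</p>

```python
import shutil


def build_argv(prototype, substitutions):
    """Expand a command prototype into an argument list.

    prototype      -- e.g. "/usr/bin/mycc @in@ -o @out@" (mycc is made up)
    substitutions  -- e.g. {"in": "x.c", "out": "x.bc"}
    """
    words = prototype.split()
    if not words:
        raise ValueError("empty command prototype")

    program = words[0]
    if not program.startswith("/"):
        # Not absolute: look the program up on PATH, keeping the bare name
        # if nothing is found so the caller can report the failure.
        program = shutil.which(program) or program

    argv = [program]
    for word in words[1:]:
        if len(word) > 2 and word.startswith("@") and word.endswith("@"):
            key = word[1:-1]
            if key not in substitutions:
                raise KeyError(f"unknown substitution @{key}@")
            argv.append(substitutions[key])
        else:
            argv.append(word)
    return argv
```

<p>For example, <tt>build_argv("/usr/bin/mycc @in@ -o @out@", {"in": "x.c",
"out": "x.bc"})</tt> yields the list <tt>/usr/bin/mycc</tt>, <tt>x.c</tt>,
<tt>-o</tt>, <tt>x.bc</tt>, which could then be handed to a process launcher.</p>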
</div>