diff --git a/docs/CompilerDriver.html b/docs/CompilerDriver.html new file mode 100644 index 00000000000..a5ba1a68542 --- /dev/null +++ b/docs/CompilerDriver.html @@ -0,0 +1,572 @@ + + + + + The LLVM Compiler Driver (llvmc) + + + + + + +
The LLVM Compiler Driver (llvmc)
+

NOTE: This document is a work in progress!

+
  1. Abstract
  2. Introduction
     1. Purpose
     2. Operation
     3. Phases
     4. Actions
  3. Details
  4. Configuration
  5. Glossary
+
+

Written by Reid Spencer +

+
+ + +
Abstract
+ +
+

This document describes the requirements, design, and configuration of the + LLVM compiler driver, llvmc. The compiler driver knows about LLVM's + tool set and can be configured to know about a variety of compilers for + source languages. It uses this knowledge to execute the tools necessary + to accomplish general compilation, optimization, and linking tasks. The main + purpose of llvmc is to provide a simple and consistent interface to + all compilation tasks. This reduces the burden on the end user who can just + learn to use llvmc instead of the entire LLVM tool set and all the + source language compilers compatible with LLVM.

+
+ +
Introduction
+ +
+

The llvmc tool is a configurable compiler + driver. As such, it isn't the compiler, optimizer, + or linker itself but it drives (invokes) other software that performs those + tasks. If you are familiar with the GNU Compiler Collection's gcc + tool, llvmc is very similar.

+

The following introductory sections will help you understand why this tool + is necessary and what it does.

+
+ + +
Purpose
+
+

llvmc was invented to make compilation with LLVM based compilers + easier. To accomplish this, llvmc strives to:

+ +

Additionally, llvmc makes it easier to write a compiler for use + with LLVM, because it:

+
+ + +
Operation
+
+

At a high level, llvmc operation is very simple. The basic action + taken by llvmc is to simply invoke some tool or set of tools to fill + the user's request for compilation. Every execution of llvmc takes the + following sequence of steps:
+

+
Collect Command Line Options
+
The command line options provide the marching orders to llvmc + on what actions it should perform. This is the request the user is making + of llvmc and it is interpreted first. See the llvmc + manual page for details on the + options.
+
Read Configuration Files
+
Based on the options and the suffixes of the filenames presented, a set + of configuration files is read to configure the actions llvmc will + take. Configuration files are provided by either LLVM or the front end + compiler tools that llvmc invokes. These files determine what actions + llvmc will take in response to the user's request. See the section + on configuration for more details.
+
Determine Phases To Execute
+
Based on the command line options and configuration files, + llvmc determines the compilation phases that + must be executed to satisfy the user's request. This is the primary work of + llvmc.
+
Determine Actions To Execute
+
Each phase to be executed can result in the + invocation of one or more actions. An action is + either a whole program or a function in a dynamically linked shared library. + In this step, llvmc determines the sequence of actions that must be + executed. Actions will always be executed in a deterministic order.
+
Execute Actions
+
The actions necessary to support the user's + original request are executed sequentially and deterministically. All + actions result in either the invocation of a whole program to perform the + action or the loading of a dynamically linkable shared library and invocation + of a standard interface function within that library.
+
Termination
+
If any action fails (returns a non-zero result code), llvmc + also fails and returns the result code from the failing action. If + everything succeeds, llvmc will return a zero result code.
+
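The execution and termination steps above can be sketched as follows. This is only an illustration, not llvmc's implementation: it assumes every action is a whole program described by an argument list, and the name run_actions is hypothetical. Actions that are functions in a shared library are not modelled.

```python
import subprocess
import sys


def run_actions(actions):
    """Run each action (an argv list) in order and stop at the first failure.

    Returns 0 if every action succeeds, otherwise the result code of the
    failing action. Later actions are not run after a failure.
    """
    for argv in actions:
        result = subprocess.run(argv)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    # Stand-in "tools": the Python interpreter exiting with a chosen code.
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_actions([ok, ok]))       # prints 0
    print(run_actions([ok, bad, ok]))  # prints 3
```

Running the actions strictly in list order is what makes the sequence deterministic in this sketch.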

+

llvmc's operation must be simple, regular and predictable. + Developers need to be able to rely on it to take a consistent approach to + compilation. For example, the invocation:

+
+   llvmc -O2 x.c y.c z.c -o xyz
+

must produce exactly the same results as:

+
+   llvmc -O2 x.c
+   llvmc -O2 y.c
+   llvmc -O2 z.c
+   llvmc -O2 x.o y.o z.o -o xyz
+

To accomplish this, llvmc uses a very simple goal oriented + procedure to do its work. The overall goal is to produce a functioning + executable. To accomplish this, llvmc always attempts to execute a + series of compilation phases in the same sequence. + However, the user's options to llvmc can cause the sequence of phases + to start in the middle or finish early.

+
+ + +
Phases
+
+

llvmc breaks every compilation task into the following five + distinct phases:

+
Preprocessing
Not all languages support preprocessing; + but for those that do, this phase can be invoked. This phase is for + languages that provide combining, filtering, or otherwise altering the + source language input before the translator parses it. Although C and C++ + are the most common users of this phase, other languages may provide their + own preprocessor (whether it's the C pre-processor or not).
+
+
Translation
The translation phase converts the source + language input into something that LLVM can interpret and use for + downstream phases. The translation is essentially from "non-LLVM form" to + "LLVM form".
+
+
Optimization
Once an LLVM Module has been obtained from + the translation phase, the program enters the optimization phase. This phase + attempts to optimize all of the input provided on the command line according + to the options provided.
+
+
Linking
The inputs are combined to form a complete + program.
+
+

The following table shows the inputs, outputs, and command line options + applicable to each phase.

Phase | Inputs | Outputs | Options
Preprocessing | Source Language File | Source Language File | -E: Stops the compilation after preprocessing
Translation | Source Language File | LLVM Assembly, LLVM Bytecode, LLVM C++ IR | -c: Stops the compilation after translation so that optimization and linking are not done. -S: Stops the compilation before object code is written so that only assembly code remains.
Optimization | LLVM Assembly, LLVM Bytecode | LLVM Bytecode | -Ox: This group of options affects the amount of optimization performed.
Linking | LLVM Bytecode, Native Object Code, LLVM Library, Native Library | LLVM Bytecode Executable, Native Executable | -L: Specifies a path for library search. -l: Specifies a library to link in.
+
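As a rough illustration of how options can cut the phase sequence short, the sketch below selects the phases to run. It assumes the five phase names given in the glossary, handles only the -E and -c options from the table above, and leaves out -S because the table does not tie it to a single phase boundary. The names PHASES, STOP_AFTER, and phases_to_run are hypothetical.

```python
# Phase order assumed from the glossary entry for "phase".
PHASES = ["preprocessing", "translation", "optimization", "assembly", "linking"]

# Options that end the sequence early, taken from the table above.
STOP_AFTER = {"-E": "preprocessing", "-c": "translation"}


def phases_to_run(options, first_phase="preprocessing"):
    """Return the contiguous run of phases implied by the options.

    first_phase models an input that lets the sequence start in the
    middle; a stop option makes the sequence finish early.
    """
    start = PHASES.index(first_phase)
    stop = len(PHASES) - 1
    for opt in options:
        if opt in STOP_AFTER:
            stop = min(stop, PHASES.index(STOP_AFTER[opt]))
    return PHASES[start:stop + 1]


if __name__ == "__main__":
    print(phases_to_run(["-O2"]))
    print(phases_to_run(["-O2", "-c"]))
    print(phases_to_run([], first_phase="linking"))
```

The phases are always drawn from the same ordered list, so the sequence never changes; only its start and end points do.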
+ + +
Actions
+
+

An action, with regard to llvmc, is a basic operation that it takes + in order to fulfill the user's request. Each phase of compilation will invoke + zero or more actions in order to accomplish that phase.

+

Actions come in two forms:

  1. Invokable Executables
  2. Functions in a shared library

    +
+ + +
Details
+ +
+
+ + +
Configuration
+ +
+

This section of the document describes the configuration files used by + llvmc. Configuration information is relatively static for a + given release of LLVM and a front end compiler. However, the details may + change from release to release of either. Users are encouraged to simply use + the various options of the llvmc command and ignore the configuration of + the tool. These configuration files are for compiler writers and LLVM + developers. Those wishing to simply use llvmc don't need to understand + this section but it may be instructive on how the tool works.

+
+ + +
Overview
+
+

llvmc is highly configurable both on the command line and in +configuration files. The options it understands are generic, consistent and +simple by design. Furthermore, the llvmc options apply to the +compilation of any LLVM enabled programming language. To be enabled as a +supported source language compiler, a compiler writer must provide a +configuration file that tells llvmc how to invoke the compiler +and what its capabilities are. The purpose of the configuration files then +is to allow compiler writers to specify to llvmc how the compiler +should be invoked. Users may alter the compiler's +llvmc configuration, but are advised not to.

+ +

Because llvmc just invokes other programs, it must deal with the +available command line options for those programs regardless of whether they +were written for LLVM or not. Furthermore, not all compilation front ends will +have the same capabilities. Some front ends will simply generate LLVM assembly +code, others will be able to generate fully optimized byte code. In general, +llvmc doesn't make any assumptions about the capabilities or command +line options of a sub-tool. It simply uses the details found in the configuration +files and leaves it to the compiler writer to specify the configuration +correctly.

+ +

This approach means that new compiler front ends can be up and working very +quickly. As a first cut, a front end can simply compile its source to raw +(unoptimized) bytecode or LLVM assembly and llvmc can be configured +to pick up the slack (translate LLVM assembly to bytecode, optimize the +bytecode, generate native assembly, link, etc.). In fact, the front end need +not use any LLVM libraries, and it could be written in any language (instead of +C++). The configuration data will allow the full range of optimization, +assembly, and linking capabilities that LLVM provides to be added to these kinds +of tools. Enabling the rapid development of front-ends is one of the primary +goals of llvmc.

+ +

As a compiler front end matures, it may utilize the LLVM libraries and tools +to more efficiently produce optimized bytecode directly in a single compilation +and optimization program. In these cases, multiple tools would not be needed +and the configuration data for the compiler would change.

+ +

Configuring llvmc to the needs and capabilities of a source language +compiler is relatively straightforward. A compiler writer must provide a +definition of what to do for each of the five compilation phases for each of +the optimization levels. The specification consists simply of prototypical +command lines into which llvmc can substitute command line +arguments and file names. Note that any given phase can be completely blank if +the source language's compiler combines multiple phases into a single program. +For example, quite often pre-processing, translation, and optimization are +combined into a single program. The specification for such a compiler would have +blank entries for pre-processing and translation but a full command line for +optimization.

+
+ + +
Configuration Files
+
+

Types of Files

+

There are two types of configuration files: the master configuration file + and the language specific configuration file. The master configuration file + contains the general configuration of llvmc itself and is supplied + with the tool. It contains information that is source language agnostic. + Language specific configuration files tell llvmc how to invoke the + language's compiler for a variety of different tasks and what other tools + are needed to backfill the compiler's missing features (e.g. + optimization).

+ +

Directory Search

+

llvmc always looks for files of a specific name. It uses the + first file with the name it's looking for by searching directories in the + following order:
+

  1. Any directory specified by the --config-dir option will be + checked first.
  2. If the environment variable LLVM_CONFIG_DIR is set, and it contains + the name of a valid directory, that directory will be searched next.
  3. If the user's home directory (typically /home/user) contains + a sub-directory named .llvm and that directory contains a + sub-directory named etc then that directory will be tried + next.
  4. If the LLVM installation directory (typically /usr/local/llvm) + contains a sub-directory named etc then that directory will be + tried last.
  5. If the configuration file sought still can't be found, llvmc + will print an error message and exit.
+ The first file found in this search will be used. Other files with the same + name will be ignored even if they exist in one of the subsequent search + locations.
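The search order above can be sketched as a small lookup function. This is a minimal illustration rather than llvmc's code: the function name find_config and its parameters are hypothetical, and the default installation directory is simply the "typical" path mentioned in the list.

```python
import os
import tempfile


def find_config(name, config_dir=None, environ=None, home=None,
                install_dir="/usr/local/llvm"):
    """Return the path of the first configuration file called `name`.

    Directories are tried in the documented order: the --config-dir value,
    LLVM_CONFIG_DIR (only if it names a directory), ~/.llvm/etc, and
    finally <install_dir>/etc.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    if config_dir:
        candidates.append(config_dir)
    env_dir = environ.get("LLVM_CONFIG_DIR")
    if env_dir and os.path.isdir(env_dir):
        candidates.append(env_dir)
    candidates.append(os.path.join(home, ".llvm", "etc"))
    candidates.append(os.path.join(install_dir, "etc"))

    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path  # later directories are never consulted
    raise FileNotFoundError(f"configuration file {name!r} not found")


if __name__ == "__main__":
    # Build a throwaway install tree so the lookup has something to find.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "etc"))
        open(os.path.join(root, "etc", "master"), "w").close()
        print(find_config("master", environ={}, home=root, install_dir=root))
```

Raising an exception stands in for the "print an error message and exit" step; a real driver would report the error to the user instead.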

+ +

File Names

+

In the directories searched, a file named master will be + recognized as the master configuration file for llvmc. Note that + users may override the master file with a copy in their home directory + but they are advised not to. This capability is only useful for compiler + implementers needing to alter the master configuration while developing + their compiler front end. When reading the configuration files, the master + files are always read first.

+

Language specific configuration files are given specific names to foster + faster lookup. The name of a given language specific configuration file is + the same as the suffix used to identify files containing source in that + language. For example, a configuration file for C++ source might be named + cpp, C, or cxx.

+ +

What Gets Read

+

The master configuration file is always read. Which language specific + configuration files are read depends on the command line options and the + suffixes of the file names provided on llvmc's command line. Note + that the --x LANGUAGE option alters the language that llvmc + uses for the subsequent files on the command line. Only the language + specific configuration files actually needed to complete llvmc's + task are read. Other language specific files will be ignored.

+
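To make the selection rule concrete, the sketch below derives the configuration file names for a command line from the file suffixes alone. It assumes, as described above, that a language specific file is named after the suffix without the dot. It does not model the language override option or inputs that need no translation, and config_files_needed is a hypothetical name.

```python
import os


def config_files_needed(filenames):
    """List the configuration file names to read, master first.

    Each distinct suffix contributes one language specific file name;
    suffixes that do not appear on the command line are never listed.
    """
    names = ["master"]
    for filename in filenames:
        suffix = os.path.splitext(filename)[1].lstrip(".")
        if suffix and suffix not in names:
            names.append(suffix)
    return names


if __name__ == "__main__":
    print(config_files_needed(["x.c", "y.c", "z.cpp"]))
```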
+ + +
Syntax
+
+

The syntax of the configuration files is yet to be determined. There are + two viable options remaining:
+

+
+ + +
+ Master Configuration Items +
+
+
+
+=head3 Section: [lang=I<LANGUAGE>]
+
+This section provides the master configuration data for a given language. The
+language specific data will be found in a file named I<LANGUAGE>.
+
+=over
+
+=item C<suffix=>I<suffix>
+
+This adds the I<suffix> specified to the list of recognized suffixes for
+the I<LANGUAGE> identified in the section. As many suffixes as are commonly used
+for source files for the I<LANGUAGE> should be specified.
+
+=back
+
+=begin html
+
+

For example, the following might appear for C++: +


+[lang=C++]
+suffix=.cpp
+suffix=.cxx
+suffix=.C
+

+ +=end html +
+
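Since the configuration syntax is not yet decided, the following is only a sketch of how the example above could be read if the section-and-item form were kept. The function name parse_master is hypothetical, and the parser understands nothing beyond [lang=...] sections and suffix= items.

```python
def parse_master(text):
    """Map each suffix to the language of the [lang=...] section it is in."""
    suffix_to_lang = {}
    lang = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[lang=") and line.endswith("]"):
            lang = line[len("[lang="):-1]
        elif line.startswith("suffix=") and lang is not None:
            # Repeated suffix= items accumulate rather than overwrite.
            suffix_to_lang[line[len("suffix="):]] = lang
    return suffix_to_lang


if __name__ == "__main__":
    example = "[lang=C++]\nsuffix=.cpp\nsuffix=.cxx\nsuffix=.C\n"
    print(parse_master(example))
```

A hand-written loop is used here because the same item name repeats within one section, which a strict INI reader would treat as an error.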
+ + +
+ Language Specific Configuration Items +
+
+
+=head3 Section: [general]
+
+=over
+
+=item C
+
+This item specifies whether the language has a pre-processing phase or not. This
+controls whether the B<-E> option works for the language or not.
+
+=item C
+
+This item specifies the kind of output the language's compiler generates. The
+choices are either bytecode (C) or LLVM assembly (C).
+
+=back
+
+=head3 Section: [-O0]
+
+=over
+
+=item CI
+
+This item specifies the I to use for pre-processing the input.
+
+=over
+
+Valid substitutions for this item are:
+
+=item %in%
+
+The input source file.
+
+=item %out%
+
+The output file.
+
+=item %options%
+
+Any pre-processing specific options (e.g. B<-I>).
+
+=back
+
+=item CI
+
+This item specifies the I to use for translating the source
+language input into the output format given by the C item.
+
+=item CI
+
+This item specifies the I for optimizing the translator's output.
+
+=back
+
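The substitutions listed above (%in%, %out%, and %options%) can be illustrated with a small expansion routine. This is a sketch under assumptions: the template string, the compiler name mycc, and the function expand are hypothetical, and only the three substitutions named in this section are handled.

```python
import shlex


def expand(template, infile, outfile, options=()):
    """Turn a prototypical command line into an argument list.

    %options% expands to zero or more separate arguments, while %in% and
    %out% are replaced inside whichever argument contains them.
    """
    argv = []
    for token in shlex.split(template):
        if token == "%options%":
            argv.extend(options)
        else:
            argv.append(token.replace("%in%", infile).replace("%out%", outfile))
    return argv


if __name__ == "__main__":
    # "mycc" is a made-up front end used only for illustration.
    print(expand("mycc -E %options% %in% -o %out%", "x.c", "x.i", ["-I", "include"]))
```

Building an argument list instead of a single string keeps file names that contain spaces intact when the command is eventually run.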
+
+ + +
Glossary
+ +
+

This document uses precise terms in reference to the various artifacts and + concepts related to compilation. The terms used throughout this document are + defined below.

+
+
assembly
+
A compilation phase in which LLVM bytecode or + LLVM assembly code is assembled to a native code format (either target + specific assembly language or the platform's native object file format). +
+ +
compiler
+
Refers to any program that can be invoked by llvmc to accomplish + the work of one or more compilation phases.
+ +
driver
+
Refers to llvmc itself.
+ +
linking
+
A compilation phase in which LLVM bytecode files + and (optionally) native system libraries are combined to form a complete + executable program.
+ +
optimization
+
A compilation phase in which LLVM bytecode is + optimized.
+ +
phase
+
Refers to any one of the five compilation phases that + llvmc supports. The five phases are: + preprocessing, + translation, + optimization, + assembly, + linking.
+ +
source language
+
Any common programming language (e.g. C, C++, Java, Stacker, ML, + FORTRAN). These languages are distinguished from any of the lower level + languages (such as LLVM or native assembly), by the fact that a + translation phase + is required before LLVM can be applied.
+ +
tool
+
Refers to any program in the LLVM tool set.
+ +
translation
+
A compilation phase in which + source language code is translated into + either LLVM assembly language or LLVM bytecode.
+
+
+ +
+
Reid Spencer
+The LLVM Compiler Infrastructure
+Last modified: $Date$ +
+ + +