LLVM 2.0 Release Notes
- Introduction
- What's New?
- Installation Instructions
- Portability and Supported Platforms
- Known Problems
- Additional Information
This document contains the release notes for the LLVM compiler
infrastructure, release 2.0. Here we describe the status of LLVM, including any
known problems and major improvements from the previous release. All LLVM
releases may be downloaded from the LLVM
releases web site.
For more information about LLVM, including information about the latest
release, please check out the main LLVM
web site. If you have questions or comments, the LLVM developer's mailing
list is a good place to send them.
Note that if you are reading this file from CVS or the main LLVM web page,
this document applies to the next release, not the current one. To see
the release notes for the current or previous releases, see the releases page.
This is the eleventh public release of the LLVM Compiler Infrastructure.
Being the first major release since 1.0, this release is different in several
ways from our previous releases:
- We took this as an opportunity to
break backwards compatibility with the LLVM 1.x bytecode and .ll file format.
If you have LLVM 1.9 .ll files that you would like to upgrade to LLVM 2.x, we
recommend the use of the stand-alone llvm-upgrade
tool (which is included with 2.0). We intend to keep compatibility with .ll
and .bc formats within the 2.x release series, as we did within the 1.x
series.
- There are several significant changes to the LLVM IR and internal APIs, such
as a major overhaul of the type system, the completely new bitcode file
format, etc (described below).
- We designed the release around a 6 month release cycle instead of the usual
3-month cycle. This gave us extra time to develop and test some of the
more invasive features in this release.
- LLVM 2.0 no longer supports the llvm-gcc3 front-end. Users are required to
upgrade to llvm-gcc4. llvm-gcc4 has many more features than
llvm-gcc3, is faster, and is much easier to
build from source.
Note that while this is a major version bump, this release has been
extensively tested on a wide range of software. It is easy to say that this
is our best release yet, in terms of both features and correctness. This is
the first LLVM release to correctly compile and optimize major software like
LLVM itself, Mozilla/Seamonkey, Qt 4.3rc1, kOffice, etc out of the box on
linux/x86.
Changes to the LLVM IR itself:
- Integer types are now completely signless. This means that we
have types like i8/i16/i32 instead of ubyte/sbyte/short/ushort/int
etc. LLVM operations that depend on sign have been split up into
separate instructions (PR950). This
eliminates cast instructions that just change the sign of the operands (e.g.
int -> uint), which reduces the size of the IR and makes optimizers
simpler to write.
- Integer types with arbitrary bitwidths (e.g. i13, i36, i42, i1057, etc) are
now supported in the LLVM IR and optimizations (PR1043). However, neither llvm-gcc
(PR1284) nor the native code generators
(PR1270) support non-standard width
integers yet.
- 'Type planes' have been removed (PR411).
It is no longer possible to have two values with the same name in the
same symbol table. This simplifies LLVM internals, allowing significant
speedups.
- Global variables and functions in .ll files are now prefixed with
@ instead of % (PR645).
- The LLVM 1.x "bytecode" format has been replaced with a
completely new binary representation, named 'bitcode'. The Bitcode Format brings a
number of advantages to LLVM over the old bytecode format: it is denser
(files are smaller), more extensible, requires less memory to read,
is easier to keep backwards compatible (so LLVM 2.5 will read 2.0 .bc
files), and has many other nice features.
- Load and store instructions now track the alignment of their pointer
(PR400). This allows the IR to
express loads that are not sufficiently aligned (e.g. due to '#pragma
packed') or to capture extra alignment information.
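To make the signless and arbitrary-width integer changes concrete, here is a minimal self-contained sketch in plain C++. It does not use any LLVM API: the names IntN, makeInt, add, udiv and sdiv are hypothetical, and the sketch only models widths from 1 to 63 bits. It shows that addition needs a single operation, while division must be split into an unsigned and a signed form because the same bit pattern means different numbers under each interpretation.

```cpp
#include <cassert>
#include <cstdint>

using std::int64_t;
using std::uint64_t;

// A signless N-bit value: just a bit pattern plus a width.
// Hypothetical model for illustration only, not an LLVM class.
struct IntN {
  unsigned bits;  // width in bits, 1..63 in this sketch
  uint64_t raw;   // only the low `bits` bits are meaningful
};

// Mask that keeps the low `bits` bits.
inline uint64_t maskFor(unsigned bits) {
  assert(bits >= 1 && bits <= 63);
  return (uint64_t(1) << bits) - 1;
}

// Build a value, wrapping it to the requested width.
inline IntN makeInt(unsigned bits, uint64_t value) {
  IntN result = {bits, value & maskFor(bits)};
  return result;
}

// Read the bit pattern as a two's-complement signed number.
// The xor/subtract pair sign-extends from `bits` bits; the final
// conversion assumes a two's-complement host.
inline int64_t asSigned(IntN v) {
  uint64_t signBit = uint64_t(1) << (v.bits - 1);
  return int64_t((v.raw ^ signBit) - signBit);
}

// Addition is sign-agnostic: one operation serves both readings.
inline IntN add(IntN a, IntN b) {
  assert(a.bits == b.bits);
  return makeInt(a.bits, a.raw + b.raw);
}

// Division depends on the sign, so it exists in two forms.
inline IntN udiv(IntN a, IntN b) {
  assert(a.bits == b.bits && b.raw != 0);
  return makeInt(a.bits, a.raw / b.raw);
}

inline IntN sdiv(IntN a, IntN b) {
  assert(a.bits == b.bits && b.raw != 0);
  return makeInt(a.bits, uint64_t(asSigned(a) / asSigned(b)));
}
```

For example, the 8-bit pattern 0xF0 divided by 4 gives 60 under udiv (240 / 4) and the pattern 0xFC under sdiv (-16 / 4 = -4), and adding 1 to the 13-bit value 8191 wraps to 0.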
Major new features:
- A number of ELF features are now supported by LLVM, including 'visibility',
extern weak linkage, Thread Local Storage (TLS) with the __thread
keyword, and symbol aliases.
Among other things, this means that many of the special options needed to
configure llvm-gcc on linux are no longer needed, and special hacks to build
large C++ libraries like Qt are not needed.
- LLVM now has a new MSIL backend. llc -march=msil will now turn LLVM
into MSIL (".net") bytecode. This is still fairly early development
with a number of limitations.
- A new llvm-upgrade tool
exists to migrate LLVM 1.9 .ll files to LLVM 2.0 syntax.
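As a rough illustration of what two of the ELF features above look like at the source level, the following sketch uses the GCC-style visibility attribute and the __thread keyword. It assumes a GCC-compatible compiler; the function and variable names are hypothetical, and extern weak linkage and aliases are left out of the sketch.

```cpp
#include <cassert>

// Hidden visibility: usable inside this shared object, but not
// exported from it. (Hypothetical helper for illustration.)
__attribute__((visibility("hidden"))) int internalHelper(int x) {
  return x * 2;
}

// Thread-local storage: every thread gets its own copy of this counter.
static __thread int callsOnThisThread = 0;

// Counts calls made on the calling thread, then forwards to the helper.
int countedHelper(int x) {
  ++callsOnThisThread;
  return internalHelper(x);
}

// Number of countedHelper() calls made so far on the calling thread.
int callsSoFar() { return callsOnThisThread; }

// Small usage check for the pieces above.
inline void tlsAndVisibilityDemo() {
  int before = callsSoFar();
  assert(countedHelper(21) == 42);
  assert(callsSoFar() == before + 1);
}
```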
New llvm-gcc features include:
- Precompiled Headers (PCH) are now supported.
- "#pragma packed" is now supported, as are the various features
described above (visibility, extern weak linkage, __thread, aliases,
etc).
- Tracking function parameter/result attributes is now possible.
- Many internal enhancements have been added, such as improvements to
NON_LVALUE_EXPR, arrays with non-zero base, structs with variable sized
fields, VIEW_CONVERT_EXPR, CEIL_DIV_EXPR, nested functions, and many other
things. This is primarily to support non-C GCC front-ends, like Ada.
- It is simpler to configure llvm-gcc for linux.
New optimizer features include:
- The pass manager has been entirely
rewritten, making it significantly smaller, simpler, and more extensible.
Support has been added to run FunctionPasses interlaced with
CallGraphSCCPasses, and we now support loop transformations explicitly with
LoopPass.
- The -scalarrepl pass can now promote unions containing FP values
into a register. It can also handle unions of vectors of the same
size.
- LLVM 2.0 includes a new loop rotation pass, which converts "for loops" into
"do/while loops", where the condition is at the bottom of the loop.
- The Loop Strength Reduction pass has been improved, and support has been
added for sinking expressions across blocks to reduce register pressure.
- ModulePasses may now use the result of FunctionPasses.
- The [Post]DominatorSet classes have been removed from LLVM and clients
switched to use the far-more-efficient ETForest class instead.
- The ImmediateDominator class has also been removed, and clients have been
switched to use DominatorTree instead.
- The predicate simplifier pass has been improved, making it able to do
simple value range propagation and eliminate more conditionals.
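The loop rotation item above can be pictured at the source level. The sketch below is plain C++ rather than LLVM IR, and the function names are hypothetical: it shows a top-tested "for" loop next to the rotated form, where a guard protects the first iteration and the exit test sits at the bottom of the loop. Both compute the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: the exit test sits at the top of the loop.
int sumTopTested(const std::vector<int> &v) {
  int sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}

// Rotated shape: a guard checks the condition once up front, and the
// loop itself tests the condition at the bottom (do/while).
int sumRotated(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
  if (i < v.size()) {
    do {
      sum += v[i];
      ++i;
    } while (i < v.size());
  }
  return sum;
}

// The transformation must not change what the loop computes.
inline void checkRotationPreservesResult(const std::vector<int> &v) {
  assert(sumTopTested(v) == sumRotated(v));
}
```

The guard is what keeps the rotated form correct for an empty input, since a bare do/while would execute its body once.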
New code generator features include:
- Support was added for software floating point, which allows LLVM to target
chips that don't have hardware FPUs (e.g. ARM thumb mode).
- A new register scavenger has been implemented for
finding free registers after register allocation. This is useful when
rewriting frame references on RISC targets, for example.
- Heuristics have been added to avoid coalescing vregs with very large live
ranges to physregs. This was bad because it effectively pinned the physical
register for the entire lifetime of the virtual register (PR711).
- Support now exists for very simple (but still very useful)
rematerialization in the register allocator, enough to move
instructions like "load immediate" and constant pool loads.
- Switch statement lowering is significantly better, improving codegen for
sparse switches that have dense subregions, and implementing support
for the shift/and trick.
- Added support for tracking physreg sub-registers and super-registers
in the code generator, as well as extensive register
allocator changes to track them.
- There is initial support for virtreg sub-registers
(PR1350).
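One common reading of the "shift/and trick" mentioned for switch lowering is a bit test: when several case values within a small range share a destination, membership can be checked with one shift and one AND against a constant mask. The sketch below shows that idea in plain C++, assuming an ASCII character set; it is an illustration with hypothetical function names, not the code the LLVM code generator emits.

```cpp
#include <cassert>

// Straightforward switch: several scattered cases share one target.
bool isVowelSwitch(char c) {
  switch (c) {
  case 'a': case 'e': case 'i': case 'o': case 'u':
    return true;
  default:
    return false;
  }
}

// Shift/and form of the same test.
bool isVowelBitTest(char c) {
  // Distance from 'a'; wraps to a large value for characters below 'a'.
  unsigned idx = static_cast<unsigned>(static_cast<unsigned char>(c)) - 'a';
  // Range check keeps the shift amount within the width of the mask.
  if (idx > static_cast<unsigned>('u' - 'a'))
    return false;
  // One bit per case value, relative to 'a'.
  const unsigned mask = (1u << ('a' - 'a')) | (1u << ('e' - 'a')) |
                        (1u << ('i' - 'a')) | (1u << ('o' - 'a')) |
                        (1u << ('u' - 'a'));
  return ((1u << idx) & mask) != 0;
}

// Both forms must agree on every possible character value.
inline void checkBitTestMatchesSwitch() {
  for (int c = 0; c < 256; ++c)
    assert(isVowelSwitch(static_cast<char>(c)) ==
           isVowelBitTest(static_cast<char>(c)));
}
```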
Other improvements include:
- Inline assembly support is much more solid than before.
The two primary features still missing are support for 80-bit floating point
stack registers on X86 (PR879), and
support for inline asm in the C backend (PR802).
- DWARF debug information generation has been improved. LLVM now passes
most of the GDB testsuite on MacOS and debug info is more dense.
- Codegen support for Zero-cost DWARF exception handling has been added (PR592). It is mostly
complete and just in need of continued bug fixes and optimizations at
this point. However, support in llvm-g++ is disabled with an
#ifdef for the 2.0 release (PR870).
- The code generator now has more accurate and general hooks for
describing addressing modes ("isLegalAddressingMode") to
optimizations like loop strength reduction and code sinking.
- Progress has been made on a direct Mach-o .o file writer. Many small
apps work, but it is still not quite complete.
In addition, the LLVM target description format has itself been extended in
several ways:
- Extended TargetData to support better target parameterization in
the .ll/.bc files, eliminating the 'pointersize/endianness' attributes
in the files (PR761).
- TargetData was generalized for finer grained alignment handling,
handling of vector alignment, and handling of preferred alignment.
- LLVM now supports describing target calling conventions
explicitly in .td files, reducing the amount of C++ code that needs
to be written for a port.
X86-specific Code Generator Enhancements:
- The MMX instruction set is now supported through intrinsics.
- The scheduler was improved to better reduce register pressure on
X86 and other targets that are register pressure sensitive.
- Linux/x86-64 support is much better.
- PIC support for linux/x86 has been added.
- The X86 backend now supports the GCC regparm attribute.
- LLVM now supports inline asm with multiple constraint letters per operand
(like "ri") which is common in X86 inline asms.
ARM-specific Code Generator Enhancements:
- The ARM code generator is now stable and fully supported.
- There are major new features, including support for ARM
v4-v6 chips, vfp support, soft floating point support, pre/postinc support,
load/store multiple generation, constant pool entry motion (to support
large functions), inline asm support, weak linkage support, static
ctor/dtor support, and many bug fixes.
- Added support for Thumb code generation (llc -march=thumb).
- The ARM backend now supports the ARM AAPCS/EABI ABI and PIC codegen on
arm/linux.
- Several bugs were fixed for DWARF debug info generation on arm/linux.
PowerPC-specific Code Generator Enhancements:
- The PowerPC 64 JIT now supports addressing code loaded above the 2G
boundary.
- Support for the Linux/ppc ABI has been improved, and the linux/ppc JIT is
now fully functional. llvm-gcc and static compilation are not fully supported
yet though.
- Many PowerPC 64 bug fixes.
More specific changes include:
- LLVM no longer relies on static destructors to shut itself down. Instead,
it lazily initializes itself and shuts down when llvm_shutdown() is
explicitly called.
- LLVM now has significantly fewer static constructors, reducing startup time.
- Several classes have been refactored to reduce the amount of code that
gets linked into apps that use the JIT.
- Construction of intrinsic function declarations has been simplified.
- The gccas/gccld tools have been replaced with small shell scripts.
- Support has been added to llvm-test for running on low-memory
or slow machines (make SMALL_PROBLEM_SIZE=1).
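The pattern behind the first item above (lazy initialization plus an explicit shutdown call instead of static destructors) can be sketched in a few lines. This is not LLVM's implementation: the sketch namespace, Lazy and shutdownAll are hypothetical names, with shutdownAll standing in for the role llvm_shutdown() plays in the text, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

typedef void (*Cleanup)();

// Registry of cleanup callbacks. It is heap-allocated on first use and
// never destroyed by a static destructor.
inline std::vector<Cleanup> &cleanups() {
  static std::vector<Cleanup> *list = new std::vector<Cleanup>();
  return *list;
}

// One lazily created object per type T: built on first use, destroyed
// only when shutdownAll() runs.
template <typename T> class Lazy {
  static T *&slot() {
    static T *ptr = nullptr;
    return ptr;
  }
  static void destroy() {
    assert(slot() != nullptr);  // registered only after construction
    delete slot();
    slot() = nullptr;
  }

public:
  static T &get() {
    if (!slot()) {
      slot() = new T();
      cleanups().push_back(&destroy);
    }
    return *slot();
  }
  static bool isConstructed() { return slot() != nullptr; }
};

// Explicit shutdown: destroy everything in reverse construction order.
inline void shutdownAll() {
  std::vector<Cleanup> &list = cleanups();
  while (!list.empty()) {
    Cleanup fn = list.back();
    list.pop_back();
    fn();
  }
}

} // namespace sketch
```

Nothing is constructed until the first get() call, so a program that never touches an object pays no startup cost for it, and nothing is torn down unless the client asks for it.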
LLVM 2.0 contains a revamp of the type system and several other significant
internal changes. If you are programming to the C++ API, be aware of the
following major changes:
- Pass registration is slightly different in LLVM 2.0 (you now need an
intptr_t in your constructor), as explained in the Writing an LLVM Pass
document.
- The ConstantBool, ConstantIntegral and ConstantInt
classes have been merged together; we now just have
ConstantInt.
- Type::IntTy, Type::UIntTy, Type::SByteTy, ... are
replaced by Type::Int8Ty, Type::Int16Ty, etc. LLVM types
have always corresponded to fixed size types
(e.g. long was always 64-bits), but the type system no longer includes
information about the sign of the type.
- Several classes (CallInst, GetElementPtrInst,
ConstantArray, etc), that once took std::vector as
arguments now take ranges instead. For example, you can create a
GetElementPtrInst with code like:
Value *Ops[] = { Op1, Op2, Op3 };
GEP = new GetElementPtrInst(BasePtr, Ops, 3);
This avoids creation of a temporary vector (and a call to malloc/free). If
you have an std::vector, use code like this:
std::vector<Value*> Ops = ...;
GEP = new GetElementPtrInst(BasePtr, &Ops[0], Ops.size());
- CastInst is now abstract and its functionality is split into several parts,
one for each of the new cast
instructions.
- Instruction::getNext()/getPrev() are now private (along with
BasicBlock::getNext, etc), for efficiency reasons (they are now no
longer just simple pointers). Please use BasicBlock::iterator, etc instead.
- Module::getNamedFunction() is now called
Module::getFunction().
- SymbolTable.h has been split into ValueSymbolTable.h and
TypeSymbolTable.h.
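The pointer-plus-count convention described above for CallInst, GetElementPtrInst and similar classes can be tried out without LLVM. The sketch below uses a hypothetical OperandList class with int operands as a stand-in for an instruction and its Value* operands. It also guards one case the std::vector form has to watch for: taking &Ops[0] on an empty vector is undefined behavior in C++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction that stores its operands.
class OperandList {
  std::vector<int> Ops;

public:
  // Range-style constructor: pointer to the first operand plus a count,
  // so callers do not have to build a temporary std::vector.
  OperandList(const int *First, unsigned NumOps)
      : Ops(First, First + NumOps) {}

  unsigned size() const { return static_cast<unsigned>(Ops.size()); }

  int operator[](unsigned i) const {
    assert(i < Ops.size());
    return Ops[i];
  }
};

// Caller with a fixed set of operands: a plain array is enough.
OperandList fromArray() {
  int Ops[] = {1, 2, 3};
  return OperandList(Ops, 3);
}

// Caller that already has a std::vector: pass &Ops[0] and the size,
// but only when the vector is non-empty.
OperandList fromVector(const std::vector<int> &Ops) {
  if (Ops.empty())
    return OperandList(nullptr, 0);
  return OperandList(&Ops[0], static_cast<unsigned>(Ops.size()));
}
```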
LLVM is known to work on the following platforms:
- Intel and AMD machines running Red Hat Linux, Fedora Core and FreeBSD
(and probably other unix-like systems).
- Intel and AMD machines running on Win32 using MinGW libraries (native)
- Sun UltraSPARC workstations running Solaris 8.
- Intel and AMD machines running on Win32 with the Cygwin libraries (limited
support is available for native builds with Visual C++).
- PowerPC and X86-based Mac OS X systems, running 10.2 and above in 32-bit and
64-bit modes.
- Alpha-based machines running Debian GNU/Linux.
- Itanium-based machines running Linux and HP-UX.
The core LLVM infrastructure uses
GNU autoconf to adapt itself
to the machine and operating system on which it is built. However, minor
porting may be required to get LLVM to work on new platforms. We welcome your
portability patches and reports of successful builds or error messages.
This section contains all known problems with the LLVM system, listed by
component. As new problems are discovered, they will be added to these
sections. If you run into a problem, please check the LLVM bug database and submit a bug if
there isn't already one.
The following components of this LLVM release are either untested, known to
be broken or unreliable, or are in early development. These components should
not be relied on, and bugs should not be filed against them, but they may be
useful to some people. In particular, if you would like to work on one of these
components, please contact us on the LLVMdev list.
- The -cee pass is known to be buggy, and may be removed in a
future release.
- C++ EH support
- The IA64 code generator is experimental.
- The Alpha JIT is experimental.
- "-filetype=asm" (the default) is the only supported value for the
-filetype llc option.
- The Thumb mode works only on ARMv6 or higher processors. On sub-ARMv6
processors, any thumb program compiled with LLVM crashes or produces wrong
results. (PR1388)
- Compilation for ARM Linux OABI (old ABI) is supported, but not fully tested.
- QEMU-ARM (<= 0.9.0) wrongly executes programs compiled with LLVM. Either use
an unaffected QEMU version or apply this patch to QEMU.
- The SPARC backend only supports the 32-bit SPARC ABI (-m32); it does not
support the 64-bit SPARC ABI (-m64).
- On 21164s, some rare FP arithmetic sequences which may trap do not have the
appropriate nops inserted to ensure restartability.
- C++ programs are likely to fail on IA64, as calls to setjmp are
made where the argument is not 16-byte aligned, as required on IA64. (Strictly
speaking this is not a bug in the IA64 back-end; it will also be encountered
when building C++ programs using the C back-end.)
- The C++ front-end does not use IA64
ABI compliant layout of v-tables. In particular, it just stores function
pointers instead of function descriptors in the vtable. This bug prevents
mixing C++ code compiled with LLVM with C++ objects compiled by other C++
compilers.
- There are a few ABI violations which will lead to problems when mixing LLVM
output with code built with other compilers, particularly for floating-point
programs.
- Defining vararg functions is not supported (but calling them is ok).
- The Itanium backend has bitrotted somewhat.
Bugs
llvm-gcc4 does not currently support Link-Time
Optimization on most platforms "out-of-the-box". Please inquire on the
llvmdev mailing list if you are interested.
Notes
"long double" is silently transformed by the front-end into "double". There
is no support for floating point data types of any size other than 32 and 64
bits.
llvm-gcc does not support __builtin_apply yet.
See Constructing Calls: Dispatching a call to another function.
llvm-gcc partially supports these GCC extensions:
- Nested Functions: As in Algol and Pascal, lexical scoping of functions.
Nested functions are supported, but llvm-gcc does not support non-local
gotos or taking the address of a nested function.
- Function Attributes:
Declaring that functions have no side effects or that they can never
return.
Supported: alias, always_inline, cdecl,
constructor, destructor,
deprecated, fastcall, format,
format_arg, non_null, noreturn, regparm,
section, stdcall, unused, used,
visibility, warn_unused_result, weak
Ignored: noinline, pure, const, nothrow,
malloc, no_instrument_function
llvm-gcc supports the vast majority of GCC extensions, including:
- Pragmas: Pragmas accepted by GCC.
- Local Labels: Labels local to a block.
- Other Builtins: Other built-in functions.
- Variable Attributes: Specifying attributes of variables.
- Type Attributes: Specifying attributes of types.
- Thread-Local: Per-thread variables.
- Variable Length: Arrays whose length is computed at run time.
- Labels as Values: Getting pointers to labels and computed gotos.
- Statement Exprs: Putting statements and declarations inside expressions.
- Typeof: typeof: referring to the type of an expression.
- Lvalues: Using ?:, "," and casts in lvalues.
- Conditionals: Omitting the middle operand of a ?: expression.
- Long Long: Double-word integers.
- Complex: Data types for complex numbers.
- Hex Floats: Hexadecimal floating-point constants.
- Zero Length: Zero-length arrays.
- Empty Structures: Structures with no members.
- Variadic Macros: Macros with a variable number of arguments.
- Escaped Newlines: Slightly looser rules for escaped newlines.
- Extended Asm: Assembler instructions with C expressions as operands.
- Constraints: Constraints for asm operands.
- Asm Labels: Specifying the assembler name to use for a C symbol.
- Explicit Reg Vars: Defining variables residing in specified registers.
- Vector Extensions: Using vector instructions through built-in functions.
- Target Builtins: Built-in functions specific to particular targets.
- Subscripting: Any array can be subscripted, even if not an lvalue.
- Pointer Arith: Arithmetic on void-pointers and function pointers.
- Initializers: Non-constant initializers.
- Compound Literals: Compound literals give structures, unions,
or arrays as values.
- Designated Inits: Labeling elements of initializers.
- Cast to Union: Casting to union type from any member of the union.
- Case Ranges: `case 1 ... 9' and such.
- Mixed Declarations: Mixing declarations and code.
- Function Prototypes: Prototype declarations and old-style definitions.
- C++ Comments: C++ comments are recognized.
- Dollar Signs: Dollar sign is allowed in identifiers.
- Character Escapes: \e stands for the character <ESC>.
- Alignment: Inquiring about the alignment of a type or variable.
- Inline: Defining inline functions (as fast as macros).
- Alternate Keywords: __const__, __asm__, etc., for header files.
- Incomplete Enums: enum foo;, with details to follow.
- Function Names: Printable strings which are the name of the current function.
- Return Address: Getting the return or frame address of a function.
- Unnamed Fields: Unnamed struct/union fields within structs/unions.
- Attribute Syntax: Formal syntax for attributes.
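A few of the extensions in the list above can be exercised together in a short sketch. It assumes a GCC-compatible compiler with GNU extensions enabled (the default for g++ and clang++); the macro and function names are hypothetical. It uses a statement expression with typeof, case ranges, and a conditional with the middle operand omitted.

```cpp
#include <cassert>

// Statement Exprs + Typeof: a max macro that evaluates each argument
// exactly once, unlike the classic ((a) > (b) ? (a) : (b)).
#define SKETCH_MAX(a, b)                                                       \
  ({                                                                           \
    __typeof__(a) _a = (a);                                                    \
    __typeof__(b) _b = (b);                                                    \
    _a > _b ? _a : _b;                                                         \
  })

// Case Ranges: `case low ... high` matches every value in the range.
int classify(int c) {
  switch (c) {
  case '0' ... '9':
    return 1;  // digit
  case 'a' ... 'z':
    return 2;  // lowercase letter
  default:
    return 0;  // anything else
  }
}

// Conditionals: `x ?: y` means `x ? x : y`, with x evaluated only once.
int orDefault(int value, int fallback) { return value ?: fallback; }
```

A call such as SKETCH_MAX(++n, 0) increments n only once, which is the usual reason for writing the macro with a statement expression.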
If you run into GCC extensions which have not been included in any of these
lists, please let us know (also including whether or not they work).
The C++ front-end is considered to be fully
tested and works for a number of non-trivial programs, including LLVM
itself, Qt, Mozilla, etc.
A wide variety of additional information is available on the LLVM web page,
in particular in the documentation section. The web page also
contains versions of the API documentation that are up to date with the CVS
version of the source code.
You can access versions of these documents specific to this release by going
into the "llvm/doc/" directory in the LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact
us via the mailing
lists.
LLVM Compiler Infrastructure