optimizer and codegen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57468 91177308-0d34-0410-b5e6-96231b3b80d8
Chris Lattner
2008-10-13 21:50:36 +00:00
parent 7752d1ac04
commit f301387a36

@ -157,7 +157,7 @@ in this section.
<ul> <ul>
<li><p>The most visible end-user change in LLVM 2.4 is that it includes many <li><p>The most visible end-user change in LLVM 2.4 is that it includes many
optimizations and changes ot make -O0 compile times much faster. You should see optimizations and changes to make -O0 compile times much faster. You should see
improvements on the order of 30% or more faster than LLVM 2.3. There are many improvements on the order of 30% or more faster than LLVM 2.3. There are many
pieces to this change, described in more detail below. The speedups and new pieces to this change, described in more detail below. The speedups and new
components can also be used for JIT compilers that want fast compilation as components can also be used for JIT compilers that want fast compilation as
@ -195,8 +195,9 @@ includes support for the C, C++, Objective-C, Ada, and Fortran front-ends.</p>
<ul> <ul>
<li>LLVM 2.4 supports the full set of atomic <tt>__sync_*</tt> builtins. LLVM <li>LLVM 2.4 supports the full set of atomic <tt>__sync_*</tt> builtins. LLVM
2.3 only supported those used by OpenMP, but 2.4 supports them all. While 2.3 only supported those used by OpenMP, but 2.4 supports them all. While
llvm-gcc supports all of these builtins, note that not all targets do. X86 and llvm-gcc supports all of these builtins, note that not all targets do. X86
PowerPC are known to support them all in both 32-bit and 64-bit mode.</li> support them all in both 32-bit and 64-bit mode and PowerPC supports them all
except for the 64-bit operations when in 32-bit mode.</li>
<li>llvm-gcc now supports an <tt>-flimited-precision</tt> option, which tells <li>llvm-gcc now supports an <tt>-flimited-precision</tt> option, which tells
the compiler that it is ok to use low-precision approximations of certain libm the compiler that it is ok to use low-precision approximations of certain libm
@ -274,30 +275,40 @@ function should be optimized for code size.</li>
<div class="doc_text"> <div class="doc_text">
<p>In addition to a huge array of bug fixes and minor performance tweaks, the <p>In addition to a huge array of bug fixes and minor performance tweaks, this
LLVM 2.4 optimizers support a few major enhancements:</p> release includes a few major enhancements and additions to the optimizers:</p>
<ul> <ul>
<li>GVN now does local PRE?</li> <li>The Global Value Numbering (GVN) pass now does local Partial Redundancy
Elimination (PRE) to eliminate some partially redundant expressions in cases
where doing so won't grow code size.</li>
<li>Matthijs' Dead argument elimination rewrite</li> <li>LLVM 2.4 includes a new loop deletion pass (which removes output-free
provably-finite loops) and a rewritten Aggressive Dead Code Elimination (ADCE)
<li>Old-ADCE used control dependence and deleted output-free infinite loops. pass that no longer uses control dependence information. These changes speed up
Added a new Loop deletion pass (for deleting output free provably-finite loops) the optimizer and also prevent it from deleting output-free infinite
and rewrote ADCE to be simpler faster, and not need control dependence.</li> loops.</li>
<li>SparsePropagation framework for lattice-based dataflow solvers.</li>
<li>Tail duplication was removed from the standard optimizer sequence.</li>
<li>Various helper functions (ComputeMaskedBits, ComputeNumSignBits, etc) were
pulled out of instcombine and put into a new ValueTracking.h file, where they
can be reused by other passes.</li>
<li>The new AddReadAttrs pass works out which functions are read-only or <li>The new AddReadAttrs pass works out which functions are read-only or
read-none (these correspond to 'pure' and 'const' in C) and marks them read-none (these correspond to 'pure' and 'const' in C) and marks them
with the appropriate attribute.</li> with the appropriate attribute.</li>
<li>LLVM 2.4 now includes a new SparsePropagation framework, which makes it
trivial to build lattice-based dataflow solvers that operate over LLVM IR. Using
this interface means that you just define objects to represent your lattice
values and the transfer functions that operate on them. It handles the
mechanics of worklist processing, liveness tracking, handling PHI nodes,
etc.</li>
<li>Various helper functions (ComputeMaskedBits, ComputeNumSignBits, etc) were
pulled out of the Instruction Combining pass and put into a new
<tt>ValueTracking.h</tt> header, where they can be reused by other passes.</li>
<li>The tail duplication pass has been removed from the standard optimizer
sequence used by llvm-gcc. This pass still exists, but the benefits it once
provided are now achieved by other passes.</li>
</ul> </ul>
</div> </div>
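The read-only and read-none categories that the AddReadAttrs note maps to 'pure' and 'const' in C can be illustrated with GCC-style function attributes. This is only a sketch, assuming a GCC-compatible compiler: the function names are hypothetical, and the attributes are written by hand here, whereas the pass is described as working out the equivalent facts on its own.

```c
#include <stddef.h>

/* Hypothetical example functions; the names are illustrative only. */

/* 'const' (read-none): the result depends only on the argument values,
   and the function neither reads nor writes memory. */
__attribute__((const)) int square(int x) {
    return x * x;
}

/* 'pure' (read-only): the function may read memory through its pointer
   argument, but it writes nothing and has no other side effects. */
__attribute__((pure)) int sum(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

Knowing that a call is read-none or read-only lets an optimizer merge repeated calls with the same arguments, or drop a call whose result is unused.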
@ -314,21 +325,41 @@ which allows us to implement more aggressive algorithms and make it run
faster:</p> faster:</p>
<ul> <ul>
<li>asm writers split out to their own library to avoid JITs having to link <li>The target-independent code generator supports (and the X86 backend
them in.</li> currently implements) a new interface for "fast" instruction selection. This
<li>Big asm writer refactoring + TargetAsmInfo</li> interface is optimized to produce code as quickly as possible, sacrificing
<li>2-addr pass and coalescer can now remat trivial insts to avoid a copy.</li> code quality to do it. This is used by default at -O0 or when using
<li>spiller to commute instructions in order to fold a reload</li> "llc -fast" on X86. It is straight-forward to add support for
<li>Stack slot coloring?</li> other targets if faster -O0 compilation is desired.</li>
<li>Live intervals renumbering? Is this useful to external people?</li>
<li>'is as cheap as a move' instruction flag</li> <li>In addition to the new 'fast' instruction selection path, many existing
<li>Improvements to selection dag viewing</li> pieces of the code generator have been optimized in significant ways.
<li>fast isel</li> SelectionDAGs are now pool allocated and use better algorithms in many
<li>Selection dag speedups</li> places, the ".s" file printers now use raw_ostream to emit text much faster,
<li>asmwriter + raw_ostream -> fastah</li> etc. The end result of these improvements is that the compiler also takes
<li>Partitioned Boolean Quadratic Programming (PBQP) based register substantially less time to generate code that is just as good as (and often
allocator.</li> better than) before.</li>
<li>...</li>
<li>Each target has been split to separate the .s file printing logic from the
rest of the target. This enables JIT compilers that don't link in the
(somewhat large) code and data tables used for printing a .s file.</li>
<li>The code generator now includes a "stack slot coloring" pass, which packs
together individual spilled values into common stack slots. This reduces
the size of stack frames with many spills, which tends to increase L1 cache
effectiveness.</li>
<li>Various pieces of the register allocator (e.g. the coalescer and two-address
operation elimination pass) now know how to rematerialize trivial operations
to avoid copies and include several other optimizations.</li>
<li>The <a href="CodeGenerator.html#selectiondag_process">graphs</a> produced by
the <tt>llc -view-*-dags</tt> options are now significantly prettier and
easier to read.</li>
<li>LLVM 2.4 includes a new register allocator based on Partitioned Boolean
Quadratic Programming (PBQP). This register allocator is still in
development, but is very simple and clean.</li>
</ul> </ul>
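The idea behind the "stack slot coloring" item above, packing spilled values into shared stack slots, can be sketched as a greedy interval coloring. This is a toy model and not LLVM's implementation: the `SpillInterval` type, the half-open `[start, end)` live ranges, and the `color_stack_slots` function are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical model of one spilled value, live over [start, end). */
typedef struct {
    int start;
    int end;
    int slot;   /* filled in by color_stack_slots */
} SpillInterval;

/* qsort comparator: order intervals by increasing start point. */
static int by_start(const void *a, const void *b) {
    const SpillInterval *x = a;
    const SpillInterval *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts the intervals by start (reordering the array in place) and gives
   each one the lowest-numbered slot whose previous occupant is already
   dead. Returns the number of slots used, or -1 if allocation fails. */
int color_stack_slots(SpillInterval *spills, int count) {
    if (count <= 0)
        return 0;

    /* slot_free_at[s] is the point at which slot s becomes reusable. */
    int *slot_free_at = malloc((size_t)count * sizeof *slot_free_at);
    if (!slot_free_at)
        return -1;

    int slots_used = 0;
    qsort(spills, (size_t)count, sizeof *spills, by_start);

    for (int i = 0; i < count; ++i) {
        int chosen = -1;
        for (int s = 0; s < slots_used; ++s) {
            if (slot_free_at[s] <= spills[i].start) {
                chosen = s;
                break;
            }
        }
        if (chosen < 0)
            chosen = slots_used++;   /* no free slot: open a new one */
        slot_free_at[chosen] = spills[i].end;
        spills[i].slot = chosen;
    }

    free(slot_free_at);
    return slots_used;
}
```

Values whose live ranges never overlap end up in the same slot, so a frame with many spills needs fewer distinct slots than one slot per spilled value.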