optimizer and codegen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57468 91177308-0d34-0410-b5e6-96231b3b80d8
2025-07-04 02:24:29 +00:00 · 2008-10-13 21:50:36 +00:00
parent 7752d1ac04
commit f301387a36
1 changed files with 65 additions and 34 deletions
--- a/docs/ReleaseNotes.html
+++ b/docs/ReleaseNotes.html
@ -157,7 +157,7 @@ in this section.
 <ul>
 <li><p>The most visible end-user change in LLVM 2.4 is that it includes many
-optimizations and changes ot make -O0 compile times much faster.  You should see
+optimizations and changes to make -O0 compile times much faster.  You should see
 improvements on the order of 30% or more over LLVM 2.3.  There are many
 pieces to this change, described in more detail below.  The speedups and new
 components can also be used for JIT compilers that want fast compilation as
@ -195,8 +195,9 @@ includes support for the C, C++, Objective-C, Ada, and Fortran front-ends.</p>
 <ul>
 <li>LLVM 2.4 supports the full set of atomic <tt>__sync_*</tt> builtins.  LLVM
 2.3 only supported those used by OpenMP, but 2.4 supports them all.  While
-llvm-gcc supports all of these builtins, note that not all targets do.  X86 and
+llvm-gcc supports all of these builtins, note that not all targets do.  X86 
-PowerPC are known to support them all in both 32-bit and 64-bit mode.</li>
+supports them all in both 32-bit and 64-bit mode, and PowerPC supports them all
 except for the 64-bit operations when in 32-bit mode.</li>
 <li>llvm-gcc now supports an <tt>-flimited-precision</tt> option, which tells
 the compiler that it is ok to use low-precision approximations of certain libm
@ -274,30 +275,40 @@ function should be optimized for code size.</li>
 <div class="doc_text">
-<p>In addition to a huge array of bug fixes and minor performance tweaks, the
+<p>In addition to a huge array of bug fixes and minor performance tweaks, this
-LLVM 2.4 optimizers support a few major enhancements:</p>
+release includes a few major enhancements and additions to the optimizers:</p>
 <ul>
-<li>GVN now does local PRE?</li>
+<li>The Global Value Numbering (GVN) pass now does local Partial Redundancy
 Elimination (PRE) to eliminate some partially redundant expressions in cases
 where doing so won't grow code size.</li>
-<li>Matthijs' Dead argument elimination rewrite</li>
+<li>LLVM 2.4 includes a new loop deletion pass (which removes output-free
-
+provably-finite loops) and a rewritten Aggressive Dead Code Elimination (ADCE)
-<li>Old-ADCE used control dependence and deleted output-free infinite loops.
+pass that no longer uses control dependence information.  These changes speed up
-Added a new Loop deletion pass (for deleting output free provably-finite loops)
+the optimizer and also prevent it from deleting output-free infinite
-and rewrote ADCE to be simpler faster, and not need control dependence.</li>
+loops.</li>
 <li>SparsePropagation framework for lattice-based dataflow solvers.</li>
 <li>Tail duplication was removed from the standard optimizer sequence.</li>
 <li>Various helper functions (ComputeMaskedBits, ComputeNumSignBits, etc) were
 pulled out of instcombine and put into a new ValueTracking.h file, where they
 can be reused by other passes.</li>
 <li>The new AddReadAttrs pass works out which functions are read-only or
 read-none (these correspond to 'pure' and 'const' in C) and marks them
 with the appropriate attribute.</li>
 <li>LLVM 2.4 now includes a new SparsePropagation framework, which makes it
 trivial to build lattice-based dataflow solvers that operate over LLVM IR. Using
 this interface means that you just define objects to represent your lattice
 values and the transfer functions that operate on them.  It handles the
 mechanics of worklist processing, liveness tracking, handling PHI nodes,
 etc.</li>
 <li>Various helper functions (ComputeMaskedBits, ComputeNumSignBits, etc) were
 pulled out of the Instruction Combining pass and put into a new 
 <tt>ValueTracking.h</tt> header, where they can be reused by other passes.</li>
 <li>The tail duplication pass has been removed from the standard optimizer
 sequence used by llvm-gcc.  This pass still exists, but the benefits it once
 provided are now achieved by other passes.</li>
 </ul>
 </div>
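
As an illustration of the read-only / read-none distinction that the AddReadAttrs note describes, here is a minimal C++ sketch using the GCC-style `pure` and `const` attributes the note refers to. The function names are hypothetical, and the attributes are written by hand here; the pass itself operates on LLVM IR and works out the equivalent markings on its own.

```cpp
#include <cassert>
#include <cstddef>

// 'const' (read-none): the result depends only on the argument values.
__attribute__((const)) int square(int x) { return x * x; }

// 'pure' (read-only): reads memory through 'data' but never writes to it.
__attribute__((pure)) int sum(const int *data, std::size_t n) {
    assert(data != nullptr || n == 0);  // precondition check only
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Neither attribute applies: this function writes through its pointer.
void scale(int *data, std::size_t n, int k) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}
```

Because `square` and `sum` have no visible side effects, a compiler may merge repeated calls with the same inputs; calls to `scale` cannot be treated that way.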
@ -314,21 +325,41 @@ which allows us to implement more aggressive algorithms and make it run
 faster:</p>
 <ul>
-<li>asm writers split out to their own library to avoid JITs having to link
+<li>The target-independent code generator supports (and the X86 backend
- them in.</li>
+    currently implements) a new interface for "fast" instruction selection. This
-<li>Big asm writer refactoring + TargetAsmInfo</li>
+    interface is optimized to produce code as quickly as possible, sacrificing
-<li>2-addr pass and coalescer can now remat trivial insts to avoid a copy.</li>
+    code quality to do it.  This is used by default at -O0 or when using
-<li>spiller to commute instructions in order to fold a reload</li>
+    "llc -fast" on X86.  It is straightforward to add support for
-<li>Stack slot coloring?</li>
+    other targets if faster -O0 compilation is desired.</li>
-<li>Live intervals renumbering?  Is this useful to external people?</li>
+
-<li>'is as cheap as a move' instruction flag</li>
+<li>In addition to the new 'fast' instruction selection path, many existing
-<li>Improvements to selection dag viewing</li>
+    pieces of the code generator have been optimized in significant ways.
-<li>fast isel</li>
+    SelectionDAGs are now pool allocated and use better algorithms in many
-<li>Selection dag speedups</li>
+    places, the ".s" file printers now use raw_ostream to emit text much faster,
-<li>asmwriter + raw_ostream -> fastah</li>
+    etc.  The end result of these improvements is that the compiler also takes
-<li>Partitioned Boolean Quadratic Programming (PBQP) based register 
+    substantially less time to generate code that is as good as (and often
-allocator.</li>
+    better than) before.</li>
-<li>...</li>
+
 <li>Each target has been split to separate the .s file printing logic from the
    rest of the target.  This lets JIT compilers avoid linking in the
    (somewhat large) code and data tables used for printing a .s file.</li>
 <li>The code generator now includes a "stack slot coloring" pass, which packs
    together individual spilled values into common stack slots.  This reduces
    the size of stack frames with many spills, which tends to increase L1 cache
    effectiveness.</li>
 <li>Various pieces of the register allocator (e.g. the coalescer and two-address
    operation elimination pass) now know how to rematerialize trivial operations
    to avoid copies and include several other optimizations.</li>
 <li>The <a href="CodeGenerator.html#selectiondag_process">graphs</a> produced by
    the <tt>llc -view-*-dags</tt> options are now significantly prettier and
    easier to read.</li>
 <li>LLVM 2.4 includes a new register allocator based on Partitioned Boolean
    Quadratic Programming (PBQP).  This register allocator is still in
    development, but is very simple and clean.</li>
 </ul>
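
The idea behind the "stack slot coloring" item above can be sketched as a greedy interval assignment. This is only an illustration under simplifying assumptions, not the algorithm LLVM implements: `SpillInterval` and `colorStackSlots` are hypothetical names, each spilled value is modeled as a single half-open live range, and all slots are assumed to have the same size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a spilled value, live over [start, end).
struct SpillInterval {
    int start;
    int end;
};

// Returns one slot index per interval, in input order. Two intervals
// share a slot only when their live ranges do not overlap.
std::vector<int> colorStackSlots(const std::vector<SpillInterval> &intervals) {
    std::vector<std::size_t> order(intervals.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    // Visit intervals in order of increasing start point.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return intervals[a].start < intervals[b].start;
    });

    std::vector<int> slotOf(intervals.size(), -1);
    std::vector<int> slotFreeAt;  // end point of the last interval in each slot
    for (std::size_t idx : order) {
        const SpillInterval &iv = intervals[idx];
        assert(iv.start <= iv.end);
        int chosen = -1;
        // Reuse the first slot whose previous occupant is already dead.
        for (std::size_t s = 0; s < slotFreeAt.size(); ++s) {
            if (slotFreeAt[s] <= iv.start) {
                chosen = static_cast<int>(s);
                break;
            }
        }
        if (chosen < 0) {
            // No reusable slot: allocate a new one.
            chosen = static_cast<int>(slotFreeAt.size());
            slotFreeAt.push_back(iv.end);
        } else {
            slotFreeAt[chosen] = iv.end;
        }
        slotOf[idx] = chosen;
    }
    return slotOf;
}
```

With four spilled values whose ranges are [0,4), [1,3), [4,8) and [3,5), this sketch needs two slots instead of four, which is the kind of frame-size reduction the note describes.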