Hans Wennborg 4d48c3f1aa LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue()
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.

The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.

Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.

I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.

Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.

Differential Revision: http://reviews.llvm.org/D6397

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222768 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 17:23:05 +00:00
..
2014-11-24 20:53:26 +00:00
2014-11-11 21:30:22 +00:00

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n), however
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,

ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))

//===---------------------------------------------------------------------===//