Commit Graph

82 Commits

Author SHA1 Message Date
Jingyue Wu
85e632de29 Add a speculative execution pass
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.

Credit goes to Jingyue Wu for writing an earlier version of this pass.

Patched by Bjarke Roune. 

Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.

Reviewers: eliben, dberlin, meheff, jingyue, hfinkel

Reviewed By: jingyue, hfinkel

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9360

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237459 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-15 17:54:48 +00:00
Adam Nemet
dd469afe15 New Loop Distribution pass
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass.  Loop Distribution becomes
the second user of this analysis.

The pass is off by default and can be enabled
with -enable-loop-distribution.  There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.

I decided to remove the control-dependence calculation from this first
version.  This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately.  Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop.  This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.

The pass keeps DominatorTree and LoopInfo updated.  I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops.  SimplifyLoop is violated in some cases and I have a FIXME
covering this.

Reviewers: hfinkel, nadav, aschwaighofer

Reviewed By: aschwaighofer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237358 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-14 12:05:18 +00:00
Jingyue Wu
9cecacd16a Simplify n-ary adds by reassociation
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify

  void foo(int a, int b) {
    bar(a + b);
    bar((a + 2) + b);
  }

to

  void foo(int a, int b) {
    int t = a + b;
    bar(t);
    bar(t + 2);
  }

saving one add instruction.

Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).

Test Plan: nary-add.ll

Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick

Reviewed By: sanjoy, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234855 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-14 04:59:22 +00:00
James Molloy
fb45b9fafc Reapply r233175 and r233183: float2int.
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233370 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-27 10:36:57 +00:00
Nick Lewycky
b3ad90eacc Revert r233175 and r233183 with it. This pulls float2int back out of the tree, due to PR23038.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233350 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-27 02:00:11 +00:00
James Molloy
09f1b672cb Reapply r233062: "float2int": Add a new pass to demote from float to int where possible.
Now with a fix for PR23008 and extra regression test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233175 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-25 10:03:42 +00:00
Hans Wennborg
f61cd8b368 Revert r233062 ""float2int": Add a new pass to demote from float to int where possible."
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"

Also reverting follow-up r233064.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233105 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-24 20:07:08 +00:00
James Molloy
a54c5b4489 "float2int": Add a new pass to demote from float to int where possible.
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.

This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.

This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.

To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233062 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-24 11:15:23 +00:00
Duncan P. N. Exon Smith
a60d430e31 Verifier: Remove the separate -verify-di pass
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`.  This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.

Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232772 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-19 22:24:17 +00:00
Karthik Bhat
52610d84ad Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.

For e.g. given a loop like -
  for(int i=0;i<N;i++)
    for(int j=0;j<N;j++)
      A[j][i] = A[j][i]+B[j][i];

is interchanged to -
  for(int j=0;j<N;j++)
    for(int i=0;i<N;i++)
      A[j][i] = A[j][i]+B[j][i];

This pass is currently disabled by default.

To give a brief introduction it consists of 3 stages-

LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.

LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.

TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.

A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231458 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 10:11:25 +00:00
Philip Reames
673db11fdb Add a pass for constructing gc.statepoint sequences w/explicit relocations
This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'.

This patch is setting the stage for work to continue in tree.  In particular, there are known naming and style violations in the current patch.  I'll try to get those resolved over the next week or so.  As I touch each area to make style changes, I need to make sure we have adequate testing in place.  As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage.

The pass has several main subproblems it needs to address:
- First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future.  Note that the current change doesn't actually contain a useful liveness analysis.  It was seperated into a followup change as the code wasn't ready to be shared.  Instead, the current implementation just considers any dominating def of appropriate pointer type to be live.
- Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. 
- Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229945 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 01:06:44 +00:00
Adam Nemet
0ea25c2e64 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229893 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 19:15:04 +00:00
NAKAMURA Takumi
383d8c7fdd Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector.
r229622: "[LoopAccesses] Make VectorizerParams global"
  r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
  r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
  r229626: "[LoopAccesses] Create the analysis pass"
  r229628: "[LoopAccesses] Change debug messages from LV to LAA"
  r229630: "[LoopAccesses] Add canAnalyzeLoop"
  r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
  r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
  r229633: "[LoopAccesses] Add -analyze support"
  r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
  r229638: "Analysis: fix buildbots"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229650 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 08:34:47 +00:00
Adam Nemet
718b023033 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229626 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 03:43:24 +00:00
Hal Finkel
5b43c8551e [BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.

Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128& foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).

I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229462 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 01:36:59 +00:00
Chandler Carruth
417c5c172c [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229094 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 10:01:29 +00:00
Philip Reames
2e38beb32f Add a pass for inserting safepoints into (nearly) arbitrary IR
This pass is responsible for figuring out where to place call safepoints and safepoint polls. It doesn't actually make the relocations explicit; that's the job of the RewriteStatepointsForGC pass (http://reviews.llvm.org/D6975).

Note that this code is not yet finalized.  Its moving in tree for incremental development, but further cleanup is needed and will happen over the next few days.  It is not yet part of the standard pass order.  

Planned changes in the near future:
 - I plan on restructuring the statepoint rewrite to use the functions add to the IRBuilder a while back. 
 - In the current pass, the function "gc.safepoint_poll" is treated specially but is not an intrinsic. I plan to make identifying the poll function a property of the GCStrategy at some point in the near future.
 - As follow on patches, I will be separating a collection of test cases we have out of tree and submitting them upstream. 
 - It's not explicit in the code, but these two patches are introducing a new state for a statepoint which looks a lot like a patchpoint. There's no a transient form which doesn't yet have the relocations explicitly represented, but does prevent reordering of memory operations. Once this is in, I need to update actually make this explicit by reserving the 'unused' argument of the statepoint as a flag, updating the docs, and making the code explicitly check for such a thing. This wasn't really planned, but once I split the two passes - which was done for other reasons - the intermediate state fell out. Just reminds us once again that we need to merge statepoints and patchpoints at some point in the not that distant future.

Future directions planned:
 - Identifying more cases where a backedge safepoint isn't required to ensure timely execution of a safepoint poll.
 - Tweaking the insertion process to generate easier to optimize IR. (For example, investigating making SplitBackedge) the default.
 - Adding opt-in flags for a GCStrategy to use this pass. Once done, add this pass to the actual pass ordering.

Differential Revision: http://reviews.llvm.org/D6981



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228090 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 00:37:33 +00:00
Jingyue Wu
2918efd551 Add straight-line strength reduction to LLVM
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

  #pragma unroll
  foo (int i = 0; i < 3; ++i) {
    sum += foo((b + i) * s);
  }

into

  sum += foo(b * s);
  sum += foo((b + 1) * s);
  sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

  b * s
  (b + 1) * s
  (b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

  t1 = b * s
  t2 = t1 + s
  t3 = t2 + s

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Test Plan: test/StraightLineStrengthReduce/slsr.ll

Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick

Reviewed By: jholewinski, atrick

Subscribers: karthikthecool, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7310

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228016 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:37:06 +00:00
Chandler Carruth
b63fed3b97 [PM] Refactor the core logic to run EarlyCSE over a function into an
object that manages a single run of this pass.

This was already essentially how it worked. Within the run function, it
would point members at *stack local* allocations that were only live for
a single run. Instead, it seems much cleaner to have a utility object
whose lifetime is clearly bounded by the run of the pass over the
function and can use member variables in a more direct way.

This also makes it easy to plumb the analyses used into it from the pass
and will make it re-usable with the new pass manager.

No functionality changed here, its just a refactoring.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227162 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 01:34:14 +00:00
Sanjoy Das
148e8c9b8b Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

This pass was originally r226201.  It was reverted because it used C++
features not supported by MSVC 2012.

Differential Revision: http://reviews.llvm.org/D6693



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226238 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 01:03:22 +00:00
Sanjoy Das
df1b4f601d Revert r226201 (Add a new pass "inductive range check elimination")
The change used C++11 features not supported by MSVC 2012.  I will fix
the change to use things supported MSVC 2012 and recommit shortly.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226216 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 22:18:10 +00:00
Sanjoy Das
0170a308ec Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

Differential Revision: http://reviews.llvm.org/D6693



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226201 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 20:45:46 +00:00
Juergen Ributzka
4be836d563 [C API] Make the 'lower switch' pass available via the C API.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217630 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 21:32:32 +00:00
Hal Finkel
1d6c2d717d Add an AlignmentFromAssumptions Pass
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.

The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could).  Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217344 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-07 20:05:11 +00:00
Jan Vesely
f7a325b3a1 Initialize FlattenCFG pass
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 20:31:52 +00:00
Hal Finkel
16fd27b2c3 Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213864 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 14:25:39 +00:00
Gerolf Hoflehner
d94715e273 MergedLoadStoreMotion pass
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213396 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 19:13:09 +00:00
Jiangning Liu
c5bc067a0f Move GlobalMerge from Transform to CodeGen.
This patch is to move GlobalMerge pass from Transform/Scalar                                                           
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210951 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 22:57:59 +00:00
Jiangning Liu
f847ccb87a Global merge for global symbols.
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-11 06:44:53 +00:00
Michael J. Spencer
8bfb46e790 Add LoadCombine pass.
This pass is disabled by default. Use -combine-loads to enable in -O[1-3]

Differential revision: http://reviews.llvm.org/D3580

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-29 01:55:07 +00:00
Rafael Espindola
21cfedee05 Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208978 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 13:02:18 +00:00
Jiangning Liu
d5db8765d6 Implement global merge optimization for global variables.
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208934 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-15 23:45:42 +00:00
Eli Bendersky
167a57ca45 Add an optimization that does CSE in a group of similar GEPs.
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.

The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.

Review: http://reviews.llvm.org/D3462

Patch by Jingyue Wu.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207783 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 18:38:36 +00:00
Duncan P. N. Exon Smith
32791b02fa verify-di: Implement DebugInfoVerifier
Implement DebugInfoVerifier, which steals verification relying on
DebugInfoFinder from Verifier.

  - Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps
    DebugInfoVerifier.  Uses -verify-di command-line flag.

  - Change verifyModule() to invoke DebugInfoVerifier as well as
    Verifier.

  - Add a call to createDebugInfoVerifierPass() wherever there was a
    call to createVerifierPass().

This implementation as a module pass should sidestep efficiency issues,
allowing us to turn debug info verification back on.

<rdar://problem/15500563>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 16:27:38 +00:00
Quentin Colombet
8048c44580 [CodeGenPrepare] Move CodeGenPrepare into lib/CodeGen.
CodeGenPrepare uses extensively TargetLowering which is part of libLLVMCodeGen.
This is a layer violation which would introduce eventually a dependence on
CodeGen in ScalarOpts.

Move CodeGenPrepare into libLLVMCodeGen to avoid that.

Follow-up of <rdar://problem/15519855>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201912 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-22 00:07:45 +00:00
Juergen Ributzka
943ce55f39 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200062 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-25 02:02:55 +00:00
Hans Wennborg
503793e834 Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200058 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-25 01:18:18 +00:00
Juergen Ributzka
96172cb4a4 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200034 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-24 20:18:00 +00:00
Juergen Ributzka
dc6f9b9a4f Revert "Add Constant Hoisting Pass"
This reverts commit r200022 to unbreak the build bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200024 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-24 18:40:30 +00:00
Juergen Ributzka
fb282c68b7 Add Constant Hoisting Pass
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.

First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.

If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.

When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.

This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)

Reviewed by Eric

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200022 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-24 18:23:08 +00:00
Chandler Carruth
56e1394c88 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-13 09:26:24 +00:00
Richard Sandiford
0f778794c8 Add a Scalarizer pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195471 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-22 16:58:05 +00:00
Hal Finkel
bebe48dbfe Add a loop rerolling pass
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:

for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}

and turn them into this:

for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}

and loops like this:

for (int i = 0; i < 500; ++i) {
  x[3*i] = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}

and turn them into this:

for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}

There are two motivations for this transformation:

  1. Code-size reduction (especially relevant, obviously, when compiling for
code size).

  2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.

This pass is not currently enabled by default at any optimization level.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194939 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-16 23:59:05 +00:00
Diego Novillo
563b29f8db SampleProfileLoader pass. Initial setup.
This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emmited as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.

External profile information files have no fixed format, each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains.  This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support, I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.

The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.

This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.

Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.

This patch implements:

- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instructions that
  matches the profiles.
- A text loader class to assist the implementation of
  SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch
weights based on instruction samples.

This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:

Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.

Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.

I will adjust this assignment in subsequent patches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194566 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-13 12:22:21 +00:00
Chandler Carruth
3748de6e2d Remove the long, long defunct IR block placement pass.
This pass was based on the previous (essentially unused) profiling
infrastructure and the assumption that by ordering the basic blocks at
the IR level in a particular way, the correct layout would happen in the
end. This sometimes worked, and mostly didn't. It also was a really
naive implementation of the classical paper that dates from when branch
predictors were primarily directional and when loop structure wasn't
commonly available. It also didn't factor into the equation
non-fallthrough branches and other machine level details.

Anyways, for all of these reasons and more, I wrote
MachineBlockPlacement, which completely supercedes this pass. It both
uses modern profile information infrastructure, and actually works. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190748 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-14 09:28:14 +00:00
Richard Sandiford
a8a7099c18 Turn MipsOptimizeMathLibCalls into a target-independent scalar transform
...so that it can be used for z too.  Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.

The pass is opt-in because at the moment it only handles sqrt.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189097 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-23 10:27:02 +00:00
Tom Stellard
01d7203ef8 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187764 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-06 02:43:45 +00:00
Tom Stellard
57e6b2d1f3 SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches.  The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187278 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-27 00:01:07 +00:00
Meador Inge
be87bce32b Remove the simplify-libcalls pass (finally)
This commit completely removes what is left of the simplify-libcalls
pass.  All of the functionality has now been migrated to the instcombine
and functionattrs passes.  The following C API functions are now NOPs:

  1. LLVMAddSimplifyLibCallsPass
  2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184459 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-20 19:48:07 +00:00
Matt Arsenault
ad966ea7a8 Move StructurizeCFG out of R600 to generic Transforms.
Register it with PassManager

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184343 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-19 20:18:24 +00:00