6157 Commits

Author SHA1 Message Date
David Majnemer
5e9c6212a8 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.
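
A minimal sketch of the check described above, assuming 32-bit operands. The
KnownBits32 struct and both function names are hypothetical stand-ins for
illustration, not the names used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits result on an i32 value.
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// True if the exact product A * B does not fit in 32 bits.
bool mulOverflows32(uint32_t A, uint32_t B) {
  return static_cast<uint64_t>(A) * B > UINT32_MAX;
}

// The largest value consistent with the known bits is ~Zero and the smallest
// is One. Following the commit message, require both products to overflow.
bool umulAlwaysOverflows(KnownBits32 L, KnownBits32 R) {
  // A bit cannot be known-zero and known-one at the same time.
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0);
  return mulOverflows32(~L.Zero, ~R.Zero) && mulOverflows32(L.One, R.One);
}
```

With only the top bit of each operand known to be one, the smallest possible
product is already 2^62, so the sketch reports that overflow always occurs.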

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225077 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 07:29:47 +00:00
Chandler Carruth
ce7f347da2 [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a beginning offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225074 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 03:55:54 +00:00
Chandler Carruth
40a8741994 [SROA] Add a test case for r225068 / PR22080.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225070 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 00:34:29 +00:00
Chandler Carruth
450b39e971 [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underlie the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you see performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225061 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-01 11:54:38 +00:00
Sanjay Patel
28650b8ec2 InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225050 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 22:14:05 +00:00
David Majnemer
0f77ccd6bb InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we can prove
there is no signed overflow and if the comparison itself is signed.
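
To illustrate why the no-overflow condition matters, here is a small
exhaustive check over i8 values. The function names are made up for this
sketch, and the wrapping case relies on the usual two's complement narrowing
conversion.

```cpp
#include <cassert>
#include <cstdint>

// For every i8 pair whose difference fits in i8 (no signed overflow),
// confirm that (A - B < 0) agrees with (A < B).
bool subCompareMatchesDirectCompare() {
  for (int A = INT8_MIN; A <= INT8_MAX; ++A) {
    for (int B = INT8_MIN; B <= INT8_MAX; ++B) {
      int Wide = A - B;
      if (Wide < INT8_MIN || Wide > INT8_MAX)
        continue; // the i8 subtraction would overflow; transform not allowed
      if ((Wide < 0) != (A < B))
        return false;
    }
  }
  return true;
}

// Same comparison, but with the subtraction wrapped to 8 bits. Returns true
// if some pair disagrees, i.e. the transform would be wrong without the
// overflow proof.
bool wrappingSubCanDisagree() {
  for (int A = INT8_MIN; A <= INT8_MAX; ++A) {
    for (int B = INT8_MIN; B <= INT8_MAX; ++B) {
      int8_t Wrapped = static_cast<int8_t>(static_cast<uint8_t>(A - B));
      if ((Wrapped < 0) != (A < B))
        return true;
    }
  }
  return false;
}

// Usage sketch.
void checkSubCompare() {
  assert(subCompareMatchesDirectCompare());
  assert(wrappingSubCanDisagree()); // e.g. A = -128, B = 1 wraps to 127
}
```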

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225034 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 04:21:41 +00:00
Philip Reames
91a083c57f Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224968 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 23:27:30 +00:00
Philip Reames
1714ad67bd Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loop's header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions (i.e. divs, etc.) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invariant.load metadata or tbaa 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224965 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 23:00:57 +00:00
Philip Reames
456b7b602c Loading from null is valid outside of addrspace 0
This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces.

We really should introduce a hook to control this property on a per target per address space basis.  We may be losing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224961 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 22:46:21 +00:00
David Majnemer
7627d9c229 InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if there are bitwidth, or more, leading
zero bits between the two operands.
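
A rough sketch of the leading-zeros rule on concrete 32-bit values. The real
analysis works on known bits rather than concrete values; the helper names
here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in a 32-bit value (32 for zero).
unsigned countLeadingZeros32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (V & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// If the operands have at least 32 leading zero bits between them, the
// product is below 2^32 and cannot wrap. The rule is conservative: it may
// fail to prove some products that do in fact fit.
bool leadingZerosProveNoUnsignedWrap(uint32_t A, uint32_t B) {
  return countLeadingZeros32(A) + countLeadingZeros32(B) >= 32;
}

// Ground truth: does the exact product exceed 32 bits?
bool mulWraps32(uint32_t A, uint32_t B) {
  return static_cast<uint64_t>(A) * B > UINT32_MAX;
}

// Usage sketch.
void checkNuwRule() {
  assert(leadingZerosProveNoUnsignedWrap(0xFFFFu, 0xFFFFu)); // 16 + 16 zeros
  assert(!mulWraps32(0xFFFFu, 0xFFFFu));
  assert(!leadingZerosProveNoUnsignedWrap(0x10000u, 0x10000u)); // 15 + 15
  assert(mulWraps32(0x10000u, 0x10000u));
}
```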

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224849 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 09:50:35 +00:00
David Majnemer
998ae69abe InstCombine: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224847 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 09:10:14 +00:00
Michael Kuperstein
a098c770e1 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. 

This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224765 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 11:33:41 +00:00
Michael Liao
b9e302f3ca [SimplifyCFG] Revise common code sinking
- Fix the case where more than 1 common instruction derived from the same
  operand cannot be sunk. When a pair of values has more than 1 derived value
  in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224757 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 08:26:55 +00:00
Bruno Cardoso Lopes
a559a2317c [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224740 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 22:35:46 +00:00
David Majnemer
6df827240e This should have been part of r224676.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224677 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 04:48:34 +00:00
David Majnemer
854a37649a InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX
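
The two folds can be sanity-checked exhaustively on an 8-bit type, where
INT_MIN is 0x80 and INT_MAX is 0x7F. This is only an illustrative sketch; the
function name is invented.

```cpp
#include <cassert>

// Check both select folds for every 8-bit pattern.
bool selectFoldsHoldForAllI8() {
  const unsigned SignBit = 0x80u; // INT_MIN for i8
  const unsigned Rest = 0x7Fu;    // INT_MAX for i8
  for (unsigned X = 0; X <= 0xFFu; ++X) {
    // (X & INT_MIN) == 0 ? X ^ INT_MIN : X   should equal   X | INT_MIN
    unsigned SetIfClear = ((X & SignBit) == 0) ? (X ^ SignBit) : X;
    // (X & INT_MIN) != 0 ? X ^ INT_MIN : X   should equal   X & INT_MAX
    unsigned ClearIfSet = ((X & SignBit) != 0) ? (X ^ SignBit) : X;
    if (SetIfClear != (X | SignBit) || ClearIfSet != (X & Rest))
      return false;
  }
  return true;
}

// Usage sketch.
void checkSelectFolds() { assert(selectFoldsHoldForAllI8()); }
```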

This fixes PR21993.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 04:45:35 +00:00
David Majnemer
9cd99a0724 InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X  into  X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX  into  X
(X & INT_MIN) ? X | INT_MIN : X  into  X
(X & INT_MIN) ? X : X | INT_MIN  into  X | INT_MIN
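
As a quick illustration, the four simplifications above hold for every 16-bit
pattern (INT_MIN = 0x8000, INT_MAX = 0x7FFF). The function name is
hypothetical and the check is a sketch, not the test from the patch.

```cpp
#include <cassert>

// Verify each select against its simplified form for all 16-bit values.
bool pointlessSelectsSimplify() {
  const unsigned Min = 0x8000u; // sign bit only
  const unsigned Max = 0x7FFFu; // every bit except the sign bit
  for (unsigned X = 0; X <= 0xFFFFu; ++X) {
    const bool Neg = (X & Min) != 0;
    if ((Neg ? (X & Max) : X) != (X & Max)) // sign bit cleared either way
      return false;
    if ((Neg ? X : (X & Max)) != X)         // mask only hits a clear bit
      return false;
    if ((Neg ? (X | Min) : X) != X)         // or only hits a set bit
      return false;
    if ((Neg ? X : (X | Min)) != (X | Min)) // sign bit set either way
      return false;
  }
  return true;
}

// Usage sketch.
void checkPointlessSelects() { assert(pointlessSelectsSimplify()); }
```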

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 03:04:38 +00:00
Bruno Cardoso Lopes
06833ca7c1 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224588 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 17:12:35 +00:00
Sanjay Patel
7c5fa50875 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr
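
For readers wondering why the two constants are not interchangeable, a small
sketch: under default IEEE 754 rounding, 0.0 - X and -0.0 - X differ only when
X is +0.0, and only the latter matches negation. The function names are
illustrative.

```cpp
#include <cassert>
#include <cmath>

// Subtraction from +0.0: yields +0.0 for X == +0.0, so it is not a negation.
float subFromPosZero(float X) { return 0.0f - X; }

// Subtraction from -0.0: flips the sign for every finite X, including zeros.
float subFromNegZero(float X) { return -0.0f - X; }

// Usage sketch. Assumes the default rounding mode and no fast-math flags.
void checkFnegForms() {
  assert(!std::signbit(subFromPosZero(0.0f))); // +0.0, not -0.0
  assert(std::signbit(subFromNegZero(0.0f)));  // -0.0, same as negating +0.0
  assert(subFromPosZero(1.5f) == subFromNegZero(1.5f)); // agree elsewhere
}
```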

Differential Revision: http://reviews.llvm.org/D6723



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224583 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes
01b07d541b Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes
cba407d019 [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224574 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 14:23:15 +00:00
David Majnemer
73059bd1f1 ConstantFold: Shifting undef by zero results in undef
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 23:54:43 +00:00
Suyog Sarda
4bfc4f2e8c Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating point data types resulting in mismatch in output.
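
A small sketch of the kind of mismatch reordering can cause: float addition is
not associative, so pairing the operands differently can change the result.
The lane-wise pairing below is one possible reordering chosen for
illustration, and it assumes float arithmetic is evaluated in single
precision without fast-math.

```cpp
#include <cassert>

// The sum as written in the source: (a0 + a1) + (a2 + a3).
float haddAsWritten(const float *A) {
  float Lo = A[0] + A[1];
  float Hi = A[2] + A[3];
  return Lo + Hi;
}

// A reordered sum: (a0 + a2) + (a1 + a3), as a two-lane vector add followed
// by a pairwise add would compute it.
float haddLaneWise(const float *A) {
  float Even = A[0] + A[2];
  float Odd = A[1] + A[3];
  return Even + Odd;
}

// Usage sketch. 1e8f is exactly representable, but adding 1.0f to it rounds
// back to 1e8f, so the small terms are lost in the reordered version.
void checkReassociation() {
  const float In[4] = {1e8f, -1e8f, 1.0f, 1.0f};
  assert(haddAsWritten(In) == 2.0f);
  assert(haddLaneWise(In) == 0.0f);
}
```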



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224424 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 10:34:27 +00:00
Elena Demikhovsky
982a8b3aeb Added 5 more tests related to sink store revision 224247
- by Ella Bolshinsky

http://reviews.llvm.org/D6420



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224418 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 08:12:59 +00:00
Erik Eckstein
96bd465d6c Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224417 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 07:29:19 +00:00
David Majnemer
891ec6d69f InstSimplify: shl nsw/nuw undef, %V -> undef
We can always choose a value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.

However, shl behaves like an identity function when the right hand side
is zero.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224405 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 01:54:33 +00:00
Elena Demikhovsky
14fb445715 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.

http://reviews.llvm.org/D6527



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224334 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 11:50:42 +00:00
Sanjoy Das
574e01c32e Teach ScalarEvolution to exploit min and max expressions when proving
isKnownPredicate.

The motivation for this change is to optimize away checks in loops
like this:

    limit = min(t, len)
    for (i = 0 to limit)
      if (i >= len || i < 0) throw_array_out_of_bounds();
      a[i] = ...
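
For concreteness, the same loop shape written as compilable C++. This is a
sketch with an invented function name; it only shows why the check is
redundant, not how ScalarEvolution proves it.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Writes the first min(T, Len) elements and returns how many were written.
int fillChecked(std::vector<int> &A, int T) {
  const int Len = static_cast<int>(A.size());
  const int Limit = std::min(T, Len);
  for (int I = 0; I < Limit; ++I) {
    // Inside the loop 0 <= I < Limit and Limit <= Len, so this branch can
    // never be taken. That is the fact the min expression lets us exploit.
    if (I >= Len || I < 0)
      throw std::out_of_range("array index out of bounds");
    A[I] = I;
  }
  return std::max(Limit, 0);
}

// Usage sketch.
void checkFill() {
  std::vector<int> V(4, -1);
  assert(fillChecked(V, 10) == 4); // limit clamps to the length
  assert(V[3] == 3);
}
```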

Differential Revision: http://reviews.llvm.org/D6635



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224285 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 22:50:15 +00:00
Duncan P. N. Exon Smith
1ef70ff39b IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 19:07:53 +00:00
Elena Demikhovsky
a8a374135b Added a test related to 224247 revision
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224248 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 14:14:10 +00:00
Suyog Sarda
4dcffed444 Typo Correction in Test Case. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224244 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 12:19:46 +00:00
Ahmed Bougacha
780a093afb Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-13 23:22:12 +00:00
Renato Golin
1e173b7139 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224198 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-13 20:23:18 +00:00
David Majnemer
3b7e6d27d2 ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224168 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 23:59:29 +00:00
Suyog Sarda
1dea0dc279 This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it. 
 
 Test case :
 
       float hadd(float* a) {
           return (a[0] + a[1]) + (a[2] + a[3]);
       }
 
 
 AArch64 assembly before patch :
 
        ldp     s0, s1, [x0]
        ldp     s2, s3, [x0, #8]
        fadd    s0, s0, s1
        fadd    s1, s2, s3
        fadd    s0, s0, s1
        ret
 
 AArch64 assembly after patch :
 
        ldp     d0, d1, [x0]
        fadd    v0.2s, v0.2s, v1.2s
        faddp   s0, v0.2s
        ret

Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224119 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 12:53:44 +00:00
Steven Wu
a511846bdf Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops for the testcase added.
It is because InstCombine is generating instructions that can be
optimized by itself. Fix by not optimizing frem if the optimized
type is the same as original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224097 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 04:34:07 +00:00
Andrea Di Biagio
f27500040b [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.
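
The two rules quoted from the manual can be sketched as a small classifier.
The struct and function names are hypothetical, and the sketch assumes both
fields have already been reduced to the range 0..63.

```cpp
#include <cassert>

// Hypothetical result type: the length after applying rule 2, and whether
// rule 1 makes the result undefined.
struct InsertQIFields {
  unsigned EffectiveLength;
  bool Undefined;
};

InsertQIFields classifyInsertQI(unsigned Length, unsigned Index) {
  assert(Length < 64 && Index < 64); // assumption: fields already masked
  // Rule 2: a zero length field means a length of 64.
  const unsigned Effective = (Length == 0) ? 64 : Length;
  // Rule 1: bit index + length greater than 64 gives undefined results.
  return {Effective, Index + Effective > 64};
}
```

Under these assumptions a zero length with a non-zero bit index is classified
as undefined, since 64 plus any positive index exceeds 64.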

Differential Revision: http://reviews.llvm.org/D6583


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224054 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 20:44:59 +00:00
David Majnemer
c57bee5399 InstSimplify: Remove useless %a parameter from tests
No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224016 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 12:56:17 +00:00
Michael Kuperstein
1696b35ff1 The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value.
Patch by Amjad Aboud

Differential Revision: http://reviews.llvm.org/D6525


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224015 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 12:41:10 +00:00
David Majnemer
72c6bdbf70 ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
Zero is usually a nicer constant to have than -1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223969 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 21:58:15 +00:00
David Majnemer
ea9bcfc707 ConstantFold: an undef shift amount results in undef
X shifted by undef results in undef because the undef value can
represent values greater than the width of the operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223968 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 21:38:05 +00:00
David Majnemer
895316336e ConstantFold: div undef, 0 should fold to undef, not zero
Dividing by zero yields an undefined value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223924 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 09:14:55 +00:00
David Majnemer
6578f1beb1 InstSimplify: [al]shr exact undef, %X -> undef
Exact shifts always keep the non-zero bits of their input.  This means
it keeps its undef bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 09:14:52 +00:00
David Majnemer
1297775557 InstSimplify: div %X, 0 -> undef
We already optimized rem %X, 0 to undef, we should do the same for div.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223919 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 07:52:18 +00:00
Ahmed Bougacha
605c40341b [ARM] Combine base-updating/post-incrementing vector load/stores.
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 00:07:37 +00:00
Chandler Carruth
3508e27903 Revert r223764 which taught instcombine about integer-based element extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223813 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 19:21:16 +00:00
Duncan P. N. Exon Smith
dad20b2ae2 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
and I'll try to fix it.  FWIW, the errors should be easy to fix, so it
may be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved, RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added a transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.
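
The three-step operand access described in the guide above can be
sketched with a toy model.  Everything below is a stand-in written for
illustration: the classes only mimic the shape of `Value`, `Metadata`,
and `ConstantAsMetadata`, the `toy_mdconst` namespace and its helpers
are hypothetical, and `dynamic_cast` is used in place of LLVM's own
`isa`/`cast`/`dyn_cast` machinery.  It is not the implementation in
this commit.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for the two separate hierarchies (NOT the LLVM classes).
struct Value { virtual ~Value() = default; };
struct Constant : Value {};
struct ConstantInt : Constant {
  explicit ConstantInt(std::int64_t V) : Val(V) {}
  std::int64_t Val;
};

struct Metadata { virtual ~Metadata() = default; };
struct MDString : Metadata {};

// Bridge: in this model, the only way metadata can refer to a Constant.
struct ConstantAsMetadata : Metadata {
  explicit ConstantAsMetadata(Constant *C) : Wrapped(C) {}
  Constant *getValue() const { return Wrapped; }
  Constant *Wrapped;
};

// Hypothetical helpers that fold the three steps into one call.
namespace toy_mdconst {

template <class T> T *dyn_extract(Metadata *MD) {
  // Step 1: cast from Metadata to the bridge class.
  auto *Bridge = dynamic_cast<ConstantAsMetadata *>(MD);
  if (!Bridge)
    return nullptr;
  // Step 2: extract the Constant.  Step 3: cast down to the wanted type.
  return dynamic_cast<T *>(Bridge->getValue());
}

template <class T> bool hasa(Metadata *MD) {
  return dyn_extract<T>(MD) != nullptr;
}

template <class T> T *extract(Metadata *MD) {
  T *Result = dyn_extract<T>(MD);
  assert(Result && "operand is not a bridged constant of this type");
  return Result;
}

} // namespace toy_mdconst
```

With a `ConstantInt CI(42)` wrapped in a `ConstantAsMetadata`, calling
`toy_mdconst::extract<ConstantInt>` on the bridge yields a pointer to
`CI`, while `toy_mdconst::dyn_extract<ConstantInt>` on an `MDString`
yields null.  Unlike a real `dyn_cast`, this toy `dyn_extract` also
tolerates a null argument.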

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223802 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 18:38:53 +00:00
Sonam Kumari
05a824843d Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223768 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 10:46:38 +00:00
Ankur Garg
df35082f20 [test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC.
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.

test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.

(Removed test57 and test58)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 10:35:19 +00:00
Chandler Carruth
e78a87b633 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of a
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.
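
As a rough source-level analogy for the two representations (a sketch,
not the IR this patch produces), the C++ below carries two floats
through a 64-bit integer using shifts and ors, and alongside that
through a two-element array standing in for a vector. All function
names and the `Float2` alias are made up for illustration, and the
sketch assumes a 32-bit `float`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret the bits of a float as an integer and back.
std::uint32_t bitsOf(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}
float floatOf(std::uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// "Bag of bits" path: combine and split the values with integer math.
std::uint64_t insertViaIntegerMath(float Re, float Im) {
  return std::uint64_t(bitsOf(Re)) | (std::uint64_t(bitsOf(Im)) << 32);
}
float extractViaIntegerMath(std::uint64_t Bits, std::size_t Index) {
  assert(Index < 2 && "only two elements are packed");
  return floatOf(std::uint32_t(Bits >> (32 * Index)));
}

// Element path: the values stay floats; no shifts, ors, or truncations.
using Float2 = std::array<float, 2>;
Float2 insertViaElements(float Re, float Im) { return Float2{{Re, Im}}; }
float extractViaElements(const Float2 &V, std::size_t Index) {
  assert(Index < 2 && "only two elements are stored");
  return V[Index];
}
```

Both paths return the same values, but the integer path puts a shift,
an or, and a truncation between the floats and their eventual use. That
is the kind of integer math the message says it wants off the critical
path.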

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223764 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 08:55:32 +00:00