Previously we would refrain from attempting to increase the linkage of
available_externally globals because they were considered weak for the
linker. Now they are treated more like a declaration instead of a weak
definition.
This was causing SSE alignment faults in Chromuim, when some code
assumed it could increase the alignment of a dllimported global that it
didn't control. http://crbug.com/509256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242091 91177308-0d34-0410-b5e6-96231b3b80d8
When spotting that a loop can use ctpop, we were incorrectly replacing all uses of a value with a value derived from ctpop.
The bug here was exposed because we were replacing a use prior to the ctpop with the ctpop value and so we have a use before def, i.e., we changed
%tobool.5 = icmp ne i32 %num, 0
store i1 %tobool.5, i1* %ptr
br i1 %tobool.5, label %for.body.lr.ph, label %for.end
to
store i1 %1, i1* %ptr
%0 = call i32 @llvm.ctpop.i32(i32 %num)
%1 = icmp ne i32 %0, 0
br i1 %1, label %for.body.lr.ph, label %for.end
Even if we inserted the ctpop so that it dominates the store here, that would still be incorrect. The store doesn’t want the result of ctpop.
The fix is very simple, and involves replacing only the branch condition with the ctpop instead of all uses.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242068 91177308-0d34-0410-b5e6-96231b3b80d8
Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N")
and a runtime trip count. Also, do not unroll loops with unroll full metadata if the
loop has a runtime loop count. Previously, such loops would be unrolled with a
very large threshold (pragma-unroll-threshold) if runtime unrolled happened to be
enabled resulting in a very large (and likely unwise) unroll factor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242047 91177308-0d34-0410-b5e6-96231b3b80d8
Passes should never modify it, just use the const version. While there
reduce copying in LoopInterchange. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242041 91177308-0d34-0410-b5e6-96231b3b80d8
There is no suitable basic block to sink instructions in loops without
exits. The only way an instruction in a loop without exits can be used
is as an incoming value to a PHI. In such cases, the incoming block for
the corresponding value is unreachable.
This fixes PR24013.
Differential Revision: http://reviews.llvm.org/D10903
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241987 91177308-0d34-0410-b5e6-96231b3b80d8
The following functions are moved from the LoopVectorizer to VectorUtils:
- getGEPInductionOperand
- stripGetElementPtr
- getUniqueCastUse
- getStrideFromPointer
These used to be static functions in LoopVectorize, but will also be used by
the upcoming loop versioning LICM transformation.
Patch by Ashutosh Nema!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241980 91177308-0d34-0410-b5e6-96231b3b80d8
No in-tree alias analysis used this facility, and it was not called in
any particularly rigorous way, so it seems unlikely to be correct.
Note that one of the only stateful AA implementations in-tree,
GlobalsModRef is completely broken currently (and any AA passes like it
are equally broken) because Module AA passes are not effectively
invalidated when a function pass that fails to update the AA stack runs.
Ultimately, it doesn't seem like we know how we want to build stateful
AA, and until then trying to support and maintain correctness for an
untested API is essentially impossible. To that end, I'm planning to rip
out all of the update API. It can return if and when we need it and know
how to build it on top of the new pass manager and as part of *tested*
stateful AA implementations in the tree.
Differential Revision: http://reviews.llvm.org/D10889
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241975 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Following the discussion on r241884, it's more reasonable to assume that a
target has no vector registers by default instead of letting every such
target overrides getNumberOfRegisters.
Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to
return 0 when Vector is true, and partially reverts r241884 which
modifies NVPTXTTIImpl::getNumberOfRegisters.
It also fixes a performance bug in LoopVectorizer. Even if a target has
no vector registers, vectorization may still help ILP. So, we need both
checks to be false before disabling loop vectorization all together.
Reviewers: hfinkel
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241942 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The class will obviously need improvement down the road. For one, there
is no reason that addPHINodes would have to be exposed like that. I
will make this and other improvements in follow-up patches.
The main goal is to be able to share this functionality. The
LoopLoadElimination pass I am working on needs it too. Later we can
move other clients as well (LV and Ashutosh's LICMVer).
Reviewers: hfinkel, ashutosh.nema
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10577
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241932 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This makes them available to the LoopVersioning class as that is moved
to its own module in the next patch.
Reviewers: ashutosh.nema, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241931 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Reviewers: rnk, JosephTremoulet, reames, nlewycky, rjmccall
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11041
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241888 91177308-0d34-0410-b5e6-96231b3b80d8
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241886 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In RewriteLoopExitValues, before expanding out an SCEV expression using
SCEVExpander, try to see if an existing LLVM IR expression already
computes the value we're interested in. If so use that existing
expression.
Apart from reducing IndVars' reliance on the rest of the compilation
pipeline, this also prevents IndVars from concluding some expressions as
"high cost" when they're not. For instance,
`InductiveRangeCheckElimination` often emits code of the following form:
```
len = umin(len_A, len_B)
loop:
...
if (i++ < len)
goto loop
outside_loop:
use(i)
```
`SCEVExpander` refuses to rewrite the use of `i` in `outside_loop`,
since it thinks the value of `i` on loop exit, `len`, is a high cost
expansion since it contains an `umax` in it. With this change,
`IndVars` can see that it can re-use `len` instead of creating a new
expression to compute `umin(len_A, len_B)`.
I considered putting this cleverness in `SCEVExpander`, but I was
worried that it may then have a deterimental effect on other passes
that use it. So I decided it was better to just do this in the one
place where it seems like an obviously good idea, with the intent of
generalizing later if needed.
Reviewers: atrick, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10782
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241838 91177308-0d34-0410-b5e6-96231b3b80d8
Place all code corresponding to a run-time check in one place.
Previously we generated some code, then proceeded to a next check, then
finished the code for the first check (like splitting blocks and
generating branches). Now the code for generating a check is
self-contained.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241741 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241633 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r241602. We had a latent bug in SCCP where we would
make a basic block empty and then proceed to ask questions about it's
terminator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241616 91177308-0d34-0410-b5e6-96231b3b80d8
This change includes a fix for https://code.google.com/p/chromium/issues/detail?id=499508#c3,
which required updating the visibility for symbols with eliminated definitions.
--Original Commit Message--
Add new EliminateAvailableExternally module pass, which is performed in
O2 compiles just before GlobalDCE, unless we are preparing for LTO.
This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241466 91177308-0d34-0410-b5e6-96231b3b80d8
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241413 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241197 91177308-0d34-0410-b5e6-96231b3b80d8
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241143 91177308-0d34-0410-b5e6-96231b3b80d8
Surprisingly, this is a correctness issue: the mmx type exists for
calling convention purposes, LLVM doesn't have a zero representation for
them.
This partially fixes PR23999.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241142 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
nsw are flaky and can often be removed by optimizations. This patch enhances
nsw by leveraging @llvm.assume in the IR. Specifically, NaryReassociate now
understands that
assume(a + b >= 0) && assume(a >= 0) ==> a +nsw b
As a result, it can split more sext(a + b) into sext(a) + sext(b) for CSE.
Test Plan: nary-gep.ll
Reviewers: broune, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10822
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241139 91177308-0d34-0410-b5e6-96231b3b80d8
CloneModule didn't take into account that it needed to remap the value
using values in the module.
This fixes PR23992.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241122 91177308-0d34-0410-b5e6-96231b3b80d8
Set debug location for terminator instruction in loop backedge block
(which is an unconditional jump to loop header). We can't copy debug
location from original backedges, as there can be several of them,
with different debug info locations. So, we follow the approach of
SplitBlockPredecessors, and copy the debug info from first non-PHI
instruction in the header (i.e. destination block).
This is yet another change for PR23837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240999 91177308-0d34-0410-b5e6-96231b3b80d8
If we are dealing with a pointer induction variable, isInductionPHI
gives back a step value of Stride / size of pointer. However, we might
be indexing with a legal type wider than the pointer width.
Handle this by inserting casts where appropriate instead of crashing.
This fixes PR23954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240877 91177308-0d34-0410-b5e6-96231b3b80d8