the default target of the first switch is not the basic block the second switch
is in (PredDefault != BB).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163916 91177308-0d34-0410-b5e6-96231b3b80d8
This is essentially a ground up re-think of the SROA pass in LLVM. It
was initially inspired by a few problems with the existing pass:
- It is subject to the bane of my existence in optimizations: arbitrary
thresholds.
- It is overly conservative about which constructs can be split and
promoted.
- The vector value replacement aspect is separated from the splitting
logic, missing many opportunities where splitting and vector value
formation can work together.
- The splitting is entirely based around the underlying type of the
alloca, despite this type often having little to do with the reality
of how that memory is used. This is especially prevelant with unions
and base classes where we tail-pack derived members.
- When splitting fails (often due to the thresholds), the vector value
replacement (again because it is separate) can kick in for
preposterous cases where we simply should have split the value. This
results in forming i1024 and i2048 integer "bit vectors" that
tremendously slow down subsequnet IR optimizations (due to large
APInts) and impede the backend's lowering.
The new design takes an approach that fundamentally is not susceptible
to many of these problems. It is the result of a discusison between
myself and Duncan Sands over IRC about how to premptively avoid these
types of problems and how to do SROA in a more principled way. Since
then, it has evolved and grown, but this remains an important aspect: it
fixes real world problems with the SROA process today.
First, the transform of SROA actually has little to do with replacement.
It has more to do with splitting. The goal is to take an aggregate
alloca and form a composition of scalar allocas which can replace it and
will be most suitable to the eventual replacement by scalar SSA values.
The actual replacement is performed by mem2reg (and in the future
SSAUpdater).
The splitting is divided into four phases. The first phase is an
analysis of the uses of the alloca. This phase recursively walks uses,
building up a dense datastructure representing the ranges of the
alloca's memory actually used and checking for uses which inhibit any
aspects of the transform such as the escape of a pointer.
Once we have a mapping of the ranges of the alloca used by individual
operations, we compute a partitioning of the used ranges. Some uses are
inherently splittable (such as memcpy and memset), while scalar uses are
not splittable. The goal is to build a partitioning that has the minimum
number of splits while placing each unsplittable use in its own
partition. Overlapping unsplittable uses belong to the same partition.
This is the target split of the aggregate alloca, and it maximizes the
number of scalar accesses which become accesses to their own alloca and
candidates for promotion.
Third, we re-walk the uses of the alloca and assign each specific memory
access to all the partitions touched so that we have dense use-lists for
each partition.
Finally, we build a new, smaller alloca for each partition and rewrite
each use of that partition to use the new alloca. During this phase the
pass will also work very hard to transform uses of an alloca into a form
suitable for promotion, including forming vector operations, speculating
loads throguh PHI nodes and selects, etc.
After splitting is complete, each newly refined alloca that is
a candidate for promotion to a scalar SSA value is run through mem2reg.
There are lots of reasonably detailed comments in the source code about
the design and algorithms, and I'm going to be trying to improve them in
subsequent commits to ensure this is well documented, as the new pass is
in many ways more complex than the old one.
Some of this is still a WIP, but the current state is reasonbly stable.
It has passed bootstrap, the nightly test suite, and Duncan has run it
successfully through the ACATS and DragonEgg test suites. That said, it
remains behind a default-off flag until the last few pieces are in
place, and full testing can be done.
Specific areas I'm looking at next:
- Improved comments and some code cleanup from reviews.
- SSAUpdater and enabling this pass inside the CGSCC pass manager.
- Some datastructure tuning and compile-time measurements.
- More aggressive FCA splitting and vector formation.
Many thanks to Duncan Sands for the thorough final review, as well as
Benjamin Kramer for lots of review during the process of writing this
pass, and Daniel Berlin for reviewing the data structures and algorithms
and general theory of the pass. Also, several other people on IRC, over
lunch tables, etc for lots of feedback and advice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163883 91177308-0d34-0410-b5e6-96231b3b80d8
a pair of switch/branch where both depend on the value of the same variable and
the default case of the first switch/branch goes to the second switch/branch.
Code clean up and fixed a few issues:
1> handling the case where some cases of the 2nd switch are invalidated
2> correctly calculate the weight for the 2nd switch when it is a conditional eq
Testing case is modified from Alastair's original patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163635 91177308-0d34-0410-b5e6-96231b3b80d8
The lookup tables did not get built in a deterministic order.
This makes them get built in the order that the corresponding phi nodes
were found.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163305 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a transformation to SimplifyCFG that attemps to turn switch
instructions into loads from lookup tables. It works on switches that
are only used to initialize one or more phi nodes in a common successor
basic block, for example:
int f(int x) {
switch (x) {
case 0: return 5;
case 1: return 4;
case 2: return -2;
case 5: return 7;
case 6: return 9;
default: return 42;
}
This speeds up the code by removing the hard-to-predict jump, and
reduces code size by removing the code for the jump targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163302 91177308-0d34-0410-b5e6-96231b3b80d8
switch, make sure we include the value for the cases when calculating edge
value from switch to the default destination.
rdar://12241132
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163270 91177308-0d34-0410-b5e6-96231b3b80d8
pointers-to-strong-pointers may be in play. These can lead to retains and
releases happening in unstructured ways, foiling the optimizer. This fixes
rdar://12150909.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163180 91177308-0d34-0410-b5e6-96231b3b80d8
Scan the body of the loop and find instructions that may trap.
Use this information when deciding if it is safe to hoist or sink instructions.
Notice that we can optimize the search of instructions that may throw in the case of nested loops.
rdar://11518836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163132 91177308-0d34-0410-b5e6-96231b3b80d8
This code used to only handle malloc-like calls, which do not read memory.
r158919 changed it to check isNoAliasFn(), which includes strdup-like and
realloc-like calls, but it was not checking for dependencies on the memory
read by those calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163106 91177308-0d34-0410-b5e6-96231b3b80d8
We update until we hit a fixpoint. This is probably slow but also
slightly simplifies the code. It should also fix the occasional
invalid domtrees observed when building with expensive checking.
I couldn't find a case where this had a measurable slowdown, but
if someone finds a pathological case where it does we may have
to find a cleverer way of updating dominators here.
Thanks to Duncan for the test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163091 91177308-0d34-0410-b5e6-96231b3b80d8
The old PHI updating code in loop-rotate was replaced with SSAUpdater a while
ago, it has no problems with comples PHIs. What had to be fixed is detecting
whether a loop was already rotated and updating dominators when multiple exits
were present.
This change increases overall code size a bit, mostly due to additional loop
unrolling opportunities. Passes test-suite and selfhost with -verify-dom-info.
Fixes PR7447.
Thanks to Andy for the input on the domtree updating code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162912 91177308-0d34-0410-b5e6-96231b3b80d8
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162841 91177308-0d34-0410-b5e6-96231b3b80d8
This optimization is really just replacing allocas wholesale with
globals, there is no scalarization.
The underlying motivation for this patch is to simplify the SROA pass
and focus it on splitting and promoting allocas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162271 91177308-0d34-0410-b5e6-96231b3b80d8
The previous fix only checked for simple cycles, use a set to catch longer
cycles too.
Drop the broken check from the ObjectSizeOffsetEvaluator. The BoundsChecking
pass doesn't have to deal with invalid IR like InstCombine does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162120 91177308-0d34-0410-b5e6-96231b3b80d8
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162097 91177308-0d34-0410-b5e6-96231b3b80d8
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.
This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.
The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162023 91177308-0d34-0410-b5e6-96231b3b80d8
- memcpy size is wrongly truncated into 32-bit and treat 8GB memcpy is
0-sized memcpy
- as 0-sized memcpy/memset is already removed before SimplifyMemTransfer
and SimplifyMemSet in visitCallInst, replace 0 checking with
assertions.
- replace getZExtValue() with getLimitedValue() according to
Eli Friedman
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161923 91177308-0d34-0410-b5e6-96231b3b80d8
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.
rdar://11973998
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161852 91177308-0d34-0410-b5e6-96231b3b80d8
multiple scalar promotions on a single loop. This also has the effect of
preserving the order of stores sunk out of loops, which is aesthetically
pleasing, and it happens to fix the testcase in PR13542, though it doesn't
fix the underlying problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161459 91177308-0d34-0410-b5e6-96231b3b80d8
An unsigned value converted to floating-point will always be greater than
a negative constant. Unfortunately InstCombine reversed the check so that
unsigned values were being optimized to always be greater than all positive
floating-point constants. <rdar://problem/12029145>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161452 91177308-0d34-0410-b5e6-96231b3b80d8
We give a bonus for every argument because the argument setup is not needed
anymore when the function is inlined. With this patch we interpret byval
arguments as a compact representation of many arguments. The byval argument
setup is implemented in the backend as an inline memcpy, so to model the
cost as accurately as possible we take the number of pointer-sized elements
in the byval argument and give a bonus of 2 instructions for every one of
those. The bonus is capped at 8 elements, which is the number of stores
at which the x86 backend switches from an expanded inline memcpy to a real
memcpy. It would be better to use the real memcpy threshold from the backend,
but it's not available via TargetData.
This change brings the performance of c-ray in line with gcc 4.7. The included
test case tries to reproduce the c-ray problem to catch regressions for this
benchmark early, its performance is dominated by the inline decision of a
specific call.
This only has a small impact on most code, more on x86 and arm than on x86_64
due to the way the ABI works. When building LLVM for x86 it gives a small
inline cost boost to virtually any function using StringRef or STL allocators,
but only a 0.01% increase in overall binary size. The size of gcc compiled by
clang actually shrunk by a couple bytes with this patch applied, but not
significantly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161413 91177308-0d34-0410-b5e6-96231b3b80d8
instsimplify+inline strategy.
The crux of the problem is that instsimplify was reasonably relying on
an invariant that is true within any single function, but is no longer
true mid-inline the way we use it. This invariant is that an argument
pointer != a local (alloca) pointer.
The fix is really light weight though, and allows instsimplify to be
resiliant to these situations: when checking the relation ships to
function arguments, ensure that the argumets come from the same
function. If they come from different functions, then none of these
assumptions hold. All credit to Benjamin Kramer for coming up with this
clever solution to the problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161410 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.
As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.
Fixes PR13265.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161409 91177308-0d34-0410-b5e6-96231b3b80d8
This can happen as long as the instruction is not reachable. Instcombine does generate these unreachable malformed selects when doing RAUW
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160874 91177308-0d34-0410-b5e6-96231b3b80d8
of an array element (rather than at the beginning of the element) and extended
into the next element, then the load from the second element was being handled
wrong due to incorrect updating of the notion of which byte to load next. This
fixes PR13442. Thanks to Chris Smowton for reporting the problem, analyzing it
and providing a fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160711 91177308-0d34-0410-b5e6-96231b3b80d8
might be deliberate "one time" leaks, so that leak checkers can find them.
This is a reapply of r160602 with the fix that this time I'm committing the
code I thought I was committing last time; the I->eraseFromParent() goes
*after* the break out of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160664 91177308-0d34-0410-b5e6-96231b3b80d8
r160529 that was subsequently reverted. The fix was to not call
GV->eraseFromParent() right before the caller does the same. The existing
testcases already caught this bug if run under valgrind.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160602 91177308-0d34-0410-b5e6-96231b3b80d8
GetBestDestForJumpOnUndef() assumes there is at least 1 successor, which isn't
true if the block ends in an indirect branch with no successors. Fix this by
bailing out earlier in this case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160546 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR13371: indvars pass incorrectly substitutes 'undef' values.
I do not like this fix. It's needed until/unless the meaning of undef
changes. It attempts to be complete according to the IR spec, but I
don't have much confidence in the implementation given the difficulty
testing undefined behavior. Worse, this invalidates some of my
hard-fought work on indvars and LSR to optimize pointer induction
variables. It results benchmark regressions, which I'll track
internally. On x86_64 no LTO I see:
-3% huffbench
-3% 400.perlbench
-8% fhourstones
My only suggestion for recovering is to change the meaning of
undef. If we could trust an arbitrary instruction to produce a some
real value that can be manipulated (e.g. incremented) according to
non-undef rules, then this case could be easily handled with SCEV.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160421 91177308-0d34-0410-b5e6-96231b3b80d8
It began choking since Chandler's r159547, possibly due to improper expression on grep from TclParser to ShParser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160367 91177308-0d34-0410-b5e6-96231b3b80d8
All SCEV expressions used by LSR formulae must be safe to
expand. i.e. they may not contain UDiv unless we can prove nonzero
denominator.
Fixes PR11356: LSR hoists UDiv.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160205 91177308-0d34-0410-b5e6-96231b3b80d8
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, -1
%and = and i64 %sub, %shr
ret i64 %and
to:
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, 2305843009213693951
%and = and i64 %sub, %shr
ret i64 %and
The demanded bit optimization is actually a pessimization because add -1 would
be codegen'ed as a sub 1. Teach the demanded constant shrinking optimization
to check for negated constant to make sure it is actually reducing the width
of the constant.
rdar://11793464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160101 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes ~70 lines in InstCombineLoadStoreAlloca.cpp and makes both functions a bit more aggressive than before :)
In theory, we can be more aggressive when removing an alloca than a malloc, because an alloca pointer should never escape, but we are not taking advantage of this anyway
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159952 91177308-0d34-0410-b5e6-96231b3b80d8
This means we can do cheap DSE for heap memory.
Nothing is done if the pointer excapes or has a load.
The churn in the tests is mostly due to objectsize, since we want to make sure we
don't delete the malloc call before evaluating the objectsize (otherwise it becomes -1/0)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159876 91177308-0d34-0410-b5e6-96231b3b80d8
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159547 91177308-0d34-0410-b5e6-96231b3b80d8
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.
This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159544 91177308-0d34-0410-b5e6-96231b3b80d8
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159525 91177308-0d34-0410-b5e6-96231b3b80d8
The original algorithm only used recursive pair fusion of equal-length
types. This is now extended to allow pairing of any types that share
the same underlying scalar type. Because we would still generally
prefer the 2^n-length types, those are formed first. Then a second
set of iterations form the non-2^n-length types.
Also, a call to SimplifyInstructionsInBlock has been added after each
pairing iteration. This takes care of DCE (and a few other things)
that make the following iterations execute somewhat faster. For the
same reason, some of the simple shuffle-combination cases are now
handled internally.
There is some additional refactoring work to be done, but I've had
many requests for this feature, so additional refactoring will come
soon in future commits (as will additional test cases).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159330 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
If a constant or a function has linkonce_odr linkage and unnamed_addr, mark it
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159272 91177308-0d34-0410-b5e6-96231b3b80d8
before the expression root. Any existing operators that are changed to use one
of them needs to be moved between it and the expression root, and recursively
for the operators using that one. When I rewrote RewriteExprTree I accidentally
inverted the logic, resulting in the compacting going down from operators to
operands rather than up from operands to the operators using them, oops. Fix
this, resolving PR12963.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159265 91177308-0d34-0410-b5e6-96231b3b80d8
// C - zext(bool) -> bool ? C - 1 : C
if (ZExtInst *ZI = dyn_cast<ZExtInst>(Op1))
if (ZI->getSrcTy()->isIntegerTy(1))
return SelectInst::Create(ZI->getOperand(0), SubOne(C), C);
This ends up forming sext i1 instructions that codegen to terrible code. e.g.
int blah(_Bool x, _Bool y) {
return (x - y) + 1;
}
=>
movzbl %dil, %eax
movzbl %sil, %ecx
shll $31, %ecx
sarl $31, %ecx
leal 1(%rax,%rcx), %eax
ret
Without the rule, llvm now generates:
movzbl %sil, %ecx
movzbl %dil, %eax
incl %eax
subl %ecx, %eax
ret
It also helps with ARM (and pretty much any target that doesn't have a sext i1 :-).
The transformation was done as part of Eli's r75531. He has given the ok to
remove it.
rdar://11748024
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159230 91177308-0d34-0410-b5e6-96231b3b80d8
merge all zero-sized alloca's into one, fixing c43204g from the Ada ACATS
conformance testsuite. What happened there was that a variable sized object
was being allocated on the stack, "alloca i8, i32 %size". It was then being
passed to another function, which tested that the address was not null (raising
an exception if it was) then manipulated %size bytes in it (load and/or store).
The optimizers cleverly managed to deduce that %size was zero (congratulations
to them, as it isn't at all obvious), which made the alloca zero size, causing
the optimizers to replace it with null, which then caused the check mentioned
above to fail, and the exception to be raised, wrongly. Note that no loads
and stores were actually being done to the alloca (the loop that does them is
executed %size times, i.e. is not executed), only the not-null address check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159202 91177308-0d34-0410-b5e6-96231b3b80d8
The primary advantage is that loop optimizations will be applied in a
stable order. This helps debugging and unit test creation. It is also
a better overall implementation without pathologically bad performance
on deep functions.
On large functions (llvm-stress --size=200000 | opt -loops)
Before: 0.1263s
After: 0.0225s
On deep functions (after tweaking llvm-stress, thanks Nadav):
Before: 0.2281s
After: 0.0227s
See r158790 for more comments.
The loop tree is now consistently generated in forward order, but loop
passes are applied in reverse order over the program. If we have a
loop optimization that prefers forward order, that can easily be
achieved by adding a different type of LoopPassManager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159183 91177308-0d34-0410-b5e6-96231b3b80d8
- simplifycfg: invoke undef/null -> unreachable
- instcombine: invoke new -> invoke expect(0, 0) (an arbitrary NOOP intrinsic; only done if the allocated memory is unused, of course)
- verifier: allow invoke of intrinsics (to make the previous step work)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159146 91177308-0d34-0410-b5e6-96231b3b80d8
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159136 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR5997.
These transforms were disabled because codegen couldn't deal with other
uses of trunc(x). This is now handled by the peephole pass.
This causes no regressions on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159003 91177308-0d34-0410-b5e6-96231b3b80d8
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158919 91177308-0d34-0410-b5e6-96231b3b80d8
The present implementation handles only TBAA and FP metadata, discarding everything else.
For debug metadata, the current behavior is maintained (the debug metadata associated with
one of the instructions will be kept, discarding that attached to the other).
This should address PR 13040.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158606 91177308-0d34-0410-b5e6-96231b3b80d8
Dynamic GEPs created by SROA needed to insert extra "i32 0"
operands to index through structs and arrays to get to the
vector being indexed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158590 91177308-0d34-0410-b5e6-96231b3b80d8
linkonce linkage. For example, it is not valid to add unnamed_addr.
This also fixes a crash in g++.dg/opt/static5.C.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158528 91177308-0d34-0410-b5e6-96231b3b80d8
example degenerate phi nodes and binops that use themselves in unreachable code.
Thanks to Charles Davis for the testcase that uncovered this can of worms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158508 91177308-0d34-0410-b5e6-96231b3b80d8
This patch extends FoldBranchToCommonDest to fold unconditional branches.
For unconditional branches, we fold them if it is easy to update the phi nodes
in the common successors.
rdar://10554090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158392 91177308-0d34-0410-b5e6-96231b3b80d8
POD type, causing memory corruption when mapping to APInts with bitwidth > 64.
Merge another crash testcase into crash.ll while there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158369 91177308-0d34-0410-b5e6-96231b3b80d8
topologies, it is quite possible for a leaf node to have huge multiplicity, for
example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x
raised to a vast power (the multiplicity, or weight, of x). This patch fixes
the computation of weights by correctly computing them no matter how big they
are, rather than just overflowing and getting a wrong value. It turns out that
the weight for a value never needs more bits to represent than the value itself,
so it is enough to represent weights as APInts of the same bitwidth and do the
right overflow-avoiding dance steps when computing weights. As a side-effect it
reduces the number of multiplies needed in some cases of large powers. While
there, in view of external uses (eg by the vectorizer) I made LinearizeExprTree
static, pushing the rank computation out into users. This is progress towards
fixing PR13021.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158358 91177308-0d34-0410-b5e6-96231b3b80d8
This saves a cast, and zext is more expensive on platforms with subreg support
than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750.
On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
same performance now when not inlining either function.
stupid_memchr: 323.0us
bsd_memchr: 321.0us
memchr: 479.0us
where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When
inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time,
I haven't fully understood the issue yet, something is grossly mangling the
loop after inlining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158297 91177308-0d34-0410-b5e6-96231b3b80d8
-%a + 42
into
42 - %a
previously we were emitting:
-(%a + 42)
This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
Will work on that next
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158237 91177308-0d34-0410-b5e6-96231b3b80d8
can move instructions within the instruction list. If the instruction just
happens to be the one the basic block iterator is pointing to, and it is
moved to a different basic block, then we get into an infinite loop due to
the iterator running off the end of the basic block (for some reason this
doesn't fire any assertions). Original commit message:
Grab-bag of reassociate tweaks. Unify handling of dead instructions and
instructions to reoptimize. Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on). No need for WeakVH any more: use
an AssertingVH instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158199 91177308-0d34-0410-b5e6-96231b3b80d8
instructions to reoptimize. Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on). No need for WeakVH any more: use
an AssertingVH instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158073 91177308-0d34-0410-b5e6-96231b3b80d8
replacement to make it at least as generic as the instruction being replaced.
This includes:
* dropping nsw/nuw flags
* getting the least restrictive tbaa and fpmath metadata
* merging ranges
Fixes PR12979.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157958 91177308-0d34-0410-b5e6-96231b3b80d8
add regression tests for this problem.
Can already compile & run: PHP, PCRE, and ICU (i.e., all the software I tried)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157822 91177308-0d34-0410-b5e6-96231b3b80d8
- compute size & offset at the same time. The side-effects of this are that we now support negative GEPs. It's now approaching a phase that it can be reused by other passes (e.g., lowering of the objectsize intrinsic)
- use APInt throughout to handle wrap-arounds
- add support for PHI instrumentation
- add a cache (required for recursive PHIs anyway)
- remove hoisting support for now, since it was wrong in a few cases
sorry for the churn here.. tests will follow soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157775 91177308-0d34-0410-b5e6-96231b3b80d8
This also required making recursive simplifications until
nothing changes or a hard limit (currently 3) is hit.
With the simplification in place indvars can canonicalize
loops of the form
for (unsigned i = 0; i < a-b; ++i)
into
for (unsigned i = 0; i != a-b; ++i)
which used to fail because SCEV created a weird umax expr
for the backedge taken count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157701 91177308-0d34-0410-b5e6-96231b3b80d8
The test case feeds the following into InstCombine's visitSelect:
%tobool8 = icmp ne i32 0, 0
%phitmp = select i1 %tobool8, i32 3, i32 0
Then instcombine replaces the right side of the switch with 0, doesn't notice
that nothing changes and tries again indefinitely.
This fixes PR12897.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157587 91177308-0d34-0410-b5e6-96231b3b80d8
then it doesn't alter the instructions composing it, however it would continue
to move the instructions to just before the expression root. Ensure it doesn't
move them either, so now it really does nothing if there is nothing to do. That
commit also ensured that nsw etc flags weren't cleared if the expression was not
being changed. Tweak this a bit so that it doesn't clear flags on the initial
part of a computation either if that part didn't change but later bits did.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157518 91177308-0d34-0410-b5e6-96231b3b80d8
with arbitrary topologies (previously it would give up when hitting a diamond
in the use graph for example). The testcase from PR12764 is now reduced from
a pile of additions to the optimal 1617*%x0+208. In doing this I changed the
previous strategy of dropping all uses for expression leaves to one of dropping
all but one use. This works out more neatly (but required a bunch of tweaks)
and is also safer: some recently fixed bugs during recursive linearization were
because the linearization code thinks it completely owns a node if it has no uses
outside the expression it is linearizing. But if the node was also in another
expression that had been linearized (and thus all uses of the node from that
expression dropped) then the conclusion that it is completely owned by the
expression currently being linearized is wrong. Keeping one use from within each
linearized expression avoids this kind of mistake.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157467 91177308-0d34-0410-b5e6-96231b3b80d8
LowerSwitch::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
test/Transform/LowerSwitch/feature.ll - this test was refactored: grep + count was replaced with FileCheck usage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157384 91177308-0d34-0410-b5e6-96231b3b80d8