As suggested by Nick Lewycky, the tree traversal queues have been changed to SmallVectors and the associated loops have been rotated. Also, an 80-col violation was fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149607 91177308-0d34-0410-b5e6-96231b3b80d8
Long basic blocks with many candidate pairs (such as in the SHA implementation in Perl 5.14; thanks to Roman Divacky for the example) used to take an unacceptably-long time to compile. Instead, break long blocks into groups so that no group has too many candidate pairs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149595 91177308-0d34-0410-b5e6-96231b3b80d8
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149481 91177308-0d34-0410-b5e6-96231b3b80d8
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149468 91177308-0d34-0410-b5e6-96231b3b80d8
Changing arguments from being passed as fixed to varargs is unsafe, as
the ABI may require they be handled differently (stack vs. register, for
example).
Remove two tests which rely on the bitcast being folded into the direct
call, which is exactly the transformation that's unsafe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149457 91177308-0d34-0410-b5e6-96231b3b80d8
The redzones emitted by AddressSanitizer for CFString instances confuse the linker and are of little use, so we shouldn't add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149243 91177308-0d34-0410-b5e6-96231b3b80d8
Problem: LLVM needs more function attributes than currently available (32 bits).
One such proposed attribute is "address_safety", which shows that a function is being checked for address safety (by AddressSanitizer, SAFECode, etc).
Solution:
- extend the Attributes from 32 bits to 64-bits
- wrap the object into a class so that unsigned is never erroneously used instead
- change "unsigned" to "Attributes" throughout the code, including one place in clang.
- the class has no "operator uint64 ()", but it has "uint64_t Raw() " to support packing/unpacking.
- the class has "safe operator bool()" to support the common idiom: if (Attributes attr = getAttrs()) useAttrs(attr);
- The CTOR from uint64_t is marked explicit, so I had to add a few explicit CTOR calls
- Add the new attribute "address_safety". Doing it in the same commit to check that attributes beyond first 32 bits actually work.
- Some of the functions from the Attribute namespace are worth moving inside the class, but I'd prefer to have it as a separate commit.
Tested:
"make check" on Linux (32-bit and 64-bit) and Mac (10.6)
built/run spec CPU 2006 on Linux with clang -O2.
This change will break clang build in lib/CodeGen/CGCall.cpp.
The following patch will fix it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148553 91177308-0d34-0410-b5e6-96231b3b80d8
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:
- Always used properlyDominates to check safe code hoisting.
- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.
- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.
- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.
Fixes PR11783: SCEVExpander assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148535 91177308-0d34-0410-b5e6-96231b3b80d8
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148288 91177308-0d34-0410-b5e6-96231b3b80d8
Message for r148132:
LoopUnswitch: All helper data that is collected during loop-unswitch iterations was moved to separated class (LUAnalysisCache).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148215 91177308-0d34-0410-b5e6-96231b3b80d8
the optimizer doesn't eliminate objc_retainBlock calls which are needed
for their side effect of copying blocks onto the heap.
This implements rdar://10361249.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148076 91177308-0d34-0410-b5e6-96231b3b80d8
1. Size heuristics changed. Now we calculate number of unswitching
branches only once per loop.
2. Some checks was moved from UnswitchIfProfitable to
processCurrentLoop, since it is not changed during processCurrentLoop
iteration. It allows decide to skip some loops at an early stage.
Extended statistics:
- Added total number of instructions analyzed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147935 91177308-0d34-0410-b5e6-96231b3b80d8
with other symbols.
An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147899 91177308-0d34-0410-b5e6-96231b3b80d8
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.
Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.
Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.
On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147826 91177308-0d34-0410-b5e6-96231b3b80d8
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.
In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147801 91177308-0d34-0410-b5e6-96231b3b80d8
This collects a set of IV uses within the loop whose values can be
computed relative to each other in a sequence. Following checkins will
make use of this information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147797 91177308-0d34-0410-b5e6-96231b3b80d8
We still save an instruction when just the "and" part is replaced.
Also change the code to match comments more closely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147753 91177308-0d34-0410-b5e6-96231b3b80d8
This will be more important as we extend the LSR pass in ways that don't rely on the formula solver. In particular, we need it for constructing IV chains.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147724 91177308-0d34-0410-b5e6-96231b3b80d8
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147718 91177308-0d34-0410-b5e6-96231b3b80d8
present in the bottom of the CFG triangle, as the transformation isn't
ever valuable if the branch can't be eliminated.
Also, unify some heuristics between SimplifyCFG's multiple
if-converters, for consistency.
This fixes rdar://10627242.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147630 91177308-0d34-0410-b5e6-96231b3b80d8
code can incorrectly move the load across a store. This never
happens in practice today, but only because the current
heuristics accidentally preclude it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147623 91177308-0d34-0410-b5e6-96231b3b80d8
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.
Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147327 91177308-0d34-0410-b5e6-96231b3b80d8
This was intended to undo the sub canonicalization in cases where it's not profitable, but it also
finds some cases on it's own.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147256 91177308-0d34-0410-b5e6-96231b3b80d8
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147254 91177308-0d34-0410-b5e6-96231b3b80d8
performance regressions (both execution-time and compile-time) on our
nightly testers.
Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147131 91177308-0d34-0410-b5e6-96231b3b80d8
For example,
if (a == b) {
if (a > b) // this is false
Fixes some of the issues on <rdar://problem/10554090>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146822 91177308-0d34-0410-b5e6-96231b3b80d8
"half precision" floating-point with a first-class type.
This patch adds basic IR support (but not codegen support).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146786 91177308-0d34-0410-b5e6-96231b3b80d8
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146610 91177308-0d34-0410-b5e6-96231b3b80d8
This should always be done as a matter of principal. I don't have a
case that exposes the problem. I just noticed this recently while
scanning the code and realized I meant to fix it long ago.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146438 91177308-0d34-0410-b5e6-96231b3b80d8
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146436 91177308-0d34-0410-b5e6-96231b3b80d8
detected in the forward-CFG DFS. This prevents the reverse-CFG from
visiting blocks inside loops after blocks that dominate them in the
case where loops have multiple exits.
No testcase, because this fixes a bug which in practice only shows
up in a full optimizer run, due to the use-list order.
This fixes rdar://10422791 and others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146408 91177308-0d34-0410-b5e6-96231b3b80d8
indicates whether the intrinsic has a defined result for a first
argument equal to zero. This will eventually allow these intrinsics to
accurately model the semantics of GCC's __builtin_ctz and __builtin_clz
and the X86 instructions (prior to AVX) which implement them.
This patch merely sets the stage by extending the signature of these
intrinsics and establishing auto-upgrade logic so that the old spelling
still works both in IR and in bitcode. The upgrade logic preserves the
existing (inefficient) semantics. This patch should not change any
behavior. CodeGen isn't updated because it can use the existing
semantics regardless of the flag's value.
Note that this will be followed by API updates to Clang and DragonEgg.
Reviewed by Nick Lewycky!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146357 91177308-0d34-0410-b5e6-96231b3b80d8
Since we're not rewriting IVs in other loops, there's not much reason
to consider their stride when generating formulae.
This should reduce the number of useless formulas considered by LSR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146302 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Brendon Cahoon!
This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146245 91177308-0d34-0410-b5e6-96231b3b80d8
- Walking over pred_begin/pred_end is an expensive operation.
- PHINodes contain a value for each predecessor anyway.
- While it may look like we used to save a few iterations with the set,
be aware that getIncomingValueForBlock does a linear search on
the values of the phi node.
- Another -5% on ARMDisassembler.cpp (Release build). This was the last
entry in the profile that was obviously wasting time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145937 91177308-0d34-0410-b5e6-96231b3b80d8
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145906 91177308-0d34-0410-b5e6-96231b3b80d8
where this would be bad as the backend shouldn't have a problem inlining small
memcpys.
rdar://10510150
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145865 91177308-0d34-0410-b5e6-96231b3b80d8
- Calling getUser in a loop is much more expensive than iterating over a few instructions.
- Use it instead of the open-coded loop in AddrModeMatcher.
- 5% speedup on ARMDisassembler.cpp Release builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145810 91177308-0d34-0410-b5e6-96231b3b80d8
The callee is usually smaller than the caller, too. This reduces the compile
time of ARMDisassembler.cpp by 32% (Release build). It still takes ages to
compile though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145690 91177308-0d34-0410-b5e6-96231b3b80d8