Zero is used by BlockFrequencyInfo as a special "don't know" value. It also
causes a sink for frequencies as you can't ever get off a zero frequency with
more multiplies.
This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency
was propagated into an inner loop causing excessive spilling.
PR16402.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184584 91177308-0d34-0410-b5e6-96231b3b80d8
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.
Added analysis and code generation tests. Added documentation for the
new attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182638 91177308-0d34-0410-b5e6-96231b3b80d8
We switch the order of offset and field type to make TBAAStructType node
(name, parent node, offset) similar to scalar TBAA node (name, parent node).
TypeIsImmutable is added to TBAAStructTag node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180654 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
Added PathAliases to check if two struct-path tags can alias.
Added command line option -struct-path-tbaa.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179337 91177308-0d34-0410-b5e6-96231b3b80d8
The costs are overfitted so that I can still use the legalization factor.
For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.
unsigned short A[1024];
double B[1024];
void f() {
int i;
for (i = 0; i < 1024; ++i) {
B[i] = (double) A[i];
}
}
radar://13599001
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
The code in getTypeConversion attempts to promote the element vector type
before it trys to split or widen the vector.
After it failed finding a legal vector type by promoting it would continue using
the promoted vector element type. Thereby missing legal splitted vector types.
For example the type v32i32 that has a legal split of 4 x v3i32 on x86/sse2
would be transformed to: v32i256 and from there on successively split to:
v16i256, v8i256, v1i256 and then finally ends up as an i64 type.
By resetting the vector element type to the original vector element type that
existed before the promotion the code will attempt to split the vector type to
smaller vector widths of the same type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.
With this change we differentiate between shifts of constants and other shifts.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.
For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.
radar://13130673
radar://13537826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR15570: SEGV: SCEV back-edge info invalid after dead code removal.
Indvars creates a SCEV expression for the loop's back edge taken
count, then determines that the comparison is always true and
removes it.
When loop-unroll asks for the expression, it contains a NULL
SCEVUnknkown (as a CallbackVH).
forgetMemoizedResults should invalidate the loop back edges expression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177986 91177308-0d34-0410-b5e6-96231b3b80d8
Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries
between pointers do not include TBAA tags.
Add testing case for "placement new". TBAA currently says NoAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177772 91177308-0d34-0410-b5e6-96231b3b80d8
- After moving logic recognizing vector shift with scalar amount from
DAG combining into DAG lowering, we declare to customize all vector
shifts even vector shift on AVX is legal. As a result, the cost model
needs special tuning to identify these legal cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
Patch by Pete Couperus
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177380 91177308-0d34-0410-b5e6-96231b3b80d8
The default logic marks them as too expensive.
For example, before this patch we estimated:
cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>
While this translates to:
vmovl.u16 q8, d16
vcvt.f32.u32 q8, q8
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13445992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8
Fix cost of some "cheap" cast instructions. Before this patch we used to
estimate for example:
cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>
While we would emit:
vcvt.s32.f32 q8, q8
vmovn.i32 d16, q8
vuzp.8 d16, d17
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13434072
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was mislead by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.
Changing the code fragment to:
%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>
define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
%T1_3* %blend, %T0_3* %storeaddr) {
%v0 = load %T0_3* %loadaddr
%v1 = load %T0_3* %loadaddr2
==> FROM:
;%c = load %T1_3* %blend
==> TO:
%c = icmp slt %T0_3 %v0, %v1
==> USE:
%r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1
store %T0_3 %r, %T0_3* %storeaddr
ret void
}
revealed this mistake.
radar://13403975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177170 91177308-0d34-0410-b5e6-96231b3b80d8
By terrible I mean we store/load from the stack.
This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.
LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32
The bug that tracks the CodeGen part is PR14868.
radar://13403975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177105 91177308-0d34-0410-b5e6-96231b3b80d8
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
currently lowers those using stack accesses.
This was responsible for a significant degradation on
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
where we vectorize one loop to a vector factor of 16. After this patch we select
a vector factor of 4 which will generate reasonable code.
unsigned char cle[32];
void test(short c) {
unsigned short compte;
for (compte = 0; compte <= 31; compte++) {
cle[compte] = cle[compte] ^ c;
}
}
radar://13220512
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176898 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.
Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.
Many tests depend on grepping "-stats" output. Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.
Differential Revision: http://llvm-reviews.chandlerc.com/D486
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176733 91177308-0d34-0410-b5e6-96231b3b80d8
The "invariant.load" metadata indicates the memory unit being accessed is immutable.
A load annotated with this metadata can be moved across any store.
As I am not sure if it is legal to move such loads across barrier/fence, this
change dose not allow such transformation.
rdar://11311484
Thank Arnold for code review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176562 91177308-0d34-0410-b5e6-96231b3b80d8
This matters for example in following matrix multiply:
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
int i, j, k, val;
for (i=0; i<rows; i++) {
for (j=0; j<cols; j++) {
val = 0;
for (k=0; k<cols; k++) {
val += m1[i][k] * m2[k][j];
}
m3[i][j] = val;
}
}
return(m3);
}
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).
Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:
for (i ...)
r += a[i] * 3;
for (i ...)
m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.
Differential Revision: http://llvm-reviews.chandlerc.com/D466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8
Listing all of the attributes for the callee of a call/invoke instruction is way
too much and makes the IR unreadable. Use references to attributes instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175877 91177308-0d34-0410-b5e6-96231b3b80d8
sext <4 x i1> to <4 x i64>
sext <4 x i8> to <4 x i64>
sext <4 x i16> to <4 x i64>
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
(sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.
I also added a cost of this operations to the AVX costs table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8
Profiling tests *do* need a JIT. They'll pass if a cross-compiler targetting
AArch64 by default has been built, but fail if a native AArch64 compiler has
been build. Therefore XFAIL is inappropriate and we mark them unsupported.
ExecutionEngine tests are JIT by definition, they should also be unsupported.
Transforms/LICM only uses the interpreter to check the output is still sane
after optimisation. It can be switched to use an interpreter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175433 91177308-0d34-0410-b5e6-96231b3b80d8
Thanks to help from Nadav and Hal, I have a more reasonable (and even
correct!) approach. This specifically penalizes the insertelement
and extractelement operations for the performance hit that will occur
on PowerPC processors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174725 91177308-0d34-0410-b5e6-96231b3b80d8
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.
radar://13097204
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174713 91177308-0d34-0410-b5e6-96231b3b80d8
Swift has a renaming dependency if we load into D subregisters. We don't have a
way of distinguishing between insertelement operations of values from loads and
other values. Therefore, we are pessimistic for now (The performance problem
showed up in example 14 of gcc-loops).
radar://13096933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174300 91177308-0d34-0410-b5e6-96231b3b80d8
This provides a place to add customized operation cost information and
control some other target-specific IR-level transformations.
The only non-trivial logic in this checkin assigns a higher cost to
unaligned loads and stores (covered by the included test case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173520 91177308-0d34-0410-b5e6-96231b3b80d8
This test did not test anything at all (except for opt crashing, but that was
not the reason why it was added).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171248 91177308-0d34-0410-b5e6-96231b3b80d8
Analyse Phis under the starting assumption that they are NoAlias. Recursively
look at their inputs.
If they MayAlias/MustAlias there must be an input that makes them so.
Addresses bug 14351.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169788 91177308-0d34-0410-b5e6-96231b3b80d8
more information for dependences between
instructions that don't share a common loop.
Updated the test results appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168965 91177308-0d34-0410-b5e6-96231b3b80d8
there's no possible loo-independent dependence, then there's no
dependence.
Updated all test result appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168719 91177308-0d34-0410-b5e6-96231b3b80d8
If the Src and Dst are the same instruction,
no loop-independent dependence is possible,
so we force the PossiblyLoopIndependent flag to false.
The test case results are updated appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168678 91177308-0d34-0410-b5e6-96231b3b80d8
analysis. Better is to look for cases with useful GEPs and use them
when possible. When a pair of useful GEPs is not available, use the
raw SCEVs directly. This approach supports better analysis of pointer
dereferencing.
In parallel, all the test cases are updated appropriately.
Cases where we have a store to *B++ can now be analyzed!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168474 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial solution to PR14351. It removes some of the special
significance of the first incoming phi value in the phi aliasing checking logic
in BasicAA. In the context of a loop, the old logic assumes that the first
incoming value is the interesting one (meaning that it is the one that comes
from outside the loop), but this is often not the case. With this change, we
now test first the incoming value that comes from a block other than the parent
of the phi being tested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168245 91177308-0d34-0410-b5e6-96231b3b80d8
'nocapture' attribute.
The nocapture attribute only specifies that no copies are made that
outlive the function. This isn't the same as there being no copies at all.
This fixes PR14045.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167381 91177308-0d34-0410-b5e6-96231b3b80d8
It was unmaintained and not much more than a stub. The new DependenceAnalysis
pass is both more general and complete.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166810 91177308-0d34-0410-b5e6-96231b3b80d8
Patch from Preston Briggs <preston.briggs@gmail.com>.
This is an updated version of the dependence-analysis patch, including an MIV
test based on Banerjee's inequalities.
It's a fairly complete implementation of the paper
Practical Dependence Testing
Gina Goff, Ken Kennedy, and Chau-Wen Tseng
PLDI 1991
It cannot yet propagate constraints between coupled RDIV subscripts (discussed
in Section 5.3.2 of the paper).
It's organized as a FunctionPass with a single entry point that supports testing
for dependence between two instructions in a function. If there's no dependence,
it returns null. If there's a dependence, it returns a pointer to a Dependence
which can be queried about details (what kind of dependence, is it loop
independent, direction and distance vector entries, etc). I haven't included
every imaginable feature, but there's a good selection that should be adequate
for supporting many loop transformations. Of course, it can be extended as
necessary.
Included in the patch file are many test cases, commented with C code showing
the loops and array references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165708 91177308-0d34-0410-b5e6-96231b3b80d8
teach the callgraph logic to not create callgraph edges to intrinsics for invoke
instructions; it already skips this for call instructions. Fixes PR13903.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164707 91177308-0d34-0410-b5e6-96231b3b80d8
Enhances basic alias analysis to recognize phis whose first incoming values are
NoAlias and whose other incoming values are just the phi node itself through
some amount of recursion.
Example: With this change basicaa reports that ptr_phi and ptr_phi2 do not alias
each other.
bb:
ptr = ptr2 + 1
loop:
ptr_phi = phi [bb, ptr], [loop, ptr_plus_one]
ptr2_phi = phi [bb, ptr2], [loop, ptr2_plus_one]
...
ptr_plus_one = gep ptr_phi, 1
ptr2_plus_one = gep ptr2_phi, 1
This enables the elimination of one load in code like the following:
extern int foo;
int test_noalias(int *ptr, int num, int* coeff) {
int *ptr2 = ptr;
int result = (*ptr++) * (*coeff--);
while (num--) {
*ptr2++ = *ptr;
result += (*coeff--) * (*ptr++);
}
*ptr = foo;
return result;
}
Part 2/2 of fix for PR13564.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163319 91177308-0d34-0410-b5e6-96231b3b80d8
If we can show that the base pointers of two GEPs don't alias each other using
precise analysis and the indices and base offset are equal then the two GEPs
also don't alias each other.
This is primarily needed for the follow up patch that analyses NoAlias'ing PHI
nodes.
Part 1/2 of fix for PR13564.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163317 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements ProfileDataLoader which loads profile data generated by
-insert-edge-profiling and updates branch weight metadata accordingly.
Patch by Alastair Murray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162799 91177308-0d34-0410-b5e6-96231b3b80d8
the case of multiple edges from one block to another.
A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.
Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162572 91177308-0d34-0410-b5e6-96231b3b80d8
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162097 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, if GetLocation reports that it did not find a valid pointer (this is the case for volatile load/stores),
we ignore the result. This patch adds code to handle the cases where we did not obtain a valid pointer.
rdar://11872864 PR12899
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161802 91177308-0d34-0410-b5e6-96231b3b80d8
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159547 91177308-0d34-0410-b5e6-96231b3b80d8
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.
This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159544 91177308-0d34-0410-b5e6-96231b3b80d8
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159525 91177308-0d34-0410-b5e6-96231b3b80d8
If integer overflow causes one of the terms to reach zero, that can
force the entire expression to zero.
Fixes PR12929: cast<Ty>() argument of incompatible type
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157673 91177308-0d34-0410-b5e6-96231b3b80d8
getUDivExpr attempts to simplify by checking for overflow.
isLoopEntryGuardedByCond then evaluates the loop predicate which
may lead to the same getUDivExpr causing endless recursion.
Fixes PR12868: clang 3.2 segmentation fault.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157092 91177308-0d34-0410-b5e6-96231b3b80d8
verifier does. This correctly handles invoke.
Thanks to Duncan, Andrew and Chris for the comments.
Thanks to Joerg for the early testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151469 91177308-0d34-0410-b5e6-96231b3b80d8
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.
Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147327 91177308-0d34-0410-b5e6-96231b3b80d8
probability wouldn't be considered "hot" in some weird loop structures
or other compounding probability patterns. This makes it much harder to
confuse, but isn't really a principled fix. I'd actually like it if we
could model a zero probability, as it would make this much easier to
reason about. Suggestions for how to do this better are welcome.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147142 91177308-0d34-0410-b5e6-96231b3b80d8
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146370 91177308-0d34-0410-b5e6-96231b3b80d8
bots. Original commit messages:
- Reapply r142781 with fix. Original message:
Enhance SCEV's brute force loop analysis to handle multiple PHI nodes in the
loop header when computing the trip count.
With this, we now constant evaluate:
struct ListNode { const struct ListNode *next; int i; };
static const struct ListNode node1 = {0, 1};
static const struct ListNode node2 = {&node1, 2};
static const struct ListNode node3 = {&node2, 3};
int test() {
int sum = 0;
for (const struct ListNode *n = &node3; n != 0; n = n->next)
sum += n->i;
return sum;
}
- Now that we look at all the header PHIs, we need to consider all the header PHIs
when deciding that the loop has stopped evolving. Fixes miscompile in the gcc
torture testsuite!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142919 91177308-0d34-0410-b5e6-96231b3b80d8
classifying many edges as exiting which were in fact not. These mainly
formed edges into sub-loops. It was also not correctly classifying all
returning edges out of loops as leaving the loop. With this match most
of the loop heuristics are more rational.
Several serious regressions on loop-intesive benchmarks like perlbench's
loop tests when built with -enable-block-placement are fixed by these
updated heuristics. Unfortunately they in turn uncover some other
regressions. There are still several improvemenst that should be made to
loop heuristics including trip-count, and early back-edge management.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142917 91177308-0d34-0410-b5e6-96231b3b80d8
the dragonegg and llvm-gcc self-host buildbots. Original commit
messages:
- Reapply r142781 with fix. Original message:
Enhance SCEV's brute force loop analysis to handle multiple PHI nodes in the
loop header when computing the trip count.
With this, we now constant evaluate:
struct ListNode { const struct ListNode *next; int i; };
static const struct ListNode node1 = {0, 1};
static const struct ListNode node2 = {&node1, 2};
static const struct ListNode node3 = {&node2, 3};
int test() {
int sum = 0;
for (const struct ListNode *n = &node3; n != 0; n = n->next)
sum += n->i;
return sum;
}
- Now that we look at all the header PHIs, we need to consider all the header PHIs
when deciding that the loop has stopped evolving. Fixes miscompile in the gcc
torture testsuite!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142916 91177308-0d34-0410-b5e6-96231b3b80d8
introduce no-return or unreachable heuristics.
The return heuristics from the Ball and Larus paper don't work well in
practice as they pessimize early return paths. The only good hitrate
return heuristics are those for:
- NULL return
- Constant return
- negative integer return
Only the last of these three can possibly require significant code for
the returning block, and even the last is fairly rare and usually also
a constant. As a consequence, even for the cold return paths, there is
little code on that return path, and so little code density to be gained
by sinking it. The places where sinking these blocks is valuable (inner
loops) will already be weighted appropriately as the edge is a loop-exit
branch.
All of this aside, early returns are nearly as common as all three of
these return categories, and should actually be predicted as taken!
Rather than muddy the waters of the static predictions, just remain
silent on returns and let the CFG itself dictate any layout or other
issues.
However, the return heuristic was flagging one very important case:
unreachable. Unfortunately it still gave a 1/4 chance of the
branch-to-unreachable occuring. It also didn't do a rigorous job of
finding those blocks which post-dominate an unreachable block.
This patch builds a more powerful analysis that should flag all branches
to blocks known to then reach unreachable. It also has better worst-case
runtime complexity by not looping through successors for each block. The
previous code would perform an N^2 walk in the event of a single entry
block branching to N successors with a switch where each successor falls
through to the next and they finally fall through to a return.
Test case added for noreturn heuristics. Also doxygen comments improved
along the way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142793 91177308-0d34-0410-b5e6-96231b3b80d8
Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.
coming out of indvars.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142786 91177308-0d34-0410-b5e6-96231b3b80d8
to bring it under direct test instead of merely indirectly testing it in
the BlockFrequencyInfo pass.
The next step is to start adding tests for the various heuristics
employed, and to start fixing those heuristics once they're under test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142778 91177308-0d34-0410-b5e6-96231b3b80d8
able to constant fold load instructions where the argument is a constant.
Second, we should be able to watch multiple PHI nodes through the loop; this
patch only supports PHIs in loop headers, more can be done here.
With this patch, we now constant evaluate:
static const int arr[] = {1, 2, 3, 4, 5};
int test() {
int sum = 0;
for (int i = 0; i < 5; ++i) sum += arr[i];
return sum;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142731 91177308-0d34-0410-b5e6-96231b3b80d8
and switches, with arbitrary numbers of successors. Still optimized for
the common case of 2 successors for a conditional branch.
Add a test case for switch metadata showing up in the BlockFrequencyInfo pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142493 91177308-0d34-0410-b5e6-96231b3b80d8
encoding of probabilities. In the absense of metadata, it continues to
fall back on static heuristics.
This allows __builtin_expect, after lowering through llvm.expect
a branch instruction's metadata, to actually enter the branch
probability model. This is one component of resolving PR2577.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142492 91177308-0d34-0410-b5e6-96231b3b80d8
layer already had support for printing the results of this analysis, but
the wiring was missing.
Now that printing the analysis works, actually bring some of this
analysis, and the BranchProbabilityInfo analysis that it wraps, under
test! I'm planning on fixing some bugs and doing other work here, so
having a nice place to add regression tests and a way to observe the
results is really useful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142491 91177308-0d34-0410-b5e6-96231b3b80d8
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133337 91177308-0d34-0410-b5e6-96231b3b80d8