The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that, as demonstrated in the test case.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216200 91177308-0d34-0410-b5e6-96231b3b80d8
There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
the latter supports using different source and destination registers, so
whenever we materialize a new base register (at a certain offset) we'd do
so by moving the base register value to the new register and then adding the
offset in place. This patch changes the code to emit a single tADDi3 when the
offset is small enough to fit in 3 bits.
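For example, with a hypothetical base register r4, new register r5 and a small
offset, the sequence we used to emit looks roughly like:
mov  r5, r4        @ copy the base register value into the new register
adds r5, #6        @ tADDi8: source and destination must be the same
and with this patch it becomes:
adds r5, r4, #6    @ tADDi3: the 3-bit immediate allows a different source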
Differential Revision: http://reviews.llvm.org/D5006
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216193 91177308-0d34-0410-b5e6-96231b3b80d8
In both Clang and LLVM, this is a common pattern:
Size = sizeof(DeclRefExpr) + SomeExtraStuff;
void *Mem = Context.Allocate(Size, llvm::alignOf<DeclRefExpr>());
return new (Mem) DeclRefExpr(...);
The annoying thing is that because the default placement-new operator has a
nothrow specification, the compiler will insert a null check of Mem before
calling the DeclRefExpr constructor. This null check is redundant for us,
because we expect the allocation functions to never return null.
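For illustration, here is roughly what the annotation looks like (a sketch with
a hypothetical signature, not the exact ASTContext declaration):
#include <cstddef>
// Promising the compiler that the allocator never returns null lets it drop
// the null check that guards the placement-new constructor call.
void *Allocate(std::size_t Size, unsigned Align = 8)
    __attribute__((returns_nonnull));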
By annotating the allocator functions with returns_nonnull, we can optimize
away these checks. Compiling clang with a recent version of Clang and measuring
with:
$ perf stat -r20 bin/clang.patch -fsyntax-only -w gcc.c && perf stat -r20 bin/clang.orig -fsyntax-only -w gcc.c
Shows a 2.4% speed-up (+- 0.8%).
The pattern occurs in LLVM too. Measuring with -O3 (and now using bzip2.c
instead, because it's smaller):
$ perf stat -r20 bin/clang.patch -O3 -w bzip2.c && perf stat -r20 bin/clang.orig -O3 -w bzip2.c
Shows a 4.4% speed-up (+- 1%).
If anyone knows of a similar attribute we can use for MSVC, or some other
technique to get rid of the null check there, please let me know.
Differential Revision: http://reviews.llvm.org/D4989
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216192 91177308-0d34-0410-b5e6-96231b3b80d8
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.
This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216172 91177308-0d34-0410-b5e6-96231b3b80d8
It's not meant to be used with operator delete, and this avoids emitting virtual
dtors for every derived format object.
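A sketch of the general pattern (hypothetical names, not the actual format
classes): a protected, non-virtual destructor in the base class rules out
deletion through a base pointer, so no derived class needs a virtual dtor.
class format_base {
protected:
  ~format_base() = default; // non-virtual; never deleted through a base pointer
};
class format_int : public format_base {
  int Value;
public:
  explicit format_int(int V) : Value(V) {}
};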
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216170 91177308-0d34-0410-b5e6-96231b3b80d8
Currently only "add nsw" instructions are widened. This patch eliminates tons of "sext" instructions for 64-bit code (and the corresponding target code) in cases like:
int N = 100;
float **A;
void foo(int x0, int x1)
{
  float *A_cur = &A[0][0];
  float *A_next = &A[1][0];
  for (int x = x0; x < x1; ++x)
  {
    // Currently only the [x + N] case is widened; the other 2 cases lead to a sext.
    // With this patch, none of the 3 cases needs a sext.
    const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
    A_next[x] = div;
  }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o -
Differential Revision: http://reviews.llvm.org/D4695
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216160 91177308-0d34-0410-b5e6-96231b3b80d8
advanced copy optimization.
This is the final step patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216144 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a scalar reduction, we can increase the critical path length when the loop we're unrolling is inside another loop. Limit the unroll factor, by default, to 2 so that the critical path only gets increased by one reduction operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216140 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetInstrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216139 91177308-0d34-0410-b5e6-96231b3b80d8
advanced copy optimization.
This patch is a step toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov r0, r1, d16
but it does not yet understand
vmov.32 d16[0], r0
vmov.32 d16[1], r1
Coming patches will fix that and update the related test case.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216136 91177308-0d34-0410-b5e6-96231b3b80d8
It makes no sense and can hide bugs. In particular, it led
to a left shift by 64 bits, which is undefined behavior,
properly reported by UBSan.
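As a generic illustration of the undefined behavior involved (not the code
touched by this patch):
#include <cstdint>
std::uint64_t shiftBy(std::uint64_t X, unsigned N) {
  return X << N; // undefined behavior when N >= 64, e.g. shiftBy(1, 64)
}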
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216134 91177308-0d34-0410-b5e6-96231b3b80d8
target hook.
This patch teaches the compiler that:
rX, rY = VMOVRRD dZ
is the same as:
rX = EXTRACT_SUBREG dZ, ssub_0
rY = EXTRACT_SUBREG dZ, ssub_1
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216132 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new property: isExtractSubreg and the related target hooks:
TargetInstrInfo::getExtractSubregInputs and
TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific
instruction is a (kind of) EXTRACT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216130 91177308-0d34-0410-b5e6-96231b3b80d8
Store TargetSelectionDAGInfo as a pointer instead of a reference:
getSelectionDAGInfo() may not be implemented for certain backends
(e.g. it's not currently implemented for R600).
This bug is reported by UBSan.
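The shape of the problem, as a minimal sketch (hypothetical names, not the
actual LLVM classes):
struct DAGInfo {};
struct Target {
  // May legitimately return null, e.g. for a backend that lacks the hook.
  const DAGInfo *getSelectionDAGInfo() const { return nullptr; }
};
struct DAG {
  const DAGInfo *TSI; // stored as a pointer, so null stays representable
  explicit DAG(const Target &TM) : TSI(TM.getSelectionDAGInfo()) {}
  // A reference member would require '*TM.getSelectionDAGInfo()' here, which
  // is undefined behavior whenever the hook returns null.
};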
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216129 91177308-0d34-0410-b5e6-96231b3b80d8
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648
This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.
This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.
Differential Revision: http://reviews.llvm.org/D4934
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly. The controlling sentence is, "The size of each argument gets
rounded up to eightbytes. Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area." For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
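For example (a hypothetical signature, not taken from the patch), once the six
integer argument registers are used, a small trailing named argument still
occupies a full eightbyte on the stack, so the variadic arguments that follow
must start at the next 8-byte boundary:
// 'a'..'f' go in registers; 'g' is passed on the stack and rounded up to an
// eightbyte, after which the '...' arguments begin 8-byte aligned.
void callee(int a, int b, int c, int d, int e, int f, char g, ...);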
Patch by Thomas Jablin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8