This code had previously used 2*N, where N is the mask length, to represent
undef. That is not safe because the shufflevector operands may have more
than N elements -- they don't have to match the result type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117721 91177308-0d34-0410-b5e6-96231b3b80d8
Allow splats even if they don't match either of the original shuffles,
possibly due to undef entries in the shuffles masks. Radar 8597790.
Also fix some 80-column violations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117719 91177308-0d34-0410-b5e6-96231b3b80d8
Usually we wouldn't do this anyway because llvm_fenv_testexcept would return an
exception, but we have seen some cases where neither errno nor fenv detect an
exception on arm-linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114893 91177308-0d34-0410-b5e6-96231b3b80d8
so that it detects errors on platforms where libm doesn't set errno.
It's still subject to host libm details though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114148 91177308-0d34-0410-b5e6-96231b3b80d8
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113763 91177308-0d34-0410-b5e6-96231b3b80d8
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113679 91177308-0d34-0410-b5e6-96231b3b80d8
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113260 91177308-0d34-0410-b5e6-96231b3b80d8
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112314 91177308-0d34-0410-b5e6-96231b3b80d8
framework, which is good at ripping through bitfield
operations. This generalize a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112304 91177308-0d34-0410-b5e6-96231b3b80d8
computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112285 91177308-0d34-0410-b5e6-96231b3b80d8
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening, now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112278 91177308-0d34-0410-b5e6-96231b3b80d8
instructions with alignment 0, so that subsequent passes don't
need to bother checking the TargetData ABI size manually.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110128 91177308-0d34-0410-b5e6-96231b3b80d8