output chain is correctly set up.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
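A minimal sketch of the constraint, with hypothetical code (the function and
the aliasing store are illustrative only, not from the commit): the
zero-extending 64-bit vector load that becomes a VZEXT_LOAD must keep its
chain edge to any later store that may alias it.
#include <emmintrin.h>
__m128i load_then_clobber(__m128i *p) {
  __m128i v = _mm_loadl_epi64(p);            /* zero-extending 64-bit load */
  _mm_store_si128(p, _mm_setzero_si128());   /* later store; the load must */
  return v;                                  /* stay ordered before it     */
}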
rdar://11457792
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163036 91177308-0d34-0410-b5e6-96231b3b80d8
- In addition to the undefined case, if V2 is a zero vector, skip the 2nd
PSHUFB and POR as well, since PSHUFB zeroes elements with negative indices.
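As a sketch of the zeroing behavior relied on here (hypothetical example,
not part of the patch):
#include <tmmintrin.h>
__m128i shuffle_low_half(__m128i v1) {
  /* mask bytes with the sign bit set (0x80) produce zero lanes, so no
     second PSHUFB on a zero V2 and no POR are needed */
  const __m128i mask = _mm_set_epi8((char)0x80, (char)0x80, (char)0x80,
                                    (char)0x80, (char)0x80, (char)0x80,
                                    (char)0x80, (char)0x80,
                                    7, 6, 5, 4, 3, 2, 1, 0);
  return _mm_shuffle_epi8(v1, mask);
}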
Patch by Sriram Murali <sriram.murali@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163018 91177308-0d34-0410-b5e6-96231b3b80d8
- Add a target-specific DAG optimization to recognize a PTEST-able pattern.
Such a pattern is an OR'd tree with X86ISD::OR as the root node. When an
X86ISD::OR node has only its flag result used as a boolean value and
all its leaves are extracted from the same vector, it can be folded into an
X86ISD::PTEST node.
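For illustration, a hypothetical source shape (not from the patch) that
matches this pattern:
typedef int v4i32 __attribute__((vector_size(16)));
int any_nonzero(v4i32 v) {
  /* every leaf of the OR tree is an extract from the same vector and only
     the boolean result is used, so this may fold to a single PTEST */
  return (v[0] | v[1] | v[2] | v[3]) != 0;
}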
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162735 91177308-0d34-0410-b5e6-96231b3b80d8
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
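A plausible C source for the example (hypothetical, assuming unsafe/fast
math is enabled so FMIN is formed from the select):
double clamp_to_one(double x) {
  /* the select lowers to FMIN; the commutative FMINC form lets the
     constant load fold into minsd's memory operand */
  return x < 1.0 ? x : 1.0;
}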
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162187 91177308-0d34-0410-b5e6-96231b3b80d8
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.
Before:
xorl %esi, %edi
testb %dil, %dil
setne %al
ret
After:
xorb %dil, %sil
setne %al
ret
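A hypothetical source for this pattern (not from the commit), with the i32
xor truncated to i8 before the zero test:
int differ(int x, int y) {
  return (char)(x ^ y) != 0;   /* after: xorb + setne, no separate testb */
}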
rdar://12081007
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162160 91177308-0d34-0410-b5e6-96231b3b80d8
- FP_EXTEND only supports extending from vectors with the same number of
elements. This results in the scalarization of extends to v2f64 from
v2f32, since v2f32 is legalized to v4f32, which does not match v2f64.
- add an X86-specific VFPEXT supporting extends from v4f32 to v2f64.
- add a BUILD_VECTOR lowering helper to recover the original extend from
v4f32 to v2f64.
- the test case is enhanced to cover different vector widths.
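A sketch of the source shape (hypothetical, using GCC vector extensions)
that can now select a single cvtps2pd through VFPEXT:
typedef float v2f32 __attribute__((vector_size(8)));
typedef double v2f64 __attribute__((vector_size(16)));
v2f64 extend_low(v2f32 x) {
  /* v2f32 is widened to v4f32 during legalization; the BUILD_VECTOR
     helper recovers the v4f32 -> v2f64 extend instead of scalarizing */
  return (v2f64){ x[0], x[1] };
}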
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161894 91177308-0d34-0410-b5e6-96231b3b80d8
- FCMOV only supports a subset of X86 conditions. Skip boolean
simplification if the X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161732 91177308-0d34-0410-b5e6-96231b3b80d8
- if a boolean test (X86ISD::CMP or X86ISD::SUB) checks a boolean value
generated from X86ISD::SETCC, try to simplify the boolean value
generation and checking by reusing the original EFLAGS with the proper
condition code
- add hooks to the X86-specific SETCC/BRCOND/CMOV, the three major places
consuming EFLAGS
Part of the patches fixing PR12312.
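As an illustration (hypothetical source, not from the patches), the
redundant pattern being simplified:
int pick(int a, int b) {
  int t = (a == b);    /* cmp + sete materializes the boolean      */
  return t ? 10 : 20;  /* the test of t can instead reuse the cmp's
                          EFLAGS with the proper condition code     */
}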
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161687 91177308-0d34-0410-b5e6-96231b3b80d8
We perform the following:
1> Use SUB instead of CMP for i8, i16, i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.
Removed pattern matching of (a>b) ? (a-b):0 and the like, since they are
handled by the peephole now.
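For example, the saturating-subtract idiom mentioned above (hypothetical
source):
unsigned sat_sub(unsigned a, unsigned b) {
  /* a single SUB now provides both the comparison flags and the value,
     so no separate CMP is needed */
  return a > b ? a - b : 0;
}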
rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161462 91177308-0d34-0410-b5e6-96231b3b80d8
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8
large immediates. Add DAG combine logic to recover in case the large
immediate doesn't fit in the cmp immediate operand field.
int foo(unsigned long l) {
  return (l >> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86, those are
immediates which don't fit in a signed 32-bit immediate.
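In the example above, -140737488355328 is 0xffff800000000000 and
140737488355328 is 0x0000800000000000 = 1 << 47, so
(l & 0xffff800000000000) == (1 << 47) is exactly (l >> 47) == 1; the
recovered shift form only needs the small immediates 47 and 1.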
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8
uint32_t hi(uint64_t res)
{
  uint32_t hi = res >> 32;
  return !hi;
}
The LLVM IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
%lnot = icmp ult i64 %res, 4294967296
%lnot.ext = zext i1 %lnot to i32
ret i32 %lnot.ext
}
The optimizer has optimized away the right shift and truncate, but the
resulting constant is too large to fit in the 32-bit immediate field. The
resulting x86 code is worse as a result:
movabsq $4294967296, %rax ## imm = 0x100000000
cmpq %rax, %rdi
sbbl %eax, %eax
andl $1, %eax
This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
shrq $32, %rdi
testl %edi, %edi
sete %al
movzbl %al, %eax
It also handles a ugt comparison against a large immediate with trailing bits
set, i.e. X > 0x0ffffffff -> (X >> 32) >= 1.
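A hypothetical source for the ugt case:
uint32_t above(uint64_t res) {
  return res > 0xffffffffULL;   /* i.e. (res >> 32) >= 1 */
}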
rdar://11866926
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160312 91177308-0d34-0410-b5e6-96231b3b80d8