- If INSERT_VECTOR_ELT is supported (above SSE2, either by custom
sequence of legal insn), transform BUILD_VECTOR into SHUFFLE +
INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few
(so far 1) elements being inserted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166288 91177308-0d34-0410-b5e6-96231b3b80d8
- Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to
vector element-wise widening) to reduce DAG combiner and its overhead added
in X86 backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166036 91177308-0d34-0410-b5e6-96231b3b80d8
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
used as a light-weight replacement of setjmp/longjmp which are used to
implementation continuation, user-level threading, and etc. The support added
in this patch ONLY addresses this usage and is NOT intended to support SjLj
exception handling as zero-cost DWARF exception handling is used by default
in X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165989 91177308-0d34-0410-b5e6-96231b3b80d8
- Due to the current matching vector elements constraints in
ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
v2f32) is scalarized. Add a customized v2f32 widening to convert it
into a target-specific X86ISD::VFPROUND to work around this
constraints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165631 91177308-0d34-0410-b5e6-96231b3b80d8
- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
to convert it into a target-specific X86ISD::VFPEXT to work around this
constraints. This patch also reverts a previous attempt to fix this issue by
recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
reduces the overhead of supporting non-power-2 vector FP extend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165625 91177308-0d34-0410-b5e6-96231b3b80d8
- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164281 91177308-0d34-0410-b5e6-96231b3b80d8
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
integer. If more than 1 fully evaluated vectors are found, POR them
first followed by the final PTEST.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163832 91177308-0d34-0410-b5e6-96231b3b80d8
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162187 91177308-0d34-0410-b5e6-96231b3b80d8
- FP_EXTEND only support extending from vectors with matching elements.
This results in the scalarization of extending to v2f64 from v2f32,
which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161894 91177308-0d34-0410-b5e6-96231b3b80d8
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157818 91177308-0d34-0410-b5e6-96231b3b80d8
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157479 91177308-0d34-0410-b5e6-96231b3b80d8
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
Fixes PR6679. Patch by Christoph Erhardt!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155704 91177308-0d34-0410-b5e6-96231b3b80d8
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154483 91177308-0d34-0410-b5e6-96231b3b80d8
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154396 91177308-0d34-0410-b5e6-96231b3b80d8
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154370 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151623 91177308-0d34-0410-b5e6-96231b3b80d8
[Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
Clang raised a warning, and X86 LowerOperation would assert out for
fptoui f64 to i32 because it improperly lowered to an illegal
BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
any other changes are necessary. Thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151432 91177308-0d34-0410-b5e6-96231b3b80d8