lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed.
Added a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148044 91177308-0d34-0410-b5e6-96231b3b80d8
As the comment around 7746 says, it's better to use the x87 extended precision
here than SSE. And the generic code doesn't know how to do that. It also regains
the speed lost for the uint64_to_float.c testcase.
<rdar://problem/10669858>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147869 91177308-0d34-0410-b5e6-96231b3b80d8
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)
Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147601 91177308-0d34-0410-b5e6-96231b3b80d8
This small bit of ASM code is sufficient to do what the old algorithm did:
movq %rax, %xmm0
punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
#ifdef __SSE3__
haddpd %xmm0, %xmm0
#else
pshufd $0x4e, %xmm0, %xmm1
addpd %xmm1, %xmm0
#endif
It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on
all processors.
<rdar://problem/7719814>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147593 91177308-0d34-0410-b5e6-96231b3b80d8
(x > y) ? x : y
=>
(x >= y) ? x : y
So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0
This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl %esi, %edi
testl %edi, %edi
movl $0, %eax
cmovgl %edi, %eax
=>
xorl %eax, %eax
subl %esi, $edi
cmovsl %eax, %edi
rdar://10633221
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147512 91177308-0d34-0410-b5e6-96231b3b80d8
Matching MOVLP mask for AVX (265-bit vectors) was wrong.
The failure was detected by conformance tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147308 91177308-0d34-0410-b5e6-96231b3b80d8
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type
We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147251 91177308-0d34-0410-b5e6-96231b3b80d8
'bsf' instructions here.
This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147246 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147244 91177308-0d34-0410-b5e6-96231b3b80d8
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146974 91177308-0d34-0410-b5e6-96231b3b80d8
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146466 91177308-0d34-0410-b5e6-96231b3b80d8