Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.
<rdar://problem/16074331>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204631 91177308-0d34-0410-b5e6-96231b3b80d8
The patch defines new or refines existing generic scheduling classes to match
the behavior of the SSE instructions.
It also maps those scheduling classes on the related SSE instructions.
<rdar://problem/15607571>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202065 91177308-0d34-0410-b5e6-96231b3b80d8
Original commits messages:
Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code.
Simplify a bunch of code by removing the need for the x86 disassembler table builder to know about extended opcodes. The modrm forms are sufficient to convey the information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201065 91177308-0d34-0410-b5e6-96231b3b80d8
r201059 appears to cause a crash in a bootstrapped build of clang. Craig
isn't available to look at it right now, so I'm reverting it while he
investigates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201064 91177308-0d34-0410-b5e6-96231b3b80d8
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200957 91177308-0d34-0410-b5e6-96231b3b80d8
I believe VZEXT_MOVL means "zero all vector elements except the first" (and
should have identical input & output types) whereas VZEXT means "zero extend
each element of a vector (discarding higher elements if necessary)".
For example:
(v4i32 (vzext (v16i8 ...)))
should zero extend the low 4 bytes of the incoming vector to 32-bits,
discarding higher bytes.
However, somewhere in the past, these two concepts had become confused, even
leading to a nonsensical VSEXT_MOVL.
This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
-> VZEXT when it's an actual extension).
rdar://problem/15981990
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200918 91177308-0d34-0410-b5e6-96231b3b80d8
This should allow SSE instructions to be encoded correctly in 16-bit mode which r198586 probably broke.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199193 91177308-0d34-0410-b5e6-96231b3b80d8
That's what it actually means, and with 16-bit support it's going to be
a little more relevant since in a few corner cases we may actually want
to distinguish between 16-bit and 32-bit mode (for example the bare 'push'
aliases to pushw/pushl etc.)
Patch by David Woodhouse
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197768 91177308-0d34-0410-b5e6-96231b3b80d8
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197384 91177308-0d34-0410-b5e6-96231b3b80d8
a vector packed single/double fp operation followed by a vector insert.
The effect is that the backend coverts the packed fp instruction
followed by a vectro insert into a SSE or AVX scalar fp instruction.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
__m128 C = A + B;
return (__m128) {c[0], a[1], a[2], a[3]};
}
previously we generated:
addps %xmm0, %xmm1
movss %xmm1, %xmm0
we now generate:
addss %xmm1, %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197145 91177308-0d34-0410-b5e6-96231b3b80d8
immediately after SSE scalar fp instructions like addss or mulss.
Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
A[0] += B[0];
return A;
}
previously we generated:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
now we generate:
addss %xmm1, %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196925 91177308-0d34-0410-b5e6-96231b3b80d8
- When selecting BLEND from vselect, the operands need swapping as due to the
difference between vselect and SSE/AVX's BLEND insn
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193900 91177308-0d34-0410-b5e6-96231b3b80d8
On sandy bridge (PR17654) we now get
vpxor %xmm1, %xmm1, %xmm1
vpunpckhbw %xmm1, %xmm0, %xmm2
vpunpcklbw %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
On haswell it's a simple
vpmovzxbw %xmm0, %ymm0
There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193262 91177308-0d34-0410-b5e6-96231b3b80d8
the instruction defenitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Fixes <rdar://problem/14968098>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193096 91177308-0d34-0410-b5e6-96231b3b80d8
This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior and instruction selection already does this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192088 91177308-0d34-0410-b5e6-96231b3b80d8