Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.
Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.
Added some tests for different cases where this optimization triggers.
Reviewers: nadav, delena, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3521
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208271 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, the backend always emitted a store+load sequence to
bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that
performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting
i64 node was then used to build a v2i32 vector.
With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from
MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free"
bitcast to type MVT::v4i32. The elements of the resulting
v4i32 are then extracted to build a v2i32 vector (which is illegal and
therefore promoted to MVT::v2i64).
This is in general cheaper than emitting a stack store+load sequence
to bitconvert the operand from type f64 to type i64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208107 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
The Win64 docs are very clear that anything larger than 8 bytes is
passed by reference, and GCC MinGW64 honors that for __modti3 and
friends.
Patch by Jameson Nash!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208029 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, musttail codegen is relying on sibcall optimization, and
reporting a fatal error if fails. Sibcall optimization fails when stack
arguments need to be modified, which is insufficient for musttail.
The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt. This change merely arranges for musttail
calls to use it.
No functional change for GuaranteedTailCallOpt.
Reviewers: espindola
Differential Revision: http://reviews.llvm.org/D3493
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207598 91177308-0d34-0410-b5e6-96231b3b80d8
This gets us pretty code for divs of i16 vectors. Turn the existing
intrinsics into the corresponding nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207317 91177308-0d34-0410-b5e6-96231b3b80d8
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.
<rdar://problem/16730541>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207301 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.
Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3475
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207291 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207143 91177308-0d34-0410-b5e6-96231b3b80d8
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
correctly verifies that both READCYCLECOUNTER and the two new intrinsics
work fine for both 64bit and 32bit Subtargets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207127 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to compile
return (mask & 0x8 ? a : b);
into
testb $8, %dil
cmovnel %edx, %esi
instead of
andl $8, %edi
shrl $3, %edi
cmovnel %edx, %esi
which we formed previously because dag combiner canonicalizes setcc of and into shift.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207088 91177308-0d34-0410-b5e6-96231b3b80d8
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
<rdar://problem/15480077>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206738 91177308-0d34-0410-b5e6-96231b3b80d8
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps. Consider:
(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
(extract_vector_elt %vreg0, Constant<2>),
(extract_vector_elt %vreg0, Constant<3>),
(extract_vector_elt %vreg0, Constant<4>),
(extract_vector_elt %vreg0, Constant<5>),
(extract_vector_elt %vreg0, Constant<6>),
(extract_vector_elt %vreg0, Constant<7>),
%vreg1))
a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.
b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.
c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.
Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:
(v4f32 (BUILD_VECTOR (extract_vector_elt
(vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
%vreg1))
In order to figure out the underlying vector and their identity we need to see
through the shuffles.
[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.
Fixes <rdar://problem/16296956>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206634 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.
When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immedate count
followed by a MOVSS/MOVSD.
Example
(v4i32 (srl A, (build_vector < X, Y, Y, Y>)))
Can be rewritten as:
(v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))
[with X and Y ConstantInt]
The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206316 91177308-0d34-0410-b5e6-96231b3b80d8
I found this from a particular GDB test suite case of inlining
(something similar is provided as a test case) but came across a few
other related cases (other callers of the same functions, and one other
instance of the same coding mistake in a separate function).
I'm not sure what the best way to test this is (let alone to cover the
other cases I discovered), so hopefully this sufficies - open to ideas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206130 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205649 91177308-0d34-0410-b5e6-96231b3b80d8
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.
Suggestion by Saleem Abdulrasool!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205393 91177308-0d34-0410-b5e6-96231b3b80d8
This adds back r204781.
Original message:
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204934 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r204781.
I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204784 91177308-0d34-0410-b5e6-96231b3b80d8