When lowering a ISD::BRCOND into a test+branch, make sure that we
always use the correct condition code to emit the test operation.
This fixes PR19858: "i8 checked mul is wrong on x86".
Patch by Keno Fisher!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210032 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.
This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
(shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
(shuffle (BINOP A, B), Undef, <Mask>).
Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209930 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Separate the check for blend shuffle_vector masks into isBlendMask.
This function will also be used to check if a vector shuffle is legal. No
change in functionality was intended, but we ended up improving codegen on
two tests, which were being (more) optimized only if the resulting shuffle
was legal.
Reviewers: nadav, delena, andreadb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3964
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209923 91177308-0d34-0410-b5e6-96231b3b80d8
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h. This patch removes the LLVM intrinsics from the backend.
We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.
The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics. As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.
CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.
Also verified that there was no assembly change in the test-suite before and
after this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209864 91177308-0d34-0410-b5e6-96231b3b80d8
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209759 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.
With this patch it is now possible to do things like
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209600 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).
This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64bits of the resulting vector are 'undef').
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209451 91177308-0d34-0410-b5e6-96231b3b80d8
a subtarget hook to enable. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209340 91177308-0d34-0410-b5e6-96231b3b80d8
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.
The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.
Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).
<rdar://problem/16975435>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209324 91177308-0d34-0410-b5e6-96231b3b80d8
According to Intel Software Optimization Manual on Silvermont in some cases LEA
is better to be replaced with ADD instructions:
"The rule of thumb for ADDs and LEAs is that it is justified to use LEA
with a valid index and/or displacement for non-destructive destination purposes
(especially useful for stack offset cases), or to use a SCALE.
Otherwise, ADD(s) are preferable."
Differential Revision: http://reviews.llvm.org/D3826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209198 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the X86 backend doesn't support types larger than i128 very well. For
example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted.
This fix changes the cost model to never hoist constants for types larger than
i128. Once the codegen issues have been resolved, the cost model can be updated
to allow also larger types.
This is related to <rdar://problem/16954938>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209162 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always
provide the operand size as output if the input operand is zero.
We can take advantage of this knowledge during instruction selection
stage in order to simplify a few corner case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209159 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.
Also added tests for SSE4.1, AVX, and AVX2.
Reviewers: delena, nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209156 91177308-0d34-0410-b5e6-96231b3b80d8
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209123 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern. This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions. No functional change beyond the removal of the old
constructors are intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209082 91177308-0d34-0410-b5e6-96231b3b80d8
were added in SSE2, no SSSE3. Found this while auditing all uses of
SSSE3 in the X86 target. I don't actually expect this to make
a significant difference on anything and I don't have any detailed test
cases but I updated the existing test cases that already covered some of
this code path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209056 91177308-0d34-0410-b5e6-96231b3b80d8
vselects with constant masks, after legalization, will get turned into
specialized shuffle_vectors so they can be matched to blend+imm
instructions.
Fixed some tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209044 91177308-0d34-0410-b5e6-96231b3b80d8
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.
This is not enough for all cases. An additional SELECTCombine optimization
will be committed.
Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209043 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change intended. The types that previously were set to
lower as Expand or Legal are doing the same thing with this lowering
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209042 91177308-0d34-0410-b5e6-96231b3b80d8
In AT&T syntax, we should probably print the full "movl" or "movw". TableGen
used to ignore these aliases because it was miscounting the number of operands.
This fixes the issue.
This will be tested when the TableGen "should I print this Alias"
heuristic is fixed (very soon).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208963 91177308-0d34-0410-b5e6-96231b3b80d8
Added target specific combine rules to fold blend intrinsics according
to the following rules:
1) fold(blend A, A, Mask) -> A;
2) fold(blend A, B, <allZeros>) -> A;
3) fold(blend A, B, <allOnes>) -> B.
Added two new tests to verify that the new folding rules work for all
the optimized blend intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208895 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, TableGen assumed that every aliased operand consumed precisely 1
MachineInstr slot (this was reasonable because until a couple of days ago,
nothing more complicated was eligible for printing).
This allows a couple more ARM64 aliases to print so we can remove the special
code.
On the X86 side, I've gone for explicit AT&T size specifiers as the default, so
turned off a few of the aliases that would have just started printing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208880 91177308-0d34-0410-b5e6-96231b3b80d8
To get at least one use of the change (and some actual tests) in with its
commit, I've enabled the AArch64 & ARM64 NEON mov aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208867 91177308-0d34-0410-b5e6-96231b3b80d8
For example
tzcntl %edi, %ebx
testl %edi, %edi
je .label
can be rewritten into
tzcntl %edi, %ebx
jb .label
A minor complication is that tzcnt sets CF instead of ZF when the input
is zero, we have to rewrite users of the flags from ZF to CF. Currently
we recognize patterns using lzcnt, tzcnt and popcnt.
Differential Revision: http://reviews.llvm.org/D3454
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208788 91177308-0d34-0410-b5e6-96231b3b80d8
r208453 added support for having sret on the second parameter. In that
change, the code for copying sret into a virtual register was hoisted
into the loop that lowers formal parameters. This caused a "Wrong
topological sorting" assertion failure during scheduling when a
parameter is passed in memory. This change undoes that by creating a
second loop that deals with sret.
I'm worried that this fix is incomplete. I don't fully understand the
dependence issues. However, with this change we produce the same DAGs
we used to produce, so if they are broken, they are just as broken as
they have always been.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208637 91177308-0d34-0410-b5e6-96231b3b80d8
1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed.
2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that allowed by C++11 but I wasn't able to compile it on VS2012).
3) I added gather/scatter prefetch intrinsics.
4) I fixed MRMm encoding for masked instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208522 91177308-0d34-0410-b5e6-96231b3b80d8
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208508 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering build_vector to an insertps, we would still lower it, even
if the source vectors weren't v4x32. This would break on avx if the source
was a v8x32. We now check the type of the source vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208487 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r200561.
This calling convention was an attempt to match the MSVC C++ ABI for
methods that return structures by value. This solution didn't scale,
because it would have required splitting every CC available on Windows
into two: one for methods and one for free functions.
Now that we can put sret on the second arg (r208453), and Clang does
that (r208458), revert this hack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208459 91177308-0d34-0410-b5e6-96231b3b80d8