1) i512mem -> f512mem (this is the packed FP input being permuted)
2) element size is 64 bits in EVEX_CD8 for PD.
(A good illustration why X86VectorVTInfo is useful)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220734 91177308-0d34-0410-b5e6-96231b3b80d8
For a call to not return in to the stackmap shadow, the shadow must end with the call.
To do this, we must insert any required nops *before* the call, and not after it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220728 91177308-0d34-0410-b5e6-96231b3b80d8
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.
A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220710 91177308-0d34-0410-b5e6-96231b3b80d8
Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).
Minor patch agreed with qcolombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220613 91177308-0d34-0410-b5e6-96231b3b80d8
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section. But when either symbol is undefined it
is an error.
The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.
rdar://18678402
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220599 91177308-0d34-0410-b5e6-96231b3b80d8
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.
An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.
Differential Revision: http://reviews.llvm.org/D5917
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220592 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
This is asm/diasm-only support, similar to AVX.
For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.
For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant. The addition operand (op3) in both cases can come from
memory. Again the ony difference is which operand is destructed.
There could be a post-RA pass that would convert a 213 or 132 into a 231.
Part of <rdar://problem/17082571>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220540 91177308-0d34-0410-b5e6-96231b3b80d8
This multiclass generates the different forms: 213, 231, 132 in AVX.
132 in AVX512 is a separate class but I am planning to use this same
multiclass to generate 231 relying on the nice the null_frag trick from AVX to
disable codegen pattern for 231.
No functionality change, no change in X86.td.expanded except for the different
instruction definition names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220539 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.
This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.
i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.
Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.
A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):
// FIXME: Used for 8-bit mul, ignore result upper 8 bits.
// This probably ought to be moved to a def : Pat<> if the
// syntax can be accepted.
[(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]
Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.
Before this change:
movsbl %sil, %eax
movsbl %dil, %ecx
imull %eax, %ecx
movb %cl, %al
sarb $7, %al
movzbl %al, %eax
movzbl %ch, %esi
cmpl %eax, %esi
setne %al
After:
movb %dil, %al
imulb %sil
seto %al
Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.
Differential Revision: http://reviews.llvm.org/D5809
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220516 91177308-0d34-0410-b5e6-96231b3b80d8
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220256 91177308-0d34-0410-b5e6-96231b3b80d8
X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT
when it knows it can be lowered into BLEND. Indeed, only the high bits need to be
set for those and it optimizes those accordingly.
However, when the mask is a compile time constant, the lowering will be handled
by the generic optimizer and those modifications will generate bad code in the
generic optimizer.
This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.
<rdar://problem/18675020>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220242 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.
Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.
Added additional regression test case provided by Joerg Sonnenberger.
Differential Revision: http://reviews.llvm.org/D5818
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220239 91177308-0d34-0410-b5e6-96231b3b80d8
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.
This fixes PR19370.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220054 91177308-0d34-0410-b5e6-96231b3b80d8
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).
Since DQ has native support for these intructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ.
Fixes <rdar://problem/18426089>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219874 91177308-0d34-0410-b5e6-96231b3b80d8
The new attributes are NumElts and the CD8TupleForm. This prepares the code
to enable x8 and x2 inserts.
NFC, no change in X86.td.expanded except for the new attributes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219871 91177308-0d34-0410-b5e6-96231b3b80d8
It's the W bit that selects between 32 or 64 elt type and not the opcode. The
opcode selects between the width of the insert (128 or 256).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219870 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.
This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.
Differential Revision: http://reviews.llvm.org/D5701
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219584 91177308-0d34-0410-b5e6-96231b3b80d8
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219573 91177308-0d34-0410-b5e6-96231b3b80d8
This is dangerous for numerous reasons. The primary risk here is with
floating point or double types where if the wrong header files are
included in a strange order this can implicitly convert to integers and
then call the C abs function on the integers. There is a secondary risk
that even impacts integers where if the namespace the code is written in
ever defines an abs overload for types within that namespace the global
abs will be hidden. The correct form is to call std::abs or write 'using
std::abs' for builtin types (and only the latter is correct in any
generic context).
I've also added the requisite header to be a bit more explicit here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219484 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the Pat<>'s for the intrinsics. These are necessary because we
don't lower these intrinsics to SDNodes but match them directly. See the
rational in the previous commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219362 91177308-0d34-0410-b5e6-96231b3b80d8
These derive from the new asm-only masking definitions.
Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants. The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking. These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly. We can revisit this question later
if we have a more pressing need to express something like this.
So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219361 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
No change in X86.td.expanded except for the appearance of the new attributes.
The new attributes will be used in the subsequent patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219360 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
This enables the generation of masking instructions that don't provide a
ISel pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219358 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.
Test Plan: New checks on atomic_mi.ll
Reviewers: jfb, nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219336 91177308-0d34-0410-b5e6-96231b3b80d8