This commit broke an x86 test and the bots have been broken for well
over an hour now so I'm just reverting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237210 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:
EAX scratch register; also used to return integer and pointer values
from functions; also stores the address of a returned struct or union
Fixes PR23491.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9715
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237175 91177308-0d34-0410-b5e6-96231b3b80d8
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237129 91177308-0d34-0410-b5e6-96231b3b80d8
Before revision 171146, function 'PerformTruncateCombine' used to perform
a premature lowering of TRUNCATE dag nodes.
Revision 171146 then moved all the logic implemented by PerformTruncateCombine
to a custom lowering hook. However, that revision forgot to delete
function PerformTruncateCombine from the code.
This patch removes function 'PerformTruncateCombine' since it has no effect
on the SelectionDAG. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237122 91177308-0d34-0410-b5e6-96231b3b80d8
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237106 91177308-0d34-0410-b5e6-96231b3b80d8
to use the information in the module rather than TargetOptions.
We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.
For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.
Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237079 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.
This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.
Tests included.
Reviewers: ab, srhines, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9092
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237004 91177308-0d34-0410-b5e6-96231b3b80d8
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236916 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236888 91177308-0d34-0410-b5e6-96231b3b80d8
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.
This helps function combineX86ShufflesRecusively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.
Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.
Added test x86-fold-pshufb.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236863 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
There are 2 structural changes here:
1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.
2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes.
Differential Revision: http://reviews.llvm.org/D8900
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236546 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236420 91177308-0d34-0410-b5e6-96231b3b80d8
Removed code that was replicating v8i16 'shift + mask' implementation that is done more nicely by making use of LowerScalarImmediateShift
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236388 91177308-0d34-0410-b5e6-96231b3b80d8
Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236209 91177308-0d34-0410-b5e6-96231b3b80d8
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).
In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment,
we should eventually share this data between the IR passes and the backend.
We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.
The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.
In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.
Differential Revision: http://reviews.llvm.org/D9325
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235997 91177308-0d34-0410-b5e6-96231b3b80d8
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235989 91177308-0d34-0410-b5e6-96231b3b80d8
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235977 91177308-0d34-0410-b5e6-96231b3b80d8
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.
The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
Differential Revision: http://reviews.llvm.org/D9115
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235837 91177308-0d34-0410-b5e6-96231b3b80d8
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.
So instead of producing this:
pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3]
movd %xmm0, (%eax)
movd %xmm1, 4(%eax)
We can do:
movq %xmm0, (%eax)
This is a fix for the problem noted in D7296.
Differential Revision: http://reviews.llvm.org/D9134
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235460 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.
Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296
Differential Revision: http://reviews.llvm.org/D9120
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235367 91177308-0d34-0410-b5e6-96231b3b80d8
The fix ensures that scalar sources inserted into a vector are the correct bit size.
Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235281 91177308-0d34-0410-b5e6-96231b3b80d8
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier.
So that's the worst case for this transform (no latency win),
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.
These are the sequences I'm comparing:
divss %xmm2, %xmm0
mulss %xmm1, %xmm0
divss %xmm2, %xmm0
Becomes:
movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero
divss %xmm2, %xmm3
mulss %xmm3, %xmm0
mulss %xmm1, %xmm0
mulss %xmm3, %xmm0
[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]
Differential Revision: http://reviews.llvm.org/D8941
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235012 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into a XMM register instead of merging pairs of i8 scalars into a i16 and using the SSE2 PINSRW instruction.
This allows folding of byte loads and reduces scalar register usage as well.
Differential Revision: http://reviews.llvm.org/D8839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234193 91177308-0d34-0410-b5e6-96231b3b80d8
We don't need to represent UnwindHelp in IR. Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234061 91177308-0d34-0410-b5e6-96231b3b80d8
Without this patch, we split the 256-bit vector into halves and produced something like:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
Now, we eliminate the xor and blend because those zeros are free with the vmovd:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233941 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us catch exceptions in simple cases.
N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
that we generate.
- IP-to-State tables are sensitive to the order of emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233767 91177308-0d34-0410-b5e6-96231b3b80d8
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)
It improves the v4i64 case although not optimally. This AVX codegen:
vmovq {{.*#+}} xmm0 = mem[0],zero
vxorpd %ymm1, %ymm1, %ymm1
vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
Becomes:
vmovsd {{.*#+}} xmm0 = mem[0],zero
Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:
We're not handling v32i8 / v16i16.
We're not getting the FP / int domains right for instruction selection.
But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64,
I'm submitting this patch ahead of fixing the above.
Differential Revision: http://reviews.llvm.org/D8341
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233704 91177308-0d34-0410-b5e6-96231b3b80d8