Try to match scalar and first like the other instructions.
Expand 64-bit ands to a pair of 32-bit ands since that is not
available on the VALU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204660 91177308-0d34-0410-b5e6-96231b3b80d8
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.
<rdar://problem/16074331>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204631 91177308-0d34-0410-b5e6-96231b3b80d8
This is a pretty straight forward translation for COFF, we just need to
stick the function in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204565 91177308-0d34-0410-b5e6-96231b3b80d8
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204559 91177308-0d34-0410-b5e6-96231b3b80d8
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204543 91177308-0d34-0410-b5e6-96231b3b80d8
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.
Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204536 91177308-0d34-0410-b5e6-96231b3b80d8
We make sure a spill is not hoisted to a hotter outer loop by adding
a condition. Hoist a spill to outer loop if there are multiple dependents
(it can be beneficial if more than one dependents are hoisted) or
if DepSV (the hoisting source) is hotter than SV (the hoisting destination).
rdar://16268194
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204522 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, only regular AArch64 instructions were annotated with SchedRW lists.
This patch does the same for NEON enabling these instructions to be scheduled by
the MIScheduler. Additionally, store operations are now modeled and a few
SchedRW lists were updated for bug fixes (e.g. multiple def operands).
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204505 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
VECTOR_SHUFFLE concatenates the vectors in an vectorwise fashion.
<0b00, 0b01> + <0b10, 0b11> -> <0b00, 0b01, 0b10, 0b11>
VSHF concatenates the vectors in a bitwise fashion:
<0b00, 0b01> + <0b10, 0b11> ->
0b0100 + 0b1110 -> 0b01001110
<0b10, 0b11, 0b00, 0b01>
We must therefore swap the operands to get the correct result.
The test case that discovered the issue was MultiSource/Benchmarks/nbench.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3142
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204480 91177308-0d34-0410-b5e6-96231b3b80d8
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204435 91177308-0d34-0410-b5e6-96231b3b80d8
.data_region is only used in Darwin, so it shouldn't be generated
for other OS. Currently AArch64 doesn't support darwin yet, so
I removed it from AArch64. When Darwin is supported someday, we can
add it back and associate it with Darwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204424 91177308-0d34-0410-b5e6-96231b3b80d8
Sicne MBB->computeRegisterLivenes() returns Dead for sub regs like s0,
d0 is used in vpop instead of updating sp, which causes s0 dead before
its use.
This patch checks the liveness of each subreg to make sure the reg is
actually dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204411 91177308-0d34-0410-b5e6-96231b3b80d8
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204389 91177308-0d34-0410-b5e6-96231b3b80d8
The Octeon cpu from Cavium Networks is mips64r2 based and has an extended
instruction set. In order to utilize this with LLVM, a new cpu feature "octeon"
and a subtarget feature "cnmips" is added. A small set of new instructions
(baddu, dmul, pop, dpop, seq, sne) is also added. LLVM generates dmul, pop and
dpop instructions with option -mcpu=octeon or -mattr=+cnmips.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204337 91177308-0d34-0410-b5e6-96231b3b80d8
It isn't actually used now, and probably never will be, plus it makes
tests less annoying. I also think SC prints GDS instructions as a
separate instruction name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204270 91177308-0d34-0410-b5e6-96231b3b80d8
The "noduplicate" function attribute exists to prevent certain optimizations
from duplicating calls to the function. This is important on platforms where
certain function call duplications are unsafe (for example execution barriers
for CUDA and OpenCL).
This patch makes it possible to specify intrinsics as "noduplicate" and
translates that to the appropriate function attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204200 91177308-0d34-0410-b5e6-96231b3b80d8
For functions where esi is used as base pointer, we would previously fall back
from lowering memcpy with "rep movs" because that clobbers esi.
With this patch, we just store esi in another physical register, and restore
it afterwards. This adds a little bit of register preassure, but the more
efficient memcpy should be worth it.
Differential Revision: http://llvm-reviews.chandlerc.com/D2968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204174 91177308-0d34-0410-b5e6-96231b3b80d8
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid. We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd. This fixes
that silly error.
This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax. I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid. There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware. (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204155 91177308-0d34-0410-b5e6-96231b3b80d8