Synthesizing a call directly using the MI layer would confuse the frame
lowering code. This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230129 91177308-0d34-0410-b5e6-96231b3b80d8
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.
There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230118 91177308-0d34-0410-b5e6-96231b3b80d8
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.
This fixes PR22572.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230113 91177308-0d34-0410-b5e6-96231b3b80d8
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230072 91177308-0d34-0410-b5e6-96231b3b80d8
This enables a few useful combines that used to only
use fma.
Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230071 91177308-0d34-0410-b5e6-96231b3b80d8
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230053 91177308-0d34-0410-b5e6-96231b3b80d8
changes to remove non-Function based subtargets out of the asm
printer. For module level emission we'll need to construct up
an MCSubtargetInfo so that we can encode instructions for
emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230050 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.
Differential Revision: http://reviews.llvm.org/D7673
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230043 91177308-0d34-0410-b5e6-96231b3b80d8
EmitFunctionStubs is called from doFinalization and so can't
depend on the Subtarget existing. It's also irrelevant as
we know we're darwin since we're in the darwin asm printer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230039 91177308-0d34-0410-b5e6-96231b3b80d8
This canonicalization step saves us 3 pattern matching possibilities * 4 math ops
for scalar FP math that uses xmm regs. The backend can re-commute the operands
post-instruction-selection if that makes register allocation better.
The tests in llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll cover this scenario already,
so there are no new tests with this patch.
Differential Revision: http://reviews.llvm.org/D7777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230024 91177308-0d34-0410-b5e6-96231b3b80d8
the wrong answer. We also got initializer lists which are *way* cleaner
for this kind of thing. Let's use those and make this a normal, boring
functionn accepting ArrayRef.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230004 91177308-0d34-0410-b5e6-96231b3b80d8
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.
I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.
Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229966 91177308-0d34-0410-b5e6-96231b3b80d8
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]
Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.
In case you aren't using a diff algorithm that can handle this:
X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229964 91177308-0d34-0410-b5e6-96231b3b80d8
is going well, remove the flag and the code for the old legality tests.
This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229963 91177308-0d34-0410-b5e6-96231b3b80d8
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.
My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229958 91177308-0d34-0410-b5e6-96231b3b80d8
The instructions were being generated on architectures that don't support avx512.
This reverts commit r229837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229942 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.
The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.
However, for ARMISD nodes, we just used the MMO alignment, no matter
what. In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.
And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).
To fix this, we can just mirror the hack done for ISD nodes: only
take into account the MMO alignment when the access is overaligned.
Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
rdar://19717869, rdar://14062261.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229932 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for a future patch:
- rename isLoad to isLoadOp: the former is confusing, and can be taken
to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.)
- change formatting here and there.
- add some comments.
- const-ify bools.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229929 91177308-0d34-0410-b5e6-96231b3b80d8
systematic lowering of v8i16.
This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.
In some cases, this is yet another significant improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229859 91177308-0d34-0410-b5e6-96231b3b80d8
addition to lowering to trees rooted in an unpack.
This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229856 91177308-0d34-0410-b5e6-96231b3b80d8
terribly complex partial blend logic.
This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.
This does expose one (minor) weakness of the unpack lowering that I'll
try to address.
In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229853 91177308-0d34-0410-b5e6-96231b3b80d8
needed, and significantly improve the SSSE3 path.
This makes the new strategy much more clear. If we can blend, we just go
with that. If we can't blend, we try to permute into an unpack so
that we handle cases where the unpack doing the blend also simplifies
the shuffle. If that fails and we've got SSSE3, we now call into
factored-out pshufb lowering code so that we leverage the fact that
pshufb can set up a blend for us while shuffling. This generates great
code, especially because we *know* we don't have a fast blend at this
point. Finally, we fall back on decomposing into permutes and blends
because we do at least have a bit-math-based blend if we need to use
that.
This pretty significantly improves some of the v8i16 code paths. We
never need to form pshufb for the single-input shuffles because we have
effective target-specific combines to form it there, but we were missing
its effectiveness in the blends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229851 91177308-0d34-0410-b5e6-96231b3b80d8
them into permutes and a blend with the generic decomposition logic.
This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack based merging that this replaces. The only place where it isn't we
end up blending between two packs when we could do a single pack. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.
With this we're down to only really random permutations that cause an
explosion of instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8