Adds the various "rm" instruction variants to the list of instructions that have a partial register update. Also adds all SQRTSD variants that were missing from the original list.
Differential Revision: http://reviews.llvm.org/D6620
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224246 91177308-0d34-0410-b5e6-96231b3b80d8
PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted
truncations and extensions when dealing with nodes of the form:
zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...))
There was a FIXME in the implementation (now removed) regarding the fact that
the function would abort the transformations if any of the non-output operands
of a SELECT or SELECT_CC node would need to be promoted (because they were
also output operands, for example). As a result, we continued to generate
unnecessary zero-extends for code such as this:
unsigned foo(unsigned a, unsigned b) {
return (a <= b) ? a : b;
}
which would produce:
cmplw 0, 3, 4
isel 3, 4, 3, 1
rldicl 3, 3, 0, 32
blr
and now we produce:
cmplw 0, 3, 4
isel 3, 4, 3, 1
blr
which is better in the obvious way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224213 91177308-0d34-0410-b5e6-96231b3b80d8
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extload base updates by checking
that the memory (rather than result) type is the same size as the addend.
Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
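As a rough illustration (hypothetical source, not taken from the patch), a
generic vector load whose base pointer is incremented afterwards can now be
selected as a post-incrementing VLD1_UPD, just like the intrinsic form:
#include <arm_neon.h>
float32x4_t load_and_advance(const float *&p) {
  // A generic vector load (no vld1q intrinsic) followed by a base-pointer
  // increment; the combine can fold the add into the load's writeback.
  float32x4_t v = *reinterpret_cast<const float32x4_t *>(p);
  p += 4;
  return v;
}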
Differential Revision: http://reviews.llvm.org/D6585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224203 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224198 91177308-0d34-0410-b5e6-96231b3b80d8
options.
This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.
This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.
#include "llvm/IR/LegacyPassNameParser.h"
static llvm::cl::list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm::cl::desc("Optimizations available:"));
rdar://problem/19212448
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224186 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all
of the usual places, but also from the ABI, which specifies that values passed
are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined
to do *something* to the higher-order bits, and for some instructions, that
action clears those bits (thus providing a zero-extended result). This is
especially common after rotate-and-mask instructions. Adding an additional
instruction to zero-extend the results of these instructions is unnecessary.
This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and
looks back through their operands to see if all instructions will implicitly
zero extend their results. If so, we convert these instructions to their 64-bit
variants (which is an internal change only, the actual encoding of these
instructions is the same as the original 32-bit ones) and remove the
unnecessary zero-extension (changing where the INSERT_SUBREG instructions are
to make everything internally consistent).
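For instance (a hypothetical example, not taken from the commit), in the
following the rotate-and-mask already clears the high-order 32 bits, so the
separate clearing instruction can be dropped:
unsigned long long mask_bits(unsigned x) {
  // The rlwinm produced for this shift-and-mask leaves the high-order half
  // of the 64-bit register zero, so no extra zero-extension is needed.
  return (x >> 5) & 0xFF;
}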
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224169 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This commit enables the MIPS-III target and adds support for code
generation of SELECT nodes. We have to use pseudo-instructions with
custom inserters for these nodes as MIPS-III CPUs do not have
conditional-move instructions.
Depends on D6212
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224128 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224126 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For MIPS targets that do not have conditional-move instructions, i.e. targets
before MIPS32 and MIPS-IV, we have to insert a diamond control-flow
pattern in order to support SELECT nodes. To do that, we add
pseudo-instructions with a custom inserter that emits the necessary
control flow that selects the correct value.
With this patch we add complete support for code generation for MIPS-II
targets, validated against the LLVM test-suite.
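A hypothetical example of source that produces such a SELECT node (not taken
from the patch):
int smaller(int a, int b) {
  // Without a conditional-move instruction, the pseudo's custom inserter
  // expands this select into a compare, a branch, and two blocks feeding a phi.
  return (a < b) ? a : b;
}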
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6212
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224124 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.
The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.
This patch fixes the issue in MISched and provides a test.
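A minimal sketch of the idea (illustrative names, not the exact code from the
patch): when sizing a scheduling region, skip debug instructions so that -g
cannot change the strategy choice.
unsigned NumRegionInstrs = 0;
for (MachineBasicBlock::iterator I = RegionBegin; I != RegionEnd; ++I)
  if (!I->isDebugValue())   // do not let DBG_VALUEs inflate the count
    ++NumRegionInstrs;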
Patch by Russell Gallop!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224118 91177308-0d34-0410-b5e6-96231b3b80d8
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
supported, there is no user switch to change this behaviour. This build
attribute should capture the default behaviour of the compiler, which is to
expose the IEEE 754 version of __fp16.
Once -mfp16-format is supported, that will be the way to control the value of
this build attribute.
Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224115 91177308-0d34-0410-b5e6-96231b3b80d8
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224094 91177308-0d34-0410-b5e6-96231b3b80d8
This is nice for the instruction patterns, but it complicates
min / max matching. The select doesn't have the correct type and would
require looking through the bitcasts for the real float operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224092 91177308-0d34-0410-b5e6-96231b3b80d8
Add an option to disable the optimization that shrinks truncated larger-type
loads into smaller-type loads. On SI this shrinking prevents using scalar load
instructions in some cases, since there are no scalar extloads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224084 91177308-0d34-0410-b5e6-96231b3b80d8
If we have an add (or an or that is really an add), where one operand is a
FrameIndex and the other operand is a small constant, we can combine the
lowering of the FrameIndex (which is lowered as an add of the FI and a zero
offset) with the constant operand.
Amusingly, this is an old potential improvement entry from
lib/Target/PowerPC/README.txt which had never been resolved. In short, we used
to lower:
%X = alloca { i32, i32 }
%Y = getelementptr {i32,i32}* %X, i32 0, i32 1
ret i32* %Y
as:
addi 3, 1, -8
ori 3, 3, 4
blr
and now we produce:
addi 3, 1, -4
blr
which is much more sensible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224071 91177308-0d34-0410-b5e6-96231b3b80d8
PPCISelDAGToDAG contained existing code to lower i32 sdiv by a power-of-2 using
srawi/addze, but did not implement the i64 case. DAGCombine now contains a
callback specifically designed for this purpose (BuildSDIVPow2), and part of
the logic has been moved to an implementation of that callback. Doing this
lowering using BuildSDIVPow2 likely does not matter, compared to handling
everything in PPCISelDAGToDAG, for the positive divisor case, but the negative
divisor case, which generates an additional negation, can potentially benefit
from additional folding from DAGCombine. Now, both the i32 and the i64 cases
have been implemented.
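For example (hypothetical source, not from the patch), both of these
divisions are now lowered through BuildSDIVPow2:
int div8_32(int x) { return x / 8; }                // i32: srawi/addze, as before
long long div8_64(long long x) { return x / 8; }    // i64: now handled as well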
Fixes PR20732.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224033 91177308-0d34-0410-b5e6-96231b3b80d8
Quite a major error here: the expansions for the Pseudos with and without
folded load were mixed up. Fortunately it only affects ARM-mode, when not using
movw/movt, on Darwin. I'm guessing no-one actually uses that combination.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223986 91177308-0d34-0410-b5e6-96231b3b80d8
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.
To avoid these loads and the GOT entry altogether, this commit changes the way
FP constants are materialized in the large code model. The constants are now
materialized in a GPR and then bitconverted/moved into the FPR.
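A hypothetical example of the kind of code affected (not from the patch):
double pi() {
  // Large code model, before: GOT load plus constant-pool load.
  // After: the constant's bit pattern is built in a GPR (e.g. with a
  // movz/movk sequence) and then moved into the FPR (e.g. fmov d0, x8).
  return 3.141592653589793;
}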
Reviewed by Tim Northover
Fixes rdar://problem/16572564.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223941 91177308-0d34-0410-b5e6-96231b3b80d8
EltsFromConsecutiveLoads was apparently only ever called for 128-bit vectors, and assumed this implicitly. r223518 started calling it for AVX-sized vectors, causing the code path that had this assumption to crash.
This adds a check to make this path fire only for 128-bit vectors.
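Roughly, the guard looks like this (a sketch; the exact form and placement in
the patch may differ):
// Only 128-bit vectors were ever handled here; bail out for wider types.
if (!VT.is128BitVector())
  return SDValue();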
Differential Revision: http://reviews.llvm.org/D6579
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223922 91177308-0d34-0410-b5e6-96231b3b80d8
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
Differential Revision: http://reviews.llvm.org/D6585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223862 91177308-0d34-0410-b5e6-96231b3b80d8
It was missing from the VLD1/VST1 handling logic, even though the
corresponding instructions exist (same form as v2i64).
In preparation for a future patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223832 91177308-0d34-0410-b5e6-96231b3b80d8
The load/store value type is currently not available when lowering the memcpy
intrinsic. Add the missing nullptr check to support this in 'computeAddress'.
Fixes rdar://problem/19178947.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223818 91177308-0d34-0410-b5e6-96231b3b80d8
Lowering patterns were written through the avx512_broadcast_pat multiclass, as the pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
Added lowering tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223804 91177308-0d34-0410-b5e6-96231b3b80d8
With the foregoing three patches, VSX instructions can be used for
little endian. This patch removes the restriction that prevented
this, and re-enables the test cases from the first three patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223792 91177308-0d34-0410-b5e6-96231b3b80d8
When performing instruction selection for ISD::VECTOR_SHUFFLE, there
is special code for handling v2f64 and v2i64 using VSX instructions.
This code must be adjusted for little-endian. Because the two inputs
are treated as a double-wide register, we must swap their order for
little endian. To get the appropriate mask elements to use with the
big-endian biased XXPERMDI instruction, we must reverse their order
and invert the bits.
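Purely as an illustration of the preceding sentence (names and encoding are
not taken from the patch), the little-endian selector can be derived from the
big-endian one by reversing the two mask bits and inverting them:
unsigned getLittleEndianSelector(unsigned BEMask) {
  unsigned Reversed = ((BEMask & 1) << 1) | ((BEMask >> 1) & 1); // swap bits
  return ~Reversed & 3;                                          // invert them
}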
A new test is added to test the 16 possible values of the shuffle
mask. It is initially disabled for reasons specified in the test. It
is re-enabled by patch 4/4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223791 91177308-0d34-0410-b5e6-96231b3b80d8
This optimization transforms code like:
bb1:
%0 = icmp ne i32 %a, 0
%1 = icmp ne i32 %b, 0
%or.cond = or i1 %0, %1
br i1 %or.cond, label %TrueBB, label %FalseBB
into multiple branch instructions like:
bb1:
%0 = icmp ne i32 %a, 0
br i1 %0, label %TrueBB, label %bb2
bb2:
%1 = icmp ne i32 %b, 0
br i1 %1, label %TrueBB, label %FalseBB
This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.
Performing this optimization at CodeGenPrepare time makes it available to both
SelectionDAG and FastISel, and the implementation in SelectionDAG could then be
removed. There are currently a few differences in codegen for X86 and PPC, so
this commit only enables it for FastISel.
Reviewed by Jim Grosbach
This fixes rdar://problem/19034919.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223786 91177308-0d34-0410-b5e6-96231b3b80d8
This patch addresses the inherent big-endian bias in the lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions. These instructions load
vector elements into registers left-to-right (with the first element
loaded into the high-order bits of the register), regardless of the
endian setting of the processor. However, these are the only
vector memory instructions that permit unaligned storage accesses, so
we want to use them for little-endian.
To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
followed by an xxswapd, which swaps the doublewords. This works for
lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
vector elements are in LE order (right-to-left) within each
doubleword. (Thus after lxvd2x of a <4 x float> the elements will
appear as 1, 0, 3, 2. Following the swap, they will appear as 3, 2,
0, 1, as desired.) For stores, an stxvd2x or stxvw4x is replaced
with an stxvd2x preceded by an xxswapd.
Introduction of extra swap instructions provides correctness, but
obviously is not ideal from a performance perspective. Future patches
will address this with optimizations to remove most of the introduced
swaps, which have proven effective in other implementations.
The introduction of the swaps is performed during lowering of LOAD,
STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations. The latter
are used to translate intrinsics that specify the VSX loads and stores
directly into equivalent sequences for little endian. Thus code that
uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
ported from BE to LE.
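For example (hypothetical source, assuming -mvsx and <altivec.h>), code like
the following keeps working unchanged on little endian; the load and store are
simply lowered to the swapped sequences described above:
#include <altivec.h>
vector float load4(const vector float *p) { return vec_vsx_ld(0, p); }
void store4(vector float v, vector float *p) { vec_vsx_st(v, 0, p); }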
We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
use during this lowering step. In PPCInstrVSX.td, we add new SDType
and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
These are recognized during instruction selection and mapped to the
correct instructions.
Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set. I chose to include all of
these in the first patch rather than try to rigorously sort out which tests
were broken by one or another of the patches. Sorry about that.
The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
are disabled until LE support is enabled because of breakages that
occur as noted in those tests. They are re-enabled in patch 4/4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223783 91177308-0d34-0410-b5e6-96231b3b80d8
missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.
If anyone wants to clean up the 32-bit only x86 CPUs, go for it.
Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223774 91177308-0d34-0410-b5e6-96231b3b80d8
This handles the simplest case for mov -> push conversion (see the sketch after this list):
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.
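A hypothetical example of code that now benefits (the emitted assembly shown
in the comment is illustrative):
void callee(int, int);
void caller() {
  // 32-bit, no reserved call frame: the two argument stores become
  //   pushl $2
  //   pushl $1
  // instead of a stack adjustment followed by two movl stores.
  callee(1, 2);
}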
Differential Revision: http://reviews.llvm.org/D6503
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223757 91177308-0d34-0410-b5e6-96231b3b80d8
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.
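A minimal sketch of that check (illustrative names only, not the exact code
from the patch):
// Drop from the candidate set any register that a user of the value being
// renamed defines as early-clobber.
for (MachineInstr *UseMI : Users)
  for (const MachineOperand &MO : UseMI->operands())
    if (MO.isReg() && MO.isDef() && MO.isEarlyClobber())
      Candidates.reset(MO.getReg());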
Fixes PR21452.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223727 91177308-0d34-0410-b5e6-96231b3b80d8