Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable
macro-fusion.
<rdar://problem/15680770>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212747 91177308-0d34-0410-b5e6-96231b3b80d8
passes in the mips back end. This, unfortunately, required a
bit of churn in the various predicates to use a pointer rather
than a reference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212744 91177308-0d34-0410-b5e6-96231b3b80d8
Remove a default label which covered no enumerators, replace it with a
llvm_unreachable.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212729 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the AsmParser to accept some logical+immediate
instructions and convert them as shown:
bic Rd, Rn, #imm -> and Rd, Rn, #~imm
bics Rd, Rn, #imm -> ands Rd, Rn, #~imm
orn Rd, Rn, #imm -> orr Rd, Rn, #~imm
eon Rd, Rn, #imm -> eor Rd, Rn, #~imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers. For example, the bic construct is used by the linux kernel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212722 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When -mno-odd-spreg is in effect, 32-bit floating point values are not
permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit
floating point comparison results from being written to odd registers.
This option has three purposes:
* It allows support for certain MIPS implementations such as loongson-3a that
do not allow the use of odd registers for single precision arithmetic.
* When using -mfpxx, -mno-odd-spreg is the default and this allows us to
statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1
instructions to/from odd registers are guaranteed not to appear for any
reason. Once this has been established, the user can then re-enable
-modd-spreg to regain the use of all 32 single-precision registers.
* When using -mfp64 and -mno-odd-spreg together, an O32 extension named
O32 FP64A is used as the ABI. This is intended to provide almost all
functionality of an FR=1 processor but can also be executed on a FR=0 core
with the assistance of a hardware compatibility mode which emulates FR=0
behaviour on an FR=1 processor.
* Added '.module oddspreg' and '.module nooddspreg' each of which update
the .MIPS.abiflags section appropriately
* Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller
doesn't have to remember to do it.
* MipsABIFlags now calculates the flags1 and flags2 member on demand rather
than trying to maintain them in the same format they will be emitted in.
There is one portion of the -mfp64 and -mno-odd-spreg combination that is not
implemented yet. Moves to/from odd-numbered double-precision registers must not
use mtc1. I will fix this in a follow-up.
Differential Revision: http://reviews.llvm.org/D4383
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212717 91177308-0d34-0410-b5e6-96231b3b80d8
There's no real need to have Shift as a separate format type from Binary.
The comments for other format types were too specific and in some cases
no longer accurate.
Just a clean-up, no behavioral change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212707 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.
This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212705 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions aren't used for codegen since the original L*DB instructions
are suitable for fround.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212703 91177308-0d34-0410-b5e6-96231b3b80d8
Immediate fields that have no natural MVT type tended to use i8 if the
field was small enough. This was a bit confusing since i8 isn't a legal
type for the target. Fields for short immediates in a 32-bit or 64-bit
operation use i32 or i64 instead, so it would be better to do the same
for all fields.
No behavioral change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212702 91177308-0d34-0410-b5e6-96231b3b80d8
The dwarf FPR numbers are supposed to have the order F0, F2, F4, F6,
F1, F3, F5, F7, F8, etc., which matches the pairing of registers for
long doubles. E.g. a long double stored in F0 is paired with F2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212701 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.
Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
Come to think of it, this one could test for the common case of 'C'
being a SETCC too.
Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4389
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212697 91177308-0d34-0410-b5e6-96231b3b80d8
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.
This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212695 91177308-0d34-0410-b5e6-96231b3b80d8
that splat i8s into i16s.
Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.
With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212692 91177308-0d34-0410-b5e6-96231b3b80d8
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.
This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.
There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212680 91177308-0d34-0410-b5e6-96231b3b80d8
don't need to set it manually.
This is based on feedback from Tom who pointed out that if every target
needs to handle this we need to reach out to those maintainers. In fact,
it doesn't make sense to duplicate everything when anything other than
expand seems unlikely at this stage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212661 91177308-0d34-0410-b5e6-96231b3b80d8
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.
rdar://17594379
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212638 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212634 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow up to r212492. There should be no functional difference, but
this patch makes it clear that SrcVT must be an i1/i8/16/i32 and DestVT must be
an i8/i16/i32/i64.
rdar://17516686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212633 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.
Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212611 91177308-0d34-0410-b5e6-96231b3b80d8
vector types to be legal and a ZERO_EXTEND node is encountered.
When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.
This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.
It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.
There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.
However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.
Differential Revision: http://reviews.llvm.org/D4405
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6.
Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4269
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212605 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
RET, and RET_MM have been replaced by a pseudo named PseudoReturn.
In addition a version with a 64-bit GPR named PseudoReturn64 has been
added.
Instruction selection for a return matches RetRA, which is expanded post
register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
this PseudoReturn/PseudoReturn64 are emitted as:
- (JALR64 $zero, $rs) on MIPS64r6
- (JALR $zero, $rs) on MIPS32r6
- (JR_MM $rs) on microMIPS
- (JR $rs) otherwise
On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
development and review (specifically, to ensure all cases of jr are
updated), these aliases are temporarily named 'r6.jr' instead of 'jr'.
A follow up patch will change them back to the correct mnemonic.
Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
jump, and removed it from its definition of a call.
Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's
doesn't appear to account for any MIPS64-specifics.
The return instruction created as part of eh_return expansion is now expanded
using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6
('jalr $zero, $rs').
Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in
expandEhReturn().
Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4268
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212604 91177308-0d34-0410-b5e6-96231b3b80d8
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.
The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.
This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
(where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
a correct check of whether the splat is the only user of a loaded
value, checking that the splat actually crosses multiple lanes before
using a broadcast, and handling broadcasts of non-constant splats.
As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.
rdar://17594379
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212573 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).
This is in line with the common-code and ARM back-end change
implemented in r212433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212547 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow on to r212519 to improve the encapsulation and limit the scope of the enums.
Also merged two very similar parser functions, fixed a bug where ASE's
were not being reported, and marked CPR1's as being 128-bit when MSA is
enabled.
Differential Revision: http://reviews.llvm.org/D4384
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212522 91177308-0d34-0410-b5e6-96231b3b80d8
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.
This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.
First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.
Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.
Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.
But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212517 91177308-0d34-0410-b5e6-96231b3b80d8
As destination k0 is allowed but not as predicate/writemask.
I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212504 91177308-0d34-0410-b5e6-96231b3b80d8
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.
Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.
define <4 x float> @test1(<4 x float> %V) {
%1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
%2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
ret <4 x float> %2
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212498 91177308-0d34-0410-b5e6-96231b3b80d8
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.
This fixes <rdar://problem/17549300>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212493 91177308-0d34-0410-b5e6-96231b3b80d8
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:
typedef unsigned int uint128_t __attribute__((mode(TI)));
typedef int sint128_t __attribute__((mode(TI)));
This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.
Tests included.
rdar://17516686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212492 91177308-0d34-0410-b5e6-96231b3b80d8
According to a FIXME in ARMMCTargetDesc.cpp the ARM version parsing should be
in the Triple helper class.
Patch by: Gabor Ballabas
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212479 91177308-0d34-0410-b5e6-96231b3b80d8