Commit Graph

12043 Commits

Matt Arsenault
6f485c0bc5 R600/SI: Fix fmin_legacy / fmax_legacy matching for SI
select_cc is expanded on SI, so this was never matched.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221941 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 23:03:09 +00:00
Chandler Carruth
a5408b9c7c [x86] Add some tests for specific patterns of lane-flips combined with
in-lane shuffles that aren't always handled well by the current vector
shuffle lowering.

No functionality change yet, that will follow in a subsequent commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221938 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 22:49:44 +00:00
Juergen Ributzka
add7c56be5 [FastISel][AArch64] Don't bail during simple GEP instruction selection.
The generic FastISel code would bail, because it can't emit a sign-extend for
AArch64. This copies the code over and uses AArch64 specific emit functions.

This is not ideal, and 'computeAddress' should handle this, so it can fold the
address computation into the memory operation.

I plan to clean up 'computeAddress' anyway, so I will add that in a future
commit.

Related to rdar://problem/18962471.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 20:50:44 +00:00
Matt Arsenault
01ab7a869d R600/SI: Use s_movk_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 20:44:23 +00:00
Matt Arsenault
60c3acb36c R600: Fix assert on empty function
If a function is just an unreachable, this would hit a
"this is not a MachO target" assertion because of setting
HasSubsectionViaSymbols.
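
A minimal reproducer sketch of the kind of input described (hypothetical; the
commit's actual test may differ), assuming clang's __builtin_unreachable:

    // A body that lowers to nothing but an 'unreachable' instruction, which
    // previously tripped the "this is not a MachO target" assertion on R600.
    void empty_fn(void) {
        __builtin_unreachable();
    }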

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221920 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 20:07:40 +00:00
Matt Arsenault
8082990487 R600: Error on initializer for LDS.
Also give a proper error for other address spaces.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221917 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 19:56:13 +00:00
Matt Arsenault
b44e43623d R600/SI: Get rid of FCLAMP_SI pseudo
It's not necessary. Also use complex patterns to allow
src modifier usage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221916 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 19:49:04 +00:00
Matt Arsenault
e59f9f46f7 R600/SI: Allow commuting with src2_modifiers
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221911 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 19:26:50 +00:00
Matt Arsenault
1aae959de7 R600/SI: Allow commuting some 3 op instructions
e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c

This simplifies matching v_madmk_f32.

This looks somewhat surprising, but it appears to be
OK to do this. We can commute src0 and src1 in all
of these instructions, and that's all that appears
to matter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221910 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 19:26:47 +00:00
Tim Northover
8bca5de6a9 ARM: allow constpool entry to be moved to the user's block in all cases.
Normally entries can only move to a lower address, but when that wasn't viable,
the user's block was considered anyway. Unfortunately, it went via
createNewWater which wasn't designed to handle the case where there's already
an island after the block.

Unfortunately, the test we have is slow and fragile, and I couldn't reduce it
to anything sane even with the @llvm.arm.space intrinsic. The test change here
is recreating the previous one after the change.

rdar://problem/18545506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221905 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 17:58:53 +00:00
Tim Northover
064da63fcb ARM: avoid duplicating branches during constant islands.
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but it could go wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221904 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 17:58:51 +00:00
Tim Northover
5bd311bf17 ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.
Creating tests for the ConstantIslands pass is very difficult, since it depends
on precise layout details. Having the ability to precisely inject a number of
bytes into the stream helps greatly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221903 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 17:58:48 +00:00
Elena Demikhovsky
18e1185ddf AVX-512: SINT_TO_FP cost model and some bugfixes
Checked some corner cases, for example translation
of <8 x i1> to <8 x double>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221883 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 11:46:16 +00:00
Chandler Carruth
4ea3097d08 [x86] Teach the vector shuffle lowering to make a more nuanced decision
between splitting a vector into 128-bit lanes and recombining them vs.
decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane
shuffles would be much more expensive to represent even though we end up
with a fast blend at the root. Instead, we can do a better job of
shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281
for AVX1. However, the bug persists in AVX2 because I've made this
change reasonably conservative. The cases where it makes sense in AVX2
to split into 128-bit lanes are much more rare because we can often do
full permutations across all elements of the 256-bit vector. However,
the particular test case in PR21281 is an example of one of the rare
cases where it is *always* better to work in a single 128-bit lane. I'm
going to try to teach the logic to detect and form the good code even in
AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would
craftily use vpermilps in AVX1 to shuffle both high and low halves at
the same time. We no longer pull that off, and not for any really good
reason. Ultimately, I think this is just another missing nuance to the
selection heuristic that I'll try to add in afterward, but this change
already seems strictly worth doing considering the magnitude of the
improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for
you.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 04:06:10 +00:00
Chandler Carruth
927a5f45e0 [x86] Don't form overly fragmented blends when splitting and
re-combining shuffles because nothing was available in the wider vector
type.

The key observation (which I've put in the comments for future
maintainers) is that at this point, no further combining is really
possible. And so even though these shuffles trivially could be combined,
we need to actually do that combining as we produce them, since we are
producing them this late in the lowering.

This fixes another (huge) part of the Halide vector shuffle regressions.
As it happens, this was already well covered by the tests, but I hadn't
noticed how bad some of these got. The specific patterns that turn
directly into unpckl/h patterns were occurring *many* times in common
vector processing code.

There are still more problems here, sadly, but I am trying to incrementally
tease them apart, and it looks like this is the core of the problem in the
splitting logic.

There is some chance of regression here; you can see it in the test
changes. Specifically, where we stop forming pshufb in some cases, it is
possible that pshufb was in fact faster. Intel "says" that pshufb is
slower than the instruction sequences replacing it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221852 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 02:42:08 +00:00
Quentin Colombet
e8a8deab8c [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.
Prior to this patch the TypePromotionHelper was promoting only sign extensions.
Supporting zero extensions changes:
- How constants are extended.
- How sign extensions, zero extensions, and truncates are composed together.
- How the type of the extended operation is recorded. Now we need to know the
  kind of the extension as well as its type.

Each change is fairly small, unlike the diff.
Most of the diff is comments and variable renaming to say "extension" instead
of "sign extension".

The performance improvements on the test suite are within the noise.

Related to <rdar://problem/18310086>.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221851 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 01:44:51 +00:00
Juergen Ributzka
9bb95ddae4 [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value.
Optimize selects of i1 in the presence of 'true' and 'false' operands to simple
logic operations.
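
As a rough illustration of the folds being described, using C booleans rather
than i1 selects (the names and the exact set of identities are illustrative,
not taken from the patch):

    #include <stdbool.h>

    /* Each select with a constant true/false operand reduces to one logic op. */
    bool sel_true_rhs(bool c, bool b)  { return c ? true  : b; }  /* ==  c | b */
    bool sel_false_rhs(bool c, bool a) { return c ? a : false; }  /* ==  c & a */
    bool sel_true_lhs(bool c, bool a)  { return c ? a : true;  }  /* == !c | a */
    bool sel_false_lhs(bool c, bool b) { return c ? false : b; }  /* == !c & b */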

This fixes rdar://problem/18960150.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221848 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 00:36:46 +00:00
Juergen Ributzka
b80d6be6d7 [FastISel][AArch64] Fold the cmp into the select when possible.
This folds the compare emission into the select emission when possible, so we
can directly use the flags and don't have to emit a separate compare.

Related to rdar://problem/18960150.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221847 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 00:36:43 +00:00
Juergen Ributzka
8d6824ea4c [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.
Related to rdar://problem/18960150.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221846 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 00:36:38 +00:00
Sanjay Patel
dab91bcc3a Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86).
This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385.

This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but
we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT
sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221819 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 21:39:01 +00:00
Cameron McInally
be30336912 [AVX512] Add integer shift by immediate intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 19:58:54 +00:00
Justin Hibbits
86a3cdff6b Revert part of the PIC tests (TLS part)
This change actually wasn't warranted for -O0, and the new changes prove it and
break the build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221793 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 16:50:15 +00:00
Justin Hibbits
5a3eb1d38c Fix thet tests.
I seem to have missed the update I made for changing 'flag_pic' to "PIC Level".
Mea culpa.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221792 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 16:40:00 +00:00
Justin Hibbits
fcd08c294a Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first.  With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI.  This
generates more optimal code for shared libraries with fewer than about 16380
data objects.

Test Plan: Test cases added or updated

Reviewers: joerg, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, mcrosier, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D5399

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 15:16:30 +00:00
Zoran Jovanovic
cb5fadfe6a [mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructions
Differential Revision: http://reviews.llvm.org/D6198


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221780 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 13:30:10 +00:00
Chandler Carruth
556578ec0c [x86] Start improving the matching of unpck instructions based on test
cases from Halide folks. This initial step was extracted from
a prototype change by Clay Wood to try and address regressions found
with Halide and the new vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221779 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 10:05:18 +00:00
Chandler Carruth
3baea18935 [x86] Clean up a bunch of vector shuffle tests with my script. Notably,
removes windows line endings and other noise. This is in prelude to
making substantive changes to these tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221776 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 09:17:15 +00:00
Elena Demikhovsky
5f9c438577 AVX-512: Intrinsics for ERI
Three instructions: vrcp28, vrsqrt28, and vexp2, in vector forms only.
Intrinsics include an SAE (Suppress All Exceptions) parameter.
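
A hedged sketch of how these are typically reached from C via immintrin.h; the
commit adds the LLVM-level intrinsics, so treat the C spellings and the SAE
argument below as assumptions (requires an AVX-512ER target, e.g. -mavx512er):

    #include <immintrin.h>

    /* 28-bit-accuracy estimates; the _round_ variants take an SAE argument
       such as _MM_FROUND_NO_EXC to suppress all exceptions. */
    __m512 recip28(__m512 a) { return _mm512_rcp28_round_ps(a, _MM_FROUND_NO_EXC); }
    __m512 rsqrt28(__m512 a) { return _mm512_rsqrt28_round_ps(a, _MM_FROUND_NO_EXC); }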

http://reviews.llvm.org/D6214

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221774 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 07:31:03 +00:00
Bill Schmidt
fc22bfd921 [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.
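
A hedged example of the programmer-facing usage via altivec.h (per the
companion Clang patch mentioned at the end of this message; the exact
prototypes here are an assumption):

    #include <altivec.h>

    /* vec_vsx_ld/vec_vsx_st are expected to select lxvd2x/stxvd2x for
       vector double, and lxvw4x/stxvw4x for 4 x 32-bit element types. */
    vector double copy_vec(const vector double *src, vector double *dst) {
        vector double v = vec_vsx_ld(0, src);
        vec_vsx_st(v, 0, dst);
        return v;
    }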

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 04:19:40 +00:00
Juergen Ributzka
4f0d671b97 [FastISel][AArch64] Add support for fabs intrinsic.
Lower the llvm.fabs intrinsic to the 'fabs' MI instruction.
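
A minimal sketch of source that reaches this path, assuming the usual lowering
of fabsf to llvm.fabs.f32 (the function name is illustrative):

    #include <math.h>

    /* At -O0 on AArch64, FastISel can now select llvm.fabs.f32 directly to
       the floating-point 'fabs' instruction instead of falling back. */
    float abs_f(float x) { return fabsf(x); }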

This fixes rdar://problem/18946552.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221729 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 23:10:44 +00:00
Tom Roeder
63dea2c952 Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221708 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 21:08:02 +00:00
Sanjay Patel
e7c966f067 Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21
cycle critical path for the rcp + mul + sub + mul + add estimate.
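
A sketch of the estimate-plus-refinement sequence referred to above, with one
Newton-Raphson step x1 = x0 * (2 - a * x0); this is illustrative C, not the
patch's actual codegen:

    #include <xmmintrin.h>

    /* rcpss gives a ~12-bit reciprocal estimate; one Newton-Raphson step
       roughly doubles the number of correct bits. */
    static inline float fast_recip(float a) {
        float x0 = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));  /* rcp           */
        return x0 * (2.0f - a * x0);                          /* mul, sub, mul */
    }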

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 20:51:00 +00:00
Rafael Espindola
612f7d7e00 Simplify testcase. NFC.
Thanks to Filipe Cabecinhas for the tip.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 20:49:16 +00:00
Bill Schmidt
10161a0cce [PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress.  Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time.  I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct.  Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results.  Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.
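
A hedged illustration of the kind of source that exercises this path (not the
commit's actual test): with -fPIC and the general-dynamic TLS model, each
access becomes a real call to __tls_get_addr, so the two stores below produce
two call sequences that must not overlap and must respect call clobbers.

    __thread int a;
    __thread int b;

    void set_both(int x, int y) {
        a = x;   /* call __tls_get_addr for a, then store */
        b = y;   /* a second, independent call sequence   */
    }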

The proper thing to do is to lower these into calls in the first
place.  This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation.  This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.

The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall().  In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information.  This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall.  We change the call opcode to CALL_TLS for this case.  Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.

During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes.  This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.

Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.

There are existing TLS tests to verify the changes haven't messed
anything up.  I've added one new test that verifies that the problem
with the original code has been fixed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221703 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 20:44:09 +00:00
Rafael Espindola
71c70733b7 Use an 8-bit immediate when possible.
This fixes pr21529.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221700 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 19:46:36 +00:00
Dario Domizioli
949d328bee [X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access
The ISel lowering for global TLS access in PIC mode was creating a pseudo 
instruction that is later expanded to a call, but the code was not 
setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack 
flag. This caused some functions to be mistakenly recognized as leaf functions,
and this in turn affected the decision to eliminate the frame pointer.

With the fix, hasCalls is properly set and the leaf frame pointer is correctly
preserved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221695 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 18:44:49 +00:00
Oliver Stannard
659b1491b8 LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.
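
A worked illustration of why the check matters (not from the patch): when a
boolean is represented as 0 / all-ones rather than as an i1, xor with 1 is not
a logical negation.

    #include <assert.h>
    #include <stdint.h>

    void demo(void) {
        int32_t t = -1;            /* "true" represented as all ones    */
        assert((t ^ 1)  == -2);    /* xor with 1 does not invert it     */
        assert((t ^ -1) == 0);     /* xor with all ones is the negation */
    }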

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 17:36:01 +00:00
Vasileios Kalintiris
328bc2f89e [mips] Add preliminary support for the MIPS II target.
Summary:
This patch enables code generation for the MIPS II target. Pre-Mips32
targets don't have the MUL instruction, so we add the corresponding
pattern that uses the MULT/MFLO combination in order to retrieve the
product.

This is WIP as we don't support code generation for select nodes due to
the lack of conditional-move instructions.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221686 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 11:43:55 +00:00
Andrea Di Biagio
d6548ad013 [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'.
This helps the DAGCombiner to identify more opportunities to fold shuffles.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221684 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 11:20:31 +00:00
Michael Kuperstein
f2fe3b72a9 [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
Recommitting - This time, with a hopefully working test.

Differential Revision: http://reviews.llvm.org/D6128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221672 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 07:07:40 +00:00
Quentin Colombet
8201185d61 [X86] Custom lower UINT_TO_FP from v4i32 to v4f32, and from v8i32 to v8f32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Although this lowering kicks in for some SPEC benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
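    // (Quoted from the commit message. It appears to assume OpenCL-style
    //  vector typedefs for uint4/float4, that correct() and test() wrap the
    //  old and new uint4 -> float4 lowerings, and a printf that understands
    //  the %vx / %vf vector formats.)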
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instructions.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

<rdar://problem/18153096>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221657 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 02:23:47 +00:00
Michael Kuperstein
dee48e7ad4 Reverting r221626 due to a too-strict test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221629 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 21:07:41 +00:00
Juergen Ributzka
1b9706b8c6 [AArch64][FastISel] Fix kill flags for integer extends.
In the case where we optimize an integer extend away and replace it directly
with the source register, we also have to clear all kill flags at all its uses.
This is necessary because the original IR instruction might be trivially dead,
but we replaced it with a nop at the MI level.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 21:05:31 +00:00
Michael Kuperstein
1a66dc7468 [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits.
See PR20494 for details.

Differential Revision: http://reviews.llvm.org/D6128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221626 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 20:40:21 +00:00
Zoran Jovanovic
c63c935a80 [mips][microMIPS] Fix issue with delay slot filler and microMIPS
Differential Revision: http://reviews.llvm.org/D6193


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221612 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 17:27:56 +00:00
Daniel Sanders
62c2faa216 [mips] Fix sret arguments for N32/N64 which were accidentally broken in r221534.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221604 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 15:57:53 +00:00
Matt Arsenault
79da624ab2 R600/SI: Fix broken check prefixes in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221565 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-08 00:02:57 +00:00
Daniel Sanders
fe2b8b1960 [mips] Promote i32 arguments to i64 for the N32/N64 ABI and fix <64-bit structs...
Summary:
... and after all that refactoring, it's possible to distinguish softfloat
floating point values from integers, so this patch no longer breaks softfloat
to do it.

Remove direct handling of i32's in the N32/N64 ABI by promoting them to
i64. This more closely reflects the ABI documentation and also fixes
problems with stack arguments on big-endian targets.

We now rely on signext/zeroext annotations (already generated by clang) and
the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero
extends.

It was not possible to convert three tests to use signext/zeroext. These tests
are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a
vector type so we just accept the sign extends here for now. These tests don't
pass the vectors the same way clang does (clang puts multiple elements in the
same argument, these map 1 element to 1 argument) so we don't need to worry too
much about it.

With this patch, all known N32/N64 bugs should be fixed and we now pass the
first 10,000 tests generated by ABITest.py.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6117


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221534 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-07 16:54:21 +00:00
Ahmed Bougacha
75da31ff15 [AArch64] Keep flags on condition vreg when instantiating a CB branch.
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this led to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.

Differential Revision: http://reviews.llvm.org/D6159


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221507 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-07 02:50:00 +00:00
Simon Pilgrim
de3d50643c [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2-source-operand folding tables instead of the 1-source-operand tables, and added the missing SSE/AVX versions.

Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.
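
A hedged sketch of the kind of pattern this affects (the function name is
illustrative): with the convert in the one-source-operand table, an aligned
load feeding it can be folded into a cvttps2dq memory operand.

    #include <emmintrin.h>

    __m128i trunc_to_int(const float *p) {
        /* the aligned load may be folded into cvttps2dq's memory operand */
        return _mm_cvttps_epi32(_mm_load_ps(p));
    }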

Differential Revision: http://reviews.llvm.org/D6001



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-06 22:15:41 +00:00