This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.
Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown
Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup
Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205658 91177308-0d34-0410-b5e6-96231b3b80d8
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205649 91177308-0d34-0410-b5e6-96231b3b80d8
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205630 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the
processor which are used to select between the 1985 and 2008 versions of IEEE
754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid
operation exceptions when given NaN), in 2008 mode they are non-arithmetic
(i.e. they are copies).
nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because
the ISA spec does not explicitly state that they obey Has2008 and ABS2008.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3274
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205628 91177308-0d34-0410-b5e6-96231b3b80d8
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still need scalarising because of that v1i1. There was no code
to do this though.
AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occuring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205626 91177308-0d34-0410-b5e6-96231b3b80d8
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.
Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.
Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).
Should fix PR19335.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205625 91177308-0d34-0410-b5e6-96231b3b80d8
Removed "GNU Assembler extension (compatibility)" definitions from ARMInstrInfo.td
Fixed ARMAsmParser::ParseInstruction GNU compatability branch, so it also works for thumb mode from now.
Added new tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205622 91177308-0d34-0410-b5e6-96231b3b80d8
Sorry for the breakage.
For now, it will fail in two ways:
1. To fail for targeting x86_64-mingw32.
<stdin>:131:8: note: possible intended match here
0x30830a0100000002 3 0 1 0 0 is_stmt
2. To fail not to find the target x86.
llc: : error: unable to get target for 'x86_64-unknown-unknown',
see --version and --triple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205621 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.
It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.
It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.
This should also fix PR19331.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205616 91177308-0d34-0410-b5e6-96231b3b80d8
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.
Should fix PR19332.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205615 91177308-0d34-0410-b5e6-96231b3b80d8
recoloring cut-offs are encountered and register allocation failed.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205601 91177308-0d34-0410-b5e6-96231b3b80d8
I am not sure how to get a relocation in a .dylib, but this function would
return the wrong value if passed one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205592 91177308-0d34-0410-b5e6-96231b3b80d8
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.
Patch by Jingyue Wu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205571 91177308-0d34-0410-b5e6-96231b3b80d8
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
%X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32
These instructions are live, and their definitions should not be rewritten.
Fixes <rdar://problem/16492408>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205565 91177308-0d34-0410-b5e6-96231b3b80d8
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4
Fixes Arithm/Absdiff.Mat/3 OpenCV test
Patch by: Bruno Jiménez
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205562 91177308-0d34-0410-b5e6-96231b3b80d8
these is very much off and is more than just the branch
from this bug incorrect:
Address Line Column File ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x30830a0100000002 3 0 1 0 0 is_stmt
0x30830a0100000008 3 0 1 0 0 is_stmt end_sequence
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205551 91177308-0d34-0410-b5e6-96231b3b80d8
More updating of tests to be explicit about the target triple rather than
relying on the default target triple supporting ARM mode.
Indicate to lit that object emission is not yet available for Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205545 91177308-0d34-0410-b5e6-96231b3b80d8