For now, just save the compile time since the ConvergingScheduler
heuristics don't use this analysis. We'll probably enable it later
after compile-time investigation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178822 91177308-0d34-0410-b5e6-96231b3b80d8
Pass down the fact that an operand is going to be a vector of constants.
This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178809 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.
With this change we differentiate between shifts of constants and other shifts.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
On certain architectures we can support efficient vectorized version of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86.
We can efficiently support
for (i = 0 ; i < ; i += 4)
w[0:3] = v[0:3] << <2, 2, 2, 2>
but not
for (i = 0; i < ; i += 4)
w[0:3] = v[0:3] << x[0:3]
This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.
Targets can then choose to return a different cost for instructions with such
operand values.
A follow-up commit will test this feature on x86.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178807 91177308-0d34-0410-b5e6-96231b3b80d8
There is a difference for FORM_ref_addr between DWARF 2 and DWARF 3+.
Since Eric is against guarding DWARF 2 ref_addr with DarwinGDBCompat, we are
still in discussion on how to handle this.
The correct solution is to update our header to say version 4 instead of version
2 and update tool chains as well.
rdar://problem/13559431
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178806 91177308-0d34-0410-b5e6-96231b3b80d8
BCL is normally a conditional branch-and-link instruction, but has
an unconditional form (which is used in the SjLj code, for example).
To make clear that this BCL instruction definition is specifically
the special unconditional form (which does not meaningfully take
a condition-register input), rename it to BCLalways.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178803 91177308-0d34-0410-b5e6-96231b3b80d8
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178801 91177308-0d34-0410-b5e6-96231b3b80d8
It fixes following tests for Hexagon:
CodeGen/Generic/2003-07-29-BadConstSbyte.ll
CodeGen/Generic/2005-10-21-longlonggtu.ll
CodeGen/Generic/2009-04-28-i128-cmp-crash.ll
CodeGen/Generic/MachineBranchProb.ll
CodeGen/Generic/builtin-expect.ll
CodeGen/Generic/pr12507.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178794 91177308-0d34-0410-b5e6-96231b3b80d8
OpndPtrs stored pointers into the Opnd vector that became invalid when the
vector grows. Store indices instead. Sadly I only have a large testcase that
only triggers under valgrind, so I didn't include it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178793 91177308-0d34-0410-b5e6-96231b3b80d8
At the time when the XCore backend was added there were some issues with
with overlapping register classes but these all seem to be fixed now.
Describing the register classes correctly allow us to get rid of a
codegen only instruction (LDAWSP_lru6_RRegs) and it means we can
disassemble ru6 instructions that use registers above r11.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178782 91177308-0d34-0410-b5e6-96231b3b80d8
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.
Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.
Also process blocks in a reverse post-order and propagate high-latency
flags to successors.
<rdar://problem/13468102>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178773 91177308-0d34-0410-b5e6-96231b3b80d8
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178737 91177308-0d34-0410-b5e6-96231b3b80d8
the target system.
It was hard-coded to 4 bytes before. I can't get llvm to generate a
ref_addr on a reasonably sized testing case.
rdar://problem/13559431
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178722 91177308-0d34-0410-b5e6-96231b3b80d8
Cleaned up trailing whitespace and added extra slashes in front of a
function level comment so that it follow the convention of having 3
slashes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178712 91177308-0d34-0410-b5e6-96231b3b80d8
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.
For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.
radar://13130673
radar://13537826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
I discussed this with Bill Schmidt on IRC, and it was decided that this is a
safe and reasonable default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178659 91177308-0d34-0410-b5e6-96231b3b80d8
This patch follows up on work done by Bill Schmidt in r178277,
and replaces most of the remaining uses of VRRC in ISEL DAG patterns.
The resulting .inc files are identical except for comments, so
no change in code generation is expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178656 91177308-0d34-0410-b5e6-96231b3b80d8
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178639 91177308-0d34-0410-b5e6-96231b3b80d8
when getting the host processor information. It emits a .byte sequence on GNUC compilers to work around lack of xgetbv support with older assemblers, and resolves a comment typo found in the previous patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178636 91177308-0d34-0410-b5e6-96231b3b80d8
It's a bit of churn in the blame log, but I think there are real benefits to
the newer system so I'm making the change in one go.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178633 91177308-0d34-0410-b5e6-96231b3b80d8
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178621 91177308-0d34-0410-b5e6-96231b3b80d8
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178617 91177308-0d34-0410-b5e6-96231b3b80d8