This is similar to the -unroll-threshold option. There should be no change in
behavior when -tail-dup-size is not explicit on the llc command line.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124564 91177308-0d34-0410-b5e6-96231b3b80d8
This happens all the time when a smul is promoted to a larger type.
On x86-64 we now compile "int test(int x) { return x/10; }" into
movslq %edi, %rax
imulq $1717986919, %rax, %rax
movq %rax, %rcx
shrq $63, %rcx
sarq $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
addl %ecx, %eax
This fires 96 times in gcc.c on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124559 91177308-0d34-0410-b5e6-96231b3b80d8
This happens e.g. for code like "X - X%10" where we lower the modulo operation
to a series of multiplies and shifts that are then subtracted from X, leading to
this missed optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124532 91177308-0d34-0410-b5e6-96231b3b80d8
rdar://problem/8893967: JM/lencod miscompile at -arch armv7 -mthumb -O3
Added ResurrectKill to remove kill flags after we decide to reused a
physical register. And (hopefully) ensure that we call it in all the
right places.
Sorry, I'm not checking in a unit test given that it's a miscompile I
can't reproduce easily with a toy example. Failures in the rewriter
depend on a series of heuristic decisions maked during one of the many
upstream phases in codegen. This case would require coercing regalloc
to generate a couple of rematerialzations in a way that causes the
scavenger to reuse the same register at just the wrong point.
The general way to test this is to implement kill flags
verification. Then we could have a simple, robust compile-only unit
test. That would be worth doing if the whole pass was not about to
disappear. At this point we focus verification work on the next
generation of regalloc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124442 91177308-0d34-0410-b5e6-96231b3b80d8
Linear scan regalloc is currently assuming that any register aliased with
a member of a regclass must also be in at least one regclass. That is not
always true. For example, for X86, RIP is in a regclass but IP is not.
If you're unlucky, this can cause a crash by invalidating the iterator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124365 91177308-0d34-0410-b5e6-96231b3b80d8
default implementation for x86, going through the stack in a similr
fashion to how the codegen implements BUILD_VECTOR. Eventually this
will get matched to VINSERTF128 if AVX is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124307 91177308-0d34-0410-b5e6-96231b3b80d8
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similr fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124292 91177308-0d34-0410-b5e6-96231b3b80d8
clang's -Wuninitialized-experimental warning.
While these don't look like real bugs, clang's
-Wuninitialized-experimental analysis is stricter
than GCC's, and these fixes have the benefit
of being general nice cleanups.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124073 91177308-0d34-0410-b5e6-96231b3b80d8
DAG. Disable using "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered. i.e. two independent 4
cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123971 91177308-0d34-0410-b5e6-96231b3b80d8
flags. They are still not enable in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123969 91177308-0d34-0410-b5e6-96231b3b80d8
The value mapping gets confused about which original values have multiple new
definitions so they may need phi insertions.
This could probably be simplified by letting enterIntvBefore() take a live range
to be added following the instruction. As long as the range stays inside the
same basic block, value mapping shouldn't be a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123926 91177308-0d34-0410-b5e6-96231b3b80d8
to add/sub by doing the normal operation and then checking for overflow
afterwards. This generally relies on the DAG handling the later invalid
operations as well.
Fixes the 64-bit part of rdar://8622122 and rdar://8774702.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123908 91177308-0d34-0410-b5e6-96231b3b80d8
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allow machine LICM to hoist the set of
instructions out of the loop and make it possible to CSE them. It's a bit
hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123905 91177308-0d34-0410-b5e6-96231b3b80d8
Added a check for already live regs before claiming HighRegPressure.
Fixed a few cases of checking the wrong number of successors.
Added some tracing until these heuristics are better understood.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123892 91177308-0d34-0410-b5e6-96231b3b80d8