"long latency" enough to hoist even if it may increase spilling. Reloading
a value from spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116781 91177308-0d34-0410-b5e6-96231b3b80d8
The old algorithm inserted a 'rotqmbyi' instruction which was
both redundant and wrong - it made shufb select bytes from the
wrong end of the input quad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116701 91177308-0d34-0410-b5e6-96231b3b80d8
have been printed with the "S" modifier after the predicate. With ARM's
unified syntax, they are supposed to go in the other order. We fixed this
for Thumb when we switched to unified syntax but missed changing it for
ARM. Apparently we don't generate these instructions often because no one
noticed until now. Thanks to Bill Wendling for the testcase!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116563 91177308-0d34-0410-b5e6-96231b3b80d8
virtual registers for those stores since RegAllocFast requires that each live
physreg only be used once.
This fixes PR8357.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116222 91177308-0d34-0410-b5e6-96231b3b80d8
alignment for PPC32/64, avoiding some masking operations.
llvm-gcc expands vaarg inline instead of using the instruction
so it has never hit this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116168 91177308-0d34-0410-b5e6-96231b3b80d8
1. Cortex-A8 load / store multiplies can only issue on ALU0.
2. Eliminate A8_Issue, A8_LSPipe will correctly limit the load / store issues.
3. Correctly model all vld1 and vld2 variants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116134 91177308-0d34-0410-b5e6-96231b3b80d8
callee-saved registers at the end of the lists. Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers will more likely be available when needed.
This change makes a huge improvement in spilling in some cases. Thanks to
Jakob for helping me realize the problem.
Most of this patch is fixing the testsuite. There are quite a few places
where we're checking for specific registers. I changed those to wildcards
in places where that doesn't weaken the tests. The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116055 91177308-0d34-0410-b5e6-96231b3b80d8
reapply: reimplement the second half of the or/add optimization. We should now
with no changes. Turns out that one missing "Defs = [EFLAGS]" can upset things
a bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116040 91177308-0d34-0410-b5e6-96231b3b80d8
only end up emitting LEA instead of OR. If we aren't able to promote
something into an LEA, we should never be emitting it as an ADD.
Add some testcases that we emit "or" in cases where we used to produce
an "add".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116026 91177308-0d34-0410-b5e6-96231b3b80d8
allow target to correctly compute latency for cases where static scheduling
itineraries isn't sufficient. e.g. variable_ops instructions such as
ARM::ldm.
This also allows target without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115755 91177308-0d34-0410-b5e6-96231b3b80d8
having to do a double cast (uint64_t --> double --> float). This is based on the algorithm from compiler_rt's __floatundisf
for X86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115634 91177308-0d34-0410-b5e6-96231b3b80d8