allow target to correctly compute latency for cases where static scheduling
itineraries isn't sufficient. e.g. variable_ops instructions such as
ARM::ldm.
This also allows target without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115755 91177308-0d34-0410-b5e6-96231b3b80d8
stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide
more nuanced estimates in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115364 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114973 91177308-0d34-0410-b5e6-96231b3b80d8
into OptimizeCompareInstr.
This necessitates the passing of CmpValue around,
so widen the virtual functions to accomodate.
No functionality changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114428 91177308-0d34-0410-b5e6-96231b3b80d8
the 'zero' bit down into the back-end. There are other cases where this logic
isn't sufficient, so they should be handled separately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113665 91177308-0d34-0410-b5e6-96231b3b80d8
iterator when an optimization took place. This allows us to do more insane
things with the code than just remove an instruction or two.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113640 91177308-0d34-0410-b5e6-96231b3b80d8
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113570 91177308-0d34-0410-b5e6-96231b3b80d8
instruction in the class would be decoded to. Or zero if the number of
uOPs must be determined dynamically.
This will be used to determine the cost-effectiveness of predicating a
micro-coded instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113513 91177308-0d34-0410-b5e6-96231b3b80d8
(I discovered 2 more copies of the ARM instruction format list, bringing the
total to 4!! Two of them were already out of sync. I haven't yet gotten into
the disassembler enough to know the best way to fix this, but something needs
to be done.) Add support for encoding these instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110754 91177308-0d34-0410-b5e6-96231b3b80d8
relatively expensive comparison analyzer on each instruction. Also rename the
comparison analyzer method to something more in line with what it actually does.
This pass is will eventually be folded into the Machine CSE pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110539 91177308-0d34-0410-b5e6-96231b3b80d8
This pass tries to remove comparison instructions when possible. For instance,
if you have this code:
sub r1, 1
cmp r1, 0
bz L1
and "sub" either sets the same flag as the "cmp" instruction or could be
converted to set the same flag, then we can eliminate the "cmp" instruction all
together. This is a important for ARM where the ALU instructions could set the
CPSR flag, but need a special suffix ('s') to do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110423 91177308-0d34-0410-b5e6-96231b3b80d8
ARM/PPC/MSP430-specific code (which are the only targets that
implement the hook) can directly reference their target-specific
instrinfo classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109171 91177308-0d34-0410-b5e6-96231b3b80d8
The only folding these load/store architectures can do is converting COPY into a
load or store, and the target independent part of foldMemoryOperand already
knows how to do that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108099 91177308-0d34-0410-b5e6-96231b3b80d8
void t(int *cp0, int *cp1, int *dp, int fmd) {
int c0, c1, d0, d1, d2, d3;
c0 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
c1 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
/* ... */
}
It code gens into something pretty bad. But with this change (analogous to the
X86 back-end), it will use ldm and generate few instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106693 91177308-0d34-0410-b5e6-96231b3b80d8
- This fixed a number of bugs in if-converter, tail merging, and post-allocation
scheduler. If-converter now runs branch folding / tail merging first to
maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
register which is read by instructions in the IT block.
- Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't
change the instruction ordering in the IT block (since IT mask has been
finalized). It also ensures no other instructions can be scheduled between
instructions in the IT block.
This is not yet enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106344 91177308-0d34-0410-b5e6-96231b3b80d8
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106243 91177308-0d34-0410-b5e6-96231b3b80d8
the machine instruction representation of the immediate value to be encoded
into an integer with similar fields as the actual VMOV instruction. This makes
things easier for the disassembler, since it can just stuff the bits into the
immediate operand, but harder for the asm printer since it has to decode the
value to be printed. Testcase for the encoding will follow later when MC has
more support for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@105836 91177308-0d34-0410-b5e6-96231b3b80d8
instruction defines subregisters.
Any existing subreg indices on the original instruction are preserved or
composed with the new subreg index.
Also substitute multiple operands mentioning the original register by using the
new MachineInstr::substituteRegister() function. This is necessary because there
will soon be <imp-def> operands added to non read-modify-write partial
definitions. This instruction:
%reg1234:foo = FLAP %reg1234<imp-def>
will reMaterialize(%reg3333, bar) like this:
%reg3333:bar-foo = FLAP %reg333:bar<imp-def>
Finally, replace the TargetRegisterInfo pointer argument with a reference to
indicate that it cannot be NULL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@105358 91177308-0d34-0410-b5e6-96231b3b80d8
room for it. This is in preparation for another patch which is adding NEON
subformats to facilitate disassembly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@98967 91177308-0d34-0410-b5e6-96231b3b80d8
- Eliminate TargetInstrInfo::isIdentical and replace it with produceSameValue. In the default case, produceSameValue just checks whether two machine instructions are identical (except for virtual register defs). But targets may override it to check for unusual cases (e.g. ARM pic loads from constant pools).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@97628 91177308-0d34-0410-b5e6-96231b3b80d8
for all the processors where I have tried it, and even when it might not help
performance, the cost is quite low. The opportunities for duplicating
indirect branches are limited by other factors so code size does not change
much due to tail duplicating indirect branches aggressively.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90144 91177308-0d34-0410-b5e6-96231b3b80d8
Make tail duplication of indirect branches much more aggressive (for targets
that indicate that it is profitable), based on further experience with
this transformation. I compiled 3 large applications with and without
this more aggressive tail duplication and measured minimal changes in code
size. ("size" on Darwin seems to round the text size up to the nearest
page boundary, so I can only say that any code size increase was less than
one 4k page.) Radar 7421267.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89814 91177308-0d34-0410-b5e6-96231b3b80d8
than doing the same via constpool:
1. Load from constpool costs 3 cycles on A9, movt/movw pair - just 2.
2. Load from constpool might stall up to 300 cycles due to cache miss.
3. Movt/movw does not use load/store unit.
4. Less constpool entries => better compiler performance.
This is only enabled on ELF systems, since darwin does not have needed
relocations (yet).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89720 91177308-0d34-0410-b5e6-96231b3b80d8
contents of the block to be duplicated. Use this for ARM Cortex A8/9 to
be more aggressive tail duplicating indirect branches, since it makes it
much more likely that they will be predicted in the branch target buffer.
Testcase coming soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89187 91177308-0d34-0410-b5e6-96231b3b80d8
- If destination is a physical register and it has a subreg index, use the
sub-register instead.
This fixes PR5423.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88745 91177308-0d34-0410-b5e6-96231b3b80d8