operand list instead of the operand list redundantly declared on the alias
or instruction.
With this change, we finally remove the ins/outs list on the alias. Before:
def : InstAlias<(outs GR16:$dst), (ins GR8:$src),
                "movsx $src, $dst",
                (MOVSX16rr8W GR16:$dst, GR8:$src)>;
After:
def : InstAlias<"movsx $src, $dst",
                (MOVSX16rr8W GR16:$dst, GR8:$src)>;
This also makes the alias mechanism more general and powerful, which will
be exploited in subsequent patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118329 91177308-0d34-0410-b5e6-96231b3b80d8
with a SimpleValueType, while an EVT supports equality and
inequality comparisons with SimpleValueType.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118169 91177308-0d34-0410-b5e6-96231b3b80d8
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
uses the calling convention stuff, I chose to make a mainly
low-impact change instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118167 91177308-0d34-0410-b5e6-96231b3b80d8
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops, since multi-latency instructions are completely executed
even when the predicate is false. Also, some instructions will be "slower"
when they are predicated, because the register def becomes an implicit input.
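A minimal sketch of the latency-based cost comparison that point 2 argues for; the types and helper below are hypothetical, not the actual if-converter interface:

#include <vector>

// Hypothetical sketch: estimate the cost of an if-converted block by
// summing instruction latencies rather than counting micro-ops, since a
// predicated multi-cycle instruction still occupies the pipeline even
// when its predicate is false.
struct InstrCostInfo {
  unsigned Latency;  // cycles to execute
  bool Predicated;   // will be predicated after conversion
};

unsigned blockCost(const std::vector<InstrCostInfo> &Block,
                   unsigned PredicationPenalty) {
  unsigned Cycles = 0;
  for (const InstrCostInfo &I : Block) {
    Cycles += I.Latency;
    // Some instructions get slower once predicated, because the
    // register def becomes an implicit input.
    if (I.Predicated)
      Cycles += PredicationPenalty;
  }
  return Cycles;
}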
rdar://8598427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118135 91177308-0d34-0410-b5e6-96231b3b80d8
"In32BitMode" and "In64BitMode" into tblgen, allow any
predicate that inherits from AssemblerPredicate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117831 91177308-0d34-0410-b5e6-96231b3b80d8
directives, allowing things like this:
def : MnemonicAlias<"pop", "popl">, Requires<[In32BitMode]>;
def : MnemonicAlias<"pop", "popq">, Requires<[In64BitMode]>;
Move the rest of the X86 MnemonicAliases over to the .td file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117830 91177308-0d34-0410-b5e6-96231b3b80d8
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from a spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
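A minimal sketch of the hoisting policy described above, with hypothetical names and a deliberately crude cost test:

// Hypothetical sketch: treat an instruction as "long latency" enough to
// hoist out of a loop even under register pressure, on the theory that a
// reload from a spill slot is cheaper than re-executing something like
// SQRT or DIV on every iteration.
bool worthHoisting(unsigned InstrLatency, unsigned ReloadLatency,
                   bool IncreasesPressure) {
  if (!IncreasesPressure)
    return true;  // free win: hoist unconditionally
  // Only accept extra pressure when the saved computation clearly
  // outweighs the cost of a possible spill/reload pair.
  return InstrLatency > ReloadLatency;
}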
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116781 91177308-0d34-0410-b5e6-96231b3b80d8
single object format can be shared.
This also adds support for
mov zed+(bar-foo), %eax
on ELF and COFF targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116675 91177308-0d34-0410-b5e6-96231b3b80d8
stuff that wants to take one or the other. These can both be used
as the operation of a dag in a pattern match.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115877 91177308-0d34-0410-b5e6-96231b3b80d8
allow targets to correctly compute latency for cases where static scheduling
itineraries aren't sufficient, e.g. variable_ops instructions such as
ARM::ldm.
This also allows targets without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
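Roughly, the new hook lets the target answer per-operand latency queries itself. The sketch below is a hypothetical, simplified model of how an ldm def's latency could vary with its position in the register list, something a static itinerary cannot express:

// Simplified sketch; names are loosely modeled on the per-operand latency
// query, but the types and the position-based model are illustrative only.
int operandLatency(unsigned DefIdxInRegList, unsigned NumRegsInList,
                   unsigned BaseLoadLatency) {
  if (DefIdxInRegList >= NumRegsInList)
    return -1;  // unknown operand: fall back to default latency
  // Rough model: each additional register loaded by an ldm completes
  // one cycle later than the previous one.
  return BaseLoadLatency + DefIdxInRegList;
}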
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115755 91177308-0d34-0410-b5e6-96231b3b80d8
stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide
more nuanced estimates in the future.
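The 90% constant reduces to a simple expected-cost computation; this snippet is illustrative, not the actual heuristic:

// Illustrative only: expected branch cost under a fixed 90% prediction
// rate ("branch predictors are good!").
double expectedBranchCost(double BaseCost, double MispredictPenalty) {
  const double PredictRate = 0.90;
  return PredictRate * BaseCost +
         (1.0 - PredictRate) * (BaseCost + MispredictPenalty);
}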
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115364 91177308-0d34-0410-b5e6-96231b3b80d8
or not. TableGen needs to generate the printInstruction() function as taking
an MCInst* or a MachineInstr*, depending. Default to the old non-MC
version so that everything not yet using MC continues to just work without
fiddling.
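The two generated entry points differ only in the instruction type they accept; the class and member names below are a sketch, not the exact generated code:

class MCInst;        // MC-layer instruction (new path)
class MachineInstr;  // CodeGen-layer instruction (legacy path)

// Targets that have moved to MC get the MCInst-based form; everyone
// else keeps the MachineInstr-based one.
struct NewStylePrinter {
  void printInstruction(const MCInst *MI);
};
struct OldStylePrinter {
  void printInstruction(const MachineInstr *MI);  // default, legacy path
};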
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115126 91177308-0d34-0410-b5e6-96231b3b80d8
MCStreamer to emit into instead of an MCInst to fill in. This allows the
matcher extra flexibility and is more convenient.
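A hypothetical before/after sketch of the interface change (names illustrative): instead of filling in a caller-provided MCInst, the matcher can now emit one instruction, several, or other directives straight into the streamer.

class MCInst;
class MCStreamer;

struct AsmMatcherBefore {
  // Old shape: the matcher fills in Inst; the caller emits it.
  bool MatchInstruction(/*operands,*/ MCInst &Inst);
};
struct AsmMatcherAfter {
  // New shape: the matcher emits whatever it wants into Out.
  bool MatchAndEmitInstruction(/*operands,*/ MCStreamer &Out);
};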
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115014 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114973 91177308-0d34-0410-b5e6-96231b3b80d8
support aligned comm. Detect when compiling for 10.4 and don't
emit an alignment for comm. This will hopefully fix PR8198.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114817 91177308-0d34-0410-b5e6-96231b3b80d8
passed the root of the match, even though only a few patterns
actually needed this (one in X86, several in ARM [which should
be refactored anyway], and some in CellSPU that I don't feel
like detangling). Instead of requiring all ComplexPatterns to
take the dead root, have targets opt into getting the root by
putting SDNPWantRoot on the ComplexPattern.
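On the C++ side, opting in changes the signature of the target's select function; the sketch below is illustrative, not the exact generated interface:

class SDNode;
class SDValue;

struct TargetISelSketch {
  // Default: the select function only sees the operand being matched.
  bool SelectAddr(SDValue N, SDValue &Base, SDValue &Offset);
  // With SDNPWantRoot on the ComplexPattern, the root of the match is
  // passed through as well.
  bool SelectAddrWithRoot(SDNode *Root, SDValue N,
                          SDValue &Base, SDValue &Offset);
};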
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114471 91177308-0d34-0410-b5e6-96231b3b80d8
into OptimizeCompareInstr.
This necessitates the passing of CmpValue around,
so widen the virtual functions to accommodate.
No functionality changes.
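The widened hooks have roughly this shape (a simplified sketch; the real declarations live in TargetInstrInfo):

class MachineInstr;

struct CompareHooksSketch {
  // Decompose a compare: which register it tests, against what value.
  virtual bool AnalyzeCompare(const MachineInstr *MI, unsigned &SrcReg,
                              int &CmpMask, int &CmpValue) const = 0;
  // Try to fold or remove the compare, given the analyzed operands;
  // CmpValue is the immediate now threaded through both calls.
  virtual bool OptimizeCompareInstr(MachineInstr *CmpInstr, unsigned SrcReg,
                                    int CmpMask, int CmpValue) const = 0;
  virtual ~CompareHooksSketch() = default;
};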
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114428 91177308-0d34-0410-b5e6-96231b3b80d8
the 'zero' bit down into the back-end. There are other cases where this logic
isn't sufficient, so they should be handled separately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113665 91177308-0d34-0410-b5e6-96231b3b80d8
iterator when an optimization took place. This allows us to do more insane
things with the code than just remove an instruction or two.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113640 91177308-0d34-0410-b5e6-96231b3b80d8
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. The if-converter should treat them as non-micro-coded
simple instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113570 91177308-0d34-0410-b5e6-96231b3b80d8
instruction in the class would be decoded to, or zero if the number of
uOPs must be determined dynamically.
This will be used to determine the cost-effectiveness of predicating a
micro-coded instruction.
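A sketch of how a client such as the if-converter might consume the new count (names and costs hypothetical):

// Zero means "determined dynamically"; any other value is the number of
// uOPs the instruction decodes to.
unsigned predicationCost(unsigned NumMicroOps, unsigned PerUopCost,
                         unsigned DynamicFallbackCost) {
  if (NumMicroOps == 0)
    return DynamicFallbackCost;  // micro-coded, length unknown statically
  return NumMicroOps * PerUopCost;
}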
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113513 91177308-0d34-0410-b5e6-96231b3b80d8
that like to randomly define things like "X86", regenerate autoconf bits
and update cmake.
Fixes PR7852.
Patch by Xerxes Rånby!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112499 91177308-0d34-0410-b5e6-96231b3b80d8
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats. This
affects two places in the code: handling cross block values and handling
function return and arguments. Since vectors are already widened by
LegalizeTypes, this gives us much better code and unblocks the x86-64 ABI
and SPU ABI work.
For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
  %B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
  %C = fadd <2 x float> %B, %B
  br label %BB
BB:
  %D = fadd <2 x float> %C, %C
  %E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  ret <4 x float> %E
}
Now compiles into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
addps %xmm0, %xmm0
ret
previously it compiled into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
pshufd $1, %xmm0, %xmm1
## kill: XMM0<def> XMM0<kill> XMM0<def>
insertps $0, %xmm0, %xmm0
insertps $16, %xmm1, %xmm0
addps %xmm0, %xmm0
ret
This implements rdar://8230384
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112101 91177308-0d34-0410-b5e6-96231b3b80d8
For now it's still a command line option, but the interface to the generic
code doesn't need to know that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111942 91177308-0d34-0410-b5e6-96231b3b80d8
Nothing fancy, just ask the target if any currently available base reg
is in range for the instruction under consideration and use the first one
that is. Placeholder ARM implementation simply returns false for now.
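A minimal sketch of that first-in-range query, with hypothetical types and an illustrative reach limit:

#include <cstdint>
#include <vector>

// Given the base registers already materialized for this local block,
// return the first one whose offset puts the frame reference within the
// instruction's addressing range; nullptr if none can reach it.
struct BaseReg {
  unsigned Reg;
  int64_t Offset;  // frame offset the base register resolves to
};

const BaseReg *findUsableBaseReg(const std::vector<BaseReg> &InUse,
                                 int64_t FrameOffset, int64_t MaxReach) {
  for (const BaseReg &B : InUse) {
    int64_t Delta = FrameOffset - B.Offset;
    if (Delta >= 0 && Delta <= MaxReach)
      return &B;  // first base register that can reach the slot
  }
  return nullptr;  // none in range; a new base register may be needed
}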
ongoing saga of rdar://8277890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111374 91177308-0d34-0410-b5e6-96231b3b80d8
the local block. Resolve references to those indices to a new base register.
For simplification and testing purposes, a new virtual base register is
allocated for each frame index being resolved. The result is truly horrible,
but correct, code that's good for exercising the new code paths.
Next up is adding thumb1 support, which should be very simple. Following that
will be adding base register re-use and implementing a reasonable ARM
heuristic for when a virtual base register should be generated at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111315 91177308-0d34-0410-b5e6-96231b3b80d8
whether to allocate a virtual frame base register to resolve the frame
index reference in it. Implement a simple version for ARM to aid debugging.
In LocalStackSlotAllocation, scan the function for frame index references
to local frame indices and ask the target whether to allocate virtual
frame base registers for any it encounters. Purely infrastructural for
debug output. Next step is to actually allocate base registers, then add
intelligent re-use of them.
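A simplified sketch of the scan; the hook name echoes the needsFrameBaseReg interface this commit adds, but the types and the placeholder policy are illustrative:

#include <cstdint>

struct FrameRef {
  int FrameIndex;
  int64_t Offset;  // offset of the reference within the local block
  bool IsLocal;    // does it refer to the local frame block?
};

// Hypothetical target hook: would this reference benefit from a
// virtual frame base register?
bool needsFrameBaseReg(const FrameRef &Ref) {
  // Placeholder policy: ask for a base register whenever the offset is
  // outside a made-up +/-1KB immediate range.
  return Ref.Offset < -1024 || Ref.Offset > 1023;
}

unsigned countBaseRegCandidates(const FrameRef *Refs, unsigned N) {
  unsigned Candidates = 0;
  for (unsigned i = 0; i != N; ++i)
    if (Refs[i].IsLocal && needsFrameBaseReg(Refs[i]))
      ++Candidates;  // purely counting: debug output only at this stage
  return Candidates;
}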
rdar://8277890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111262 91177308-0d34-0410-b5e6-96231b3b80d8
that many of these things, so the memory savings isn't significant,
and there are now situations where there can be alignments greater
than 128.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110836 91177308-0d34-0410-b5e6-96231b3b80d8
When splitting a live range, the new registers have fewer uses and the
permissible register class may be less constrained. Recompute the register class
constraint from the uses of new registers created for a split. This may let them
be allocated from a larger set, possibly avoiding a spill.
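Conceptually, the legal class for a new register is the intersection of what each remaining use can accept; with fewer uses after splitting, that intersection can only grow or stay the same. All types in this sketch are hypothetical:

#include <bitset>
#include <vector>

using RegClassSet = std::bitset<64>;  // one bit per candidate register

RegClassSet recomputeClass(const std::vector<RegClassSet> &UseConstraints,
                           RegClassSet Universe) {
  RegClassSet Allowed = Universe;
  for (const RegClassSet &C : UseConstraints)
    Allowed &= C;  // each use narrows the allowed set
  return Allowed;  // possibly larger than the pre-split class
}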
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110703 91177308-0d34-0410-b5e6-96231b3b80d8
relatively expensive comparison analyzer on each instruction. Also rename the
comparison analyzer method to something more in line with what it actually does.
This pass will eventually be folded into the Machine CSE pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110539 91177308-0d34-0410-b5e6-96231b3b80d8
Without this what was happening was:
* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110446 91177308-0d34-0410-b5e6-96231b3b80d8
need the Compare flag after all.
--- Reverse-merging r109901 into '.':
U include/llvm/Target/TargetInstrDesc.h
U include/llvm/Target/Target.td
U utils/TableGen/InstrInfoEmitter.cpp
U utils/TableGen/CodeGenInstruction.cpp
U utils/TableGen/CodeGenInstruction.h
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110424 91177308-0d34-0410-b5e6-96231b3b80d8
This pass tries to remove comparison instructions when possible. For instance,
if you have this code:
sub r1, 1
cmp r1, 0
bz L1
and "sub" either sets the same flag as the "cmp" instruction or could be
converted to set the same flag, then we can eliminate the "cmp" instruction
altogether. This is important for ARM, where the ALU instructions can set the
CPSR flag but need a special suffix ('s') to do so.
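A hypothetical sketch of the decision this pass makes for the example above: if the instruction defining the compared register already sets (or can be converted to set) the flags the compare would produce, the compare is dropped.

struct DefInstrInfo {
  bool SetsFlags;         // e.g. ARM "subs" vs plain "sub"
  bool CanBeFlagSetting;  // a flag-setting form exists (add the 's')
};

enum class Action { Keep, Erase, ConvertDefThenErase };

Action classifyCompare(const DefInstrInfo &DefOfCmpOperand) {
  if (DefOfCmpOperand.SetsFlags)
    return Action::Erase;                // cmp is redundant as-is
  if (DefOfCmpOperand.CanBeFlagSetting)
    return Action::ConvertDefThenErase;  // rewrite sub -> subs, drop cmp
  return Action::Keep;
}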
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110423 91177308-0d34-0410-b5e6-96231b3b80d8
later to identify and possibly remove superfluous compare instructions -- those
that are testing for and setting a status flag that should already be set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109901 91177308-0d34-0410-b5e6-96231b3b80d8
appropriate for targets without detailed instruction itineraries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situations; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
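The two-mode policy reduces to something like the following sketch (threshold and names hypothetical):

enum class Goal { ILP, ReducePressure };

// Aim for instruction-level parallelism while registers are plentiful;
// flip to pressure reduction once the limit is near.
Goal pickSchedulingGoal(unsigned CurrentPressure, unsigned PressureLimit) {
  return CurrentPressure < PressureLimit ? Goal::ILP
                                         : Goal::ReducePressure;
}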
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109300 91177308-0d34-0410-b5e6-96231b3b80d8
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109279 91177308-0d34-0410-b5e6-96231b3b80d8