1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118135 91177308-0d34-0410-b5e6-96231b3b80d8
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117675 91177308-0d34-0410-b5e6-96231b3b80d8
allow target to correctly compute latency for cases where static scheduling
itineraries isn't sufficient. e.g. variable_ops instructions such as
ARM::ldm.
This also allows target without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115755 91177308-0d34-0410-b5e6-96231b3b80d8
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113570 91177308-0d34-0410-b5e6-96231b3b80d8
if a block is split (by a custom inserter), the insert point may be in a
different block than it was originally. This fixes 32-bit llvm-gcc
bootstrap builds, and I haven't been able to reproduce it otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108060 91177308-0d34-0410-b5e6-96231b3b80d8
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
fail to end up the top of the entry block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108039 91177308-0d34-0410-b5e6-96231b3b80d8
U utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U test/CodeGen/X86/fast-isel.ll
U test/CodeGen/X86/fast-isel-loads.ll
U include/llvm/Target/TargetLowering.h
U include/llvm/Support/PassNameParser.h
U include/llvm/CodeGen/FunctionLoweringInfo.h
U include/llvm/CodeGen/CallingConvLower.h
U include/llvm/CodeGen/FastISel.h
U include/llvm/CodeGen/SelectionDAGISel.h
U lib/CodeGen/LLVMTargetMachine.cpp
U lib/CodeGen/CallingConvLower.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U lib/CodeGen/SelectionDAG/FastISel.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U lib/CodeGen/SelectionDAG/TargetLowering.cpp
U lib/Target/XCore/XCoreISelLowering.cpp
U lib/Target/XCore/XCoreISelLowering.h
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86ISelLowering.h
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107987 91177308-0d34-0410-b5e6-96231b3b80d8
of getPhysicalRegisterRegClass with it.
If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107140 91177308-0d34-0410-b5e6-96231b3b80d8
original SDNode. This is badness. Also, this function allows one SDNode to point
multiple flags to another SDNode. Badness as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106793 91177308-0d34-0410-b5e6-96231b3b80d8
into the same node, but with different non-memory operands, we need to replace
the memory operands after it's finished morphing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106643 91177308-0d34-0410-b5e6-96231b3b80d8
pipeline stall. It's useful for targets like ARM cortex-a8. NEON has a lot
of long latency instructions so a strict register pressure reduction
scheduler does not work well.
Early experiments show this speeds up some NEON loops by over 30%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@104216 91177308-0d34-0410-b5e6-96231b3b80d8
Here is a theoretical example that illustrates why the placement is important.
tmp1 =
store tmp1 -> x
...
tmp2 = add ...
...
call
...
store tmp2 -> x
Now mem2reg comes along:
tmp1 =
dbg_value (tmp1 -> x)
...
tmp2 = add ...
...
call
...
dbg_value (tmp2 -> x)
When the debugger examine the value of x after the add instruction but before the call, it should have the value of tmp1.
Furthermore, for dbg_value's that reference constants, they should not be emitted at the beginning of the block (since they do not have "producers").
This patch also cleans up how SDISel manages DbgValue nodes. It allow a SDNode to be referenced by multiple SDDbgValue nodes. When a SDNode is deleted, it uses the information to find the SDDbgValues and invalidate them. They are not deleted until the corresponding SelectionDAG is destroyed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@99469 91177308-0d34-0410-b5e6-96231b3b80d8
to adding them in a determinstic order (bottom up from
the root) based on the structure of the graph itself.
This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?
CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear. Since it
is an unreduced mass of gnast, I just removed it.
This fixes PR6370
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@97023 91177308-0d34-0410-b5e6-96231b3b80d8
- Move DisableScheduling flag into TargetOption.h
- Move SDNodeOrdering into its own header file. Give it a minimal interface that
doesn't conflate construction with storage.
- Move assigning the ordering into the SelectionDAGBuilder.
This isn't used yet, so there should be no functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91727 91177308-0d34-0410-b5e6-96231b3b80d8
stuff isn't used just yet.
We want to model the GCC `-fno-schedule-insns' and `-fno-schedule-insns2'
flags. The hypothesis is that the people who use these flags know what they are
doing, and have hand-optimized the C code to reduce latencies and other
conflicts.
The idea behind our scheme to turn off scheduling is to create a map "on the
side" during DAG generation. It will order the nodes by how they appeared in the
code. This map is then used during scheduling to get the ordering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91392 91177308-0d34-0410-b5e6-96231b3b80d8
into MachineInstrs. This is mostly just moving the code from
ScheduleDAGSDNodesEmit.cpp into a new class. This decouples MachineInstr
emitting from scheduling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83699 91177308-0d34-0410-b5e6-96231b3b80d8
- Allocate MachineMemOperands and MachineMemOperand lists in MachineFunctions.
This eliminates MachineInstr's std::list member and allows the data to be
created by isel and live for the remainder of codegen, avoiding a lot of
copying and unnecessary translation. This also shrinks MemSDNode.
- Delete MemOperandSDNode. Introduce MachineSDNode which has dedicated
fields for MachineMemOperands.
- Change MemSDNode to have a MachineMemOperand member instead of its own
fields with the same information. This introduces some redundancy, but
it's more consistent with what MachineInstr will eventually want.
- Ignore alignment when searching for redundant loads for CSE, but remember
the greatest alignment.
Target-specific code which previously used MemOperandSDNodes with generic
SDNodes now use MemIntrinsicSDNodes, with opcodes in a designated range
so that the SelectionDAG framework knows that MachineMemOperand information
is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82794 91177308-0d34-0410-b5e6-96231b3b80d8