Note: This reapplies r215582 without any modifications. The refactoring wasn't
responsible for the buildbot failures.
Original commit message:
Cleanup and prepare constant materialization code for future commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215752 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.
<rdar://problem/17991614>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215600 91177308-0d34-0410-b5e6-96231b3b80d8
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.
For Example:
lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3]
ldr x0, [x0, x1]
sxtw x1, w1
lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3]
ldr x0, [x0, x1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215597 91177308-0d34-0410-b5e6-96231b3b80d8
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.
Fixes <rdar://problem/17924413>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215591 91177308-0d34-0410-b5e6-96231b3b80d8
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215587 91177308-0d34-0410-b5e6-96231b3b80d8
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215558 91177308-0d34-0410-b5e6-96231b3b80d8
The combiner ignored DBG nodes when checking
the uses of a virtual register.
It combined a sequence like
%vreg1 = madd %vreg2, %vreg3,...
DBG_VALUE (%vreg1 ...)
%vreg4 = add %vreg1,...
to
%vreg4 = madd %vreg2, %vreg3
leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215431 91177308-0d34-0410-b5e6-96231b3b80d8
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215233 91177308-0d34-0410-b5e6-96231b3b80d8
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.
This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.
Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).
This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215199 91177308-0d34-0410-b5e6-96231b3b80d8
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215154 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers
The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215151 91177308-0d34-0410-b5e6-96231b3b80d8
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215111 91177308-0d34-0410-b5e6-96231b3b80d8
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.
This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214957 91177308-0d34-0410-b5e6-96231b3b80d8
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214859 91177308-0d34-0410-b5e6-96231b3b80d8
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to only create a sub-register copy for i64 value types
and all other types (i1/i8/i16/i32) just use the source register without any
modifications.
getRegClassFor() is now guarded by the i64 value type check, that guarantees
that we always request a register for a valid value type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214848 91177308-0d34-0410-b5e6-96231b3b80d8
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.
This should cover most of the trivial cases without falling back to
SelectionDAG.
This fixes <rdar://problem/17890986>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214846 91177308-0d34-0410-b5e6-96231b3b80d8
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214845 91177308-0d34-0410-b5e6-96231b3b80d8
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
sequence on AArch64
Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214832 91177308-0d34-0410-b5e6-96231b3b80d8
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.
The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.
The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.
This fixes <rdar://problem/17907720>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214788 91177308-0d34-0410-b5e6-96231b3b80d8
scalar integer instruction pass.
This is a patch I had lying around from a few months ago. The pass is
currently disabled by default, so nothing to interesting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214779 91177308-0d34-0410-b5e6-96231b3b80d8
sequence - AArch64 target support
This patch turns off madd/msub generation in the DAGCombiner and generates
them in the MachineCombiner instead. It replaces the original code sequence
with the combined sequence when it is beneficial to do so.
When there is no machine model support it always generates the madd/msub
instruction. This is true also when the objective is to optimize for code
size: when the combined sequence is shorter is always chosen and does not
get evaluated.
When there is a machine model the combined instruction sequence
is evaluated for critical path and resource length using machine
trace metrics and the original code sequence is replaced when it is
determined to be faster.
rdar://16319955
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214669 91177308-0d34-0410-b5e6-96231b3b80d8