This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.
Vector type are not handled yet and the code falls back to target-independent
instruction selection for these cases.
This fixes rdar://problem/17907920.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216985 91177308-0d34-0410-b5e6-96231b3b80d8
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216932 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().
Fixes PR20828.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216908 91177308-0d34-0410-b5e6-96231b3b80d8
We have been using .init-array for most systems for quiet some time,
but tools like llc are still defaulting to .ctors because the old
option was never changed.
This patch makes llc default to .init-array and changes the option to
be -use-ctors.
Clang is not affected by this. It has its own fancier logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216905 91177308-0d34-0410-b5e6-96231b3b80d8
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216898 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.
Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.
Reviewers: eliben, meheff, Jiangning
Reviewed By: Jiangning
Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216862 91177308-0d34-0410-b5e6-96231b3b80d8
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.
This fixes rdar://problem/18183707.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216805 91177308-0d34-0410-b5e6-96231b3b80d8
When sinking an instruction it might be moved past the original last use of one
of its operands. This last use has the kill flag set and the verifier will
obviously complain about this.
Before Machine Sinking (AArch64):
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
After Machine Sinking:
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
This fix clears all the kill flags in all instruction that use the same operands
as the instruction that is being sunk.
This fixes rdar://problem/18180996.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216803 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).
Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.
Reviewers: majnemer
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5060
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216780 91177308-0d34-0410-b5e6-96231b3b80d8
We've rejected these kinds of functions since r28405 in 2006 because
it's impossible to lower the return of a callee cleanup varargs
function. However there are lots of legal ways to leave such a function
without returning, such as aborting. Today we can leave a function with
a musttail call to another function with the correct prototype, and
everything works out.
I'm removing the verifier check declaring that a normal return from such
a function is UB.
Reviewed By: nlewycky
Differential Revision: http://reviews.llvm.org/D5059
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216779 91177308-0d34-0410-b5e6-96231b3b80d8
This patch checks for DAG patterns that are an add or a sub followed by a
compare on 16 and 8 bit inputs. Since AArch64 does not support those types
natively they are legalized into 32 bit values, which means that mask operations
are inserted into the DAG to emulate overflow behaviour. In many cases those
masks do not change the result of the processing and just introduce a dependent
operation, often in the middle of a hot loop.
This patch detects the relevent DAG patterns and then tests to see if the
transforms are equivalent with and without the mask, removing the mask if
possible. The exact mechanism of this patch was discusses in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html
There is a reasonably good chance there are missed oppurtunities due to similiar
(but not identical) DAG patterns that could be funneled into this test, adding
them should be simple if we see test cases.
Tests included.
rdar://13754426
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216776 91177308-0d34-0410-b5e6-96231b3b80d8
The new solution is to not use this lowering if there are any dynamic
allocas in the current function. We know up front if there are dynamic
allocas, but we don't know if we'll need to create stack temporaries
with large alignment during lowering. Conservatively assume that we will
need such temporaries.
Reviewed By: hans
Differential Revision: http://reviews.llvm.org/D5128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216775 91177308-0d34-0410-b5e6-96231b3b80d8
When we select a trunc instruction we don't emit any code if the type is already
i32 or smaller. This is because the instruction that uses the truncated value
will deal with it.
This behavior can incorrectly transfer a kill flag, which was meant for the
result of the truncate, onto the source register.
%2 = trunc i32 %1 to i16
... = ... %2 -> ... = ... vreg1 <kill>
... = ... %1 ... = ... vreg1
This commit fixes this by emitting a COPY instruction, so that the result and
source register are distinct virtual registers.
This fixes rdar://problem/18178188.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216750 91177308-0d34-0410-b5e6-96231b3b80d8
A bug in r216725 meant we tried to discover the type of a SETCC before
confirming the node actually was a SETCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216734 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instead of specifying the alignment as metadata which may be destroyed by
transformation passes, make the alignment the second argument to ldu/ldg
intrinsic calls.
Test Plan:
ldu-ldg.ll
ldu-i8.ll
ldu-reg-plus-offset.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: meheff, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D5093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216731 91177308-0d34-0410-b5e6-96231b3b80d8
While working on a Thumb-2 code size optimization I just realized that we don't have any regression tests for it.
So here's a first test case, I plan to increase the coverage over time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216728 91177308-0d34-0410-b5e6-96231b3b80d8
In an llvm-stress generated test, we were trying to create a v0iN type and
asserting when that failed. This case could probably be handled by the
function, but not without added complexity and the situation it arises in is
sufficiently odd that there's probably no benefit anyway.
Should fix PR20775.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216725 91177308-0d34-0410-b5e6-96231b3b80d8
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.
Reviewed by Elena Demikhovsky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.
This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216716 91177308-0d34-0410-b5e6-96231b3b80d8
This fix checks first if the instruction to be folded (e.g. sign-/zero-extend,
or shift) is in the same machine basic block as the instruction we are folding
into.
Not doing so can result in incorrect code, because the value might not be
live-out of the basic block, where the value is defined.
This fixes rdar://problem/18169495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216700 91177308-0d34-0410-b5e6-96231b3b80d8
The AArch64 target lowering for [zs]ext of vectors is set up to handle
input simple types and expects the generic SDag path to do something reasonable
with anything that's not a simple type. The code, however, was only
checking that the result type was a simple type and assuming that
implied that the source type would also be a simple type. That's not a
valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>"
demonstrate. The fix is to simply explicitly validate the source type
as well as the result type.
PR20791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216689 91177308-0d34-0410-b5e6-96231b3b80d8
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the context of the
section.
Since private GVs are the ones the get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216682 91177308-0d34-0410-b5e6-96231b3b80d8
file.
Changing code that is covered by these tests is just too hard to debug
currently, and now it will be clear the nature of the changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216643 91177308-0d34-0410-b5e6-96231b3b80d8
The included test case would fail, because the MI PHI node would have two
operands from the same predecessor.
This problem occurs when a switch instruction couldn't be selected. This happens
always, because there is no default switch support for FastISel to begin with.
The problem was that FastISel would first add the operand to the PHI nodes and
then fall-back to SelectionDAG, which would then in turn add the same operands
to the PHI nodes again.
This fix removes these duplicate PHI node operands by reseting the
PHINodesToUpdate to its original state before FastISel tried to select the
instruction.
This fixes <rdar://problem/18155224>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216640 91177308-0d34-0410-b5e6-96231b3b80d8
Currently instructions are folded very aggressively for AArch64 into the memory
operation, which can lead to the use of killed operands:
%vreg1<def> = ADDXri %vreg0<kill>, 2
%vreg2<def> = LDRBBui %vreg0, 2
... = ... %vreg1 ...
This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.
This fix teaches hasTrivialKill to not only check the LLVM IR that the value has
a single use, but also to check if the register that represents that value has
already been used. This can happen when the instruction with the use was folded
into another instruction (in this particular case a load instruction).
This fixes rdar://problem/18142857.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216634 91177308-0d34-0410-b5e6-96231b3b80d8
Currently instructions are folded very aggressively into the memory operation,
which can lead to the use of killed operands:
%vreg1<def> = ADDXri %vreg0<kill>, 2
%vreg2<def> = LDRBBui %vreg0, 2
... = ... %vreg1 ...
This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.
If the computed address is used by only memory operations in the same basic
block, then it is safe to fold them. This is because all memory operations will
fold the address computation and the original computation will never be emitted.
This fixes rdar://problem/18142857.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216629 91177308-0d34-0410-b5e6-96231b3b80d8
When the address comes directly from a shift instruction then the address
computation cannot be folded into the memory instruction, because the zero
register is not available as a base register. Simplify addess needs to emit the
shift instruction and use the result as base register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216621 91177308-0d34-0410-b5e6-96231b3b80d8
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216617 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216555 91177308-0d34-0410-b5e6-96231b3b80d8
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.
This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8