The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.
This fixes a typo in the opcode field of the original patch, which
should make the legact JIT work again (& adds test for that problem).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183068 91177308-0d34-0410-b5e6-96231b3b80d8
Namely, check if the target allows to fold more that one register in the
addressing mode and if yes, adjust the cost accordingly.
Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.
<rdar://problem/13973908>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183021 91177308-0d34-0410-b5e6-96231b3b80d8
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts a
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182991 91177308-0d34-0410-b5e6-96231b3b80d8
The pattern the test originally checked for doesn't occur on other -mcpu
settings. On atom it's still there though slightly differently scheduled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182933 91177308-0d34-0410-b5e6-96231b3b80d8
This test was failing on some hosts when an unexpected register was used for a
variable. This just extends the regexp to allow the new x86-64 registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182929 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.
Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182928 91177308-0d34-0410-b5e6-96231b3b80d8
In fact, we're probably going to support these flags in completely different
ways. So this test is no longer valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182899 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.
Patch by Xiaoyi Guo!
This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182885 91177308-0d34-0410-b5e6-96231b3b80d8
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite".
This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182739 91177308-0d34-0410-b5e6-96231b3b80d8
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.
Added analysis and code generation tests. Added documentation for the
new attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182638 91177308-0d34-0410-b5e6-96231b3b80d8
Allow LLVM to take advantage of shift instructions that set the ZF flag,
making instructions that test the destination superfluous.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182454 91177308-0d34-0410-b5e6-96231b3b80d8
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182184 91177308-0d34-0410-b5e6-96231b3b80d8
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.
We still prefer palignr over psrldq because it has higher throughput on
sandybridge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182102 91177308-0d34-0410-b5e6-96231b3b80d8
Increase the number of instructions LLVM recognizes as setting the ZF
flag. This allows us to remove test instructions that redundantly
recalculate the flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181937 91177308-0d34-0410-b5e6-96231b3b80d8
IR optimisation passes can result in a basic block that contains:
llvm.lifetime.start(%buf)
...
llvm.lifetime.end(%buf)
...
llvm.lifetime.start(%buf)
Before this change, calculateLiveIntervals() was ignoring the second
lifetime.start() and was regarding %buf as being dead from the
lifetime.end() through to the end of the basic block. This can cause
StackColoring to incorrectly merge %buf with another stack slot.
Fix by removing the incorrect Starts[pos].isValid() and
Finishes[pos].isValid() checks.
Just doing:
Starts[pos] = Indexes->getMBBStartIdx(MBB);
Finishes[pos] = Indexes->getMBBEndIdx(MBB);
unconditionally would be enough to fix the bug, but it causes some
test failures due to stack slots not being merged when they were
before. So, in order to keep the existing tests passing, treat LiveIn
and LiveOut separately rather than approximating the live ranges by
merging LiveIn and LiveOut.
This fixes PR15707.
Patch by Mark Seaborn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181922 91177308-0d34-0410-b5e6-96231b3b80d8
We generate a `push' of a random register (%rax) if the stack needs to be
aligned by the size of that register. However, this could mess up compact unwind
generation. In particular, we want to still generate compact unwind in the
presence of this monstrosity.
Check if the push of of the %rax/%eax register. If it is and it's marked with
the `FrameSetup' flag, then we can generate a compact unwind encoding for the
function only if the push is the last FrameSetup instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181540 91177308-0d34-0410-b5e6-96231b3b80d8
Fold (xor (and x, y), y) -> (and (not x), y)
This removes an opportunity for a constant to appear twice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181395 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISelLowering has support to treat:
(icmp ne (and (xor %flags, -1), (shl 1, flag)), 0)
as if it were actually:
(icmp eq (and %flags, (shl 1, flag)), 0)
However, r179386 has code at the InstCombine level to handle this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181145 91177308-0d34-0410-b5e6-96231b3b80d8
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.
rdar://problem/13658587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180816 91177308-0d34-0410-b5e6-96231b3b80d8
- Revise previous patches of the same purpose by fixing
*) grep <PA> | not grep <PB> semantically is not the same as
CHECK: <PA>{{^<PB>.*$}} as the former will check all occurrences of <PA>
while the later only check the first match. As the result, CHECK needs
putting in all place where <PA> occurs.
*) grep <PA> | count <N> needs a final CHECK-NOT of the same pattern.
(As 'CHECK-<N>' is proposed for discussion, converting 'grep | count <N>'
where N > 1 is postponed.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180742 91177308-0d34-0410-b5e6-96231b3b80d8
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180597 91177308-0d34-0410-b5e6-96231b3b80d8
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180573 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179996 91177308-0d34-0410-b5e6-96231b3b80d8
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179836 91177308-0d34-0410-b5e6-96231b3b80d8
In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and
handled by the DAG-based ISel. However, X86FastISel::X86SelectLoad() makes
no such requirement. There doesn't appear to be an x86 architectural
correctness issue with allowing potentially unaligned store instructions.
This patch removes this restriction.
Patch by Jim Stichnot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179774 91177308-0d34-0410-b5e6-96231b3b80d8
for the sdiv/srem/udiv/urem bitcode instructions. This is done for the i8,
i16, and i32 types, as well as i64 for the x86_64 target.
Patch by Jim Stichnoth
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179715 91177308-0d34-0410-b5e6-96231b3b80d8
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179449 91177308-0d34-0410-b5e6-96231b3b80d8
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem. When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179353 91177308-0d34-0410-b5e6-96231b3b80d8
As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
vector select could be simplified to AND/OR or removed if one or both values
being selected is all 0s or all 1s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179267 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is revised based on patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.
- SETCC_CARRY
- value is truncated to i1 with AND
As a by-product, PR5443 is also fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179265 91177308-0d34-0410-b5e6-96231b3b80d8