- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164281 91177308-0d34-0410-b5e6-96231b3b80d8
- Merge the processing of LOAD_ADD with other atomic load-arith
operations
- Separate the logic getting target constant for atomic-load-op and add
an optimization for atomic-load-add on i16 with negative value
- Optimize a minor case for atomic-fetch-add i16 with negative operand. Test
case is revised.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164243 91177308-0d34-0410-b5e6-96231b3b80d8
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.
rdar://11897677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163995 91177308-0d34-0410-b5e6-96231b3b80d8
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
integer. If more than 1 fully evaluated vectors are found, POR them
first followed by the final PTEST.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163832 91177308-0d34-0410-b5e6-96231b3b80d8
- Find a legal vector type before casting and extracting element from it.
- As the new vector type may have more than 2 elements, build the final
hi/lo pair by BFS pairing them from bottom to top.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163830 91177308-0d34-0410-b5e6-96231b3b80d8
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.
<rdar://problem/12282281>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163819 91177308-0d34-0410-b5e6-96231b3b80d8
by xoring the high-bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.
Fix rdar://12281066 PR13813.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163802 91177308-0d34-0410-b5e6-96231b3b80d8
are within the lifetime zone. Sometime legitimate usages of allocas are
hoisted outside of the lifetime zone. For example, GEPS may calculate the
address of a member of an allocated struct. This commit makes sure that
we only check (abort regions or assert) for instructions that read and write
memory using stack frames directly. Notice that by allowing legitimate
usages outside the lifetime zone we also stop checking for instructions
which use derivatives of allocas. We will catch less bugs in user code
and in the compiler itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163791 91177308-0d34-0410-b5e6-96231b3b80d8
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163761 91177308-0d34-0410-b5e6-96231b3b80d8
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163743 91177308-0d34-0410-b5e6-96231b3b80d8
The input program may contain intructions which are not inside lifetime
markers. This can happen due to a bug in the compiler or due to a bug in
user code (for example, returning a reference to a local variable).
This commit adds checks that all of the instructions in the function and
invalidates lifetime ranges which do not contain all of the instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163678 91177308-0d34-0410-b5e6-96231b3b80d8
- If a boolean value is generated from CMOV and tested as boolean value,
simplify the use of test result by referencing the original condition.
RDRAND intrinisc is one of such cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163516 91177308-0d34-0410-b5e6-96231b3b80d8
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.
When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:
%CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
%vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>
We can assign %vreg11 to %ECX, overlapping the live range of %CL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163336 91177308-0d34-0410-b5e6-96231b3b80d8
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163150 91177308-0d34-0410-b5e6-96231b3b80d8
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163036 91177308-0d34-0410-b5e6-96231b3b80d8
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
well as PSHUFB will zero elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163018 91177308-0d34-0410-b5e6-96231b3b80d8
I was too optimistic, inline asm can have tied operands that don't
follow the def order.
Fixes PR13742.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162998 91177308-0d34-0410-b5e6-96231b3b80d8
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
enabled.
As the penalty of inter-mixing SSE and AVX instructions, we need
prevent SSE legacy insn from being generated except explicitly
specified through some intrinsics. For patterns supported by both
SSE and AVX, so far, we force AVX insn will be tried first relying on
AddedComplexity or position in td file. It's error-prone and
introduces bugs accidentally.
'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
by AVX, we need this predicate to force VEX encoding or SSE legacy
encoding only.
For insns not inherited by AVX, we still use the previous predicates,
i.e. 'HasSSEx'. So far, these insns fall into the following
categories:
* SSE insns with MMX operands
* SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
CRC, and etc.)
* SSE4A insns.
* MMX insns.
* x87 insns added by SSE.
2 test cases are modified:
- test/CodeGen/X86/fast-isel-x86-64.ll
AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
selected by fast-isel due to complicated pattern and fast-isel
fallback to materialize it from constant pool.
- test/CodeGen/X86/widen_load-1.ll
AVX code generation is different from SSE one after fixing SSE/AVX
inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
'vmovaps'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162919 91177308-0d34-0410-b5e6-96231b3b80d8
- The root cause is that target constant materialization in X86 fast-isel
creates a PC-rel addressing which may overflow 32-bit range in non-Small code
model if .rodata section is allocated too far away from code segment in
MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
non-Small code model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162881 91177308-0d34-0410-b5e6-96231b3b80d8
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162741 91177308-0d34-0410-b5e6-96231b3b80d8
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
X86ISD::OR node has only its flag result being used as a boolean value and
all its leaves are extracted from the same vector, it could be folded into an
X86ISD::PTEST node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162735 91177308-0d34-0410-b5e6-96231b3b80d8
corresponding changes to existing tests for darwin triple to ensure that
same pattern is tested for bdver2 target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162655 91177308-0d34-0410-b5e6-96231b3b80d8
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162187 91177308-0d34-0410-b5e6-96231b3b80d8
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.
Before:
xorl %esi, %edi
testb %dil, %dil
setne %al
ret
After:
xorb %dil, %sil
setne %al
ret
rdar://12081007
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162160 91177308-0d34-0410-b5e6-96231b3b80d8
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162097 91177308-0d34-0410-b5e6-96231b3b80d8
around. That's not how we do things. Besides, the commit message tells us that
it is covered by the GCC test suite.
------------------------------------------------------------------------
r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines
Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161985 91177308-0d34-0410-b5e6-96231b3b80d8
- FP_EXTEND only support extending from vectors with matching elements.
This results in the scalarization of extending to v2f64 from v2f32,
which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161894 91177308-0d34-0410-b5e6-96231b3b80d8
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.
rdar://11973998
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161852 91177308-0d34-0410-b5e6-96231b3b80d8
It is still possible to if-convert if the tail block has extra
predecessors, but the tail phis must be rewritten instead of being
removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161781 91177308-0d34-0410-b5e6-96231b3b80d8
OpTbl1 to OpTbl2 since they have 3 operands and the last operand can be changed
to a memory operand.
PR13576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161769 91177308-0d34-0410-b5e6-96231b3b80d8
- FCMOV only supports a subset of X86 conditions. Skip boolean
simplification if X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161732 91177308-0d34-0410-b5e6-96231b3b80d8
FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge.
FeatureFastUAMem is already on if we pass in nehalem or westmere as a command
argument.
rdar: 7252306
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161717 91177308-0d34-0410-b5e6-96231b3b80d8
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
generated from X86ISD::SETCC, try to simplify the boolean value
generation and checking by reusing the original EFLAGS with proper
condition code
- add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places
consuming EFLAGS
part of patches fixing PR12312
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161687 91177308-0d34-0410-b5e6-96231b3b80d8
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161653 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.
In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.
Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161634 91177308-0d34-0410-b5e6-96231b3b80d8
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.
The test changes are required because the register allocation hint is
using the use-list order to break ties.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161633 91177308-0d34-0410-b5e6-96231b3b80d8
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.
Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.
rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161462 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.
As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.
Fixes PR13265.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161409 91177308-0d34-0410-b5e6-96231b3b80d8
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.
rdar://11393714 and rdar://11819721
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161396 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161282 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin. I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161262 91177308-0d34-0410-b5e6-96231b3b80d8
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161207 91177308-0d34-0410-b5e6-96231b3b80d8
- Relax to match even if epilogue (pop %ebp) were emitted.
- Assume the return value is stored to %xmm0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161155 91177308-0d34-0410-b5e6-96231b3b80d8
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
This patch is a rework of r160919 and was tested on clang self-host on my local
machine.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161152 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, we were using EBX, but PIC requires the GOT to be in EBX before
function calls via PLT GOT pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161066 91177308-0d34-0410-b5e6-96231b3b80d8
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.
rdar://11980766
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161062 91177308-0d34-0410-b5e6-96231b3b80d8
We branch to the successor with higher edge weight first.
Convert from
je LBB4_8 --> to outer loop
jmp LBB4_14 --> to inner loop
to
jne LBB4_14
jmp LBB4_8
PR12750
rdar: 11393714
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161018 91177308-0d34-0410-b5e6-96231b3b80d8
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160919 91177308-0d34-0410-b5e6-96231b3b80d8
It is possible that an instruction can use and update EFLAGS.
When checking the safety, we should check the usage of EFLAGS first before
declaring it is safe to optimize due to the update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160912 91177308-0d34-0410-b5e6-96231b3b80d8
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves. They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.
This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.
The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160816 91177308-0d34-0410-b5e6-96231b3b80d8
TwoAddressInstructionPass.
The generated code for Atom has a different code sequence. This is realted
to commit r160749.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160755 91177308-0d34-0410-b5e6-96231b3b80d8
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160749 91177308-0d34-0410-b5e6-96231b3b80d8
struct s {
double x1;
float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
f(41, 42, 43);
}
We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160695 91177308-0d34-0410-b5e6-96231b3b80d8
are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.
This fixes PR13438.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160687 91177308-0d34-0410-b5e6-96231b3b80d8
that do not support it (X86 does not lower select_cc).
PR: 13428
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160619 91177308-0d34-0410-b5e6-96231b3b80d8
LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load
into its only use. Only do that when the load is safe to move, and it
won't extend any live ranges.
This fixes PR13414.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160575 91177308-0d34-0410-b5e6-96231b3b80d8
PHIElimination splits critical edges when it predicts it can resolve
interference and eliminate copies. It doesn't split the edge if the
interference wouldn't be resolved anyway because the phi-use register is
live in the critical edge anyway.
Teach PHIElimination to split loop exiting edges with interference, even
if it wouldn't resolve the interference. This removes the necessary
copies from the loop, which is still an improvement from injecting the
copies into the loop.
The test case demonstrates the improvement. Before:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
movl %esi, %eax
je LBB0_1
After:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
je LBB0_1
movl %esi, %eax
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160571 91177308-0d34-0410-b5e6-96231b3b80d8
Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.
rdar://11855129
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160454 91177308-0d34-0410-b5e6-96231b3b80d8
when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.
These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
are fixed.
Patch by Andy Zhang and Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160451 91177308-0d34-0410-b5e6-96231b3b80d8
When truncating a result of a vector that is split we need
to use the result of the split vector, and not re-split the dead node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160357 91177308-0d34-0410-b5e6-96231b3b80d8
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8
uint32_t hi(uint64_t res)
{
uint_32t hi = res >> 32;
return !hi;
}
llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
%lnot = icmp ult i64 %res, 4294967296
%lnot.ext = zext i1 %lnot to i32
ret i32 %lnot.ext
}
The optimizer has optimize away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
movabsq $4294967296, %rax ## imm = 0x100000000
cmpq %rax, %rdi
sbbl %eax, %eax
andl $1, %eax
This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
shrq $32, %rdi
testl %edi, %edi
sete %al
movzbl %al, %eax
It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1
rdar://11866926
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160312 91177308-0d34-0410-b5e6-96231b3b80d8
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160305 91177308-0d34-0410-b5e6-96231b3b80d8
1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp.
2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of
vector add.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160274 91177308-0d34-0410-b5e6-96231b3b80d8
undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def, is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160260 91177308-0d34-0410-b5e6-96231b3b80d8
It is intended to fix PR11468.
Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp
The problem was to reference the locations of callee-saved registers in exception handling:
locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would
take some effort to implement this in LLVM, as currently MachineLocation can only have the form
"Register + Offset". Funciton prologue and epilogue are now changed to:
push %rbp
mov %rsp, %rbp
push %14
push %15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp
Reviewed by Chad Rosier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160248 91177308-0d34-0410-b5e6-96231b3b80d8
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.
PR12782.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160230 91177308-0d34-0410-b5e6-96231b3b80d8