Drop the FpMov instructions, use plain COPY instead.
Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.
This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.
As a bonus we handle floating point inline assembly correctly now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134018 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133080 91177308-0d34-0410-b5e6-96231b3b80d8
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs. Only profitable when the user wants a materialized 0
or 1 at runtime. rdar://problem/5993888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132404 91177308-0d34-0410-b5e6-96231b3b80d8
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@131948 91177308-0d34-0410-b5e6-96231b3b80d8
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128181 91177308-0d34-0410-b5e6-96231b3b80d8
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
switch(x) {
case 1: return f1();
case 2: return f2();
case 3: return f3();
case 4: return f4();
case 5: return f5();
case 6: return f6();
}
}
=>
LBB0_2: ## %sw.bb
callq _f1
popq %rbp
ret
LBB0_3: ## %sw.bb1
callq _f2
popq %rbp
ret
LBB0_4: ## %sw.bb3
callq _f3
popq %rbp
ret
This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:
sw.bb7: ; preds = %entry
%call8 = tail call i32 @f5() nounwind
br label %return
sw.bb9: ; preds = %entry
%call10 = tail call i32 @f6() nounwind
br label %return
return:
%retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
ret i32 %retval.0
This allows codegen to generate better code like this:
LBB0_2: ## %sw.bb
jmp _f1 ## TAILCALL
LBB0_3: ## %sw.bb1
jmp _f2 ## TAILCALL
LBB0_4: ## %sw.bb3
jmp _f3 ## TAILCALL
rdar://9147433
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127953 91177308-0d34-0410-b5e6-96231b3b80d8
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127951 91177308-0d34-0410-b5e6-96231b3b80d8
comparisons on x86. Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.
This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127852 91177308-0d34-0410-b5e6-96231b3b80d8
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127766 91177308-0d34-0410-b5e6-96231b3b80d8
corresponding testcases back to the previous versions.
Fixes some performance regressions only seen on 32-bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127441 91177308-0d34-0410-b5e6-96231b3b80d8
testcases accordingly. Some are currently xfailed and will be filed
as bugs to be fixed or understood.
Performance results:
roughly neutral on SPEC
some micro benchmarks in the llvm suite are up between 100 and 150%, only
a pair of regressions that are due to be investigated
john-the-ripper saw:
10% improvement in traditional DES
8% improvement in BSDI DES
59% improvement in FreeBSD MD5
67% improvement in OpenBSD Blowfish
14% improvement in LM DES
Small compile time impact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127208 91177308-0d34-0410-b5e6-96231b3b80d8
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127067 91177308-0d34-0410-b5e6-96231b3b80d8
missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126845 91177308-0d34-0410-b5e6-96231b3b80d8
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126664 91177308-0d34-0410-b5e6-96231b3b80d8
In other words, do not keep track of argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working.
- The debugger needs to be aware of prolog_end attribute attached with line table entries.
- The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126155 91177308-0d34-0410-b5e6-96231b3b80d8
since one needs to be a register operand. Just use movss instead of forcing
an operand into a register.
Fixes PR9239
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126072 91177308-0d34-0410-b5e6-96231b3b80d8
(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients. Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror. Putting this in its own
library solves both problems at once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125765 91177308-0d34-0410-b5e6-96231b3b80d8
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125470 91177308-0d34-0410-b5e6-96231b3b80d8
anything but the simplest of cases, lower a 256-bit BUILD_VECTOR by
splitting it into 128-bit parts and recombining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125105 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to easily support 256-bit operations that don't have
native 256-bit support. This applies to integer operations, certain
types of shuffles and various othher things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124910 91177308-0d34-0410-b5e6-96231b3b80d8
infrastructure. This makes lowering 256-bit vectors to 128-bit
vectors simple when 256-bit vector support is not available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124868 91177308-0d34-0410-b5e6-96231b3b80d8
matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines
to examine and translate index values. VINSERTF128 comes next. With
these two in place we can begin supporting more AVX operations as
INSERT/EXTRACT can be used as a fallback when 256-bit support is not
available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124797 91177308-0d34-0410-b5e6-96231b3b80d8
Reversing the operands allows us to fold, but doesn't force us to. Also, at
this point the DAG is still being optimized, so the check for hasOneUse is not
very precise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124773 91177308-0d34-0410-b5e6-96231b3b80d8
default implementation for x86, going through the stack in a similr
fashion to how the codegen implements BUILD_VECTOR. Eventually this
will get matched to VINSERTF128 if AVX is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124307 91177308-0d34-0410-b5e6-96231b3b80d8
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similr fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124292 91177308-0d34-0410-b5e6-96231b3b80d8
These functions not longer assert when passed 0, but simply return false instead.
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123155 91177308-0d34-0410-b5e6-96231b3b80d8
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122955 91177308-0d34-0410-b5e6-96231b3b80d8
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122952 91177308-0d34-0410-b5e6-96231b3b80d8
lowering to use it. Hopefully the pattern fragment is doing the right thing with XMM0, looks correct in testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122277 91177308-0d34-0410-b5e6-96231b3b80d8
the same as setcc. Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS). This is
a step towards finishing off PR5443. In the testcase in that bug we now get:
movq %rdi, %rax
addq %rsi, %rax
sbbq %rcx, %rcx
testb $1, %cl
setne %dl
ret
instead of:
movq %rdi, %rax
addq %rsi, %rax
movl $0, %ecx
adcq $0, %rcx
testq %rcx, %rcx
setne %dl
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122219 91177308-0d34-0410-b5e6-96231b3b80d8
their carry depenedencies with MVT::Flag operands) and use clean and beautiful
EFLAGS dependences instead.
We do this by changing the modelling of SBB/ADC to have EFLAGS input and outputs
(which is what requires the previous scheduler change) and change X86 ISelLowering
to custom lower ADDC and friends down to X86ISD::ADD/ADC/SUB/SBB nodes.
With the previous series of changes, this causes no changes in the testsuite, woo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122213 91177308-0d34-0410-b5e6-96231b3b80d8
consistently by moving it out of lowering into dag combine.
Add some missing patterns for matching away extended versions of setcc_c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122201 91177308-0d34-0410-b5e6-96231b3b80d8
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121358 91177308-0d34-0410-b5e6-96231b3b80d8
result. This allows us to compile:
void *test12(long count) {
return new int[count];
}
into:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
movq $-1, %rdi
cmovnoq %rax, %rdi
jmp __Znam ## TAILCALL
instead of:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
seto %cl
testb %cl, %cl
movq $-1, %rdi
cmoveq %rax, %rdi
jmp __Znam
Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120936 91177308-0d34-0410-b5e6-96231b3b80d8
backend that they were all implemented except umul. This one fell back
to the default implementation that did a hi/lo multiply and compared the
top. Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test. Now we compile:
void *func(long count) {
return new int[count];
}
into:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
seto %cl ## encoding: [0x0f,0x90,0xc1]
testb %cl, %cl ## encoding: [0x84,0xc9]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
instead of:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120935 91177308-0d34-0410-b5e6-96231b3b80d8
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
supports SSE4.2 (nehalem) or SSE4A (barcelona).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120917 91177308-0d34-0410-b5e6-96231b3b80d8
The user (i.e. whoever generated a call to the intrinsic in the first place) is
essentially asking for a particular instruction to be placed in the assembler.
If that instruction won't execute on the target machine, that's their problem
not ours. Two buildbots with processors that don't support SSE3 were barfing
on the apm.ll test in CodeGen/X86 because of this assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120574 91177308-0d34-0410-b5e6-96231b3b80d8
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120501 91177308-0d34-0410-b5e6-96231b3b80d8
nodes to indicate when ha16/lo16 modifiers should be used. This lets
us pass PowerPC/indirectbr.ll.
The one annoying thing about this patch is that the MCSymbolExpr isn't
expressive enough to represent ha16(label1-label2) which we need on
PowerPC. I have a terrible hack in the meantime, but this will have
to be revisited at some point.
Last major conversion item left is global variable references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@119105 91177308-0d34-0410-b5e6-96231b3b80d8
since it is trivial and will be shared between ppc and x86.
This substantially simplifies the X86 backend also.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@119089 91177308-0d34-0410-b5e6-96231b3b80d8
with a SimpleValueType, while an EVT supports equality and
inequality comparisons with SimpleValueType.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118169 91177308-0d34-0410-b5e6-96231b3b80d8
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115243 91177308-0d34-0410-b5e6-96231b3b80d8
ARM cross-compiler on x86, because the MMO size did not match the type size.
This fixes the MMO size and also the size of the stack object to match the
type size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114554 91177308-0d34-0410-b5e6-96231b3b80d8
the predicate to discover the number of sign bits. Enhance X86's target lowering to provide
a useful response to this query.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114473 91177308-0d34-0410-b5e6-96231b3b80d8
(sbbl x, x) sets the registers to 0 or ~0. Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.
This fixes <rdar://problem/8449754>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114460 91177308-0d34-0410-b5e6-96231b3b80d8
"getFixedStack" on the MachinePointerInfo class. While
this isn't the problem I'm setting out to solve, it is the
right way to eliminate PseudoSourceValue, so lets go with it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114406 91177308-0d34-0410-b5e6-96231b3b80d8
of the getLoad methods. This fixes at least one bug where an incorrect
svoffset is passed in (a potential combiner-aa miscompile).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114404 91177308-0d34-0410-b5e6-96231b3b80d8
nodes to emit shuffles and don't do isel mask matching anymore.
- Add the selection of the remaining shuffle opcode (movddup)
- Introduce two new functions to "recognize" where we may get
potential folds and add several comments to them explaining why
they are not yet in the desidered shape.
- Add more patterns to fallback the case where we select
a specific shuffle opcode as if it could fold a load, but it
can't, so remap to a valid instruction.
- Add a couple of FIXMEs to address in the following days once
there's a good solution to the current folding problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113369 91177308-0d34-0410-b5e6-96231b3b80d8
checking each standalone condition and decide whether emit target
specific nodes or remove the condition if it's already matched before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113031 91177308-0d34-0410-b5e6-96231b3b80d8
"Use target specific nodes instead of relying in unpckl and
unpckh pattern fragments during isel time. Also place a
depth limit in getShuffleScalarElt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113020 91177308-0d34-0410-b5e6-96231b3b80d8
mask pattern fragment", which depends on r112934, which introduced some infinite
loop and select failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112998 91177308-0d34-0410-b5e6-96231b3b80d8
- Teach getShuffleScalarElt how to handle more target
specific nodes, so the DAGCombine can make use of it.
- Add another hack to avoid the node update problem
during legalization. More description on the comments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112934 91177308-0d34-0410-b5e6-96231b3b80d8
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112378 91177308-0d34-0410-b5e6-96231b3b80d8
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112377 91177308-0d34-0410-b5e6-96231b3b80d8
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112348 91177308-0d34-0410-b5e6-96231b3b80d8
Mark _alloca call as clobberring EFLAGS, otherwise some DCE might remove
other flags-clobberring stuff (e.g. cmp instructions) occuring after
_alloca call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112034 91177308-0d34-0410-b5e6-96231b3b80d8
it's COFF emitter which does not support differences of two symbols
(and needs to be fixed). GAS is pretty fine with code produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111801 91177308-0d34-0410-b5e6-96231b3b80d8
general idea here is to have a group of x86 target specific nodes which are
going to be selected during lowering and then directly matched in isel.
The commit includes the addition of those specific nodes and a *bunch* of
patterns, and incrementally we're going to switch between them and what we
have right now. Both the patterns and target specific nodes can change as
we move forward with this work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111691 91177308-0d34-0410-b5e6-96231b3b80d8
- Do not clobber al during variadic calls, this is AMD64 ABI-only feature
- Emit wincall64, where necessary
Patch by Cameron Esfahani!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111289 91177308-0d34-0410-b5e6-96231b3b80d8
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110897 91177308-0d34-0410-b5e6-96231b3b80d8
avoids trouble if the return type of TD->getPointerSize() is
changed to something which doesn't promote to a signed type,
and is simpler anyway.
Also, use getCopyFromReg instead of getRegister to read a
physical register's value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110835 91177308-0d34-0410-b5e6-96231b3b80d8
Apply the same approach of SSE4.1 ptest intrinsics but
create a new x86 node "testp" since AVX introduces
vtest{ps}{pd} instructions which set ZF and CF depending
on sign bit AND and ANDN of packed floating-point sources.
This is slightly different from what the "ptest" does.
Tests comming with the other 256 intrinsics tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110744 91177308-0d34-0410-b5e6-96231b3b80d8
declared during the addition of the assembler support, the additional
changes are:
- Add missing intrinsics
- Move all SSE conversion instructions in X86InstInfo64.td to the SSE.td file.
- Duplicate some patterns to AVX mode.
- Step into PCMPEST/PCMPIST custom inserter and add AVX versions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109878 91177308-0d34-0410-b5e6-96231b3b80d8
We do sometimes load from a too small stack slot when dealing with x86 arguments
(varargs and smaller-than-32-bit args). It looks like we know what we are doing
in those cases, so I am going to remove the assert instead of artifically
enlarging stack slot sizes.
The assert in storeRegToStackSlot stays in. We don't want to write beyond the
bounds of a stack slot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109764 91177308-0d34-0410-b5e6-96231b3b80d8
The size of this object isn't used for anything - technically it is of variable
size.
This avoids a false positive from the assert in
X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109652 91177308-0d34-0410-b5e6-96231b3b80d8
appropriate for targets without detailed instruction iterineries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situation; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109300 91177308-0d34-0410-b5e6-96231b3b80d8
SSE, so we can't return floating point values if this
is disabled. Detect this error for clang.
With SSE1 only, f64 is a problem; it can be done, but
neither llvm-gcc nor clang has ever generated correct
code for it. Since nobody noticed this I think it's
OK to treat it as an error for now.
This also handles SSE-sized vectors of floating point.
8207686, 8204109.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109201 91177308-0d34-0410-b5e6-96231b3b80d8
for lowering without sse2. Add a couple of new testcases.
Fixes a few libgomp tests and latent bugs. Remove a few todos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109078 91177308-0d34-0410-b5e6-96231b3b80d8
1) all registers were spilled as xmm, regardless of actual size
2) win64 abi doesn't do the varargs-size-in-%al thing
Still to look into:
xmm6-15 are marked as clobbered by call instructions on win64 even though they aren't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109035 91177308-0d34-0410-b5e6-96231b3b80d8
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108465 91177308-0d34-0410-b5e6-96231b3b80d8
lowering atomics. This will allow those copies to still be coalesced after
TII::isMoveInstr is removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108385 91177308-0d34-0410-b5e6-96231b3b80d8
address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.
This fixes PR7610.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108327 91177308-0d34-0410-b5e6-96231b3b80d8
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
fail to end up the top of the entry block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108039 91177308-0d34-0410-b5e6-96231b3b80d8
it is popped, even if it is ununsed. A CopyFromReg node is too weak to represent
the required sideeffect, so insert an FpGET_ST0 instruction directly instead.
This will matter when CopyFromReg gets lowered to a generic COPY instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108037 91177308-0d34-0410-b5e6-96231b3b80d8
U utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U test/CodeGen/X86/fast-isel.ll
U test/CodeGen/X86/fast-isel-loads.ll
U include/llvm/Target/TargetLowering.h
U include/llvm/Support/PassNameParser.h
U include/llvm/CodeGen/FunctionLoweringInfo.h
U include/llvm/CodeGen/CallingConvLower.h
U include/llvm/CodeGen/FastISel.h
U include/llvm/CodeGen/SelectionDAGISel.h
U lib/CodeGen/LLVMTargetMachine.cpp
U lib/CodeGen/CallingConvLower.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U lib/CodeGen/SelectionDAG/FastISel.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U lib/CodeGen/SelectionDAG/TargetLowering.cpp
U lib/Target/XCore/XCoreISelLowering.cpp
U lib/Target/XCore/XCoreISelLowering.h
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86ISelLowering.h
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107987 91177308-0d34-0410-b5e6-96231b3b80d8
like all other instructions, even though a segment is not
allowed. This resolves a bunch of gross hacks in the
encoder and makes LEA more consistent with the rest of the
instruction set.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107934 91177308-0d34-0410-b5e6-96231b3b80d8
instance, rather than pointers to all of FunctionLoweringInfo's
members.
This eliminates an NDEBUG ABI sensitivity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107789 91177308-0d34-0410-b5e6-96231b3b80d8
the example in the testcase, we now generate:
_test1: ## @test1
movss 4(%esp), %xmm0
addss 8(%esp), %xmm0
movl 12(%esp), %eax
movss %xmm0, (%eax)
ret
instead of:
_test1: ## @test1
subl $20, %esp
movl 24(%esp), %eax
movq %mm0, (%esp)
movq %mm0, 8(%esp)
movss (%esp), %xmm0
addss 12(%esp), %xmm0
movss %xmm0, (%eax)
addl $20, %esp
ret
v2f32 support did not work reliably because most of the X86
backend didn't know it was legal. It was apparently only added
to support returning source-level v2f32 values in MMX registers
in x86-32 mode. If ABI compatibility is important on this
GCC-extended-vector type for some reason, then the frontend
should generate IR that returns v2i32 instead of v2f32. However,
we generally don't try very hard to be abi compatible on gcc
extended vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107601 91177308-0d34-0410-b5e6-96231b3b80d8
v2f32 as legal in 32-bit mode. It is just as terrible there,
but I just care about x86-64 and noone claims it is valuable
in 64-bit mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107600 91177308-0d34-0410-b5e6-96231b3b80d8
for an "i" constraint should get lowered; PR 6309. While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106893 91177308-0d34-0410-b5e6-96231b3b80d8
address requires a register or secondary load to compute
(most PIC modes). This improves "g" constraint handling. 8015842.
The test from 2007 is attempting to test the fix for PR1761,
but since -relocation-model=static doesn't work on Darwin
x86-64, it was not testing what it was supposed to be testing
and was passing erroneously. Fixed to use Linux x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106779 91177308-0d34-0410-b5e6-96231b3b80d8
will conflict with another live range. The place which creates this scenerio is
the code in X86 that lowers a select instruction by splitting the MBBs. This
eliminates the need to check from the bottom up in an MBB for live pregs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106066 91177308-0d34-0410-b5e6-96231b3b80d8
x86 backend currently doesn't know how to handle them.
This doesn't really fix anything because LegalizeTypes doesn't know how to
handle them either. We do get a better error message, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@105305 91177308-0d34-0410-b5e6-96231b3b80d8
<1xi64> -> i64 to work in MMX registers on hosts where -no-sse
is the default (not mine). The right thing is
to accept this and make i64->f64 conversions go through memory,
but I don't have time right now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103914 91177308-0d34-0410-b5e6-96231b3b80d8
(This worked as of about 6 months ago and I didn't track down
exactly what broke it; I think this fix is appropriate.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103911 91177308-0d34-0410-b5e6-96231b3b80d8
The implementation in LegalizeIntegerTypes to handle this as
sint64->float + appropriate power of 2 is subject to double rounding,
considered incorrect by numerics people. Use this implementation only
when it is safe. This leads to using library calls in some cases
that produced inline code before, but it's correct now.
(EVTToAPFloatSemantics belongs somewhere else, any suggestions?)
Add a correctly rounding (though not particularly fast) conversion
that uses X87 80-bit computations for x86-32.
7885399, 5901940. This shows up in gcc.c-torture/execute/ieee/rbug.c
in the gcc testsuite on some platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103883 91177308-0d34-0410-b5e6-96231b3b80d8
the variable actually tracks.
N.B., several back-ends are using "HasCalls" as being synonymous for something
that adjusts the stack. This isn't 100% correct and should be looked into.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103802 91177308-0d34-0410-b5e6-96231b3b80d8
used more than once. If ISel had put a kill flag on one of them,
it's not valid to transfer the kill flag to each new instance.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103799 91177308-0d34-0410-b5e6-96231b3b80d8
Move EmitTargetCodeForMemcpy, EmitTargetCodeForMemset, and
EmitTargetCodeForMemmove out of TargetLowering and into
SelectionDAGInfo to exercise this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103481 91177308-0d34-0410-b5e6-96231b3b80d8