Though I have done extensive testing, it is possible that this will break
things in configs I can't test. Please let me know if this causes a problem
and I'll fix it ASAP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23505 91177308-0d34-0410-b5e6-96231b3b80d8
2. Propagate feature "string" to all targets.
3. Implement use of SubtargetFeatures in PowerPCTargetSubtarget.
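Roughly, the idea is that the feature string handed to the target gets decoded once into flags that codegen can query later. A hedged sketch (hypothetical names; the real parsing lives in the SubtargetFeatures class):

#include <string>

// Sketch only, not the real SubtargetFeatures parser: decode the
// feature string once and remember the answers as queryable flags.
struct PowerPCTargetSubtargetSketch {
  bool HasAltivec;  // hypothetical feature flag, for illustration
  explicit PowerPCTargetSubtargetSketch(const std::string &FS)
      : HasAltivec(FS.find("+altivec") != std::string::npos) {}
};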
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23192 91177308-0d34-0410-b5e6-96231b3b80d8
putting it into the constant pool. This allows the isel machinery to
create constants that it will end up deciding are not needed, without them
ending up in the resultant function constant pool.
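The idea in a minimal hedged sketch (hypothetical types and names, not the actual MachineConstantPool interface): pool slots are handed out at emission time rather than at node-creation time, so a constant the isel creates and then abandons never costs a pool entry.

#include <vector>

struct Constant;  // stand-in for llvm::Constant

struct LazyConstantPool {
  std::vector<Constant*> Entries;  // holds only constants that get emitted
  // Called when a constant is actually emitted, not when the isel builds
  // the node, so speculative-but-unused constants never land here.
  unsigned getSlot(Constant *C) {
    Entries.push_back(C);
    return unsigned(Entries.size() - 1);
  }
};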
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23081 91177308-0d34-0410-b5e6-96231b3b80d8
Give a whole bunch of other instructions variable operands, particularly FP. The
FP stackifier plays fast and loose with operands here, so we have to mark
them all as variable. This will have to be fixed before we can convert the
X86 backend to dag-to-dag instruction selection. The solution is for the
pre-stackifier and post-stackifier instruction sets to be disjoint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22890 91177308-0d34-0410-b5e6-96231b3b80d8
1-byte loads and other operations. This is bad for store-forwarding on
common CPUs: a narrow load taken from the middle of a wider store typically
cannot be forwarded and stalls. We now do this:
fnstcw WORD PTR [%ESP]
mov %AX, WORD PTR [%ESP]
instead of:
fnstcw WORD PTR [%ESP]
mov %AL, BYTE PTR [%ESP + 1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22559 91177308-0d34-0410-b5e6-96231b3b80d8
FP-to-int-in-memory: this exposes the load from the stored slot to the
selection dag, allowing it to be folded into other operations.
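At the source level the pattern is just an fp-to-int conversion; the cast below is a plain C illustration of it:

// Pre-SSE, this compiles to an x87 fistp into a stack slot followed by
// an integer load of that same slot; exposing the load to the selection
// dag lets it fold into whatever uses the result.
int trunc_to_int(double D) {
  return (int)D;
}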
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22556 91177308-0d34-0410-b5e6-96231b3b80d8
that the X86 does not support to the legalizer. This allows them to be
better optimized, etc., and will help with SSE support.
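For reference, the usual way a target routes an unsupported operation to the legalizer is TargetLowering::setOperationAction; the opcode below is only an illustration, since this message doesn't spell out which operations moved:

// In the target's TargetLowering constructor: ask the legalizer to
// expand an operation the hardware doesn't do directly.
setOperationAction(ISD::FREM, MVT::f64, Expand);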
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22551 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first incremental patch to implement this feature. It adds no
functionality to LLVM but sets up the information needed from targets in
order to implement the optimization correctly. Each target needs to specify
the maximum number of store operations for conversion of the llvm.memset,
llvm.memcpy, and llvm.memmove intrinsics into a sequence of store operations.
The limit needs to be chosen at the threshold of performance for such an
optimization (generally smallish). The target also needs to specify whether
the target can support unaligned stores for multi-byte store operations.
This helps ensure the optimization doesn't generate code that will trap on
alignment errors.
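Concretely this lands as per-target knobs on TargetLowering, along these lines (a hedged sketch set from a target's constructor; the values are illustrative, not recommendations):

maxStoresPerMemset  = 8;   // past 8 stores, fall back to calling memset
maxStoresPerMemcpy  = 8;   // likewise for memcpy
maxStoresPerMemmove = 8;   // likewise for memmove
allowUnalignedMemoryAccesses = true;  // target tolerates unaligned stores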
More patches to follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22468 91177308-0d34-0410-b5e6-96231b3b80d8
Implement the X86 Subtarget.
This consolidates the checks for target triple, and the setting of options
based on target triple, into one place. This allows us to convert the asm
printer and isel from being littered with "forDarwin", "forCygwin", etc. to
just having the appropriate flag for each subtarget feature, controlling
the code for that feature.
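The shape of the new interface, as a hedged sketch (the real class is richer and the names here are illustrative): the triple is parsed once, and everything else asks the subtarget.

#include <string>

struct X86SubtargetSketch {
  bool IsDarwin, IsCygwin;
  explicit X86SubtargetSketch(const std::string &Triple)
      : IsDarwin(Triple.find("darwin") != std::string::npos),
        IsCygwin(Triple.find("cygwin") != std::string::npos) {}
  // Codegen queries these instead of re-testing the triple everywhere.
  bool isTargetDarwin() const { return IsDarwin; }
  bool isTargetCygwin() const { return IsCygwin; }
};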
This patch also implements indirect external and weak references in the
X86 pattern isel, for darwin. Next up is to convert over the asm printers
to use this new interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22389 91177308-0d34-0410-b5e6-96231b3b80d8
This is the last MVTSDNode.
This allows us to eliminate a bunch of special case code for handling
MVTSDNodes.
Also, remove some uses of dyn_cast that should really be cast (which is
cheaper in a release build).
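The cast<>/dyn_cast<> distinction being exploited, as a fragment (Value and CallInst stand in for any pair of types in LLVM's casting hierarchy):

void visit(Value *V) {
  // dyn_cast<> always pays for the runtime type test and yields null on
  // mismatch, so it is only appropriate when V's type is truly unknown.
  if (CallInst *CI = dyn_cast<CallInst>(V)) {
    // ... V turned out to be a call ...
  }
  // When the context guarantees the type, cast<> is the right tool: it
  // asserts in debug builds and costs nothing in a release build.
  CallInst *Known = cast<CallInst>(V);
}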
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22368 91177308-0d34-0410-b5e6-96231b3b80d8
XMM registers. There are many known deficiencies and fixmes, which will be
addressed ASAP. The major benefit of this work is that it will allow the
LLVM register allocator to allocate FP registers across basic blocks.
The x86 backend will still default to x87 style FP. To enable this work,
you must pass -enable-sse-scalar-fp and either -sse2 or -sse3 to llc.
An example before and after would be for:
double foo(double *P) {
  double Sum = 0;
  int i;
  for (i = 0; i < 1000; ++i)
    Sum += P[i];
  return Sum;
}
The inner loop looks like the following:
x87:
.LBB_foo_1: # no_exit
fldl (%esp)
faddl (%eax,%ecx,8)
fstpl (%esp)
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
SSE2:
.LBB_foo_1: # no_exit
addsd (%eax,%ecx,8), %xmm0
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22340 91177308-0d34-0410-b5e6-96231b3b80d8
1. Pass Value*'s into lowering methods so that the proper pointers can be
added to load/stores from the valist
2. Intrinsics that return void should only return a token chain, not a token
chain/retval pair.
3. Rename LowerVAArgNext -> LowerVAArg, because VANext is long gone.
4. Now that we have Value*'s available in the lowering methods, pass them
into any load/stores from the valist that are emitted (see the sketch below).
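A toy illustration of why item 4 matters (hypothetical shape; the real change threads the Value* into the load/store nodes the lowering emits): with the Value* in hand, a load can say what memory it touches instead of being an anonymous access that everything must treat conservatively.

#include <iostream>
#include <string>

struct Value { std::string Name; };  // stand-in for llvm::Value

// With the valist's Value* available, the emitted load records its
// source; without it, the access is anonymous.
void emitLoadFromVAList(const Value *VAListV) {
  std::cout << "load from "
            << (VAListV ? VAListV->Name : "<unknown memory>") << "\n";
}

int main() {
  Value VL; VL.Name = "valist";
  emitLoadFromVAList(&VL);  // new: precise source information
  emitLoadFromVAList(0);    // old: anonymous memory reference
  return 0;
}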
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22339 91177308-0d34-0410-b5e6-96231b3b80d8
working. The instruction selector changes will hopefully be coming later
this week once they are debugged. This is necessary to support the darwin
x86 FP model, and is recommended by Intel as the replacement for x87. As
a bonus, the register allocator knows how to deal with these registers
across basic blocks, unlike the FP stackifier. This leads to significantly
better codegen in several cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22300 91177308-0d34-0410-b5e6-96231b3b80d8
currently use: llc t.bc --filetype=obj
This will produce a t.o file which is dumpable with readelf. Currently
the file produced is empty, but the scaffolding to do more is now in place.
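A usage sketch with stock binutils (the dump today just reflects the empty file):
llc t.bc --filetype=obj
readelf -a t.o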
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22292 91177308-0d34-0410-b5e6-96231b3b80d8
* Change assert() to a std::cerr printout, since assert() does not appear in opt builds
* Add comments to clarify which #ifdef/#else/#endif matches which condition(s)
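A hypothetical before/after for the first point (illustrative only):

#include <cassert>
#include <iostream>

void report(bool BadCase) {
  // Before: assert() is compiled out when NDEBUG is defined, so opt
  // builds lose the diagnostic entirely.
  //   assert(!BadCase && "hit an unexpected case");
  // After: the printout survives in opt builds too.
  if (BadCase)
    std::cerr << "hit an unexpected case\n";
}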
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22154 91177308-0d34-0410-b5e6-96231b3b80d8
adjustment. If so, we merge the adjustment into the existing one. This
allows us to generate:
caller2:
sub %ESP, 12
mov DWORD PTR [%ESP], 0
mov %EAX, 1234567890
mov %EDX, 0
call func2
add %ESP, 8
ret 4
instead of:
caller2:
sub %ESP, 12
mov DWORD PTR [%ESP], 0
mov %EAX, 1234567890
mov %EDX, 0
call func2
sub %ESP, 4
add %ESP, 12
ret 4
for X86/fast-cc-merge-stack-adj.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22038 91177308-0d34-0410-b5e6-96231b3b80d8
to do ugly hackery to avoid emitting code like this:
call foo
mov vreg, EAX
adjcallstackup ...
If foo is a fastcc call and if vreg gets spilled, we might end up with this:
call foo
mov [ESP+offset], EAX ;; Offset doesn't consider the 12!
sub ESP, 12
Which is bad. The previous hacky code to deal with this was A) gross and B)
not good enough. In particular, it could miss cases and emit the bad code
above.
Now we always emit this:
call foo
adjcallstackup ...
mov vreg, EAX
directly.
This makes fastcc with callees popping the stack work much better. Next
stop (finally!) really is tail calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22021 91177308-0d34-0410-b5e6-96231b3b80d8
This should fix some missing-symbol problems on BSD and improve performance
of programs that use that operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22012 91177308-0d34-0410-b5e6-96231b3b80d8