Commit Graph

3606 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
0e5e821a69 Remove a test that was only testing for physreg joining.
This is the same as the other tests: Clever tricks are required to make
the arguments and return value line up in a single-instruction function.
It rarely happens in real life.

We have plenty other examples of this behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157030 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-18 00:07:14 +00:00
Jakob Stoklund Olesen
ed18a3e6b2 Remove -join-physregs from the test suite.
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.

The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.

The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157027 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-17 23:44:19 +00:00
Evan Cheng
6100366c2f Avoid creating a cycle when folding load / op with flag / store. PR11451474. rdar://11451474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156896 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-16 01:54:27 +00:00
Jakob Stoklund Olesen
4d10829e12 Fix PR12821.
RAFast must add an <imp-def> operand when it is rewriting a sub-register
def that isn't a read-modify-write.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156777 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-14 21:10:25 +00:00
Dan Gohman
a6063c6e29 Rename @llvm.debugger to @llvm.debugtrap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156774 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-14 18:58:10 +00:00
Hans Wennborg
12447575dc Fix test/CodeGen/X86/tls-pie.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156612 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-11 10:19:54 +00:00
Hans Wennborg
228756c744 Implement initial-exec TLS model for 32-bit PIC x86
This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong
code here (see the update to test/CodeGen/X86/tls-pie.ll).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156611 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-11 10:11:01 +00:00
Dan Gohman
d4347e1af9 Define a new intrinsic, @llvm.debugger. It will be similar to __builtin_trap(),
but it generates int3 on x86 instead of ud2.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156593 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-11 00:19:32 +00:00
Nadav Rotem
b210651654 AVX2: Add an additional broadcast idiom.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156540 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-10 12:39:13 +00:00
Nadav Rotem
5fc2187a02 Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program.
Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users.

Fix PR11900.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156539 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-10 12:22:05 +00:00
Nuno Lopes
30759542aa change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept.
This commit only adds the parameter. Code taking advantage of it will follow.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156473 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-09 15:52:43 +00:00
Craig Topper
189bce48c7 Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156375 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-08 06:58:15 +00:00
Chad Rosier
42726835e3 Fix a regression from r147481. This combine should only happen if there is a
single use.
rdar://11360370


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156316 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-07 18:47:44 +00:00
Manman Ren
ed57984483 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;

rdar: 10961709


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156312 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-07 18:06:23 +00:00
Craig Topper
5f9cccc509 Add SSE4A MOVNTSS/MOVNTSD instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156281 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-07 05:36:19 +00:00
Benjamin Kramer
77c4ef8a47 Switch the select to branch transformation on by default.
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156258 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-06 14:25:16 +00:00
Benjamin Kramer
59957500f9 CodeGenPrepare: Add a transform to turn selects into branches in some cases.
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:

	ucomisd	(%rdi), %xmm0
	cmovbel	%edx, %esi

cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.

This patch uses a dumb conservative heuristic, it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-Order architectures are
unlikely to benefit as well, those are included in the
"predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.


Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case to him and Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156234 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-05 12:49:22 +00:00
Craig Topper
f3640d7ec1 Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156156 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-04 04:44:49 +00:00
Craig Topper
6b28d356c5 Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156059 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-03 07:12:59 +00:00
Evan Cheng
d99d68bcee Fix two-address pass's aggressive instruction commuting heuristics. It's meant
to catch cases like:
 %reg1024<def> = MOV r1
 %reg1025<def> = MOV r0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

By commuting ADD, it let coalescer eliminate all of the copies. However, there
was a bug in the heuristics where it ended up commuting the ADD in:

 %reg1024<def> = MOV r0
 %reg1025<def> = MOV 0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

That did no benefit but rather ensure the last MOV would not be coalesced.

rdar://11355268


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156048 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-03 01:45:13 +00:00
Manman Ren
e2849851b2 Revert r155853
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155992 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-02 15:24:32 +00:00
Craig Topper
a9a568a79d Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155982 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-02 08:03:44 +00:00
Bill Wendling
95dd442041 Strip the pointer casts off of allocas so that the selection DAG can find them.
PR10799


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155954 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-01 22:50:45 +00:00
Manman Ren
769ea2f93f X86: optimization for max-like struct
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0

FROM
movl    %edi, %ecx
subl    %esi, %ecx
cmpl    %edi, %esi
movl    $0, %eax
cmovll  %ecx, %eax
TO
xorl    %eax, %eax
subl    %esi, %edi
cmovll  %eax, %edi
movl    %edi, %eax

rdar: 10734411


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155919 91177308-0d34-0410-b5e6-96231b3b80d8
2012-05-01 17:16:15 +00:00
Manman Ren
16a76519a5 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155853 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-30 22:51:25 +00:00
Manman Ren
1701105022 test/CodeGen/X86/select.ll: remove spaces
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155840 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-30 18:54:27 +00:00
Derek Schuff
ddc693bd22 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

(this time, actually commit what was reviewed!)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155825 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-30 16:57:15 +00:00
Andrew Trick
2674a4acdb Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice.
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.

Fixes rdar://11314175: BuildSchedUnits assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155749 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-28 01:03:23 +00:00
Derek Schuff
f3db6b855e Revert r155745
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155746 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 23:37:41 +00:00
Derek Schuff
9dc28b0722 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155745 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 23:27:17 +00:00
Andrew Trick
0e47cfd5b6 Temporarily revert r155668: Fix the SD scheduler to avoid gluing.
This definitely caused regression with ARM -mno-thumb.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155743 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 22:55:59 +00:00
Chad Rosier
a73b6fc511 Add x86-specific DAG combine to simplify:
x == -y --> x+y == 0
 x != -y --> x+y != 0

On x86, the generated code goes from
   negl    %esi
   cmpl    %esi, %edi
   je    .LBB0_2
to
   addl    %esi, %edi
   je    .L4

This case is correctly handled for ARM with "cmn".

Patch by Manman Ren.
rdar://11245199
PR12545


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155739 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 22:33:25 +00:00
Benjamin Kramer
17c836c4b5 X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.

Fixes PR6679. Patch by Christoph Erhardt!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155704 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 12:07:43 +00:00
Craig Topper
76c5897eae Add mcpu to tests to prevent them from using AVX instructions on Sandy Bridge after r155618.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155696 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-27 07:11:58 +00:00
Andrew Trick
aec9240be2 Fix the SD scheduler to avoid gluing the same node twice.
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
beter fix is to bail out from bad gluing at the time we detect it.

Fixes rdar://11314175: BuildSchedUnits assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155668 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-26 21:48:25 +00:00
Jakob Stoklund Olesen
6962106496 Try to fix llvm-arm-linux builder with -mcpu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155589 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-25 21:22:33 +00:00
Preston Gurd
01c0dd1be3 Trivial change to make the test use -mcpu=generic so as to avoid
a failure if run on an Intel Atom with post RA instruction scheduling.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155587 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-25 21:04:54 +00:00
Nadav Rotem
2003e03045 Fix the testcase. We do expect two vblendw on XMMs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155477 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-24 19:57:38 +00:00
Nadav Rotem
34a13bb412 Add a testcase for 155440
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155475 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-24 19:45:28 +00:00
Nadav Rotem
d1a79136e3 AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions
using the pattern (vbroadcast (i32load src)). In some cases, after we generate
this pattern new users are added to the load node, which prevent the selection
of the blend pattern. This commit provides fallback patterns which perform
in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155437 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-24 11:07:03 +00:00
Nadav Rotem
a35407705d Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155397 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-23 21:53:37 +00:00
Preston Gurd
6a8c7bf8e7 This patch fixes a problem which arose when using the Post-RA scheduler
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.

This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.

This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.

The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().  

It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.

It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.

Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.

Patch by Andy Zhang!

Thanks to Jakob and Anton for their reviews.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155395 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-23 21:39:35 +00:00
Elena Demikhovsky
dd9047815c cleaned line endings in the newly added test file
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155315 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-22 13:22:48 +00:00
Elena Demikhovsky
1da5867236 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155309 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-22 09:39:03 +00:00
Nadav Rotem
db3461662e Teach getVectorTypeBreakdown about promotion of vectors in addition to widening of vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155296 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-21 20:08:32 +00:00
Jakob Stoklund Olesen
0b35c35efc Fix PR12599.
The X86 target is editing the selection DAG while isel is selecting
nodes following a topological ordering. When the DAG hacking triggers
CSE, nodes can be deleted and bad things happen.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155257 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-20 23:36:09 +00:00
Joel Jones
c8969fd291 Test for the the problem with xors being changed into ands
when the set bits aren't the same for both args of the xor.
This transformation is in the function TargetLowering::SimplifyDemandedBits
in the file lib/CodeGen/SelectionDAG/TargetLowering.cpp.

I have tested this test using a previous version of llc which the defect and 
the a version of llc which does not. I got the expected fail and pass, 
respectively.

This test goes with rdar://11195364 and the check in with the fix: svn r154955


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155156 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-19 20:54:44 +00:00
Joe Groff
d15c581100 Move win32 SimplifyLibcall test under Transforms
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154967 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-18 00:07:45 +00:00
Joe Groff
d5bda5ec66 fix pr12559: mark unavailable win32 math libcalls
also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154960 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-17 23:05:54 +00:00
Benjamin Kramer
93751c8c99 Force cmov on test so block placement doesn't shuffle the code around.
This made the test fail with -mcpu=generic (when building on a non-x86 host).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154926 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-17 13:55:23 +00:00