Commit Graph

4694 Commits

Author SHA1 Message Date
Evan Cheng
bbc726d624 Fix a minor bug in two-address pass. It was missing a commute opportunity.
regB = move RCX
regA = op regB, regC
RAX  = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
       movl    %edi, %eax
       shlq    $32, %rcx
       leaq    (%rcx,%rax), %rax
=>
       movl    %edi, %eax
       shlq    $32, %rcx
       orq     %rcx, %rax
rdar://8762995


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121793 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-14 21:34:53 +00:00
Evan Cheng
0c1aec1891 bfi A, (and B, C1), C2) -> bfi A, B, C2 iff C1 & C2 == C1. rdar://8458663
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121746 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-14 03:22:07 +00:00
Jason W Kim
db934e7474 fix fixme case typo :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121743 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-14 01:42:38 +00:00
Jason W Kim
3fa4c1dc95 First cut of ARM/MC/ELF PIC relocations.
Test has fixme, to move to .s -> .o test when AsmParser works better.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121732 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-13 23:16:07 +00:00
Bob Wilson
4711d5cda3 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121730 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-13 23:02:37 +00:00
Chris Lattner
bfc9749f0a rename test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121697 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-13 08:39:40 +00:00
Chris Lattner
de1c3605a6 Add a couple dag combines to transform mulhi/mullo into a wider multiply
when the wider type is legal.  This allows us to compile:

define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
	%div = udiv i16 %x, 33
	ret i16 %div
}

into:

test1:                                  # @test1
	movzwl	4(%esp), %eax
	imull	$63551, %eax, %eax      # imm = 0xF83F
	shrl	$21, %eax
	ret

instead of:

test1:                                  # @test1
        movw    $-1985, %ax             # imm = 0xFFFFFFFFFFFFF83F
        mulw    4(%esp)
        andl    $65504, %edx            # imm = 0xFFE0
        movl    %edx, %eax
        shrl    $5, %eax
        ret

Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320

We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121696 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-13 08:39:01 +00:00
Wesley Peck
638f7a9a5e Missed some ADDI <-> ADDIK conversions in 121649.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121652 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-12 22:53:14 +00:00
Evan Cheng
a9688c4b57 (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121606 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-11 04:11:38 +00:00
Bob Wilson
746fa17d59 Add float patterns for Neon vld1-lane/dup and vst1-lane operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121583 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-10 22:13:32 +00:00
Bob Wilson
a92bac64cb Fix some invalid alignments for Neon vld-dup and vld/st-lane instructions.
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed.  Add tests for this, too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121506 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-10 19:37:42 +00:00
Nate Begeman
2ea8ee7c76 Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121439 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-10 00:26:57 +00:00
Jim Grosbach
c6f9261711 ARM stm/ldm instructions require more than one register in the register list.
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121391 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-09 18:31:13 +00:00
Bruno Cardoso Lopes
908b6ddad6 Add ROTR and ROTRV mips32 instructions. Patch by Akira Hatanaka
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121377 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-09 17:32:30 +00:00
Eric Christopher
d8c0536651 Rewrite the darwin tlv support to use a chain and return to copying
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121358 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-09 06:25:53 +00:00
Eric Christopher
8bce7cc3bf Remove extraneous copy from DAG conversion for darwin tls. This was
popping up at O0 when it wasn't folded and the fast allocator would
complain.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121330 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-09 00:27:58 +00:00
Eric Christopher
7b5d456d5c Move this test to tlv* to make it easier to notice versus linux tls
support.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121316 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-08 23:33:23 +00:00
Jason W Kim
a0871e7927 ARM/MC/ELF TPsoft is now a proper pseudo inst.
Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM
as well as ELF/OBJ (including fixup)

Also added support for ELF::R_ARM_TLS_IE32



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121312 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-08 23:14:44 +00:00
Evan Cheng
06d65f5156 Fix a bad prologue / epilogue codegen bug where the compiler would emit illegal
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}

vpush and vpop do not allow gaps in the register list.
rdar://8728956


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121197 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-07 23:08:38 +00:00
Bruno Cardoso Lopes
ab8d53a56a Match a pattern generated by a dag combiner opt where:
(select (load (load tga0)) (load tga1)) => (load (select (load tga0) tga1))

Thanks to Akira for pointing that.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121163 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-07 19:00:20 +00:00
Devang Patel
afeaae7a94 If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121059 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 22:39:26 +00:00
Wesley Peck
dc80380de8 Fixed reversed operands for IDIV and CMP instructions in MBlaze backend.
Use BRAD instead of BRD for indirect branches in MBlaze backend.

patch contributed by Jack Whitham!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121044 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 22:06:49 +00:00
Wesley Peck
1e8cdd599c Fix a 16-bit immediate value detection bug in the MBlaze delay slot filler.
Address more hazards in the MBlaze delay slot filler.

patch contributed by Jack Whitham!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121037 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 21:11:01 +00:00
Rafael Espindola
6d86492f5e Revert previous two patches while I try to find out how to make both
linux and darwin assemblers happy :-(

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121004 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 15:35:15 +00:00
Rafael Espindola
7c00391248 Update test for the extra =.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121001 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 15:05:36 +00:00
Che-Liang Chiou
f964486771 ptx: add shift instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120982 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-06 04:00:03 +00:00
Evan Cheng
48575f6ea7 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120960 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 22:04:16 +00:00
Chris Lattner
9637d5b22e Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags
result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120936 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 07:49:54 +00:00
Chris Lattner
b20e0b1fdd it turns out that when ".with.overflow" intrinsics were added to the X86
backend that they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120935 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 07:30:36 +00:00
Chris Lattner
777dd07394 fix the rest of the linux miscompares :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120933 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 02:08:07 +00:00
Chris Lattner
96908b17ae generalize the previous check to handle -1 on either side of the
select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120932 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 02:00:51 +00:00
Chris Lattner
c8c20d1486 relax this to handle linux defaulting to -static.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120930 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 01:31:13 +00:00
Chris Lattner
a2b5600e61 Improve an integer select optimization in two ways:
1. generalize 
    (select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
    (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or
memory operands.  For example for (x == 0) ? -1 : 1, before 
we got:

< 	testb	%sil, %sil
< 	movl	$-1, %ecx
< 	movl	$1, %eax
< 	cmovel	%ecx, %eax

now we get:

> 	cmpb	$1, %sil
> 	sbbl	%eax, %eax
> 	orl	$1, %eax




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120929 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 01:23:24 +00:00
Chris Lattner
bced6a1b8f merge some tests into select.ll and make them more specific.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120928 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 01:13:58 +00:00
Chris Lattner
bbdabf411b rename test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120927 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 01:02:23 +00:00
Chris Lattner
63d7c17ff1 remove two tests that aren't really testing anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120926 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 01:02:13 +00:00
Benjamin Kramer
1292c22645 Add patterns for the x86 popcnt instruction.
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
  supports SSE4.2 (nehalem) or SSE4A (barcelona).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120917 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-04 20:32:23 +00:00
Bob Wilson
c24130bade The Thumb tADDrSPi instruction is not valid when the destination is SP.
Check for that and try narrowing it to tADDspi instead.  Radar 8724703.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120892 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-04 04:40:19 +00:00
Jim Grosbach
41ad0c4c73 When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the
32-bit wide version by adding the .w suffix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120838 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-03 20:33:01 +00:00
Devang Patel
3fda44f276 Hide tests, that check .loc, .file in output assembly, from darwin9 buildbot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120750 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-02 23:29:58 +00:00
Devang Patel
ee4854faf3 Use set directive for StartMinusEndExpr.
This is a fix for llvm-gcc-i386-darwin9 buildbot failure.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120742 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-02 21:32:30 +00:00
Evan Cheng
fabdafbacb Fix test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120730 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-02 20:17:34 +00:00
Evan Cheng
1bf891ae6e Fix and re-enable tail call optimization of expanded libcalls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120622 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-01 22:59:46 +00:00
Owen Anderson
9d63d90de5 Add correct encodings for STRD and LDRD, including fixup support. Additionally, update these to unified syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120589 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-01 19:18:46 +00:00
Evan Cheng
28cd48fffb Speculatively disable x86 portion of r120501 to appease the x86_64 buildbot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120549 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-01 03:27:20 +00:00
Jason W Kim
85fed5e0c5 ARM/MC/ELF relocation "hello world" for movw/movt.
Lifted adjustFixupValue() from Darwin for sharing w ELF.
Test added
TODO:
  refactor ELFObjectWriter::RecordRelocation more.
  Possibly share more code with Darwin?
  Lots more relocations...



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120534 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-01 02:40:06 +00:00
Evan Cheng
3d2125c9db Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120501 91177308-0d34-0410-b5e6-96231b3b80d8
2010-11-30 23:55:39 +00:00
Che-Liang Chiou
21d8b9bcad ptx: add command-line options for gpu target and ptx version
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120423 91177308-0d34-0410-b5e6-96231b3b80d8
2010-11-30 10:14:14 +00:00
Eric Christopher
c459d06ae6 Not all platforms use _<func>. Duh.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120418 91177308-0d34-0410-b5e6-96231b3b80d8
2010-11-30 09:23:54 +00:00
Eric Christopher
228232b282 Rewrite mwait and monitor support and custom lower arguments.
Fixes PR8573.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120404 91177308-0d34-0410-b5e6-96231b3b80d8
2010-11-30 07:20:12 +00:00