578 Commits

Author SHA1 Message Date
Owen Anderson
7670601313 Fix bugs in the pseuo-ization of ADCS/SBCS pointed out by Jim, as well as doing the expansion earlier (using a custom inserter) to allow for the chance of predicating these instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128940 91177308-0d34-0410-b5e6-96231b3b80d8
2011-04-05 21:48:57 +00:00
Bill Wendling
f05b1dcf87 Revamp the SjLj "dispatch setup" intrinsic.
It needed to be moved closer to the setjmp statement, because the code directly
after the setjmp needs to know about values that are on the stack. Also, the
'bitcast' of the function context was causing a dead load. This wouldn't be too
horrible, except that at -O0 it wasn't optimized out, and because it wasn't
using the correct base pointer (if there is a VLA), it would try to access a
value from a garbage address.
<rdar://problem/9130540>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128873 91177308-0d34-0410-b5e6-96231b3b80d8
2011-04-05 01:37:43 +00:00
Cameron Zwarich
4071a71112 Do some peephole optimizations to remove pointless VMOVs from Neon to integer
registers that arise from argument shuffling with the soft float ABI. These
instructions are particularly slow on Cortex A8. This fixes one half of
<rdar://problem/8674845>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128759 91177308-0d34-0410-b5e6-96231b3b80d8
2011-04-02 02:40:43 +00:00
Evan Cheng
8e23e815ad Issue libcalls __udivmod*i4 / __divmod*i4 for div / rem pairs.
rdar://8911343


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128696 91177308-0d34-0410-b5e6-96231b3b80d8
2011-04-01 00:42:02 +00:00
Evan Cheng
463d358f1d Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128665 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-31 19:38:48 +00:00
Evan Cheng
ee2e0e347e Don't try to create zero-sized stack objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128586 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-30 23:44:13 +00:00
Cameron Zwarich
c0e6d780cd Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128584 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-30 23:01:21 +00:00
Evan Cheng
92e3916c3b Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
was lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontend will codegen them
as intrinsics vmull* + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128502 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-29 23:06:19 +00:00
Cameron Zwarich
3007d3331b Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128492 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-29 21:41:55 +00:00
Evan Cheng
78fe9ababe Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extend's and take advantage of no-stall
back to back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6                                                                                                                                                                             
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128444 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-29 01:56:09 +00:00
Eric Christopher
29aeed1bf8 Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128320 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-26 01:21:03 +00:00
Evan Cheng
485fafc840 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127981 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-21 01:19:09 +00:00
Daniel Dunbar
7a90e04fc7 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127954 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-19 21:47:14 +00:00
Evan Cheng
ae16d6b972 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127953 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-19 17:17:39 +00:00
Bill Wendling
0d4c9d94f6 The VTBL (and VTBX) instructions are rather permissive concerning the masks they
accept. If a value in the mask is out of range, it uses the value 0, for VTBL,
or leaves the value unchanged, for VTBX.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127700 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-15 21:15:20 +00:00
Bill Wendling
a24cb40be2 Some minor cleanups based on feedback.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127694 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-15 20:47:26 +00:00
Bill Wendling
69a05a7b92 Generate a VTBL instruction instead of a series of loads and stores when we
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:

_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}

The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127630 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-14 23:02:38 +00:00
Evan Cheng
21a6179c9d Indentation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127595 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-14 18:02:30 +00:00
Bob Wilson
79f56c9618 Fix a compiler crash where a Glue value had multiple uses. Radar 9049552.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127198 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-08 01:17:20 +00:00
Bob Wilson
1b772f9962 Fix comment typos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127197 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-08 01:17:16 +00:00
Cameron Zwarich
be2119e8e2 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127175 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-07 21:56:36 +00:00
Bob Wilson
4faa0e1952 Remove unused conditional negate operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127090 91177308-0d34-0410-b5e6-96231b3b80d8
2011-03-05 16:54:31 +00:00
Evan Cheng
c24ab5c654 Fix a typo which cause dag combine crash. rdar://9059537.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126661 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-28 18:45:27 +00:00
Stuart Hastings
f222e595c0 Support for byval parameters on ARM. Will be enabled by a forthcoming
patch to the front-end.  Radar 7662569.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126655 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-28 17:17:53 +00:00
Evan Cheng
e573fb3255 More fcopysign correctness and performance fix.
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.

The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
        vmov.i32        d1, #0x80000000
        vbsl    d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126295 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-23 02:24:55 +00:00
Devang Patel
68e6beeccc Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns."
In other words, do not keep track of argument's location.  The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working. 
 - The debugger needs to be aware of prolog_end attribute attached with line table entries.
 - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126155 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-21 23:21:26 +00:00
Nate Begeman
7973f350b7 Implement sdiv & udiv for <4 x i16> and <8 x i8> NEON vector types.
This avoids moving each element to the integer register file and calling __divsi3 etc. on it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125402 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-11 20:53:29 +00:00
Evan Cheng
c143dd4f63 Fix buggy fcopysign lowering.
This
define float @foo(float %x, float %y) nounwind readnone {
entry:
  %0 = tail call float @copysignf(float %x, float %y) nounwind readnone
  ret float %0
}

Was compiled to:
    vmov     s0, r1
    bic      r0, r0, #-2147483648
    vmov     s1, r0
    vcmpe.f32    s0, #0
    vmrs         apsr_nzcv, fpscr
    it           lt
    vneglt.f32   s1, s1
    vmov         r0, s1
    bx           lr

This fails to copy the sign of -0.0f because it's lost during the float to int
conversion. Also, it's sub-optimal when the inputs are in GPR registers.

Now it uses integer and + or operations when it's profitable. And it's correct!
    lsrs    r1, r1, #31
    bfi     r0, r1, #31, #1
    bx      lr
rdar://8984306


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125357 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-11 02:28:55 +00:00
Evan Cheng
aa26102db4 Fix an obvious typo which caused an isel assertion. rdar://8964854.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125023 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-07 18:50:47 +00:00
Bob Wilson
1c3ef90cab Add codegen support for using post-increment NEON load/store instructions.
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125014 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-07 17:43:21 +00:00
Evan Cheng
31959b19a7 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124708 91177308-0d34-0410-b5e6-96231b3b80d8
2011-02-02 01:06:55 +00:00
Devang Patel
e9a7ea6865 Keep track of incoming argument's location while emitting LiveIns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124611 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-31 21:38:14 +00:00
Anton Korobeynikov
5899a60d2f Provide correct registers for EH stuff on ARM
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124151 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-24 22:38:45 +00:00
Evan Cheng
53519f015e Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123991 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-21 18:55:51 +00:00
Evan Cheng
9fe2009956 Sorry, several patches in one.
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.

Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.

ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
   which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
   a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
   to re-materialize the instruction, allow machine LICM to hoist the set of
   instructions out of the loop and make it possible to CSE them. It's a bit
   hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.

With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123905 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-20 08:34:58 +00:00
Andrew Trick
32cec0a756 For ARM subtargets with useNEONForSinglePrecisionFP, double count uses
of the floating point types less than 64-bits. It's somewhat of a temporary
hack but forces more accurate modeling of register pressure and results
in fewer spills.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123811 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-19 02:35:27 +00:00
Andrew Trick
7fa75ce11d whitespace
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123810 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-19 02:26:13 +00:00
Evan Cheng
fc8475bde9 Don't forget to emit the load from indirect symbol when using movw + movt to materialize GA indirect symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123809 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-19 02:16:49 +00:00
Evan Cheng
5de5d4b6d0 Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123619 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-17 08:03:18 +00:00
Eric Christopher
41262da6cc Fix 80-cols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123494 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-14 23:50:53 +00:00
Anton Korobeynikov
16c29b5f28 Rename TargetFrameInfo into TargetFrameLowering. Also, put couple of FIXMEs and fixes here and there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123170 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-10 12:39:04 +00:00
Jakob Stoklund Olesen
c9df025e33 Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic.
These functions not longer assert when passed 0, but simply return false instead.

No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123155 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-10 02:58:51 +00:00
Evan Cheng
55d4200336 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123048 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-08 01:24:27 +00:00
Bob Wilson
70f85730b1 Add an explanatory message for an assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123042 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-07 23:40:46 +00:00
Matt Beaumont-Gay
697970286a Eliminate variable only used in debug builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123040 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-07 22:34:58 +00:00
Bob Wilson
11a1dfffc8 Lower some BUILD_VECTORS using VEXT+shuffle.
Patch by Tim Northover.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123035 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-07 21:37:30 +00:00
Bob Wilson
5e8b833707 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122995 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-07 04:59:04 +00:00
Evan Cheng
0521928ae7 Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122952 91177308-0d34-0410-b5e6-96231b3b80d8
2011-01-06 06:52:41 +00:00
Bob Wilson
3c904694fc Radar 8803471: Fix expansion of ARM BCCi64 pseudo instructions.
If the basic block containing the BCCi64 (or BCCZi64) instruction ends with
an unconditional branch, that branch needs to be deleted before appending
the expansion of the BCCi64 to the end of the block.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122521 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-23 22:45:49 +00:00
Bob Wilson
316009054e Add ARM-specific DAG combining to cast i64 vector element load/stores to f64.
Type legalization splits up i64 values into pairs of i32 values, which leads
to poor quality code when inserting or extracting i64 vector elements.
If the vector element is loaded or stored, it can be treated as an f64 value
and loaded or stored directly from a VPR register.  Use the pre-legalization
DAG combiner to cast those vector elements to f64 types so that the type
legalizer won't mess them up.  Radar 8755338.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122319 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-21 06:43:19 +00:00