Commit Graph

15967 Commits

Author SHA1 Message Date
Jim Grosbach
1835547ec1 ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VUZP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11222366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154511 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 17:40:18 +00:00
Jim Grosbach
6073b30b05 ARM 'vzip.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VZIP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11221911

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154505 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 16:53:25 +00:00
Evan Cheng
14b4c03580 Add more fused mul+add/sub patterns. rdar://10139676
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154484 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 06:59:47 +00:00
Nadav Rotem
e611378a6e Reapply 154396 after fixing a test.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154483 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 06:40:27 +00:00
Evan Cheng
bee78fe5fc Clean up ARM fused multiply + add/sub support some more: rename some isel
predicates.
Also remove NEON2 since it's not really useful and it is confusing. If
NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it
really mean?

rdar://10139676


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154480 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 05:33:07 +00:00
Evan Cheng
92c904539a Match (fneg (fma) to vfnma. rdar://10139676
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154469 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 01:21:25 +00:00
Charles Davis
0d82fe77f2 Add retw and lretw instructions. Also, fix Intel syntax parsing for all
ret instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154468 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 01:10:53 +00:00
Evan Cheng
a0908d0a44 Merge fma.ll into fusedMAC.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154466 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 01:03:11 +00:00
Kevin Enderby
a69da35c12 Fix ARM disassembly of VLD instructions with writebacks.  And add test a case
for all opcodes handed by DecodeVLDInstruction() in ARMDisassembler.cpp .


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154459 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 00:25:40 +00:00
Jim Grosbach
a5378ebe78 ARM add missing Thumb1 two-operand aliases for shift-by-immediate.
rdar://11222742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154457 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 00:15:16 +00:00
Evan Cheng
82509e5c62 Fix a number of problems with ARM fused multiply add/subtract instructions.
1. The new instruction itinerary entries are not properly described.
2. The asm parser can't handle vfms and vfnms.
3. There were no assembler, disassembler test cases.
4. HasNEON2 has the wrong assembler predicate.
rdar://10139676


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154456 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 00:13:00 +00:00
Jakob Stoklund Olesen
89cdaf46ec Fix test to be register assignment invariant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154453 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-11 00:00:24 +00:00
Owen Anderson
06886aaaeb Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point.
Zap a testcase that this allows us to completely fold away.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154447 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 22:46:53 +00:00
Kostya Serebryany
cff60c1409 [tsan] two more compile-time optimizations:
- don't isntrument reads from constant globals.
Saves ~1.5% of instrumented instructions on CPU2006
(counting static instructions, not their execution).
- don't insrument reads from vtable (which is a global constant too).
Saves ~5%.

I did not measure the run-time impact of this,
but it is certainly non-negative.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154444 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 22:29:17 +00:00
Evan Cheng
3aef2ff514 Handle llvm.fma.* intrinsics. rdar://10914096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154439 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 21:40:28 +00:00
Duncan Sands
507bb7a42f Add a comment noting that the fdiv -> fmul conversion won't generate
multiplication by a denormal, and some tests checking that.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154431 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 20:35:27 +00:00
Eric Christopher
a139051654 Temporarily revert this patch to see if it brings the buildbots back.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154425 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 19:33:16 +00:00
Kostya Serebryany
2076af0184 [tsan] compile-time instrumentation: do not instrument a read if
a write to the same temp follows in the same BB.
Also add stats printing.

On Spec CPU2006 this optimization saves roughly 4% of instrumented reads
(which is 3% of all instrumented accesses):
Writes            : 161216
Reads             : 446458
Reads-before-write: 18295



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154418 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 18:18:56 +00:00
Eric Christopher
18112d83e7 To ensure that we have more accurate line information for a block
don't elide the branch instruction if it's the only one in the block,
otherwise it's ok.

PR9796 and rdar://11215207

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154417 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 18:18:10 +00:00
Jim Grosbach
a23ecc2ba9 ARM fix cc_out operand handling for t2SUBrr instructions.
We were incorrectly conflating some add variants which don't have a
cc_out operand with the mirroring sub encodings, which do. Part of the
awesome non-orthogonality legacy of thumb1. Similarly, handling of
add/sub of an immediate was sometimes incorrectly removing the cc_out
operand for add/sub register variants.

rdar://11216577

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154411 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 17:31:55 +00:00
Nadav Rotem
50e64cfe6e Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154396 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 14:33:13 +00:00
Anton Korobeynikov
999821cddf Transform div to mul with reciprocal only when fp imm is legal.
This fixes PR12516 and uncovers one weird problem in legalize (workarounded)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154394 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 13:22:49 +00:00
Duncan Sands
1fd63df693 Express the number of ULPs in fpaccuracy metadata as a real rather than a
rational number, eg as 2.5 rather than 5, 2.  OK'd by Peter Collingbourne.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154387 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 08:22:43 +00:00
Andrew Trick
d9fc1ce809 Fix 12513: Loop unrolling breaks with indirect branches.
Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154386 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 05:14:42 +00:00
Evan Cheng
fa12d0df5a Add proper checks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154379 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 03:15:42 +00:00
Evan Cheng
bf010eb911 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154370 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 01:51:00 +00:00
Rafael Espindola
fdb230a154 Don't try to zExt just to check if an integer constant is zero, it might
not fit in a i64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154364 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-10 00:16:22 +00:00
Lang Hames
23f369d1fe Test case for PR12495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154359 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 23:58:59 +00:00
Akira Hatanaka
787c3fd385 Have TargetLowering::getPICJumpTableRelocBase return a node that points to the
GOT if jump table uses 64-bit gp-relative relocation.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154341 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 20:32:12 +00:00
Chad Rosier
7f35455708 When performing a truncating store, it's possible to rearrange the data
in-register, such that we can use a single vector store rather then a 
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154340 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 20:32:02 +00:00
Rafael Espindola
decbc43f72 Pattern match a setcc of boolean value with 0 as a truncate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154322 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 16:06:03 +00:00
Nadav Rotem
e80aa7c783 Lower some x86 shuffle sequences to the vblend family of instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154313 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 08:33:21 +00:00
Nadav Rotem
154819dd6f Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type.
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154310 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 07:45:58 +00:00
Chandler Carruth
ab5a55e118 Cleanup and relax a restriction on the matching of global offsets into
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
   the match to try to fit RIP into the base register. If it fails, it
   now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
   as it did before. The reason these immediates are safe is because the
   ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
   This is the only case changed by the patch, and the primary place you
   see it is in TLS, either the win64 section offset TLS or Linux
   local-exec TLS model in a PIC compilation. Here the ABI again ensures
   that the immediates fit because we are in small mode, and any other
   operations required due to the PIC relocation model have been handled
   externally to the Wrapper node (extra loads etc are made around the
   wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154304 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 02:13:06 +00:00
Chandler Carruth
6916a2375a Fold 15 tiny test cases into a single file that implements the
comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154303 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-09 01:43:17 +00:00
Duncan Sands
3ef3fcfc04 Only have codegen turn fdiv by a constant into fmul by the reciprocal
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154296 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 18:08:12 +00:00
Chandler Carruth
253933ee9e Teach LLVM about a PIE option which, when enabled on top of PIC, makes
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154294 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 17:51:45 +00:00
Chandler Carruth
2450eca960 Teach InstCombine to nuke a common alloca pattern -- an alloca which has
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154285 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 14:36:56 +00:00
Nadav Rotem
9d68b06bc5 AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154284 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 12:54:54 +00:00
Bill Wendling
864737cc51 Remove old 'grep' lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154283 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 11:53:54 +00:00
Bill Wendling
a0126afec8 FileCheckize these testcases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154281 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-08 11:00:38 +00:00
Nadav Rotem
d16c8d0d33 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new
shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154266 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 21:19:08 +00:00
Duncan Sands
961d666be4 Convert floating point division by a constant into multiplication by the
reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154265 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 20:04:00 +00:00
Chandler Carruth
c0d18b6696 Fix ValueTracking to conclude that debug intrinsics are safe to
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.

This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154263 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 19:22:18 +00:00
Benjamin Kramer
c77764591b SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW.
Found by inspection.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154262 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 17:19:26 +00:00
Sean Hunt
0fdfaafb70 Make the test for r154235 more platform-independent with a shorter
string.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154243 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 01:33:14 +00:00
Sean Hunt
3420e7f360 Output UTF-8-encoded characters as identifier characters into assembly
by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154235 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 00:37:53 +00:00
Akira Hatanaka
3e59b5edd6 Add lines in global-address.ll to test N32 and N64 code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154202 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-06 20:23:36 +00:00
Jakob Stoklund Olesen
70fbea7c75 Allow negative immediates in ARM and Thumb2 compares.
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154183 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-06 17:45:04 +00:00
Chandler Carruth
9ceebb7e92 Sink the collection of return instructions until after *all*
simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154179 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-06 17:21:31 +00:00