Commit Graph

28961 Commits

Author SHA1 Message Date
Bruno Cardoso Lopes
83f6fece72 [AsmPrinter][TLOF] XFAIL AArch64 test to appease buildbots
The checks for extgotequiv and localgotequiv rely on the emission
order, which is not guaranteed because we use DenseMap to hold the GOT
equivalents. XFAIL this now until I get time to use MapVector and test
out the solution. In the meantime, appease buildbots.
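
The ordering problem can be illustrated outside LLVM. The following is a minimal sketch, not the eventual fix: it assumes a hypothetical OrderedSymbolMap class that pairs a hash index with a vector so that iteration follows insertion order, which is the property the message expects to get from MapVector. The class name, the int payload, and the symbol strings are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: the hash table only stores positions,
// the vector stores the entries in the order they were first inserted.
class OrderedSymbolMap {
public:
  using Entry = std::pair<std::string, int>;

  // Returns the value for Key, inserting a zero-initialised entry if needed.
  int &operator[](const std::string &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, 0);
    }
    return Entries[It->second].second;
  }

  std::size_t size() const { return Entries.size(); }

  // Entries are visited in insertion order, independent of hash values.
  const Entry &at(std::size_t I) const {
    assert(I < Entries.size() && "entry index out of range");
    return Entries[I];
  }

private:
  std::unordered_map<std::string, std::size_t> Index;
  std::vector<Entry> Entries;
};
```

Iterating a plain hash map gives no such guarantee, which is why a test that checks emission order can pass on one host and fail on another.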

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231497 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 18:38:42 +00:00
Frederic Riss
98ecf5a7ed [dsymutil] Add debug_str construction support.
With this comes the ability to correctly clone string attributes in DIEs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231493 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 17:56:30 +00:00
Bruno Cardoso Lopes
653997ebc2 [AsmPrinter][TLOF] Make AArch64 test a bit more flexible
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231481 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 15:11:41 +00:00
Bruno Cardoso Lopes
9dda04db93 [AsmPrinter][TLOF] Split tests and move to appropriate directories
Follow up from r231474 and r231475 to appease buildbots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231480 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 14:41:56 +00:00
Bruno Cardoso Lopes
dfc6383227 [AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalents
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.

-- before

    _extgotequiv:
       .long _extfoo

    _delta:
       .long _extgotequiv-_delta

-- after

    _delta:
       .long L_extfoo$non_lazy_ptr-_delta

       .section __IMPORT,__pointers,non_lazy_symbol_pointers
    L_extfoo$non_lazy_ptr:
       .indirect_symbol _extfoo
       .long 0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231475 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 13:49:05 +00:00
Bruno Cardoso Lopes
66aa390799 [AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalents
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:

-- before

   .globl  _foo
  _foo:
   .long   42

   .globl  _gotequivalent
  _gotequivalent:
   .quad   _foo

   .globl  _delta
  _delta:
   .long   _gotequivalent-_delta

-- after

   .globl  _foo
  _foo:
   .long   42

   .globl  _delta
  Ltmp3:
   .long _foo@GOT-Ltmp3

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231474 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 13:48:45 +00:00
Toma Tabacu
25c2850952 [mips] [IAS] Add missing constraints and improve testing for the .module directive.
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7140

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231465 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 12:15:12 +00:00
Karthik Bhat
52610d84ad Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.

For example, given a loop like:
  for(int i=0;i<N;i++)
    for(int j=0;j<N;j++)
      A[j][i] = A[j][i]+B[j][i];

is interchanged to:
  for(int j=0;j<N;j++)
    for(int i=0;i<N;i++)
      A[j][i] = A[j][i]+B[j][i];
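
A minimal C++ sketch of the two loop nests above, assuming square matrices stored as nested std::vector rows. The function names addInnerJ and addInnerI are hypothetical; the sketch only illustrates that both orders compute the same result, while the second walks each row contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Original order: the inner loop varies j, so consecutive accesses to
// A[j][i] touch a different row each time (strided access).
// Assumes A and B are N x N.
void addInnerJ(Matrix &A, const Matrix &B) {
  const std::size_t N = A.size();
  assert(B.size() == N && "A and B must have the same number of rows");
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      A[j][i] = A[j][i] + B[j][i];
}

// Interchanged order: the inner loop varies i, so consecutive accesses to
// A[j][i] stay within one row (contiguous access).
void addInnerI(Matrix &A, const Matrix &B) {
  const std::size_t N = A.size();
  assert(B.size() == N && "A and B must have the same number of rows");
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < N; ++i)
      A[j][i] = A[j][i] + B[j][i];
}
```

The interchange is legal here because each iteration reads and writes only its own element, so no dependence crosses iterations.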

This pass is currently disabled by default.

In brief, the pass consists of 3 stages:

LoopInterchangeLegality: checks the legality of loop interchange based on the dependency matrix.
LoopInterchangeProfitability: a very basic heuristic that checks for profitability. This will evolve over time.
LoopInterchangeTransform: performs the actual transform.

LNT performance tests show improvement in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks.

TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Address the compile-time regression found in LLVM LNT due to this pass.
5) Fix issues in Dependency Analysis module.

A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231458 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 10:11:25 +00:00
David Majnemer
ee711b5b16 X86: Form IMGREL relocations for LLVM Functions
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable.  Extend this
functionality to all GlobalObjects.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231456 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 08:11:32 +00:00
Michael Zolotukhin
6023ad2d37 LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.
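
The message does not spell out the failure. One plausible reason, assumed here, is that the general expansion of a wide shift into two halves uses a complementary shift of (half width - amount), which is out of range when the amount is 0. The sketch below models a 64-bit left shift on two 32-bit halves; Halves and expandShlByConstant are hypothetical names, not the LegalizeTypes code.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value split into two 32-bit halves, as a legalizer would see it
// on a target with only 32-bit registers.
struct Halves {
  std::uint32_t Lo;
  std::uint32_t Hi;
};

// Left shift of the 64-bit value by a constant amount, using 32-bit ops only.
Halves expandShlByConstant(Halves In, unsigned Amt) {
  assert(Amt < 64 && "shift amount out of range");
  if (Amt == 0)
    return In; // the general case below would need In.Lo >> 32, which is undefined
  if (Amt >= 32)
    return {0, In.Lo << (Amt - 32)}; // low half is shifted out entirely
  // General case: bits leaving the low half enter the bottom of the high half.
  return {In.Lo << Amt, (In.Hi << Amt) | (In.Lo >> (32 - Amt))};
}
```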

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231443 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 01:13:01 +00:00
Rafael Espindola
b6fd95ab41 Remember to move a type to the correct set when setting the body.
We would set the body of a struct type (therefore making it non-opaque)
but were forgetting to move it to the non-opaque set.

Fixes pr22807.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231442 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 00:50:21 +00:00
Michael Gottesman
0d7cce41ff [objc-arc] Remove annotations code.
It will always be in the history if it is needed again. Now it is just dead
code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231435 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 00:34:29 +00:00
Nadav Rotem
368d2e9976 Teach ComputeNumSignBits about signed remainder.
This optimization is a continuation of r231140, which reasoned about signed div.
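
A hedged sketch of the kind of bound involved, assuming the usual range argument: for a positive constant C, a signed remainder lies in [-(C-1), C-1], so on 32-bit values at least 32 - ceil(log2(C)) of the top bits are copies of the sign bit. The helper names below are hypothetical and the code models plain int32_t values, not LLVM IR.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many of the top bits of V are copies of its sign bit,
// including the sign bit itself (so 0 and -1 both give 32).
unsigned numSignBits(std::int32_t V) {
  std::uint32_t U = static_cast<std::uint32_t>(V);
  if (V < 0)
    U = ~U; // map negative values onto the leading-zero case
  unsigned Count = 0;
  for (std::uint32_t Mask = 0x80000000u; Mask != 0 && (U & Mask) == 0; Mask >>= 1)
    ++Count;
  return Count;
}

// Smallest K such that 2^K >= C.
unsigned ceilLog2(std::uint32_t C) {
  assert(C > 0 && "logarithm of zero is undefined");
  unsigned K = 0;
  while ((std::uint64_t{1} << K) < C)
    ++K;
  return K;
}

// Lower bound on the sign bits of (X % C) for any X, given a constant C > 0.
unsigned sremSignBitsLowerBound(std::int32_t C) {
  assert(C > 0 && "only positive divisors are handled in this sketch");
  return 32 - ceilLog2(static_cast<std::uint32_t>(C));
}
```

For example, any remainder modulo 8 fits in [-7, 7], so at least 29 of its 32 bits are sign-bit copies regardless of the dividend.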



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231433 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-06 00:23:58 +00:00
Philip Reames
292f5ef237 [RewriteStatepointsForGC] Yet more test cases for relocation
At this point, we should have decent coverage of the involved code. I've got a few more test cases to clean up and submit, but what's here is already reasonable.

I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days.  Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.  



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231414 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 22:28:06 +00:00
Sanjay Patel
5f79fd2f02 [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)
This patch reduces code size for all AVX targets and increases speed for some chips.

SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.

AVX subsequently made the instruction useful by adding a 4-register operand form.

So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.

This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter

No new test cases with this patch because:

1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.

http://llvm.org/bugs/show_bug.cgi?id=22483

Differential Revision: http://reviews.llvm.org/D8063



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231408 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 21:46:54 +00:00
Ahmed Bougacha
77f46f4f9f [AArch64] Teach AsmPrinter about GlobalAddress operands.
Fixes PR22761, rdar://20024866.
Differential Revision: http://reviews.llvm.org/D8042


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231400 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 20:04:21 +00:00
Philip Reames
03001d828e [RewriteStatepointsForGC] Add additional tests around relocation
These are focused around the actual relocation rewriting itself, not the rest of the infrastructure.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231399 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 19:52:13 +00:00
Rafael Espindola
2f76abe7d7 Use the correct func begin symbol in all places in ppc.
I missed an occurrence of the old symbol in my previous patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231398 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 19:47:50 +00:00
Ahmed Bougacha
67297cd956 [ARM] Enable vector extload combine for legal types.
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.

Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because: 
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
  when we "manually expand" extloads to vld+vmov.

For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.

Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231396 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 19:37:53 +00:00
Rafael Espindola
2e2dbc35da Use the generic Lfunc_begin label on ppc.
This removes yet another custom label to mark the start of a function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231390 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 18:55:50 +00:00
David Majnemer
42fcf79f36 X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes
We know that the absolute symbol will be less than 2GB and thus will
always fit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231389 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 18:50:12 +00:00
Reid Kleckner
9f7c861416 Replace llvm.frameallocate with llvm.frameescape
Turns out it's pretty straightforward and simplifies the implementation.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D8051

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231386 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 18:26:34 +00:00
Simon Pilgrim
a744a15e97 [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
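
The merge can be sketched on shuffle masks alone. This is an illustration under stated assumptions, not the DAG combiner code: masks are plain index vectors with -1 for undef, both shuffles take a single input, and scaleMask and mergeMasks are hypothetical helper names. A mask over wide elements is first rewritten over the smaller element type, then the two masks are composed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<int>; // -1 marks an undef lane

// Rewrites a mask over wide elements as a mask over elements that are
// Scale times narrower: wide lane m becomes narrow lanes m*Scale .. m*Scale+Scale-1.
Mask scaleMask(const Mask &M, int Scale) {
  assert(Scale > 0 && "scale must be positive");
  Mask Out;
  for (int Idx : M)
    for (int K = 0; K < Scale; ++K)
      Out.push_back(Idx < 0 ? -1 : Idx * Scale + K);
  return Out;
}

// Composes two single-input shuffles of the same width: Outer is applied to
// the result of Inner, so lane i of the merged shuffle reads Inner[Outer[i]].
Mask mergeMasks(const Mask &Inner, const Mask &Outer) {
  assert(Inner.size() == Outer.size() && "masks must have the same width");
  Mask Out;
  for (int Idx : Outer) {
    assert(Idx < static_cast<int>(Inner.size()) && "lane index out of range");
    Out.push_back(Idx < 0 ? -1 : Inner[static_cast<std::size_t>(Idx)]);
  }
  return Out;
}
```

For instance, a v2i64 swap {1, 0} seen through a bitcast to v4i32 becomes {2, 3, 0, 1}; composing it with a v4i32 shuffle {1, 0, 3, 2} yields the single mask {3, 2, 1, 0}.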

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231380 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 17:14:04 +00:00
Kit Barton
b98636a0f8 While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231378 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 16:24:38 +00:00
Igor Laevsky
684d323b9b Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231374 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 15:41:14 +00:00
Elena Demikhovsky
e670dc7848 AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231371 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 15:11:35 +00:00
Igor Laevsky
f8b3003ab8 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231366 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 14:11:21 +00:00
Michael Kuperstein
2d8a36ee71 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231359 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 08:38:57 +00:00
Craig Topper
62eaac6087 [X86] Use vmovss to handle inserting an element into index 0 of a v8f32 vector of zeros.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231354 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 06:38:42 +00:00
Rafael Espindola
304fe62b74 Use the existing begin and end symbol for debug info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231338 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 02:05:42 +00:00
Kostya Serebryany
c2f4077b88 [sanitizer] add nosanitize metadata to more coverage instrumentation instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231333 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 01:20:05 +00:00
Chandler Carruth
4197c13062 [MBP] Revert r231238 which attempted to fix a nasty bug where MBP is
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).

This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231332 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 01:07:03 +00:00
Paul Robinson
948b2db8a7 Turn off .debug_pubnames/pubtypes for PS4.
Differential Revision: http://reviews.llvm.org/D8067


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231322 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-05 00:08:27 +00:00
Matthias Braun
29aeaf5408 Improve test robustness
Improve test robustness in preparation of coming commits:
- Avoid undefs which may get propagated too much.
- Remove several pointless add 0, instructions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231307 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 22:31:18 +00:00
Sanjoy Das
12aa70b7e9 [SCEV] make SCEV smarter about proving no-wrap.
Summary:
Teach SCEV to prove no overflow for an add recurrence by proving
something about the range of another add recurrence a loop-invariant
distance away from it.

Reviewers: atrick, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7980

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231305 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 22:24:17 +00:00
Frederic Riss
d0d92e7d30 [dsymutil] Add minimal code to emit DIE trees.
This commit adds code to emit DIE trees that have been pruned of the
parts that haven't been marked as kept in the previous pass.

It works by 'cloning' the input DIE tree (as read by libDebugInfoDwarf)
into a tree of DIE objects. Cloning the DIEs means essentially cloning
their attributes. The code in this commit only handles scalar and
block attributes (scalar because they are trivial, blocks because they
can't be easily replaced by a scalar placeholder); all the other ones
are replaced by placeholder zero values and will be handled in
further commits.

The added tests mostly check that the DIE tree has the correct layout and
also verify that a few chosen scalar and block attributes correctly make
their way into the output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231300 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 22:07:44 +00:00
Rafael Espindola
236aa85873 Expand variables when evaluating absolute expressions.
This allows for variables to be used in .size.
This matches gnu AS functionality.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231295 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 22:03:21 +00:00
Paul Robinson
4ceab42509 Support standard DWARF TLS opcode; Darwin and PS4 use it.
Differential Revision: http://reviews.llvm.org/D8018


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231286 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 20:55:11 +00:00
Nemanja Ivanovic
b69d556c37 Add LLVM support for PPC cryptography builtins
Review: http://reviews.llvm.org/D7955


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231285 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 20:44:33 +00:00
Rafael Espindola
c90e7f79ca Bring r231132 back with a fix.
The issue was that we were always printing the remarks. Fix that and add a test
showing that it prints nothing if -pass-remarks is not given.

Original message:
Correctly handle -pass-remarks in the gold plugin.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231273 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 18:51:45 +00:00
Mehdi Amini
c94da20917 Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC: the string is no longer
canonicalized, so you can't rely on two "equal" DataLayouts
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never return nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231270 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 18:43:29 +00:00
Adrian Prantl
2e74ddea3a Update the out-of-date dwarf expressions in these testcases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231261 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 17:39:59 +00:00
Marek Olsak
506d4b2cb4 R600/SI: Add an intrinsic for S_FLBIT_I32 / V_FFBH_I32
Required by OpenGL (ARB_gpu_shader5).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231259 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 17:33:45 +00:00
NAKAMURA Takumi
69de0932a5 Revert r231132, "Correctly handle -pass-remarks in the gold plugin.", for now, to suppress log flooding in LTO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231253 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 16:24:28 +00:00
Jozef Kolek
2e37a6f306 [mips][microMIPS] Make the code generator use ADDU16 and SUBU16
Differential Revision: http://reviews.llvm.org/D7609


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231249 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 15:47:42 +00:00
Andrea Di Biagio
da5e5688e9 [X86][FastISel] Simplify the logic in method X86SelectSIToFP.
The target-independent selection algorithm in FastISel already knows how
to select a SINT_TO_FP if the target is SSE but not AVX.

On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions
for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr
(for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64
conversion).

This patch simplifies the logic in method X86SelectSIToFP knowing that
the code would not be reachable if the subtarget doesn't have AVX.
No functional change intended.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231243 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 14:23:25 +00:00
Dmitry Vyukov
826cbaf934 asan: do not instrument direct inbounds accesses to stack variables
Do not instrument direct accesses to stack variables that can be
proven to be inbounds, e.g. accesses to fields of structs on stack.
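
The kind of proof involved can be sketched as a constant range check. This is a simplified illustration, assuming the object size, the access offset, and the access size are all known constants; isProvablyInBounds is a hypothetical helper, not the function used by the pass.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the byte range [Offset, Offset + AccessSize) provably lies
// inside a stack object of AllocSize bytes. All three values are assumed to be
// compile-time constants in this sketch.
bool isProvablyInBounds(std::uint64_t AllocSize, std::int64_t Offset,
                        std::uint64_t AccessSize) {
  assert(AccessSize > 0 && "zero-sized accesses are not modelled");
  if (Offset < 0)
    return false; // the access starts before the object
  const std::uint64_t Off = static_cast<std::uint64_t>(Offset);
  // Written as a subtraction so that Off + AccessSize cannot overflow.
  return Off <= AllocSize && AccessSize <= AllocSize - Off;
}
```

An 8-byte field at offset 8 of a 16-byte struct passes the check, so its load or store would need no instrumentation; an 8-byte access at offset 12 does not.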

This eliminates 33% of instrumentation on webrtc/modules_unittests
(number of memory accesses goes down from 290152 to 193998),
reduces binary size by 15% (from 74M to 64M), and improves compilation time by 6-12%.

The optimization is guarded by the asan-opt-stack flag, which is off by default.

http://reviews.llvm.org/D7583



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231241 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 13:27:53 +00:00
Chandler Carruth
67fade9110 [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
a flag for now.

First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.

The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:

  if (a1()) { if (b1()) c1(); else d1(); f1(); }
  if (a2()) { if (b2()) c2(); else d2(); f2(); }
  done();

Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:

  a1, a2, done;

And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:

  a1, a2, done; b1, c1;

At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.

... wait for it ...

Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!

  a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;

So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.

It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.

Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231238 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 12:18:08 +00:00
Daniel Jasper
f68f28a41d Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.
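
The definition in the first sentence can be written down as a small predicate. This is only a sketch of the pattern, assuming a CFG stored as a map from block name to successor set; CFG, hasEdge, and isOptionalBranch are hypothetical names and say nothing about how the flag itself is implemented.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Each block name maps to the set of its successor blocks.
using CFG = std::map<std::string, std::set<std::string>>;

bool hasEdge(const CFG &G, const std::string &From, const std::string &To) {
  auto It = G.find(From);
  return It != G.end() && It->second.count(To) != 0;
}

// B is an optional branch between A and C when the edges A->B, B->C and
// A->C all exist, i.e. control can reach C from A with or without B.
bool isOptionalBranch(const CFG &G, const std::string &A, const std::string &B,
                      const std::string &C) {
  assert(A != B && B != C && A != C && "blocks must be distinct");
  return hasEdge(G, A, B) && hasEdge(G, B, C) && hasEdge(G, A, C);
}
```

With the natural layout A, B, C, every execution that skips B still fetches across it; laying B out of line keeps A and C adjacent for that common path.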

Review: http://reviews.llvm.org/D7719

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231230 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 11:05:34 +00:00
Kristof Beyls
78c4ef5120 Fix PR22408 - LLVM producing AArch64 TLS relocations that GNU linkers cannot handle yet.
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relax/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.

This patch:

* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
  would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
  linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
  is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
  accesses is produced, by making use of a new pseudo instruction. The patch also removes
  two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
  SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231227 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-04 09:12:08 +00:00