Commit Graph

10412 Commits

Author SHA1 Message Date
Tim Northover
c499ecd1d1 AArch64/ARM64: add extra testing from AArch64 to ARM64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206887 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 12:45:32 +00:00
Tim Northover
ba61446a56 AArch64/ARM64: enable various AArch64 tests on ARM64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206877 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 10:10:26 +00:00
Tim Northover
0e277d18bb AArch64/ARM64: add patterns for scalar_to_vector/extract pairs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206876 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 10:10:18 +00:00
Tim Northover
85974bc77e AArch64/ARM64: mark fmul intrinsic as commutative.
This gives DAG patterns matching indexed patterns where either side is an
indexed vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206875 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 10:10:14 +00:00
Tim Northover
74bd57b16b ARM: disable emission of __XYZvfp in soft-float environment.
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.

rdar://problem/13766161

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206874 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 10:10:09 +00:00
Hao Liu
07dcdc7c90 Fix an infinite loop bug in DAG Combine about keeping transfering between ANY_EXTEND and SIGN_EXTEND.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206873 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 09:57:06 +00:00
Lang Hames
53b4d83b63 [X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the
review.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206869 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 07:40:34 +00:00
Matt Arsenault
3ddf868b04 R600: Make sign_extend_inreg legal.
Don't know why I didn't just do this in the first place.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 03:49:30 +00:00
Jiangning Liu
0240286c23 [AArch64] Enable global merge pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 03:33:26 +00:00
Quentin Colombet
8959c39450 [CodeGenPrepare] Use APInt to check the value of the immediate in a and
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.

This fixes PR19503.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206834 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 01:20:34 +00:00
Yi Jiang
5d473a0831 ARM64: Combine shifts and uses from different basic block to bit-extract instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206774 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 19:34:27 +00:00
Duncan P. N. Exon Smith
9a11d668f9 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206707, reapplying r206704.  The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206766 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 17:57:07 +00:00
Eli Bendersky
85af3e7445 Fix the test: DCE optimized away everything.
Use volatile store to protect the generated PTX from DCE.

Patch by Jingyue Wu.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206763 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 17:23:12 +00:00
Michael Zolotukhin
d329c79f16 Reapply r206732. This time without optimization of branches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206749 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 12:01:33 +00:00
NAKAMURA Takumi
2b3433557f llvm/test/CodeGen/X86/bmi.ll: Relax expressions for targeting win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206743 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 11:01:46 +00:00
Lang Hames
f69bb5e43c [X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available.
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.

<rdar://problem/15480077>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206738 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 08:18:53 +00:00
Chandler Carruth
81549a0a39 Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i61, i32, or i64).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206735 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 07:11:15 +00:00
Michael Zolotukhin
7d5100d14e Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 05:33:09 +00:00
Duncan P. N. Exon Smith
f44eda4764 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206704, as expected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith
c404a5334e Revert "blockfreq: Temporarily turn on -debug-only=block-freq"
This reverts commit r206705, as planned.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:45:44 +00:00
Duncan P. N. Exon Smith
69552aa77e blockfreq: Temporarily turn on -debug-only=block-freq
These tests fail after my BlockFrequencyInfo rewrite on two buildbots
[1][2].  I can't reproduce it locally, so I'm temporarily turning on
-debug-only=block-freq so I can find the problem.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1860
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:40:56 +00:00
Duncan P. N. Exon Smith
f465370a49 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.

I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths).  There's a small
chance that this will appease the failing bots [1][2].  (If so, great!)

If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing.  Once I've triggered those builds, I'll
revert both commits so the bots go green again.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206704 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:34:26 +00:00
Yaron Keren
64b2297786 Patch by Vadim Chugunov
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by 
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.

A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206684 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 13:47:43 +00:00
Duncan P. N. Exon Smith
2033057de8 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206666, as planned.

Still stumped on why the bots are failing.  Sanitizer bots haven't
turned anything up.  If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer.  (In the meantime, I'll be
auditing my patch for undefined behaviour.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206677 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith
036e26bc29 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206628, reapplying r206622 (and r206626).

Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux.  Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.

So, I'm applying this again to get more feedback from the bots.  I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206666 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 22:30:03 +00:00
Chad Rosier
6c4ec69c6b [ARM64] Ports the Cortex-A53 Machine Model description from AArch64.
Summary:
This port includes the rudimentary latencies that were provided for
the Cortex-A53 Machine Model in the AArch64 backend. It also changes
the SchedAlias for COPY in the Cyclone model to an explicit
WriteRes mapping to avoid conflicts in other subtargets.

Differential Revision: http://reviews.llvm.org/D3427
Patch by Dave Estes <cestes@codeaurora.org>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206652 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 21:22:04 +00:00
Yaron Keren
904f8dcaa4 Expanded test for x86-pc-windows-gnu and x86_64-pc-windows-gnu environments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206649 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 21:10:11 +00:00
Adam Nemet
d290fa608f [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vector and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith
ebb5d29473 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206622 and the MSVC fixup in r206626.

Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith
54850bedf2 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.

In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally).  I'm hoping the mystery
is solved?  I'll revert this again if the tests are still failing
remotely.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith
1e1954f749 Add some target triples for better determinism
These tests were failing on some buildbots after r206548 (reverted in
r206556), but passing locally.

They were missing target triples, so maybe that's the problem?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:22:19 +00:00
Tim Northover
7b4b261611 AArch64/ARM64: add more NEON tests.
Mostly no testing this time, since they were just wrangling
target-specific intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206613 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:53 +00:00
Tim Northover
f34a512a68 ARM64: disable generation of .loh directives outside MachO.
Part of PR19455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:46 +00:00
Tim Northover
9cfd368302 ARM64: don't emit .subsections_via_symbols on ELF.
Part of PR19455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:41 +00:00
Tim Northover
1d5a2ad8a6 ARM64: add extra NEG pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206609 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:35 +00:00
Tim Northover
936285440b AArch64/ARM64: port more AArch64 tests to ARM64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206592 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 13:16:55 +00:00
Tim Northover
753cfe6172 AArch64/ARM64: add non-scalar lowering for more FCVT operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206591 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 13:16:42 +00:00
Tim Northover
7b4b522ec8 AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206588 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 12:50:58 +00:00
Benjamin Kramer
c32e261a1a X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206579 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 10:45:33 +00:00
Tim Northover
fb96efa7dd AArch64/ARM64: port atomics test to ARM64.
Covers quite a few extra instructions (like any of the max/min ones
which were broken until recently on ARM64).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206575 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:31 +00:00
Tim Northover
0d6995985a AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206574 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:27 +00:00
Tim Northover
70b63374f2 ARM64: implement cunning optimisation from AArch64
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:20 +00:00
Tim Northover
66643da8fc AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:07 +00:00
Tim Northover
937290d7ed AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206569 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:01 +00:00
Tim Northover
2f5d14af9d AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206568 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:30:52 +00:00
Jiangning Liu
bc3655f9c8 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206559 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 05:58:09 +00:00
Matt Arsenault
746734df1a R600/SI: Try to use scalar BFE.
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206558 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 05:19:26 +00:00
Jiangning Liu
532a5ffe4c This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206557 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 03:58:38 +00:00
Duncan P. N. Exon Smith
c7a3b95c0f Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206549.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206556 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith
cc1e1707b8 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives non-sensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206548 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 01:57:45 +00:00