http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
In a silly microbenchmark on a 65 nm Core 2 this is 1.5x faster than the old
code in 32-bit mode and about 2x faster in 64-bit mode. It's also a lot shorter,
especially when counting 64-bit population on a 32-bit target.
I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.
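For reference, a minimal sketch of the parallel counting technique from the
linked bithacks page, next to a Kernighan-style loop. The function names are
illustrative assumptions and are not taken from the LLVM sources.

```cpp
#include <cstdint>

// Parallel (SWAR) population count for a 32-bit value.
// Function names here are hypothetical, chosen for illustration only.
inline unsigned popcount32_parallel(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                  // 2-bit partial sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit partial sums
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
  return (v * 0x01010101u) >> 24;                    // add the four bytes
}

// The same idea widened to 64 bits.
inline unsigned popcount64_parallel(uint64_t v) {
  v = v - ((v >> 1) & 0x5555555555555555ull);
  v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
  v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
  return static_cast<unsigned>((v * 0x0101010101010101ull) >> 56);
}

// Kernighan-style loop: one iteration per set bit, so it is cheap for
// sparse inputs and slow for dense ones.
inline unsigned popcount32_kernighan(uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)  // clear the lowest set bit
    ++n;
  return n;
}
```

The parallel version does a fixed amount of work regardless of the input,
which is why sparse inputs are the interesting comparison against the loop.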
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123547 91177308-0d34-0410-b5e6-96231b3b80d8
half a million non-local queries, each of which would otherwise have triggered a
linear scan over a basic block.
Also fix a fixme for memory intrinsics which dereference pointers. With this,
we prove that a pointer is non-null because it was dereferenced by an intrinsic
112 times in llvm-test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123533 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch of dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
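A rough sketch of the fold on a toy terminator, to show the shape of the
transform. ToyBranch, Cond and foldConstantBranch are hypothetical names for
illustration; this is not the CodeGenPrepare code and does not use LLVM's API.

```cpp
// Hypothetical toy model of a block terminator; this is not LLVM's BranchInst.
enum class Cond { NonConstant, ConstFalse, ConstTrue };

struct ToyBranch {
  bool isConditional;
  Cond cond;      // only meaningful when isConditional is true
  int trueDest;   // block id taken when the condition is true
  int falseDest;  // block id taken when the condition is false
};

// Rewrites "br <constant>, T, F" into an unconditional branch to the taken
// destination. Returns true if the branch was changed.
inline bool foldConstantBranch(ToyBranch &br) {
  if (!br.isConditional || br.cond == Cond::NonConstant)
    return false;
  int taken = (br.cond == Cond::ConstTrue) ? br.trueDest : br.falseDest;
  br.isConditional = false;
  br.cond = Cond::NonConstant;
  br.trueDest = taken;   // both slots point at the only remaining successor
  br.falseDest = taken;
  return true;
}
```

In a real pass the computation that fed the old condition would then be dead
and could be removed; the toy model leaves that part out.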
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123526 91177308-0d34-0410-b5e6-96231b3b80d8
have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123524 91177308-0d34-0410-b5e6-96231b3b80d8
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123523 91177308-0d34-0410-b5e6-96231b3b80d8
these would try hard to match constants by inverting the bits
and recursively matching. There are two problems with this:
1) some patterns would match when we didn't want them to (theoretical)
2) this is insanely expensive to do, and most often pointless.
This was apparently useful in just 2 instcombine cases, which I
added code to handle explicitly. This change speeds up 'opt'
time on 176.gcc by 1% and produces bitwise identical code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123518 91177308-0d34-0410-b5e6-96231b3b80d8
This is needed to allow an InstAlias for an instruction with an "OptionalDef"
result register (like ARM's cc_out) where you want to set the optional register
to reg0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123490 91177308-0d34-0410-b5e6-96231b3b80d8
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
  if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
  latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
  for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
  prioritize nodes by their depth rather than height. As long as it
  doesn't stall, height is irrelevant. Depth represents the critical
  path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
  adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123468 91177308-0d34-0410-b5e6-96231b3b80d8
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123442 91177308-0d34-0410-b5e6-96231b3b80d8
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example, when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
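As an illustration of what threading a shift over a select can mean, here is
a toy sketch: try to simplify the shift on each arm of the select, and if both
arms simplify to the same value, the whole expression simplifies to that value.
Operand, simplifyLShr and threadLShrOverSelect are hypothetical names; the
sketch assumes 32-bit values, handles only constant operands, and is not the
actual simplifier code.

```cpp
#include <cstdint>

// Hypothetical toy operand: either a known 32-bit constant or an opaque value.
struct Operand {
  bool isConst;
  uint32_t value;  // meaningful only when isConst is true
};

// Toy "simplify x >> amt" (logical shift right). Succeeds only when both
// operands are constants and the shift amount is in range.
inline bool simplifyLShr(Operand x, Operand amt, uint32_t &out) {
  if (!x.isConst || !amt.isConst || amt.value >= 32)
    return false;
  out = x.value >> amt.value;
  return true;
}

// Threads the shift over "select cond, t, f": if (t >> amt) and (f >> amt)
// both simplify to the same constant, the condition no longer matters.
inline bool threadLShrOverSelect(Operand t, Operand f, Operand amt,
                                 uint32_t &out) {
  uint32_t tv = 0, fv = 0;
  if (!simplifyLShr(t, amt, tv) || !simplifyLShr(f, amt, fv))
    return false;
  if (tv != fv)
    return false;
  out = tv;
  return true;
}
```

For example, with arms 1 and 2 and a shift amount of 2, both arms shift down
to 0, so the select can be dropped; with a shift amount of 1 the arms disagree
and nothing is simplified. The same scheme applies to phis by treating each
incoming value as an arm.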
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123441 91177308-0d34-0410-b5e6-96231b3b80d8