http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
In a silly microbenchmark on a 65 nm core2 this is 1.5x faster than the old
code in 32 bit mode and about 2x faster in 64 bit mode. It's also a lot shorter,
especially when counting 64 bit population on a 32 bit target.
I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123547 91177308-0d34-0410-b5e6-96231b3b80d8
half a million non-local queries, each of which would otherwise have triggered a
linear scan over a basic block.
Also fix a fixme for memory intrinsics which dereference pointers. With this,
we prove that a pointer is non-null because it was dereferenced by an intrinsic
112 times in llvm-test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123533 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123526 91177308-0d34-0410-b5e6-96231b3b80d8
have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123524 91177308-0d34-0410-b5e6-96231b3b80d8
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123523 91177308-0d34-0410-b5e6-96231b3b80d8
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
prioritize nodes by their depth rather than height. As long as it
doesn't stall, height is irrelevant. Depth represents the critical
path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123468 91177308-0d34-0410-b5e6-96231b3b80d8
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123442 91177308-0d34-0410-b5e6-96231b3b80d8
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123441 91177308-0d34-0410-b5e6-96231b3b80d8
instead of DomTree/DomFrontier. This may be interesting for reducing compile
time. This is currently disabled, but seems to work just fine.
When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123434 91177308-0d34-0410-b5e6-96231b3b80d8
- Fixed :upper16: fix up routine. It should be shifting down the top 16 bits first.
- Added support for Thumb2 :lower16: and :upper16: fix up.
- Added :upper16: and :lower16: relocation support to mach-o object writer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123424 91177308-0d34-0410-b5e6-96231b3b80d8
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123417 91177308-0d34-0410-b5e6-96231b3b80d8
Add methods for accessing the (single) entry / exit edge of a region. If no such
edge exists, null is returned. Both accessors return the start block of the
corresponding edge. The edge can finally be formed by utilizing
Region::getEntry() or Region::getExit();
Contributed by: Andreas Simbuerger <simbuerg@fim.uni-passau.de>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123410 91177308-0d34-0410-b5e6-96231b3b80d8
the symbolic immediate names used for these instructions, fixing their pretty-printers, and
adding proper encoding information for them.
With this, we can properly pretty-print and encode assembly like:
mrc p15, #0, r3, c13, c0, #3
Fixes <rdar://problem/8857858>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123404 91177308-0d34-0410-b5e6-96231b3b80d8
set up the source operands. The original instr has an immediate operand that
should be replaced with the frame reg operand rather than just adding the
reg operand. Previously, the instruction ended up with too many operands
causing an assert() when adding the default predicate. rdar://8825456
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123387 91177308-0d34-0410-b5e6-96231b3b80d8
It will still return an iterator that points to the first terminator or end(),
but there may be DBG_VALUE instructions following the first terminator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123384 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123381 91177308-0d34-0410-b5e6-96231b3b80d8
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123380 91177308-0d34-0410-b5e6-96231b3b80d8
is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way
llvm-gcc converts gcc's conditional expressions. Add this, and a few other
similar transforms for completeness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123372 91177308-0d34-0410-b5e6-96231b3b80d8
in the right direction. It eliminated some hacks and will unblock codegen
work. But it's far from being done. It doesn't reject illegal expressions,
e.g. (FOO - :lower16:BAR). It also doesn't work in Thumb2 mode at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123369 91177308-0d34-0410-b5e6-96231b3b80d8
.code 32 if the TargetMachine's isThumb() boolean does not match. The correct
fix is to switch ARM subtargets at that point and is tracked by rdar://8856789
which is bigger task.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123353 91177308-0d34-0410-b5e6-96231b3b80d8
that way, unfortunately. If you want to change them to work additively instead
of a one-variant-kind-per-symbolref, that's great and I completely agree it's
worth doing, but it really should be a separate patch. Until then, this isn't
correct."
So I am reverting this bit until a more opportune time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123340 91177308-0d34-0410-b5e6-96231b3b80d8
R_ARM_MOVT_PREL and R_ARM_MOVW_PREL_NC.
2. Fix minor bug in ARMAsmPrinter - treat bitfield flag as a bitfield, not an enum.
3. Add support for 3 new elf section types (no-ops)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123294 91177308-0d34-0410-b5e6-96231b3b80d8
DT->changeImmediateDominator() trivially ignores identity updates, so there is
really no need for the uniqueing provided by SmallPtrSet.
I expect this to fix PR8954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123286 91177308-0d34-0410-b5e6-96231b3b80d8
For one, MachineBasicBlock::getFirstTerminator() doesn't understand what is
happening, and it also makes sense to have all control flow run through the
DBG_VALUE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123277 91177308-0d34-0410-b5e6-96231b3b80d8
carry setting flag from the mnemonic.
Note that this currently involves me disabling a number of working cases in
arm_instructions.s, this is a hopefully short term evil which will be rapidly
fixed (and greatly surpassed), assuming my current approach flies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123238 91177308-0d34-0410-b5e6-96231b3b80d8
"this" pointer for any subclass of User, you could static_cast it to
User* and then reinterpret_cast that to Use* to get the end of the
operand list. This isn't a safe assumption in general, because the
static_cast might adjust the "this" pointer. Fixed by having these
OperandTraits classes take an extra template parameter, which is the
subclass of User. This is groundwork for PR889.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123235 91177308-0d34-0410-b5e6-96231b3b80d8
Filling no-ops is done just before emitting of assembly,
when the instruction stream is final. No-ops are inserted
to align the instructions so the dual-issue of the pipeline
is utilized. This speeds up generated code with a minimum of
1% on a select set of algorithms.
This pass may be redundant if the instruction scheduler and
all subsequent passes that modify the instruction stream
(prolog+epilog inserter, register scavenger, are there others?)
are made aware of the instruction alignments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123226 91177308-0d34-0410-b5e6-96231b3b80d8
phi nodes. It is called from MergeBlockIntoPredecessor which is
called from GVN, which claims to preserve these.
I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123222 91177308-0d34-0410-b5e6-96231b3b80d8
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123206 91177308-0d34-0410-b5e6-96231b3b80d8