LSR can fold three addressing modes into its ICmpZero node:
ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset
ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset
ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg
The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.
Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.
<rdar://problem/11184260>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154079 91177308-0d34-0410-b5e6-96231b3b80d8
A MOVCCr instruction can be commuted by inverting the condition. This
can help reduce register pressure and remove unnecessary copies in some
cases.
<rdar://problem/11182914>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154033 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
might have more than 19 operands. Add a testcase to make sure I
never screw that up again.
Part of rdar://11026482
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153961 91177308-0d34-0410-b5e6-96231b3b80d8
This is just the fallback tie-breaker ordering, the main allocation
order is still descending size.
Patch by Shamil Kurmangaleev!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153904 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows llvm to recognize that a 64 bit object file is being produced
and that the subsequently generated ELF header has the correct information.
The test case checks for both big and little endian flavors.
Patch by Jack Carter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153889 91177308-0d34-0410-b5e6-96231b3b80d8
http://llvm.org/bugs/show_bug.cgi?id=12343
We have not trivial way for splitting edges that are goes from indirect branch. We can do it with some tricks, but it should be additionally discussed. And it is still dangerous due to difficulty of indirect branches controlling.
Fix forbids this case for unswitching.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153879 91177308-0d34-0410-b5e6-96231b3b80d8
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153864 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153842 91177308-0d34-0410-b5e6-96231b3b80d8
a single missing character. Somehow, this had gone untested. I've added
tests for returns-twice logic specifically with the always-inliner that
would have caught this, and fixed the bug.
Thanks to Matt for the careful review and spotting this!!! =D
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153832 91177308-0d34-0410-b5e6-96231b3b80d8
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153817 91177308-0d34-0410-b5e6-96231b3b80d8
on a per-callsite walk of the called function's instructions, in
breadth-first order over the potentially reachable set of basic blocks.
This is a major shift in how inline cost analysis works to improve the
accuracy and rationality of inlining decisions. A brief outline of the
algorithm this moves to:
- Build a simplification mapping based on the callsite arguments to the
function arguments.
- Push the entry block onto a worklist of potentially-live basic blocks.
- Pop the first block off of the *front* of the worklist (for
breadth-first ordering) and walk its instructions using a custom
InstVisitor.
- For each instruction's operands, re-map them based on the
simplification mappings available for the given callsite.
- Compute any simplification possible of the instruction after
re-mapping, and store that back int othe simplification mapping.
- Compute any bonuses, costs, or other impacts of the instruction on the
cost metric.
- When the terminator is reached, replace any conditional value in the
terminator with any simplifications from the mapping we have, and add
any successors which are not proven to be dead from these
simplifications to the worklist.
- Pop the next block off of the front of the worklist, and repeat.
- As soon as the cost of inlining exceeds the threshold for the
callsite, stop analyzing the function in order to bound cost.
The primary goal of this algorithm is to perfectly handle dead code
paths. We do not want any code in trivially dead code paths to impact
inlining decisions. The previous metric was *extremely* flawed here, and
would always subtract the average cost of two successors of
a conditional branch when it was proven to become an unconditional
branch at the callsite. There was no handling of wildly different costs
between the two successors, which would cause inlining when the path
actually taken was too large, and no inlining when the path actually
taken was trivially simple. There was also no handling of the code
*path*, only the immediate successors. These problems vanish completely
now. See the added regression tests for the shiny new features -- we
skip recursive function calls, SROA-killing instructions, and high cost
complex CFG structures when dead at the callsite being analyzed.
Switching to this algorithm required refactoring the inline cost
interface to accept the actual threshold rather than simply returning
a single cost. The resulting interface is pretty bad, and I'm planning
to do lots of interface cleanup after this patch.
Several other refactorings fell out of this, but I've tried to minimize
them for this patch. =/ There is still more cleanup that can be done
here. Please point out anything that you see in review.
I've worked really hard to try to mirror at least the spirit of all of
the previous heuristics in the new model. It's not clear that they are
all correct any more, but I wanted to minimize the change in this single
patch, it's already a bit ridiculous. One heuristic that is *not* yet
mirrored is to allow inlining of functions with a dynamic alloca *if*
the caller has a dynamic alloca. I will add this back, but I think the
most reasonable way requires changes to the inliner itself rather than
just the cost metric, and so I've deferred this for a subsequent patch.
The test case is XFAIL-ed until then.
As mentioned in the review mail, this seems to make Clang run about 1%
to 2% faster in -O0, but makes its binary size grow by just under 4%.
I've looked into the 4% growth, and it can be fixed, but requires
changes to other parts of the inliner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153812 91177308-0d34-0410-b5e6-96231b3b80d8
The powi intrinsic requires special handling because it always takes a single
integer power regardless of the result type. As a result, we can vectorize
only if the powers are equal. Fixes PR12364.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153797 91177308-0d34-0410-b5e6-96231b3b80d8
When an immediate is both a value [t2_]so_imm and a [t2_]so_imm_neg,
we want to use the non-negated form to make sure we prefer the normal
encoding, not the aliased encoding via the negation of, e.g., 'cmp.w'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153770 91177308-0d34-0410-b5e6-96231b3b80d8
Make the non-tied register operand names line up with what the base
class encoding handler expects.
rdar://11157236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153766 91177308-0d34-0410-b5e6-96231b3b80d8
For 'adds r2, r2, #56' outside of an IT block, the 16-bit encoding T2
can be used for this syntax. Prefer the narrow encoding when possible.
rdar://11156277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153759 91177308-0d34-0410-b5e6-96231b3b80d8
The CMP->CMN alias was matching for an immediate of zero when it
should only match for negative values.
rdar://11129224
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153689 91177308-0d34-0410-b5e6-96231b3b80d8
CodeGenPrepare sinks compare instructions down to their uses to prevent
live flags and predicate registers across basic blocks.
PRE of a compare instruction prevents that, forcing the i1 compare
result into a general purpose register. That is usually more expensive
than the redundant compare PRE was trying to eliminate in the first
place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153657 91177308-0d34-0410-b5e6-96231b3b80d8
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153635 91177308-0d34-0410-b5e6-96231b3b80d8
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153617 91177308-0d34-0410-b5e6-96231b3b80d8
When an strd instruction doesn't get the registers it wants, it can be
expanded into two str instructions. Make sure the first str doesn't kill
the base register in the case where the base and data registers are
identical:
t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg
t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg
<rdar://problem/11101911>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153611 91177308-0d34-0410-b5e6-96231b3b80d8
The arm_neon intrinsics can create virtual registers from the DPair
register class which allows both even-odd and odd-even D-register pairs.
This fixes PR12389.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153603 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message for r153521 (aka r153423):
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loding a boolean value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153587 91177308-0d34-0410-b5e6-96231b3b80d8
blocks in the function cloner. This removes the last case of trivially
dead code that I've been seeing in the wild getting inlined, analyzed,
re-inlined, optimized, only to be deleted. Nukes a FIXME from the
cleanup tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153572 91177308-0d34-0410-b5e6-96231b3b80d8
undefined behavior, which Rafael was kind enough to fix.
Original commit message for r153423:
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loding a boolean value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153521 91177308-0d34-0410-b5e6-96231b3b80d8
produces a 32-bit immediate which is consumed by the use. It tries to
fold the immediate by breaking it into two parts and fold them into the
immmediate fields of two uses. e.g
movw r2, #40885
movt r3, #46540
add r0, r0, r3
=>
add.w r0, r0, #3019898880
add.w r0, r0, #30146560
;
However, this transformation is incorrect if the user produces a flag. e.g.
movw r2, #40885
movt r3, #46540
adds r0, r0, r3
=>
add.w r0, r0, #3019898880
adds.w r0, r0, #30146560
Note the adds.w may not set the carry flag even if the original sequence
would.
rdar://11116189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153484 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loading a boolean value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153452 91177308-0d34-0410-b5e6-96231b3b80d8
constant-offsets of a common base using the generic GEP-walking logic
I added for computing pointer differences in the same situation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153419 91177308-0d34-0410-b5e6-96231b3b80d8
inbounds GEPs. This isn't really necessary for simplifying pointer
differences, but I'm planning to re-use the same code to simplify
pointer comparisons where it is necessary. Since real code almost
exclusively uses inbounds GEPs, it doesn't seem worth it to support the
extra complexity of turning it on and off. If anyone would like that
back, feel free to shout. Note that instcombine will still catch any of
these patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153418 91177308-0d34-0410-b5e6-96231b3b80d8
aggressively. There are lots of dire warnings about this being expensive
that seem to predate switching to the TrackingVH-based value remapper
that is automatically updated on RAUW. This makes it easy to not just
prune single-entry PHIs, but to fully simplify PHIs, and to recursively
simplify the newly inlined code to propagate PHINode simplifications.
This introduces a bit of a thorny problem though. We may end up
simplifying a branch condition to a constant when we fold PHINodes, and
we would like to nuke any dead blocks resulting from this so that time
isn't wasted continually analyzing them, but this isn't easy. Deleting
basic blocks *after* they are fully cloned and mapped into the new
function currently requires manually updating the value map. The last
piece of the simplification-during-inlining puzzle will require either
switching to WeakVH mappings or some other piece of refactoring. I've
left a FIXME in the testcase about this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153410 91177308-0d34-0410-b5e6-96231b3b80d8
* Removed test/lib/llvm.exp - it is no longer needed
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
left in the test suite so this code is no longer required. test/lit.cfg is
now much shorter and clearer
* Removed a lot of duplicate code in lit.local.cfg files that need access to
the root configuration, by adding a "root" attribute to the TestingConfig
object. This attribute is dynamically computed to provide the same
information as was previously provided by the custom getRoot functions.
* Documented the config.root attribute in docs/CommandGuide/lit.pod
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153408 91177308-0d34-0410-b5e6-96231b3b80d8
to instead rely on much more generic and powerful instruction
simplification in the function cloner (and thus inliner).
This teaches the pruning function cloner to use instsimplify rather than
just the constant folder to fold values during cloning. This can
simplify a large number of things that constant folding alone cannot
begin to touch. For example, it will realize that 'or' and 'and'
instructions with certain constant operands actually become constants
regardless of what their other operand is. It also can thread back
through the caller to perform simplifications that are only possible by
looking up a few levels. In particular, GEPs and pointer testing tend to
fold much more heavily with this change.
This should (in some cases) have a positive impact on compile times with
optimizations on because the inliner itself will simply avoid cloning
a great deal of code. It already attempted to prune proven-dead code,
but now it will be use the stronger simplifications to prove more code
dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153403 91177308-0d34-0410-b5e6-96231b3b80d8
regressed seriously here, we are no longer removing allocas during
inline cleanup. This appears to be because of lifetime markers "using"
them. =/ I'll look into this shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153394 91177308-0d34-0410-b5e6-96231b3b80d8
The PPC64 SVR4 ABI requires integer stack arguments, and thus the var. args., that
are smaller than 64 bits be zero extended to 64 bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153373 91177308-0d34-0410-b5e6-96231b3b80d8
same basic block, and it's not safe to insert code in the successor
blocks if the edges are critical edges. Splitting those edges is
possible, but undesirable, especially on the unwind side. Instead,
make the bottom-up code motion to consider invokes to be part of
their successor blocks, rather than part of their parent blocks, so
that it doesn't push code past them and onto the edges. This fixes
PR12307.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153343 91177308-0d34-0410-b5e6-96231b3b80d8
(and hopefully on Windows). The bots have been down most of the day
because of this, and it's not clear to me what all will be required to
fix it.
The commits started with r153205, then r153207, r153208, and r153221.
The first commit seems to be the real culprit, but I couldn't revert
a smaller number of patches.
When resubmitting, r153207 and r153208 should be folded into r153205,
they were simple build fixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153241 91177308-0d34-0410-b5e6-96231b3b80d8
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
rdar://10810716
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153230 91177308-0d34-0410-b5e6-96231b3b80d8
These changes allow us to compile big endian from the command line for 32 bit
Mips targets. This patch will result in code and data actually being produced
in the correct endianess.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153153 91177308-0d34-0410-b5e6-96231b3b80d8
Do not call SplitBlockPredecessors on a loop preheader when one of the
predecessors is an indirectbr. Otherwise, you will hit this assert:
!isa<IndirectBrInst>(Preds[i]->getTerminator()) && "Cannot split an edge from an IndirectBrInst"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153134 91177308-0d34-0410-b5e6-96231b3b80d8
instead of skipping the current loop.
My prior fix was incomplete because of an overzealous compile-time optimization:
Better fix for: <rdar://problem/11049788> Segmentation fault: 11 in LoopStrengthReduce
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153131 91177308-0d34-0410-b5e6-96231b3b80d8
This results in things such as
vmovups 16(%rdi), %xmm0
vinsertf128 $1, %xmm0, %ymm0, %ymm0
to be combined to
vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
rdar://11076953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153092 91177308-0d34-0410-b5e6-96231b3b80d8
i128). In that case, we may not be able to print out the MCExpr as an
expression. For instance, we could have an MCExpr like this:
0xBEEF0000BEEF0000 | (0xBEEF0000BEEF0000 << 64)
The MCExpr printer handles sizes up to 64-bits, but this expression would
require 128-bits. In this situation, try to evaluate the constant expression and
emit that as the value into 64-bit chunks.
<rdar://problem/11070338>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153081 91177308-0d34-0410-b5e6-96231b3b80d8
X86InstrCompiler.td.
It also adds –mcpu-generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise
produce an instruction schedule which differs from that which the test expects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153033 91177308-0d34-0410-b5e6-96231b3b80d8
overflow checking multiply intrinsic as well.
Add a test for this, updating the test from grep to FileCheck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153028 91177308-0d34-0410-b5e6-96231b3b80d8
It's not a good style idea, as the registers will be laid down in memory in
numerical order, not the order they're in the list, but it's legal. vldm/vstm
are stricter.
rdar://11064740
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152943 91177308-0d34-0410-b5e6-96231b3b80d8
alignment. If that's the case, then we want to make sure that we don't increase
the alignment of the store instruction. Because if we increase it to be "more
aligned" than the pointer, code-gen may use instructions which require a greater
alignment than the pointer guarantees.
<rdar://problem/11043589>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152907 91177308-0d34-0410-b5e6-96231b3b80d8
It was added in 2007 as the first cut at supporting no-inline
attributes, but we didn't have function attributes of any form at the
time. However, it was added without any mention in the LangRef or other
documentation.
Later on, in 2008, Devang added function notes for 'inline=never' and
then turned them into proper function attributes. From that point
onward, as far as I can tell, the world moved on, and no one has touched
'llvm.noinline' in any meaningful way since.
It's time has now come. We have had better mechanisms for doing this for
a long time, all the frontends I'm aware of use them, and this is just
holding back progress. Given that it was never a documented feature of
the IR, I've provided no auto-upgrade support. If people know of real,
in-the-wild bitcode that relies on this, yell at me and I'll add it, but
I *seriously* doubt anyone cares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152904 91177308-0d34-0410-b5e6-96231b3b80d8
Only record IVUsers that are dominated by simplified loop
headers. Otherwise SCEVExpander will crash while looking for a
preheader.
I previously tried to work around this in LSR itself, but that was
insufficient. This way, LSR can continue to run if some uses are not
in simple loops, as long as we don't attempt to analyze those users.
Fixes <rdar://problem/11049788> Segmentation fault: 11 in LoopStrengthReduce
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152892 91177308-0d34-0410-b5e6-96231b3b80d8
out the DW_AT_name. Older gdbs unfortunately still use it to
disambiguate member functions in templated classes (gdb.cp/templates.exp).
rdar://11043421 (which is now deferred for a bit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152782 91177308-0d34-0410-b5e6-96231b3b80d8
This results in things such as
vmovaps -96(%rbx), %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
to be combined to
vinsertf128 $1, -96(%rbx), %ymm0, %ymm0
rdar://10643481
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152762 91177308-0d34-0410-b5e6-96231b3b80d8
correlated pairs of pointer arguments at the callsite. This is designed
to recognize the common C++ idiom of begin/end pointer pairs when the
end pointer is a constant offset from the begin pointer. With the
C-based idiom of a pointer and size, the inline cost saw the constant
size calculation, and this provides the same level of information for
begin/end pairs.
In order to propagate this information we have to search for candidate
operations on a pair of pointer function arguments (or derived from
them) which would be simplified if the pointers had a known constant
offset. Then the callsite analysis looks for such pointer pairs in the
argument list, and applies the appropriate bonus.
This helps LLVM detect that half of bounds-checked STL algorithms
(such as hash_combine_range, and some hybrid sort implementations)
disappear when inlined with a constant size input. However, it's not
a complete fix due the inaccuracy of our cost metric for constants in
general. I'm looking into that next.
Benchmarks showed no significant code size change, and very minor
performance changes. However, specific code such as hashing is showing
significantly cleaner inlining decisions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152752 91177308-0d34-0410-b5e6-96231b3b80d8
output (we're emitting a specification already and the information
isn't changing).
Saves 1% on the debug information for a build of llvm.
Fixes rdar://11043421
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152697 91177308-0d34-0410-b5e6-96231b3b80d8
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
rdar://11035895
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152675 91177308-0d34-0410-b5e6-96231b3b80d8
instruction's destination operand like it does for the source operand.
Also fix a typo in the comment for X86AsmParser::isSrcOp().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152654 91177308-0d34-0410-b5e6-96231b3b80d8
candidate set for subsequent inlining, try to simplify the arguments to
the inner call site now that inlining has been performed.
The goal here is to propagate and fold constants through deeply nested
call chains. Without doing this, we loose the inliner bonus that should
be applied because the arguments don't match the exact pattern the cost
estimator uses.
Reviewed on IRC by Benjamin Kramer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152556 91177308-0d34-0410-b5e6-96231b3b80d8
Typically instcombine has handled this, but pointer differences show up
in several contexts where we would like to get constant folding, and
cannot afford to run instcombine. Specifically, I'm working on improving
the constant folding of arguments used in inline cost analysis with
instsimplify.
Doing this in instsimplify implies some algorithm changes. We have to
handle multiple layers of all-constant GEPs because instsimplify cannot
fold them into a single GEP the way instcombine can. Also, we're only
interested in all-constant GEPs. The result is that this doesn't really
replace the instcombine logic, it's just complimentary and focused on
constant folding.
Reviewed on IRC by Benjamin Kramer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152555 91177308-0d34-0410-b5e6-96231b3b80d8
The 'CmpInst::isFalseWhenEqual' function returns 'false' for values other than
simply equality. For instance, it returns 'false' for <= or >=. This isn't the
correct behavior for this transformation, which is checking for strict equality
and non-equality. It was causing the gcc.c-torture/execute/frame-address.c test
to fail because it would completely (and incorrectly) optimize a whole function
into a 'ret i32 0'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152497 91177308-0d34-0410-b5e6-96231b3b80d8
* Add enums and structures for GNU version information.
* Implement extraction of that information on a per-symbol basis (ELFObjectFile::getSymbolVersion).
* Implement a generic interface, GetELFSymbolVersion(), for getting the symbol version from the ObjectFile (hides the templating).
* Have llvm-readobj print out the version, when available.
* Add a test for the new feature: readobj-elf-versioning.test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152436 91177308-0d34-0410-b5e6-96231b3b80d8
traversal, consider nodes for which the only successors are backedges
which the traversal is ignoring to be exit nodes. This fixes a problem
where the bottom-up traversal was failing to visit split blocks along
split loop backedges. This fixes rdar://10989035.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152421 91177308-0d34-0410-b5e6-96231b3b80d8
negative switch cases if the branch condition is known to be positive.
Inspired by a recent improvement to GCC's VRP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152405 91177308-0d34-0410-b5e6-96231b3b80d8
introduced. Specifically, there are cost reductions for all
constant-operand icmp instructions against an alloca, regardless of
whether the alloca will in fact be elligible for SROA. That means we
don't want to abort the icmp reduction computation when we abort the
SROA reduction computation. That in turn frees us from the need to keep
a separate worklist and defer the ICmp calculations.
Use this new-found freedom and some judicious function boundaries to
factor the innards of computing the cost factor of any given instruction
out of the loop over the instructions and into static helper functions.
This greatly simplifies the code, and hopefully makes it more clear what
is happening here.
Reviewed by Eric Christopher. There is some concern that we'd like to
ensure this doesn't get out of hand, and I plan to benchmark the effects
of this change over the next few days along with some further fixes to
the inline cost.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152368 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message from r147481:
DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
Fix:
Unaligned loads need to generate a vmovups.
rdar://10974078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152366 91177308-0d34-0410-b5e6-96231b3b80d8
The test fell back to the C backend, making it useless and it started to fail
on configurations that don't build the C backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152342 91177308-0d34-0410-b5e6-96231b3b80d8
When an instruction only writes sub-registers, it is still necessary to
add an <imp-def> operand for the super-register. When reloading into a
virtual register, rewriting will add the operand, but when loading
directly into a virtual register, the <imp-def> operand is still
necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152095 91177308-0d34-0410-b5e6-96231b3b80d8
The fpscr register contains both flags (set by FP operations/comparisons) and
control bits. The control bits (FPSCR) should be reserved, since they're always
available and needn't be defined before use. The flag bits (FPSCR_NZCV) should
like to be unreserved so they can be hoisted by MachineCSE. This fixes PR12165.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152076 91177308-0d34-0410-b5e6-96231b3b80d8
This was testing the handling of sub-register coalescing followed by
remat. The original problem was caused by the extra <imp-def> operands
added by sub-register coalescing. Those <imp-def> operands are not
added any longer, and the test case passes even when the original patch
is reverted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152040 91177308-0d34-0410-b5e6-96231b3b80d8
In this update:
- I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
- I kept setting .fpu=neon-vfpv4 code attribute because that is what the
assembler understands.
Patch by Ana Pazos <apazos@codeaurora.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152036 91177308-0d34-0410-b5e6-96231b3b80d8
MachineOperands that define part of a virtual register must have an
<undef> flag if they are not intended as read-modify-write operands.
The old trick of adding an <imp-def> operand doesn't work any longer.
Fixes PR12177.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152008 91177308-0d34-0410-b5e6-96231b3b80d8
equalities into phi node operands for which the equality is known to
hold in the incoming basic block. That's because replaceAllDominatedUsesWith
wasn't handling phi nodes correctly in general (that this didn't give wrong
results was just luck: the specific way GVN uses replaceAllDominatedUsesWith
precluded wrong changes to phi nodes).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152006 91177308-0d34-0410-b5e6-96231b3b80d8
Some BBs can become dead after codegen preparation. If we delete them here, it
could help enable tail-call optimizations later on.
<rdar://problem/10256573>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152002 91177308-0d34-0410-b5e6-96231b3b80d8
This could probably be made a lot smarter, but this is a common case and doesn't require LVI to scan a lot
of code. With this change CVP can optimize away the "shift == 0" case in Hashing.h that only gets hit when
"shift" is in a range not containing 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151919 91177308-0d34-0410-b5e6-96231b3b80d8
In this instance we are generating the tail-call during legalizeDAG. The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call. The first floor call can't be a tail-call because it's
not in the tail position. The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.
rdar://10930395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151877 91177308-0d34-0410-b5e6-96231b3b80d8
The inline table needs to be constructed ahead of time so that it doesn't try to
create new strings while we're emitting everything.
This reverts commit a8ff9bccb399183cdd5f1c3cec2bda763664b4b0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151864 91177308-0d34-0410-b5e6-96231b3b80d8
floating point equality comparisons into integer ones with -ffast-math. The
issue is the optimization causes +0.0 != -0.0.
Now the optimization is only done when one side is known to be 0.0. The other
side's sign bit is masked off for the comparison.
rdar://10964603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151861 91177308-0d34-0410-b5e6-96231b3b80d8
fixups that are being used to determine section offsets. Reduces
the total number of fixups by 50% for a non-trivial testcase.
Part of rdar://10413936
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151852 91177308-0d34-0410-b5e6-96231b3b80d8