more. Update the syntax we're checking for and filecheckize it too.
This will fix the selfhost buildbots but will 'break' the others (sigh) because
they're still linked against older LLVM which is emitting less optimized IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90104 91177308-0d34-0410-b5e6-96231b3b80d8
handle cases like this:
void test(int N, double* G) {
long j;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
where G[1] isn't live into the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90041 91177308-0d34-0410-b5e6-96231b3b80d8
way that getUnderlyingObject does it.
This fixes the 'DecomposeGEPExpression and getUnderlyingObject disagree!'
assertion on sqlite3.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90038 91177308-0d34-0410-b5e6-96231b3b80d8
translation of add with immediate. This allows us
to optimize this function:
void test(int N, double* G) {
long j;
G[1] = 1;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
to only do one load every iteration of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90013 91177308-0d34-0410-b5e6-96231b3b80d8
array indexes. The "complex" case of SRoA still handles them, and correctly.
This fixes a weirdness where we'd correctly avoid transforming A[0][42] if
the 42 was too large, but we'd only do it if it was one gep, not two separate
ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90007 91177308-0d34-0410-b5e6-96231b3b80d8
the problem only shows for msp430 and pic16 which is why it specifies
them using -march. But it is wrong to put such tests in CodeGen/Generic,
since not everyone builds these targets. Put a copy of the test in each
of the target test directories.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90005 91177308-0d34-0410-b5e6-96231b3b80d8
where it is not available. It's unclear how to get this inserted
computation into GVN's scalar availability sets, Owen, help? :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89997 91177308-0d34-0410-b5e6-96231b3b80d8
generates store to undef and some generates store to null as the idiom
for undefined behavior. Since simplifycfg zaps both, don't remove the
undefined behavior in instcombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89971 91177308-0d34-0410-b5e6-96231b3b80d8
first expression as P+4+4*i which we considered to possibly alias
P+4*j. Now we correctly analyze the former one as P+1+4*i.
@test10 is a sanity test that verfies that we know that P+4+4*i != P+4*i.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89960 91177308-0d34-0410-b5e6-96231b3b80d8
This violates the ABI (that area is "reserved"), and
while it is safe if all code is generated with current
compilers, there is some very old code around that uses
that slot for something else, and breaks if it is stored
into. Adjust testcases looking for current behavior.
I've verified that the stack frame size is right in all
testcases, whether it changed or not. 7311323.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89811 91177308-0d34-0410-b5e6-96231b3b80d8
than doing the same via constpool:
1. Load from constpool costs 3 cycles on A9, movt/movw pair - just 2.
2. Load from constpool might stall up to 300 cycles due to cache miss.
3. Movt/movw does not use load/store unit.
4. Less constpool entries => better compiler performance.
This is only enabled on ELF systems, since darwin does not have needed
relocations (yet).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89720 91177308-0d34-0410-b5e6-96231b3b80d8
ConstantExpr, not just the top-level operator. This allows it to
fold many more constants.
Also, make GlobalOpt call ConstantFoldConstantExpression on
GlobalVariable initializers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89659 91177308-0d34-0410-b5e6-96231b3b80d8
values, resolving references to them, and then removing the definitions.
If a template argument is set to an undefined value, we need to resolve
references to that argument to an explicit undefined value. The current code
leaves the reference to the template argument as it is, which causes an
assertion failure later when the definition of the template argument is
removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89581 91177308-0d34-0410-b5e6-96231b3b80d8
Also fixed the corresponding testcase, and the PALIGNR
intrinsic (tested for correctness with llvm-gcc).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89491 91177308-0d34-0410-b5e6-96231b3b80d8
it may be used in contexts where preheader insertion may have failed due
to an indirectbr.
Make LoopSimplify's LoopSimplify::SeparateNestedLoop properly fail in
the case that it would require splitting an indirectbr edge.
These fix PR5502.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89484 91177308-0d34-0410-b5e6-96231b3b80d8
which was an expensive checks failure due to a bug in the checking. This
patch in essence reverts the original fix for PR3393, and refixes it by a
tweak to the way expensive checking is done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89454 91177308-0d34-0410-b5e6-96231b3b80d8
if it is not ultimately captured. Teach BasicAliasAnalysis that a
local object address which does not escape and is never stored does
not alias with a value resulting from a load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89398 91177308-0d34-0410-b5e6-96231b3b80d8
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89374 91177308-0d34-0410-b5e6-96231b3b80d8
When TwoAddressInstructionPass deletes a dead instruction, make sure that all
register kills are accounted for. The 2-addr register does not get special
treatment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89246 91177308-0d34-0410-b5e6-96231b3b80d8
The local register allocator doesn't like it when LiveVariables is run.
We should also disable edge splitting under -O0, but that has to wait a bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89125 91177308-0d34-0410-b5e6-96231b3b80d8
by the recent FixedStackPseudoSourceValue-related changes, now that
the specific bug that affected it is fixed, in r88954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88997 91177308-0d34-0410-b5e6-96231b3b80d8
address space (though it only uses a small fraction of that), and the
buildbots disallow that.
Also add a comment to the Makefile's ulimit line warning future
developers that changing it won't work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88994 91177308-0d34-0410-b5e6-96231b3b80d8
The large code model is documented at
http://www.x86-64.org/documentation/abi.pdf and says that calls should
assume their target doesn't live within the 32-bit pc-relative offset
that fits in the call instruction.
To do this, we turn off the global-address->target-global-address
conversion in X86TargetLowering::LowerCall(). The first attempt at
this broke the lazy JIT because it can separate the movabs(imm->reg)
from the actual call instruction. The lazy JIT receives the address of
the movabs as a relocation and needs to record the return address from
the call; and then when that call happens, it needs to patch the
movabs with the newly-compiled target. We could thread the call
instruction into the relocation and record the movabs<->call mapping
explicitly, but that seems to require at least as much new
complication in the code generator as this change.
To fix this, we make lazy functions _always_ go through a call
stub. You'd think we'd only have to force lazy calls through a stub on
difficult platforms, but that turns out to break indirect calls
through a function pointer. The right fix for that is to distinguish
between calls and address-of operations on uncompiled functions, but
that's complex enough to leave for someone else to do.
Another attempt at this defined a new CALL64i pseudo-instruction,
which expanded to a 2-instruction sequence in the assembly output and
was special-cased in the X86CodeEmitter's emitInstruction()
function. That broke indirect calls in the same way as above.
This patch also removes a hack forcing Darwin to the small code model.
Without far-call-stubs, the small code model requires things of the
JITMemoryManager that the DefaultJITMemoryManager can't provide.
Thanks to echristo for lots of testing!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88984 91177308-0d34-0410-b5e6-96231b3b80d8
Have the asm printer emit a comment if an instruction is a spill or
reload and have the spiller mark copies it introdues so the asm printer
can also annotate those.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88911 91177308-0d34-0410-b5e6-96231b3b80d8
- If destination is a physical register and it has a subreg index, use the
sub-register instead.
This fixes PR5423.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88745 91177308-0d34-0410-b5e6-96231b3b80d8
target-specific AsmPrinters. Not all comments need DebugInfo.
Re-enable the line numbers comment test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88697 91177308-0d34-0410-b5e6-96231b3b80d8
code-size win, and not when it's only likely to be code-size neutral,
such as when only a single instruction would be eliminated and a new
branch would be required.
This fixes rdar://7392894.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88692 91177308-0d34-0410-b5e6-96231b3b80d8
D0<def,dead> = ...
...
= S0<use, kill>
S0<def> = ...
...
D0<def> =
The first D0 def is correctly marked dead, however, livevariables should have
added an implicit def of S0 or we end up with a use without a def.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@88690 91177308-0d34-0410-b5e6-96231b3b80d8
running IPSCCP early, and we run functionattrs interlaced with the inliner,
we often (particularly for small or noop functions) completely propagate
all of the information about a call to its call site in IPSSCP (making a call
dead) and functionattrs is smart enough to realize that the function is
readonly (because it is interlaced with inliner).
To improve compile time and make the inliner threshold more accurate, realize
that we don't have to inline dead readonly function calls. Instead, just
delete the call. This happens all the time for C++ codes, here are some
counters from opt/llvm-ld counting the number of times calls were deleted vs
inlined on various apps:
Tramp3d opt:
5033 inline - Number of call sites deleted, not inlined
24596 inline - Number of functions inlined
llvm-ld:
667 inline - Number of functions deleted because all callers found
699 inline - Number of functions inlined
483.xalancbmk opt:
8096 inline - Number of call sites deleted, not inlined
62528 inline - Number of functions inlined
llvm-ld:
217 inline - Number of allocas merged together
2158 inline - Number of functions inlined
471.omnetpp:
331 inline - Number of call sites deleted, not inlined
8981 inline - Number of functions inlined
llvm-ld:
171 inline - Number of functions deleted because all callers found
629 inline - Number of functions inlined
Deleting a call is much faster than inlining it, and is insensitive to the
size of the callee. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86975 91177308-0d34-0410-b5e6-96231b3b80d8
cannot be folded into target cmp instruction.
- Avoid a phase ordering issue where early cmp optimization would prevent the
later count-to-zero optimization.
- Add missing checks which could cause LSR to reuse stride that does not have
users.
- Fix a bug in count-to-zero optimization code which failed to find the pre-inc
iv's phi node.
- Remove, tighten, loosen some incorrect checks disable valid transformations.
- Quite a bit of code clean up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86969 91177308-0d34-0410-b5e6-96231b3b80d8
tail merging support to handle more cases.
- Recognize several cases where tail merging is beneficial even when
the tail size is smaller than the generic threshold.
- Make use of MachineInstrDesc::isBarrier to help detect
non-fallthrough blocks.
- Check for and avoid disrupting fall-through edges in more cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86871 91177308-0d34-0410-b5e6-96231b3b80d8
llvm.invariant.start to be used without necessarily being paired with a call
to llvm.invariant.end. If you run the entire optimization pipeline then such
calls are in fact deleted (adce does it), but that's actually a good thing since
we probably do want them to be zapped late in the game. There should really be
an integration test that checks that the llvm.invariant.start call lasts long
enough that all passes that do interesting things with it get to do their stuff
before it is deleted. But since no passes do anything interesting with it yet
this will have to wait for later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86840 91177308-0d34-0410-b5e6-96231b3b80d8
constant whose component type is not a legal type for the target.
(If the target ConstantPool cannot handle this type either, it has
an opportunity to merge elements. In practice any target with
8-bit bytes must support i8 *as data*). 7320806 (partial).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86751 91177308-0d34-0410-b5e6-96231b3b80d8
generates a sequence similar to this:
__Z4funci:
LFB2:
mflr r0
LCFI0:
stmw r30,-8(r1)
LCFI1:
stw r0,8(r1)
LCFI2:
stwu r1,-80(r1)
LCFI3:
mr r30,r1
LCFI4:
where LCFI3 and LCFI4 are used by the FDE to indicate what the FP, LR, and other
things are. We generated something more like this:
Leh_func_begin1:
mflr r0
stw r31, 20(r1)
stw r0, 8(r1)
Llabel1:
stwu r1, -80(r1)
Llabel2:
mr r31, r1
Note that we are missing the "mr" instruction. This patch makes it more like the
GCC output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86729 91177308-0d34-0410-b5e6-96231b3b80d8
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86722 91177308-0d34-0410-b5e6-96231b3b80d8
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86646 91177308-0d34-0410-b5e6-96231b3b80d8
the loop. This is needed because with indirectbr it may not be possible
for LoopSimplify to guarantee that all loop exit predecessors are
inside the loop. This fixes PR5437.
LCCSA no longer actually requires LoopSimplify form, but for now it
must still have the dependency because the PassManager doesn't know
how to schedule LoopSimplify otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86569 91177308-0d34-0410-b5e6-96231b3b80d8
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86498 91177308-0d34-0410-b5e6-96231b3b80d8