The problematic part of this patch is that we were out of attribute bits,
requiring some fancy bit hacking to make it fit (by shrinking alignment)
without breaking existing users or the file format.
This change will require users to rebuild llvm-gcc to match llvm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61239 91177308-0d34-0410-b5e6-96231b3b80d8
nodes. This allows it to do fairly general phi insertion if a
load from a pointer global wants to be SRAd but the load is used
by (recursive) phi nodes. This fixes a pessimization on ppc
introduced by Load PRE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61123 91177308-0d34-0410-b5e6-96231b3b80d8
DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.
In terms of restoring the optimization, the best fix here isn't
obvious... any ideas?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61119 91177308-0d34-0410-b5e6-96231b3b80d8
consistently for deleting branches. In addition to being slightly
more readable, this makes SimplifyCFG a bit better
about cleaning up after itself when it makes conditions unused.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61100 91177308-0d34-0410-b5e6-96231b3b80d8
visited set before they are used. If used, their blocks need to be
added to the visited set so that subsequent queries don't use conflicting
pointer values in the cache result blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61080 91177308-0d34-0410-b5e6-96231b3b80d8
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61073 91177308-0d34-0410-b5e6-96231b3b80d8
and insert vector element. Modified extract vector element to extend the
result to match the expected promoted type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61029 91177308-0d34-0410-b5e6-96231b3b80d8
cleans up the generated code a bit. This should have the added benefit of
not randomly renaming functions/globals like my previous patch did. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61023 91177308-0d34-0410-b5e6-96231b3b80d8
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61022 91177308-0d34-0410-b5e6-96231b3b80d8
llvm[2]: Linking Release executable opt (without symbols)
...
Undefined symbols:
"llvm::APFloat::IEEEsingle", referenced from:
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
"llvm::APFloat::IEEEdouble", referenced from:
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
ld: symbol(s) not found
This is in release mode. To replicate, compile llvm and llvm-gcc in optimized
mode. Then build llvm, in optimized mode, with the newly created compiler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60977 91177308-0d34-0410-b5e6-96231b3b80d8
which are identical to the original patterns.
- Change the multiply with overflow so that we distinguish between signed and
unsigned multiplication. Currently, unsigned multiplication with overflow
isn't working!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60963 91177308-0d34-0410-b5e6-96231b3b80d8
for promoted integer types, eg: i16 on ppc-32, or
i24 on any platform. Complete support for arbitrary
precision integers would require handling expanded
integer types, eg: i128, but I couldn't be bothered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60834 91177308-0d34-0410-b5e6-96231b3b80d8
parallel, allowing it to decide that P/Q must alias if A/B
must alias in things like:
P = gep A, 0, i, 1
Q = gep B, 0, i, 1
This allows GVN to delete 62 more instructions out of 403.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60820 91177308-0d34-0410-b5e6-96231b3b80d8
- Emit DW_AT_byte_size for struct and union of size zero.
- Emit DW_AT_declaration for forward type declaration.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60812 91177308-0d34-0410-b5e6-96231b3b80d8
overflow/carry from the "arithmetic with overflow" intrinsics. It searches the
machine basic block from bottom to top to find the SETO/SETC instruction that is
its conditional. If an instruction modifies EFLAGS before it reaches the
SETO/SETC instruction, then it defaults to the normal instruction emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60807 91177308-0d34-0410-b5e6-96231b3b80d8
target-independent way of determining overflow on multiplication. It's very
tricky. Patch by Zoltan Varga!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60800 91177308-0d34-0410-b5e6-96231b3b80d8
essential problem was that the DAG can contain
random unused nodes which were never analyzed.
When remapping a value of a node being processed,
such a node may become used and need to be analyzed;
however due to operands being transformed during
analysis the node may morph into a different one.
Users of the morphing node need to be updated, and
this wasn't happening. While there I added a bunch
of documentation and sanity checks, so I (or some
other poor soul) won't have to scratch their head
over this stuff so long trying to remember how it
was all supposed to work next time some obscure
problem pops up! The extra sanity checking exposed
a few places where invariants weren't being preserved,
so those are fixed too. Since some of the sanity
checking is expensive, I added a flag to turn it
on. It is also turned on when building with
ENABLE_EXPENSIVE_CHECKS=1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60797 91177308-0d34-0410-b5e6-96231b3b80d8
tricks based on readnone/readonly functions.
Teach memdep to look past readonly calls when analyzing
deps for a readonly call. This allows elimination of a
few more calls from 403.gcc:
before:
63 gvn - Number of instructions PRE'd
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
after:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
5 calls isn't much, but this adds plumbing for the next change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60794 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix call.ll and call_indirect.ll expected results, now that it's using a
different pre-register allocation scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60741 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the shift amount when unrolling a vector shift into scalar shifts.
Fix problem in getShuffleScalarElt where it assumes that the input of
a bit convert must be a vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60740 91177308-0d34-0410-b5e6-96231b3b80d8
and use it in x86 address mode folding. Also, make
getRegForValue return 0 for illegal types even if it has a
ValueMap for them, because Argument values are put in the
ValueMap. This fixes PR3181.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60696 91177308-0d34-0410-b5e6-96231b3b80d8
doesn't do its own local caching, and is slightly more aggressive about
free/store dse (see testcase). This eliminates the last external client
of MemDep::getDependenceFrom().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60619 91177308-0d34-0410-b5e6-96231b3b80d8
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60608 91177308-0d34-0410-b5e6-96231b3b80d8
1. GlobalBaseReg may have been spilled.
2. It may not be live at the use.
3. Spiller doesn't know this is happening so it won't prevent GlobalBaseReg from being spilled later (That by itself is a nasty hack. It's needed because we don't insert the reload until later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60595 91177308-0d34-0410-b5e6-96231b3b80d8
aren't part of the test suite but are generally useful nonetheless, and can
be expanded later to test the backend against the actual Cell SPU system.
There's basically no other good place to put this code, so put it here for
the time being.
- vecoperations.c: Vector shuffles for all supported vector types, tests
for v16i8 add and multiply.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60566 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes many bugs. I will add more test cases in a separate check-in.
Some day, the code that manipulates CFG and updates dom. info could use refactoring help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60554 91177308-0d34-0410-b5e6-96231b3b80d8
1) have it fold "br undef", which does occur with
surprising frequency as jump threading iterates.
2) teach j-t to delete dead blocks. This removes the successor
edges, reducing the in-edges of other blocks, allowing
recursive simplification.
3) Fold things like:
br COND, BBX, BBY
BBX:
br COND, BBZ, BBW
which also happens because jump threading iterates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60470 91177308-0d34-0410-b5e6-96231b3b80d8
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.
Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60461 91177308-0d34-0410-b5e6-96231b3b80d8
delegates to the regular x86-32 convention which handles byval, but only
after it handles a few cases, and it's necessary to handle byval before
handling those cases. This fixes PR3122 (and rdar://6400815), llvm-gcc
miscompiling LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60453 91177308-0d34-0410-b5e6-96231b3b80d8
1. ppcf128 select is expanded to f64 select's.
2. f64 select operand 0 is an i1 truncate, it's promoted to i32 zero_extend.
3. f64 select is updated. It's changed back to a "NewNode" and being re-analyzed.
4. f64 select operands are being processed. Operand 0 is a "NewNode". It's being expunged out of ReplacedValues map.
5. ExpungeNode tries to remap f64 select and notice it's a "NewNode" and assert.
Duncan, please take a look. Thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60443 91177308-0d34-0410-b5e6-96231b3b80d8
- Incorporate Tilmann Scheller's ISD::TRUNCATE custom lowering patch
- Update SPU calling convention info, even if it's not used yet (but can be
at some point or another)
- Ensure that any-extended f32 loads are custom lowered, especially when
they're promoted for use in printf.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60438 91177308-0d34-0410-b5e6-96231b3b80d8
straight-forward implementation. This does not require any extra
alias analysis queries beyond what we already do for non-local loads.
Some programs really really like load PRE. For example, SPASS triggers
this ~1000 times, ~300 times in 255.vortex, and ~1500 times on 403.gcc.
The biggest limitation to the implementation is that it does not split
critical edges. This is a huge killer on many programs and should be
addressed after the initial patch is enabled by default.
The implementation of this should incidentally speed up rejection of
non-local loads because it avoids creating the repl densemap in cases
when it won't be used for fully redundant loads.
This is currently disabled by default.
Before I turn this on, I need to fix a couple of miscompilations in
the testsuite, look at compile time performance numbers, and look at
perf impact. This is pretty close to ready though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60408 91177308-0d34-0410-b5e6-96231b3b80d8
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
depending on the type of ADDO requested.
- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
COND_C set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60388 91177308-0d34-0410-b5e6-96231b3b80d8
figuring out the base of the IV. This produces better
code in the example. (Addresses use (IV) instead of
(BASE,IV) - a significant improvement on low-register
machines like x86).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60374 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix v2[if]64 vector insertion code before IBM files a bug report.
- Ensure that zero (0) offsets relative to $sp don't trip an assert
(add $sp, 0 gets legalized to $sp alone, tripping an assert)
- Shuffle masks passed to SPUISD::SHUFB are now v16i8 or v4i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60358 91177308-0d34-0410-b5e6-96231b3b80d8
multiplies.
Some more cleverness would be nice, though. It would be nice if we
could do this transformation on illegal types. Also, we would
prefer a narrower constant when possible so that we can use a narrower
multiply, which can be cheaper.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60283 91177308-0d34-0410-b5e6-96231b3b80d8
nearby FIXME.
I'm not sure what the right way to fix the Cell test was; if the
approach I used isn't okay, please let me know.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60277 91177308-0d34-0410-b5e6-96231b3b80d8
overflowed on negation. This commit checks to make sure that neithe C nor X
overflows. This requires that the RHS of X (a subtract instruction) be a
constant integer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60275 91177308-0d34-0410-b5e6-96231b3b80d8
properly updates the reverse dependency map when it installs updated
dependencies for instructions that depend on the removed instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60222 91177308-0d34-0410-b5e6-96231b3b80d8
1. Make it fold blocks separated by an unconditional branch. This enables
jump threading to see a broader scope.
2. Make jump threading able to eliminate locally redundant loads when they
feed the branch condition of a block. This frequently occurs due to
reg2mem running.
3. Make jump threading able to eliminate *partially redundant* loads when
they feed the branch condition of a block. This is common in code with
lots of loads and stores like C++ code and 255.vortex.
This implements thread-loads.ll and rdar://6402033.
Per the fixme's, several pieces of this should be moved into Transforms/Utils.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60148 91177308-0d34-0410-b5e6-96231b3b80d8
performance in most cases on the Grawp tester, but does speed some
things up (like shootout/hash by 15%). This also doesn't impact
compile time in a noticable way on the Grawp tester.
It also, of course, gets the testcase it was designed for right :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60120 91177308-0d34-0410-b5e6-96231b3b80d8
-enable-smarter-addr-folding to llc) that gives CGP a better
cost model for when to sink computations into addressing modes.
The basic observation is that sinking increases register
pressure when part of the addr computation has to be available
for other reasons, such as having a use that is a non-memory
operation. In cases where it works, it can substantially reduce
register pressure.
This code is currently an overall win on 403.gcc and 255.vortex
(the two things I've been looking at), but there are several
things I want to do before enabling it by default:
1. This isn't doing any caching of results, so it is much slower
than it could be. It currently slows down release-asserts llc
by 1.7% on 176.gcc: 27.12s -> 27.60s.
2. This doesn't think about inline asm memory operands yet.
3. The cost model botches the case when the needed value is live
across the computation for other reasons.
I'll continue poking at this, and eventually turn it on as llcbeta.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60074 91177308-0d34-0410-b5e6-96231b3b80d8
optimize addressing modes. This allows us to optimize things like isel-sink2.ll
into:
movl 4(%esp), %eax
cmpb $0, 4(%eax)
jne LBB1_2 ## F
LBB1_1: ## TB
movl $4, %eax
ret
LBB1_2: ## F
movzbl 7(%eax), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpb $0, 4(%eax)
leal 4(%eax), %eax
jne LBB1_2 ## F
LBB1_1: ## TB
movl $4, %eax
ret
LBB1_2: ## F
movzbl 3(%eax), %eax
ret
This shrinks (e.g.) 403.gcc from 1133510 to 1128345 lines of .s.
Note that the 2008-10-16-SpillerBug.ll testcase is dubious at best, I doubt
it is really testing what it thinks it is.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60068 91177308-0d34-0410-b5e6-96231b3b80d8
(a) Remove conditionally removed code in SelectXAddr. Basically, hope for the
best that the A-form and D-form address predicates catch everything before
the code decides to emit a X-form address.
(b) Expand vector store test cases to include the usual suspects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60034 91177308-0d34-0410-b5e6-96231b3b80d8
introduce any new spilling; it just uses unused registers.
Refactor the SUnit topological sort code out of the RRList scheduler and
make use of it to help with the post-pass scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59999 91177308-0d34-0410-b5e6-96231b3b80d8
(a) Slight rethink on i64 zero/sign/any extend code - use a shuffle to
directly zero-extend i32 to i64, but use rotates and shifts for
sign extension. Also ensure unified register consistency.
(b) Add new test harness for i64 operations: i64ops.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59970 91177308-0d34-0410-b5e6-96231b3b80d8
(a) Improve the extract element code: there's no need to do gymnastics with
rotates into the preferred slot if a shuffle will do the same thing.
(b) Rename a couple of SPUISD pseudo-instructions for readability and better
semantic correspondence.
(c) Fix i64 sign/any/zero extension lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59965 91177308-0d34-0410-b5e6-96231b3b80d8
indicate functions that allocate, such as operator new, or list::insert. The
actual definition is slightly less strict (for now).
No changes to the bitcode reader/writer, asm printer or verifier were needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59934 91177308-0d34-0410-b5e6-96231b3b80d8