by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66941 91177308-0d34-0410-b5e6-96231b3b80d8
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66875 91177308-0d34-0410-b5e6-96231b3b80d8
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66869 91177308-0d34-0410-b5e6-96231b3b80d8
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66865 91177308-0d34-0410-b5e6-96231b3b80d8
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66826 91177308-0d34-0410-b5e6-96231b3b80d8
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66779 91177308-0d34-0410-b5e6-96231b3b80d8
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte alignment memory. This fixes
450.soplex and 456.hmmer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66641 91177308-0d34-0410-b5e6-96231b3b80d8
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When coalescer commute an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66610 91177308-0d34-0410-b5e6-96231b3b80d8
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66318 91177308-0d34-0410-b5e6-96231b3b80d8
with multiple chain operands. This can occur when the scheduler
has added chain operands to a node that already has a chain
operand, in order to handle physical register dependencies.
This fixes an llvm-gcc bootstrap failure on x86-64 introduced
in r66058.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66240 91177308-0d34-0410-b5e6-96231b3b80d8
so it changed it into a 31 via the TLO.ShrinkDemandedConstant() call. Then it
would go through the DAG combiner again. This time it had a value of 31, which
was turned into a -1 by TLI.SimplifyDemandedBits(). This would ping pong
forever.
Teach the TLO.ShrinkDemandedConstant() call not to lower a value if the demanded
value is an XOR of all ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65985 91177308-0d34-0410-b5e6-96231b3b80d8
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65296 91177308-0d34-0410-b5e6-96231b3b80d8
Now we're using one gross, but quite robust hack :) (previous ones
did not work, for example, when ext_weak symbol was used deep inside
constant expression in the initializer).
The proper fix of this problem will require some quite huge asmprinter
changes and that's why was postponed. This fixes PR3629 by the way :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65230 91177308-0d34-0410-b5e6-96231b3b80d8
addresses, part 1. This fixes an obvious logic bug. Previously if the only
in-loop use is a PHI, it would return AllUsesAreAddresses as true.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65178 91177308-0d34-0410-b5e6-96231b3b80d8
reduction of address calculations down to basic pointer arithmetic.
This is currently off by default, as it needs a few other features
before it becomes generally useful. And even when enabled, full
strength reduction is only performed when it doesn't increase
register pressure, and when several other conditions are true.
This also factors out a bunch of exisiting LSR code out of
StrengthReduceStridedIVUsers into separate functions, and tidies
up IV insertion. This actually decreases register pressure even
in non-superhero mode. The change in iv-users-in-other-loops.ll
is an example of this; there are two more adds because there are
two fewer leas, and there is less spilling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65108 91177308-0d34-0410-b5e6-96231b3b80d8
Enhance instcombine to use the preferred field of
GetOrEnforceKnownAlignment in more cases, so that regular IR operations are
optimized in the same way that the intrinsics currently are.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64623 91177308-0d34-0410-b5e6-96231b3b80d8
addrec in a different loop to check the value being added to
the accumulated Start value, not the Start value before it has
the new value added to it. This prevents LSR from going crazy
on the included testcase. Dale, please review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64440 91177308-0d34-0410-b5e6-96231b3b80d8
after sorting by stride value. This prevents it from missing
IV reuse opportunities in a host-sensitive manner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64415 91177308-0d34-0410-b5e6-96231b3b80d8
in inline asm as signed (what gcc does). Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64400 91177308-0d34-0410-b5e6-96231b3b80d8
unless they actually have data successors, and likewise for nodes
with no data successors unless they actually have data precessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64327 91177308-0d34-0410-b5e6-96231b3b80d8
It was transforming (x&y)==y to (x&y)!=0 in the case where
y is variable and known to have at most one bit set (e.g. z&1).
This is not correct; the expressions are not equivalent when y==0.
I believe this patch salvages what can be salvaged, including
all the cases in bt.ll. Dan, please review.
Fixes gcc.c-torture/execute/20040709-[12].c
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64314 91177308-0d34-0410-b5e6-96231b3b80d8
in any old order. Since analyzing a node analyzes its
operands also, this can mean that when we pop a node
off the list of nodes to be analyzed, it may already
have been analyzed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63632 91177308-0d34-0410-b5e6-96231b3b80d8
With the new world order, it can handle cases where the first
store into the alloca is an element of the vector, instead of
requiring the first analyzed store to have the vector type
itself. This allows us to un-xfail
test/CodeGen/X86/vec_ins_extract.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63590 91177308-0d34-0410-b5e6-96231b3b80d8
--This line, and those below, will be ignaored--
A test/CodeGen/X86/nosse-error1.ll
A test/CodeGen/X86/nosse-error2.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63496 91177308-0d34-0410-b5e6-96231b3b80d8
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63494 91177308-0d34-0410-b5e6-96231b3b80d8
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization. Thanks to Dan for writing the
original patch (which I shamelessly pillaged).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63482 91177308-0d34-0410-b5e6-96231b3b80d8
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63266 91177308-0d34-0410-b5e6-96231b3b80d8
checking logic. Rather than make the checking more
complicated, I've tweaked some logic to make things
conform to how the checking thought things ought to
be, since this results in a simpler "mental model".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63048 91177308-0d34-0410-b5e6-96231b3b80d8
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1
%reg1029<def> = MOV8rr %reg1028
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>
insert => %reg1030<def> = MOV8rr %reg1028
%reg1030<def> = ADD8rr %reg1028<kill>, %reg1029<kill>, %EFLAGS<imp-def,dead>
In this case, it might not be possible to coalesce the second MOV8rr
instruction if the first one is coalesced. So it would be profitable to
commute it:
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1
%reg1029<def> = MOV8rr %reg1028
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>
insert => %reg1030<def> = MOV8rr %reg1029
%reg1030<def> = ADD8rr %reg1029<kill>, %reg1028<kill>, %EFLAGS<imp-def,dead>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62954 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify x+0 to x in unsafe-fp-math mode. This avoids a bunch of
redundant work in many cases, because in unsafe-fp-math mode,
ISD::FADD with a constant is considered free to negate, so the
DAGCombiner often negates x+0 to -0-x thinking it's free, when
in reality the end result is -x, which is more expensive than x.
Also, combine x*0 to 0.
This fixes PR3374.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62789 91177308-0d34-0410-b5e6-96231b3b80d8
special cases after producing the new reduced-width load, because the
new load already has the needed adjustments built into it. This fixes
several bugs due to the special cases, including PR3317.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62692 91177308-0d34-0410-b5e6-96231b3b80d8
we want to clear %ah to zero before a division, just use a
zero-extending mov to %al. This fixes PR3366.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62691 91177308-0d34-0410-b5e6-96231b3b80d8
as its comment says, even in the case where it will be generating
extending loads. This fixes PR3216.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62557 91177308-0d34-0410-b5e6-96231b3b80d8
uses are added to the From node while it is processing From's
use list, because of automatic local CSE. The fix is to avoid
visiting any new uses.
Fix a few places in the DAGCombiner that assumed that after
a RAUW call, the From node has no users and may be deleted.
This fixes PR3018.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62533 91177308-0d34-0410-b5e6-96231b3b80d8
optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62504 91177308-0d34-0410-b5e6-96231b3b80d8
simple %prcontext which doesn't find what it's looking for
if the scheduler has rearranged the instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62363 91177308-0d34-0410-b5e6-96231b3b80d8
to Eli for pointing out that these forms don't ignore the high bits of
their index operands, and as such are not immediately suitable for use
by isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62194 91177308-0d34-0410-b5e6-96231b3b80d8
via two paths, process it once not twice, d'oh!
Analysis, testcase and original patch thanks to
Mon Ping Wang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62169 91177308-0d34-0410-b5e6-96231b3b80d8
Also future proof the scheduler to handle "normal" physical register dependencies. The code is not exercised yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62074 91177308-0d34-0410-b5e6-96231b3b80d8
v1024 = EDI // not killed
=
= EDI
One possible solution is for the coalescer to examine the sub-register live intervals in the same manner as the physical register. Another possibility is to examine defs and uses (when needed) of sub-registers. Both solutions are too expensive. For now, look for "short virtual intervals" and scan instructions to look for conflict instead.
This is a small win on x86-64. e.g. It shaves 403.gcc by ~80 instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61847 91177308-0d34-0410-b5e6-96231b3b80d8
into their left operand, rather than their right. Do this
by commuting the operands and inverting the condition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61842 91177308-0d34-0410-b5e6-96231b3b80d8
avoid the need for spilling, add a new testcase that tests that the
pcmpeqd used for V_SETALLONES is changed to a constant-pool load as
needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61831 91177308-0d34-0410-b5e6-96231b3b80d8
converted to LEA64_32r in x86's convertToThreeAddress. This
replaces code like this:
movl %esi, %edi
inc %edi
with this:
lea 1(%rsi), %edi
which appears to be beneficial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61830 91177308-0d34-0410-b5e6-96231b3b80d8
AddPseudoTwoAddrDeps. This lets the scheduling infrastructure
avoid recalculating node heights. In very large testcases this
was a major bottleneck. Thanks to Roman Levenstein for finding
this!
As a side effect, fold-pcmpeqd-0.ll is now scheduled better
and it no longer requires spilling on x86-32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61778 91177308-0d34-0410-b5e6-96231b3b80d8
constant shift count that doesn't fit in the shift instruction's
immediate field. This fixes PR3242.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61281 91177308-0d34-0410-b5e6-96231b3b80d8
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>,
36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
The right way to represent this is something like
%reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47)
Of course that won't work since that means overlapping live ranges defined by two val#.
The workaround for now is to add a bit to val# which says the val# is redefined by a early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61259 91177308-0d34-0410-b5e6-96231b3b80d8
- Use SplitBlockPredecessors to factor out common predecessors of the critical edge destination. This is disabled for now due to some regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61248 91177308-0d34-0410-b5e6-96231b3b80d8
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61242 91177308-0d34-0410-b5e6-96231b3b80d8
DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.
In terms of restoring the optimization, the best fix here isn't
obvious... any ideas?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61119 91177308-0d34-0410-b5e6-96231b3b80d8
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61073 91177308-0d34-0410-b5e6-96231b3b80d8
and insert vector element. Modified extract vector element to extend the
result to match the expected promoted type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61029 91177308-0d34-0410-b5e6-96231b3b80d8
which are identical to the original patterns.
- Change the multiply with overflow so that we distinguish between signed and
unsigned multiplication. Currently, unsigned multiplication with overflow
isn't working!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60963 91177308-0d34-0410-b5e6-96231b3b80d8
overflow/carry from the "arithmetic with overflow" intrinsics. It searches the
machine basic block from bottom to top to find the SETO/SETC instruction that is
its conditional. If an instruction modifies EFLAGS before it reaches the
SETO/SETC instruction, then it defaults to the normal instruction emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60807 91177308-0d34-0410-b5e6-96231b3b80d8
target-independent way of determining overflow on multiplication. It's very
tricky. Patch by Zoltan Varga!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60800 91177308-0d34-0410-b5e6-96231b3b80d8
essential problem was that the DAG can contain
random unused nodes which were never analyzed.
When remapping a value of a node being processed,
such a node may become used and need to be analyzed;
however due to operands being transformed during
analysis the node may morph into a different one.
Users of the morphing node need to be updated, and
this wasn't happening. While there I added a bunch
of documentation and sanity checks, so I (or some
other poor soul) won't have to scratch their head
over this stuff so long trying to remember how it
was all supposed to work next time some obscure
problem pops up! The extra sanity checking exposed
a few places where invariants weren't being preserved,
so those are fixed too. Since some of the sanity
checking is expensive, I added a flag to turn it
on. It is also turned on when building with
ENABLE_EXPENSIVE_CHECKS=1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60797 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the shift amount when unrolling a vector shift into scalar shifts.
Fix problem in getShuffleScalarElt where it assumes that the input of
a bit convert must be a vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60740 91177308-0d34-0410-b5e6-96231b3b80d8
and use it in x86 address mode folding. Also, make
getRegForValue return 0 for illegal types even if it has a
ValueMap for them, because Argument values are put in the
ValueMap. This fixes PR3181.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60696 91177308-0d34-0410-b5e6-96231b3b80d8
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60608 91177308-0d34-0410-b5e6-96231b3b80d8
1. GlobalBaseReg may have been spilled.
2. It may not be live at the use.
3. Spiller doesn't know this is happening so it won't prevent GlobalBaseReg from being spilled later (That by itself is a nasty hack. It's needed because we don't insert the reload until later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60595 91177308-0d34-0410-b5e6-96231b3b80d8
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.
Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60461 91177308-0d34-0410-b5e6-96231b3b80d8
delegates to the regular x86-32 convention which handles byval, but only
after it handles a few cases, and it's necessary to handle byval before
handling those cases. This fixes PR3122 (and rdar://6400815), llvm-gcc
miscompiling LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60453 91177308-0d34-0410-b5e6-96231b3b80d8
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
depending on the type of ADDO requested.
- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
COND_C set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60388 91177308-0d34-0410-b5e6-96231b3b80d8
figuring out the base of the IV. This produces better
code in the example. (Addresses use (IV) instead of
(BASE,IV) - a significant improvement on low-register
machines like x86).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60374 91177308-0d34-0410-b5e6-96231b3b80d8
multiplies.
Some more cleverness would be nice, though. It would be nice if we
could do this transformation on illegal types. Also, we would
prefer a narrower constant when possible so that we can use a narrower
multiply, which can be cheaper.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60283 91177308-0d34-0410-b5e6-96231b3b80d8
nearby FIXME.
I'm not sure what the right way to fix the Cell test was; if the
approach I used isn't okay, please let me know.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60277 91177308-0d34-0410-b5e6-96231b3b80d8
performance in most cases on the Grawp tester, but does speed some
things up (like shootout/hash by 15%). This also doesn't impact
compile time in a noticable way on the Grawp tester.
It also, of course, gets the testcase it was designed for right :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60120 91177308-0d34-0410-b5e6-96231b3b80d8
-enable-smarter-addr-folding to llc) that gives CGP a better
cost model for when to sink computations into addressing modes.
The basic observation is that sinking increases register
pressure when part of the addr computation has to be available
for other reasons, such as having a use that is a non-memory
operation. In cases where it works, it can substantially reduce
register pressure.
This code is currently an overall win on 403.gcc and 255.vortex
(the two things I've been looking at), but there are several
things I want to do before enabling it by default:
1. This isn't doing any caching of results, so it is much slower
than it could be. It currently slows down release-asserts llc
by 1.7% on 176.gcc: 27.12s -> 27.60s.
2. This doesn't think about inline asm memory operands yet.
3. The cost model botches the case when the needed value is live
across the computation for other reasons.
I'll continue poking at this, and eventually turn it on as llcbeta.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60074 91177308-0d34-0410-b5e6-96231b3b80d8
optimize addressing modes. This allows us to optimize things like isel-sink2.ll
into:
movl 4(%esp), %eax
cmpb $0, 4(%eax)
jne LBB1_2 ## F
LBB1_1: ## TB
movl $4, %eax
ret
LBB1_2: ## F
movzbl 7(%eax), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpb $0, 4(%eax)
leal 4(%eax), %eax
jne LBB1_2 ## F
LBB1_1: ## TB
movl $4, %eax
ret
LBB1_2: ## F
movzbl 3(%eax), %eax
ret
This shrinks (e.g.) 403.gcc from 1133510 to 1128345 lines of .s.
Note that the 2008-10-16-SpillerBug.ll testcase is dubious at best, I doubt
it is really testing what it thinks it is.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60068 91177308-0d34-0410-b5e6-96231b3b80d8
introduce any new spilling; it just uses unused registers.
Refactor the SUnit topological sort code out of the RRList scheduler and
make use of it to help with the post-pass scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59999 91177308-0d34-0410-b5e6-96231b3b80d8
to carry a SmallVector of flagged nodes, just calculate the flagged nodes
dynamically when they are needed.
The local-liveness change is due to a trivial scheduling change where
the scheduler arbitrary decision differently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@59273 91177308-0d34-0410-b5e6-96231b3b80d8
inform the optimizers that the result must be zero/
sign extended from the smaller type. For example,
if a fp to unsigned i16 is promoted to fp to i32,
then we are allowed to assume that the extra 16 bits
are zero (because the result of fp to i16 is undefined
if the result does not fit in an i16). This is
quite aggressive, but should help the optimizers
produce better code. This requires correcting a
test which thought that fp_to_uint is some kind
of truncation, which it is not: in the testcase
(which does fp to i1), either the fp value converts
to 0 or 1 or the result is undefined, which is
quite different to truncation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58991 91177308-0d34-0410-b5e6-96231b3b80d8
have its node id set. The new and and shift nodes are the nodes that need
the IDs. This fixes PR2982.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58655 91177308-0d34-0410-b5e6-96231b3b80d8
bits, use a union of a SimpleValueType enum and a regular Type*.
This increases the size of MVT on 64-bit hosts from 32 bits to 64 bits.
In most cases, this doesn't add significant overhead. There are places
in codegen that use arrays of MVTs, so these are now larger, but
they're small in common cases.
This eliminates restrictions on the size of integer types and vector
types that can be represented in codegen. As the included testcase
demonstrates, it's now possible to codegen very large add operations.
There are still some complications with using very large types. PR2880
is still open so they can't be used as return values on normal targets,
there are no libcalls defined for very large integers so operations
like multiply and divide aren't supported.
This also introduces a minimal tablgen Type library, capable of
handling IntegerType and VectorType. This will allow parts of
TableGen that don't depend on using SimpleValueType values to handle
arbitrary integer and vector types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58623 91177308-0d34-0410-b5e6-96231b3b80d8