1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66875 91177308-0d34-0410-b5e6-96231b3b80d8
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66869 91177308-0d34-0410-b5e6-96231b3b80d8
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66865 91177308-0d34-0410-b5e6-96231b3b80d8
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66826 91177308-0d34-0410-b5e6-96231b3b80d8
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66779 91177308-0d34-0410-b5e6-96231b3b80d8
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte alignment memory. This fixes
450.soplex and 456.hmmer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66641 91177308-0d34-0410-b5e6-96231b3b80d8
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When coalescer commute an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66610 91177308-0d34-0410-b5e6-96231b3b80d8
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66318 91177308-0d34-0410-b5e6-96231b3b80d8
with multiple chain operands. This can occur when the scheduler
has added chain operands to a node that already has a chain
operand, in order to handle physical register dependencies.
This fixes an llvm-gcc bootstrap failure on x86-64 introduced
in r66058.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@66240 91177308-0d34-0410-b5e6-96231b3b80d8
so it changed it into a 31 via the TLO.ShrinkDemandedConstant() call. Then it
would go through the DAG combiner again. This time it had a value of 31, which
was turned into a -1 by TLI.SimplifyDemandedBits(). This would ping pong
forever.
Teach the TLO.ShrinkDemandedConstant() call not to lower a value if the demanded
value is an XOR of all ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65985 91177308-0d34-0410-b5e6-96231b3b80d8
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65296 91177308-0d34-0410-b5e6-96231b3b80d8
Now we're using one gross, but quite robust hack :) (previous ones
did not work, for example, when ext_weak symbol was used deep inside
constant expression in the initializer).
The proper fix of this problem will require some quite huge asmprinter
changes and that's why was postponed. This fixes PR3629 by the way :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65230 91177308-0d34-0410-b5e6-96231b3b80d8
addresses, part 1. This fixes an obvious logic bug. Previously if the only
in-loop use is a PHI, it would return AllUsesAreAddresses as true.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65178 91177308-0d34-0410-b5e6-96231b3b80d8
reduction of address calculations down to basic pointer arithmetic.
This is currently off by default, as it needs a few other features
before it becomes generally useful. And even when enabled, full
strength reduction is only performed when it doesn't increase
register pressure, and when several other conditions are true.
This also factors out a bunch of exisiting LSR code out of
StrengthReduceStridedIVUsers into separate functions, and tidies
up IV insertion. This actually decreases register pressure even
in non-superhero mode. The change in iv-users-in-other-loops.ll
is an example of this; there are two more adds because there are
two fewer leas, and there is less spilling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65108 91177308-0d34-0410-b5e6-96231b3b80d8
Enhance instcombine to use the preferred field of
GetOrEnforceKnownAlignment in more cases, so that regular IR operations are
optimized in the same way that the intrinsics currently are.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64623 91177308-0d34-0410-b5e6-96231b3b80d8
addrec in a different loop to check the value being added to
the accumulated Start value, not the Start value before it has
the new value added to it. This prevents LSR from going crazy
on the included testcase. Dale, please review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64440 91177308-0d34-0410-b5e6-96231b3b80d8
after sorting by stride value. This prevents it from missing
IV reuse opportunities in a host-sensitive manner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64415 91177308-0d34-0410-b5e6-96231b3b80d8
in inline asm as signed (what gcc does). Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64400 91177308-0d34-0410-b5e6-96231b3b80d8
unless they actually have data successors, and likewise for nodes
with no data successors unless they actually have data precessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64327 91177308-0d34-0410-b5e6-96231b3b80d8
It was transforming (x&y)==y to (x&y)!=0 in the case where
y is variable and known to have at most one bit set (e.g. z&1).
This is not correct; the expressions are not equivalent when y==0.
I believe this patch salvages what can be salvaged, including
all the cases in bt.ll. Dan, please review.
Fixes gcc.c-torture/execute/20040709-[12].c
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64314 91177308-0d34-0410-b5e6-96231b3b80d8
in any old order. Since analyzing a node analyzes its
operands also, this can mean that when we pop a node
off the list of nodes to be analyzed, it may already
have been analyzed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63632 91177308-0d34-0410-b5e6-96231b3b80d8
With the new world order, it can handle cases where the first
store into the alloca is an element of the vector, instead of
requiring the first analyzed store to have the vector type
itself. This allows us to un-xfail
test/CodeGen/X86/vec_ins_extract.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63590 91177308-0d34-0410-b5e6-96231b3b80d8
--This line, and those below, will be ignaored--
A test/CodeGen/X86/nosse-error1.ll
A test/CodeGen/X86/nosse-error2.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63496 91177308-0d34-0410-b5e6-96231b3b80d8
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63494 91177308-0d34-0410-b5e6-96231b3b80d8
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization. Thanks to Dan for writing the
original patch (which I shamelessly pillaged).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63482 91177308-0d34-0410-b5e6-96231b3b80d8
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63266 91177308-0d34-0410-b5e6-96231b3b80d8
checking logic. Rather than make the checking more
complicated, I've tweaked some logic to make things
conform to how the checking thought things ought to
be, since this results in a simpler "mental model".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63048 91177308-0d34-0410-b5e6-96231b3b80d8
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1
%reg1029<def> = MOV8rr %reg1028
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>
insert => %reg1030<def> = MOV8rr %reg1028
%reg1030<def> = ADD8rr %reg1028<kill>, %reg1029<kill>, %EFLAGS<imp-def,dead>
In this case, it might not be possible to coalesce the second MOV8rr
instruction if the first one is coalesced. So it would be profitable to
commute it:
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1
%reg1029<def> = MOV8rr %reg1028
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>
insert => %reg1030<def> = MOV8rr %reg1029
%reg1030<def> = ADD8rr %reg1029<kill>, %reg1028<kill>, %EFLAGS<imp-def,dead>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62954 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify x+0 to x in unsafe-fp-math mode. This avoids a bunch of
redundant work in many cases, because in unsafe-fp-math mode,
ISD::FADD with a constant is considered free to negate, so the
DAGCombiner often negates x+0 to -0-x thinking it's free, when
in reality the end result is -x, which is more expensive than x.
Also, combine x*0 to 0.
This fixes PR3374.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62789 91177308-0d34-0410-b5e6-96231b3b80d8
special cases after producing the new reduced-width load, because the
new load already has the needed adjustments built into it. This fixes
several bugs due to the special cases, including PR3317.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62692 91177308-0d34-0410-b5e6-96231b3b80d8
we want to clear %ah to zero before a division, just use a
zero-extending mov to %al. This fixes PR3366.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62691 91177308-0d34-0410-b5e6-96231b3b80d8
as its comment says, even in the case where it will be generating
extending loads. This fixes PR3216.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62557 91177308-0d34-0410-b5e6-96231b3b80d8
uses are added to the From node while it is processing From's
use list, because of automatic local CSE. The fix is to avoid
visiting any new uses.
Fix a few places in the DAGCombiner that assumed that after
a RAUW call, the From node has no users and may be deleted.
This fixes PR3018.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62533 91177308-0d34-0410-b5e6-96231b3b80d8
optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62504 91177308-0d34-0410-b5e6-96231b3b80d8
simple %prcontext which doesn't find what it's looking for
if the scheduler has rearranged the instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@62363 91177308-0d34-0410-b5e6-96231b3b80d8