mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.
Also commit Mon Ping's VSETCC patch
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54039 91177308-0d34-0410-b5e6-96231b3b80d8
and knowledge of PseudoSourceValues. This unfortunately isn't sufficient to allow
constants to be rematerialized in PIC mode -- the extra indirection is a
complication.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54000 91177308-0d34-0410-b5e6-96231b3b80d8
8 %reg1024<def> = IMPLICIT_DEF
12 %reg1024<def> = INSERT_SUBREG %reg1024<kill>, %reg1025, 2
The live range [12, 14) are not part of the r1024 live interval since it's defined by an implicit def. It will not conflicts with live interval of r1025. Now suppose both registers are spilled, you can easily see a situation where both registers are reloaded before the INSERT_SUBREG and both target registers that would overlap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53503 91177308-0d34-0410-b5e6-96231b3b80d8
1. LSR runOnLoop is always returning false regardless if any transformation is made.
2. AddUsersIfInteresting can create new instructions that are added to DeadInsts. But there is a later early exit which prevents them from being freed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53193 91177308-0d34-0410-b5e6-96231b3b80d8
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
awful codegen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52740 91177308-0d34-0410-b5e6-96231b3b80d8
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52706 91177308-0d34-0410-b5e6-96231b3b80d8
test (doesn't work for any MMX vector types, it's
not me). Rewritten to use v2i16 which is generic
and going to stay that way; I think that preserves
the point of the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52692 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle could be skipped. The check is invalid because the loop index i
doesn't correspond to the element actually inserted. The correct check is
already done a few lines earlier, for whether the element is already in
the right spot, so this shouldn't have any effect on the codegen for
code that was already correct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52486 91177308-0d34-0410-b5e6-96231b3b80d8
wrong for volatile loads and stores. In fact this
is almost all of them! There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access. These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used. Consider
loading an i32 but only using the lower 8 bits. It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes. It
is also unwise to make a load/store wider. For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects. (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware. (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several. For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores). In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic. My policy here is
to say that the number of processor operations for
an illegal operation is undefined. So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok. It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal! That is because
operations are marked legal by default, regardless of
whether the type is legal or not. In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation. However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal. So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before. This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52254 91177308-0d34-0410-b5e6-96231b3b80d8
variable expansions involving the $ character.
This fixes 4 tests that were not running properly before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52183 91177308-0d34-0410-b5e6-96231b3b80d8
in DAGISelEmitter output. This bug was recently uncovered by the
addition of patterns for CALL32m and CALL64m, which are nodes
that now have both MemOperands and variadic_ops.
This bug was especially visible with PIC in various configurations,
because the new patterns are matching the indirect call code used
in many PIC configurations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51877 91177308-0d34-0410-b5e6-96231b3b80d8
cases due to an isel deficiency already noted in
lib/Target/X86/README.txt, but they can be matched in this fold-call.ll
testcase, for example.
This is interesting mainly because it exposes a tricky tblgen bug;
tblgen was incorrectly computing the starting index for variable_ops
in the case of a complex pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51706 91177308-0d34-0410-b5e6-96231b3b80d8
sometimes a "mov %ebp, %esp" in the epilogue.
Force these tests that rely on counting 'mov' to use i686-apple-darwin8.8.0
where they were written.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51568 91177308-0d34-0410-b5e6-96231b3b80d8
BB1:
vr1025 = copy vr1024
..
BB2:
vr1024 = op
= op vr1025
<loop eventually branch back to BB1>
Even though vr1025 is copied from vr1024, it's not safe to coalesced them since live range of vr1025 intersects the def of vr1024. This happens when vr1025 is assigned the value of the previous iteration of vr1024 in the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51394 91177308-0d34-0410-b5e6-96231b3b80d8
use-before-def. The problem comes up in code with multiple PHIs where
one PHI is being rewritten in terms of the other, but the other needs
to be casted first. LLVM rules requre the cast instruction to be
inserted after any PHI instructions, but when instructions were
inserted to replace the second PHI value with a function of the first,
they were ended up going before the cast instruction. Avoid this
problem by remembering the location of the cast instruction, when one
is needed, and inserting the expansion of the new value after it.
This fixes a bug that surfaced in 255.vortex on x86-64 when
instcombine was removed from the middle of the loop optimization
passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51169 91177308-0d34-0410-b5e6-96231b3b80d8
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50918 91177308-0d34-0410-b5e6-96231b3b80d8
%ecx = op
store %cl<kill>, (addr)
(addr) = op %al
It's not safe to unfold the last operand and eliminate store even though %cl is marked kill. It's a sub-register use which means one of its super-register(s) may be used below.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50794 91177308-0d34-0410-b5e6-96231b3b80d8
the code being generated does not require an executable stack.
Also, add target-specific code to make use of this on Linux
on x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50634 91177308-0d34-0410-b5e6-96231b3b80d8
ffastmath mode. This fixes rdar://5902801, a miscompilation
of gcc.dg/builtins-8.c.
Bill, please pull this into Tak.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50523 91177308-0d34-0410-b5e6-96231b3b80d8
We now compile test2/test3 to:
_test2:
## InlineAsm Start
set %xmm0, %xmm1
## InlineAsm End
addps %xmm1, %xmm0
ret
_test3:
## InlineAsm Start
set %xmm0, %xmm1
## InlineAsm End
paddd %xmm1, %xmm0
ret
as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50389 91177308-0d34-0410-b5e6-96231b3b80d8
towards PR2094. It now compiles the attached .ll file to:
_sad16_sse2:
movslq %ecx, %rax
## InlineAsm Start
%ecx %rdx %rax %rax %r8d %rdx %rsi
## InlineAsm End
## InlineAsm Start
set %eax
## InlineAsm End
ret
which is pretty decent for a 3 output, 4 input asm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50386 91177308-0d34-0410-b5e6-96231b3b80d8
e.g.
vr1024<2> extract_subreg vr1025, 2
If vr1024 do not have the same register class as vr1025, it's not safe to coalesce this away. For example, vr1024 might be a GPR32 while vr1025 might be a GPR64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50385 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing between constraints with multiple options,
like "ir", test to see if we can use the 'i' constraint and
go with that if possible. This produces more optimal ASM in
all cases (sparing a register and an instruction to load it),
and fixes inline asm like this:
void test () {
asm volatile (" %c0 %1 " : : "imr" (42), "imr"(14));
}
Previously we would dump "42" into a memory location (which
is ok for the 'm' constraint) which would cause a problem
because the 'c' modifier is not valid on memory operands.
Isn't it great how inline asm turns 'missed optimization'
into 'compile failed'??
Incidentally, this was the todo in
PowerPC/2007-04-24-InlineAsm-I-Modifier.ll
Please do NOT pull this into Tak.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50315 91177308-0d34-0410-b5e6-96231b3b80d8