long long test2(unsigned A, unsigned B) {
return ((unsigned long long)A << 32) + B;
}
is equivalent to this:
long long test1(unsigned A, unsigned B) {
return ((unsigned long long)A << 32) | B;
}
Now they are both codegen'd to this on ppc:
_test2:
blr
or this on x86:
test2:
movl 4(%esp), %edx
movl 8(%esp), %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21231 91177308-0d34-0410-b5e6-96231b3b80d8
masking shifts.
This fixes the miscompilation of this:
long long test1(unsigned A, unsigned B) {
return ((unsigned long long)A << 32) | B;
}
into this:
test1:
movl 4(%esp), %edx
movl %edx, %eax
orl 8(%esp), %eax
ret
allowing us to generate this instead:
test1:
movl 4(%esp), %edx
movl 8(%esp), %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21230 91177308-0d34-0410-b5e6-96231b3b80d8
Fix libcall code to not crash or assert looking for an ADJCALLSTACKUP node
when it is known that there is no ADJCALLSTACKDOWN to match.
Expand i64 multiply when ISD::MULHU is legal for the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21214 91177308-0d34-0410-b5e6-96231b3b80d8
the result does change as a result of the extend.
This improves codegen for Alpha on this testcase:
int %a(ushort* %i) {
%tmp.1 = load ushort* %i
%tmp.2 = cast ushort %tmp.1 to int
%tmp.4 = and int %tmp.2, 1
ret int %tmp.4
}
Generating:
a:
ldgp $29, 0($27)
ldwu $0,0($16)
and $0,1,$0
ret $31,($26),1
instead of:
a:
ldgp $29, 0($27)
ldwu $0,0($16)
and $0,1,$0
addl $0,0,$0
ret $31,($26),1
btw, alpha really should switch to livein/outs for args :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21213 91177308-0d34-0410-b5e6-96231b3b80d8
the new zero extend, not the original operand. This fixes cast bool -> long
on ppc.
Add an unrelated fixme
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21196 91177308-0d34-0410-b5e6-96231b3b80d8
int a(short i) {
return i & 1;
}
as
_a:
andi. r3, r3, 1
blr
instead of:
_a:
rlwinm r2, r3, 0, 16, 31
andi. r3, r2, 1
blr
on ppc. It should also help the other risc targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21189 91177308-0d34-0410-b5e6-96231b3b80d8
is deconstructed then reconstructed here. This catches 19 fabs's in 177.mesa
9 in 168.wupwise, 5 in 171.swim, 3 in 172.mgrid, and 14 in 173.applu out of
specfp2000.
This allows the X86 code generator to make MUCH better code than before for
each of these and saves one instr on ppc.
This depends on the previous CFE patch to expose these correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21171 91177308-0d34-0410-b5e6-96231b3b80d8
(likewise for <= >=u >=u).
Second, it implements a special case hack to turn 'X gtu SINTMAX' -> 'X lt 0'
On powerpc, for example, this changes this:
lis r2, 32767
ori r2, r2, 65535
cmplw cr0, r3, r2
bgt .LBB_test_2
into:
cmpwi cr0, r3, 0
blt .LBB_test_2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21142 91177308-0d34-0410-b5e6-96231b3b80d8
dont' regen the whole dag if unneccesary. Second, fix and ugly bug with
the _PARTS nodes that caused legalize to produce multiples of them.
Finally, implement initial support for FABS and FNEG. Currently FNEG is
the only one to be trusted though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21009 91177308-0d34-0410-b5e6-96231b3b80d8
Teach the SelectionDAG code how to expand and promote it
Have PPC32 LowerCallTo generate ISD::UNDEF for int arg regs used up by fp
arguments, but not shadowing their value. This allows us to do the right
thing with both fixed and vararg floating point arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20988 91177308-0d34-0410-b5e6-96231b3b80d8
them up after the code has been emitted. This allows targets to select one
mbb as multiple mbb's as needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20937 91177308-0d34-0410-b5e6-96231b3b80d8
returned integer values all of the way to 64-bits (we only did it to 32-bits
leaving the top bits undefined). This causes problems for targets like alpha
whose ABI's define the top bits too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20926 91177308-0d34-0410-b5e6-96231b3b80d8
using Function::arg_{iterator|begin|end}. Likewise Module::g* -> Module::global_*.
This patch is contributed by Gabor Greif, thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20597 91177308-0d34-0410-b5e6-96231b3b80d8
Changing 'op' here caused us to not enter the store into a map, causing
reemission of the code!! In practice, a simple loop like this:
no_exit: ; preds = %no_exit, %entry
%indvar = phi uint [ %indvar.next, %no_exit ], [ 0, %entry ] ; <uint> [#uses=3]
%tmp.4 = getelementptr "complex long double"* %P, uint %indvar, uint 0 ; <double*> [#uses=1]
store double 0.000000e+00, double* %tmp.4
%indvar.next = add uint %indvar, 1 ; <uint> [#uses=2]
%exitcond = seteq uint %indvar.next, %N ; <bool> [#uses=1]
br bool %exitcond, label %return, label %no_exit
was being code gen'd to:
.LBBtest_1: # no_exit
movl %edx, %esi
shll $4, %esi
movl $0, 4(%eax,%esi)
movl $0, (%eax,%esi)
incl %edx
movl $0, (%eax,%esi)
movl $0, 4(%eax,%esi)
cmpl %ecx, %edx
jne .LBBtest_1 # no_exit
Note that we are doing 4 32-bit stores instead of 2. Now we generate:
.LBBtest_1: # no_exit
movl %edx, %esi
incl %esi
shll $4, %edx
movl $0, (%eax,%edx)
movl $0, 4(%eax,%edx)
cmpl %ecx, %esi
movl %esi, %edx
jne .LBBtest_1 # no_exit
This is much happier, though it would be even better if the increment of ESI
was scheduled after the compare :-/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20265 91177308-0d34-0410-b5e6-96231b3b80d8
The first half of correct chain insertion for libcalls. This is not enough
to fix Fhourstones yet though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19781 91177308-0d34-0410-b5e6-96231b3b80d8
select operations or to shifts that are by a constant. This automatically
implements (with no special code) all of the special cases for shift by 32,
shift by < 32 and shift by > 32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19679 91177308-0d34-0410-b5e6-96231b3b80d8
range. Either they are undefined (the default), they mask the shift amount
to the size of the register (X86, Alpha, etc), or they extend the shift (PPC).
This defaults to undefined, which is conservatively correct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19677 91177308-0d34-0410-b5e6-96231b3b80d8
do it. This results in better code on X86 for floats (because if strict
precision is not required, we can elide some more expensive double -> float
conversions like the old isel did), and allows other targets to emit
CopyFromRegs that are not legal for arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19668 91177308-0d34-0410-b5e6-96231b3b80d8
X86/reg-pressure.ll again, and allows us to do nice things in other cases.
For example, we now codegen this sort of thing:
int %loadload(int *%X, int* %Y) {
%Z = load int* %Y
%Y = load int* %X ;; load between %Z and store
%Q = add int %Z, 1
store int %Q, int* %Y
ret int %Y
}
Into this:
loadload:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%EAX]
mov %ECX, DWORD PTR [%ESP + 8]
inc DWORD PTR [%ECX]
ret
where we weren't able to form the 'inc [mem]' before. This also lets the
instruction selector emit loads in any order it wants to, which can be good
for register pressure as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19644 91177308-0d34-0410-b5e6-96231b3b80d8
the basic block that uses them if possible. This is a big win on X86, as it
lets us fold the argument loads into instructions and reduce register pressure
(by not loading all of the arguments in the entry block).
For this (contrived to show the optimization) testcase:
int %argtest(int %A, int %B) {
%X = sub int 12345, %A
br label %L
L:
%Y = add int %X, %B
ret int %Y
}
we used to produce:
argtest:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, 12345
sub %EAX, %ECX
mov %EDX, DWORD PTR [%ESP + 8]
.LBBargtest_1: # L
add %EAX, %EDX
ret
now we produce:
argtest:
mov %EAX, 12345
sub %EAX, DWORD PTR [%ESP + 4]
.LBBargtest_1: # L
add %EAX, DWORD PTR [%ESP + 8]
ret
This also fixes the FIXME in the code.
BTW, this occurs in real code. 164.gzip shrinks from 8623 to 8608 lines of
.s file. The stack frame in huft_build shrinks from 1644->1628 bytes,
inflate_codes shrinks from 116->108 bytes, and inflate_block from 2620->2612,
due to fewer spills.
Take that alkis. :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19639 91177308-0d34-0410-b5e6-96231b3b80d8
track of how to deal with it, and provide the target with a hook that they
can use to legalize arbitrary operations in arbitrary ways.
Implement custom lowering for a couple of ops, implement promotion for select
operations (which x86 needs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19613 91177308-0d34-0410-b5e6-96231b3b80d8