as the shift amount operand to a shift instruction. This was causing us to
emit unnecessary clear operations for code such as:
int foo(int x) { return 1 << x; }
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17175 91177308-0d34-0410-b5e6-96231b3b80d8
including registers, constants, and partial support for global addresses
* The JIT is disabled by default to allow building llvm-gcc, which wants to test
running programs during configure
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17149 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of unconditionally copying all phi node values into temporaries for
all successor blocks, generate code that will determine what successor
block will be called and then copy only those phi node values needed by
the successor block.
This seems to cut down namd execution time from being 8% higher than GCC to
4% higher than GCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17144 91177308-0d34-0410-b5e6-96231b3b80d8
- Support added for functions, basic blocks, constant pool, constants,
registers, and some basic support for globals, all untested
* Turn assert()s into abort()s so that unimplemented functions fail in release
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17143 91177308-0d34-0410-b5e6-96231b3b80d8
loops. This optimization is not turned on by default yet, but may be run
with the opt tool's -loop-reduce flag. There are many FIXMEs listed in the
code that will make it far more applicable to a wide range of code, but you
have to start somewhere :)
This limited version currently triggers on the following tests in the
MultiSource directory:
pcompress2: 7 times
cfrac: 5 times
anagram: 2 times
ks: 6 times
yacr2: 2 times
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17134 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify code by simplifying terminators that branch to blocks that start
with an unreachable instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17116 91177308-0d34-0410-b5e6-96231b3b80d8
change hacks off 10K of bytecode from perlbmk (.5%) even though the front-end
is not generating them yet and we are not optimizing the resultant code.
This isn't too bad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17111 91177308-0d34-0410-b5e6-96231b3b80d8
particular, invoke ret values are only live in the normal dest of the invoke
not in the unwind dest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17108 91177308-0d34-0410-b5e6-96231b3b80d8
exercise that I'm not interested in tackling right now. Just punt and treat them
like unwind's.
This 'fixes' test/Regression/Transforms/ADCE/unreachable-function.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17106 91177308-0d34-0410-b5e6-96231b3b80d8
If a function had no return instruction in it, and the result of the inlined
call instruction was used, we would crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17104 91177308-0d34-0410-b5e6-96231b3b80d8
unneccesary. This allows us to delete several hundred phi nodes of the
form PHI(x,x,x,undef) from 253.perlbmk and probably other programs as well.
This implements Mem2Reg/UndefValuesMerge.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17098 91177308-0d34-0410-b5e6-96231b3b80d8
0->field, which is illegal. Now we print ((foo*)0)->field.
The second hunk is an optimization to not print undefined phi values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17094 91177308-0d34-0410-b5e6-96231b3b80d8
double %test(uint %X) {
%tmp.1 = cast uint %X to double ; <double> [#uses=1]
ret double %tmp.1
}
into:
test:
sub %ESP, 8
mov %EAX, DWORD PTR [%ESP + 12]
mov %ECX, 0
mov DWORD PTR [%ESP], %EAX
mov DWORD PTR [%ESP + 4], %ECX
fild QWORD PTR [%ESP]
add %ESP, 8
ret
... which basically zero extends to 8 bytes, then does an fild for an
8-byte signed int.
Now we generate this:
test:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
shr %EAX, 31
fadd DWORD PTR [.CPItest_0 + 4*%EAX]
add %ESP, 4
ret
.section .rodata
.align 4
.CPItest_0:
.quad 5728578726015270912
This does a 32-bit signed integer load, then adds in an offset if the sign
bit of the integer was set.
It turns out that this is substantially faster than the preceeding sequence.
Consider this testcase:
unsigned a[2]={1,2};
volatile double G;
void main() {
int i;
for (i=0; i<100000000; ++i )
G += a[i&1];
}
On zion (a P4 Xeon, 3Ghz), this patch speeds up the testcase from 2.140s
to 0.94s.
On apoc, an athlon MP 2100+, this patch speeds up the testcase from 1.72s
to 1.34s.
Note that the program takes 2.5s/1.97s on zion/apoc with GCC 3.3 -O3
-fomit-frame-pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17083 91177308-0d34-0410-b5e6-96231b3b80d8
%X = and Y, constantint
%Z = setcc %X, 0
instead of emitting:
and %EAX, 3
test %EAX, %EAX
je .LBBfoo2_2 # UnifiedReturnBlock
We now emit:
test %EAX, 3
je .LBBfoo2_2 # UnifiedReturnBlock
This triggers 581 times on 176.gcc for example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17080 91177308-0d34-0410-b5e6-96231b3b80d8
1. optional shift left
2. and x, immX
3. and y, immY
4. or z, x, y
==> rlwimi z, x, y, shift, mask begin, mask end
where immX == ~immY and immX is a run of set bits. This transformation
fires 32 times on voronoi, once on espresso, and probably several
dozen times on external benchmarks such as gcc.
To put this in terms of actual code generated for
struct B { unsigned a : 3; unsigned b : 2; };
void storeA (struct B *b, int v) { b->a = v;}
void storeB (struct B *b, int v) { b->b = v;}
Old:
_storeA:
rlwinm r2, r4, 0, 29, 31
lwz r4, 0(r3)
rlwinm r4, r4, 0, 0, 28
or r2, r4, r2
stw r2, 0(r3)
blr
_storeB:
rlwinm r2, r4, 3, 0, 28
rlwinm r2, r2, 0, 27, 28
lwz r4, 0(r3)
rlwinm r4, r4, 0, 29, 26
or r2, r2, r4
stw r2, 0(r3)
blr
New:
_storeA:
lwz r2, 0(r3)
rlwimi r2, r4, 0, 29, 31
stw r2, 0(r3)
blr
_storeB:
lwz r2, 0(r3)
rlwimi r2, r4, 3, 27, 28
stw r2, 0(r3)
blr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17078 91177308-0d34-0410-b5e6-96231b3b80d8
flag rotate left word immediate then mask insert (rlwimi) as a two-address
instruction, and update the ISel usage of the instruction accordingly.
This will allow us to properly schedule rlwimi, and use it to efficiently
codegen bitfield operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17068 91177308-0d34-0410-b5e6-96231b3b80d8
that the vtables for these classes are only instantiated in this translation
unit, not in every xlation unit they are used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17026 91177308-0d34-0410-b5e6-96231b3b80d8
case:
int C[100];
int foo() {
return C[4];
}
We now codegen:
foo:
mov %EAX, DWORD PTR [C + 16]
ret
instead of:
foo:
mov %EAX, OFFSET C
mov %EAX, DWORD PTR [%EAX + 16]
ret
Other impressive features may be coming later.
This patch is contributed by Jeff Cohen!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17011 91177308-0d34-0410-b5e6-96231b3b80d8
useful when you have a reference like:
int A[100];
void foo() { A[10] = 1; }
In this case, &A[10] is a single constant and should be treated as such.
Only MO_GlobalAddress and MO_ExternalSymbol are allowed to use this field, no
other operand type is.
This is another fine patch contributed by Jeff Cohen!!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17007 91177308-0d34-0410-b5e6-96231b3b80d8
The problem occurred when trying to reload this instruction:
MOV32mr %reg2326, 8, %reg2297, 4, %reg2295
The value of reg2326 was available in EBX, so it was reused from there, instead
of reloading it into EDX.
The value of reg2297 was available in EDX, so it was reused from there, instead
of reloading it into EDI.
The value of reg2295 was not available, so we tried reloading it into EBX, its
assigned register. However, we checked and saw that we already reloaded
something into EBX, so we chose what reg2326 was assigned to (EDX) and reloaded
into that register instead.
Unfortunately EDX had already been used by reg2297, so reloading into EDX
clobbered the value used by the reg2326 operand, breaking the program.
The fix for this is to check that the newly picked register is ok. In this
case we now find that EDX is already used and try using EDI, which succeeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17006 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation fires a few dozen times across the testsuite.
For example, int test2(int X) { return X ^ 0x0FF00FF0; }
Old:
_test2:
lis r2, 4080
ori r2, r2, 4080
xor r3, r3, r2
blr
New:
_test2:
xoris r3, r3, 4080
xori r3, r3, 4080
blr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17004 91177308-0d34-0410-b5e6-96231b3b80d8
addPassesToEmitMachineCode()
* Add support for registers and constants in getMachineOpValue()
This enables running "int main() { ret 0 }" via the PowerPC JIT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16983 91177308-0d34-0410-b5e6-96231b3b80d8
and 64-bit code emitters that cannot share code unless we use virtual
functions
* Identify components being built by tablegen with more detail by assigning them
to PowerPC, PPC32, or PPC64 more specifically; also avoids seeing 'building
PowerPC XYZ' messages twice, where one is for PPC32 and one for PPC64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16980 91177308-0d34-0410-b5e6-96231b3b80d8
nodes unless we KNOW that we are able to promote all of them.
This fixes: test/Regression/Transforms/SimplifyCFG/PhiNoEliminate.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16973 91177308-0d34-0410-b5e6-96231b3b80d8
to go in. This patch allows us to compute the trip count of loops controlled
by values loaded from constant arrays. The cannonnical example of this is
strlen when passed a constant argument:
for (int i = 0; "constantstring"[i]; ++i) ;
return i;
In this case, it will compute that the loop executes 14 times, which means
that the exit value of i is 14. Because of this, the loop gets DCE'd and
we are happy. This also applies to anything that does similar things, e.g.
loops like this:
const float Array[] = { 0.1, 2.1, 3.2, 23.21 };
for (int i = 0; Array[i] < 20; ++i)
and is actually fairly general.
The problem with this is that it almost never triggers. The reason is that
we run indvars and the loop optimizer only at compile time, which is before
things like strlen and strcpy have been inlined into the program from libc.
Because of this, it almost never is used (it triggers twice in specint2k).
I'm committing it because it DOES work, may be useful in the future, and
doesn't slow us down at all. If/when we start running the loop optimizer
at link-time (-O4?) this will be very nice indeed :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16926 91177308-0d34-0410-b5e6-96231b3b80d8
pointer recurrences into expressions from this:
%P_addr.0.i.0 = phi sbyte* [ getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), %entry ], [ %inc.0.i, %no_exit.i ]
%inc.0.i = getelementptr sbyte* %P_addr.0.i.0, int 1 ; <sbyte*> [#uses=2]
into this:
%inc.0.i = getelementptr sbyte* getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), int %inc.0.i.rec
Actually create something nice, like this:
%inc.0.i = getelementptr [8 x sbyte]* %.str_1, int 0, int %inc.0.i.rec
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16924 91177308-0d34-0410-b5e6-96231b3b80d8
well as a vector of constant*'s. It turns out that this is more efficient
and all of the clients want to do that, so we should cater to them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16923 91177308-0d34-0410-b5e6-96231b3b80d8
First, it allows SRA of globals that have embedded arrays, implementing
GlobalOpt/globalsra-partial.llx. This comes up infrequently, but does allow,
for example, deleting several stores to dead parts of globals in dhrystone.
Second, this implements GlobalOpt/malloc-promote-*.llx, which is the
following nifty transformation:
Basically if a global pointer is initialized with malloc, and we can tell
that the program won't notice, we transform this:
struct foo *FooPtr;
...
FooPtr = malloc(sizeof(struct foo));
...
FooPtr->A FooPtr->B
Into:
struct foo FooPtrBody;
...
FooPtrBody.A FooPtrBody.B
This comes up occasionally, for example, the 'disp' global in 183.equake (where
the xform speeds the CBE version of the program up from 56.16s to 52.40s (7%)
on apoc), and the 'desired_accept', 'fixLRBT', 'macroArray', & 'key_queue'
globals in 300.twolf (speeding it up from 22.29s to 21.55s (3.4%)).
The nice thing about this xform is that it exposes the resulting global to
global variable optimization and makes alias analysis easier in addition to
eliminating a few loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16916 91177308-0d34-0410-b5e6-96231b3b80d8
first element of an array, return a GEP instead of a cast. This allows us
to transparently fold this:
int* getelementptr (int* cast ([100 x int]* %Gbody to int*), int 40)
into this:
int* getelementptr ([100 x int]* %Gbody, int 0, int 40)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16911 91177308-0d34-0410-b5e6-96231b3b80d8
still optimize away all of the indirect calls and loads, etc from it.
This turns code like this:
if (G != 0)
G();
into
if (G != 0)
ActualCallee();
This triggers a couple of times in gcc and libstdc++.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16901 91177308-0d34-0410-b5e6-96231b3b80d8