SimplifyDemandedBits. The idea is that some operations can be simplified if
not all of the computed elements are needed. Some targets (like x86) have a
large number of intrinsics that operate on a single element, but pass other
elts through unmodified. If those other elements are not needed, the
intrinsics can be simplified to scalar operations, and insertelement ops can
be removed.
This turns (f.e.):
ushort %Convert_sse(float %f) {
%tmp = insertelement <4 x float> undef, float %f, uint 0 ; <<4 x float>> [#uses=1]
%tmp10 = insertelement <4 x float> %tmp, float 0.000000e+00, uint 1 ; <<4 x float>> [#uses=1]
%tmp11 = insertelement <4 x float> %tmp10, float 0.000000e+00, uint 2 ; <<4 x float>> [#uses=1]
%tmp12 = insertelement <4 x float> %tmp11, float 0.000000e+00, uint 3 ; <<4 x float>> [#uses=1]
%tmp28 = tail call <4 x float> %llvm.x86.sse.sub.ss( <4 x float> %tmp12, <4 x float> < float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp37 = tail call <4 x float> %llvm.x86.sse.mul.ss( <4 x float> %tmp28, <4 x float> < float 5.000000e-01, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp37, <4 x float> < float 6.553500e+04, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> zeroinitializer ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
into:
ushort %Convert_sse(float %f) {
entry:
%tmp28 = sub float %f, 1.000000e+00 ; <float> [#uses=1]
%tmp37 = mul float %tmp28, 5.000000e-01 ; <float> [#uses=1]
%tmp375 = insertelement <4 x float> undef, float %tmp37, uint 0 ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp375, <4 x float> < float 6.553500e+04, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> < float 0.000000e+00, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
which improves codegen from:
_Convert_sse:
movss LCPI1_0, %xmm0
movss 4(%esp), %xmm1
subss %xmm0, %xmm1
movss LCPI1_1, %xmm0
mulss %xmm0, %xmm1
movss LCPI1_2, %xmm0
minss %xmm0, %xmm1
xorps %xmm0, %xmm0
maxss %xmm0, %xmm1
cvttss2si %xmm1, %eax
andl $65535, %eax
ret
to:
_Convert_sse:
movss 4(%esp), %xmm0
subss LCPI1_0, %xmm0
mulss LCPI1_1, %xmm0
movss LCPI1_2, %xmm1
minss %xmm1, %xmm0
xorps %xmm1, %xmm1
maxss %xmm1, %xmm0
cvttss2si %xmm0, %eax
andl $65535, %eax
ret
This is just a first step, it can be extended in many ways. Testcase here:
Transforms/InstCombine/vec_demanded_elts.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30752 91177308-0d34-0410-b5e6-96231b3b80d8
Ensure that we copy KnownProperties before calling visitBasicBlock, else
we may leak properties into blocks where they don't belong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30705 91177308-0d34-0410-b5e6-96231b3b80d8
The critical edge block dominates the dest block if the destblock dominates
all edges other than the one incoming from the critical edge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30696 91177308-0d34-0410-b5e6-96231b3b80d8
or when splitting loops with a common header into multiple loops. In particular
the old code would always insert the preheader before the old loop header. This
is disasterous in cases where the loop hasn't been rotated. For example, it can
produce code like:
.. outside the loop...
jmp LBB1_2 #bb13.outer
LBB1_1: #bb1
movsd 8(%esp,%esi,8), %xmm1
mulsd (%edi), %xmm1
addsd %xmm0, %xmm1
addl $24, %edi
incl %esi
jmp LBB1_3 #bb13
LBB1_2: #bb13.outer
leal (%edx,%eax,8), %edi
pxor %xmm1, %xmm1
xorl %esi, %esi
LBB1_3: #bb13
movapd %xmm1, %xmm0
cmpl $4, %esi
jl LBB1_1 #bb1
Note that the loop body is actually LBB1_1 + LBB1_3, which means that the
loop now contains an uncond branch WITHIN it to jump around the inserted
loop header (LBB1_2). Doh.
This patch changes the preheader insertion code to insert it in the right
spot, producing this code:
... outside the loop, fall into the header ...
LBB1_1: #bb13.outer
leal (%edx,%eax,8), %esi
pxor %xmm0, %xmm0
xorl %edi, %edi
jmp LBB1_3 #bb13
LBB1_2: #bb1
movsd 8(%esp,%edi,8), %xmm0
mulsd (%esi), %xmm0
addsd %xmm1, %xmm0
addl $24, %esi
incl %edi
LBB1_3: #bb13
movapd %xmm0, %xmm1
cmpl $4, %edi
jl LBB1_2 #bb1
Totally crazy, no branch in the loop! :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30587 91177308-0d34-0410-b5e6-96231b3b80d8
reachable, making it general purpose enough for use by InsertPreheaderForLoop.
Eliminate custom dominfo updating code in InsertPreheaderForLoop, using
UpdateDomInfoForRevectoredPreds instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30586 91177308-0d34-0410-b5e6-96231b3b80d8
that we can't modify the CFG any more, at least not until it's possible
to update the dominator tree (PR217).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30469 91177308-0d34-0410-b5e6-96231b3b80d8
DLL* linkages got full (I hope) codegeneration support in C & both x86
assembler backends.
External weak linkage added for future use, we don't provide any
codegeneration, etc. support for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30374 91177308-0d34-0410-b5e6-96231b3b80d8
operations (like findProperties) should be faster, at the expense of
unionSets being slower in cases that are rare in practise.
Don't erase a dead Instruction. This fixes a memory corruption issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30235 91177308-0d34-0410-b5e6-96231b3b80d8
Call these from your backend to enjoy setjmp/longjmp goodness, see
lib/Target/IA64/IA64ISelLowering.cpp for an example
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30095 91177308-0d34-0410-b5e6-96231b3b80d8
Reorder operations to remove duplicated work.
Fix to leave floating-point types out of the optimization.
Add tests to predsimplify.ll for SwitchInst and SelectInst handling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30055 91177308-0d34-0410-b5e6-96231b3b80d8
another Value) weren't being found by findProperties.
This fixes predsimplify.ll test6, a missed optimization opportunity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29991 91177308-0d34-0410-b5e6-96231b3b80d8
If a branch's condition has become a ConstantBool, simplify it immediately.
Removing the edge saves work and exposes up more optimization opportunities
in the pass.
Add support for SelectInst.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29970 91177308-0d34-0410-b5e6-96231b3b80d8
exit blocks. The output is dependent on addresses of basic block.
Add and use Loop::getUniqueExitBlocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29966 91177308-0d34-0410-b5e6-96231b3b80d8
Not only will this take huge amounts of compile time, the resultant loop nests
won't be useful for optimization. This reduces loopsimplify time on
Transforms/LoopSimplify/2006-08-11-LoopSimplifyLongTime.ll from ~32s to ~0.4s
with a debug build of llvm on a 2.7Ghz G5.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29647 91177308-0d34-0410-b5e6-96231b3b80d8
blocks that target loop blocks.
Before, the code was run once per loop, and depended on the number of
predecessors each block in the loop had. Unfortunately, scanning preds can
be really slow when huge numbers of phis exist or when phis with huge numbers
of inputs exist.
Now, the code is run once per function and scans successors instead of preds,
which is far faster. In addition, the new code is simpler and is goto free,
woo.
This change speeds up a nasty testcase Duraid provided me from taking hours to
taking ~72s with a debug build. The functionality this implements is already
tested in the testsuite as Transforms/CodeExtractor/2004-03-13-LoopExtractorCrash.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29644 91177308-0d34-0410-b5e6-96231b3b80d8
SlowOperatingInfo, Statistics). Besides providing an example of how to
use these facilities, it also serves to debug problems with runtime linking
when dlopening a loadable module. These three support facilities exercise
different combinations of Text/Weak Weak/Text and Text/Text linking
between the executable and the module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29552 91177308-0d34-0410-b5e6-96231b3b80d8
1. Change the usage of LOADABLE_MODULE so that it implies all the things
necessary to make a loadable module. This reduces the user's burdern to
get a loadable module correctly built.
2. Document the usage of LOADABLE_MODULE in the MakefileGuide
3. Adjust the makefile for lib/Transforms/Hello to use the new specification
for building loadable modules
4. Adjust the sample project to not attempt to build a shared library for
its little library. This was just wasteful and not instructive at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29551 91177308-0d34-0410-b5e6-96231b3b80d8
1. Update an obsolete comment.
2. Make the sorting by base an explicit (though still N^2) step, so
that the code is more clear on what it is doing.
3. Partition uses so that uses inside the loop are handled before uses
outside the loop.
Note that none of these changes currently changes the code inserted by LSR,
but they are a stepping stone to getting there.
This code is the result of some crazy pair programming with Nate. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29493 91177308-0d34-0410-b5e6-96231b3b80d8
down approach, inspired by discussions with Tanya.
This approach is significantly faster, because it does not need dominator
frontiers and it does not insert extraneous unused PHI nodes. For example, on
252.eon, in a release-asserts build, this speeds up LCSSA (which is the slowest
pass in gccas) from 9.14s to 0.74s on my G5. This code is also slightly smaller
and significantly simpler than the old code.
Amusingly, in a normal Release build (which includes the
"assert(L->isLCSSAForm());" assertion), asserting that the result of LCSSA
is in LCSSA form is actually slower than the LCSSA transformation pass
itself on 252.eon. I will see if Loop::isLCSSAForm can be sped up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29463 91177308-0d34-0410-b5e6-96231b3b80d8
Handle this case, which doesn't require a new callgraph edge. This fixes
a crash compiling MallocBench/gs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29121 91177308-0d34-0410-b5e6-96231b3b80d8