This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31063 91177308-0d34-0410-b5e6-96231b3b80d8
SimplifyDemandedBits. The idea is that some operations can be simplified if
not all of the computed elements are needed. Some targets (like x86) have a
large number of intrinsics that operate on a single element, but pass other
elts through unmodified. If those other elements are not needed, the
intrinsics can be simplified to scalar operations, and insertelement ops can
be removed.
This turns (f.e.):
ushort %Convert_sse(float %f) {
%tmp = insertelement <4 x float> undef, float %f, uint 0 ; <<4 x float>> [#uses=1]
%tmp10 = insertelement <4 x float> %tmp, float 0.000000e+00, uint 1 ; <<4 x float>> [#uses=1]
%tmp11 = insertelement <4 x float> %tmp10, float 0.000000e+00, uint 2 ; <<4 x float>> [#uses=1]
%tmp12 = insertelement <4 x float> %tmp11, float 0.000000e+00, uint 3 ; <<4 x float>> [#uses=1]
%tmp28 = tail call <4 x float> %llvm.x86.sse.sub.ss( <4 x float> %tmp12, <4 x float> < float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp37 = tail call <4 x float> %llvm.x86.sse.mul.ss( <4 x float> %tmp28, <4 x float> < float 5.000000e-01, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp37, <4 x float> < float 6.553500e+04, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> zeroinitializer ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
into:
ushort %Convert_sse(float %f) {
entry:
%tmp28 = sub float %f, 1.000000e+00 ; <float> [#uses=1]
%tmp37 = mul float %tmp28, 5.000000e-01 ; <float> [#uses=1]
%tmp375 = insertelement <4 x float> undef, float %tmp37, uint 0 ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp375, <4 x float> < float 6.553500e+04, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> < float 0.000000e+00, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
which improves codegen from:
_Convert_sse:
movss LCPI1_0, %xmm0
movss 4(%esp), %xmm1
subss %xmm0, %xmm1
movss LCPI1_1, %xmm0
mulss %xmm0, %xmm1
movss LCPI1_2, %xmm0
minss %xmm0, %xmm1
xorps %xmm0, %xmm0
maxss %xmm0, %xmm1
cvttss2si %xmm1, %eax
andl $65535, %eax
ret
to:
_Convert_sse:
movss 4(%esp), %xmm0
subss LCPI1_0, %xmm0
mulss LCPI1_1, %xmm0
movss LCPI1_2, %xmm1
minss %xmm1, %xmm0
xorps %xmm1, %xmm1
maxss %xmm1, %xmm0
cvttss2si %xmm0, %eax
andl $65535, %eax
ret
This is just a first step, it can be extended in many ways. Testcase here:
Transforms/InstCombine/vec_demanded_elts.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30752 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the Function pointer cast in these calls, converting it to
a cast of argument.
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( int 10 )
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( uint %tmp51 )
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28953 91177308-0d34-0410-b5e6-96231b3b80d8
the program. This exposes more opportunities for the instcombiner, and implements
vec_shuffle.ll:test6
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28487 91177308-0d34-0410-b5e6-96231b3b80d8
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expr like we would do if the constant is folded in the
normal loop.
This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28224 91177308-0d34-0410-b5e6-96231b3b80d8
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
blocks (due to constants in conditional branches/switches) to the worklist.
This causes them to be deleted before instcombine starts up, leading to
better optimization.
2. In the prepass over instructions, do trivial constprop/dce as we go. This
has the effect of improving the effectiveness of #1. In addition, it
*significantly* speeds up instcombine on test cases with large amounts of
constant folding code (for example, that produced by code specialization
or partial evaluation). In one example, it speeds up instcombine from
0.0589s to 0.0224s with a release build (a 2.6x speedup).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28215 91177308-0d34-0410-b5e6-96231b3b80d8