Teach constant folding to perform conversions from constant floating
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have
real-world examples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123206 91177308-0d34-0410-b5e6-96231b3b80d8
Chandler Carruth
2011-01-11 01:07:24 +00:00
parent f7b0047f5f
commit 15ed90c859
3 changed files with 89 additions and 55 deletions

@@ -2259,58 +2259,3 @@ Since we know that x+2.0 doesn't care about the sign of any zeros in X, we can
transform the fmul to 0.0, and then the fadd to 2.0.
//===---------------------------------------------------------------------===//
clang -O3 currently compiles this code:
#include <emmintrin.h>
int f(double x) { return _mm_cvtsd_si32(_mm_set_sd(x)); }
int g(double x) { return _mm_cvttsd_si32(_mm_set_sd(x)); }
into
define i32 @_Z1fd(double %x) nounwind readnone {
entry:
%vecinit.i = insertelement <2 x double> undef, double %x, i32 0
%vecinit1.i = insertelement <2 x double> %vecinit.i, double 0.000000e+00,i32 1
%0 = tail call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> %vecinit1.i) nounwind
ret i32 %0
}
define i32 @_Z1gd(double %x) nounwind readnone {
entry:
%conv.i = fptosi double %x to i32
ret i32 %conv.i
}
This difference carries over to the assembly produced, resulting in:
_Z1fd: # @_Z1fd
# BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
xorps %xmm1, %xmm1
movsd %xmm0, %xmm1
cvtsd2sil %xmm1, %eax
popq %rbp
ret
_Z1gd: # @_Z1gd
# BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
cvttsd2si %xmm0, %eax
popq %rbp
ret
The problem is that we can't see through the intrinsic call used for cvtsd2si,
and fold away the unnecessary manipulation of the function parameter. When
these functions are inlined, the call forms a barrier preventing many further
optimizations. LLVM IR doesn't have a good way to model the logic of
'cvtsd2si': its only FP -> int conversion path forces truncation. We should add
a rounding flag onto fptosi so that it can represent this type of rounding
naturally in the IR rather than using intrinsics. We might need to use a
'system_rounding_mode' flag to encode that the semantics of the rounding mode
can be changed by the program, but ideally we could just say that isn't
supported, and hard-code the rounding.
//===---------------------------------------------------------------------===//
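The rounding difference described in the entry above can be sketched in
portable C. This is only an illustration under stated assumptions: the helper
names model_cvttsd2si and model_cvtsd2si are hypothetical, the input is finite
and well inside the range of int, and the MXCSR rounding mode is left at its
default (round to nearest, ties to even). It is not the code LLVM uses for
constant folding.

```c
/* Hypothetical model of cvttsd2si (and of fptosi): truncate toward zero.
 * Assumes x is finite and fits in an int. */
static int model_cvttsd2si(double x) {
    return (int)x; /* a C cast from double to int truncates toward zero */
}

/* Hypothetical model of cvtsd2si under the default MXCSR rounding mode
 * (round to nearest, ties to even). Assumes x is finite and well inside
 * the range of int, so t + 1 and t - 1 cannot overflow. */
static int model_cvtsd2si(double x) {
    int t = (int)x;              /* integer part, truncated toward zero */
    double frac = x - (double)t; /* fractional part, same sign as x */

    /* Move away from zero when past the halfway point, or exactly at the
     * halfway point with an odd truncated value (ties go to even). */
    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}
```

For a constant argument of 3.5, model_cvtsd2si gives 4 while model_cvttsd2si
gives 3, which is why f and g above cannot both be expressed as a plain fptosi.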