llvm-6502/test/CodeGen/X86/vec_zero_cse.ll

36 lines
960 B
LLVM
Raw Normal View History

; RUN: llc < %s -relocation-model=static -march=x86 -mcpu=yonah | grep pxor | count 1
; RUN: llc < %s -relocation-model=static -march=x86 -mcpu=yonah | grep xorps | count 1
; RUN: llc < %s -relocation-model=static -march=x86 -mcpu=yonah | grep pcmpeqd | count 2
Fix a long standing deficiency in the X86 backend: we would sometimes emit "zero" and "all one" vectors multiple times, for example: _test2: pcmpeqd %mm0, %mm0 movq %mm0, _M1 pcmpeqd %mm0, %mm0 movq %mm0, _M2 ret instead of: _test2: pcmpeqd %mm0, %mm0 movq %mm0, _M1 movq %mm0, _M2 ret This patch fixes this by always arranging for zero/one vectors to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be any random type. This ensures they get trivially CSE'd on the dag. This fix is also important for LegalizeDAGTypes, as it gets unhappy when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when 'i64' isn't legal. This patch makes the following changes: 1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into their canonical types. 2) The now-dead patterns are removed from the SSE/MMX .td files. 3) All the patterns in the .td file that referred to immAllOnesV or immAllZerosV in the wrong form now use *_bc to match them with a bitcast wrapped around them. 4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle bitcast'd zero vectors, which simplifies the code actually. 5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that is legal, instead of generating one that is illegal and expecting a later legalize pass to clean it up. 6) isZeroShuffle is generalized to handle bitcast of zeros. 7) several other minor tweaks. This patch is definite goodness, but has the potential to cause random code quality regressions. Please be on the lookout for these and let me know if they happen. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44310 91177308-0d34-0410-b5e6-96231b3b80d8
2007-11-25 00:24:49 +00:00
@M1 = external global <1 x i64>
@M2 = external global <2 x i32>
@S1 = external global <2 x i64>
@S2 = external global <4 x i32>
define void @test() {
store <1 x i64> zeroinitializer, <1 x i64>* @M1
store <2 x i32> zeroinitializer, <2 x i32>* @M2
ret void
}
define void @test2() {
store <1 x i64> < i64 -1 >, <1 x i64>* @M1
store <2 x i32> < i32 -1, i32 -1 >, <2 x i32>* @M2
ret void
}
define void @test3() {
store <2 x i64> zeroinitializer, <2 x i64>* @S1
store <4 x i32> zeroinitializer, <4 x i32>* @S2
ret void
}
define void @test4() {
store <2 x i64> < i64 -1, i64 -1>, <2 x i64>* @S1
store <4 x i32> < i32 -1, i32 -1, i32 -1, i32 -1 >, <4 x i32>* @S2
ret void
}