llvm-6502/sse2-mul.ll at a67352d4012b9db3c0c06cf962c0e561054753c3

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-12-16 11:30:51 +00:00

Benjamin Kramer 2f8a6cdfa3 X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.

pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170985 91177308-0d34-0410-b5e6-96231b3b80d8

2012-12-22 16:07:56 +00:00

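; RUN: llc < %s -march=x86-64 -mcpu=core2 | FileCheck %s

; core2 has no SSE4.1, so pmulld is unavailable and the <4 x i32> mul
; should be lowered to two pmuludq instructions plus shuffles.
define <4 x i32> @test1(<4 x i32> %x, <4 x i32> %y) {
  %m = mul <4 x i32> %x, %y
  ret <4 x i32> %m
; CHECK: test1:
; pshufd $49 copies lanes 1 and 3 into lanes 0 and 2, the lanes pmuludq reads.
; CHECK: pshufd $49
; CHECK: pmuludq
; CHECK: pshufd $49
; CHECK: pmuludq
; shufps $-120 takes lanes 0 and 2 of each operand; pshufd $-40 swaps lanes 1 and 2.
; CHECK: shufps $-120
; CHECK: pshufd $-40
; CHECK: ret
}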
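
The instruction sequence the test checks for can be modelled in portable scalar C to see why it yields a lane-wise 32-bit multiply. This is only a sketch under assumptions: the `v4i32` struct and the helper names `shuffle`, `pmuludq`, `mul_v4i32` and `mul_v4i32_self_check` are hypothetical, the immediates are taken from the CHECK lines (49 = 0x31, -120 = 0x88, -40 = 0xD8), and the choice of which value feeds which instruction is one plausible reading of the test rather than the exact registers or instruction order llc emits.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes, lane 0 first: a scalar stand-in for an XMM register. */
typedef struct { uint32_t lane[4]; } v4i32;

/* pshufd/shufps-style selector: two immediate bits per destination lane.
   shufps takes destination lanes 0-1 from a and lanes 2-3 from b;
   pshufd reads all four lanes from one source, so pass it as both a and b. */
v4i32 shuffle(v4i32 a, v4i32 b, uint8_t imm) {
    v4i32 r;
    r.lane[0] = a.lane[imm & 3];
    r.lane[1] = a.lane[(imm >> 2) & 3];
    r.lane[2] = b.lane[(imm >> 4) & 3];
    r.lane[3] = b.lane[(imm >> 6) & 3];
    return r;
}

/* pmuludq: multiply lanes 0 and 2 as unsigned 32-bit values and store the
   two 64-bit products in lane pairs (0,1) and (2,3). */
v4i32 pmuludq(v4i32 a, v4i32 b) {
    uint64_t p0 = (uint64_t)a.lane[0] * b.lane[0];
    uint64_t p1 = (uint64_t)a.lane[2] * b.lane[2];
    v4i32 r = {{ (uint32_t)p0, (uint32_t)(p0 >> 32),
                 (uint32_t)p1, (uint32_t)(p1 >> 32) }};
    return r;
}

/* Lane-wise 32-bit multiply built only from the operations above. */
v4i32 mul_v4i32(v4i32 x, v4i32 y) {
    v4i32 x_odd = shuffle(x, x, 0x31);      /* pshufd $49: [x1, x0, x3, x0] */
    v4i32 y_odd = shuffle(y, y, 0x31);      /* pshufd $49: [y1, y0, y3, y0] */
    v4i32 even  = pmuludq(x, y);            /* 64-bit x0*y0 and x2*y2 */
    v4i32 odd   = pmuludq(x_odd, y_odd);    /* 64-bit x1*y1 and x3*y3 */
    /* shufps $-120: low halves only, ordered [x0*y0, x2*y2, x1*y1, x3*y3] */
    v4i32 mixed = shuffle(even, odd, 0x88);
    /* pshufd $-40: swap the middle lanes back into order 0, 1, 2, 3 */
    return shuffle(mixed, mixed, 0xD8);
}

/* Compare the emulated sequence against a plain per-lane multiply. */
void mul_v4i32_self_check(void) {
    v4i32 x = {{ 1u, 2u, 0xFFFFFFFFu, 0x80000001u }};
    v4i32 y = {{ 3u, 0xFFFFFFFFu, 0xFFFFFFFFu, 7u }};
    v4i32 r = mul_v4i32(x, y);
    for (int i = 0; i < 4; ++i)
        assert(r.lane[i] == (uint32_t)(x.lane[i] * y.lane[i]));
}
```

Only the low 32 bits of each 64-bit product are kept, so the result is the same whether the inputs are treated as signed or unsigned. That is why the unsigned pmuludq can implement the IR `mul`.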