llvm-6502/test/CodeGen/X86/widen_conversions.ll

; RUN: llc < %s -mcpu=x86-64 -x86-experimental-vector-widening-legalization -x86-experimental-vector-shuffle-lowering | FileCheck %s

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-unknown"

define <4 x i32> @zext_v4i8_to_v4i32(<4 x i8>* %ptr) {
; CHECK-LABEL: zext_v4i8_to_v4i32:
; 
; CHECK:      movd (%{{.*}}), %[[X:xmm[0-9]+]]
; CHECK-NEXT: pxor %[[Z:xmm[0-9]+]], %[[Z]]
; CHECK-NEXT: punpcklbw %[[Z]], %[[X]]
; CHECK-NEXT: punpcklbw %[[Z]], %[[X]]
; CHECK-NEXT: ret

  %val = load <4 x i8>* %ptr
  %ext = zext <4 x i8> %val to <4 x i32>
  ret <4 x i32> %ext
}
[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector.

It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic.

There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors.

Differential Revision: http://reviews.llvm.org/D4405
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:58:18 +00:00
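The "blend in zeros, then bitcast" expansion described in the commit message can be modelled at the IR level. This is only a rough sketch: the real expansion operates on SelectionDAG nodes rather than IR, and the function name `@zext_inreg_sketch` and the `<16 x i8>` input type are assumptions chosen for illustration. It also relies on a little-endian target such as x86, where the first byte of each 4-byte group is the low byte of the resulting `i32` lane.

```llvm
; Hypothetical IR-level model of zero-extending the low 4 bytes of a
; <16 x i8> register "in register" to <4 x i32>.
define <4 x i32> @zext_inreg_sketch(<16 x i8> %in) {
  ; Mask indices 0-15 select from %in; index 16 selects element 0 of the
  ; all-zero second operand. Each low input byte is followed by three
  ; zero bytes, and input bytes 4-15 are dropped.
  %blend = shufflevector <16 x i8> %in, <16 x i8> zeroinitializer,
      <16 x i32> <i32 0, i32 16, i32 16, i32 16,
                  i32 1, i32 16, i32 16, i32 16,
                  i32 2, i32 16, i32 16, i32 16,
                  i32 3, i32 16, i32 16, i32 16>
  ; Reinterpret the 16 bytes as four 32-bit lanes. On a little-endian
  ; target, lane i holds the zero-extended value of input byte i.
  %out = bitcast <16 x i8> %blend to <4 x i32>
  ret <4 x i32> %out
}
```

The two `punpcklbw` instructions in the CHECK lines produce the same byte layout: each one interleaves the low bytes of the value register with bytes from the zeroed register.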
