llvm-6502/test/Transforms/LoopVectorize/runtime-check.ll

; RUN: opt < %s  -loop-vectorize -force-vector-unroll=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.9.0"

; Make sure we vectorize this loop:
; int foo(float *a, float *b, int n) {
;   for (int i=0; i<n; ++i)
;     a[i] = b[i] * 3;
; }

;CHECK: for.body.preheader:
;CHECK: br i1 %cmp.zero, label %middle.block, label %vector.memcheck
;CHECK: vector.memcheck:
;CHECK: br i1 %found.conflict, label %middle.block, label %vector.ph
;CHECK: load <4 x float>
define i32 @foo(float* nocapture %a, float* nocapture %b, i32 %n) nounwind uwtable ssp {
entry:
  %cmp6 = icmp sgt i32 %n, 0
  br i1 %cmp6, label %for.body, label %for.end

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
  %arrayidx = getelementptr inbounds float* %b, i64 %indvars.iv
  %0 = load float* %arrayidx, align 4
  %mul = fmul float %0, 3.000000e+00
  %arrayidx2 = getelementptr inbounds float* %a, i64 %indvars.iv
  store float %mul, float* %arrayidx2, align 4
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %n
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret i32 undef
}
Revision history (from the blame view, oldest first):

2012-11-09 07:09:44 +00:00, trunk@167608 (all lines not attributed to a later revision below: the target lines, the C source comment, `;CHECK: load <4 x float>`, and most of `@foo`)
Add support for memory runtime check. When we can, we calculate array bounds. If the arrays are found to be disjoint then we run the vectorized version of the loop. If they are not, we run the scalar code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167608 91177308-0d34-0410-b5e6-96231b3b80d8

2013-01-09 01:20:59 +00:00, trunk@171930 (the `; RUN:` line)
Remove the -licm pass from the loop vectorizer test because the loop vectorizer does it now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171930 91177308-0d34-0410-b5e6-96231b3b80d8

2013-01-19 13:57:58 +00:00, trunk@172902 (the `;CHECK:` lines from `for.body.preheader:` through the `%found.conflict` branch)
LoopVectorizer: Emit memory checks into their own basic block. This separates the check for "too few elements to run the vector loop" from the "memory overlap" check, giving a lot nicer code and allowing to skip the memory checks when we're not going to execute the vector code anyways. We still leave the decision of whether to emit the memory checks as branches or setccs, but it seems to be doing a good job. If ugly code pops up we may want to emit them as separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso. Most of this is legwork to allow multiple bypass blocks while updating PHIs, dominators and loop info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172902 91177308-0d34-0410-b5e6-96231b3b80d8

2013-04-30 17:52:57 +00:00, trunk@180796 (the `load float*` and `store float` lines in `for.body`)
TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180796 91177308-0d34-0410-b5e6-96231b3b80d8
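
The control flow that the `;CHECK:` lines look for (a trip-count test, then a separate memory-overlap test, then the 4-wide loop) can be sketched at the source level. This is only an illustration written for this note, not the code the vectorizer emits: `ranges_conflict`, `scalar_loop`, and `foo_checked` are hypothetical names, the return value (1 when the wide path ran) exists purely so the example can be exercised, and the inner 4-element loops stand in for `<4 x float>` operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if [a, a+n) and [b, b+n) share any bytes.
 * Plays the role of %found.conflict in the test. */
static int ranges_conflict(const float *a, const float *b, size_t n) {
    uintptr_t a_lo = (uintptr_t)a, a_hi = (uintptr_t)(a + n);
    uintptr_t b_lo = (uintptr_t)b, b_hi = (uintptr_t)(b + n);
    return a_lo < b_hi && b_lo < a_hi;
}

/* The original loop, run for indices [start, n). */
static void scalar_loop(float *a, const float *b, size_t start, size_t n) {
    for (size_t i = start; i < n; ++i)
        a[i] = b[i] * 3.0f;
}

/* Returns 1 if the 4-wide path was taken, 0 if only scalar code ran. */
static int foo_checked(float *a, const float *b, int n) {
    assert(a != NULL && b != NULL);
    if (n <= 0)
        return 0;

    size_t count = (size_t)n;
    size_t vec_count = count - count % 4;   /* iterations covered 4 at a time */

    /* First bypass: too few elements for one wide iteration (%cmp.zero).
     * Checked first so the overlap test is skipped when it cannot matter. */
    if (vec_count == 0) {
        scalar_loop(a, b, 0, count);
        return 0;
    }

    /* Second bypass: the arrays may overlap, so fall back to scalar code. */
    if (ranges_conflict(a, b, count)) {
        scalar_loop(a, b, 0, count);
        return 0;
    }

    /* Wide loop: load 4 floats, multiply, store 4 floats. */
    for (size_t i = 0; i < vec_count; i += 4) {
        float tmp[4];
        for (size_t k = 0; k < 4; ++k)
            tmp[k] = b[i + k];
        for (size_t k = 0; k < 4; ++k)
            a[i + k] = tmp[k] * 3.0f;
    }

    /* Remainder iterations run in the scalar loop. */
    scalar_loop(a, b, vec_count, count);
    return 1;
}
```

Calling `foo_checked(dst, src, 8)` on two separate 8-element arrays takes the wide path, while `foo_checked(buf + 1, buf, 8)` on a single 9-element buffer is routed to the scalar loop because the ranges overlap.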