llvm-6502/test/CodeGen/X86/extractelement-load.ll
Chandler Carruth 963a5e6c61 [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.

However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.

The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.

Original commit message for r215611 which covers the rest of the chang:
  [SDAG] Fix a case where we would iteratively legalize a node during
  combining by replacing it with something else but not re-process the
  node afterward to remove it.

  In a truly remarkable stroke of bad luck, this would (in the test case
  attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition
  to deleting the dead nodes more quickly we also ensure that if it
  *stops* being dead for any reason it makes it back through the
  legalizer. Without this, the test case will end up failing during
  instruction selection due to an and node with a type we don't have an
  instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:22:16 +00:00

47 lines
1.6 KiB
LLVM

; RUN: llc < %s -march=x86 -mattr=+sse2 -mcpu=yonah | FileCheck %s
; RUN: llc < %s -march=x86-64 -mattr=+sse2 -mcpu=core2 | FileCheck %s
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
define i32 @t(<2 x i64>* %val) nounwind {
; CHECK-LABEL: t:
; CHECK-NOT: movd
; CHECK: movl 8(
; CHECK-NEXT: ret
%tmp2 = load <2 x i64>* %val, align 16 ; <<2 x i64>> [#uses=1]
%tmp3 = bitcast <2 x i64> %tmp2 to <4 x i32> ; <<4 x i32>> [#uses=1]
%tmp4 = extractelement <4 x i32> %tmp3, i32 2 ; <i32> [#uses=1]
ret i32 %tmp4
}
; Case where extractelement of load ends up as undef.
; (Making sure this doesn't crash.)
define i32 @t2(<8 x i32>* %xp) {
; CHECK-LABEL: t2:
; CHECK: ret
%x = load <8 x i32>* %xp
%Shuff68 = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32
undef, i32 7, i32 9, i32 undef, i32 13, i32 15, i32 1, i32 3>
%y = extractelement <8 x i32> %Shuff68, i32 0
ret i32 %y
}
; This case could easily end up inf-looping in the DAG combiner due to an
; low alignment load of the vector which prevents us from reliably forming a
; narrow load.
; FIXME: It would be nice to detect whether the target has fast and legal
; unaligned loads and use them here.
define void @t3() {
; CHECK-LABEL: t3:
;
; This movs the entire vector, shuffling the high double down. If we fixed the
; FIXME above it would just move the high double directly.
; CHECK: movhpd %xmm
bb:
%tmp13 = load <2 x double>* undef, align 1
%.sroa.3.24.vec.extract = extractelement <2 x double> %tmp13, i32 1
store double %.sroa.3.24.vec.extract, double* undef, align 8
unreachable
}