llvm-6502/test
Jim Grosbach 0e536ee4ca Legalize: Improve legalization of long vector extends.
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors runs into problems: by the time the destination vector is
legal, the source vector has become illegal. The end result is that the
operation often becomes scalarized, with correspondingly poor
performance. For example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
  %tmp = zext <16 x i8> %a to <16 x i32>
  store <16 x i32> %tmp, <16 x i32>*%p
  ret void
}

Generates:
  .section  __TEXT,__text,regular,pure_instructions
  .section  __TEXT,__const
  .align  5
LCPI0_0:
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _bar
  .align  4, 0x90
_bar:
  vpunpckhbw  %xmm0, %xmm0, %xmm1
  vpunpckhwd  %xmm0, %xmm1, %xmm2
  vpmovzxwd %xmm1, %xmm1
  vinsertf128 $1, %xmm2, %ymm1, %ymm1
  vmovaps LCPI0_0(%rip), %ymm2
  vandps  %ymm2, %ymm1, %ymm1
  vpmovzxbw %xmm0, %xmm3
  vpunpckhwd  %xmm0, %xmm3, %xmm3
  vpmovzxbd %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vandps  %ymm2, %ymm0, %ymm0
  vmovaps %ymm0, (%rdi)
  vmovaps %ymm1, 32(%rdi)
  vzeroupper
  ret

So instead, when the input vector is already legal, we can check whether
there are legal types that let us split more cleverly, so that we don't
turn it into an illegal type. If the extend more than doubles the
element size of the input, we check that
  - the number of vector elements is even,
  - the source type is legal,
  - the type of a split source is illegal,
  - the type of an extended (by doubling element size) source is legal, and
  - the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and then continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more efficient, termination condition for the loop of "split the
operation until the destination type is legal."
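
The conditions above can be sketched as a small standalone model. This is
only an illustration, not the code in the legalizer: the VecTy tuple, the
LEGAL table, and the function name one_step_extend_type are all
hypothetical, and the table is an assumed set of legal vector types chosen
to resemble the x86_64 example rather than the real target description.

```python
from typing import NamedTuple, Optional


class VecTy(NamedTuple):
    """Toy vector type: element count and element width in bits."""
    num_elts: int
    elt_bits: int


# ASSUMPTION: a hypothetical legal-type table for illustration only.
# It is not derived from any real target's type legality information.
LEGAL = {
    VecTy(16, 8), VecTy(8, 16), VecTy(4, 32), VecTy(2, 64),    # 128-bit
    VecTy(32, 8), VecTy(16, 16), VecTy(8, 32), VecTy(4, 64),   # 256-bit
}


def is_legal(ty: VecTy) -> bool:
    return ty in LEGAL


def split(ty: VecTy) -> VecTy:
    """Halve the number of elements, keeping the element width."""
    return VecTy(ty.num_elts // 2, ty.elt_bits)


def one_step_extend_type(src: VecTy, dst: VecTy) -> Optional[VecTy]:
    """Return the intermediate type for a one-step extend, or None if the
    ordinary "split both sides" legalization should be used instead."""
    # Only applies when the extend more than doubles the element width.
    if dst.elt_bits <= 2 * src.elt_bits:
        return None
    # The number of vector elements is even.
    if src.num_elts % 2 != 0:
        return None
    # The source type is legal.
    if not is_legal(src):
        return None
    # The type of a split source is illegal.
    if is_legal(split(src)):
        return None
    # The source extended by doubling the element width is legal...
    widened = VecTy(src.num_elts, src.elt_bits * 2)
    if not is_legal(widened):
        return None
    # ...and that extended source is still legal when split.
    if not is_legal(split(widened)):
        return None
    return widened


# zext v16i8 -> v16i32: first extend one step to v16i16, then keep splitting.
print(one_step_extend_type(VecTy(16, 8), VecTy(16, 32)))
```

Under the assumed table, the v16i8 to v16i32 case picks v16i16 as the
intermediate type, while a plain doubling such as v8i16 to v8i32 returns
None and falls through to the normal splitting path.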

With this change, the above example now compiles to:
_bar:
  vpxor %xmm1, %xmm1, %xmm1
  vpunpcklbw  %xmm1, %xmm0, %xmm2
  vpunpckhwd  %xmm1, %xmm2, %xmm3
  vpunpcklwd  %xmm1, %xmm2, %xmm2
  vinsertf128 $1, %xmm3, %ymm2, %ymm2
  vpunpckhbw  %xmm1, %xmm0, %xmm0
  vpunpckhwd  %xmm1, %xmm0, %xmm3
  vpunpcklwd  %xmm1, %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vmovaps %ymm0, 32(%rdi)
  vmovaps %ymm2, (%rdi)
  vzeroupper
  ret

This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.

rdar://14735100

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193727 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-31 00:20:48 +00:00
..
Analysis SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive. 2013-10-28 07:30:06 +00:00
Assembler Change objectsize intrinsic to accept different address spaces. 2013-10-07 18:06:48 +00:00
Bindings Fix check for supported targets in llvm-c lit.local.cfg 2013-10-23 08:47:52 +00:00
Bitcode
BugPoint
CodeGen Legalize: Improve legalization of long vector extends. 2013-10-31 00:20:48 +00:00
DebugInfo Add DebugInfo testcase for high_pc encoded as constant, fixed in r193555. 2013-10-30 20:27:17 +00:00
ExecutionEngine Adding a workaround for __main linking with remote lli and Cygwin/MinGW 2013-10-29 01:29:56 +00:00
Feature
FileCheck Fix "existant" typos 2013-10-29 02:35:28 +00:00
Instrumentation fix PR17635: false positive with packed structures 2013-10-24 09:17:24 +00:00
Integer
JitListener
Linker
LTO Optimize more linkonce_odr values during LTO. 2013-10-21 17:14:55 +00:00
MC This commit adds some (but not all) of the x86-64 relocations that are not 2013-10-30 18:47:25 +00:00
Object
Other Quote potential shell expansions found in tests 2013-10-28 23:37:45 +00:00
TableGen
tools
Transforms Teach scalarrepl about address spaces 2013-10-30 22:54:58 +00:00
Unit
Verifier
YAMLParser [Support][YAML] Add support for accessing tags and tag handle substitution. 2013-10-18 22:38:04 +00:00
CMakeLists.txt lit: add missing substitutions for recently added tools 2013-10-28 23:37:49 +00:00
lit.cfg lit: add missing substitutions for recently added tools 2013-10-28 23:37:49 +00:00
lit.site.cfg.in
Makefile
Makefile.tests
TestRunner.sh