llvm-6502/test/CodeGen
Sanjay Patel 8765e82c83 [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
For code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

1. We're generating a blend when a move scalar does the same job using 2 fewer instruction bytes (see FIXMEs).
2. We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.
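
With both problems addressed, the xor and blend should disappear. What follows is a sketch of the expected AVX output, assuming the existing vmovd pattern gets matched; it is not copied from the patch's tests. It relies on the fact that a VEX-encoded vmovd zeroes every destination bit above the low 32, including the upper half of ymm0:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0                     ## assumed result: elements 1-7 of ymm0 are zeroed implicitly
  retq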

[1] AddedComplexity has close to no documentation in the source. 
The best we have is this comment: "roughly corresponds to the number of nodes that are covered". 
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). 

I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Differential Revision: http://reviews.llvm.org/D8794



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233931 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-02 17:56:17 +00:00
..
AArch64  Fix PR23065. Avoid optimizing bitcast of build_vector with constant input to scalar_to_vector.  2015-04-01 01:52:38 +00:00
ARM      [SDAG] Move TRUNCATE splitting logic into a helper, and use  2015-03-31 10:20:58 +00:00
BPF      [bpf] mark mov instructions as ReMaterializable  2015-03-31 02:49:58 +00:00
CPP      [opaque pointer type] Add textual IR support for explicit type parameter to load instruction  2015-02-27 21:17:42 +00:00
Generic  LLParser: Require non-null scope for MDLocation and MDLocalVariable  2015-03-27 17:56:39 +00:00
Hexagon  Expand MUX instructions early on Hexagon  2015-03-31 13:35:12 +00:00
Inputs   DebugInfo: Fix bad debug info for compile units and types  2015-03-27 20:46:33 +00:00
Mips     [mips] Make sure that we don't adjust the stack pointer by zero amount.  2015-04-02 10:14:54 +00:00
MSP430   [opaque pointer type] Add textual IR support for explicit type parameter to gep operator  2015-03-13 18:20:45 +00:00
NVPTX    [NVPTX] Associate a minimum PTX version for each SM architecture  2015-03-30 19:30:55 +00:00
PowerPC  [PowerPC] FastISel can't handle i1 return values when using CR bits  2015-04-01 00:40:48 +00:00
R600     [R600/SI] Fix testcase check line.  2015-03-27 20:41:42 +00:00
SPARC    [opaque pointer type] Add textual IR support for explicit type parameter to gep operator  2015-03-13 18:20:45 +00:00
SystemZ  [SystemZ] Support transactional execution on zEC12  2015-04-01 12:51:43 +00:00
Thumb    DebugInfo: Fix bad debug info for compile units and types  2015-03-27 20:46:33 +00:00
Thumb2   Fix a nasty bug in DAGCombine of STORE nodes.  2015-03-19 22:48:57 +00:00
WinEH    Fix WinEHPrepare bug with multiple catch handlers  2015-04-01 17:21:25 +00:00
X86      [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)  2015-04-02 17:56:17 +00:00
XCore    DebugInfo: Fix testcases that fail -verify-debug-info=true  2015-03-16 21:10:12 +00:00