1
0
mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-04-03 18:32:50 +00:00
Sanjay Patel 39110ecd35 [X86] Prefer blendps over insertps codegen for one special case
With this patch, for this one exact case, we'll generate:

  blendps %xmm0, %xmm1, $1

instead of:

  insertps %xmm0, %xmm1, $0

If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.

The detailed performance data motivation for this may be found in D7866; 
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.

Differential Revision: http://reviews.llvm.org/D8332



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232850 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-20 21:19:52 +00:00
..
2015-02-26 04:53:00 +00:00