mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-01-10 02:36:06 +00:00
6bf4f67641
On certain architectures we can support efficient vectorized version of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86. We can efficiently support for (i = 0 ; i < ; i += 4) w[0:3] = v[0:3] << <2, 2, 2, 2> but not for (i = 0; i < ; i += 4) w[0:3] = v[0:3] << x[0:3] This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant. Targets can then choose to return a different cost for instructions with such operand values. A follow-up commit will test this feature on x86. radar://13576547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178807 91177308-0d34-0410-b5e6-96231b3b80d8
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//