Evan Cheng d51425a82d Use movaps / movapd (instead of movss / movsd) to do FR32 / FR64 reg to reg
transfer.

According to the Intel P4 Optimization Manual:

Moves that write a portion of a register can introduce unwanted
dependences. The movsd reg, reg instruction writes only the bottom
64 bits of a register, not to all 128 bits. This introduces a dependence on
the preceding instruction that produces the upper 64 bits (even if those
bits are not longer wanted). The dependence inhibits register renaming,
and thereby reduces parallelism.

Not to mention movaps is shorter than movss.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26226 91177308-0d34-0410-b5e6-96231b3b80d8
2006-02-16 01:50:02 +00:00
..
2006-02-13 18:52:29 +00:00
2006-02-15 22:14:34 +00:00

Target Independent Opportunities:

===-------------------------------------------------------------------------===

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m.  It would be better to avoid the memcpy and just do
loads from the static array.

===-------------------------------------------------------------------------===

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (ffastmath).  Misc/mandel will like this. :)

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
	movl Y, %eax
	shll $3, %eax
	orl X, %eax
	movl %eax, X
	ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

//===---------------------------------------------------------------------===//

DAG combine this into mul A, 8:

int %test(int %A) {
  %B = mul int %A, 8  ;; shift
  %C = add int %B, 7  ;; dead, no demanded bits.
  %D = and int %C, -8 ;; dead once add is gone.
  ret int %D
}

This sort of thing occurs in the alloca lowering code and other places that
are generating alignment of an already aligned value.