llvm-6502/lib
Sanjay Patel 0332323ab6 [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]

Differential Revision: http://reviews.llvm.org/D8941



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235012 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-15 15:22:55 +00:00
..
Analysis Re-apply r234898 and fix tests. 2015-04-15 06:24:07 +00:00
AsmParser Remove empty non-virtual destructors or mark them =default when non-public 2015-04-11 15:32:26 +00:00
Bitcode Revert "Verify sizes when trying to read a VBR" 2015-04-15 11:10:17 +00:00
CodeGen [MBP] Spell the conditions the same way through out this if statement. 2015-04-15 13:39:42 +00:00
DebugInfo Use 'override/final' instead of 'virtual' for overridden methods 2015-04-11 02:11:45 +00:00
ExecutionEngine [RuntimeDyld] Add casts to make delta computation 64-bit. 2015-04-15 04:46:01 +00:00
Fuzzer Removing a spurious space; NFC. 2015-04-06 16:09:13 +00:00
IR uselistorder: Remove the global bits 2015-04-15 03:14:06 +00:00
IRReader Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects. 2015-02-11 03:28:02 +00:00
LineEditor Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects. 2015-02-11 03:28:02 +00:00
Linker DebugInfo: Gut DISubprogram and DILexicalBlock* 2015-04-14 03:40:37 +00:00
LTO uselistorder: Remove the global bits 2015-04-15 03:14:06 +00:00
MC Write section and section table entries in the same order. 2015-04-15 13:07:47 +00:00
Object Change range-based for-loops to be -Wrange-loop-analysis clean. 2015-04-15 01:21:15 +00:00
Option Remove more superfluous .str() and replace std::string concatenation with Twine. 2015-03-30 15:42:36 +00:00
Passes [PM] Fixup for r231556 where I missed a dependency on intrinsics 2015-03-07 09:08:20 +00:00
ProfileData Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. 2015-03-23 19:32:43 +00:00
Support Fix lib\support\Windows/TimeValue.inc(48): warning C4189: 2015-04-15 07:45:52 +00:00
TableGen Remove empty non-virtual destructors or mark them =default when non-public 2015-04-11 15:32:26 +00:00
Target [x86] Implement combineRepeatedFPDivisors 2015-04-15 15:22:55 +00:00
Transforms Change range-based for-loops to be -Wrange-loop-analysis clean. 2015-04-15 01:21:15 +00:00
CMakeLists.txt [PM] Create a separate library for high-level pass management code. 2015-03-07 09:02:36 +00:00
LLVMBuild.txt [PM] Create a separate library for high-level pass management code. 2015-03-07 09:02:36 +00:00
Makefile [PM] Create a separate library for high-level pass management code. 2015-03-07 09:02:36 +00:00