mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-08-05 13:26:55 +00:00
On recent Intel u-arch's, folding loads into some unary SSE instructions can
be non-optimal. To be precise, we should avoid folding loads if the instructions only update part of the destination register, and the non-updated part is not needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks the partial register dependency and it can improve performance. e.g. movss (%rdi), %xmm0 cvtss2sd %xmm0, %xmm0 instead of cvtss2sd (%rdi), %xmm0 An alternative method to break dependency is to clear the register first. e.g. xorps %xmm0, %xmm0 cvtss2sd (%rdi), %xmm0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91672 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is contained in:
@@ -57,6 +57,8 @@ def Feature64Bit : SubtargetFeature<"64bit", "HasX86_64", "true",
|
||||
"Support 64-bit instructions">;
|
||||
def FeatureSlowBTMem : SubtargetFeature<"slow-bt-mem", "IsBTMemSlow", "true",
|
||||
"Bit testing of memory is slow">;
|
||||
def FeatureBreakSSEDep : SubtargetFeature<"break-sse-dep", "BreakSSEDep","true",
|
||||
"Should break SSE partial update dep with load / xorps">;
|
||||
def FeatureSSE4A : SubtargetFeature<"sse4a", "HasSSE4A", "true",
|
||||
"Support SSE 4a instructions">;
|
||||
|
||||
@@ -86,17 +88,27 @@ def : Proc<"pentium2", [FeatureMMX, FeatureCMOV]>;
|
||||
def : Proc<"pentium3", [FeatureSSE1]>;
|
||||
def : Proc<"pentium-m", [FeatureSSE2, FeatureSlowBTMem]>;
|
||||
def : Proc<"pentium4", [FeatureSSE2]>;
|
||||
def : Proc<"x86-64", [FeatureSSE2, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"yonah", [FeatureSSE3, FeatureSlowBTMem]>;
|
||||
def : Proc<"prescott", [FeatureSSE3, FeatureSlowBTMem]>;
|
||||
def : Proc<"nocona", [FeatureSSE3, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"core2", [FeatureSSSE3, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"penryn", [FeatureSSE41, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"atom", [FeatureSSE3, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"corei7", [FeatureSSE42, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"nehalem", [FeatureSSE42, Feature64Bit, FeatureSlowBTMem]>;
|
||||
def : Proc<"x86-64", [FeatureSSE2, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"yonah", [FeatureSSE3, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"prescott", [FeatureSSE3, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"nocona", [FeatureSSE3, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"core2", [FeatureSSSE3, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"penryn", [FeatureSSE41, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"atom", [FeatureSSE3, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"corei7", [FeatureSSE42, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
def : Proc<"nehalem", [FeatureSSE42, Feature64Bit, FeatureSlowBTMem,
|
||||
FeatureBreakSSEDep]>;
|
||||
// Sandy Bridge does not have FMA
|
||||
def : Proc<"sandybridge", [FeatureSSE42, FeatureAVX, Feature64Bit]>;
|
||||
def : Proc<"sandybridge", [FeatureSSE42, FeatureAVX, Feature64Bit,
|
||||
FeatureBreakSSEDep]>;
|
||||
|
||||
def : Proc<"k6", [FeatureMMX]>;
|
||||
def : Proc<"k6-2", [FeatureMMX, Feature3DNow]>;
|
||||
|
Reference in New Issue
Block a user