mirror of https://github.com/c64scene-ar/llvm-6502.git (synced 2024-12-12 13:30:51 +00:00)
checkpoint, the release notes are now feature complete.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115495 91177308-0d34-0410-b5e6-96231b3b80d8
parent 3bdcda1a8b
commit 7714c91533
@@ -742,8 +742,9 @@ it run faster:</p>
<li>A new (experimental) "-rendermf" pass is available which renders a
MachineFunction into HTML, showing live ranges and other useful
details.</li>

<!--New SubRegIndex tblgen class for targets -> jakob -->
<li>The new SubRegIndex tablegen class allows subregisters to be indexed
symbolically instead of numerically. If your target uses subregisters you
will need to adapt to use SubRegIndex when you upgrade to 2.8.</li>
<!-- SplitKit -->

<li>The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
@@ -760,7 +761,7 @@ it run faster:</p>
</div>

<div class="doc_text">
<p>New features of the X86 target include:
<p>New features and major changes in the X86 target include:
</p>

<ul>
@@ -768,30 +769,38 @@ it run faster:</p>
in registers across basic blocks, dramatically improving performance of code
that uses long double, and when targeting CPUs that don't support SSE.</li>

New SSEDomainFix pass:
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
register in a different domain than where it was defined. Some instructions
have equivalents for different domains, like por/orps/orpd. The
SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equivalent opcodes where possible.
<li>The X86 backend now uses an SSEDomainFix pass to optimize SSE operations. On
Nehalem ("Core i7") and newer CPUs there is a 2 cycle latency penalty on
using a register in a different domain than where it was defined. This pass
optimizes away these stalls.</li>

X86 backend attempts to promote 16-bit integer operations to 32-bits to avoid
0x66 prefixes, which are slow on some microarchitectures and bloat the code
on others.
<li>The X86 backend now promotes 16-bit integer operations to 32-bits when
possible. This avoids 0x66 prefixes, which are slow on some
microarchitectures and bloat the code on all of them.</li>

New support for X86 "thiscall" calling convention (x86_thiscallcc in IR) for Windows.
<li>The X86 backend now supports the Microsoft "thiscall" calling convention,
and a <a href="LangRef.html#callingconv">calling convention</a> to support
<a href="#GHC">ghc</a>.</li>

New llvm.x86.int intrinsic (for int $42 and int3)
<li>The X86 backend supports a new "llvm.x86.int" intrinsic, which maps onto
the X86 "int $42" and "int3" instructions.</li>

Verbose assembly decodes X86 shuffle instructions, e.g.:
insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
<li>At the IR level, the <2 x float> datatype is now promoted and passed
around as a <4 x float> instead of being passed and returned as an MMX
vector. If you have a frontend that uses this, please pass and return a
<2 x i32> instead (using bitcasts).</li>

<li>When printing .s files in verbose assembly mode (the default for clang -S),
the X86 backend now decodes X86 shuffle instructions and prints
human-readable comments after the most inscrutable of them, e.g.:

<pre>
insertps $113, %xmm3, %xmm0 <i># xmm0 = zero,xmm0[1,2],xmm3[1]</i>
unpcklps %xmm1, %xmm0 <i># xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]</i>
pshufd $1, %xmm1, %xmm1 <i># xmm1 = xmm1[1,0,0,0]</i>
</pre>
</li>

X86 ABI: <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>

new GHC calling convention

</ul>

</div>
@@ -806,14 +815,21 @@ it run faster:</p>
</p>

<ul>
<li>The ARM backend now optimizes tail calls into jumps.</li>
<li>Scheduling is improved through the new list-hybrid scheduler as well
as through better modeling of structural hazards.</li>
<li><a href="LangRef.html#int_fp16">Half float</a> instructions are now
supported.</li>
<li>NEON support has been improved to model instructions which operate on
multiple consecutive registers more aggressively. This avoids lots of
extraneous register copies.</li>
<li>The ARM backend now uses a new "ARMGlobalMerge" pass, which merges several
global variables into one, saving extra address computation (all the global
variables can be accessed via the same base address) and potentially reducing
register pressure.</li>

NEON: Better performance for QQQQ (4-consecutive Q register) instructions. New reg sequence abstraction?
ARM: Better scheduling (list-hybrid, hybrid?)
ARM: Tail call support.
ARM: General performance work and tuning.

ARM: Half float support through intrinsics LangRef.html#int_fp16
<li>ARMGlobalMerge: <!-- Anton --> </li>
<li>The ARM backend has received many minor improvements and tweaks which lead to
substantially better performance in a wide range of different scenarios.</li>

<li>The ARM NEON intrinsics have been substantially reworked to reduce
redundancy and improve code generation. Some of the major changes are:
@@ -863,23 +879,10 @@ it run faster:</p>
</li>
</ol>
</li>

</ul>
</div>

<!--=========================================================================-->
<div class="doc_subsection">
<a name="otherimprovements">Other Improvements and New Features</a>
</div>

<div class="doc_text">
<p>Other miscellaneous features include:</p>

<ul>
<li></li>
</ul>

</div>


<!--=========================================================================-->
<div class="doc_subsection">