PTX: Add basic documentation to CodeGenerator.html

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137315 91177308-0d34-0410-b5e6-96231b3b80d8
Justin Holewinski 2011-08-11 17:34:16 +00:00
parent 6236f7f2b6
commit dceb002f82


@ -114,6 +114,7 @@
<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
</ul></li>
<li><a href="#ptx">The PTX backend</a></li>
</ul></li>
</ol>
@ -2912,6 +2913,70 @@ MOVSX32rm16 -&gt; movsx, 32-bit register, 16-bit memory
</div>
</div>
<!-- ======================================================================= -->
<h3>
<a name="ptx">The PTX backend</a>
</h3>
<div>
<p>The PTX code generator lives in the <tt>lib/Target/PTX</tt> directory. It
is still a work in progress, but it already supports most of the code
generation functionality needed to generate correct PTX kernels for
CUDA devices.</p>
<p>The code generator can target PTX 2.0+ and shader model 1.0+. The
PTX ISA Reference Manual is used as the primary source of ISA
information, though an effort is made to make the output of the code
generator match the output of the NVIDIA nvcc compiler whenever
possible.</p>
<p>Code Generator Options:</p>
<table border="1" cellspacing="0">
<tr>
<th>Option</th>
<th>Description</th>
</tr>
<tr>
<td><code>double</code></td>
<td align="left">If enabled, the <code>map_f64_to_f32</code> directive
is omitted from the PTX output, allowing native double-precision
arithmetic</td>
</tr>
<tr>
<td><code>no-fma</code></td>
<td align="left">Disable generation of fused multiply-add (FMA)
instructions, which may be beneficial for some devices</td>
</tr>
<tr>
<td><code>smxy / computexy</code></td>
<td align="left">Set shader model/compute capability to x.y,
e.g. <code>sm20</code> or <code>compute13</code></td>
</tr>
</table>
<p>Working:</p>
<ul>
<li>Arithmetic instruction selection (including combo FMA)</li>
<li>Bitwise instruction selection</li>
<li>Control-flow instruction selection</li>
<li>Function calls (only on SM 2.0+ and no return arguments)</li>
<li>Address spaces (0 = global, 1 = constant, 2 = local, 4 =
shared)</li>
<li>Thread synchronization (bar.sync)</li>
<li>Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)</li>
</ul>
<p>In Progress:</p>
<ul>
<li>Robust call instruction selection</li>
<li>Stack frame allocation</li>
<li>Device-specific instruction scheduling optimizations</li>
</ul>
</div>
</div>