- The idea is that when a match fails, we just try to match each of +'b', +'w',
+'l'. If exactly one matches, we assume this is a mnemonic prefix and accept
it. If all match, we assume it is width generic, and take the 'l' form.
- This would be a horrible hack, if it weren't so simple. Therefore it is an
elegant solution! Chris gets the credit for this particular elegant
solution. :)
- Next step to making this more robust is to have the X86 matcher generate the
mnemonic prefix information. Ideally we would also compute up-front exactly
which mnemonic to attempt to match, but this may require more custom code in
the matcher than is really worth it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103012 91177308-0d34-0410-b5e6-96231b3b80d8
When a target instruction wants to set target-specific flags, it should simply
set bits in the TSFlags bit vector defined in the Instruction TableGen class.
This works well because TableGen resolves member references late:
class I : Instruction {
AddrMode AM = AddrModeNone;
let TSFlags{3-0} = AM.Value;
}
let AM = AddrMode4 in
def ADD : I;
TSFlags gets the expected bits from AddrMode4 in this example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@100384 91177308-0d34-0410-b5e6-96231b3b80d8
a new subtarget option for AES and check for the support. Add "westmere"
line of processors and add AES-NI support to the core i7.
Add a couple of TODOs for information I couldn't verify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@100231 91177308-0d34-0410-b5e6-96231b3b80d8
Remove much horribleness from X86InstrFormats as a result. Similar
simplifications are probably possible for other targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@99539 91177308-0d34-0410-b5e6-96231b3b80d8
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equvivalents for different domains, like por/orps/orpd.
The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equvivalent opcodes where possible.
This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.
The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@99524 91177308-0d34-0410-b5e6-96231b3b80d8
temporary workaround for matching inc/dec on x86_64 to the correct instruction.
- This hack will eventually be replaced with a robust mechanism for handling
matching instructions based on the available target features.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@98858 91177308-0d34-0410-b5e6-96231b3b80d8
ignore alignment requirements for SIMD memory operands. This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@93151 91177308-0d34-0410-b5e6-96231b3b80d8
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.
movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0
instead of
cvtss2sd (%rdi), %xmm0
An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91672 91177308-0d34-0410-b5e6-96231b3b80d8
class into its own X86ATTInstPrinter class. The inst
printer now has just one dependence on the code generator
(TRI).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81703 91177308-0d34-0410-b5e6-96231b3b80d8
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10
New processor names match those used by gcc.
Patch by Paul Redmond!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72434 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2, however it's possible to disable SSE2, and the subtarget support
code thinks that if 64-bit implies SSE2 and SSE2 is disabled then
64-bit should also be disabled. Instead, just mark all the 64-bit
subtargets as explicitly supporting SSE2.
Also, move the code that makes -march=x86-64 enable 64-bit support by
default to only apply when there is no explicit subtarget. If you
need to specify a subtarget and you want 64-bit code, you'll need to
select a subtarget that supports 64-bit code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63575 91177308-0d34-0410-b5e6-96231b3b80d8
for fastcc from X86CallingConv.td. This means that nested functions
are not supported for calling convention 'fastcc'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@42934 91177308-0d34-0410-b5e6-96231b3b80d8
feature is set, then the features in the implied list should be set also.
The opposite is also enforced: if a feature in the implied list isn't set,
then the feature that owns that implies list shouldn't be set either.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@36756 91177308-0d34-0410-b5e6-96231b3b80d8
- Added a new format for instructions where the source register is implied
and it is same as the destination register. Used for pseudo instructions
that clear the destination register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@25872 91177308-0d34-0410-b5e6-96231b3b80d8