a new subtarget option for AES and check for the support. Add "westmere"
line of processors and add AES-NI support to the core i7.
Add a couple of TODOs for information I couldn't verify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@100231 91177308-0d34-0410-b5e6-96231b3b80d8
aes instead of sse4.2. Add a brief todo for a subtarget flag and rework
the aeskeygenassist instruction to more closely match the docs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@100078 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrite the pmulld patterns, and make sure that they fold in loads of
arguments into the instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@99910 91177308-0d34-0410-b5e6-96231b3b80d8
--- Reverse-merging r99440 into '.':
U test/MC/AsmParser/X86/x86_32-bit_cat.s
U test/MC/AsmParser/X86/x86_32-encoding.s
U include/llvm/IntrinsicsX86.td
U include/llvm/CodeGen/SelectionDAGNodes.h
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86ISelLowering.h
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@99450 91177308-0d34-0410-b5e6-96231b3b80d8
override prefix and only the r/m16 forms should have had that. Also for variant
one, the AT&T syntax, added suffixes to all forms. Also added the missing
64-bit form for 'CRC32 r64, r/m8'. Plus added test cases for all forms and
tweaked one test case to add the needed suffixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@98980 91177308-0d34-0410-b5e6-96231b3b80d8
to input patterns, we can fix X86ISD::CMP and X86ISD::BT as taking
two inputs (which have to be the same type) and *returning an i32*.
This is how the SDNodes get made in the graph, but we weren't able
to model it this way due to deficiencies in the pattern language.
Now we can change things like this:
def UCOM_FpIr80: FpI_<(outs), (ins RFP80:$lhs, RFP80:$rhs), CompareFP,
- [(X86cmp RFP80:$lhs, RFP80:$rhs),
- (implicit EFLAGS)]>; // CC = ST(0) cmp ST(i)
+ [(set EFLAGS, (X86cmp RFP80:$lhs, RFP80:$rhs))]>;
and fix terrible crimes like this:
-def : Pat<(parallel (X86cmp GR8:$src1, 0), (implicit EFLAGS)),
+def : Pat<(X86cmp GR8:$src1, 0),
(TEST8rr GR8:$src1, GR8:$src1)>;
This relies on matching the result of TEST8rr (which is EFLAGS, which is
an implicit def) to the result of X86cmp, an i32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@98903 91177308-0d34-0410-b5e6-96231b3b80d8
Extracting the low element of a vector is now done with EXTRACT_SUBREG,
and the zero-extension performed by load movss is now modeled with
SUBREG_TO_REG, and so on.
Register-to-register movss and movsd are no longer considered copies;
they are two-address instructions which insert a scalar into a vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@97354 91177308-0d34-0410-b5e6-96231b3b80d8
non-temporal. Fix from r96241 for botched encoding of MOVNTDQ.
Add documentation for !nontemporal metadata.
Add a simpler movnt testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@96386 91177308-0d34-0410-b5e6-96231b3b80d8
ignore alignment requirements for SIMD memory operands. This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@93151 91177308-0d34-0410-b5e6-96231b3b80d8
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.
movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0
instead of
cvtss2sd (%rdi), %xmm0
An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91672 91177308-0d34-0410-b5e6-96231b3b80d8
Also fixed the corresponding testcase, and the PALIGNR
intrinsic (tested for correctness with llvm-gcc).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@89491 91177308-0d34-0410-b5e6-96231b3b80d8
1. rename the movhp patfrag to movlhps, since thats what it actually matches
2. eliminate the bogus movhps load and store patterns, they were incorrect. The load transforms are already handled (correctly) by shufps/unpack.
3. revert a recent test change to its correct form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@86415 91177308-0d34-0410-b5e6-96231b3b80d8
bunch of associated comments, because it doesn't have anything to do
with DAGs or scheduling. This is another step in decoupling MachineInstr
emitting from scheduling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@85517 91177308-0d34-0410-b5e6-96231b3b80d8
All of these do not have patterns (they're for the
disassembler).
Many of the floating-point instructions will probably
be rolled into definitions that have patterns, and may
eventually be superseded by mdefs. So I put them
together and left a comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81979 91177308-0d34-0410-b5e6-96231b3b80d8
Add patterns and instruction encoding information.
Add custom lowering to deal with hardwired return register of
uncertain type (xmm0).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@79377 91177308-0d34-0410-b5e6-96231b3b80d8
- Used to mark fake instructions which don't correspond to an actual machine
instruction (or are duplicates of a real instruction). This is to be used for
"special cases" in the .td files, which should be ignored by things like the
assembler and disassembler. We still need a good solution to handle pervasive
duplication, like with the Int_ instructions.
- Set the bit on fake "mov 0" style instructions, which allows turning an
assembler matcher warning into a hard error.
- -2 FIXMEs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78731 91177308-0d34-0410-b5e6-96231b3b80d8
bytes for F2 0F 38 and propagate. Add a FIXME for a set
of possibilities which correspond to intrinsics already used.
New test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78508 91177308-0d34-0410-b5e6-96231b3b80d8
due to x86 encoding restrictions. This is currently off by default
because it may cause code quality regressions. This is for PR4572.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77565 91177308-0d34-0410-b5e6-96231b3b80d8
to support vector arguments and scalar arguments correctly. Update
lowering and fix comment to refer to pinsr* instead of insertps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76921 91177308-0d34-0410-b5e6-96231b3b80d8
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.
Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72556 91177308-0d34-0410-b5e6-96231b3b80d8
the Intel manual (screenshot) says it should be 0b11110110 (f6). The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."
Patch by Sean Callanan!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72508 91177308-0d34-0410-b5e6-96231b3b80d8
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@70225 91177308-0d34-0410-b5e6-96231b3b80d8
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@69952 91177308-0d34-0410-b5e6-96231b3b80d8
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@68560 91177308-0d34-0410-b5e6-96231b3b80d8
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@68552 91177308-0d34-0410-b5e6-96231b3b80d8
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.
Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60461 91177308-0d34-0410-b5e6-96231b3b80d8
the predicates by extending simple predicates to create
more complex predicates instead of duplicating the logic
for the simple predicates.
This doesn't reduce much redundancy in DAGISelEmitter.cpp's
generated source yet; that will require improvements to
DAGISelEmitter.cpp's instruction sorting, to make it more
effectively group nodes with similar predicates together.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57565 91177308-0d34-0410-b5e6-96231b3b80d8
the same pattern as roundpd/roundps, the Intel compiler
builtins do not: rounds* has an extra operand. Fixes
gcc.target/i386/sse4_1-rounds[sd]-[1234].c
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57370 91177308-0d34-0410-b5e6-96231b3b80d8
a constant vector ("{0x123, 0x456}" syntax). The fix is to simplify the
_mm_srli_si128 macro, and move the "* 8" from the macro into the compiler
back-end. I can't change the existing __builtins because so many people are
using them :-(."
Patch by Stuart Hastings!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56944 91177308-0d34-0410-b5e6-96231b3b80d8