Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:
1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).
So instead of this:
movaps %xmm0, %xmm1
rcpss %xmm1, %xmm1
movss %xmm1, %xmm0
We should now get:
rcpss %xmm0, %xmm0
And instead of this:
vsqrtss %xmm0, %xmm0, %xmm1
vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]
We should now get:
vsqrtss %xmm0, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D9504
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236740 91177308-0d34-0410-b5e6-96231b3b80d8
We don't need codegen-only intrinsic instructions for the vector forms of these instructions.
This makes the reciprocal estimate instruction lowering identical to how we handle normal
square roots: (V)SQRTPS / (V)SQRTPD.
No existing regression tests fail with this patch.
Differential Revision: http://reviews.llvm.org/D9301
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236013 91177308-0d34-0410-b5e6-96231b3b80d8
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.
It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.
Differential Revision: http://reviews.llvm.org/D8691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235014 91177308-0d34-0410-b5e6-96231b3b80d8
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230845 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230499 91177308-0d34-0410-b5e6-96231b3b80d8
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230226 91177308-0d34-0410-b5e6-96231b3b80d8
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors.
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.
Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.
Differential Revision: http://reviews.llvm.org/D7474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229531 91177308-0d34-0410-b5e6-96231b3b80d8
Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes.
This patch also sets the SSE domains of all the XOP instructions.
Differential Revision: http://reviews.llvm.org/D7646
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229267 91177308-0d34-0410-b5e6-96231b3b80d8
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229214 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc.
Note: Many of the XOP instructions have multiple table entries as it can fold loads from different sources.
Differential Revision: http://reviews.llvm.org/D7484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228685 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
(Re-commit of r227728)
Differential Revision: http://reviews.llvm.org/D6789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227752 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
Differential Revision: http://reviews.llvm.org/D6789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227728 91177308-0d34-0410-b5e6-96231b3b80d8
The default op indices frmo TargetInstrInfo::findCommutedOpIndices are being commuted so we don't need to do this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227689 91177308-0d34-0410-b5e6-96231b3b80d8
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.
Differential Revision: http://reviews.llvm.org/D7178
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227145 91177308-0d34-0410-b5e6-96231b3b80d8
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.
Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226873 91177308-0d34-0410-b5e6-96231b3b80d8
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.
The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.
Differential Revision: http://reviews.llvm.org/D7094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226745 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).
Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.
Differential Revision: http://reviews.llvm.org/D7055
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226513 91177308-0d34-0410-b5e6-96231b3b80d8
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.
Command line options:
-noop-insertion // Enable noop insertion.
-noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
-max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.
In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
-fdiversify
This is the llvm part of the patch.
clang part: D3393
http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225908 91177308-0d34-0410-b5e6-96231b3b80d8
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225252 91177308-0d34-0410-b5e6-96231b3b80d8
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225242 91177308-0d34-0410-b5e6-96231b3b80d8
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions.
Added lowering tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224516 91177308-0d34-0410-b5e6-96231b3b80d8