This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222710 91177308-0d34-0410-b5e6-96231b3b80d8
We were matching against the assume intrinsic in every check. Since we know that it must be an assume, this is just wasted work. Somewhat surprisingly, matching an intrinsic id is actually relatively expensive. It devolves to a string construction and comparison in Function::isIntrinsic.
I originally spotted this because it showed up in a performance profile of my compiler. I've since discovered a separate issue which seems to be the actual root cause, but this is minor perf goodness regardless.
I'm likely to follow up with another change to factor out the comparison matching. There's no need to match the compare instruction in every single one of the tests.
Differential Revision: http://reviews.llvm.org/D6312
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222709 91177308-0d34-0410-b5e6-96231b3b80d8
This change implements the comment and style changes Sean requested during post commit review with r221742. Sorry for the delay.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222707 91177308-0d34-0410-b5e6-96231b3b80d8
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.
Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222672 91177308-0d34-0410-b5e6-96231b3b80d8
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.
This fixes PR21653.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222659 91177308-0d34-0410-b5e6-96231b3b80d8
With the help of new method readInstruction16() two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16, if this fails
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.
Differential Revision: http://reviews.llvm.org/D6149
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222648 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout. This resulted in opt crashing.
This fixes PR21651.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222645 91177308-0d34-0410-b5e6-96231b3b80d8
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:
1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.
This caused a crash, since the source value for the insertps ends-up uninitialized.
Differential Revision: http://reviews.llvm.org/D6377
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
i1 is not a legal type on Evergreen, so this combine proceeded
and tried to produce a bitcast between i1 and i8.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222630 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222614 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222590 91177308-0d34-0410-b5e6-96231b3b80d8
We need to use a s_mov_b32 rather than a copy, so that CSE will
eliminate redundant moves to the m0 register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222584 91177308-0d34-0410-b5e6-96231b3b80d8
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.
This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222583 91177308-0d34-0410-b5e6-96231b3b80d8
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.
Differential Revision: http://reviews.llvm.org/D3566
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222580 91177308-0d34-0410-b5e6-96231b3b80d8