testcases accordingly. Some are currently xfailed and will be filed
as bugs to be fixed or understood.
Performance results:
roughly neutral on SPEC
some micro benchmarks in the llvm suite are up between 100 and 150%, only
a pair of regressions that are due to be investigated
john-the-ripper saw:
10% improvement in traditional DES
8% improvement in BSDI DES
59% improvement in FreeBSD MD5
67% improvement in OpenBSD Blowfish
14% improvement in LM DES
Small compile time impact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127208 91177308-0d34-0410-b5e6-96231b3b80d8
bitcasts, which are really no-ops here. This fixes slowdowns on
MultiSource/Applications/aha and others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127031 91177308-0d34-0410-b5e6-96231b3b80d8
There was a previous implementation with patterns that would
have matched e.g.
shl <v4i32> <i32>,
but this is not valid LLVM IR so they never were selected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126998 91177308-0d34-0410-b5e6-96231b3b80d8
missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126845 91177308-0d34-0410-b5e6-96231b3b80d8
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
Patch by Justin Holewinski
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126824 91177308-0d34-0410-b5e6-96231b3b80d8
- Add appropriate TableGen patterns for fadd, fsub, fmul.
- Add .f32 as the PTX type for the LLVM float type.
- Allow parameters, return values, and global variable declarations
to accept the float type.
- Add appropriate test cases.
Patch by Justin Holewinski
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126636 91177308-0d34-0410-b5e6-96231b3b80d8
It improves Win64's prologue/epilogue but it would not affect ia32 and amd64 (lack of nonvolatile XMMs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126568 91177308-0d34-0410-b5e6-96231b3b80d8
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126557 91177308-0d34-0410-b5e6-96231b3b80d8
Limit the folding of any_ext and sext into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.
Similar to commit 126080 (for enabling zext).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126424 91177308-0d34-0410-b5e6-96231b3b80d8
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126380 91177308-0d34-0410-b5e6-96231b3b80d8
events on the thread and wait until a resource is ready to event. The vector
of the resource that is ready is returned.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126320 91177308-0d34-0410-b5e6-96231b3b80d8
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.
The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
vmov.i32 d1, #0x80000000
vbsl d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126295 91177308-0d34-0410-b5e6-96231b3b80d8