missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126845 91177308-0d34-0410-b5e6-96231b3b80d8
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
Patch by Justin Holewinski
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126824 91177308-0d34-0410-b5e6-96231b3b80d8
intersection of the LHS and RHS ConstantRanges and return "false" when
the range is empty.
This simplifies some code and catches some extra cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126744 91177308-0d34-0410-b5e6-96231b3b80d8
more work to do here, "icmp ult (urem X, 10), 11" doesn't optimize away yet.
Fixes example 3 from PR9343!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126741 91177308-0d34-0410-b5e6-96231b3b80d8
- Add appropriate TableGen patterns for fadd, fsub, fmul.
- Add .f32 as the PTX type for the LLVM float type.
- Allow parameters, return values, and global variable declarations
to accept the float type.
- Add appropriate test cases.
Patch by Justin Holewinski
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126636 91177308-0d34-0410-b5e6-96231b3b80d8
It improves Win64's prologue/epilogue but it would not affect ia32 and amd64 (lack of nonvolatile XMMs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126568 91177308-0d34-0410-b5e6-96231b3b80d8
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126557 91177308-0d34-0410-b5e6-96231b3b80d8
Yes, there are other types than i8* and GEPs on them can produce an add+multiply.
We don't consider that cheap enough to be speculatively executed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126481 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce a variable in the AsmParserExtension whether [] is valid in an
expression. If it is true, parse them like (). Enable this for ELF only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126443 91177308-0d34-0410-b5e6-96231b3b80d8
Limit the folding of any_ext and sext into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.
Similar to commit 126080 (for enabling zext).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126424 91177308-0d34-0410-b5e6-96231b3b80d8
Some tests on Windows use the "not" utility and fail with an error "program not executable". The reason for this error is that the name of the executable file sended to the "not" without the extension.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126383 91177308-0d34-0410-b5e6-96231b3b80d8
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126380 91177308-0d34-0410-b5e6-96231b3b80d8
function prototype into a call to a varargs prototype. We do
allow the xform if we have a definition, but otherwise we don't
want to risk that we're changing the abi in a subtle way. On
X86-64, for example, varargs require passing stuff in %al.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126363 91177308-0d34-0410-b5e6-96231b3b80d8
enabled for all targets. Non-X86 targets should not have this behavior
enabled by default.
Joerg, if you would like to resubmit with the behavior conditionalized to be
X86-ELF only, that's fine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126336 91177308-0d34-0410-b5e6-96231b3b80d8
events on the thread and wait until a resource is ready to event. The vector
of the resource that is ready is returned.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126320 91177308-0d34-0410-b5e6-96231b3b80d8
it to ignore valid uses of FS and GS as additional
base registers in address computations. Added a test
case for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126302 91177308-0d34-0410-b5e6-96231b3b80d8
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.
The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
vmov.i32 d1, #0x80000000
vbsl d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126295 91177308-0d34-0410-b5e6-96231b3b80d8
at phis. This enables us to eliminate a lot of pointless zexts during the DAGCombine
phase. This fixes <rdar://problem/8760114>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126170 91177308-0d34-0410-b5e6-96231b3b80d8
In other words, do not keep track of argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working.
- The debugger needs to be aware of prolog_end attribute attached with line table entries.
- The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126155 91177308-0d34-0410-b5e6-96231b3b80d8
"dllimport" function must not be GlobalVariable, but Function. It is enough to check with GlobalValue.
test/CodeGen/X86/dll-linkage.ll is updated to check llc -O0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126110 91177308-0d34-0410-b5e6-96231b3b80d8
of a constant had a minor typo introduced when copying it from the book, which
caused it to favor negative approximations over positive approximations in many
cases. Positive approximations require fewer operations beyond the multiplication.
In the case of division by 3, we still generate code that is a single instruction
larger than GCC's code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126097 91177308-0d34-0410-b5e6-96231b3b80d8
test for that. With this change, test/CodeGen/X86/codegen-dce.ll no longer finds
any instructions to DCE, so delete the test.
Also renamed J and JP to I and IP in RecursivelyDeleteDeadPHINode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126088 91177308-0d34-0410-b5e6-96231b3b80d8
We usually catch this kind of optimization through InstSimplify's distributive
magic, but or doesn't distribute over xor in general.
"A | ~(A | B) -> A | ~B" hits 24 times on gcc.c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126081 91177308-0d34-0410-b5e6-96231b3b80d8
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126080 91177308-0d34-0410-b5e6-96231b3b80d8
constant, including globals. This makes us generate much more "pretty" pattern
globals as well because it doesn't break it down to an array of bytes all the
time.
This enables us to handle stores of relocatable globals. This kicks in about
48 times in 254.gap, giving us stuff like this:
@.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*] [%struct.TypHeader* (%struct.TypHeader*, %struct
.TypHeader*)* @IsFalse, %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse], align 16
...
call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*]* @.memset_pattern40 to i8*
), i64 %tmp75) nounwind
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126044 91177308-0d34-0410-b5e6-96231b3b80d8
unsplatable values into memset_pattern16 when it is available
(recent darwins). This transforms lots of strided loop stores
of ints for example, like 5 in vpr:
Formed memset: call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
from store to: {%3,+,4}<%11> at: store i32 3, i32* %scevgep, align 4, !tbaa !4
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126040 91177308-0d34-0410-b5e6-96231b3b80d8
taken (and used!). This prevents merging the blocks (invalidating
the block addresses) in a case like this:
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
void foo() {
printf("%p\n", _THIS_IP_);
printf("%p\n", _THIS_IP_);
printf("%p\n", _THIS_IP_);
}
which fixes PR4151.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125829 91177308-0d34-0410-b5e6-96231b3b80d8
No one uses *-mingw64. mingw-w64 is represented as {i686|x86_64}-w64-mingw32. In llvm side, i686 and x64 can be treated as similar way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125747 91177308-0d34-0410-b5e6-96231b3b80d8
variations (some of these were already present so I unified the code). Spotted by my
auto-simplifier as occurring a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125734 91177308-0d34-0410-b5e6-96231b3b80d8
transformation if we can't legally create a build vector of the correct
type. Check that we can make the transformation first, and add a TODO to
refactor this code with similar cases.
Fixes: PR9223 and rdar://9000350
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125631 91177308-0d34-0410-b5e6-96231b3b80d8
Machine instruction range consisting of only DBG_VALUE MIs only contributes consecutive labels in assembly output, which is harmless, and empty scope entry in DebugInfo, which confuses debugger tools.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125577 91177308-0d34-0410-b5e6-96231b3b80d8
The i64_buildvector test in this file relies on the alignment of i64 and
f64 types being the same, which is true for Darwin but not AAPCS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125525 91177308-0d34-0410-b5e6-96231b3b80d8
- Add custom operand matching for imod and iflags.
- Rename SplitMnemonicAndCC to SplitMnemonic since it splits more than CC
from mnemonic.
- While adding ".w" as an operand, don't change "Head" to avoid passing the
wrong mnemonic to ParseOperand.
- Add asm parser tests.
- Add disassembler tests just to make sure it can catch all cps versions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125489 91177308-0d34-0410-b5e6-96231b3b80d8
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125470 91177308-0d34-0410-b5e6-96231b3b80d8
plus some variations of this. According to my auto-simplifier this occurs a lot
but usually in combination with max/min idioms. Because max/min aren't handled
yet this unfortunately doesn't have much effect in the testsuite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125462 91177308-0d34-0410-b5e6-96231b3b80d8
It caused a crash in MultiSource/Benchmarks/Bullet.
Opt hit an assertion with "opt -std-compile-opts" because
Constant::getAllOnesValue doesn't know how to handle floats.
This patch added a test to reproduce the problem and a check that the
destination vector is of integer type.
Thank you Benjamin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125459 91177308-0d34-0410-b5e6-96231b3b80d8
the shift amounts are in a suitably wide type so that
we don't generate out of range constant shift amounts.
This fixes PR9028.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125458 91177308-0d34-0410-b5e6-96231b3b80d8
is narrower than the shift register. Doing an anyext provides undefined bits in
the top part of the register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125457 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a FIXME in scev-aa.ll (allowing a new no-alias result) and
generally makes things more precise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125449 91177308-0d34-0410-b5e6-96231b3b80d8
These are just FXSAVE and FXRSTOR with REX.W prefixes. These versions use
64-bit pointer values instead of 32-bit pointer values in the memory map they
dump and restore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125446 91177308-0d34-0410-b5e6-96231b3b80d8