type after type legalization has completed. Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127000 91177308-0d34-0410-b5e6-96231b3b80d8
There was a previous implementation with patterns that would
have matched e.g.
shl <v4i32> <i32>,
but this is not valid LLVM IR so they never were selected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126998 91177308-0d34-0410-b5e6-96231b3b80d8
"icmp pred %X, CI" and a number of examples where "%X = binop %Y, CI2".
Some of these cases (div and rem) used to make it through opt -O2, but the
others are probably now making code elsewhere redundant (probably instcombine).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126988 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126981 91177308-0d34-0410-b5e6-96231b3b80d8
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.
Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.
Testcase ObjC/exceptions-4.m in r126968.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126969 91177308-0d34-0410-b5e6-96231b3b80d8
and iprintf is available on the target. Currently iprintf is only
marked as being available on the XCore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126935 91177308-0d34-0410-b5e6-96231b3b80d8
for calls to weak symbols with a definition has the appearance of working
with LLVM-generated code because weak symbol definitions are put in their
own sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126933 91177308-0d34-0410-b5e6-96231b3b80d8
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126926 91177308-0d34-0410-b5e6-96231b3b80d8
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126925 91177308-0d34-0410-b5e6-96231b3b80d8
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126924 91177308-0d34-0410-b5e6-96231b3b80d8
We need to wait until we meet a PHIDef in its defining block before resurrecting
PHIKills in the predecessors.
This should unbreak the llvm-gcc-build-x86_64-darwin10-x-mingw32-x-armeabi bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126905 91177308-0d34-0410-b5e6-96231b3b80d8
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG. Unfortunately
this blows up exponentially in some cases. The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes. Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126899 91177308-0d34-0410-b5e6-96231b3b80d8
Values that map to a single new value in a new interval after splitting don't
need new PHIDefs, and if the parent value was never rematerialized the live
range will be the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126894 91177308-0d34-0410-b5e6-96231b3b80d8
missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126845 91177308-0d34-0410-b5e6-96231b3b80d8
uses.
The result produced by the streamer is used to give the linker more accurate
information and to add to llvm.compiler.used. The second improvement removes
the need for the user to add __attribute__((used)) to functions only used in
inline asm. The first one lets us build firefox with LTO on Darwin :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126830 91177308-0d34-0410-b5e6-96231b3b80d8
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
Patch by Justin Holewinski
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126824 91177308-0d34-0410-b5e6-96231b3b80d8
Extract the updateSSA() method from the too long extendRange().
LiveOutCache can be shared among all the new intervals since there is at most
one of the new ranges live out from each basic block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126818 91177308-0d34-0410-b5e6-96231b3b80d8
This method could probably be used by LiveIntervalAnalysis::shrinkToUses, and
now it can use extendIntervalEndTo() which coalesces ranges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126803 91177308-0d34-0410-b5e6-96231b3b80d8
The value map is currently not used, all values are 'complex mapped' and
LiveIntervalMap::mapValue is used to dig them out.
This is the first step in a series changes leading to the removal of
LiveIntervalMap. Its data structures can be shared among all the live intervals
created by a split, so it is wasteful to create a copy for each.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126800 91177308-0d34-0410-b5e6-96231b3b80d8
This is a waste of time since we already know how to evict all interferences
which is a better approach anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126798 91177308-0d34-0410-b5e6-96231b3b80d8
Use 8 bits from line number field to keep track of argument ordering while encoding debug info for an argument. That leaves 24 bit for line no, DebugLoc also allocates 24 bit for line numbers. If a function has more than 255 arguments then rest of the arguments will be ordered by llvm.dbg.* intrinsics' ordering in IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126793 91177308-0d34-0410-b5e6-96231b3b80d8
addressing code. On 403.gcc this almost halves CodeGenPrepare time and reduces
total llc time by 9.5%. Unfortunately, getNumUses() is still the hottest function
in llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126782 91177308-0d34-0410-b5e6-96231b3b80d8
This effectively disables the 'turbo' functionality of the greedy register
allocator where all new live ranges created by splitting would be reconsidered
as if they were originals.
There are two reasons for doing this, 1. It guarantees that the algorithm
terminates. Early versions were prone to infinite looping in certain corner
cases. 2. It is a 2x speedup. We can skip a lot of unnecessary interference
checks that won't lead to good splitting anyway.
The problem is that region splitting only gets one shot, so it should probably
be changed to target multiple physical registers at once.
Local live range splitting is still 'turbo' enabled. It only accounts for a
small fraction of compile time, so it is probably not necessary to do anything
about that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126781 91177308-0d34-0410-b5e6-96231b3b80d8
intersection of the LHS and RHS ConstantRanges and return "false" when
the range is empty.
This simplifies some code and catches some extra cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126744 91177308-0d34-0410-b5e6-96231b3b80d8
more work to do here, "icmp ult (urem X, 10), 11" doesn't optimize away yet.
Fixes example 3 from PR9343!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126741 91177308-0d34-0410-b5e6-96231b3b80d8
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126664 91177308-0d34-0410-b5e6-96231b3b80d8