Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234567 91177308-0d34-0410-b5e6-96231b3b80d8
The IPToState table must be emitted after we have generated labels for
all functions in the table. Don't rely on the order of the list of
globals. Instead, utilize WinEHFuncInfo to tell us how many catch
handlers we expect to outline. Once we know we've visited all the catch
handlers, emit the cppxdata.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234566 91177308-0d34-0410-b5e6-96231b3b80d8
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Fixes PR23173.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234561 91177308-0d34-0410-b5e6-96231b3b80d8
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234550 91177308-0d34-0410-b5e6-96231b3b80d8
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.
It is used only for asm printing.
This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:
* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234535 91177308-0d34-0410-b5e6-96231b3b80d8
We already do:
concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.
Differential Revision: http://reviews.llvm.org/D8883
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234530 91177308-0d34-0410-b5e6-96231b3b80d8
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.
This fixes rdar://problem/20470788.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234529 91177308-0d34-0410-b5e6-96231b3b80d8
Using this instead of
namespace llvm {
func...
}
Has the advantage that the build fails with a compiler error if it gets out
of sync with the .h file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234515 91177308-0d34-0410-b5e6-96231b3b80d8
Pull the `-preserve-*-use-list-order` flags out of "experimental" mode,
and preserve use-list order by default when serializing to bitcode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234510 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Add classof implementations to the raw_ostream classes."
Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream."
The underlying issue can be fixed without classof.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234495 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, llvm (backend) doesn't know cortex-r4, even though it is the
default target for armv7r. Using "--target=armv7r-arm-none-eabi" provokes
'cortex-r4' is not a recognized processor for this target' by llvm.
This patch adds support for cortex-r4 and, very closely related, r4f.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234486 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Make the code more readable by fusing the for-loops together and explicitly checking for each register class.
Also, this version is more straightforward because it doesn't assume that FPU registers always come before CPU registers in the CalleeSavedInfo vector.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8033
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234475 91177308-0d34-0410-b5e6-96231b3b80d8
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.
This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.
<rdar://problem/17829180>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234462 91177308-0d34-0410-b5e6-96231b3b80d8
If we know we are producing an object, we don't need to wrap the stream
in a formatted_raw_ostream anymore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234461 91177308-0d34-0410-b5e6-96231b3b80d8
The input to compileOptimized is already optimized and internalized, so remove
internalize pass from compileOptimized.
rdar://20227235
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234446 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed insert point for allocas created for demoted values.
Clear the nested landing pad list after it has been processed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234433 91177308-0d34-0410-b5e6-96231b3b80d8
The bug manifests when there are two loads and two stores chained as follows in
a DAG,
(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)
and the stores' values are extracted from the preceding vector loads.
MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.
This commits fixes the bug by replacing the last store in the chain instead.
rdar://problem/20275084
Differential Revision: http://reviews.llvm.org/D8849
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234430 91177308-0d34-0410-b5e6-96231b3b80d8
r234262 changed some code in DIBuilderBindings.cpp to use the unwrap function
to unwrap debug metadata. The problem with this is that unwrap asserts that
its argument is non-null, which is not what we want in a number of places
in DIBuilder where the argument is optional. This change makes certain
arguments optional by adding null checks in places where it is required,
fixing the llgo build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234428 91177308-0d34-0410-b5e6-96231b3b80d8
The code uses a priority queue and a worklist, which share the same
visited set, but the visited set is only updated when inserting into
the priority queue. Instead, switch to using separate visited sets
for the priority queue and worklist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234425 91177308-0d34-0410-b5e6-96231b3b80d8