need to be updated for the new vector shuffle lowering.
After talking to Adam Nemet, Tim Northover, etc., it seems that testing
MC encodings in the same suite as the basic codegen isn't the right
approach. Instead, we're going to want dedicated MC tests for the
encodings. These encodings are starting to get in my way so I wanted to
cut them out early. The total set of instructions that should have
encoding tests added is:
vpaddd
vsqrtss
vsqrtsd
vmovlhps
vmovhlps
valignq
vbroadcastss
Not too many parts of these tests were even using this. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218932 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change intended.
Very similar to the change I made for subvector extract in r218480.
test/CodeGen/X86/avx512-insert-extract.ll covers this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218928 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
Very similar to the extract refactoring I did in r218478.
Compared X86.td.expanded before and after.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218927 91177308-0d34-0410-b5e6-96231b3b80d8
Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.
This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218923 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.
For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
store atomic i8 42, i8* %mem unordered, align 1
ret void
}
```
from
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
rlwinm r2, r3, 3, 27, 28
li r4, 42
xori r5, r2, 24
rlwinm r2, r3, 0, 0, 29
li r3, 255
slw r4, r4, r5
slw r3, r3, r5
and r4, r4, r3
LBB4_1: ; =>This Inner Loop Header: Depth=1
lwarx r5, 0, r2
andc r5, r5, r3
or r5, r4, r5
stwcx. r5, 0, r2
bne cr0, LBB4_1
; BB#2:
blr
```
into
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
li r2, 42
stb r2, 0(r3)
blr
```
which looks like a pretty clear win to me.
Test Plan:
fixed the tests + new test for indexed accesses + make check-all
Reviewers: jfb, wschmidt, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218922 91177308-0d34-0410-b5e6-96231b3b80d8
to be a ManagedStatic in r218163 to not be a global variable written and
read to from within the innards of SpillPlacement.
This will fix a really scary race condition for anyone that has two
copies of LLVM running spill placement concurrently. Yikes!
This will also fix a really significant compile time hit that r218163
caused because the spill placement threshold read is actually in the
*very* hot path of this code. The memory fence on each read was showing
up as huge compile time regressions when spilling is responsible for
most of the compile time. For example, optimizing sanitized code showed
over 50% compile time regressions here. =/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218921 91177308-0d34-0410-b5e6-96231b3b80d8
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.
This fixes PR21107.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218920 91177308-0d34-0410-b5e6-96231b3b80d8
This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218915 91177308-0d34-0410-b5e6-96231b3b80d8
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218914 91177308-0d34-0410-b5e6-96231b3b80d8
elements as well as integer elements in order to form simpler shuffle
patterns.
This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.
Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218911 91177308-0d34-0410-b5e6-96231b3b80d8
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.
This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.
Differential Revision: http://reviews.llvm.org/D5584
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218906 91177308-0d34-0410-b5e6-96231b3b80d8
floating point and integer domains.
Merge the AVX2 test into it and add an extra RUN line. Generate clean
FileCheck statements with my script. Remove the now merged AVX2 tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218903 91177308-0d34-0410-b5e6-96231b3b80d8
Every time we were adding or removing an expression when generating a
coverage mapping we were doing a linear search to try and deduplicate
the list. The indices in the list are important, so we can't just
replace it by a DenseMap entirely, but an auxilliary DenseMap for fast
lookup massively improves the performance issues I was seeing here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218892 91177308-0d34-0410-b5e6-96231b3b80d8
When the flag is given, the command prints out the COFF import table.
Currently only the import table directory will be printed.
I'm going to make another patch to print out the imported symbols.
The implementation of import directory entry iterator in
COFFObjectFile.cpp was buggy. This patch fixes that too.
http://reviews.llvm.org/D5569
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218891 91177308-0d34-0410-b5e6-96231b3b80d8
When I was preparing r218879 for commit, I removed an early return
that I decided was just noise. It wasn't. This is r218879 no-crash
edition.
This reverts commit r218881, reapplying r218879.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218887 91177308-0d34-0410-b5e6-96231b3b80d8
The Terms vector here represented a polynomial of of all possible
counters, and is used to simplify expressions when generating coverage
mapping. There are a few problems with this:
1. Keeping the vector as a member is wasteful, since we clear it every
time we use it.
2. Most expressions refer to a subset of the counters, so we end up
iterating over a large number of zeros doing nothing a lot of the
time.
This updates the user of the vector to store the terms locally, and
uses a sort and combine approach so that we only operate on counters
that are actually used in a given expression. For small cases this
makes very little difference, but in cases with a very large number of
counted regions this is a significant performance fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218879 91177308-0d34-0410-b5e6-96231b3b80d8
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it.
Thanks to Jörg Sonnenberger for pointing.
Differential Revision: http://reviews.llvm.org/D5576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218867 91177308-0d34-0410-b5e6-96231b3b80d8
This file isn't really doing anything useful. Many of the tests that
seem to be combined are also repeats from other test files. Many of the
other tests, despite the comment that they should be combined into
a single shuffle... well... aren't combined into a single shuffle.
=/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218862 91177308-0d34-0410-b5e6-96231b3b80d8
least seem *slightly* more interesting test wise, although given how
spotily we actually combine anything, I remain somewhat suspicious.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218861 91177308-0d34-0410-b5e6-96231b3b80d8
checks for all the ISA variants.
If the SSE2 checks here terrify you, good. This is (in large part) the
kind of amazingly bad code that is holding LLVM back when vectorizing on
older ISAs.
At the same time, these tests seem increasingly dubious to me. There are
a very large number of tests and it isn't clear that they are
systematically covering a specific set of functionality. Anyways,
I don't want to reduce testing during the transition, I just want to
consolidate it to where it is easier to manage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218860 91177308-0d34-0410-b5e6-96231b3b80d8
file.
Some of these really don't make sense to test -- we're testing for the
*lack* of combining two shuffles into one, presumably because the two
would generate better shuffles in the end. But if you look at the
generated code shown here, in many cases the generated code is, frankly,
terrible. Or we combine any two generated shuffles back into a single
instruction! I've left a FIXME to revisit these decisions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218859 91177308-0d34-0410-b5e6-96231b3b80d8
and use the new grouped FileCheck patterns to match them.
No interesting changes yet, but this test is now in proper form to have
the other shuffle combining tests merged into it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218857 91177308-0d34-0410-b5e6-96231b3b80d8
The test has to do with DAG combines, and so it doesn't need the new
vector shuffle lowering to be effective. Also, it has a nice in-IR
triple string which we should really be using rather than command line
flags (unless it varies form RUN-line to RUN-line). Finally, I much
prefer letting LLVM synthesize the correct datalayout string from the
triple rather than baking one in here that will just become stale.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218856 91177308-0d34-0410-b5e6-96231b3b80d8
generic DAG combining of shuffles relevant to x86.
My plan is to fold a bunch of the other DAG combining test cases into
this one, while converting them to use the nice new FileCheck assertion
syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218855 91177308-0d34-0410-b5e6-96231b3b80d8
a bare-metal triple and have nice BB labels, etc.
No significant change here, just tidying up to have a consistent set of
OS-agnostic vector functionality here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218854 91177308-0d34-0410-b5e6-96231b3b80d8
in the future to attach useful information about the PBQP graph (e.g. the
associated MachineFunction, pointers to regalloc passes) to the graph itself,
making that information accessible to the solver. This should also allow the
PBQPBuilder interface to be simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218848 91177308-0d34-0410-b5e6-96231b3b80d8
When writing a coverage mapping we iterate through the mapping regions
in order of FileID, but we were then repeatedly searching from the
beginning of the list to count the number of regions with a given
FileID.
It is simpler and more efficient to search forward from the current
iterator to find the number of regions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218842 91177308-0d34-0410-b5e6-96231b3b80d8
matching and lowering 64-bit insertions.
The first problem was that we weren't looking through bitcasts to
discover that we *could* lower as insertions. Once fixed, we in turn
weren't looking through bitcasts to discover that we could fold a load
into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR
node around the inserted element and instead were passing a scalar to
a DAG node that expected a vector. It turns out there are some patterns
that will "lower" this into the correct asm, but the rest of the X86
backend is very unhappy with such antics.
This should fix a few more edge case regressions I've spotted going
through the regression test suite to enable the new vector shuffle
lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218839 91177308-0d34-0410-b5e6-96231b3b80d8
FastISel has a fixed set of virtual functions that are overridden by the
tablegen-generated code for each target. These functions are distinguished by
the kinds of operands, e.g., register + immediate = "ri". The FastISel emitter
has been blindly emitting functions with different combinations of operand
kinds, even for combinations that are completely unused by FastISel, e.g.,
"fastEmit_rrr". Change to filter out functions that will be irrelevant for
FastISel and do not bother generating the code for them. Also add explicit
"override" keywords for the virtual functions that are overridden.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218838 91177308-0d34-0410-b5e6-96231b3b80d8