Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.
Differential Revision: http://reviews.llvm.org/D7939
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231380 91177308-0d34-0410-b5e6-96231b3b80d8
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231371 91177308-0d34-0410-b5e6-96231b3b80d8
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231366 91177308-0d34-0410-b5e6-96231b3b80d8
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231332 91177308-0d34-0410-b5e6-96231b3b80d8
Improve test robustness in preparation of coming commits:
- Avoid undefs which may get propagated too much.
- Remove several pointless add 0, instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231307 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231270 91177308-0d34-0410-b5e6-96231b3b80d8
The target-independent selection algorithm in FastISel already knows how
to select a SINT_TO_FP if the target is SSE but not AVX.
On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions
for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr
(for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64
conversion).
This patch simplifies the logic in method X86SelectSIToFP knowing that
the code would not be reachable if the subtarget doesn't have AVX.
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231243 91177308-0d34-0410-b5e6-96231b3b80d8
a flag for now.
First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.
The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:
if (a1()) { if (b1()) c1(); else d1(); f1(); }
if (a2()) { if (b2()) c2(); else d2(); f2(); }
done();
Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:
a1, a2, done;
And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:
a1, a2, done; b1, c1;
At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.
... wait for it ...
Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!
a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.
It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.
Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231238 91177308-0d34-0410-b5e6-96231b3b80d8
In a CFG with the edges A->B->C and A->C, B is an optional branch.
LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.
Review: http://reviews.llvm.org/D7719
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231230 91177308-0d34-0410-b5e6-96231b3b80d8
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
accesses is produced, by making use of a new pseudo instruction. The patch also removes
two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231227 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector.
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.
Differential Revision: http://reviews.llvm.org/D8040
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231219 91177308-0d34-0410-b5e6-96231b3b80d8
test - we only care that there are two moves in the loop and not
which part is relative to which register anyhow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231191 91177308-0d34-0410-b5e6-96231b3b80d8
The intrinsic is no longer generated by the front-end. Remove the intrinsic and
auto-upgrade it to a vector shuffle.
Reviewed by Nadav
This is related to rdar://problem/18742778.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231182 91177308-0d34-0410-b5e6-96231b3b80d8
Ultimately, we'll need to leave something behind to indicate which
alloca will hold the exception, but we can figure that out when it comes
time to emit the __CxxFrameHandler3 catch handler table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231164 91177308-0d34-0410-b5e6-96231b3b80d8
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
total = total + i / 2;
}
}
This is the kernel loop:
.LBB0_2: # %for.body
=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".
The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
Predecessors according to CFG: BB#1 BB#2
%vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
%vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
%vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
%vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
%vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
%vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
%vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
%vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
%vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.
So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.
Patch by Wei Mi.
http://reviews.llvm.org/D7806
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231148 91177308-0d34-0410-b5e6-96231b3b80d8
Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a
table, and it will take responsibility for copying the exception object
into that slot. Modelling the exception object as an SSA value returned
by begincatch isn't going to work in general, so make it use an output
parameter.
Reviewers: andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D7920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231086 91177308-0d34-0410-b5e6-96231b3b80d8
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231082 91177308-0d34-0410-b5e6-96231b3b80d8
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.
This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
* Assertion in:
CodeGen/Hexagon/hwloop-cleanup.ll
* Worse code in:
CodeGen/X86/coalescer-commute4.ll
CodeGen/X86/phys_subreg_coalesce-2.ll
CodeGen/X86/zlib-longest-match.ll
The root cause here seems to be that the heuristic that determines
the visitation order in RegisterCoalescer gets less lucky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231064 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!
Follow-up to r231045.
rdar://19767934
Closes http://reviews.llvm.org/D8019
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231046 91177308-0d34-0410-b5e6-96231b3b80d8
Fold and/or of setcc's to double CMOV:
(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)
When we can't use the CMOV instruction, it might increase branch
mispredicts. When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.
These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).
rdar://19767934
http://reviews.llvm.org/D7634
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231045 91177308-0d34-0410-b5e6-96231b3b80d8
In the future, we should run the output of clang through instnamer to
make it easier to manually edit test cases.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231037 91177308-0d34-0410-b5e6-96231b3b80d8
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 without the assertion in DebugLocEntry::finalize()
because not all Machine registers can be lowered into DWARF register
numbers and floating point constants cannot be expressed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231023 91177308-0d34-0410-b5e6-96231b3b80d8
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize()
that allows for empty DWARF expressions for constant FP values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230975 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When the RHS of a conditional move node is zero, we can utilize the $zero
register by inverting the conditional move instruction and by swapping the
order of its True/False operands.
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D7945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230956 91177308-0d34-0410-b5e6-96231b3b80d8
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230930 91177308-0d34-0410-b5e6-96231b3b80d8
There are two types of files in the old (current) debug info schema.
!0 = !{!"some/filename", !"/path/to/dir"}
!1 = !{!"0x29", !0} ; [ DW_TAG_file_type ]
!1 has a wrapper class called `DIFile` which inherits from `DIScope` and
is referenced in 'scope' fields.
!0 is called a "file node", and debug info nodes with a 'file' field
point at one of these directly -- although they're built in `DIBuilder`
by sending in a `DIFile` and reaching into it.
In the new hierarchy, I unified these nodes as `MDFile` (which `DIFile`
is a lightweight wrapper for) in r230057. Moving the new hierarchy into
place (and upgrading testcases) caused CodeGen/X86/unknown-location.ll
to start failing -- apparently "0x29" was previously showing up in the
linetable as a filename, causing:
.loc 2 4 3
(where 2 points at filename "0x29") instead of:
.loc 1 4 3
(where 1 points at the actual filename).
Change the testcase to use the old schema correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230880 91177308-0d34-0410-b5e6-96231b3b80d8
Straightforward patch to emit an alignment directive when emitting a
TOC entry. The test case was generated from the test in PR22711 that
demonstrated a misaligned .toc section. The object code is run
through llvm-readobj to verify that the correct alignment has been
applied to the .toc section.
Thanks to Ulrich Weigand for running down where the fix was needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230801 91177308-0d34-0410-b5e6-96231b3b80d8
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230794 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Until now, we did this (among other things) based on whether or not the
target was Windows. This is clearly wrong, not just for Win64 ABI functions
on non-Windows, but for System V ABI functions on Windows, too. In this
change, we make this decision based on the ABI the calling convention
specifies instead.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230793 91177308-0d34-0410-b5e6-96231b3b80d8
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.
For QPX, we use vector loads are stores, but only for aligned memory accesses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230788 91177308-0d34-0410-b5e6-96231b3b80d8
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230786 91177308-0d34-0410-b5e6-96231b3b80d8
This work is currently being rethought along different lines and
if this work is needed it can be resurrected out of svn. Remove it
for now as no current work in ongoing on it and it's unused. Verified
with the authors before removal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230780 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.
This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.
Reviewers: resistor, echristo
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D7941
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230775 91177308-0d34-0410-b5e6-96231b3b80d8
This removes a bit of duplicated code and more importantly, remembers the
labels so that they don't need to be looked up by name.
This in turn allows for any name to be used and avoids a crash if the name
we wanted was already taken.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230772 91177308-0d34-0410-b5e6-96231b3b80d8
vectors. This lets us fix the rest of the v16 lowering problems when
pshufb is clearly better.
We might still be able to improve some of the lowerings by enabling the
other combine-based rewriting to fire for non-128-bit vectors, but this
at least should remove any regressions from using the fancy v16i16
lowering strategy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230753 91177308-0d34-0410-b5e6-96231b3b80d8
repeated 128-bit lane shuffles of wider vector types and use it to lower
256-bit v16i16 vector shuffles where applicable.
This should let us perfectly lowering the pattern of pshuflw and pshufhw
even for AVX2 256-bit patterns.
I've not added AVX-512 support, but it should be trivial for someone
working on that to wire up.
Note that currently this generates bad, long shuffle chains because we
don't combine 256-bit target shuffles. The subsequent patches will fix
that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230751 91177308-0d34-0410-b5e6-96231b3b80d8
by mirroring v8i16 test cases across both 128-bit lanes. This should
highlight problems where we aren't correctly using 128-bit shuffles to
implement things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230750 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We identify the cases where the operand to an ADDE node is a constant
zero. In such cases, we can avoid generating an extra ADDu instruction
disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r).
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7906
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230742 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change causes us to actually save non-volatile registers in a Win64
ABI function that calls a System V ABI function, and vice-versa.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230714 91177308-0d34-0410-b5e6-96231b3b80d8
blend as legal.
We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230695 91177308-0d34-0410-b5e6-96231b3b80d8
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.
Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.
Differential Revision: http://reviews.llvm.org/D7181
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230659 91177308-0d34-0410-b5e6-96231b3b80d8
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the
features bits are not computed. This patch lets the asm printer
emit ".cpu cortex-a9" directive for krait and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since
it is not supported bu GNU GAS yet
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230651 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out that after the past MMX commits, we don't need to rely on this
flag to get better codegen for MMX. Also update the tests to become
triple neutral.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230637 91177308-0d34-0410-b5e6-96231b3b80d8
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.
Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.
Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.
This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230553 91177308-0d34-0410-b5e6-96231b3b80d8
The Win64 epilogue structure is very restrictive, it permits a very
small number of opcodes and none of them are 'mov'.
This means that given:
mov %rbp, %rsp
pop %rbp
The mov isn't the epilogue, only the pop is. This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function. If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is reseting the stack
pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230543 91177308-0d34-0410-b5e6-96231b3b80d8
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
Differential Revision: http://reviews.llvm.org/D7778
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230533 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230499 91177308-0d34-0410-b5e6-96231b3b80d8
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:
* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.
Patch by John Brawn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230496 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).
I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).
The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230413 91177308-0d34-0410-b5e6-96231b3b80d8
This patch unifies the comdat and non-comdat code paths. By doing this
it add missing features to the comdat side and removes the fixed
section assumptions from the non-comdat side.
In ELF there is no one true section for "4 byte mergeable" constants.
We are better off computing the required properties of the section
and asking the context for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230411 91177308-0d34-0410-b5e6-96231b3b80d8
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Mon Feb 23 23:04:28 2015 +0000
Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
and
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Sun Feb 22 18:17:28 2015 +0000
[DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
as the root cause of PR22678 which is causing an assertion inside the DAG combiner.
I'll follow up to the main thread as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230358 91177308-0d34-0410-b5e6-96231b3b80d8
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).
Differential Revision: http://reviews.llvm.org/D6940
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230355 91177308-0d34-0410-b5e6-96231b3b80d8
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230348 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Begin to add various address modes; including alloca.
Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: echristo, rfuhler, llvm-commits
Differential Revision: http://reviews.llvm.org/D6426
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230300 91177308-0d34-0410-b5e6-96231b3b80d8
We can only use 'add' in epilogues, 'lea' is not permitted unless we've
established a frame pointer in the prologue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230286 91177308-0d34-0410-b5e6-96231b3b80d8
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.
NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279. This commit
message is the correct one.
Differential Revision: http://reviews.llvm.org/D7778
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230280 91177308-0d34-0410-b5e6-96231b3b80d8
230275 got committed with an incorrect commit message due to a mixup
on my side. Will re-land in a few moments with the correct commit
message.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230279 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.
Added test CodeGen/X86/fastmath-float-half-conversion.ll
Differential Revision: http://reviews.llvm.org/D7832
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230276 91177308-0d34-0410-b5e6-96231b3b80d8
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
Differential Revision: http://reviews.llvm.org/D7808
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230275 91177308-0d34-0410-b5e6-96231b3b80d8
Prologue emission, in some cases, requires calls to a stack probe helper
function. The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.
Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230270 91177308-0d34-0410-b5e6-96231b3b80d8
'mov' and 'lea' are equivalent when the displacement applied with 'lea'
is zero. However, 'mov' should encode smaller.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230269 91177308-0d34-0410-b5e6-96231b3b80d8
This test failed in several buildbots, a bit unclear how that happen
since this was the previous behavior before r230248.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230258 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
-mno-odd-spreg prohibits the use of odd-numbered single-precision floating
point registers. However, vector insert/extract was still using them when
manipulating the subregisters of an MSA register. Fixed this by ensuring
that insertion/extraction is only performed on even-numbered vector
registers when -mno-odd-spreg is given.
Reviewers: vmedic, sstankovic
Reviewed By: sstankovic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7672
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230235 91177308-0d34-0410-b5e6-96231b3b80d8
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230226 91177308-0d34-0410-b5e6-96231b3b80d8
Add tests to cover the RR form of the pslli, psrli and psrai intrinsics.
In the next commit, the loads are going to be folded and the
instructions use the RM form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230224 91177308-0d34-0410-b5e6-96231b3b80d8
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230177 91177308-0d34-0410-b5e6-96231b3b80d8
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.
This fixes PR22572.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230113 91177308-0d34-0410-b5e6-96231b3b80d8
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230072 91177308-0d34-0410-b5e6-96231b3b80d8
This enables a few useful combines that used to only
use fma.
Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230071 91177308-0d34-0410-b5e6-96231b3b80d8
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230053 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.
Differential Revision: http://reviews.llvm.org/D7673
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230043 91177308-0d34-0410-b5e6-96231b3b80d8
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.
I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.
Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229966 91177308-0d34-0410-b5e6-96231b3b80d8
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]
Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.
In case you aren't using a diff algorithm that can handle this:
X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229964 91177308-0d34-0410-b5e6-96231b3b80d8
is going well, remove the flag and the code for the old legality tests.
This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229963 91177308-0d34-0410-b5e6-96231b3b80d8
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.
My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229958 91177308-0d34-0410-b5e6-96231b3b80d8
one test case that is only partially tested in 32-bits into two test
cases so that the script doesn't generate massive spews of tests for the
cases we don't care about.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229955 91177308-0d34-0410-b5e6-96231b3b80d8
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229952 91177308-0d34-0410-b5e6-96231b3b80d8
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.
Fixes PR20300.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7744
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229944 91177308-0d34-0410-b5e6-96231b3b80d8
The instructions were being generated on architectures that don't support avx512.
This reverts commit r229837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229942 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.
The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.
However, for ARMISD nodes, we just used the MMO alignment, no matter
what. In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.
And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).
To fix this, we can just mirror the hack done for ISD nodes: only
take into account the MMO alignment when the access is overaligned.
Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
rdar://19717869, rdar://14062261.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229932 91177308-0d34-0410-b5e6-96231b3b80d8
X86 load folding is fragile; eg, the tests here
don't work without AVX even though they should. This
is because we have a mix of tablegen patterns that have
been added over time, and we have a load folding table
used by the peephole optimizer that has to be kept in
sync with the ever-changing ISA and tablegen defs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229870 91177308-0d34-0410-b5e6-96231b3b80d8
systematic lowering of v8i16.
This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.
In some cases, this is yet another significant improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229859 91177308-0d34-0410-b5e6-96231b3b80d8
addition to lowering to trees rooted in an unpack.
This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229856 91177308-0d34-0410-b5e6-96231b3b80d8
terribly complex partial blend logic.
This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.
This does expose one (minor) weakness of the unpack lowering that I'll
try to address.
In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229853 91177308-0d34-0410-b5e6-96231b3b80d8
needed, and significantly improve the SSSE3 path.
This makes the new strategy much more clear. If we can blend, we just go
with that. If we can't blend, we try to permute into an unpack so
that we handle cases where the unpack doing the blend also simplifies
the shuffle. If that fails and we've got SSSE3, we now call into
factored-out pshufb lowering code so that we leverage the fact that
pshufb can set up a blend for us while shuffling. This generates great
code, especially because we *know* we don't have a fast blend at this
point. Finally, we fall back on decomposing into permutes and blends
because we do at least have a bit-math-based blend if we need to use
that.
This pretty significantly improves some of the v8i16 code paths. We
never need to form pshufb for the single-input shuffles because we have
effective target-specific combines to form it there, but we were missing
its effectiveness in the blends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229851 91177308-0d34-0410-b5e6-96231b3b80d8
them into permutes and a blend with the generic decomposition logic.
This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack based merging that this replaces. The only place where it isn't we
end up blending between two packs when we could do a single pack. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.
With this we're down to only really random permutations that cause an
explosion of instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8
v16i8 shuffles, and replace it with new facilities.
This uses precise patterns to match exact unpacks, and the new
generalized unpack lowering only when we detect a case where we will
have to shuffle both inputs anyways and they terminate in exactly
a blend.
This fixes all of the blend horrors that I uncovered by always lowering
blends through the vector shuffle lowering. It also removes *sooooo*
much of the crazy instruction sequences required for v16i8 lowering
previously. Much cleaner now.
The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when
it would be marginally nicer to use pshufb+pshufb+por. However, the
difference there is *tiny*. In many cases its a win because we re-use
the pshufb mask. In others, we get to avoid the pshufb entirely. I've
left a FIXME, but I'm dubious we can really do better than this. I'm
actually pretty happy with this lowering now.
For SSE2 this exposes some horrors that were really already there. Those
will have to fixed by changing a different path through the v16i8
lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229846 91177308-0d34-0410-b5e6-96231b3b80d8
lowering paths. I'm going to be leveraging this to simplify a lot of the
overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes.
Sadly, this isn't profitable on v4i32 and v2i64. There, the float and
double blending instructions for pre-SSE4.1 are actually pretty good,
and we can't beat them with bit math. And once SSE4.1 comes around we
have direct blending support and this ceases to be relevant.
Also, some of the test cases look odd because the domain fixer
canonicalizes these to floating point domain. That's OK, it'll use the
integer domain when it matters and some day I may be able to update
enough of LLVM to canonicalize the other way.
This restores almost all of the regressions from teaching x86's vselect
lowering to always use vector shuffle lowering for blends. The remaining
problems are because the v16 lowering path is still doing crazy things.
I'll be re-arranging that strategy in more detail in subsequent commits
to finish recovering the performance here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229836 91177308-0d34-0410-b5e6-96231b3b80d8
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.
However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.
Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).
This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!
There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8
This tests the simple resume instruction elimination logic that we have
before making some changes to it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229768 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These ISA's didn't add any instructions so they are almost identical to
Mips32r2 and Mips64r2. Even the ELF e_flags are the same, However the ISA
revision in .MIPS.abiflags is 3 or 5 respectively instead of 2.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: tomatabacu, llvm-commits, atanasyan
Differential Revision: http://reviews.llvm.org/D7381
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229695 91177308-0d34-0410-b5e6-96231b3b80d8
Add some of the missing M and R class Cortex CPUs, namely:
Cortex-M0+ (called Cortex-M0plus for GCC compatibility)
Cortex-M1
SC000
SC300
Cortex-R5
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229660 91177308-0d34-0410-b5e6-96231b3b80d8
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.
Patch by Gil Rapaport <gil.rapaport@intel.com>
Differential Revision: http://reviews.llvm.org/D6949
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229659 91177308-0d34-0410-b5e6-96231b3b80d8
code.
While this didn't have the miscompile (it used MatchLeft consistently)
it missed some cases where it could use right shifts. I've added a test
case Craig Topper came up with to exercise the right shift matching.
This code is really identical between the two. I'm going to merge them
next so that we don't keep two copies of all of this logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229655 91177308-0d34-0410-b5e6-96231b3b80d8
The current SystemZ back-end only supports the local-exec TLS access model.
This patch adds all required CodeGen support for the other TLS models, which
means in particular:
- Expand initial-exec TLS accesses by loading TLS offsets from the GOT
using @indntpoff relocations.
- Expand general-dynamic and local-dynamic accesses by generating the
appropriate calls to __tls_get_offset. Note that this routine has
a non-standard ABI and requires loading the GOT pointer into %r12,
so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node.
- Add a new platform-specific optimization pass to remove redundant
__tls_get_offset calls in the local-dynamic model (modeled after
the corresponding X86 pass).
- Add test cases verifying all access models and optimizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229654 91177308-0d34-0410-b5e6-96231b3b80d8
track state.
I didn't like this in the code review because the pattern tends to be
error prone, but I didn't see a clear way to rewrite it. Turns out that
there were bugs here, I found them when fuzz testing our shuffle
lowering for correctness on x86.
The core of the problem is that we need to consistently test all our
preconditions for the same directionality of shift and the same input
vector. Instead, formulate this as two predicates (one doesn't depend on
the input in any way), pass things like the directionality and input
vector as inputs, and loop over the alternatives.
This fixes a pattern of very rare miscompiles coming out of this code.
Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz
testing. The new code is over half a million test runs with no failures
yet. I've also fuzzed every other function in the lowering code with
over 3.5 million test cases and not discovered any other miscompiles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229642 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to
float conversion, and how to select a (V)CVTSI2SDrr for an integer to double
conversion.
Added test 'fast-isel-int-float-conversion.ll'.
Differential Revision: http://reviews.llvm.org/D7698
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229589 91177308-0d34-0410-b5e6-96231b3b80d8
The problem in the original patch was not switching back to .text after printing
an eh table.
Original message:
On ELF, put PIC jump tables in a non executable section.
Fixes PR22558.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229586 91177308-0d34-0410-b5e6-96231b3b80d8
If an EH table is printed in between the function and the jump table we would
fail to switch back to the text section to print the jump table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229580 91177308-0d34-0410-b5e6-96231b3b80d8
We were trying to fold into implicit uses, which led to out of bounds
access of the MCInstrDesc::OpInfo arrray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229533 91177308-0d34-0410-b5e6-96231b3b80d8
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors.
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.
Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.
Differential Revision: http://reviews.llvm.org/D7474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229531 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-on patch to:
http://reviews.llvm.org/D7093
That patch canonicalized constant splats as build_vectors,
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.
This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283
The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...
This improves an existing x86 AVX test and changes codegen in an ARM test.
Differential Revision: http://reviews.llvm.org/D7389
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229511 91177308-0d34-0410-b5e6-96231b3b80d8
Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229489 91177308-0d34-0410-b5e6-96231b3b80d8
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229482 91177308-0d34-0410-b5e6-96231b3b80d8
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.
Differential Revision: http://reviews.llvm.org/D7666
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229480 91177308-0d34-0410-b5e6-96231b3b80d8
This required changing how the computation of the ABI is handled
and how some of the checks for ABI/target are done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229471 91177308-0d34-0410-b5e6-96231b3b80d8
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.
I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229463 91177308-0d34-0410-b5e6-96231b3b80d8
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229460 91177308-0d34-0410-b5e6-96231b3b80d8
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).
Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229457 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.
Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.
Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229413 91177308-0d34-0410-b5e6-96231b3b80d8
to generically lower blends and is particularly nice because it is
available frome SSE2 onward. This removes a lot of the remaining domain
crossing blends in SSE2 code.
I'm hoping to replace some of the "interleaved" lowering hacks with
something closer to this which should be more principled. First, this
needs to learn how to detect and use other interleavings besides that of
the natural type provided. That will be a follow-up patch though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229378 91177308-0d34-0410-b5e6-96231b3b80d8