D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225612 91177308-0d34-0410-b5e6-96231b3b80d8
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation. This occurs when Constants have
the same bit pattern but have different types.
Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.
This fixes PR22188.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225597 91177308-0d34-0410-b5e6-96231b3b80d8
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm which may have immediate constant constraints violated,
the backend should be able to cope with the invalid inline asm a bit better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225596 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for parsing and emitting the SBREL relocation variant for the
ARM target. Handling this relocation variant is necessary for supporting the
full ARM ELF specification. Addresses PR22128.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225595 91177308-0d34-0410-b5e6-96231b3b80d8
This default constructor is a bit weird. It left the range in an invalid
state. That might be reasonable so that you can construct a local
iterator range and assign to it based on some logic to compute the range
you want. If folks would like to support that use case, I can add it
back, but in 238-odd usages none have actually wanted to do this. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225592 91177308-0d34-0410-b5e6-96231b3b80d8
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.
This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced.
We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.
Differential Revision: http://reviews.llvm.org/D6879
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225589 91177308-0d34-0410-b5e6-96231b3b80d8
Often, we miss committing new files, and 'arc diff' is supposed to warn
us about this. Unfortunately, because of the spurious output of the
command (due to unignored untracked files), we tend to ignore it and
lose information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225588 91177308-0d34-0410-b5e6-96231b3b80d8
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.
Fixes the initial complaint associated with PR22120.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225584 91177308-0d34-0410-b5e6-96231b3b80d8
The previous commit accidentally missed changes to the test output checking,
resulting in an errant failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225577 91177308-0d34-0410-b5e6-96231b3b80d8
There is a fair number of relocations that are part of the AAELF specification.
Simply merge the tests into a single test file, otherwise, we will end up with
far too many test files to test each relocation type. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225576 91177308-0d34-0410-b5e6-96231b3b80d8
These tests are checking the relocation generation. Use the readobj output as
it is much easier to follow when glancing over the tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225575 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.
Test Plan: Tests updated
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6906
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225573 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the way that the partial unrolling threshold for small loops is used
to compute the unrolling factor as been corrected, a slightly smaller threshold
is preferable. This is expected; other targets may need to re-tune as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225566 91177308-0d34-0410-b5e6-96231b3b80d8
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225565 91177308-0d34-0410-b5e6-96231b3b80d8
The bitcode reading interface used std::error_code to report an error to the
callers and it is the callers job to print diagnostics.
This is not ideal for error handling or diagnostic reporting:
* For error handling, all that the callers care about is 3 possibilities:
* It worked
* The bitcode file is corrupted/invalid.
* The file is not bitcode at all.
* For diagnostic, it is user friendly to include far more information
about the invalid case so the user can find out what is wrong with the
bitcode file. This comes up, for example, when a developer introduces a
bug while extending the format.
The compromise we had was to have a lot of error codes.
With this patch we use the DiagnosticHandler to communicate with the
human and std::error_code to communicate with the caller.
This allows us to have far fewer error codes and adds the infrastructure to
print better diagnostics. This is so because the diagnostics are printed when
he issue is found. The code that detected the problem in alive in the stack and
can pass down as much context as needed. As an example the patch updates
test/Bitcode/invalid.ll.
Using a DiagnosticHandler also moves the fatal/non-fatal error decision to the
caller. A simple one like llvm-dis can just use fatal errors. The gold plugin
needs a bit more complex treatment because of being passed non-bitcode files. An
hypothetical interactive tool would make all bitcode errors non-fatal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225562 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
One more attempt to fix UBSan reports: make sure DenseMapInfo::getEmptyKey()
and DenseMapInfo::getTombstoneKey() doesn't do any upcasts/downcasts to/from Value*.
Test Plan: check-llvm test suite with/without UBSan bootstrap
Reviewers: chandlerc, dexonsmith
Subscribers: llvm-commits, majnemer
Differential Revision: http://reviews.llvm.org/D6903
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225558 91177308-0d34-0410-b5e6-96231b3b80d8
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225552 91177308-0d34-0410-b5e6-96231b3b80d8
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable.
Differential Revision: http://reviews.llvm.org/D6878
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225551 91177308-0d34-0410-b5e6-96231b3b80d8
doing Load PRE"
It's not really expected to stick around, last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.
Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225536 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r225498 (but leaves r225499, which was a worthy
cleanup).
My plan was to change `DEBUG_LOC` to store the `MDNode` directly rather
than its operands (patch was to go out this morning), but on reflection
it's not clear that it's strictly better. (I had missed that the
current code is unlikely to emit the `MDNode` at all.)
Conflicts:
lib/Bitcode/Reader/BitcodeReader.cpp (due to r225499)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225531 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Mips Linux uses $gp to hold a pointer to thread info structure and accesses it
with a named register. This makes this work for LLVM.
The N32 ABI doesn't quite work yet since the frontend generates incorrect IR
for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before
converting to a pointer. Given correct IR (as in the testcase in this patch),
it works correctly.
Reviewers: sstankovic, vmedic, atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6893
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225529 91177308-0d34-0410-b5e6-96231b3b80d8
The P7 benefits from not have really-small loops so that we either have
multiple dispatch groups in the loop and/or the ability to form more-full
dispatch groups during scheduling. Setting the partial unrolling threshold to
44 seems good, empirically, for the P7. Compared to using no late partial
unrolling, this yields the following test-suite speedups:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding
-66.3253% +/- 24.1975%
SingleSource/Benchmarks/Misc-C++/oopack_v1p8
-44.0169% +/- 29.4881%
SingleSource/Benchmarks/Misc/pi
-27.8351% +/- 12.2712%
SingleSource/Benchmarks/Stanford/Bubblesort
-30.9898% +/- 22.4647%
I've speculatively added a similar setting for the P8. Also, I've noticed that
the unroller does not quite calculate the unrolling factor correctly for really
tiny loops because it neglects to account for the fact that not every loop body
replicant contains an ending branch and counter increment. I'll fix that later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225522 91177308-0d34-0410-b5e6-96231b3b80d8
Add an additional test case to ensure that we generate the relocation even if
the thumb target is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225509 91177308-0d34-0410-b5e6-96231b3b80d8
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225503 91177308-0d34-0410-b5e6-96231b3b80d8