When a vector_shuffle took an input from each vector, we could
miscompile it because we assumed the input from V2 wouldn't be moved
from where it was in the vector.
Added a test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213826 91177308-0d34-0410-b5e6-96231b3b80d8
Add `Value::sortUseList()`, templated on the comparison function to use.
The sort is an iterative merge sort that uses a binomial vector of
already-merged lists to limit the size overhead to `O(1)`.
This is part of PR5680.
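A minimal sketch of the idea (illustrative types and names, not the LLVM
implementation): slot i of a small fixed array holds an already-merged
sorted run of exactly 2^i nodes, so detaching nodes one at a time and
"carrying" merges upward sorts the list with constant-size bookkeeping.

  struct Use { int Val; Use *Next; };

  // Merge two sorted runs, stably: take from B only when strictly smaller.
  template <typename Compare>
  Use *mergeRuns(Use *A, Use *B, Compare Cmp) {
    Use Dummy{0, nullptr};
    Use *Tail = &Dummy;
    while (A && B) {
      if (Cmp(B->Val, A->Val)) { Tail->Next = B; B = B->Next; }
      else                     { Tail->Next = A; A = A->Next; }
      Tail = Tail->Next;
    }
    Tail->Next = A ? A : B;
    return Dummy.Next;
  }

  // Iterative merge sort: Slots[i] is either null or a run of 2^i nodes,
  // so the fixed-size array bounds the extra space independently of n.
  template <typename Compare>
  Use *sortList(Use *Head, Compare Cmp) {
    Use *Slots[32] = {};
    while (Head) {
      Use *Run = Head;          // detach a run of length 1
      Head = Head->Next;
      Run->Next = nullptr;
      unsigned I = 0;
      for (; Slots[I]; ++I) {   // binary "carry": merge equal-sized runs
        Run = mergeRuns(Slots[I], Run, Cmp);
        Slots[I] = nullptr;
      }
      Slots[I] = Run;
    }
    Use *Result = nullptr;
    for (Use *Run : Slots)      // fold the leftover runs together
      if (Run)
        Result = Result ? mergeRuns(Run, Result, Cmp) : Run;
    return Result;
  }

Value::sortUseList() applies the same carrying scheme in place to the
Use nodes already linked into the value's use list.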
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213824 91177308-0d34-0410-b5e6-96231b3b80d8
This allows people to try clang inside MSBuild with the VS "14" CTP
releases.
Fixes PR20341.
Patch by Marcel Raad!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213819 91177308-0d34-0410-b5e6-96231b3b80d8
We use a gep to access the global array "switch.table", and the table index
must be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to a wider integer type.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3: gep sign-extends its
index to pointer width, so those i2 values are interpreted as -2 and -1
and index before the start of the table. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
rdar://17735071
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213815 91177308-0d34-0410-b5e6-96231b3b80d8
It isn't reasonable to test storing things using undef pointers --
storing through those is at best "good luck" and really should be
transformed to "unreachable". Random changes in the combiner can
randomly break these tests for no good reason. I'm following up on the
original commit regarding the right long-term strategy here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213810 91177308-0d34-0410-b5e6-96231b3b80d8
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213808 91177308-0d34-0410-b5e6-96231b3b80d8
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.
To address this, ensure that the map is updated after each argument
promotion.
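The shape of the fix, as a self-contained sketch with stub types
(FunctionDIs, F and NF are illustrative names, not the exact LLVM API):

  #include <map>

  struct Function {};
  struct DISubprogram {
    Function *Fn = nullptr;
    void replaceFunction(Function *F) { Fn = F; }
  };
  using SubprogramMap = std::map<Function *, DISubprogram *>;

  // After a promotion replaces F with the new function NF, re-point the
  // cached subprogram at NF and re-key the map entry, so a second
  // promotion of NF (ArgPromo can fire more than once) still finds it.
  void updateSubprogramMap(SubprogramMap &FunctionDIs, Function *F,
                           Function *NF) {
    auto DI = FunctionDIs.find(F);
    if (DI == FunctionDIs.end())
      return;
    DISubprogram *SP = DI->second;
    SP->replaceFunction(NF);
    FunctionDIs.erase(DI);
    FunctionDIs[NF] = SP;
  }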
In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213805 91177308-0d34-0410-b5e6-96231b3b80d8
Also, the debug location I had here was bogus (it described the call
site as being located in the callee) and unnecessary anyway, so just drop it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213803 91177308-0d34-0410-b5e6-96231b3b80d8
The transform that constant folds unary operations with an AND across a
vector comparison also applies when the constant is not a splat of a
scalar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213800 91177308-0d34-0410-b5e6-96231b3b80d8
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213799 91177308-0d34-0410-b5e6-96231b3b80d8
Constant fold the lanes of the input constant build_vector individually
so we correctly handle the case where the vector elements are not all
the same constant value.
PR20394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213798 91177308-0d34-0410-b5e6-96231b3b80d8
The cast to NVPTXTargetLowering was missing a 'const', but let's
just access the right pointer through the subtarget anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213793 91177308-0d34-0410-b5e6-96231b3b80d8
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.
Reduced test case from Chad. Thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213788 91177308-0d34-0410-b5e6-96231b3b80d8
linker_private and linker_private_weak were deprecated in 3.5. Remove support
for them now that the 3.5 branch has been created.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213777 91177308-0d34-0410-b5e6-96231b3b80d8
With optimizations disabled, we disable the isel patterns for mul.wide, but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213773 91177308-0d34-0410-b5e6-96231b3b80d8
The target-independent DAGCombiner will generate:
  asr w1, X, #31         // w1 = splat of the sign bit
  add X, X, w1, lsr #28  // X = X + 0 or pow2-1
  asr w0, X, #4          // w0 = X/pow2
However, the add + shifts sequence is expensive, so generate:
  add  w0, X, 15         // w0 = X + pow2-1
  cmp  X, wzr            // compare X with 0
  csel X, w0, X, lt      // X = (X < 0) ? X + pow2-1 : X
  asr  w0, X, #4         // w0 = X/pow2
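Both sequences compute the same signed division; a quick self-contained
check of the arithmetic for pow2 = 16 (assuming the usual arithmetic
right shift on signed values, as on AArch64):

  #include <cassert>
  #include <cstdint>

  // Mirror of the add + shifts sequence: bias negative values by pow2-1.
  int32_t divViaAddShifts(int32_t X) {
    int32_t Sign = X >> 31;               // splat sign bit: 0 or -1
    uint32_t Bias = uint32_t(Sign) >> 28; // 0 or 15 (pow2-1)
    return (X + int32_t(Bias)) >> 4;      // arithmetic shift = X/16
  }

  // Mirror of the cmp + csel sequence: select the biased value when X < 0.
  int32_t divViaCsel(int32_t X) {
    int32_t Biased = X + 15;              // X + pow2-1
    int32_t Sel = (X < 0) ? Biased : X;   // what csel computes
    return Sel >> 4;
  }

  int main() {
    for (int64_t X = -(1 << 20); X <= (1 << 20); ++X) {
      assert(divViaAddShifts(int32_t(X)) == int32_t(X) / 16);
      assert(divViaCsel(int32_t(X)) == int32_t(X) / 16);
    }
  }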
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213758 91177308-0d34-0410-b5e6-96231b3b80d8
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213754 91177308-0d34-0410-b5e6-96231b3b80d8
Although the final shifter operand is a rotate, this actually only matters for
the half-word extends when the amount == 24. Otherwise folding a shift in is
just as good.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213753 91177308-0d34-0410-b5e6-96231b3b80d8
This pass attempts to speculatively use a sqrt instruction if one exists on
the target, falling back to a libcall if the target instruction returns NaN.
This was enabled for MIPS and SystemZ, but it is well guarded and good for
most targets; GCC does this for X86, ARM and AArch64 (the targets I've
checked).
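Conceptually, the transformed code looks like this sketch (illustrative,
not the pass's actual output; __builtin_sqrt stands in for the target's
sqrt instruction):

  #include <cmath>

  // Fast path: the hardware sqrt instruction. Slow path: the libcall,
  // which keeps the full library semantics (e.g. setting errno) for the
  // invalid inputs that produce NaN.
  double speculativeSqrt(double X) {
    double R = __builtin_sqrt(X); // lowers to a hardware sqrt instruction
    if (R != R)                   // NaN: only for invalid inputs
      R = std::sqrt(X);           // fall back to the libcall
    return R;
  }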
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213752 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM prohibits STRB instructions with writeback into the source register (e.g. strb r0, [r0], #4, where the written-back base is also the stored register). This commit enforces that constraint, so we stop assembling STRB instructions with unpredictable behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213750 91177308-0d34-0410-b5e6-96231b3b80d8
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213748 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM prohibits STR instructions with writeback into the source register. This commit enforces that constraint, so we stop assembling STR instructions with unpredictable behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213745 91177308-0d34-0410-b5e6-96231b3b80d8
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.
We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213743 91177308-0d34-0410-b5e6-96231b3b80d8
This change fully reverts r211771.
That revision added a canonicalization rule which has the potential to cause a
combine cycle in the target-independent canonicalizing DAG combine.
The plan is to move the logic that forms target-specific addsub nodes into
the lowering of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213736 91177308-0d34-0410-b5e6-96231b3b80d8
Check the exact instruction sequences with CHECK-NEXT for these test cases.
This notably exposes how absolutely horrible the generated code is for
several of these test cases, and will make any future updates to the test
visible as our vector instruction selection gets better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213732 91177308-0d34-0410-b5e6-96231b3b80d8
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly
necessary, as the instruction selection for pre-indexed STR instructions goes
through an additional layer of pseudo instructions which have the constraint
defined. However, it doesn't hurt to specify the constraint directly on the
pre-indexed instructions as well: at some point someone might create instances
of them programmatically, and then the constraint is definitely needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213729 91177308-0d34-0410-b5e6-96231b3b80d8
Make the DAG combine worklist not grow endlessly due to duplicate
insertions.
The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time show a slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).
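A sketch of the instcombine-style worklist discipline described above
(illustrative container code, not the actual DAGCombiner implementation):

  #include <cstddef>
  #include <unordered_map>
  #include <vector>

  // Each node is queued at most once, and removal leaves a null
  // "tombstone" instead of erasing, so nothing is ever priority-bumped
  // and duplicates can't pile up.
  template <typename Node> class CombineWorklist {
    std::vector<Node *> Worklist;                   // may hold tombstones
    std::unordered_map<Node *, size_t> WorklistMap; // node -> slot index

  public:
    void add(Node *N) {
      // Ignore nodes that are already queued.
      if (WorklistMap.insert({N, Worklist.size()}).second)
        Worklist.push_back(N);
    }
    void remove(Node *N) {
      auto It = WorklistMap.find(N);
      if (It == WorklistMap.end())
        return;
      Worklist[It->second] = nullptr; // O(1) tombstone; indices stay valid
      WorklistMap.erase(It);
    }
    Node *pop() {
      while (!Worklist.empty()) {
        Node *N = Worklist.back();
        Worklist.pop_back();
        if (N) {                      // skip tombstones
          WorklistMap.erase(N);
          return N;
        }
      }
      return nullptr;                 // worklist exhausted
    }
  };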
This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist directly or
should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handles nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
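The new routine is conceptually simple; a minimal sketch with stub types
(not the actual SelectionDAG API):

  #include <vector>

  // Stub node: Uses counts consumers; Operands are this node's inputs.
  struct Node {
    int Uses = 0;
    std::vector<Node *> Operands;
  };

  // Precondition: N already has zero uses. Deletes N and everything that
  // becomes dead with it; each operand that stays live is the "frontier"
  // and gets handed back to the worklist via Revisit.
  template <typename RevisitFn>
  void recursivelyDeleteDeadNode(Node *N, RevisitFn Revisit) {
    std::vector<Node *> DeadNodes{N};
    while (!DeadNodes.empty()) {
      Node *Dead = DeadNodes.back();
      DeadNodes.pop_back();
      for (Node *Op : Dead->Operands) {
        if (--Op->Uses == 0)
          DeadNodes.push_back(Op); // became dead: delete it too
        else
          Revisit(Op);             // still live: revisit on the worklist
      }
      delete Dead;
    }
  }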
I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
worth looking into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.
Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.
Differential Revision: http://reviews.llvm.org/D4616
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213727 91177308-0d34-0410-b5e6-96231b3b80d8
FIXME: "llvm-rtdyld -verify -check" is still sensitive to path separator.
Fix searching StubMap to be tolerant of both '/' and '\\' on Win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213723 91177308-0d34-0410-b5e6-96231b3b80d8