Consider:
%add = add nsw i32 %a, -16777216
%and = and i32 %add, 255
Regardless of whether or not we demand the sign bit of %add, we cannot
replace -16777216 with 2130706432 without also removing 'nsw' from the
instruction.
This fixes PR20377.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216261 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216045 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.
The subtraction cannot be a signed overflow in either case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216037 91177308-0d34-0410-b5e6-96231b3b80d8
While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the
new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215825 91177308-0d34-0410-b5e6-96231b3b80d8
We can combne a mul with a div if one of the operands is a multiple of
the other:
%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
=>
%ret = mul nsw %a, (C1 / C2)
This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.
Consider this small example:
define i32 @f(i32 %a) {
%mul = mul nuw i32 %a, 14
%div = udiv exact i32 %mul, 7
ret i32 %div
}
which gets CodeGen'd to:
imull $14, %edi, %eax
imulq $613566757, %rax, %rcx
shrq $32, %rcx
subl %ecx, %eax
shrl %eax
addl %ecx, %eax
shrl $2, %eax
retq
We can now transform this into:
define i32 @f(i32 %a) {
%shl = shl nuw i32 %a, 1
ret i32 %shl
}
which gets CodeGen'd to:
leal (%rdi,%rdi), %eax
retq
This fixes PR20681.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215815 91177308-0d34-0410-b5e6-96231b3b80d8
What follows bellow is a correctness proof of the transform using CVC3.
$ < t.cvc
A, B : BITVECTOR(32);
QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);
$ cvc3 < t.cvc
Valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215400 91177308-0d34-0410-b5e6-96231b3b80d8
We can only propagate the nsw bits if both subtraction instructions are
marked with the appropriate bit.
N.B. We only propagate the nsw bit in InstCombine because the nuw case
is already handled in InstSimplify.
This fixes PR20189.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214385 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch we had
@a = weak global ...
but
@b = alias weak ...
The patch changes aliases to look more like global variables.
Looking at some really old code suggests that the reason was that the old
bison based parser had a reduction for alias linkages and another one for
global variable linkages. Putting the alias first avoided the reduce/reduce
conflict.
The days of the old .ll parser are long gone. The new one parses just "linkage"
and a later check is responsible for deciding if a linkage is valid in a
given context.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214355 91177308-0d34-0410-b5e6-96231b3b80d8
While we can already transform A | (A ^ B) into A | B, things get bad
once we have (A ^ B) | (A ^ B ^ Cst) because reassociation will morph
this into (A ^ B) | ((A ^ Cst) ^ B). Our existing patterns fail once
this happens.
To fix this, we add a new pattern which looks through the tree of xor
binary operators to see that, in fact, there exists a redundant xor
operation.
What follows bellow is a correctness proof of the transform using CVC3.
$ cat t.cvc
A, B, C : BITVECTOR(64);
QUERY BVXOR(A, B) | BVXOR(BVXOR(B, C), A) = BVXOR(A, B) | C;
QUERY BVXOR(BVXOR(A, C), B) | BVXOR(A, B) = BVXOR(A, B) | C;
QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C;
QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C;
$ cvc3 < t.cvc
Valid.
Valid.
Valid.
Valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214342 91177308-0d34-0410-b5e6-96231b3b80d8
It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch.
Added code for lshr as well as non-exact right shifts.
It implements :
(icmp eq/ne (ashr/lshr const2, A), const1)" ->
(icmp eq/ne A, Log2(const2/const1)) ->
(icmp eq/ne A, Log2(const2) - Log2(const1))
Differential Revision: http://reviews.llvm.org/D4068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213678 91177308-0d34-0410-b5e6-96231b3b80d8
We previously supported the align attribute on all (pointer) parameters, but we
only used it for byval parameters. However, it is completely consistent at the
IR level to treat 'align n' on all pointer parameters as an alignment
assumption on the pointer, and now we wll. Specifically, this causes
computeKnownBits to use the align attribute on all pointer parameters, not just
byval parameters. I've also added an explicit parameter attribute test for this
to test/Bitcode/attributes.ll.
And I've updated the LangRef to document the align parameter attribute (as it
turns out, it was not documented at all previously, although the byval
documentation mentioned that it could be used).
There are (at least) two benefits to doing this:
- It allows enhancing alignment based on the pointer alignment after inlining callees.
- It allows simplification of pointer arithmetic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213670 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets
created as an argument to a GEP partway through an iteration, causing
-instcombine to optimize the GEP before the multiply.
rdar://problem/17615671
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212742 91177308-0d34-0410-b5e6-96231b3b80d8
In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem).
This patch calls isSafeToSpeculativelyExecute() and bails out of the optimization in SimplifyVectorOp() if needed.
Differential Revision: http://reviews.llvm.org/D4424
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212629 91177308-0d34-0410-b5e6-96231b3b80d8
It is not safe to negate the smallest signed integer, doing so yields
the same number back.
This fixes PR20186.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212164 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
With this patch, range metadata can be added to call/invoke including
IntrinsicInst. Previously, it could only be added to load.
Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because
range metadata is not only used by load.
Update the language reference to reflect this change.
Test Plan:
Add several tests in range-2.ll to confirm the verifier is happy with
having range metadata on call/invoke.
Add two tests in AddOverFlow.ll to confirm annotating range metadata to
call/invoke can benefit InstCombine.
Reviewers: meheff, nlewycky, reames, hfinkel, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4187
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211281 91177308-0d34-0410-b5e6-96231b3b80d8
* Find factorization opportunities using identity values.
* Find factorization opportunities by treating shl(X, C) as mul (X, shl(C))
* Keep NSW flag while simplifying instruction using factorization.
This fixes PR19263.
Differential Revision: http://reviews.llvm.org/D3799
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211261 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombineMulDivRem has:
// Canonicalize (X+C1)*CI -> X*CI+C1*CI.
InstCombineAddSub has:
// W*X + Y*Z --> W * (X+Z) iff W == Y
These two transforms could fight with each other if C1*CI would not fold
away to something simpler than a ConstantExpr mul.
The InstCombineMulDivRem transform only acted on ConstantInts until
r199602 when it was changed to operate on all Constants in order to
let it fire on ConstantVectors.
To fix this, make this transform more careful by checking to see if we
actually folded away C1*CI.
This fixes PR20079.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211258 91177308-0d34-0410-b5e6-96231b3b80d8
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211247 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
As a starting step, we only use one simple heuristic: if the sign bits
of both a and b are zero, we can prove "add a, b" do not unsigned
overflow, and thus convert it to "add nuw a, b".
Updated all affected tests and added two new tests (@zero_sign_bit and
@zero_sign_bit2) in AddOverflow.ll
Test Plan: make check-all
Reviewers: eliben, rafael, meheff, chandlerc
Reviewed By: chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D4144
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211084 91177308-0d34-0410-b5e6-96231b3b80d8
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.
Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.
Piggyback a minor refactor in InstCombineCasts.cpp
Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211004 91177308-0d34-0410-b5e6-96231b3b80d8
The messages were
"PR19753: Optimize comparisons with "ashr exact" of a constanst."
"Added support to optimize comparisons with "lshr exact" of a constant."
They were not correctly handling signed/unsigned operation differences,
causing pr19958.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210393 91177308-0d34-0410-b5e6-96231b3b80d8
addrspacecast X addrspace(M)* to Y addrspace(N)*
-->
bitcast X addrspace(M)* to Y addrspace(M)*
addrspacecast Y addrspace(M)* to Y addrspace(N)*
Updat all affected tests and add several new tests in addrspacecast.ll.
This patch is based on http://reviews.llvm.org/D2186 (authored by Matt
Arsenault) with fixes and more tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210375 91177308-0d34-0410-b5e6-96231b3b80d8
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.
It seems natural to allow unnamed_addr in aliases:
* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.
This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210302 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements two things:
1. If we know one number is positive and another is negative, we return true as
signed addition of two opposite signed numbers will never overflow.
2. Implemented TODO : If one of the operands only has one non-zero bit, and if
the other operand has a known-zero bit in a more significant place than it
(not including the sign bit) the ripple may go up to and fill the zero, but
won't change the sign. e.x - (x & ~4) + 1
We make sure that we are ignoring 0 at MSB.
Patch by Suyog Sarda.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210186 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.
This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like
@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
i32 ptrtoint (i32* @bar to i32)) to i32*)
An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).
Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210062 91177308-0d34-0410-b5e6-96231b3b80d8
The code was actually correct. Sorry for the confusion. I have expanded the
comment saying why the analysis is valid to avoid me misunderstaning it
again in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210052 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r210029.
It was not correctly handling cases where LHS and RHS had multiple but different
sign bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210048 91177308-0d34-0410-b5e6-96231b3b80d8
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing
Differential Revision: http://reviews.llvm.org/D3777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210006 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209843 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r209762, bringing back r209746. It was not responsible for the libc++ build failure
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209776 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r209746.
It looks it is causing a crash while building libcxx. I am trying to get a
reduced testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209762 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209755 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements two things:
1. If we know one number is positive and another is negative, we return true as
signed addition of two opposite signed numbers will never overflow.
2. Implemented TODO : If one of the operands only has one non-zero bit, and if
the other operand has a known-zero bit in a more significant place than it
(not including the sign bit) the ripple may go up to and fill the zero, but
won't change the sign. e.x - (x & ~4) + 1
We make sure that we are ignoring 0 at MSB.
Patch by Suyog Sarda.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209746 91177308-0d34-0410-b5e6-96231b3b80d8
Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio
Fixed the argument order to select (the mask semantics to blendv* are the
inverse of select) and fixed the tests
Added parenthesis to the assert condition
Ran clang-format
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209667 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
Reviewers: grosbach, delena, nadav
Differential Revision: http://reviews.llvm.org/D3859
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209643 91177308-0d34-0410-b5e6-96231b3b80d8
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
rdar://15547484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209049 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209007 91177308-0d34-0410-b5e6-96231b3b80d8
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing
Z3 Verifications code for above transform
http://rise4fun.com/Z3/Pmsh
Differential Revision: http://reviews.llvm.org/D3717
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208848 91177308-0d34-0410-b5e6-96231b3b80d8
In transformation:
BinOp(shuffle(v1,undef), shuffle(v2,undef)) -> shuffle(BinOp(v1, v2),undef)
type of the undef argument must be same as type of BinOp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208531 91177308-0d34-0410-b5e6-96231b3b80d8
Do not apply transformation:
BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2))
if operands v1 and v2 are of different size.
This change fixes PR19717, which was caused by r208488.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208518 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables transformations:
BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2))
BinOp(shuffle(v1), const1) -> shuffle(BinOp, const2)
They allow to eliminate extra shuffles in some cases.
Differential Revision: http://reviews.llvm.org/D3525
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208488 91177308-0d34-0410-b5e6-96231b3b80d8
The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect.
The _256 variants have indexes into the individual 128 bit lanes and in all
cases it also has to mask out unused bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207577 91177308-0d34-0410-b5e6-96231b3b80d8
right intrinsics.
A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207299 91177308-0d34-0410-b5e6-96231b3b80d8
This excludes avx512 as I don't have hardware to verify. It excludes _dq
variants because they are represented in the IR as <{2,4} x i64> when it's
actually a byte shift of the entire i{128,265}.
This also excludes _dq_bs as they aren't at all supported by the backend.
There are also no corresponding instructions in the ISA. I have no idea why
they exist...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207058 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Since the upper 64 bits of the destination register are undefined when
performing this operation, we can substitute it and let the optimizer
figure out that only a copy is needed.
Also added range merging, if an instruction copies a range that can be
merged with a previous copied range.
Added test cases for both optimizations.
Reviewers: grosbach, nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3357
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207055 91177308-0d34-0410-b5e6-96231b3b80d8
With a constant mask a vpermil* is just a shufflevector. This patch implements
that simplification. This allows us to produce denser code. It should also
allow more folding down the line.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206801 91177308-0d34-0410-b5e6-96231b3b80d8
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:
%mul32 = trunc i64 %mul64 to i32
%zext = zext i32 %mul32 to i64
%overflow = icmp ne i64 %mul64, %zext
or
%overflow = icmp ugt i64 %mul64 , 0xffffffff
then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.
Differential Revision: http://llvm-reviews.chandlerc.com/D2814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206137 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r204912, and follow-up commit r204948.
This introduced a performance regression, and the fix is not completely
clear yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205010 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes a miscompile introduced in r204912. It would miscompile code like
(unsigned)(a + -49) <= 5U. The transform would turn this into
(unsigned)a < 55U, which would return true for values in [0, 49], when
it should not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204948 91177308-0d34-0410-b5e6-96231b3b80d8
Transform:
icmp X+Cst2, Cst
into:
icmp X, Cst-Cst2
when Cst-Cst2 does not overflow, and the add has nsw.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204912 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previously the code didn't check if the before and after types for the
store were pointers to different address spaces. This resulted in
instcombine using a bitcast to convert between pointers to different
address spaces, causing an assertion due to the invalid cast.
It is not be appropriate to use addrspacecast this case because it is
not guaranteed to be a no-op cast. Instead bail out and do not do the
transformation.
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3117
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204733 91177308-0d34-0410-b5e6-96231b3b80d8
bitcast between pointers of two different address spaces if they happened to have
the same pointer size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203862 91177308-0d34-0410-b5e6-96231b3b80d8
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.
MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.
For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203705 91177308-0d34-0410-b5e6-96231b3b80d8
optimize a call to a llvm intrinsic to something that invovles a call to a C
library call, make sure it sets the right calling convention on the call.
e.g.
extern double pow(double, double);
double t(double x) {
return pow(10, x);
}
Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
ret double %0
}
declare double @llvm.pow.f64(double, double) #1
Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%__exp10 = call double @__exp10(double %x) #1
ret double %__exp10
}
declare double @__exp10(double)
The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will chose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.
I can think of 3 options to fix this.
1. Make "C" calling convention just work since the target should know what CC
is being used.
This doesn't work because each function can use different CC with the "pcs"
attribute.
2. Have Clang add the right CC keyword on the calls to LLVM builtin.
This will work but it doesn't match the LLVM IR specification which states
these are "Standard C Library Intrinsics".
3. Fix simplify libcall so the resulting calls to the C routines will have the
proper CC keyword. e.g.
%__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1
This works and is the solution I implemented here.
Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.
1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.
There are a couple of potential downsides.
1. There could be other places in the optimizer that is broken in the same way
that's not addressed by this.
2. There could be other calling conventions that need to be propagated by
simplify-libcall that's not handled.
But for now, this is the fix that I'm most comfortable with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203488 91177308-0d34-0410-b5e6-96231b3b80d8
Sequences of insertelement/extractelements are sometimes used to build
vectorsr; this code tries to put them back together into shuffles, but
could only produce a completely uniform shuffle types (<N x T> from two
<N x T> sources).
This should allow shuffles with different numbers of elements on the
input and output sides as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203229 91177308-0d34-0410-b5e6-96231b3b80d8
are operations that do not access memory but may be sensitive
to floating-point environment changes. LLVM does not attempt
to model FP environment changes, so this was unnecessarily conservative
and was getting on the way of some optimizations, in particular
SLP vectorization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203037 91177308-0d34-0410-b5e6-96231b3b80d8
logical operations on the i1's driving them. This is a bad idea for every
target I can think of (confirmed with micro tests on all of: x86-64, ARM,
AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into
a general purpose register, whereas consuming it directly into a select generally
allows it to exist only transiently in a predicate or flags register.
Chandler ran a set of performance tests with this change, and reported no
measurable change on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201275 91177308-0d34-0410-b5e6-96231b3b80d8
Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
and remove the ToDo comment.
Reviewer: Duncan P.N. Exan Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200736 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.
I added tests for any change that would allow an optimization to fire on
inalloca.
Reviewers: nlewycky
Differential Revision: http://llvm-reviews.chandlerc.com/D2449
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200281 91177308-0d34-0410-b5e6-96231b3b80d8
This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199629 91177308-0d34-0410-b5e6-96231b3b80d8
widespread glibc bugs.
The glibc implementation of exp10 has a very serious precision bug in
version 2.15 (and older versions). This is still very widely used (the
current Ubuntu LTS for example uses it) and so it isn't reasonable to
make transforms that produce these functions. This fixes many
miscompiles introduced when we started transforming pow(10.0, ...) into
exp10, and it may have fixed other latent miscompiles where exp10
provided sufficient precision but exp10f did not.
This is all really horrible. The primary bug has been fixed for over
a year and glibc 2.18 works correctly for the test cases I have, but it
will be 2017 before the LTS using 2.15 is no longer supported by Ubuntu
(and thus reasonable for folks to be relying on). =[ We're either going
to need to live without these optimizations, or find a way to switch
behavior more dynamically than using simply the fact that the OS is
"Linux".
To make matters worse, there appears to be significant testing and
fixing of numerous other bugs in the exp10 family of functions right now
in glibc. While those haven't been causing problems I've seen in the
wild, it gives me concerns that we may need to wait until an even later
release of glibc before we can reliably transform code into exp10.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198093 91177308-0d34-0410-b5e6-96231b3b80d8
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195504 91177308-0d34-0410-b5e6-96231b3b80d8
Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.
The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194943 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:
(fptrunc (sqrt (fpext x))) -> (sqrtf x)
but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.
This change makes InstCombine apply this optimization to llvm.sqrt as well.
This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194935 91177308-0d34-0410-b5e6-96231b3b80d8
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194013 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.
Patch by Tim Northover
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193943 91177308-0d34-0410-b5e6-96231b3b80d8
Bitcasting everything to i8* won't work. Autoupgrade the old
intrinsic declarations to use the new mangling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192117 91177308-0d34-0410-b5e6-96231b3b80d8
The test's output doesn't change, but this ensures
this is actually hit with a different address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191701 91177308-0d34-0410-b5e6-96231b3b80d8
when it was actually a Constant*.
There are quite a few other casts to Instruction that might have the same problem,
but this is the only one I have a test case for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191668 91177308-0d34-0410-b5e6-96231b3b80d8
Currently foldSelectICmpAndOr asserts if the "or" involves a vector
containing several of the same power of two. We can easily avoid this by
only performing the fold on integer types, like foldSelectICmpAnd does.
Fixes <rdar://problem/15012516>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191552 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the command line argument "struct-path-tbaa" since we should not depend
on command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node, if it is a MDNode, we treat
it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA
format.
When clang starts to use struct-path aware TBAA format no matter whether
struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support
for scalar TBAA format can be dropped.
Existing testing cases are updated to use the struct-path aware TBAA format.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191538 91177308-0d34-0410-b5e6-96231b3b80d8
The GEP pattern is what SCEV expander emits for "ugly geps". The latter is what
you get for pointer subtraction in C code. The rest of instcombine already
knows how to deal with that so just canonicalize on that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191090 91177308-0d34-0410-b5e6-96231b3b80d8
If "C1/X" were having multiple uses, the only benefit of this
transformation is to potentially shorten critical path. But it is at the
cost of instroducing additional div.
The additional div may or may not incur cost depending on how div is
implemented. If it is implemented using Newton–Raphson iteration, it dosen't
seem to incur any cost (FIXME). However, if the div blocks the entire
pipeline, that sounds to be pretty expensive. Let CodeGen to take care
this transformation.
This patch sees 6% on a benchmark.
rdar://15032743
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191037 91177308-0d34-0410-b5e6-96231b3b80d8
If address space 0 was smaller than the address space
in a constant inttoptr/ptrtoint pair, the wrong mask size
would be used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190899 91177308-0d34-0410-b5e6-96231b3b80d8
To avoid regressions with bitfield optimizations, this slicing should take place
later, like ISel time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190891 91177308-0d34-0410-b5e6-96231b3b80d8
Some of this code is no longer necessary since int<->ptr casts are no
longer occur as of r187444.
This also fixes handling vectors of pointers, and adds a bunch of new
testcases for vectors and address spaces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190885 91177308-0d34-0410-b5e6-96231b3b80d8
other in memory.
The motivation was to get rid of truncate and shift right instructions that get
in the way of paired load or floating point load.
E.g.,
Consider the following example:
struct Complex {
float real;
float imm;
};
When accessing a complex, llvm was generating a 64-bits load and the imm field
was obtained by a trunc(lshr) sequence, resulting in poor code generation, at
least for x86.
The idea is to declare that two load instructions is the canonical form for
loading two arithmetic type, which are next to each other in memory.
Two scalar loads at a constant offset from each other are pretty
easy to detect for the sorts of passes that like to mess with loads.
<rdar://problem/14477220>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190870 91177308-0d34-0410-b5e6-96231b3b80d8
Several architectures use the same instruction to perform both a comparison and
a subtract. The instruction selection framework does not allow to consider
different basic blocks to expose such fusion opportunities.
Therefore, these instructions are “merged” by CSE at MI IR level.
To increase the likelihood of CSE to apply in such situation, we reorder the
operands of the comparison, when they have the same complexity, so that they
matches the order of the most frequent subtract.
E.g.,
icmp A, B
...
sub B, A
<rdar://problem/14514580>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190352 91177308-0d34-0410-b5e6-96231b3b80d8
Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom),
field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram
(Context, Type, ContainingType).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190205 91177308-0d34-0410-b5e6-96231b3b80d8
"(icmp op i8 A, B)" is equivalent to "(icmp op i8 (A & 0xff), B)" as a
degenerate case. Allowing this as a "masked" comparison when analysing "(icmp)
&/| (icmp)" allows us to combine them in more cases.
rdar://problem/7625728
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189931 91177308-0d34-0410-b5e6-96231b3b80d8
Even in cases which aren't universally optimisable like "(A & B) != 0 && (A &
C) != 0", the masks can make one of the comparisons completely redundant. In
this case, since we've gone to the effort of spotting masked comparisons we
should combine them.
rdar://problem/7625728
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189930 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code missed some edge cases when e.g. we're going to emit sqrtf but
only the availability of sqrt was checked. This happens on odd platforms like
windows.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189724 91177308-0d34-0410-b5e6-96231b3b80d8
PR17026. Also avoid undefined shifts and shift amounts larger than 64 bits
(those are always undef because we can't represent integer types that large).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189672 91177308-0d34-0410-b5e6-96231b3b80d8
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.
Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.
Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
genearte a DICompositeType with a unique identifer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189282 91177308-0d34-0410-b5e6-96231b3b80d8