Example:
define i1 @foo(i32 %a) {
%shr = ashr i32 -9, %a
%cmp = icmp ne i32 %shr, -5
ret i1 %cmp
}
Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.
This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217950 91177308-0d34-0410-b5e6-96231b3b80d8
First step done in this commit is to get flush out enough of the
SymbolizerGetOpInfo() routine to symbolic an X86_64 hello world .o and
its loading of the literal string and call to printf. Also the code to
symbolicate the X86_64_RELOC_SUBTRACTOR relocation and a test is also
added to show a slightly more complicated case.
Next will be to flush out enough of SymbolizerSymbolLookUp() to get the
literal string “Hello world” printed as a comment on the instruction that load
the pointer to it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217893 91177308-0d34-0410-b5e6-96231b3b80d8
By class-instance values I mean 'Class<Arg>' in 'Class<Arg>.Field' or in
'Other<Class<Arg>>' (syntactically s SimpleValue). This is to differentiate
from unnamed/anonymous record definitions (syntactically an ObjectBody) which
are not affected by this change.
Consider the testcase:
class Struct<int i> {
int I = !shl(i, 1);
int J = !shl(I, 1);
}
class Class<Struct s> {
int Class_J = s.J;
}
multiclass MultiClass<int i> {
def Def : Class<Struct<i>>;
}
defm Defm : MultiClass<2>;
Before this fix, DefmDef.Class_J yields !shl(I, 1) instead of 8.
This is the sequence of events. We start with this:
multiclass MultiClass<int i> {
def Def : Class<Struct<i>>;
}
During ParseDef the anonymous object for the class-instance value is created:
multiclass Multiclass<int i> {
def anonymous_0 : Struct<i>;
def Def : Class<NAME#anonymous_0>;
}
Then class Struct<i> is added to anonymous_0. Also Class<NAME#anonymous_0> is
added to Def:
multiclass Multiclass<int i> {
def anonymous_0 {
int I = !shl(i, 1);
int J = !shl(I, 1);
}
def Def {
int Class_J = NAME#anonymous_0.J;
}
}
So far so good but then we move on to instantiating this in the defm
by substituting the template arg 'i'.
This is how the anonymous prototype looks after fully instantiating.
defm Defm = {
def Defmanonymous_0 {
int I = 4;
int J = !shl(I, 1);
}
Note that we only resolved the reference to the template arg. The
non-template-arg reference in 'J' has not been resolved yet.
Then we go on to instantiating the Def prototype:
def DefmDef {
int Class_J = NAME#anonymous_0.J;
}
Which is resolved to Defmanonymous_0.J and then to !shl(I, 1).
When we fully resolve each record in a defm, Defmanonymous_0.J does get set
to 8 but that's too late for its use.
The patch adds a new attribute to the Record class that indicates that this
def is actually a class-instance value that may be *used* by other defs in a
multiclass. (This is unlike regular defs which don't reference each other and
thus can be resolved indepedently.) They are then fully resolved before the
other defs while the multiclass is instantiated.
I added vg_leak to the new test. I am not sure if this is necessary but I
don't think I have a way to test it. I can also check in without the XFAIL
and let the bots test this part.
Also tested that X86.td.expanded and AAarch64.td.expanded were unchange before
and after this change. (This issue triggering this problem is a WIP patch.)
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217886 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Changed error messages to be more informative and to resemble other clang/llvm error messages (first letter is lower case, no ending punctuation) and updated corresponding tests.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217873 91177308-0d34-0410-b5e6-96231b3b80d8
The default implementation of getCmpSelInstrCost, which provides the cost of
icmp/fcmp/select instructions, did not deal sensibly with illegal vector types
that were scalarized. We'd ask for the legalization cost of the vector type,
which would return something like (4, f64) given an input of <4 x double>, and
we'd then check the TLI status of the ISD opcode on that scalar type. This would
result in querying (ISD::VSELECT, f64), for example. Amusingly enough,
ISD::VSELECT on scalar types is marked as Legal by default (as with most other
operations), and most backends never change this because VSELECT is never
generated on scalars. However, seeing the resulting operation as Legal, we'd
neglect to add the scalarization cost before returning. The result is that we'd
grossly under-estimate the cost of cmps/selects on illegal vector types.
Now, if type legalization clearly results in scalarization, we skip the early
return and add the scalarization cost.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217859 91177308-0d34-0410-b5e6-96231b3b80d8
Teach yaml2obj how to make a bigobj COFF file. Like the rest of LLVM,
we automatically decide whether or not to use regular COFF or bigobj
COFF on the fly depending on how many sections the resulting object
would have.
This ends the task of adding bigobj support to LLVM.
N.B. This was tested by forcing yaml2obj to be used in bigobj mode
regardless of the number of sections. While a dedicated test was
written, the smallest I could make it was 36 MB (!) of yaml and it still
took a significant amount of time to execute on a powerful machine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217858 91177308-0d34-0410-b5e6-96231b3b80d8
This finishes the ability of llvm-objdump to print out all information from
the LC_DYLD_INFO load command.
The -bind option prints out symbolic references that dyld must resolve
immediately.
The -lazy-bind option prints out symbolc reference that are lazily resolved on
first use.
The -weak-bind option prints out information about symbols which dyld must
try to coalesce across images.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217853 91177308-0d34-0410-b5e6-96231b3b80d8
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.
The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8
ADDSUBPD nodes out of blends of adds and subs.
This allows us to actually form these instructions with SSE3 rather than
only forming them when we had both SSE3 for the ADDSUB instructions and
SSE4.1 for the blend instructions. ;] Kind-of important.
I've adjusted the CPU requirements on one of the tests to demonstrate
this kicking in nicely for an SSE3 cpu configuration.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217848 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the missing test case for the previous commit:
Allow handling of vectors during return lowering for little endian machines.
Sorry for the noise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217847 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the debug output of the llvm-cov tool to consistently
write to stderr, and moves the highlighting output closer to where
it's relevant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217838 91177308-0d34-0410-b5e6-96231b3b80d8
In r217746, though it was supposed to be NFC, I broke llvm-cov's
handling of showing regions without showing counts. This should've
shown up in the existing tests, except they were checking debug output
that was displayed regardless of what was actually output. I've moved
the relevant debug output to a more appropriate place so that the
tests catch this kind of thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217835 91177308-0d34-0410-b5e6-96231b3b80d8
This lowers frem to a runtime libcall inside fast-isel.
The test case also checks the CallLoweringInfo bug that was exposed by this
change.
This fixes rdar://problem/18342783.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217833 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for the last two missing fcmp condition codes: UEQ and ONE.
This fixes rdar://problem/18341575.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217823 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Expand list of supported targets for Mips to include mips32 r1.
Previously it only include r2. More patches are coming where there is
a difference but in the current patches as pushed upstream, r1 and r2
are equivalent.
Test Plan:
simplestorefp1.ll
add new build bots at mips to test this flavor at both -O0 and -O2
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5306
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217821 91177308-0d34-0410-b5e6-96231b3b80d8
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.
When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).
Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217801 91177308-0d34-0410-b5e6-96231b3b80d8
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!
Thanks Joerg Sonnenberger for the test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217800 91177308-0d34-0410-b5e6-96231b3b80d8
Peephole optimization was folding MOVSDrm, which is a zero-extending double
precision floating point load, into ADDPDrr, which is a SIMD add of two packed
double precision floating point values.
(before)
%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21
(after)
%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20
X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this
from happening. However the check wasn't being conducted for loads from stack
objects. This commit factors out the logic into a new function and uses it for
checking loads from stack slots are not zero-extending loads.
rdar://problem/18236850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217799 91177308-0d34-0410-b5e6-96231b3b80d8
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217789 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed some odd looking cases where addr64 wasn't set
when storing to a pointer in an SGPR. This seems to be intentional,
and partially tested already.
The documentation seems to describe addr64 in terms of which registers
addressing modifiers come from, but I would expect to always need
addr64 when using 64-bit pointers. If no offset is applied,
it makes sense to not need to worry about doing a 64-bit add
for the final address. A small immediate offset can be applied,
so is it OK to not have addr64 set if a carry is necessary when adding
the base pointer in the resource to the offset?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217785 91177308-0d34-0410-b5e6-96231b3b80d8
when SSE4.1 is available.
This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.
This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.
Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8
missing specific checks.
While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217765 91177308-0d34-0410-b5e6-96231b3b80d8
instructions from the relevant shuffle patterns.
This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.
With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217761 91177308-0d34-0410-b5e6-96231b3b80d8
instructions when it finds an appropriate pattern.
These are lovely instructions, and its a shame to not use them. =] They
are fast, and can hand loads folded into their operands, etc.
I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217758 91177308-0d34-0410-b5e6-96231b3b80d8
AVX is available, and generally tidy up things surrounding UNPCK
formation.
Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.
This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.
This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.
Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217755 91177308-0d34-0410-b5e6-96231b3b80d8
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.
While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.
When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.
Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217752 91177308-0d34-0410-b5e6-96231b3b80d8
Some ICmpInsts when anded/ored with another ICmpInst trivially reduces
to true or false depending on whether or not all integers or no integers
satisfy the intersected/unioned range.
This sort of trivial looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.
This fixes PR20916.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217750 91177308-0d34-0410-b5e6-96231b3b80d8
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.
I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.
Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8