when SSE4.1 is available.
This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.
This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.
Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8
missing specific checks.
While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217765 91177308-0d34-0410-b5e6-96231b3b80d8
instructions from the relevant shuffle patterns.
This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.
With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217761 91177308-0d34-0410-b5e6-96231b3b80d8
instructions when it finds an appropriate pattern.
These are lovely instructions, and its a shame to not use them. =] They
are fast, and can hand loads folded into their operands, etc.
I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217758 91177308-0d34-0410-b5e6-96231b3b80d8
AVX is available, and generally tidy up things surrounding UNPCK
formation.
Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.
This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.
This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.
Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217755 91177308-0d34-0410-b5e6-96231b3b80d8
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.
While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.
When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.
Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217752 91177308-0d34-0410-b5e6-96231b3b80d8
Use fully qualified name inside a typedef from llvm::iterator_range<...> to
iterator_range. This is reported (rightly I think) by GCC as an
ambiguous name redefinition. Hope this fixes the buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217751 91177308-0d34-0410-b5e6-96231b3b80d8
Some ICmpInsts when anded/ored with another ICmpInst trivially reduces
to true or false depending on whether or not all integers or no integers
satisfy the intersected/unioned range.
This sort of trivial looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.
This fixes PR20916.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217750 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
replaceAllUsesWith had been modified to allow a DbgNode value to be
replaced by itself. In that case a new node is created by copying the
current DbgNode and the copy is used as replacement value.
When that copying happens, the value stored in this->DbgNode at the end
of RAUW would be a reference to the Node that has just been deleted.
This doesn't produce any bug right now, because the DI node on which we
call RAUW won't be used again.
Reviewers: dblaikie, echristo, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5326
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217749 91177308-0d34-0410-b5e6-96231b3b80d8
RAUW was only used on DIType to merge declarations and full definitions
of types. In order to support the same functionality for functions and
global variables, move the function up type DI type hierarchy to the
common parent of DIType, DISubprogram and DIVariable which is
DIDescriptor.
This functionality will be exercized when we add the code to emit
imported declarations for forward declared function/variables.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5325
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217748 91177308-0d34-0410-b5e6-96231b3b80d8
A DWARFUnitSection is the collection of Units that have been extracted from
the same debug section.
By embeding a reference to their DWARFUnitSection in each unit, the DIEs
will be able to resolve inter-unit references by interrogating their Unit's
DWARFUnitSection.
This is a minimal patch where the DWARFUnitSection is-a SmallVector of Units,
thus exposing exactly the same interface as before. Followup-up patches might
change from inheritance to composition in order to expose only the wanted
DWARFUnitSection abstraction.
Differential Revision: http://reviews.llvm.org/D5310
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217747 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the need to pass a starting and ending line when creating
a SourceCoverageView, since these are easy to determine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217746 91177308-0d34-0410-b5e6-96231b3b80d8
A single function in SourceCoverageDataManager was the only user of
some of the comparisons in CounterMappingRegion, and at this point we
know that only one file is relevant. This lets us use slightly simpler
logic directly in the client.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217745 91177308-0d34-0410-b5e6-96231b3b80d8
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.
I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.
Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8
variants where significant.
This will make it more obvious what is happening when we start using
blends in SSE41.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217740 91177308-0d34-0410-b5e6-96231b3b80d8
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.
A bunch of the cases where we actually form 'movddup' don't actually
show up in the test results because something even later than DAG
legalization maps them back to 'unpcklpd'. If this shows back up as
a performance problem, I'll probably chase it down, but it is at least
an encoded size loss. =/
To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.
There is one test which is "regressed" by this: extractelement-load.
There, the test case where the optimization it is testing *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217738 91177308-0d34-0410-b5e6-96231b3b80d8
pointer to a dead function. To make sure it's valid, doFinalization nullptrs
RewindFunction just like the constructor and so it will be found on next run.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217737 91177308-0d34-0410-b5e6-96231b3b80d8
... Just make sure we check uses first so we see the kill first. It
turns out ignoring defs gives some pretty nasty runtime failures.
I'm certain this is the fix but I'm still reducing a testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217735 91177308-0d34-0410-b5e6-96231b3b80d8
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.
Fixes rdar://problem/18330589.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217732 91177308-0d34-0410-b5e6-96231b3b80d8
Check that the post RA scheduler is being skipped, regardless of
whether it's the top-down list latency scheduler or the post-RA
MI scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217725 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to my previous -exports-trie option, the -rebase option dumps info from
the LC_DYLD_INFO load command. The rebasing info is a list of the the locations
that dyld needs to adjust if a mach-o image is not loaded at its preferred
address. Since ASLR is now the default, images almost never load at their
preferred address, and thus need to be rebased by dyld.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217709 91177308-0d34-0410-b5e6-96231b3b80d8
The raw profiles that are generated in compiler-rt always add padding
so that each profile is aligned, so we can simply treat files that
don't have this property as malformed.
Caught by Alexey's new ubsan bot. Thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217708 91177308-0d34-0410-b5e6-96231b3b80d8
enabled. A good chunk of the MIsNeedChainEdge() is logic that is valid and should be applied even for targets
that are not using for alias analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217706 91177308-0d34-0410-b5e6-96231b3b80d8
Vector MUL/MLAs have tied operands, which gives us extra constraints
that we currently can't handle. Instead of silently doing the wrong
thing, remove support to be readded later properly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217690 91177308-0d34-0410-b5e6-96231b3b80d8
Defs are seen before uses, so a def without the kill flag doesn't necessarily
mean that the register is not killed on that instruction. It may be killed
in a later use operand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217689 91177308-0d34-0410-b5e6-96231b3b80d8
As far as I can tell UTF-8 has been supported since the beginning of Python's
codec support, and it's the de facto standard for text these days, at least
for primarily-English text. This allows us to put Unicode into lit RUN lines.
rdar://problem/18311663
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217688 91177308-0d34-0410-b5e6-96231b3b80d8