abilities of INSERTPS which are really powerful and come up in very
important contexts such as forming diagonal matrices, etc.
With this I ended up being able to remove the somewhat weird helper
I added for INSERTPS because we can collapse the entire state to a no-op
mask. Added a bunch of tests for inserting into a zero-ish vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217117 91177308-0d34-0410-b5e6-96231b3b80d8
'insertps' patterns.
This replaces two shuffles with a single insertps in very common cases.
My next patch will extend this to leverage the zeroing capabilities of
insertps which will allow it to be used in a much wider set of cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217100 91177308-0d34-0410-b5e6-96231b3b80d8
an immediate operand when we don't have instruction-specific comments.
This ensures that instruction-specific comments are attached to the same
line as the instruction which is important for using them to write
readable and maintainable tests. My next commit will just such a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217099 91177308-0d34-0410-b5e6-96231b3b80d8
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217075 91177308-0d34-0410-b5e6-96231b3b80d8
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216982 91177308-0d34-0410-b5e6-96231b3b80d8
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8
We duplicate ~30 lines of code to lower FABS and FNEG for x86, so this patch combines them into one function.
No functional change intended, so no additional test cases. Test-suite behavior is unchanged.
Differential Revision: http://reviews.llvm.org/D5064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216942 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().
Fixes PR20828.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8
The structures for Windows unwinding are shared across multiple platforms.
Indicate the encoding to be used for the particular target. Use this to switch
the unwind emitter instantiated by the AsmPrinter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216895 91177308-0d34-0410-b5e6-96231b3b80d8
This change will ease refactoring LowerFABS() and LowerFNEG()
since they have a lot of overlap.
Remove the creation of a floating point constant from an integer
because it's going to be used for a bitwise integer op anyway.
No change to codegen expected, but the verbose comment string
for asm output may change from float values to hex (integer),
depending on whether the constant already exists or not.
Differential Revision: http://reviews.llvm.org/D5052
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216889 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).
Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.
Reviewers: majnemer
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5060
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216780 91177308-0d34-0410-b5e6-96231b3b80d8
We've rejected these kinds of functions since r28405 in 2006 because
it's impossible to lower the return of a callee cleanup varargs
function. However there are lots of legal ways to leave such a function
without returning, such as aborting. Today we can leave a function with
a musttail call to another function with the correct prototype, and
everything works out.
I'm removing the verifier check declaring that a normal return from such
a function is UB.
Reviewed By: nlewycky
Differential Revision: http://reviews.llvm.org/D5059
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216779 91177308-0d34-0410-b5e6-96231b3b80d8
The new solution is to not use this lowering if there are any dynamic
allocas in the current function. We know up front if there are dynamic
allocas, but we don't know if we'll need to create stack temporaries
with large alignment during lowering. Conservatively assume that we will
need such temporaries.
Reviewed By: hans
Differential Revision: http://reviews.llvm.org/D5128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216775 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Mostly renaming the (not very explicit) variables Tmp0, .. Tmp4, and grouping
related statements together, along with a few lines of comments for the
surprising parts.
No functional change intended.
Test Plan: make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5088
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216768 91177308-0d34-0410-b5e6-96231b3b80d8
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.
Reviewed by Elena Demikhovsky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
a single early exit.
And factor the subsequent cast<> from all but one block into a single
variable.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216645 91177308-0d34-0410-b5e6-96231b3b80d8
functionality changed.
Separating this into two functions wasn't helping. There was a decent
amount of boilerplate duplicated, and some subsequent refactorings here
will pull even more common code out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216644 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Introduce support::ulittleX_t::ref type to Support/Endian.h and use it in x86 JIT
to enforce correct endianness and fix unaligned accesses.
Test Plan: regression test suite
Reviewers: lhames
Subscribers: ributzka, llvm-commits
Differential Revision: http://reviews.llvm.org/D5011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216631 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions like 'fxsave' and control flow instructions like 'jne'
match any operand size. The loop I added to the Intel syntax matcher
assumed that using a different size would give a different instruction.
Now it handles the case where we get the same instruction for different
memory operand sizes.
This also allows us to remove the hack we had for unsized absolute
memory operands, because we can successfully match things like 'jnz'
without reporting ambiguity. Removing this hack uncovered test case
involving 'fadd' that was ambiguous. The memory operand could have been
single or double precision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216604 91177308-0d34-0410-b5e6-96231b3b80d8
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.
This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.
However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.
The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.
Original commit message for r215611 which covers the rest of the chang:
[SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.
In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.
It took many million runs of the shuffle fuzz tester to find this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
The existing matcher has lots of AT&T assembly dialect assumptions baked
into it. In particular, the hack for resolving the size of a memory
operand by appending the four most common suffixes doesn't work at all.
The Intel assembly dialect mnemonic table has ambiguous entries, so we
need to try matching multiple times with different operand sizes, since
that's the only way to choose different instruction variants.
This makes us more compatible with gas's implementation of Intel
assembly syntax. MSVC assumes you want byte-sized operations for the
instructions that we reject as ambiguous.
Reviewed By: grosbach
Differential Revision: http://reviews.llvm.org/D4747
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216481 91177308-0d34-0410-b5e6-96231b3b80d8
This actually was caught by existing tests but those tests were disabled
with an XFAIL because of PR20736. While working on fixing that,
I noticed the test failure, and tracked it down to this.
We even have a really nice Clang warning that would have caught this but
it isn't enabled in LLVM! =[ I may look at enabling it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216391 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
these DAG combines.
The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing
a node with its operand, you can cause its uses to CSE to itself, which
then causes their uses to become your uses which causes them to be
picked up by the RAUW. For nodes that are determined to be "no-ops",
this is "fine". But if the RAUW is one of several steps to enact
a transformation, this causes the DAG to really silently eat an discard
nodes that you would never expect. It took days for me to actually
pinpoint a test case triggering this and a really frustrating amount of
time to even comprehend the bug because I never even thought about the
ability of RAUW to iteratively consume nodes due to CSE-ing them into
itself.
To fix this, we have to build up a brand-new chain of operations any
time we are combining across (potentially) intervening nodes. But once
the logic is added to do this, another issue surfaces: CombineTo eagerly
deletes the one node combined, *but no others*. This is... really
frustrating. If deleting it makes its operands become dead, those
operand nodes often won't go onto the worklist in the
order you would want -- they're already on it and not near the top. That
means things higher on the worklist will get combined prior to these
dead nodes being GCed out of the worklist, and if the chain is long, the
immediate users won't be enough to re-detect where the root of the chain
is that became single-use again after deleting the dead nodes. The
better way to do this is to never immediately delete nodes, and instead
to just enqueue them so we can recursively delete them. The
combined-from node is typically not on the worklist anyways by virtue of
having been popped off.... But that in turn breaks other tests that
*require* CombineTo to delete unused nodes. :: sigh ::
Fortunately, there is a better way. This whole routine should have been
returning the replacement rather than using CombineTo which is quite
hacky. Switch to that, and all the pieces fall together.
I suspect the same kind of miscompile is possible in the half-shuffle
folding code, and potentially the recursive folding code. I'll be
switching those over to a pattern more like this one for safety's sake
even though I don't immediately have any test cases for them. Note that
the only way I got a test case for this instance was with *heavily* DAG
combined 256-bit shuffle sequences generated by my fuzzer. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216319 91177308-0d34-0410-b5e6-96231b3b80d8
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.
Most of the test suite changes are adding va_start calls to existing
tests to keep things working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8
This (mostly) reverts commit r216119.
Somewhere during the review Reid committed r214980 which fixed this
another way, and I neglected to check that the testcase still failed
before committing.
I've left test/CodeGen/X86/aligned-variadic.ll around in case it adds
extra coverage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216246 91177308-0d34-0410-b5e6-96231b3b80d8
We discussed the issue of generality vs. readability of the AVX512 classes
recently. I proposed this approach to try to hide and centralize the mappings
we commonly perform based on the vector type. A new class X86VectorVTInfo
captures these.
The idea is to pass an instance of this class to classes/multiclasses instead
of the corresponding ValueType. Then the class/multiclass can use its field
for things that derive from the type rather than passing all those as separate
arguments.
I modified avx512_valign to demonstrate this new approach. As you can see
instead of 7 related template parameters we now have one. The downside is
that we have to refer to fields for the derived values. I named the argument
'_' in order to make this as invisible as possible. Please let me know if you
absolutely hate this. (Also once we allow local initializations in
multiclasses we can recover the original version by assigning the fields to
local variables.)
Another possible use-case for this class is to directly map things, e.g.:
RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216209 91177308-0d34-0410-b5e6-96231b3b80d8
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648
This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.
This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.
Differential Revision: http://reviews.llvm.org/D4934
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly. The controlling sentence is, "The size of each argument gets
rounded up to eightbytes. Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area." For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
Patch by Thomas Jablin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if it's return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.
Test Plan: Test case from the original bug report is included.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D4968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216117 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.
Fixes <rdar://problem/17674628>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216012 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
This fixes <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216010 91177308-0d34-0410-b5e6-96231b3b80d8
Group: Floating Point XMM and YMM instructions.
Sub-group: Other instructions.
<rdar://problem/15607571>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215923 91177308-0d34-0410-b5e6-96231b3b80d8
Group: Floating Point XMM and YMM instructions.
Sub-group: Math instructions.
<rdar://problem/15607571>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215921 91177308-0d34-0410-b5e6-96231b3b80d8
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".
Mostly just refactoring at present, and there's probably no way to test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215887 91177308-0d34-0410-b5e6-96231b3b80d8
It should remove dosens of lines in handling instrinsics (in a huge switch) and give an easy way to add new intrinsics.
I did not completed to move al intrnsics to the table, I'll do this in the upcomming commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215826 91177308-0d34-0410-b5e6-96231b3b80d8
MSVC gives this awesome diagnostic:
..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument
..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1'
..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl'
Using an anonymous namespace makes the problem go away.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215744 91177308-0d34-0410-b5e6-96231b3b80d8
the new shuffle lowering and an implementation for v4 shuffles.
This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code for the test cases included to the
existing vector shuffle lowering. There are a few cases that are
entertainingly better. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215702 91177308-0d34-0410-b5e6-96231b3b80d8
BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.
These will be used in my next commit as part of test cases for AVX
shuffles which can directly use blend in more places.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215701 91177308-0d34-0410-b5e6-96231b3b80d8
elements of a shuffle mask and simplify how it works. No functionality
changed now that the bug that was here has been fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215696 91177308-0d34-0410-b5e6-96231b3b80d8
target-specific shuffl DAG combines.
We were recognizing the paired shuffles backwards. This code needs to be
replaced anyways as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up, this is the minimal fix to the
behavior.
In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215690 91177308-0d34-0410-b5e6-96231b3b80d8
broken logic for merging shuffle masks in the face of SM_SentinelZero
mask operands.
While these are '-1' they don't mean 'undef' the way '-1' means in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much more clear. In the rest it was really weirdly broken.
Also, rewrite the entire merging thing to be a more directy operation
with a single loop and just doing math to map the indices through the
various masks.
Also add a bunch of asserts to try to make in extremely clear what the
different masks can possibly look like.
Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.
Thanks to several different people for sending test cases, and for
Robert Khasanov for an initial attempt at fixing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215687 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. This will be used by the new FMA intrinsic lowering
code.
We can probably add NO_EXC here as well, I am just not too familiar with this
part of AVX512 yet. We can add that later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215662 91177308-0d34-0410-b5e6-96231b3b80d8
This change further evolves the base class AVX512_masking in order to make it
suitable for the masking variants of the FMA instructions.
Besides AVX512_masking there is now a new base class that instructions
including FMAs can use: AVX512_masking_3src. With three-source (destructive)
instructions one of the sources is already tied to the destination. This
difference from AVX512_masking is captured by this new class. The common bits
between _masking and _masking_3src are broken out into a new super class
called AVX512_masking_common.
As with valign, there is some corresponding restructuring of the underlying
format classes. The idea is the same we want to derive from two classes
essentially: one providing the format bits and another format-independent
multiclass supplying the various masking and non-masking instruction variants.
Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking
variants. For masking, the next patches in the series will add intrinsics and
intrinsic tests.
For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without*
the leading source operand that is tied to dst ($src1). This is necessary to
properly construct the (ins ...) for the different variants. For the record,
I did check that if $src is mistakenly included, you do get a fairly intuitive
error message from the tablegen backend.
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215660 91177308-0d34-0410-b5e6-96231b3b80d8
lowering scheme.
Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.
Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215637 91177308-0d34-0410-b5e6-96231b3b80d8
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.
Fixes <rdar://problem/17674628>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215595 91177308-0d34-0410-b5e6-96231b3b80d8
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
This fixes <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215593 91177308-0d34-0410-b5e6-96231b3b80d8
Split the constant materialization code into three separate helper functions for
Integer-, Floating-Point-, and GlobalValue-Constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215586 91177308-0d34-0410-b5e6-96231b3b80d8
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215558 91177308-0d34-0410-b5e6-96231b3b80d8
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215536 91177308-0d34-0410-b5e6-96231b3b80d8
one pesky test case correctly.
This test case caused the old code to infloop occilating between solving
the low-half and the high-half. The 'side balancing' part of
single-input v8 shuffle lowering didn't handle the one pattern which can
cause it to occilate. Fortunately the fuzz testing found this case.
Unfortuately it was *terrible* to handle. I'm really sorry for the
amount and density of the code here, I'd love suggestions on how to
simplify it. I feel like there *must* be a simpler form here, but after
a lot of days I've not found it. This is the only one I've found that
even works. I've added the one pesky test case along with some nice
comments explaining the core problem that we have to solve here.
So far this has survived approximately 32k test cases. More strenuous
fuzzing commencing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215519 91177308-0d34-0410-b5e6-96231b3b80d8
I think that this will scale better in most cases than adding a Pat<> for each
mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz). We
can just lower to the SDNode and have the resulting DAG be matches by the DAG
patterns.
Alternatively (long term), we could keep the Pat<>s but generate them via the
new AVX512_masking multiclass. The difficulty is that in order to formulate
that we would have to concatenate DAGs. Currently this is only supported if
the operators of the input DAGs are identical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215473 91177308-0d34-0410-b5e6-96231b3b80d8
I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215237 91177308-0d34-0410-b5e6-96231b3b80d8
This completes one item from the todo-list of r215125 "Generate masking
instruction variants with tablegen".
The AddedComplexity is needed just like for the k variant.
Added a codegen test based on valignq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215173 91177308-0d34-0410-b5e6-96231b3b80d8
The AddedComplexity is needed just like in avx512_perm_3src. There may be a
bug in the complexity computation...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215168 91177308-0d34-0410-b5e6-96231b3b80d8
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215154 91177308-0d34-0410-b5e6-96231b3b80d8
After adding the masking variants to several instructions, I have decided to
experiment with generating these from the non-masking/unconditional
variant. This will hopefully reduce the amount repetition that we currently
have in order to define an instruction with all its variants (for a reg/mem
instruction this would be 6 instruction defs and 2 Pat<> for the intrinsic).
The patch is the first cut that is currently only applied to valignd/q to make
the patch small.
A few notes on the approach:
* In order to stitch together the dag for both the conditional and the
unconditional patterns I pass the RHS of the set rather than the full
pattern (set dest, RHS).
* Rather than subclassing each instruction base class (e.g. AVX512AIi8),
with a masking variant which wouldn't scale, I derived the masking
instructions from a new base class AVX512 (this is just I<> with
Requires<HasAVX512>). The instructions derive from this now, plus a new set
of classes that add the format bits and everything else that instruction
base class provided (i.e. AVX512AIi8 vs. AVX512AIi8Base).
I hope we can go incrementally from here. I expect that:
* We will need different variants of the masking class. One example is
instructions requiring three vector sources. In this case we tie one of the
source operands to dest rather than a new implicit source operand ($src0)
* Add the zero-masking variant
* Add more AVX512*Base classes as new uses are added
I've looked at X86.td.expanded before and after to make sure that nothing got
lost for valignd/q.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215125 91177308-0d34-0410-b5e6-96231b3b80d8
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215111 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle lowering.
This is closely related to the previous one. Here we failed to use the
source offset when swapping in the other case -- where we end up
swapping the *final* shuffle. The cause of this bug is a bit different:
I simply wasn't thinking about the fact that this mask is actually
a slice of a wide mask and thus has numbers that need SourceOffset
applied. Simple fix. Would be even more simple with an algorithm-y thing
to use here, but correctness first. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215095 91177308-0d34-0410-b5e6-96231b3b80d8
via the fuzz tester.
Here I missed an offset when round-tripping a value through a shuffle
mask. I got it right 2 lines below. See a problem? I do. ;] I'll
probably be adding a little "swap" algorithm which accepts a range and
two values and swaps those values where they occur in the range. Don't
really have a name for it, let me know if you do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215094 91177308-0d34-0410-b5e6-96231b3b80d8
through the new fuzzer.
This one is great: bad operator precedence led the modulus to happen at
the wrong point. All the asserts didn't fire because there were usually
the right values past the end of the 4 element region we were looking
at. Probably could have gotten a crash here with ASan + fuzzing, but the
correctness tests pinpointed this really nicely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215092 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack
pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still
require 64-bit register, so using 64-bit MachineFramePtr where required.
X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that
both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing
this issue here as well by making isTarget64BitLP64 false.
Also mark hasReservedSpillSlot unreachable on X86. See inlined comments.
Test Plan: Add one new simple test and upgrade 2 existing with x32 target case.
Reviewers: nadav, dschuff
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D4617
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215091 91177308-0d34-0410-b5e6-96231b3b80d8
fuzz testing.
The function which tested for adjacency did what it said on the tin, but
when I called it, I wanted it to do something more thorough: I wanted to
know if the *pairs* of shuffle elements were adjacent and started at
0 mod 2. In one place I had the decency to try to test for this, but in
the other it was completely skipped, miscompiling this test case. Fix
this by making the helper actually do what I wanted it to do everywhere
I called it (and removing the now redundant code in one place).
I *really* dislike the name "canWidenShuffleElements" for this
predicate. If anyone can come up with a better name, please let me know.
The other name I thought about was "canWidenShuffleMask" but is it
really widening the mask to reduce the number of lanes shuffled? I don't
know. Naming things is hard.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215089 91177308-0d34-0410-b5e6-96231b3b80d8
This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be
shared across multiple architectures and a target specific bit which is
overridden (Win64::UnwindEmitter). This enables sharing the section selection
code across X86 and the intended use in ARM for emitting unwind information for
Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215050 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR18916. I don't think we need to implement support for either
hybrid syntax. Nobody should write Intel assembly with '%' prefixes on
their registers or AT&T assembly without them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215031 91177308-0d34-0410-b5e6-96231b3b80d8
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8
test case to actually generate correct code.
The primary miscompile fixed here is that we weren't correctly handling
in-place elements in one half of a single-input v8i16 shuffle when
moving a dword of elements from that half to the other half. Some times,
we would clobber the in-place elements in forming the dword to move
across halves.
The fix to this involves forcibly marking the in-place inputs even when
there is no need to gather them into a dword, and to much more carefully
re-arrange the elements when grouping them into a dword to move across
halves. With these two changes we would generate correct shuffles for
the test case, but found another miscompile. There are also some random
perturbations of the generated shuffle pattern in SSE2. It looks like
a wash; more instructions in some cases fewer in others.
The second miscompile would corrupt the results into nonsense. This is
a buggy pattern in one of the added DAG combines. Mapping elements
through a PSHUFD when pairing redundant half-shuffles is *much* harder
than this code makes it out to be -- it requires reasoning about *all*
of where the input is used in the PSHUFD, not just one part of where it
is used. Plus, we can't combine a half shuffle *into* a PSHUFD but the
code didn't guard against it. I think this was just a bad idea and I've
just removed that aspect of the combine. No tests regress as
a consequence so seems OK.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214954 91177308-0d34-0410-b5e6-96231b3b80d8
not corrupting the mask by mutating it more times than intended. No
functionality changed (the results were non-overlapping so the old
version "worked" but was non-obvious).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214953 91177308-0d34-0410-b5e6-96231b3b80d8
a test case.
We also miscompile this test case which is showing a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.
The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.
The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214914 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to what I did with the two-source permutation recently. (It's
almost too similar so that we should consider generating the masking variants
with some tablegen help.)
Both encoding and intrinsic tests are added as well. For the latter, this is
what the IR that the intrinsic test on the clang side generates.
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214890 91177308-0d34-0410-b5e6-96231b3b80d8
This controls the number of operands in the disassembler's x86OperandSets
table. The entries describe how the operand is encoded and its type.
Not to surprisingly 5 operands is insufficient for AVX512. Consider
VALIGNDrrik in the next patch. These are its operand specifiers:
{ /* 328 */
{ ENCODING_DUP, TYPE_DUP1 },
{ ENCODING_REG, TYPE_XMM512 },
{ ENCODING_WRITEMASK, TYPE_VK8 },
{ ENCODING_VVVV, TYPE_XMM512 },
{ ENCODING_RM_CD64, TYPE_XMM512 },
{ ENCODING_IB, TYPE_IMM8 },
},
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214889 91177308-0d34-0410-b5e6-96231b3b80d8
This was currently part of lowering to PALIGNR with some special-casing to
make interlane shifting work. Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr we need to support both of these *at the same time*,
e.g. for SKX.
This patch breaks out the common code and then add support to check both of
these lowering options from LowerVECTOR_SHUFFLE.
I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214888 91177308-0d34-0410-b5e6-96231b3b80d8
They have different semantics (valign is interlane while palingr is intralane)
and palingr is still needed even in the AVX512 context. According to the
latest spec AVX512BW provides these.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214887 91177308-0d34-0410-b5e6-96231b3b80d8
The packed integer pattern becomes the DAG pattern for rri and the packed
float, another Pat<> inside the multiclass.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214885 91177308-0d34-0410-b5e6-96231b3b80d8
found by a single test reduced out of a failure on llvm-stress.
The start of the problem (and the crash) came when we tried to use
a find of a non-used slot in the move-to half of the move-mask as the
target for two bad-half inputs. While if lucky this will be the first of
a pair of slots which we can place the bad-half inputs into, it isn't
actually guaranteed. This really isn't surprising, not sure what I was
thinking. The correct way to find the two unused slots is to look for
one of the *used* slots. We know it isn't that pair, and we can use some
modular arithmetic to find the other pair by masking off the odd bit and
adding 2 modulo 4. With this, we reliably found a viable pair of slots
for the bad-half inputs.
Sadly, that wasn't enough. We also had a wrong code bug that surfaced
when I reduced the test case for this where we would use the same slot
twice for the two bad inputs. This is because both of the bad inputs
could be in odd slots originally and thus the mod-2 mapping would
actually be the same. The whole point of the weird indexing into the
pair of empty slots was to try to leverage when the end result needed
the two bad-half inputs to be paired in a dword and pre-pair them in the
correct orrientation. This is less important with the powerful combining
we're now doing, and also easier and more reliable to achieve be noting
that we add the bad-half inputs in order. Thus, if they are in a dword
pair, the low part of that will be the first input in the sequence.
Always putting that in the low element will just do the right thing in
addition to computing the correct result.
Test case added. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214849 91177308-0d34-0410-b5e6-96231b3b80d8
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range. This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.
Differential Revision: http://reviews.llvm.org/D4751
Patch by Vadim Chugunov!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214775 91177308-0d34-0410-b5e6-96231b3b80d8
use of PACKUS. It's cleaner that way.
I looked at implementing clever combine-based folding of PACKUS chains
into PSHUFB but it is quite hard and doesn't seem likely to be worth it.
The most annoying part would be detecting that the correct masking had
been done to use PACKUS-style instructions as a blend operation rather
than there being any saturating as is indicated by its name. We generate
really nice code for what few test cases I've come up with that aren't
completely contrived for this by just directly prefering PSHUFB and so
let's go with that strategy for now. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214707 91177308-0d34-0410-b5e6-96231b3b80d8
patterns of v16i8 shuffles.
This implements one of the more important FIXMEs for the SSE2 support in
the new shuffle lowering. We now generate the optimal shuffle sequence
for truncate-derived shuffles which show up essentially everywhere.
Unfortunately, this exposes a weakness in other parts of the shuffle
logic -- we can no longer form PSHUFB here. I'll add the necessary
support for that and other things in a subsequent commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214702 91177308-0d34-0410-b5e6-96231b3b80d8
I spent some time looking into a better or more principled way to handle
this. For example, by detecting arbitrary "unneeded" ORs... But really,
there wasn't any point. We just shouldn't build blatantly wrong code so
late in the pipeline rather than adding more stages and logic later on
to fix it. Avoiding this is just too simple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214680 91177308-0d34-0410-b5e6-96231b3b80d8
GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition:
lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
assert(STReturns == 0 || isMask_32(STReturns) && N <= 2);
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214672 91177308-0d34-0410-b5e6-96231b3b80d8
This makes EmitWindowsUnwindTables a virtual function and lowers the
implementation of the function to the X86WinCOFFStreamer. This method is a
target specific operation. This enables making the behaviour target dependent
by isolating it entirely to the target specific streamer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214664 91177308-0d34-0410-b5e6-96231b3b80d8
lowering with a small addition to it and adding PSHUFB combining.
There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).
However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214628 91177308-0d34-0410-b5e6-96231b3b80d8
Spotted this missed refactoring by inspection when reading code, and it
doesn't changethe functionality at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214627 91177308-0d34-0410-b5e6-96231b3b80d8
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.
This detects cases where a single register is used for both operands
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form which allows the existing DAG combine for
shuffle instructions to actually work at all.
As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.
Also as a consequence, a bunch of missed opportunities to form pshufb
now can be formed. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially if splatting the 0th byte (the
common case) as then we can use a zeroed register as the mask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214625 91177308-0d34-0410-b5e6-96231b3b80d8
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.
It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction was live across another inline-asm instruction, as shown
in the following sequence of machine instructions:
1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0
<rdar://problem/16952634>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214580 91177308-0d34-0410-b5e6-96231b3b80d8
so that we can use it to get the old-style JIT out of the subtarget.
This code should be removed when the old-style JIT is removed
(imminently).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214560 91177308-0d34-0410-b5e6-96231b3b80d8
This is consistent with how we parse them in a standalone .s file, and
inline assembly shouldn't differ.
This fixes errors about requiring more registers than available in
cases like this:
void f();
void __declspec(naked) g() {
__asm pusha
__asm call f
__asm popa
__asm ret
}
There are no registers available to pass the address of 'f' into the asm
blob. The asm should now directly call 'f'.
Tests will land in Clang shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214550 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds code to emits the StackMap section on ELF systems. This section is required to support llvm.experimental.stackmap and llvm.experimental.patchpoint intrinsics.
Reviewers: ributzka, echristo
Differential Revision: http://reviews.llvm.org/D4574
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214538 91177308-0d34-0410-b5e6-96231b3b80d8
This improves the diagnostics from the regular assembler, but more
importantly it fixes an assertion when parsing inline assembly. Test
landing in Clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214468 91177308-0d34-0410-b5e6-96231b3b80d8
This allows assembling the two new instructions, encls and enclu for the
SKX processor model.
Note the diffs are a bigger than what might think, but to fit the new
MRM_CF and MRM_D7 in things in the right places things had to be
renumbered and shuffled down causing a bit more diffs.
rdar://16228228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214460 91177308-0d34-0410-b5e6-96231b3b80d8
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8
UNDEF arguments are not ment to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214366 91177308-0d34-0410-b5e6-96231b3b80d8
This works towards making the Intel syntax asm matcher use a completely
different disambiguation strategy.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214352 91177308-0d34-0410-b5e6-96231b3b80d8