Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference. However, these are really only useful in GCC internal code
generation. In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.
This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing. If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.
I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217622 91177308-0d34-0410-b5e6-96231b3b80d8
Do
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.
This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217610 91177308-0d34-0410-b5e6-96231b3b80d8
r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
between the low and the high 256 bits of src1 into the low part of the
destination and another unpack of the low and high 256 bits of src2 into the
high part of the destination.
I don't think that's how unpack works. AVX512 unpack simply has more 128-bit
lanes but other than it works the same way as AVX. So in each 128-bit lane,
we're always interleaving certain parts of both operands rather different
parts of one of the operands.
E.g. for this:
__v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
__v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16,
24, 17, 25, 20, 28, 21, 29);
we generated punpcklps (notice how the elements of a and b are not interleaved
in the shuffle). In turn, c was set to this:
0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29
Obviously this should have just returned the mask vector of the shuffle
vector.
I mostly reverted this change and made sure the original AVX code worked
for 512-bit vectors as well.
Also updated the tests because they matched the logic from the code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217602 91177308-0d34-0410-b5e6-96231b3b80d8
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820
That patch allowed combining of splatted vector FP constants that are multiplied.
This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.
This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.
Differential Revision: http://reviews.llvm.org/D5254
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217599 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the operations are all implemented, we can test this sub-arch here.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217595 91177308-0d34-0410-b5e6-96231b3b80d8
David Blaikie's commits r217563 & r217564, which added shared_ptr to the
CostPool have fixed some memory leak issues exposed by the PBQP with
coalescing constraints.
The sanitizer bot was failing because of those leaks. Now that the leaks
are gone, we can reenable the aarch64/pbqp test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217580 91177308-0d34-0410-b5e6-96231b3b80d8
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.
By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217504 91177308-0d34-0410-b5e6-96231b3b80d8
using static relocation model and small code model.
Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.
Patch from: Keith Walker
Reviewers: Renato Golin, Tim Northover
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217503 91177308-0d34-0410-b5e6-96231b3b80d8
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.
Patch by Sergey Dmitrouk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217498 91177308-0d34-0410-b5e6-96231b3b80d8
This ensures the inline assembly register constraints are properly recognised in
TargetLowering::getRegForInlineAsmConstraint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217479 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In AT&T annotation for both x86_64 and x32 calls should be printed as
callq in assembly. It's only a matter of correct mnemonic, object output
is ok.
Test Plan: trivial test added
Reviewers: nadav, dschuff, craig.topper
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D5213
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217435 91177308-0d34-0410-b5e6-96231b3b80d8
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack.
Patch by Luqman Aden!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217410 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.
Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.
This removes some redundant code at -O0, and more importantly fixes PR20863.
Differential Revision: http://reviews.llvm.org/D5249
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217401 91177308-0d34-0410-b5e6-96231b3b80d8
I hadn't actually run all the tests yet and these combines have somewhat
surprisingly far reaching effects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217333 91177308-0d34-0410-b5e6-96231b3b80d8
support for MOVDDUP which is really important for matrix multiply style
operations that do lots of non-vector-aligned load and splats.
The original motivation was to add support for MOVDDUP as the lack of it
regresses matmul_f64_4x4 by 5% or so. However, all of the rules here
were somewhat suspicious.
First, we should always be using the floating point domain shuffles,
regardless of how many copies we have to make as a movapd is *crazy*
faster than the domain switching cost on some chips. (Mostly because
movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick
of the PSHUF instructions, there is no need to avoid canonicalizing on
UNPCK variants, so do that canonicalizing. This also ensures we have the
chance to form MOVDDUP. =]
Second, we assume SSE2 support when doing any vector lowering, and given
that we should just use UNPCKLPD and UNPCKHPD as they can operate on
registers or memory. If vectors get spilled or come from memory at all
this is going to allow the load to be folded into the operation. If we
want to optimize for encoding size (the only difference, and only
a 2 byte difference) it should be done *much* later, likely after RA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217332 91177308-0d34-0410-b5e6-96231b3b80d8
parsing (and latent bug in the instruction definitions).
This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.
The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:
insertps $192, %xmm0, %xmm1
insertps $-64, %xmm0, %xmm1
These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.
The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.
Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.
The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.
In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.
I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217310 91177308-0d34-0410-b5e6-96231b3b80d8
computation was totally wrong, but somehow it didn't really show up with
llc.
I've added an assert that triggers on multiple existing test cases and
updated one of them to show the correct value.
There appear to still be more bugs lurking around insertps's mask. =/
However, note that this only really impacts the new vector shuffle
lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217289 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r217211.
Both the bfd ld and gold outputs were valid. They were using a Rela relocation,
so the value present in the relocated location was not used, which caused me
to misread the output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217264 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes hitting the same negative base offset problem
that was already fixed for regular loads and stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217256 91177308-0d34-0410-b5e6-96231b3b80d8
round halfway cases away from zero
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217250 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle lowering for integer vectors and share it from v4i32, v8i16, and
v16i8 code paths.
Ironically, the SSE2 v16i8 code for this is now better than the SSSE3!
=] Will have to fix the SSSE3 code next to just using a single pshufb.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217240 91177308-0d34-0410-b5e6-96231b3b80d8
Patched by Sergey Dmitrouk.
This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:
GE -> GT
GT -> GE
LT -> LE
LE -> LT
and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217220 91177308-0d34-0410-b5e6-96231b3b80d8
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.
Patch by Asiri Rathnayake.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217159 91177308-0d34-0410-b5e6-96231b3b80d8
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.
Patch by Asiri Rathnayake.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217138 91177308-0d34-0410-b5e6-96231b3b80d8
vzext patterns and insert-element patterns that for SSE4 have dedicated
instructions.
With this we can enable the experimental mode in a regression test that
happens to cover some of the past set of issues. You can see that the
new logic does significantly better here on the floating point cases.
A follow-up to this change and the previous ones will hoist the logic
into helpers so it can be shared across element type sizes as in this
particular case it generalizes cleanly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217136 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r216803, because it might have broken the buildbot.
The issue is tracked in PR20842.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217120 91177308-0d34-0410-b5e6-96231b3b80d8
This change adds support for immediate and shift-left folding into logical
operations.
This fixes rdar://problem/18223183.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217118 91177308-0d34-0410-b5e6-96231b3b80d8