This is just testing the largest merge mode for comdats. No need to use
hard-to-read names and fancy types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223665 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' instructions plus a
'movzwl'.
With this patch, the backend checks whether the target supports SSE4.1. If so,
it lets the shuffle legalizer deal with the expansion of the shift amount.
Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
%shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
%shl = shl <8 x i16> %A, %shamt
ret <8 x i16> %shl
}
;;
Before (with -mattr=+avx):
vmovd %xmm1, %eax
movzwl %ax, %eax
vmovd %eax, %xmm1
vpsllw %xmm1, %xmm0, %xmm0
retq
Now:
vpxor %xmm2, %xmm2, %xmm2
vpblendw $1, %xmm1, %xmm2, %xmm1
vpsllw %xmm1, %xmm0, %xmm0
retq
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223660 91177308-0d34-0410-b5e6-96231b3b80d8
As a fixup to r223616, follow the convention of naming the files after
the LLVM release whose bitcode they're maintaining compatibility with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223623 91177308-0d34-0410-b5e6-96231b3b80d8
Add assembly and bitcode tests that I neglected to add in r223564 (IR:
Disallow complicated function-local metadata) and r223574 (IR: Disallow
function-local metadata attachments).
Found a couple of bugs:
- The error message for function-local attachments gave the wrong line
number -- it indicated the next token (typically on the next line)
instead of the token that started the attachment. Fixed.
- Metadata arguments of the form `!{i32 0, i32 %v}` (or with the
  arguments reversed) fired an assertion in `ValueEnumerator` in LLVM
  v3.5, so this never really worked; I suppose it was "fixed" by
  r223564.
(Thanks to dblaikie for pointing out my omission.)
Part of PR21532.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223616 91177308-0d34-0410-b5e6-96231b3b80d8
matching offsets. I don't expect this to really matter, but it's what the
latest incarnation of my script for maintaining these tests happens to
produce, and so it's simpler for me if everything matches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223613 91177308-0d34-0410-b5e6-96231b3b80d8
script. Notably, this folds all the SSE cases together into a single
FileCheck block. It also adds a vex prefix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223610 91177308-0d34-0410-b5e6-96231b3b80d8
Consider:
void f() {}
void __attribute__((weak)) g() {}
bool b = &f != &g;
It's possible for g to resolve to f if --defsym=g=f is passed on to the
linker.
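The same situation at the IR level (a sketch, not the commit's test case):
;;
define void @f() {
  ret void
}

define weak void @g() {
  ret void
}

define i1 @cmp() {
  %ne = icmp ne void ()* @f, @g
  ret i1 %ne
}
;;
Because @g has weak linkage, the comparison cannot be constant-folded at
compile time.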
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223585 91177308-0d34-0410-b5e6-96231b3b80d8
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also experimented with exploiting unreachable defaults by omitting the
range check for jump tables, but that always ended up with a non-negligible
binary-size increase. It might be worth looking into further.
SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.
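For illustration, here is a sketch of the kind of switch this applies to (a
made-up example, not taken from the patch):
;;
define i32 @pick(i32 %x) {
entry:
  switch i32 %x, label %default [
    i32 0, label %a
    i32 1, label %b
    i32 2, label %c
  ]
default:
  unreachable
a:
  ret i32 10
b:
  ret i32 20
c:
  ret i32 30
}
;;
Because the default is unreachable, the lowering may assume %x is always one
of the listed case values.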
Differential Revision: http://reviews.llvm.org/D6510
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223566 91177308-0d34-0410-b5e6-96231b3b80d8
Disallow complex types of function-local metadata. The only valid
function-local metadata is an `MDNode` whose sole argument is a
non-metadata function-local value.
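For example (hypothetical nodes, not taken from the new tests):
;;
; still valid:  !{i32 %v}               ; sole operand is a non-metadata local value
; disallowed:   !{i32 0, i32 %v}        ; has more than one operand
; disallowed:   !{metadata !{i32 %v}}   ; operand is itself metadata
;;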
Part of PR21532.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223564 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then glue them
back together), we should look for the easy case where we can just load all
of the elements in a single op.
An example of the codegen change is:
From:
vmovss 16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps $16, 20(%rdi), %xmm1, %xmm1
vinsertps $32, 24(%rdi), %xmm1, %xmm1
vinsertps $48, 28(%rdi), %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
retq
To:
vmovups (%rdi), %ymm0
retq
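For reference, a hypothetical IR pattern of this shape (not the exact test
case from the PR), written in the typed-pointer syntax of the time:
;;
define <8 x float> @build_from_loads(float* %p) {
  %p1 = getelementptr float* %p, i64 1
  %p2 = getelementptr float* %p, i64 2
  %p3 = getelementptr float* %p, i64 3
  %p4 = getelementptr float* %p, i64 4
  %p5 = getelementptr float* %p, i64 5
  %p6 = getelementptr float* %p, i64 6
  %p7 = getelementptr float* %p, i64 7
  %e0 = load float* %p
  %e1 = load float* %p1
  %e2 = load float* %p2
  %e3 = load float* %p3
  %e4 = load float* %p4
  %e5 = load float* %p5
  %e6 = load float* %p6
  %e7 = load float* %p7
  %v0 = insertelement <8 x float> undef, float %e0, i32 0
  %v1 = insertelement <8 x float> %v0, float %e1, i32 1
  %v2 = insertelement <8 x float> %v1, float %e2, i32 2
  %v3 = insertelement <8 x float> %v2, float %e3, i32 3
  %v4 = insertelement <8 x float> %v3, float %e4, i32 4
  %v5 = insertelement <8 x float> %v4, float %e5, i32 5
  %v6 = insertelement <8 x float> %v5, float %e6, i32 6
  %v7 = insertelement <8 x float> %v6, float %e7, i32 7
  ret <8 x float> %v7
}
;;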
Differential Revision: http://reviews.llvm.org/D6536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223518 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617
In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox, so that loads/stores
off of rbp don't require sandbox adjustment (zeroing the top
32 bits, then filling them in by adding r15).
However, llvm.frameaddress returns a pointer, and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type-check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.
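A minimal example of the intrinsic in question (a sketch, not the test
itself):
;;
declare i8* @llvm.frameaddress(i32)

define i8* @fa() {
  %fp = call i8* @llvm.frameaddress(i32 0)
  ret i8* %fp
}
;;
On NaCl64 the returned pointer is 32-bit, so the copy out of the frame
pointer register must use ebp rather than rbp.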
Test Plan: test/CodeGen/X86/frameaddr.ll
Reviewers: dschuff, nadav
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D6514
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223510 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2/AVX non-constant packed shift instructions only use the lower 64 bits of
the shift count.
This patch teaches function 'getTargetVShiftNode' how to deal with shifts
where the shift count node is of type MVT::i64.
Before this patch, function 'getTargetVShiftNode' only knew how to deal with
shift count nodes of type MVT::i32. This forced the backend to wrongly
truncate the shift count to MVT::i32, and then zero-extend it back to MVT::i64.
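A hypothetical example of an affected pattern, where the splatted shift
count is an MVT::i64 value:
;;
define <2 x i64> @shift_by_scalar(<2 x i64> %a, i64 %n) {
  %ins = insertelement <2 x i64> undef, i64 %n, i32 0
  %count = shufflevector <2 x i64> %ins, <2 x i64> undef, <2 x i32> zeroinitializer
  %shl = shl <2 x i64> %a, %count
  ret <2 x i64> %shl
}
;;
With this change, the i64 shift count can feed the target shift node directly
instead of being truncated and then re-extended.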
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223505 91177308-0d34-0410-b5e6-96231b3b80d8
When a loop gets bundled up, its outgoing edges are quite large, and can
just barely overflow 64 bits. If one successor has multiple incoming
edges -- and that successor is getting all the incoming mass --
combining just its edges can overflow. Handle that by saturating rather
than asserting.
This fixes PR21622.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223500 91177308-0d34-0410-b5e6-96231b3b80d8
No functional changes. Got myself bitten in r223113 when adding support for
modified immediate syntax (regressions reported by joerg@britannica.bec.de,
fixes in r223366 and r223381). Our assembler tests did not cover several
different syntax variants. This patch expands the test coverage to check for
the following cases:
1. Modified immediate operands may be expressed with expressions, as in #(4 * 2)
instead of #8.
2. Modified immediate operands may be _optionally_ prefixed by a '#' symbol or a
'$' symbol.
3. Certain instructions (e.g. ADD) support single-input-register variants;
[ADD r0, #mod_imm] is the same as [ADD r0, r0, #mod_imm].
4. Certain instructions have aliases which convert plain immediates to modified
immediates. For example, [ADD r0, -10] is not valid because -10 (in two's
complement) cannot be encoded as a modified immediate, but ARMInstrInfo.td
defines an alias which can transform it into [SUB r0, 10].
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223475 91177308-0d34-0410-b5e6-96231b3b80d8
Do not realign origin address if the corresponding application
address is at least 4-byte-aligned.
Saves 2.5% code size in track-origins mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223464 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, it introduces an extra DAG node to
extract the splat value from the shuffle. The splat value is then used
to generate the shift count of a target-specific shift.
However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.
Example:
;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
%c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
%shl = shl <4 x i32> %a, %c
ret <4 x i32> %shl
}
;;
Before this patch, llc generated the following code (-mattr=+avx):
vpshufd $0, %xmm1, %xmm1 # xmm1 = xmm1[0,0,0,0]
vpxor %xmm2, %xmm2, %xmm2
vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
vpslld %xmm1, %xmm0, %xmm0
retq
With this patch, the redundant splat operation is removed from the code:
vpxor %xmm2, %xmm2, %xmm2
vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
vpslld %xmm1, %xmm0, %xmm0
retq
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked:
* Tag_ABI_FP_rounding
* Tag_ABI_FP_denormal
* Tag_ABI_FP_exceptions
* Tag_ABI_FP_user_exceptions
* Tag_ABI_FP_number_model
Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'
Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223454 91177308-0d34-0410-b5e6-96231b3b80d8
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223442 91177308-0d34-0410-b5e6-96231b3b80d8
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, between each instruction's SU and
the SUs of the instructions that come before it.
In the test case, we start before post-RA scheduling with a block that looks
like this:
...
INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
...
%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
...
%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>
where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).
In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8,
because both the INLINEASM's def of %CC and the ANDIo8's def of %CR0 are
dead. This ignores, however, that the ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency must still exist.
To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.
Fixes PR21742.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223440 91177308-0d34-0410-b5e6-96231b3b80d8
with fixes. Includes the move of the llvm-objdump tests for universal files
into an X86 directory, and the fix for the failure on Linux that Rafael
tracked down with asan. I had both Jim Grosbach and Adam Nemet look over the
second fix, since I could not set up asan to reproduce the failure with the
old version but not with the fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223416 91177308-0d34-0410-b5e6-96231b3b80d8