DenseSet used to be implemented as DenseMap<Key, char>, which usually doubled
the memory footprint of the map. Now we use a compressed set so the second
element uses no memory at all. This required some surgery on DenseMap as
all accesses to the bucket now have to go through methods; this should
have no impact on the behavior of DenseMap though. The new default bucket
type for DenseMap is a slightly extended std::pair as we expose it through
DenseMap's iterator and don't want to break any existing users.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223588 91177308-0d34-0410-b5e6-96231b3b80d8
All our patterns use MVT::i64, but the ISelLowering nodes were inconsistent in
their choice.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223551 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then subsequently
glue them back together), we should look for the easy case where we can just load
all elements in a single op.
An example of the codegen change is:
From:
vmovss 16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps $16, 20(%rdi), %xmm1, %xmm1
vinsertps $32, 24(%rdi), %xmm1, %xmm1
vinsertps $48, 28(%rdi), %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
retq
To:
vmovups (%rdi), %ymm0
retq
Differential Revision: http://reviews.llvm.org/D6536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223518 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617
In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox so that load/stores
off of rbp don't require sandbox adjustment (zeroing the top
32-bits, then filling those by adding r15).
However, llvm.frameaddress returns a pointer and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.
Test Plan: test/CodeGen/X86/frameaddr.ll
Reviewers: dschuff, nadav
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D6514
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223510 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2/AVX non-constant packed shift instructions only use the lower 64-bit of
the shift count.
This patch teaches function 'getTargetVShiftNode' how to deal with shifts
where the shift count node is of type MVT::i64.
Before this patch, function 'getTargetVShiftNode' only knew how to deal with
shift count nodes of type MVT::i32. This forced the backend to wrongly
truncate the shift count to MVT::i32, and then zero-extend it back to MVT::i64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223505 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, then it introduces an extra dag node to
extract the splat value from the shuffle. The splat value is then used
to generate a shift count of a target specific shift.
However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.
Example:
;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
%c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
%shl = shl <4 x i32> %a, %c
ret <4 x i32> %shl
}
;;
Before this patch, llc generated the following code (-mattr=+avx):
vpshufd $0, %xmm1, %xmm1 # xmm1 = xmm1[0,0,0,0]
vpxor %xmm2, %xmm2
vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
vpslld %xmm1, %xmm0, %xmm0
retq
With this patch, the redundant splat operation is removed from the code.
vpxor %xmm2, %xmm2
vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
vpslld %xmm1, %xmm0, %xmm0
retq
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked,
* Tag_ABI_FP_rounding
* Tag_ABI_FP_denormal
* Tag_ABI_FP_exceptions
* Tag_ABI_FP_user_exceptions
* Tag_ABI_FP_number_model
Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'
Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223454 91177308-0d34-0410-b5e6-96231b3b80d8
r32900 introduced custom lowering for fcopysign, with two checks to
change the magnitude value's type if it's larger/smaller than the sign
value's type. r32932 replaced that code for the smaller case.
r43205 did the same for the larger case, but left the old code, now dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223415 91177308-0d34-0410-b5e6-96231b3b80d8
Offset == 0 is a valid offset for kernel code model according to the
x86_64 System V ABI. Found by inspection, no testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223383 91177308-0d34-0410-b5e6-96231b3b80d8
r223113 added support for ARM modified immediate assembly syntax. Which
assumes all immediate operands are prefixed with a '#'. This assumption
is wrong as per the ARMARM - which recommends that all '#' characters be
treated optional. The current patch fixes this regression and adds a test
case. A follow-up patch will expand the test coverage to other instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223381 91177308-0d34-0410-b5e6-96231b3b80d8
So there are a couple of issues with indirect calls on thumbv4t. First, the most
'obvious' instruction, 'blx' isn't available until v5t. And secondly, the
next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
because the saved off pc has its thumb bit cleared, so when the callee returns
we end up in ARM mode.... yuck.
The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.
We could cut down on code size by sharing the landing pads between call sites
that are close enough, but for the moment let's do correctness first and look at
performance later.
Patch by: Iain Sandoe
http://reviews.llvm.org/D6519
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223380 91177308-0d34-0410-b5e6-96231b3b80d8
r223113 added support for ARM modified immediate assembly syntax. That patch
has broken support for immediate expressions, as in:
add r0, #(4 * 4)
It wasn't caught because we don't have any tests for this feature. This patch
fixes this regression and adds test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223366 91177308-0d34-0410-b5e6-96231b3b80d8
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.
Differential Revision: http://reviews.llvm.org/D6501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223360 91177308-0d34-0410-b5e6-96231b3b80d8
Replaced some logic that checked if a build_vector node is doing a splat of a
non-undef value with a call to method BuildVectorSDNode::getSplatValue().
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223354 91177308-0d34-0410-b5e6-96231b3b80d8
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
Commit on
- This patch fixes the bug described in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html
The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.
Impact at run-time on affected functions is::
- One extra store in the prologue, The store saves the base pointer.
- One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.
Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.
http://reviews.llvm.org/D6388
Patch by Arch Robison <arch.robison@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223329 91177308-0d34-0410-b5e6-96231b3b80d8
We had mistakenly believed that GCC's 'cc' referred to the entire
condition-code register (cr0 through cr7) -- and implemented this in r205630 to
fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM
to clobber too much with legacy code with inline asm using the 'cc' clobber.
Fixes PR21451.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223328 91177308-0d34-0410-b5e6-96231b3b80d8
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223323 91177308-0d34-0410-b5e6-96231b3b80d8
On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is
a register containing the address. As a result, this register cannot be r0, and
we need to enforce this register subclass constraint to prevent miscompiling
the code (we'd get this constraint for free with the usual instruction
definitions, but that scheme has no knowledge of how we end up printing inline
asm memory operands, and so here we need to do it 'by hand'). We can accomplish
this within the current address-mode selection framework by introducing an
explicit COPY_TO_REGCLASS node.
Fixes PR21443.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223318 91177308-0d34-0410-b5e6-96231b3b80d8