The DAG combiner would try to reduce the width of this load, and would end up transforming:
(truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
(truncate (zextload i32 <ptr+4> as i64) to i32)
We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext, because lshr always zero-extends its
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimizes the code down to the proper single instruction:
movswl 6(...), %eax
This is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:
movl 6(...), %eax
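As a hedged illustration, here is one C++-level way such a pattern can arise
(hypothetical code, not the original test case): a signed 48-bit bit-field is
sign-extended on load, logically shifted, then truncated.

// Hypothetical reproduction: the bit-field read sign-extends (sextload),
// the cast-plus-shift is a logical shift right (lshr), and the cast to
// int truncates. The result must keep the field's sign bits, hence
// movswl rather than movl.
struct S { long long f : 48; };
int high16(const S *s) {
  return (int)((unsigned long long)s->f >> 32);
}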
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169802 91177308-0d34-0410-b5e6-96231b3b80d8
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
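A minimal sketch of that marking, using LLVM's real CallInst::setTailCall()
API (the helper and its unconditional form are illustrative, not the actual
clang patch):

#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative helper: unconditionally mark the ARC return-value
// calls as tail calls, even at -O0.
static void markARCReturnValueCallTail(CallInst &CI) {
  if (const Function *F = CI.getCalledFunction())
    if (F->getName() == "objc_autoreleaseReturnValue" ||
        F->getName() == "objc_retainAutoreleaseReturnValue")
      CI.setTailCall();
}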
Part of rdar://12553082
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169796 91177308-0d34-0410-b5e6-96231b3b80d8
1. Teach it to use overlapping unaligned loads / stores to copy / set the trailing
bytes, e.g. on x86, use two pairs of movups / movaps for 17-31 byte copies (see
the sketch below).
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (this required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned loads / stores more aggressively if target hooks indicate they
are "fast".
5. Update the ARM target hooks to use unaligned loads / stores, e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
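A hedged C++ sketch of the overlapping-copy idea from item 1 (the 17-31 byte
case; the helper is hypothetical):

#include <cstddef>
#include <cstring>

// Copy 17-31 bytes as two possibly overlapping 16-byte chunks instead
// of one 16-byte copy plus a byte-by-byte tail; each fixed-size memcpy
// can lower to a single unaligned 16-byte load / store.
static void copy17to31(char *dst, const char *src, size_t n) {
  // Precondition: 17 <= n <= 31.
  std::memcpy(dst, src, 16);                    // leading 16 bytes
  std::memcpy(dst + n - 16, src + n - 16, 16);  // overlapping tail
}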
rdar://12760078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8
the VSRI instruction before it since it does not affect the MSB.
Thanks to Craig Topper for suggesting this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169638 91177308-0d34-0410-b5e6-96231b3b80d8
check whether loads that happen in between the stores alias with the first store
in the chain, only with the second store onwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169516 91177308-0d34-0410-b5e6-96231b3b80d8
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169425 91177308-0d34-0410-b5e6-96231b3b80d8
The count attribute is more accurate with regard to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbounded array by setting the count to -1 instead of setting the
lower bound to 1 and the upper bound to 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169312 91177308-0d34-0410-b5e6-96231b3b80d8
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
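The three cases, illustrated as C declarations:

int a[1];        // count = 1: normal constant-sized array
int b[0];        // count = 0: zero-element array (a GNU extension)
extern int c[];  // count = -1: unbounded array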
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169218 91177308-0d34-0410-b5e6-96231b3b80d8
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.
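As a rough illustration (not the in-tree test case), each statement below is
such a uniform bottom-up subtree: loads feed multiplies, multiplies feed an
add, and each tree sinks into a single store.

// A 2x2 matrix multiply; every element of c is an independent
// load -> mul -> add -> store subtree of the DAG.
void mm2x2(float *c, const float *a, const float *b) {
  c[0] = a[0]*b[0] + a[1]*b[2];
  c[1] = a[0]*b[1] + a[1]*b[3];
  c[2] = a[2]*b[0] + a[3]*b[2];
  c[3] = a[2]*b[1] + a[3]*b[3];
}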
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168773 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a hole in the "cheap" alias analysis logic implemented within
the DAG builder itself, regardless of whether proper alias analysis is
enabled. It now handles this pattern, produced by LSR+CodeGenPrepare:
%sunkaddr1 = ptrtoint i32* %obj to i64
%sunkaddr2 = add i64 %sunkaddr1, %lsr.iv
%sunkaddr3 = inttoptr i64 %sunkaddr2 to i32*
store i32 %v, i32* %sunkaddr3
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168768 91177308-0d34-0410-b5e6-96231b3b80d8
when the destination register is wider than the memory load.
These load instructions load from m32 or m64 and set the upper bits to zero,
while the folded instructions may accept m128.
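For instance, at the intrinsics level (an illustration of the semantics, not
the backend change): _mm_load_ss compiles to exactly such a load, reading 32
bits and zeroing the upper lanes, whereas an instruction with a folded memory
operand, such as addps xmm, m128, reads a full 16 bytes.

#include <immintrin.h>

// movss from m32: loads 4 bytes and zeroes the upper 96 bits of the
// destination register, so it never reads past the scalar.
__m128 load_scalar(const float *p) {
  return _mm_load_ss(p);  // result = { *p, 0, 0, 0 }
}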
rdar://12721174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168710 91177308-0d34-0410-b5e6-96231b3b80d8
This pass was conservative in that it always reserved the FP to enable dynamic
stack realignment, which allowed the RA to use aligned spills for vector
registers. This happened even when spills were not necessary. The RA has
since been improved to use unaligned spills when necessary.
The new behavior is to realign the stack if the frame pointer was already
reserved for some other reason, but not to reserve the frame pointer just
because a function contains vector virtual registers.
Part of rdar://12719844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168627 91177308-0d34-0410-b5e6-96231b3b80d8
The stack realignment code was fixed to work when there is stack realignment and
a dynamic alloca is present, so this shouldn't cause correctness issues anymore.
Note that this also enables generation of AVX instructions for memset
under the assumptions:
- Unaligned loads/stores are always fast on CPUs supporting AVX
- AVX is not slower than SSE
We may need some tweaked heuristics if one of those assumptions turns out not to
be true.
Effectively reverts r58317. Part of PR2962.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167967 91177308-0d34-0410-b5e6-96231b3b80d8
Revert this change temporarily, as it is breaking the gdb bots.
This reverts commit r167806/e7ff4c14b157746b3e0228d2dce9f70712d1c126.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167886 91177308-0d34-0410-b5e6-96231b3b80d8
chain is correctly set up.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
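A hedged sketch of that chain fixup (illustrative names, not the exact
in-tree code):

#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// After building the new VZEXT_LOAD, redirect users of the original
// load's output chain (result 1 of a load node) to the new node's
// chain, so stores ordered after the old load stay ordered after
// its replacement.
static void transferChain(SelectionDAG &DAG, SDNode *OldLoad,
                          SDValue NewLoad) {
  DAG.ReplaceAllUsesOfValueWith(SDValue(OldLoad, 1),
                                NewLoad.getValue(1));
}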
rdar://12684358
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167859 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a type 'int a[1]' and a type 'int b[0]', the generated DWARF is the
same for both of them because we use the 'upper_bound' attribute. Instead, use
the 'count' attribute, which gives the correct number of elements in the array.
<rdar://problem/12566646>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167806 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix operand order for atomic sub, where the minuend is the value
loaded from memory and the subtrahend is the parameter specified.
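In C++ terms, the required semantics (an illustration, not the backend code):

#include <atomic>

// Memory holds the minuend and the argument is the subtrahend: the
// update is *p = *p - v, never v - *p.
int atomic_sub_example(std::atomic<int> &p, int v) {
  return p.fetch_sub(v);  // returns the old value; stores old - v
}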
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167718 91177308-0d34-0410-b5e6-96231b3b80d8