fsub X, +0 ==> X
fsub X, -0 ==> X, when we know X is not -0
fsub +/-0.0, (fsub -0.0, X) ==> X
fsub nsz +/-0.0, (fsub +/-0.0, X) ==> X
fsub nnan ninf X, X ==> 0.0
fadd nsz X, 0 ==> X
fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
where nnan and ninf have to occur at least once somewhere in this expression
fmul X, 1.0 ==> X
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169940 91177308-0d34-0410-b5e6-96231b3b80d8
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:
Instruction Relocation Symbol
addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x
addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
<use address in r3>
The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation. This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr. Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation. So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.
Most of the code is pretty straightforward. I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call. Something in the
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations. This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().
Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.
Comments welcome!
Thanks,
Bill
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169910 91177308-0d34-0410-b5e6-96231b3b80d8
try to reduce the width of this load, and would end up transforming:
(truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
(truncate (zextload i32 <ptr+4> as i64) to i32)
We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext because lshr always zext's the
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimize the code down to the proper single instruction:
movswl 6(...),%eax
Which is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:
movl 6(...), %eax
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169802 91177308-0d34-0410-b5e6-96231b3b80d8
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169796 91177308-0d34-0410-b5e6-96231b3b80d8
controls each of the abbreviation sets (only a single one at the
moment) and computes offsets separately as well for each set
of DIEs.
No real function change, ordering of abbreviations for the skeleton
CU changed but only because we're computing in a separate order. Fix
the testcase not to care.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169793 91177308-0d34-0410-b5e6-96231b3b80d8
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8
Analyse Phis under the starting assumption that they are NoAlias. Recursively
look at their inputs.
If they MayAlias/MustAlias there must be an input that makes them so.
Addresses bug 14351.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169788 91177308-0d34-0410-b5e6-96231b3b80d8
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.
This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.
Andy reviewed, tested and simplified this patch; Thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169744 91177308-0d34-0410-b5e6-96231b3b80d8
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.
The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169719 91177308-0d34-0410-b5e6-96231b3b80d8
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.
Reviewed by: Nadav
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169711 91177308-0d34-0410-b5e6-96231b3b80d8
The limit seems to break newer pythons (see PR13598) so just drop it for now.
Eventually lit should learn to set limits for its children instead of a global
limit in the makefile.
If some PPC bots fail after this change: That's a good thing, they actually run
clang tests now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169695 91177308-0d34-0410-b5e6-96231b3b80d8
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169683 91177308-0d34-0410-b5e6-96231b3b80d8
the VSRI instruction before it since it does not affect the MSB.
Thanks Craig Topper for suggesting this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169638 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169609 91177308-0d34-0410-b5e6-96231b3b80d8
Buildbots for some hosts may choose to build only their own backend in order to
maximise testing-turnaround time. Move the test into a prefixed directory so
lit's standard "backend specific" suppression can be done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169604 91177308-0d34-0410-b5e6-96231b3b80d8
by virtue of inbounds GEPs that preclude a null pointer.
This is a very common pattern in the code generated by std::vector and
other standard library routines which use allocators that test for null
pervasively. This is one step closer to teaching Clang+LLVM to be able
to produce an empty function for:
void f() {
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
}
Which is related to getting them to completely fold SmallVector
push_back sequences into constants when inlining and other optimizations
make that a possibility.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169573 91177308-0d34-0410-b5e6-96231b3b80d8
check if loads that happen in between stores alias with the first store in the
chain, only with the second store onwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169516 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of unconditionally storing origin with every application store,
only do this when the shadow of the stored value is != 0.
This change also delays instrumentation of stores until after the walk over
function's instructions, because adding new basic blocks confuses InstVisitor.
We only keep 1 origin value per 4 bytes of application memory. This change
fixes the bug when a store of a single clean byte wiped the origin for the
whole 4-byte area.
Since stores of uninitialized values are relatively uncommon, this change
improves performance of track-origins mode by 5% median and by up to 47% on
specs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169490 91177308-0d34-0410-b5e6-96231b3b80d8
Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169484 91177308-0d34-0410-b5e6-96231b3b80d8
RUN: a
RUN: b || true
as "a && (b || true)" in Tcl mode, and as "(a && b) || true" in sh mode.
Everyone seems to (quite reasonably) write tests assuming the Tcl behavior,
so use that in sh mode too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169441 91177308-0d34-0410-b5e6-96231b3b80d8
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169425 91177308-0d34-0410-b5e6-96231b3b80d8