This
define float @foo(float %x, float %y) nounwind readnone {
entry:
%0 = tail call float @copysignf(float %x, float %y) nounwind readnone
ret float %0
}
Was compiled to:
vmov s0, r1
bic r0, r0, #-2147483648
vmov s1, r0
vcmpe.f32 s0, #0
vmrs apsr_nzcv, fpscr
it lt
vneglt.f32 s1, s1
vmov r0, s1
bx lr
This fails to copy the sign of -0.0f because it's lost during the float to int
conversion. Also, it's sub-optimal when the inputs are in GPR registers.
Now it uses integer and + or operations when it's profitable. And it's correct!
lsrs r1, r1, #31
bfi r0, r1, #31, #1
bx lr
rdar://8984306
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125357 91177308-0d34-0410-b5e6-96231b3b80d8
gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125271 91177308-0d34-0410-b5e6-96231b3b80d8
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125267 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125266 91177308-0d34-0410-b5e6-96231b3b80d8
When matching operands for a candidate opcode match in the auto-generated
AsmMatcher, check each operand against the expected operand match class.
Previously, operands were classified independently of the opcode being
handled, which led to difficulties when operand match classes were
more complicated than simple subclass relationships.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125245 91177308-0d34-0410-b5e6-96231b3b80d8
could end up removing a different function than we intended because it was
functionally equivalent, then end up with a comparison of a function against
itself in the next round of comparisons (the one in the function set and the
one on the deferred list). To fix this, I introduce a choice in the form of
comparison for ComparableFunctions, either normal or "pointer only" used to
find exact Function*'s in lookups.
Also add some debugging statements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125180 91177308-0d34-0410-b5e6-96231b3b80d8
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125014 91177308-0d34-0410-b5e6-96231b3b80d8
failures with relocations.
The code committed is a first cut at compatibility for emitted relocations in
ELF .o.
Why do this? because existing ARM tools like emitting relocs symbols as
explicit relocations, not as section-offset relocs.
Result is that with these changes,
1) relocs are now substantially identical what to gcc outputs.
2) larger apps (including many spec2k tests) compile, cross-link, and pass
Added reminder fixme to tests for future conversion to .s form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124996 91177308-0d34-0410-b5e6-96231b3b80d8
Unified EmitTextAttribute for both Asm and Obj emission (.cpu only)
Added necessary cortex-A8 related attrs for codegen compat tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124995 91177308-0d34-0410-b5e6-96231b3b80d8
(yes, this is different from R_ARM_CALL)
- Adds a new method getARMBranchTargetOpValue() which handles the
necessary distinction between the conditional and unconditional br/bl
needed for ARM/ELF
At least for ARM mode, the needed fixup for conditional versus unconditional
br/bl is identical, but the ARM docs and existing ARM tools expect this
reloc type...
Added a few FIXME's for future naming fixups in ARMInstrInfo.td
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124895 91177308-0d34-0410-b5e6-96231b3b80d8
auto-simplifier). This has a big impact on Ada code, but not much else.
Unfortunately the impact is mostly negative! This is due to PR9004 (aka
SCCP failing to resolve conditional branch conditions in the destination
blocks of the branch), in which simple correlated expressions are not
resolved but complicated ones are, so simplifying has a bad effect!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124788 91177308-0d34-0410-b5e6-96231b3b80d8