* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)
Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.
All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.
Patch by Dan Bailey
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133253 91177308-0d34-0410-b5e6-96231b3b80d8
The reserved R14-R15 are always saved in the prolog, and using CSRs
starting from R13 allows them to be saved in one instruction.
Thanks to Anton for explaining this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133233 91177308-0d34-0410-b5e6-96231b3b80d8
Also switch the return type to ArrayRef<unsigned> which works out nicely
for ARM's implementation of this function because of the clever ArrayRef
constructors.
The name change indicates that the returned allocation order may contain
reserved registers as has been the case for a while.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133216 91177308-0d34-0410-b5e6-96231b3b80d8
This is intended to support using REG_SEQUENCE SDNode's with type MVT::untyped, and is part of the long road to eliminating some of the hacks we currently use to support register pairs and other strange constraints, particularly on ARM NEON.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133178 91177308-0d34-0410-b5e6-96231b3b80d8
accumulator forwarding. Specifically (from SVN log entry):
Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2
Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133127 91177308-0d34-0410-b5e6-96231b3b80d8
This simplifies many of the target description files since it is common
for register classes to be related or contain sequences of numbered
registers.
I have verified that this doesn't change the files generated by TableGen
for ARM and X86. It alters the allocation order of MBlaze GPR and Mips
FGR32 registers, but I believe the change is benign.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133105 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133080 91177308-0d34-0410-b5e6-96231b3b80d8
Note that this actually changes code generation, and someone who
understands this target better should check the changes.
- R12Q is now allocatable. I think it was omitted from the allocation
order by mistake since it isn't reserved. It as apparently used as a
GOT pointer sometimes, and it should probably be reserved if that is
the case.
- The GR64 registers are allocated in a different order now. The
register allocator will automatically put the CSRs last. There were
other changes to the order that may have been significant.
The test fix is because r0 and r1 swapped places in the allocation order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133067 91177308-0d34-0410-b5e6-96231b3b80d8
At the time I wrote this code (circa 2007), TargetRegisterInfo was using a std::set to perform these queries. Switching to the static hashtables was an obvious improvement, but in reality there's no reason to do anything other than scan.
With this change, total LLC time on a whole-program 403.gcc is reduced by approximately 1.5%, almost all of which comes from a 15% reduction in LiveVariables time. It also reduces the binary size of LLC by 86KB, thanks to eliminating a bunch of very large static tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133051 91177308-0d34-0410-b5e6-96231b3b80d8
the bits being cleared by the AND are not demanded by the BFI.
The previous BFI dag combine rule was actually incorrect (or used to be
correct until BFI representation changed).
rdar://9609030
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133034 91177308-0d34-0410-b5e6-96231b3b80d8
Roman, since you're writing tests for other PPC-SVR4 vararg-related stuff, would you mind writing a test for this?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133018 91177308-0d34-0410-b5e6-96231b3b80d8
The logic for reserving R4 for use as a scratch needs to match that for
actually using it. Also, it's not necessary for immediate <=508, so adjust
the value checked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132934 91177308-0d34-0410-b5e6-96231b3b80d8
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132882 91177308-0d34-0410-b5e6-96231b3b80d8
functionality change.
Later on, we'll use the flag to emit SEH pseudo-ops that describe how the
call frame was built.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132880 91177308-0d34-0410-b5e6-96231b3b80d8
memcpy/memset symbol doesn't get marked up correctly in PIC modes otherwise.
Should fix llvm-x86_64-linux-checks buildbot. Followup to r132864.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132869 91177308-0d34-0410-b5e6-96231b3b80d8
causing an assertion failure downstream. This fixes <rdar://problem/9562908>.
This really seems like it should always be set at CCState creation time, so mistakes like
this can never happen. I'll take a look at doing that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132811 91177308-0d34-0410-b5e6-96231b3b80d8
VK_PPC_{HA,LO}16 into darwin and gas variants.
Darwin wants {ha,lo}16(symbol) while gnu as wants symbol@{ha,l}.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132802 91177308-0d34-0410-b5e6-96231b3b80d8
The register allocators automatically filter out reserved registers and
place the callee saved registers last in the allocation order, so custom
methods are no longer necessary just for that.
Some targets still use custom allocation orders:
ARM/Thumb: The high registers are removed from GPR in thumb mode. The
NEON allocation orders prefer to use non-VFP2 registers first.
X86: The GR8 classes omit AH-DH in x86-64 mode to avoid REX trouble.
SystemZ: Some of the allocation orders are omitting R12 aliases without
explanation. I don't understand this target well enough to fix that. It
looks like all the boilerplate could be removed by reserving the right
registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132781 91177308-0d34-0410-b5e6-96231b3b80d8