1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133503 91177308-0d34-0410-b5e6-96231b3b80d8
source vector type is to be split while the target vector is to be promoted.
(eg: <4 x i64> -> <4 x i8> )
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133424 91177308-0d34-0410-b5e6-96231b3b80d8
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133337 91177308-0d34-0410-b5e6-96231b3b80d8
range without a libcall to a new mulo<mode> libcall
that we'd have to create.
Finishes the rest of rdar://9090077 and rdar://9210061
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133318 91177308-0d34-0410-b5e6-96231b3b80d8
* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)
Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.
All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.
Patch by Dan Bailey
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133253 91177308-0d34-0410-b5e6-96231b3b80d8
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).
<rdar://problem/9624323>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133210 91177308-0d34-0410-b5e6-96231b3b80d8
accumulator forwarding. Specifically (from SVN log entry):
Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2
Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133127 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133080 91177308-0d34-0410-b5e6-96231b3b80d8
Note that this actually changes code generation, and someone who
understands this target better should check the changes.
- R12Q is now allocatable. I think it was omitted from the allocation
order by mistake since it isn't reserved. It as apparently used as a
GOT pointer sometimes, and it should probably be reserved if that is
the case.
- The GR64 registers are allocated in a different order now. The
register allocator will automatically put the CSRs last. There were
other changes to the order that may have been significant.
The test fix is because r0 and r1 swapped places in the allocation order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133067 91177308-0d34-0410-b5e6-96231b3b80d8