1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133503 91177308-0d34-0410-b5e6-96231b3b80d8
ops.
This is a rewrite of the IV simplification algorithm used by
-disable-iv-rewrite. To avoid perturbing the default mode, I
temporarily split the driver and created SimplifyIVUsersNoRewrite. The
idea is to avoid doing opcode/pattern matching inside
IndVarSimplify. SCEV already does it. We want to optimize with the
full generality of SCEV, but optimize def-use chains top down on-demand rather
than rewriting the entire expression bottom-up. This was easy to do
for operations that SCEV can prove are identity function. So we're now
eliminating bitmasks and zero extends this way.
A result of this rewrite is that indvars -disable-iv-rewrite no longer
requires IVUsers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133502 91177308-0d34-0410-b5e6-96231b3b80d8
source vector type is to be split while the target vector is to be promoted.
(eg: <4 x i64> -> <4 x i8> )
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133424 91177308-0d34-0410-b5e6-96231b3b80d8
In cases such as the attached test, where the case value for a switch
destination is used in a phi node that follows the destination, it
might be better to replace that value with the condition value of the
switch, so that more blocks can be folded away with
TryToSimplifyUncondBranchFromEmptyBlock because there are less
conflicts in the phi node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133344 91177308-0d34-0410-b5e6-96231b3b80d8
type's bitwidth matches the (allocated) size of the alloca. This severely
pessimizes vector scalar replacement when the only vector type being used is
something like <3 x float> on x86 or ARM whose allocated size matches a
<4 x float>.
I hope to fix some of the flawed assumptions about allocated size throughout
scalar replacement and reenable this in most cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133338 91177308-0d34-0410-b5e6-96231b3b80d8
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133337 91177308-0d34-0410-b5e6-96231b3b80d8
range without a libcall to a new mulo<mode> libcall
that we'd have to create.
Finishes the rest of rdar://9090077 and rdar://9210061
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133318 91177308-0d34-0410-b5e6-96231b3b80d8
* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)
Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.
All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.
Patch by Dan Bailey
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133253 91177308-0d34-0410-b5e6-96231b3b80d8
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).
<rdar://problem/9624323>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133210 91177308-0d34-0410-b5e6-96231b3b80d8
REQUIRES: Asserts
REQUIRES: Debug
This required chaining test configuration properties. It seems like a
generally good thing to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133131 91177308-0d34-0410-b5e6-96231b3b80d8
accumulator forwarding. Specifically (from SVN log entry):
Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2
Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133127 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133080 91177308-0d34-0410-b5e6-96231b3b80d8
Note that this actually changes code generation, and someone who
understands this target better should check the changes.
- R12Q is now allocatable. I think it was omitted from the allocation
order by mistake since it isn't reserved. It as apparently used as a
GOT pointer sometimes, and it should probably be reserved if that is
the case.
- The GR64 registers are allocated in a different order now. The
register allocator will automatically put the CSRs last. There were
other changes to the order that may have been significant.
The test fix is because r0 and r1 swapped places in the allocation order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133067 91177308-0d34-0410-b5e6-96231b3b80d8
the bits being cleared by the AND are not demanded by the BFI.
The previous BFI dag combine rule was actually incorrect (or used to be
correct until BFI representation changed).
rdar://9609030
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133034 91177308-0d34-0410-b5e6-96231b3b80d8
converted to add x,x if x is a undef. add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133022 91177308-0d34-0410-b5e6-96231b3b80d8
types (with power of two types such as 8,16,32 .. 512).
Fix a bug in the integer promotion of bitcast nodes. Enable integer expanding
only if the target of the conversion is an integer (when the type action is
scalarize).
Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132985 91177308-0d34-0410-b5e6-96231b3b80d8
cache prefetch and now that the info from "prefetch" to "ARMPreload" is present,
only add a testcase for PLI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132978 91177308-0d34-0410-b5e6-96231b3b80d8
sharp all or nothing transition when one extra predecessor was added. Now
we still test first ones for merging.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132974 91177308-0d34-0410-b5e6-96231b3b80d8
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132926 91177308-0d34-0410-b5e6-96231b3b80d8
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.
The attached test case provides an example where the fast allocator
would spill a register when other registers are available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132900 91177308-0d34-0410-b5e6-96231b3b80d8
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132882 91177308-0d34-0410-b5e6-96231b3b80d8
causing an assertion failure downstream. This fixes <rdar://problem/9562908>.
This really seems like it should always be set at CCState creation time, so mistakes like
this can never happen. I'll take a look at doing that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132811 91177308-0d34-0410-b5e6-96231b3b80d8
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132809 91177308-0d34-0410-b5e6-96231b3b80d8
pad, separating the exception and selector calls from the new lpad. Teaching
it not to do that, or to properly adjust the CFG afterwards, is out of
scope because it would require the other edges to the landing pad to be split
as well (effectively). Instead, just recover from the most likely cases
during inlining. The best long-term solution is to change the exception
representation and commit to either requiring or not requiring the more
complex edge-splitting logic; this is just a shorter-term hack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132799 91177308-0d34-0410-b5e6-96231b3b80d8
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132766 91177308-0d34-0410-b5e6-96231b3b80d8
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132759 91177308-0d34-0410-b5e6-96231b3b80d8
- cfi directives are not inserted at the right location or in the right order.
- The source MachineLocation for the cfi directive that changes the cfa register
to $fp should be MachineLocation::VirtualFP.
- A PROLOG_LABEL that marks the beginning of cfi_offset directives for
callee-saved register is emitted even when no callee-saved registers are
saved.
- When a callee-saved double precision register is saved, two cfi_offset
directives, one for each of the paired single precision registers, should be
emitted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132703 91177308-0d34-0410-b5e6-96231b3b80d8
When local live range splitting creates a live range with the same
number of instructions as the old range, mark it as RS_Local. When such
a range is seen again, require that it be split in a way that reduces
the number of instructions. That guarantees we are making progress while
still being able to perform 3 -> 2+3 splits as required by PR10070.
This also means that the PrevSlot map is no longer needed. This was also
used to estimate new spill weights, but that is no longer necessary
after slotIndexes::insertMachineInstrInMaps() got the extra Late
insertion argument.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132697 91177308-0d34-0410-b5e6-96231b3b80d8
then we don't want to set the destination in the indirect branch to the
destination. This is because the indirect branch needs its destinations to have
had their block addresses taken. This isn't so of the new critical edge that's
split during this process. If it turns out that the destination block has only
one predecessor, and that being a BB with an indirect branch, then it won't be
marked as 'used' and may be removed.
PR10072
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132638 91177308-0d34-0410-b5e6-96231b3b80d8
redundant with partially-aliasing loads.
When computing what portion of a clobbering load value is needed,
it doesn't consider phi-translation which may have occurred
between the clobbing load and the redundant load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132631 91177308-0d34-0410-b5e6-96231b3b80d8
A TableGen backend can define how certain classes can be expanded into
ordered sets of defs, typically by evaluating a specific field in the
record. The SetTheory class can then evaluate DAG expressions that refer
to these named sets.
A number of standard set and list operations are predefined, and the
backend can add more specialized operators if needed. The -print-sets
backend is used by SetTheory.td to provide examples.
This is intended to simplify how register classes are defined:
def GR32_NOSP : RegisterClass<"X86", [i32], 32, (sub GR32, ESP)>;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132621 91177308-0d34-0410-b5e6-96231b3b80d8
queries in the case of a DAG, where a query reaches a node
visited earlier, but it's not on a cycle. This avoids
MayAlias results in cases where BasicAA is expected to
return MustAlias or PartialAlias in order to protect TBAA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132609 91177308-0d34-0410-b5e6-96231b3b80d8
of reserved registers.
Use RegisterClassInfo in RABasic as well. This slightly changes som
allocation orders because RegisterClassInfo puts CSR aliases last.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132581 91177308-0d34-0410-b5e6-96231b3b80d8
addressing mode problem mentioned in r132559.
Backend part of rdar://9037836 and part of rdar://9119939
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132561 91177308-0d34-0410-b5e6-96231b3b80d8
- Check for MTCTR8 in addition to MTCTR when looking up a hazard.
- When lowering an indirect call use CTR8 when targeting 64bit.
- Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND.
The last change fixes PR8487. With those changes, we are able to compile a
running "ls" and "sh" on FreeBSD/PowerPC64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132552 91177308-0d34-0410-b5e6-96231b3b80d8
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132535 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, use simpler approach and let DBG_VALUE follow its predecessor instruction. After live debug value analysis pass, all DBG_VALUE instruction are placed at the right place. Thanks Jakob for the hint!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132483 91177308-0d34-0410-b5e6-96231b3b80d8
In the given testcase, the "Clobber" was pointing to a load, and GVN was incorrectly assuming that meant that the "Clobber" load overlapped the load being analyzed (when they are actually unrelated).
The included testcase tests both this commit and r132434.
Part two of rdar://9429882. (r132434 was mislabeled.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132442 91177308-0d34-0410-b5e6-96231b3b80d8
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs. Only profitable when the user wants a materialized 0
or 1 at runtime. rdar://problem/5993888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132404 91177308-0d34-0410-b5e6-96231b3b80d8
patch we add a flag to enable a new type legalization decision - to promote
integer elements in vectors. Currently, the rest of the codegen does not support
this kind of legalization. This flag will be removed when the transition is
complete.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132394 91177308-0d34-0410-b5e6-96231b3b80d8
must be encoded decremented by one. Only add encoding tests for ssat16
because ssat can't be parsed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132324 91177308-0d34-0410-b5e6-96231b3b80d8
nand), atomic.swap and atomic.cmp.swap, all in i8, i16 and i32 versions.
The intrinsics are implemented by creating pseudo-instructions, which are
then expanded in the method MipsTargetLowering::EmitInstrWithCustomInserter.
Patch by Sasa Stankovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132323 91177308-0d34-0410-b5e6-96231b3b80d8
This only affects targets like Mips where branch instructions may kill virtual
registers. Most other targets branch on flag values, so virtual registers are
not involved.
The problem is that MachineBasicBlock::updateTerminator deletes branches and
inserts new ones while LiveVariables keeps a list of pointers to instructions
that kill virtual registers. That list wasn't properly updated in
MBB::SplitCriticalEdge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132298 91177308-0d34-0410-b5e6-96231b3b80d8
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).
Take 2, now with more basic competence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132295 91177308-0d34-0410-b5e6-96231b3b80d8
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132291 91177308-0d34-0410-b5e6-96231b3b80d8