generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle: if any of the
bundled instructions has the property, return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad and isCompare, conservatively return false
for bundles.
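Each query kind thus picks a combining rule over the bundle. A minimal,
self-contained sketch of the three rules (illustrative types and names, not
the actual MachineInstr API):

  #include <vector>

  // Stand-in for a machine instruction with queryable flags.
  struct Instr {
    bool MayLoad = false, Predicable = true;
    bool mayLoad() const { return MayLoad; }
    bool isPredicable() const { return Predicable; }
  };

  // "Any" semantics: true if any bundled instruction has the property.
  bool bundleMayLoad(const std::vector<Instr> &Bundle) {
    for (const Instr &I : Bundle)
      if (I.mayLoad()) return true;
    return false;
  }

  // "All" semantics: true only if every bundled instruction has it.
  bool bundleIsPredicable(const std::vector<Instr> &Bundle) {
    for (const Instr &I : Bundle)
      if (!I.isPredicable()) return false;
    return !Bundle.empty();
  }

  // Conservative semantics: canFoldAsLoad / isCompare just answer false.
  bool bundleCanFoldAsLoad(const std::vector<Instr> &) { return false; }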
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146026 91177308-0d34-0410-b5e6-96231b3b80d8
The block offset can be computed from the previous block. That is more
robust than keeping track of a delta.
Eliminate one redundant AdjustBBOffsetsAfter call.
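Roughly (illustrative structures, not the pass's actual bookkeeping):

  #include <vector>

  struct BlockInfo { unsigned Offset = 0, Size = 0; };

  // Each block's offset follows from its predecessor in layout order, so
  // there is no running delta to keep consistent across edits. (The real
  // pass also has to account for inter-block alignment padding.)
  void recomputeOffsets(std::vector<BlockInfo> &Blocks) {
    for (size_t i = 1; i < Blocks.size(); ++i)
      Blocks[i].Offset = Blocks[i - 1].Offset + Blocks[i - 1].Size;
  }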
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146018 91177308-0d34-0410-b5e6-96231b3b80d8
Data type suffix aliasing. Previously handled via lots of instruction
aliases. Cleanup of those forthcoming.
rdar://10435076
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146007 91177308-0d34-0410-b5e6-96231b3b80d8
This flag is used when bundling machine instructions. It indicates
whether the operand reads a value defined inside or outside its bundle.
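A hedged sketch of how a consumer might use the flag (accessor names assumed
to match the new MachineOperand bit):

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineOperand.h"
  using namespace llvm;

  // Count the register reads in MI that are satisfied from within its
  // own bundle rather than from outside it.
  static unsigned countInternalReads(const MachineInstr &MI) {
    unsigned N = 0;
    for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
      const MachineOperand &MO = MI.getOperand(i);
      if (MO.isReg() && MO.isUse() && MO.isInternalRead())
        ++N;
    }
    return N;
  }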
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145997 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change yet. Will be implementing range-checked immediates
for better diagnostics and disambiguation of instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145994 91177308-0d34-0410-b5e6-96231b3b80d8
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs (see the sketch below)
4. Taught MachineBasicBlock methods about bundled MIs
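A sketch of the two traversals (iterator names as they ended up in the tree;
treat as illustrative for this revision):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  using namespace llvm;

  // The default iterator advances bundle-by-bundle: a BUNDLE and the MIs
  // inside it are visited as one unit.
  static unsigned countTopLevel(const MachineBasicBlock &MBB) {
    unsigned N = 0;
    for (MachineBasicBlock::const_iterator I = MBB.begin(), E = MBB.end();
         I != E; ++I)
      ++N;
    return N;
  }

  // The instruction iterator walks every MI, including bundled ones.
  static unsigned countAll(const MachineBasicBlock &MBB) {
    unsigned N = 0;
    for (MachineBasicBlock::const_instr_iterator I = MBB.instr_begin(),
                                                 E = MBB.instr_end();
         I != E; ++I)
      ++N;
    return N;
  }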
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145975 91177308-0d34-0410-b5e6-96231b3b80d8
This pseudo-instruction contains a .align directive in its expansion, so
the total size may vary by 2 bytes.
It is too difficult to accurately keep track of this alignment
directive, so just use the worst-case size instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145971 91177308-0d34-0410-b5e6-96231b3b80d8
ARMConstantIslandPass may sometimes leave empty constant islands behind
(it really shouldn't). Remove the alignment from the empty islands so
the size calculations are still correct.
This should fix the many Thumb1 assembler errors in the nightly test
suite.
The reduced test case for this problem is way too big. That is to be
expected for ARMConstantIslandPass bugs.
<rdar://problem/10534709>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145970 91177308-0d34-0410-b5e6-96231b3b80d8
- Walking over pred_begin/pred_end is an expensive operation.
- PHINodes contain a value for each predecessor anyway, so we can iterate
  those directly (see the sketch below).
- While it may look like we used to save a few iterations with the set,
be aware that getIncomingValueForBlock does a linear search on
the values of the phi node.
- Another -5% on ARMDisassembler.cpp (Release build). This was the last
entry in the profile that was obviously wasting time.
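The replaced pattern, as a minimal sketch (modern header paths; the shape is
what matters):

  #include "llvm/IR/Instructions.h"

  // Instead of walking pred_begin/pred_end and calling
  // getIncomingValueForBlock -- a linear scan of the PHI per query --
  // iterate the PHI's incoming values directly:
  static bool allIncomingEqual(const llvm::PHINode &PN,
                               const llvm::Value *V) {
    for (unsigned i = 0, e = PN.getNumIncomingValues(); i != e; ++i)
      if (PN.getIncomingValue(i) != V)
        return false;
    return true;
  }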
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145937 91177308-0d34-0410-b5e6-96231b3b80d8
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145906 91177308-0d34-0410-b5e6-96231b3b80d8
The new register allocator is much better at splitting live ranges back up
when they are over-constrained by register classes.
Fixes <rdar://problem/10466609>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145899 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, all ARM::CONSTPOOL_ENTRY instructions had a hardwired
alignment of 4 bytes emitted by ARMAsmPrinter. Now the same alignment
is set on the basic block.
This is in preparation for supporting ARM constant pool islands with
different alignments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145890 91177308-0d34-0410-b5e6-96231b3b80d8
This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly
documented as taking log2(bytes) units, but the x86 target would still
set a preferred loop alignment of '16'.
CodePlacementOpt passed this number on to the basic block, and
AsmPrinter interpreted it as bytes.
Now both MachineFunction and MachineBasicBlock use logarithmic
alignments.
Obviously, MachineConstantPool still measures alignments in bytes, so we
can emulate the thrill of using 'as'.
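For example (illustrative values only):

  // Alignment now travels as log2(bytes): a preferred loop alignment of
  // 16 bytes is expressed as log2(16) == 4.
  unsigned Log2Align = 4;           // what MachineBasicBlock now stores
  unsigned Bytes = 1u << Log2Align; // == 16; back to byte units at emission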
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145889 91177308-0d34-0410-b5e6-96231b3b80d8
Whether a fixup needs relaxation for the associated instruction is a
target-specific function, as the FIXME indicated. Create a hook for that
and use it.
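The shape of the hook, as a free-standing illustrative sketch (the real hook
lives on the target's asm backend; names and ranges below are hypothetical):

  #include <cstdint>

  // The generic assembler asks the target whether a fixup's resolved
  // value still fits the current encoding; if not, relax the instruction.
  static bool fixupNeedsRelaxation(unsigned FixupKind, int64_t Value) {
    switch (FixupKind) {
    case 0: // hypothetical: 16-bit Thumb branch, +/-2KiB displacement
      return Value < -2048 || Value > 2046;
    default:
      return false; // other fixups never force relaxation here
    }
  }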
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145881 91177308-0d34-0410-b5e6-96231b3b80d8
don't do this now, but add a test case to prevent this from happening in the
future.
Additional test for rdar://9892684
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145879 91177308-0d34-0410-b5e6-96231b3b80d8
Not right yet, as the rules for when to relax in the MCAssembler aren't
(yet) correct for ARM. This is a step in the proper direction, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145871 91177308-0d34-0410-b5e6-96231b3b80d8
memory fences) in statistics registration, which works the same way that
ManagedStatic registration does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145869 91177308-0d34-0410-b5e6-96231b3b80d8
where this would be bad as the backend shouldn't have a problem inlining small
memcpys.
rdar://10510150
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145865 91177308-0d34-0410-b5e6-96231b3b80d8
Combined the destination and first source operand for the f32 variant of the
VMUL (by scalar) instruction.
rdar://10522016
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145842 91177308-0d34-0410-b5e6-96231b3b80d8
- Calling getUser in a loop is much more expensive than iterating over a few instructions.
- Use it instead of the open-coded loop in AddrModeMatcher.
- 5% speedup on ARMDisassembler.cpp Release builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145810 91177308-0d34-0410-b5e6-96231b3b80d8
-15% on ARMDisassembler.cpp (Release build). It's not that great to add another
layer of caching to the caching-heavy LVI but I don't see a better way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145770 91177308-0d34-0410-b5e6-96231b3b80d8
libgcc sets the stack limit field in the TCB to 256 bytes above the actual
allocated stack limit. This means that if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit. This should result in fewer calls to __morestack.
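The reasoning, as a hedged sketch (illustrative helper, not the actual
lowering): if the TCB field holds limit = bound + 256, then sp >= limit
guarantees sp - FrameSize >= bound whenever FrameSize < 256, so no
subtraction is needed before the compare.

  // Decide whether the prologue can compare sp directly against the
  // biased stack-limit field instead of computing sp - FrameSize first.
  static bool canUseDirectCompare(unsigned FrameSizeBytes) {
    const unsigned Bias = 256; // slack libgcc builds into the TCB field
    return FrameSizeBytes < Bias;
  }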
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145766 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM pads the call to __morestack with an add and sub of 8
bytes to esp. This isn't correct since __morestack expects the call
to be followed directly by a ret.
This commit also adjusts the relevant test-case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145765 91177308-0d34-0410-b5e6-96231b3b80d8
the same value) to this variable. This code could be refactored, but it doesn't
matter since the old JIT is going away. Add tsan annotations to ignore the
race.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145745 91177308-0d34-0410-b5e6-96231b3b80d8
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145714 91177308-0d34-0410-b5e6-96231b3b80d8
argument value type. Otherwise, the sign/zero-extend has no effect on arguments
passed via the stack (i.e., undefined high-order bits).
rdar://10515467
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145701 91177308-0d34-0410-b5e6-96231b3b80d8
The callee is usually smaller than the caller, too. This reduces the compile
time of ARMDisassembler.cpp by 32% (Release build). It still takes ages to
compile though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145690 91177308-0d34-0410-b5e6-96231b3b80d8
The alias pseudos need to be cleaned up for size suffix handling, but this gets
the basics working. Will be cleaning up and adding more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145655 91177308-0d34-0410-b5e6-96231b3b80d8
It was getting ignored after r144788.
Also fix an accidental implicit cast from the OptLevel enum
to an optional bool argument. MSVC warned on this, but gcc
didn't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145633 91177308-0d34-0410-b5e6-96231b3b80d8
While at it, remove the barcelona/istanbul/shanghai subtargets; they're
unsupported by GCC and look pretty broken.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145494 91177308-0d34-0410-b5e6-96231b3b80d8
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.
This also makes the AVX variants redundant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145440 91177308-0d34-0410-b5e6-96231b3b80d8
weak variable are compiled by different compilers, such as GCC and LLVM.
While LLVM may increase the alignment to the preferred alignment, there is no
reason to think that GCC will use anything more than the ABI alignment. Since it is the
GCC version that might end up in the final program (as the linkage is weak), it
is wrong to increase the alignment of loads from the global up to the preferred
alignment as the alignment might only be the ABI alignment.
Increasing alignment up to the ABI alignment might be OK, but I'm not totally
convinced that it is. It seems better to just leave the alignment of weak
globals alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145413 91177308-0d34-0410-b5e6-96231b3b80d8
as MC is the only assembler we support.
This splits MS/Windows and GNU/Windows ASM infos into two separate classes.
While there is currently only one difference, full MS C++ ABI support will
require many more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145409 91177308-0d34-0410-b5e6-96231b3b80d8
clang/lib/Driver/Driver.cpp: Don't pass through a negative exit status, or the parent would be confused.
llvm::sys::Program::Wait(): Treat 0x8000XXXX and 0xC000XXXX as abnormal exit codes and pass them through as negative values.
Win32 Exception Handler: Exit with the ExceptionCode on an unhandled exception.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145389 91177308-0d34-0410-b5e6-96231b3b80d8
non_lazy_symbol_pointers section (__IMPORT,__pointers). Ignore the 'hidden' part
since that will place it in the wrong section.
<rdar://problem/10443720>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145356 91177308-0d34-0410-b5e6-96231b3b80d8
Conservatively return zero when the GV neither specifies an alignment nor is
initialized. Previously it returned the ABI alignment for the type of the GV.
However, if the type is a "packed" type, then the under-specified alignment is
attached to the load / store instructions. In that case, the alignment of the
type cannot be trusted.
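A hedged sketch of the intended policy (not the exact SelectionDAG helper):

  #include "llvm/IR/GlobalVariable.h"
  using namespace llvm;

  // Only trust the type's ABI alignment when the global either carries an
  // explicit alignment or has an initializer we laid out ourselves.
  static unsigned conservativeGVAlignment(const GlobalVariable &GV,
                                          unsigned ABITypeAlign) {
    if (GV.getAlignment() != 0)
      return GV.getAlignment();   // explicitly specified on the GV
    if (GV.hasDefinitiveInitializer())
      return ABITypeAlign;        // we control how it is laid out
    return 0;                     // unknown: callers must be conservative
  }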
rdar://10464621
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145300 91177308-0d34-0410-b5e6-96231b3b80d8
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145273 91177308-0d34-0410-b5e6-96231b3b80d8
uninitialized: GCC doesn't understand that the variables are only used
if !UseImm, in which case they have been initialized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145239 91177308-0d34-0410-b5e6-96231b3b80d8
Now that it needs to be exported in a public header (Valgrind.h)
it should be prefixed to avoid collision with other projects.
Add it to llvm-config.h as well.
This'll require regenerating the configure script after this
commit, but I don't have the required autoconf version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145214 91177308-0d34-0410-b5e6-96231b3b80d8
gcc, though I thought it was older (my gcc 4.4 has it as a local patch. Whoops!)
This fixes PR10589.
Also add some debugging statements.
Remove GcnoFiles, the mapping from CompilationUnit to raw_ostream. Now that we
start by iterating over each CU and descending into them, there's no need to
maintain a mapping.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145208 91177308-0d34-0410-b5e6-96231b3b80d8
fallthrough) in cases where we might fail to rotate an exit to an outer
loop onto the end of the loop chain.
Having *some* rotation, but not performing this rotation, is the primary
fix of the performance regression with -enable-block-placement for
Olden/em3d (a whopping 30% regression). Still working on reducing the
test case that actually exercises this and the new rotation strategy out
of this code, but I want to check if this regresses other test cases
first as that may indicate it isn't the correct fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145195 91177308-0d34-0410-b5e6-96231b3b80d8
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we lay out the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.
The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.
This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine-grained
test cases for this logic. We need complex loop structures to
even trigger much of it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145183 91177308-0d34-0410-b5e6-96231b3b80d8
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.
The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145179 91177308-0d34-0410-b5e6-96231b3b80d8
trampoline forms. Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145174 91177308-0d34-0410-b5e6-96231b3b80d8
I think this is the last of autoupgrade that can be removed in 3.1.
Can the atomic upgrade stuff also go?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145169 91177308-0d34-0410-b5e6-96231b3b80d8
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.
This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145158 91177308-0d34-0410-b5e6-96231b3b80d8
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145141 91177308-0d34-0410-b5e6-96231b3b80d8
tablegen patterns for scalar FMA4 operations and intrinsics. Also
add tests for vfmaddsd.
Patch by Jan Sjodin
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145133 91177308-0d34-0410-b5e6-96231b3b80d8
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.
A somewhat gross, mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145120 91177308-0d34-0410-b5e6-96231b3b80d8
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.
I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145119 91177308-0d34-0410-b5e6-96231b3b80d8
- lower unaligned loads/stores.
- encode the size operand of instructions INS and EXT.
- emit relocation information needed for JAL (jump-and-link).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145113 91177308-0d34-0410-b5e6-96231b3b80d8
and positive: positive, because it could be directly computed to be positive;
negative, because the nsw flag means it is either negative or undefined (the
multiplication always overflowed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145104 91177308-0d34-0410-b5e6-96231b3b80d8
Before:
movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
testq %rax, %rdi ## encoding: [0x48,0x85,0xf8]
jne LBB0_2 ## encoding: [0x75,A]
After:
btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
jb LBB0_2 ## encoding: [0x72,A]
btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145103 91177308-0d34-0410-b5e6-96231b3b80d8
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resilient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.
This was found during a bootstrap with block placement turned on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145100 91177308-0d34-0410-b5e6-96231b3b80d8
VSHUFPS/VSHUFPD instructions while lowering a VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.
The patch was reviewed by Bruno.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145099 91177308-0d34-0410-b5e6-96231b3b80d8
successors; they are just all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145098 91177308-0d34-0410-b5e6-96231b3b80d8
This was a bug in keeping track of the available domains when merging
domain values.
The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.
Also add an assertion to catch future attempts at emitting AVX2
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145096 91177308-0d34-0410-b5e6-96231b3b80d8
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.
Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145094 91177308-0d34-0410-b5e6-96231b3b80d8
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.
This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad successor, and assert if more than one is found.
This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.
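The shape of that fix, sketched (isLandingPad was the predicate's name at the
time; illustrative, not the committed diff):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include <cassert>
  using namespace llvm;

  // Pick the unique non-landing-pad successor as the fallthrough target
  // instead of blindly taking the first successor.
  static MachineBasicBlock *findFallthrough(MachineBasicBlock &MBB) {
    MachineBasicBlock *FallThrough = 0;
    for (MachineBasicBlock::succ_iterator SI = MBB.succ_begin(),
                                          SE = MBB.succ_end();
         SI != SE; ++SI) {
      if ((*SI)->isLandingPad())
        continue;
      assert(!FallThrough && "More than one non-landing-pad successor!");
      FallThrough = *SI;
    }
    return FallThrough;
  }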
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145062 91177308-0d34-0410-b5e6-96231b3b80d8
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145060 91177308-0d34-0410-b5e6-96231b3b80d8
This was put in because in a certain version of DragonFlyBSD, stat(2) lied about the
size of some files. This was fixed a long time ago, so we can remove the workaround.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145059 91177308-0d34-0410-b5e6-96231b3b80d8
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.
The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.
This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145010 91177308-0d34-0410-b5e6-96231b3b80d8
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.
Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fall back on the
incoming ordering.
Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can cause major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144994 91177308-0d34-0410-b5e6-96231b3b80d8
The loop tree's inclusive block lists are painful and expensive to
update. (I have no idea why they're inclusive). The design was
supposed to handle this case but the implementation missed it and my
unit tests weren't thorough enough.
Fixes PR11335: loop unroll update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144970 91177308-0d34-0410-b5e6-96231b3b80d8
The right way to check for a binary operation is isa<BinaryOperator> /
dyn_cast<BinaryOperator>. The original check: cast<Instruction> &&
numOperands() == 2 would match phi "instructions", leading to an
infinite loop in extreme corner case: a useless phi with operands
[self, constant] that prior optimization passes failed to remove,
being used in the loop by another useless phi, in turn being used by an
lshr or udiv.
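The two shapes side by side (modern header paths; the buggy form is shown
only in the comment):

  #include "llvm/IR/Instructions.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  // Buggy shape: a PHI with two incoming values also passes this test,
  // which is what allowed the runaway iteration:
  //   if (const Instruction *I = dyn_cast<Instruction>(V))
  //     if (I->getNumOperands() == 2) ...  // matches phi [self, constant]!
  //
  // Correct shape: ask for the class that actually means "binary op".
  static const BinaryOperator *asBinaryOp(const Value *V) {
    return dyn_cast<BinaryOperator>(V); // null for PHIs, calls, etc.
  }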
Fixes PR11350: runaway iteration assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144935 91177308-0d34-0410-b5e6-96231b3b80d8
ADDs. MaxOffs is used as a threshold to limit the size of the offset. The
tradeoffs: (1) If we can't materialize the large constant then we'll cause
fast-isel to bail. (2) Too large an offset can't be directly encoded in the
ADD, resulting in a MOV+ADD. Generally not a bad thing because otherwise we
would have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working
on a fix for that. (3) Conversely, with too low a threshold we'll miss
opportunities to coalesce ADDs.
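The threshold logic, as an illustrative sketch (names are hypothetical, not
the fast-isel code):

  #include <cstdint>

  // Folding another constant offset into a pending ADD is only a win
  // while the running total stays encodable; past MaxOffs we would force
  // a MOV+ADD (or MOVS+MOVT+ADD on Thumb) to materialize the constant.
  static bool shouldCoalesceOffset(int64_t TotalOffs, int64_t NextOffs,
                                   int64_t MaxOffs) {
    return TotalOffs + NextOffs <= MaxOffs;
  }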
rdar://10412592
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144886 91177308-0d34-0410-b5e6-96231b3b80d8