my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.
This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147248 91177308-0d34-0410-b5e6-96231b3b80d8
'bsf' instructions here.
This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147246 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147244 91177308-0d34-0410-b5e6-96231b3b80d8
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146974 91177308-0d34-0410-b5e6-96231b3b80d8
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146370 91177308-0d34-0410-b5e6-96231b3b80d8
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146150 91177308-0d34-0410-b5e6-96231b3b80d8
libgcc sets the stack limit field in TCB to 256 bytes above the actual
allocated stack limit. This means if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit. This should result in lesser calls to __morestack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145766 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM pads the call to __morestack with a add and sub of 8
bytes to esp. This isn't correct since __morestack expects the call
to be followed directly by a ret.
This commit also adjusts the relevant test-case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145765 91177308-0d34-0410-b5e6-96231b3b80d8
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.
This also makes the AVX variants redundant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145440 91177308-0d34-0410-b5e6-96231b3b80d8
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145300 91177308-0d34-0410-b5e6-96231b3b80d8
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145273 91177308-0d34-0410-b5e6-96231b3b80d8
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.
The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.
This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145183 91177308-0d34-0410-b5e6-96231b3b80d8
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.
The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145179 91177308-0d34-0410-b5e6-96231b3b80d8
trampoline forms. Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145174 91177308-0d34-0410-b5e6-96231b3b80d8
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.
This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145158 91177308-0d34-0410-b5e6-96231b3b80d8
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145141 91177308-0d34-0410-b5e6-96231b3b80d8
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.
Patch by Jan Sjodin
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145133 91177308-0d34-0410-b5e6-96231b3b80d8
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.
Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145120 91177308-0d34-0410-b5e6-96231b3b80d8
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.
I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145119 91177308-0d34-0410-b5e6-96231b3b80d8
Before:
movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
testq %rax, %rdi ## encoding: [0x48,0x85,0xf8]
jne LBB0_2 ## encoding: [0x75,A]
After:
btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
jb LBB0_2 ## encoding: [0x72,A]
btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145103 91177308-0d34-0410-b5e6-96231b3b80d8
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.
This was found during a bootstrap with block placement turned on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145100 91177308-0d34-0410-b5e6-96231b3b80d8
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.
The patch was reviewed by Bruno.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145099 91177308-0d34-0410-b5e6-96231b3b80d8
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145098 91177308-0d34-0410-b5e6-96231b3b80d8
This was a bug in keeping track of the available domains when merging
domain values.
The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.
Also add an assertion to catch future attempts at emitting AVX2
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145096 91177308-0d34-0410-b5e6-96231b3b80d8
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.
Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145094 91177308-0d34-0410-b5e6-96231b3b80d8
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.
This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.
This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145062 91177308-0d34-0410-b5e6-96231b3b80d8
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.
The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.
This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145010 91177308-0d34-0410-b5e6-96231b3b80d8
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.
Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145006 91177308-0d34-0410-b5e6-96231b3b80d8
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.
Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.
Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144994 91177308-0d34-0410-b5e6-96231b3b80d8
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).
Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144674 91177308-0d34-0410-b5e6-96231b3b80d8
These tests are actually correct, clang was miscompiling ExeDepsFix::processUses.
Evan fixed the miscompilation in r144628.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144630 91177308-0d34-0410-b5e6-96231b3b80d8
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.
The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144627 91177308-0d34-0410-b5e6-96231b3b80d8
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.
The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.
The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144602 91177308-0d34-0410-b5e6-96231b3b80d8
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144559 91177308-0d34-0410-b5e6-96231b3b80d8
Constant idx case is still done in tablegen but other cases are then expanded
Fixes <rdar://problem/10435460>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144557 91177308-0d34-0410-b5e6-96231b3b80d8
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.
I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.
The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144526 91177308-0d34-0410-b5e6-96231b3b80d8
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.
The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.
Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144516 91177308-0d34-0410-b5e6-96231b3b80d8
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.
The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.
The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.
I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144495 91177308-0d34-0410-b5e6-96231b3b80d8
It was off by default.
The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144481 91177308-0d34-0410-b5e6-96231b3b80d8
This test was committed with a bugfix to RemoveCopyByCommutingDef, but
that optimization is no longer triggered by this test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144470 91177308-0d34-0410-b5e6-96231b3b80d8
This test doesn't expose the issue with RAGreedy.
I filed PR11363 to track the missing InlineSpiller feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144459 91177308-0d34-0410-b5e6-96231b3b80d8
The test is checking that the output doesn't contains any 'mov '
strings. It does contain movl, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144458 91177308-0d34-0410-b5e6-96231b3b80d8
instruction lower optimization" in the pre-RA scheduler.
The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.
Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144267 91177308-0d34-0410-b5e6-96231b3b80d8
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction. This is not what we want when those implicit
operands refer to the register being spilled.
Implicit operands referring to other registers are preserved.
This fixes PR11347.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144247 91177308-0d34-0410-b5e6-96231b3b80d8
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages). Original commit message:
Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144188 91177308-0d34-0410-b5e6-96231b3b80d8
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.
After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.
This will properly fix execution domains on software pipelined code,
like the included test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144151 91177308-0d34-0410-b5e6-96231b3b80d8
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144124 91177308-0d34-0410-b5e6-96231b3b80d8
DomainValues that are only used by "don't care" instructions are now
collapsed to the first possible execution domain after all basic blocks
have been processed. This typically means the PS domain on x86.
For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are
completely collapsed to the PS domain instead of containing a mix of
execution domains created by isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144037 91177308-0d34-0410-b5e6-96231b3b80d8
The xorps instruction is smaller than pxor, so prefer that encoding.
The ExecutionDepsFix pass will switch the encoding to pxor and xorpd
when appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143996 91177308-0d34-0410-b5e6-96231b3b80d8
I'm going to wait for any review comments and perform some additional testing before turning this on by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143750 91177308-0d34-0410-b5e6-96231b3b80d8
On x86: (shl V, 1) -> add V,V
Hardware support for vector-shift is sparse and in many cases we scalarize the
result. Additionally, on sandybridge padd is faster than shl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143311 91177308-0d34-0410-b5e6-96231b3b80d8
If all of the inputs are zero/any_extended, create a new simple BV
which can be further optimized by other BV optimizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143297 91177308-0d34-0410-b5e6-96231b3b80d8
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143206 91177308-0d34-0410-b5e6-96231b3b80d8
Don't assume APInt::getRawData() would hold target-aware endianness nor host-compliant endianness. rawdata[0] holds most lower i64, even on big endian host.
FIXME: Add a testcase for big endian target.
FIXME: Ditto on CompileUnit::addConstantFPValue() ?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143194 91177308-0d34-0410-b5e6-96231b3b80d8
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
Delete #if 0 code accidentally left in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143188 91177308-0d34-0410-b5e6-96231b3b80d8
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143177 91177308-0d34-0410-b5e6-96231b3b80d8
MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET
followed by a MOV respectively. Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV). It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143062 91177308-0d34-0410-b5e6-96231b3b80d8
discussions with Andy. Fundamentally, the previous algorithm is both
counter productive on several fronts and prioritizing things which
aren't necessarily the most important: static branch prediction.
The new algorithm uses the existing loop CFG structure information to
walk through the CFG itself to layout blocks. It coalesces adjacent
blocks within the loop where the CFG allows based on the most likely
path taken. Finally, it topologically orders the block chains that have
been formed. This allows it to choose a (mostly) topologically valid
ordering which still priorizes fallthrough within the structural
constraints.
As a final twist in the algorithm, it does violate the CFG when it
discovers a "hot" edge, that is an edge that is more than 4x hotter than
the competing edges in the CFG. These are forcibly merged into
a fallthrough chain.
Future transformations that need te be added are rotation of loop exit
conditions to be fallthrough, and better isolation of cold block chains.
I'm also planning on adding statistics to model how well the algorithm
does at laying out blocks based on the probabilities it receives.
The old tests mostly still pass, and I have some new tests to add, but
the nested loops are still behaving very strangely. This almost seems
like working-as-intended as it rotated the exit branch to be
fallthrough, but I'm not convinced this is actually the best layout. It
is well supported by the probabilities for loops we currently get, but
those are pretty broken for nested loops, so this may change later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142743 91177308-0d34-0410-b5e6-96231b3b80d8
SHL inserts zeros from the right, thus even when the original
sign_extend_inreg value was of 1-bit, we need to sra.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142724 91177308-0d34-0410-b5e6-96231b3b80d8
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.
SetCC return type needs to be legalized via PromoteTargetBoolean.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142660 91177308-0d34-0410-b5e6-96231b3b80d8
it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.
This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.
Test is updated to include both basic loop alignment and nested loop
alignment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142645 91177308-0d34-0410-b5e6-96231b3b80d8
canonical example I used when developing it, and is one of the primary
motivating real-world use cases for __builtin_expect (when burried under
a macro).
I'm working on more test cases here, but I'm trying to make sure both
that the pass is doing the right thing with the test cases and that they
aren't too brittle to changes elsewhere in the code generation pipeline.
Feedback and/or suggestions on how to test this are very welcome.
Especially feedback on whether testing the block comments is a good
strategy; I couldn't find any good examples to steal from but all the
other ideas I had were a lot uglier or more fragile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142644 91177308-0d34-0410-b5e6-96231b3b80d8
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availablity of an
instruction is to check if it should be expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142542 91177308-0d34-0410-b5e6-96231b3b80d8