version uses a new algorithm for evaluating the binomial coefficients
which is significantly more efficient for AddRecs of more than 2 terms
(see the comments in the code for details on how the algorithm works).
It also fixes some bugs: it removes the arbitrary length restriction for
AddRecs, it fixes the silent generation of incorrect code for AddRecs
which require a wide calculation width, and it fixes an issue where we
were incorrectly truncating the iteration count too far when evaluating
an AddRec expression narrower than the induction variable.
There are still a few related issues I know of: I think there's
still an issue with the SCEVExpander expansion of AddRec in terms of
the width of the induction variable used. The hack to avoid generating
too-wide integers shouldn't be necessary; instead, the callers should be
considering the cost of the expansion before expanding it (in addition
to not expanding too-wide integers, we might not want to expand
expressions that are really expensive, especially when optimizing for
size; calculating an length-17 32-bit AddRec currently generates about 250
instructions of straight-line code on X86). Also, for long 32-bit
AddRecs on X86, CodeGen really sucks at scheduling the code. I'm planning on
filing follow-up PRs for these issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54332 91177308-0d34-0410-b5e6-96231b3b80d8
subreg form on x86-64, to avoid the problem with x86-32
having GPRs that don't have 8-bit subregs.
Also, change several 16-bit instructions to use
equivalent 32-bit instructions. These have a smaller
encoding and avoid partial-register updates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54223 91177308-0d34-0410-b5e6-96231b3b80d8
to different address spaces. This alters the naming scheme for those
intrinsics, e.g., atomic.load.add.i32 => atomic.load.add.i32.p0i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54195 91177308-0d34-0410-b5e6-96231b3b80d8
time applying to the implicit comparison in smin expressions. The
correct way to transform an inequality into the opposite
inequality, either signed or unsigned, is with a not expression.
I looked through the SCEV code, and I don't think there are any more
occurrences of this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54194 91177308-0d34-0410-b5e6-96231b3b80d8
SGT exit condition. Essentially, the correct way to flip an inequality
in 2's complement is the not operator, not the negation operator.
That said, the difference only affects cases involving INT_MIN.
Also, enhance the pre-test search logic to be a bit smarter about
inequalities flipped with a not operator, so it can eliminate the smax
from the iteration count for simple loops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54184 91177308-0d34-0410-b5e6-96231b3b80d8
to be marked invalid regardless of whether it is
a debug, an exception handling or (hopefully) a
GC label.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54172 91177308-0d34-0410-b5e6-96231b3b80d8
partially unroll a loop when fully unrolling would not fit under the threshold.
Patch by Mikael Lepistö.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54160 91177308-0d34-0410-b5e6-96231b3b80d8
that says "unconditional loads from this argument are safe", we now keep track
of the safety per set of indices from which loads happen. This prevents
ArgPromotion from promoting loads that aren't really valid. As an added effect,
this will now disregard the the type of the indices passed to a GEP, so
"load GEP %A, i32 1" and "load GEP %A, i64 1" will result in a single argument,
not two.
This fixes PR2598, for which a testcase has been added as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54159 91177308-0d34-0410-b5e6-96231b3b80d8
which is represented in codegen as an 'and' operation. This matches them
with movz instructions, instead of leaving them to be matched by and
instructions with an immediate field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54147 91177308-0d34-0410-b5e6-96231b3b80d8
because opt exited while llvm-as was still
writing to the pipe, causing it to get a
SIGPIPE. It seems best to change things to
avoid the race altogether.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54138 91177308-0d34-0410-b5e6-96231b3b80d8
mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.
Also commit Mon Ping's VSETCC patch
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54039 91177308-0d34-0410-b5e6-96231b3b80d8
and knowledge of PseudoSourceValues. This unfortunately isn't sufficient to allow
constants to be rematerialized in PIC mode -- the extra indirection is a
complication.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54000 91177308-0d34-0410-b5e6-96231b3b80d8
command-line option, and disable it by default. It introduced performance
regressions because CodeGen is currently not able to remat such loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53997 91177308-0d34-0410-b5e6-96231b3b80d8
case for this.
This allows instructions like loads from global variables declared to
be constant to be moved out of loops."
Patch by Stefanus Du Toit!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53945 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the GetResultInst instruction. It is still accepted in LLVM assembly
and bitcode, where it is now auto-upgraded to ExtractValueInst. Also, remove
support for return instructions with multiple values. These are auto-upgraded
to use InsertValueInst instructions.
The IRBuilder still accepts multiple-value returns, and auto-upgrades them
to InsertValueInst instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53941 91177308-0d34-0410-b5e6-96231b3b80d8
leads into a cycle involving a different PHI, LSR got stuck running
around that cycle looking for the original PHI. To avoid this, keep
track of visited PHIs and stop searching if we see one more than once.
This fixes PR2570.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53879 91177308-0d34-0410-b5e6-96231b3b80d8
force evaluation (ComputeIterationCountExhaustively) should be turned off.
It doesn't apply to trip-count2.ll because this file tests the brute force
evaluation.
The test for PR2364 (2008-05-25-NegativeStepToZero.ll) currently fails
showing that the patch for this bug doesn't work. I'll fix it in a few hours
with a patch for PR2088.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53792 91177308-0d34-0410-b5e6-96231b3b80d8
multiply to be done as unsigned, so that they have well defined
behavior on overflow. This fixes PR2408.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53767 91177308-0d34-0410-b5e6-96231b3b80d8
replacement of multiple values. This is slightly more efficient
than doing multiple ReplaceAllUsesOfValueWith calls, and theoretically
could be optimized even further. However, an important property of this
new function is that it handles the case where the source value set and
destination value set overlap. This makes it feasible for isel to use
SelectNodeTo in many very common cases, which is advantageous because
SelectNodeTo avoids a temporary node and it doesn't require CSEMap
updates for users of values that don't change position.
Revamp MorphNodeTo, which is what does all the work of SelectNodeTo, to
handle operand lists more efficiently, and to correctly handle a number
of corner cases to which its new wider use exposes it.
This commit also includes a change to the encoding of post-isel opcodes
in SDNodes; now instead of being sandwiched between the target-independent
pre-isel opcodes and the target-dependent pre-isel opcodes, post-isel
opcodes are now represented as negative values. This makes it possible
to test if an opcode is pre-isel or post-isel without having to know
the size of the current target's post-isel instruction set.
These changes speed up llc overall by 3% and reduce memory usage by 10%
on the InstructionCombining.cpp testcase with -fast and -regalloc=local.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53728 91177308-0d34-0410-b5e6-96231b3b80d8
of all sizes from i1 to i256. The code is not
always that great, for example (x86)
movw %di, %ax
movw %ax, i17_s
where the store could be directly from %di.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53677 91177308-0d34-0410-b5e6-96231b3b80d8
sizes from i1 to i256. The generated code is
like one huge bug report of things that the DAG
combiner fails to simplify!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53676 91177308-0d34-0410-b5e6-96231b3b80d8
simply does the atomic.cmp.swap on the larger type,
which means it blows away whatever is sitting in
the bytes just after the memory location, i.e.
causes a buffer overflow. This really requires
target specific code, which is why LegalizeTypes
doesn't try to handle this case generically. The
existing (wrong) code in LegalizeDAG will go away
automatically once the type legalization code is
removed from LegalizeDAG so I'm leaving it there
for the moment. Meanwhile, don't test for this
feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53669 91177308-0d34-0410-b5e6-96231b3b80d8
allowed to canonicalize return values).
Add a test that checks if return value and function attributes are not removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53612 91177308-0d34-0410-b5e6-96231b3b80d8
return values that are still (partially) live. Instead of updating all uses of
a call instruction after removing some elements, it now just rebuilds the
original struct (With undef gaps where the unused values were) and leaves it to
instcombine to clean this up.
The added testcase still fails currently, but this is due to instcombine which
isn't good enough yet. I will fix that part next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53608 91177308-0d34-0410-b5e6-96231b3b80d8
In LegalizeDAG the value is zero-extended to
the new type before byte swapping. It doesn't
matter how the extension is done since the new
bits are shifted off anyway after the swap, so
extend by any old rubbish bits. This results
in the final assembler for the testcase being
one line shorter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53604 91177308-0d34-0410-b5e6-96231b3b80d8
the second half of link-global-to-func.ll and causes some minor changes in
messages.
There are two TODOs here. First, this causes a regression in
2008-07-06-AliasWeakDest.ll, which is now failing (so I xfailed it). Anton,
I would really appreciate it if you could take a look at this. It should be
a matter of adding proper alias support to GetLinkageResult, and was probably
already a latent bug that would manifest with globals.
The second todo is to reimplement LinkAlias in the same pattern as
function and global linking. This should be pretty straight-forward for
someone who knows aliases, but isn't a requirement for correctness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53548 91177308-0d34-0410-b5e6-96231b3b80d8
(replacing a function with a global). This is needed when building
llvm itself with LTO on darwin, because of the EXPLICIT_SYMBOL hack
in lib/system/DynamicLibrary.cpp.
Implementation of linking the other way will need to wait for a
cleanup of LinkFunctionProtos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53546 91177308-0d34-0410-b5e6-96231b3b80d8
8 %reg1024<def> = IMPLICIT_DEF
12 %reg1024<def> = INSERT_SUBREG %reg1024<kill>, %reg1025, 2
The live range [12, 14) are not part of the r1024 live interval since it's defined by an implicit def. It will not conflicts with live interval of r1025. Now suppose both registers are spilled, you can easily see a situation where both registers are reloaded before the INSERT_SUBREG and both target registers that would overlap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53503 91177308-0d34-0410-b5e6-96231b3b80d8
was using the algorithm for folding unsigned comparisons which is
completely wrong. This has been broken since the signless types change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53444 91177308-0d34-0410-b5e6-96231b3b80d8
This cause a regression in InstCombine/JavaCompare, which was doing the right
thing on accident. To handle the missed case, generalize the comparisons based
on masked bits a little bit to handle comparisons against the max value. For
example, we can now xform (slt i32 (and X, 4), 4) -> (setne i32 (and X, 4), 4)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53443 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrite the DeadArgumentElimination pass, to use a more explicit tracking of
dependencies between return values and/or arguments. Also make the handling of
arguments and return values the same.
The pass now looks properly inside returned structs, but only at the first
level (ie, not inside nested structs).
This version fixed a few more bugs and was cleaned up a bit. It now passes all
of LLVM's testing, and should still pass SPEC2006. There is still a minor bug
with regard to returning nested structs. Since there is currently nothing that
emits such IR, I will fix that in a seperate commit (partly because it requires
a non-trivial fix).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53400 91177308-0d34-0410-b5e6-96231b3b80d8