regex and matching it instead of trying to match chunks at a time.
Matching chunks at a time broke with check lines like
CHECK: foo {{.*}}bar
because the .* would eat the entire rest of the line and bar would
never match.
Now we just escape the fixed strings for the user, so that something
like:
CHECK: a() {{.*}}???
is matched as:
CHECK: {{a\(\) .*\?\?\?}}
transparently "under the covers".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82779 91177308-0d34-0410-b5e6-96231b3b80d8
For the AAPCS ABI, SP must always be 4-byte aligned, and at any "public
interface" it must be 8-byte aligned. For the older ARM APCS ABI, the stack
alignment is just always 4 bytes. For X86, we currently align SP at
entry to a function (e.g., to 16 bytes for Darwin), but no stack alignment
is needed at other times, such as for a leaf function.
After discussing this with Dan, I decided to go with the approach of adding
a new "TransientStackAlignment" field to TargetFrameInfo. This value
specifies the stack alignment that must be maintained even in between calls.
It defaults to 1 except for ARM, where it is 4. (Some other targets may
also want to set this if they have similar stack requirements. It's not
currently required for PPC because it sets targetHandlesStackFrameRounding
and handles the alignment in target-specific code.) The existing StackAlignment
value specifies the alignment upon entry to a function, which is how we've
been using it anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82767 91177308-0d34-0410-b5e6-96231b3b80d8
LiveVariables add implicit kills to correctly track partial register kills. This works well enough and is fairly accurate. But coalescer can make it impossible to maintain these markers. e.g.
BL <ga:sss1>, %R0<kill,undef>, %S0<kill>, %R0<imp-def>, %R1<imp-def,dead>, %R2<imp-def,dead>, %R3<imp-def,dead>, %R12<imp-def,dead>, %LR<imp-def,dead>, %D0<imp-def>, ...
...
%reg1031<def> = FLDS <cp#1>, 0, 14, %reg0, Mem:LD4[ConstantPool]
...
%S0<def> = FCPYS %reg1031<kill>, 14, %reg0, %D0<imp-use,kill>
When reg1031 and S0 are coalesced, the copy (FCPYS) will be eliminated the the implicit-kill of D0 is lost. In this case it's possible to move the marker to the FLDS. But in many cases, this is not possible. Suppose
%reg1031<def> = FOO <cp#1>, %D0<imp-def>
...
%S0<def> = FCPYS %reg1031<kill>, 14, %reg0, %D0<imp-use,kill>
When FCPYS goes away, the definition of S0 is the "FOO" instruction. However, transferring the D0 implicit-kill to FOO doesn't work since it is the def of D0 itself. We need to fix this in another time by introducing a "kill" pseudo instruction to track liveness.
Disabling the assertion is not ideal, but machine verifier is doing that job now. It's important to know double-def is not a miscomputation since it means a register should be free but it's not tracked as free. It's a performance issue instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82677 91177308-0d34-0410-b5e6-96231b3b80d8
of the defs are processed.
Also fix a implicit_def propagation bug: a implicit_def of a physical register
should be applied to uses of the sub-registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82616 91177308-0d34-0410-b5e6-96231b3b80d8
take into consideration that the result of an invoke is only valid in
the normal dest, not the unwind dest. This caused 'PHINode::hasConstantValue'
to return true in an invalid situation, causing mem2reg to delete a phi that
was actually needed. This caused a crash building 483.xalancbmk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82491 91177308-0d34-0410-b5e6-96231b3b80d8
variable increment / decrement slighter high priority.
This has major impact on some micro-benchmarks. On MultiSource/Applications
and spec tests, it's a minor win. It also reduce 256.bzip instruction count
by 8%, 55 on 164.gzip on i386 / Darwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82485 91177308-0d34-0410-b5e6-96231b3b80d8
And fix a bug with the behavior of min/max instructions formed from
fcmp uge comparisons.
Also, use FiniteOnlyFPMath() for this code instead of UnsafeFPMath,
as it is more specific.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82466 91177308-0d34-0410-b5e6-96231b3b80d8
This doesn't kick in too much because of phi translation issues,
but this can be resolved in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82447 91177308-0d34-0410-b5e6-96231b3b80d8
from a piece of a large store when both are in the same block.
This allows clang to compile the testcase in PR4216 to this code:
_test_bitfield:
movl 4(%esp), %eax
movl %eax, %ecx
andl $-65536, %ecx
orl $32962, %eax
andl $40186, %eax
orl %ecx, %eax
ret
This is not ideal, but is a whole lot better than the code produced
by llvm-gcc:
_test_bitfield:
movw $-32574, %ax
orw 4(%esp), %ax
andw $-25350, %ax
movw %ax, 4(%esp)
movw 7(%esp), %cx
shlw $8, %cx
movzbl 6(%esp), %edx
orw %cx, %dx
movzwl %dx, %ecx
shll $16, %ecx
movzwl %ax, %eax
orl %ecx, %eax
ret
and dramatically better than that produced by gcc 4.2:
_test_bitfield:
pushl %ebx
call L3
"L00000000001$pb":
L3:
popl %ebx
movl 8(%esp), %eax
leal 0(,%eax,4), %edx
sarb $7, %dl
movl %eax, %ecx
andl $7168, %ecx
andl $-7201, %ebx
movzbl %dl, %edx
andl $1, %edx
sall $5, %edx
orl %ecx, %ebx
orl %edx, %ebx
andl $24, %eax
andl $-58336, %ebx
orl %eax, %ebx
orl $32962, %ebx
movl %ebx, %eax
popl %ebx
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82439 91177308-0d34-0410-b5e6-96231b3b80d8
early for the stated reasons: this allows it to find more
equivalences and depend less on code layout.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82404 91177308-0d34-0410-b5e6-96231b3b80d8
so that nonlocal and partially redundant loads can use it as well.
The testcase shows examples of craziness this can handle. This triggers
*many* times in 176.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82403 91177308-0d34-0410-b5e6-96231b3b80d8
(and load -> load) when the base pointers must alias but when
they are different types. This occurs very very frequently in
176.gcc and other code that uses bitfields a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82399 91177308-0d34-0410-b5e6-96231b3b80d8
Add another line to the ConstantExprFold test to demonstrate the GEPs may not
wrap around in either the signed or unsigned senses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82361 91177308-0d34-0410-b5e6-96231b3b80d8
we pushed the beginning of the interval back 1, so the
interval would overlap with inputs that die. We were
also pushing the end of the interval back 1, though,
which means the earlyclobber didn't overlap with other
output operands. Don't do this. PR 4964.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82342 91177308-0d34-0410-b5e6-96231b3b80d8
getSymbolForDwarfGlobalReference is smart enough to know that it
needs to register the stub it references with MachineModuleInfoMachO,
so that it gets emitted at the end of the file.
Move stub emission from X86ATTAsmPrinter::doFinalization to the
new X86ATTAsmPrinter::EmitEndOfAsmFile asmprinter hook. The important
thing here is that EmitEndOfAsmFile is called *after* the ehframes are
emitted, so we get all the stubs.
This allows us to remove a gross hack from the asmprinter where it would
"just know" that it needed to output stubs for personality functions.
Now this is all driven from a consistent interface.
The testcase change is just reordering the expected output now that the
stubs come out after the ehframe instead of before.
This also unblocks other changes that Bill wants to make.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82269 91177308-0d34-0410-b5e6-96231b3b80d8
move a SUBFC (etc.) below the SUBFE (etc.) that consumed
the carry bit. Add missing ADDIC8, noticed along the way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82266 91177308-0d34-0410-b5e6-96231b3b80d8
on x86, to avoid explicit test instructions. A few existing tests changed
due to arbitrary register allocation differences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82263 91177308-0d34-0410-b5e6-96231b3b80d8
where the induction variable has a non-unit stride, such as {0,+,2}, and
there are expressions such as {1,+,2} inside the loop formed with
or or add nsw operators.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@82151 91177308-0d34-0410-b5e6-96231b3b80d8
more than one phi, since that leads to higher register pressure on
entry to the phi. This is especially problematic when the phi is in
a loop header, as it increases register pressure throughout the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81993 91177308-0d34-0410-b5e6-96231b3b80d8
instead of cloning and RAUWing it.
- Make AbstractTypeUser a friend of Value so that it can offer
its subclasses a way to update a Value's type in place. This
is better than a universally visible setType method on Value,
and it's sufficient for the immediate need.
- Eliminate the constant "convert" functions. This eliminates a
lot of logic duplication, and fixes a complicated bug where a
constant can't actually be cloned during the type refinement
process because some of the types that its folder needs are
half-destroyed, being in the middle of refinement themselves.
- Move the getValType functions from being static overloaded
functions in Constants.cpp to be members of class template
specializations in ConstantsContext.h. This means that the
code ends up getting instantiated twice, however it also
makes it possible to eliminate all "convert" functions, so
it's not a big net code size increase. And if desired, the
duplicate instantiations could be eliminated with some
reorganization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81861 91177308-0d34-0410-b5e6-96231b3b80d8
While I'm there, change code that does:
SomeTy == Type::getFooType(Context)
into:
SomeTy->getTypeID() == FooTyID
to decrease the amount of useless type creation which may involve locking, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81846 91177308-0d34-0410-b5e6-96231b3b80d8
has multiple uses, as one of the other uses may be on a path
to a different node above the callseq_start, because that
leads to a cyclic graph. This problem is exposed when
-combiner-global-alias-analysis is used. This fixes PR4880.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81821 91177308-0d34-0410-b5e6-96231b3b80d8
parses the .word directive as 4 bytes and ARMAsmParser::ParseInstruction will
give an error is called. Broke out the test of the .word directive into two
different test cases, one for x86 and one for arm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81817 91177308-0d34-0410-b5e6-96231b3b80d8
Change the picbase symbol on non-darwin systems from ".Lllvm$4.$piclabel" to
".L4$pb". The actual name doesn't matter and the darwin name is shorter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81688 91177308-0d34-0410-b5e6-96231b3b80d8
sse, this code falls back to SelectionDAG isel which uses an x87
instruction, which is fine, but not what this test is testing for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81656 91177308-0d34-0410-b5e6-96231b3b80d8
input filename so that opt doesn't print the input filename in the
output so that grep lines in the tests don't unintentionally match
strings in the input filename.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81537 91177308-0d34-0410-b5e6-96231b3b80d8
safe. This can happen we a subreg_to_reg 0 has been coalesced. One
exception is when the instruction that folds the load is a move, then we
can simply turn it into a 32-bit load from the stack slot.
rdar://7170444
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81494 91177308-0d34-0410-b5e6-96231b3b80d8
how to fold notionally-out-of-bounds array getelementptr indices instead
of just doing these in lib/Analysis/ConstantFolding.cpp, because it can
be done in a fairly general way without TargetData, and because not all
constants are visited by lib/Analysis/ConstantFolding.cpp. This enables
more constant folding.
Also, set the "inbounds" flag when the getelementptr indices are
one-past-the-end.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81483 91177308-0d34-0410-b5e6-96231b3b80d8
within the notional bounds of the static type of the getelementptr (which
is not the same as "inbounds") from GlobalOpt into a utility routine,
and use it in ConstantFold.cpp to check whether there are any mis-behaved
indices.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81478 91177308-0d34-0410-b5e6-96231b3b80d8
loop exit edge -- new PHIs may be needed not only for the additional
splits that are made to preserve LoopSimplify form, but also for the
original split. Factor out the code that inserts new PHIs so that it
can be used for both. Remove LoopRotation.cpp's code for manually
updating LCSSA form, as it is now redundant. This fixes PR4934.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81363 91177308-0d34-0410-b5e6-96231b3b80d8
to instructions instead of zero extended ones. This makes the asmprinter
print signed values more consistently. This apparently only really affects
the X86 backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81265 91177308-0d34-0410-b5e6-96231b3b80d8
that get created during loop unswitching, and fix SplitBlockPredecessors'
LCSSA updating code to create new PHIs instead of trying to just move
existing ones.
Also, optimize Loop::verifyLoop, since it gets called a lot. Use
searches on a sorted list of blocks instead of calling the "contains"
function, as is done in other places in the Loop class, since "contains"
does a linear search. Also, don't call verifyLoop from LoopSimplify or
LCSSA, as the PassManager is already calling verifyLoop as part of
LoopInfo's verifyAnalysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81221 91177308-0d34-0410-b5e6-96231b3b80d8
depth first order, so it wouldn't process unreachable blocks.
When compiling at -O0, late dead block elimination isn't done
and the bad instructions got to isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81187 91177308-0d34-0410-b5e6-96231b3b80d8
extractelement operations into a bitcast of the pointer,
then a gep, then a scalar load. Disable this when the vector
only has one element, because it leads to infinite loops in
instcombine (PR4908).
This transformation seems like a really bad idea to me, as it
will likely disable CSE of vector load/stores etc and can be
better done in the code generator when profitable. This
goes all the way back to the first days of packed types,
r25299 specifically.
I'll let those people who care about the performance of vector
code decide what to do with this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81185 91177308-0d34-0410-b5e6-96231b3b80d8
from floating-point to integer first, and bitcast the result
back to floating-point. Previously, this test was passing by
falling back to SelectionDAG lowering. The resulting code isn't
as nice, but it's correct and CodeGen now stays on the fast path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81171 91177308-0d34-0410-b5e6-96231b3b80d8
- I'd appreciate it if someone else eyeballs my changes to make sure I captured
the intent of the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81083 91177308-0d34-0410-b5e6-96231b3b80d8
linear scan reg alloc. This fixes a problem I ran into where extracting
a function from a larger file caused the generated code to change (masking
the problem I was trying to debug) because the allocator behaved differently.
This changes the results for two X86 regression checks. stack-color-with-reg
is improved, with one less instruction, but pr3495 is worse, with one more
copy. As far as I can tell, these tests were just getting lucky or unlucky,
so I've changed the expected results.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81060 91177308-0d34-0410-b5e6-96231b3b80d8
Constant uniquing tables. This allows distinct ConstantExpr objects
with the same operation and different flags.
Even though a ConstantExpr "a + b" is either always overflowing or
never overflowing (due to being a ConstantExpr), it's still necessary
to be able to represent it both with and without overflow flags at
the same time within the IR, because the safety of the flag may
depend on the context of the use. If the constant really does overflow,
it wouldn't ever be safe to use with the flag set, however the use
may be in code that is never actually executed.
This also makes it possible to merge all the flags tests into a single test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@80998 91177308-0d34-0410-b5e6-96231b3b80d8
There's a bug with ocamlc that uses "char*" instead of "const char*" for
global string variables. This causes g++ to be very noisy when linking
ocamlc programs. That's why the ocaml test used to cat to /dev/null.
ocamlopt doesn't have this problem, so we can get rid of the >/dev/null,
which may obscure some problems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@80968 91177308-0d34-0410-b5e6-96231b3b80d8
D test/Analysis/Profiling
--- Reverse-merging r80907 into '.':
U lib/Analysis/ProfileInfoLoaderPass.cpp
Attempt to remove failure in the self-hosting build bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@80966 91177308-0d34-0410-b5e6-96231b3b80d8
on a self-hosted build (although it seems to work on non-self hosted). I'll work
with Andreas to figure this out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@80947 91177308-0d34-0410-b5e6-96231b3b80d8