registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.
Fixes rdar://9207598
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129711 91177308-0d34-0410-b5e6-96231b3b80d8
the generated FastISel. X86 doesn't need to generate code to match ADD16ri8
since ADD16ri will do just fine. This is a small codesize win in the generated
instruction selector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129692 91177308-0d34-0410-b5e6-96231b3b80d8
simplifying them and exposing more information to tblgen. It would be nice
if other target authors adopted this as well, particularly arm since it has fastisel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129676 91177308-0d34-0410-b5e6-96231b3b80d8
kind of predicate: one that is specific to imm nodes. The predicate function
specified here just checks an int64_t directly instead of messing around with
SDNode's. The virtue of this is that it means that fastisel and other things
can reason about these predicates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129675 91177308-0d34-0410-b5e6-96231b3b80d8
structure and fix some fixmes. We now have a TreePredicateFn class
that handles all of the decoding of these things. This is an internal
cleanup that has no impact on the code generated by tblgen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129670 91177308-0d34-0410-b5e6-96231b3b80d8
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129666 91177308-0d34-0410-b5e6-96231b3b80d8
when we have a global variable base an an index. Instead, just give up on
folding the global variable.
Before we'd geenrate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
leaq (%rax), %rax
addq %rdi, %rax
movzbl (%rax), %eax
ret
now we generate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
movzbl (%rax,%rdi), %eax
ret
The difference is even more significant when there is a scale
involved.
This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129664 91177308-0d34-0410-b5e6-96231b3b80d8
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129662 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129656 91177308-0d34-0410-b5e6-96231b3b80d8
which don't need to check for falling off the end of a block *and* end of phi
nodes, since terminators are never phis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129655 91177308-0d34-0410-b5e6-96231b3b80d8
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129650 91177308-0d34-0410-b5e6-96231b3b80d8
Break the arc-profile code out to a function like the notes emission code is,
and reorder the functions in the file.
The only functionality change is that we no longer modify the Module when the
Module has no debug info to use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129631 91177308-0d34-0410-b5e6-96231b3b80d8
does. Also mostly implement it. Still a work-in-progress, but generates legal
output on crafted test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129630 91177308-0d34-0410-b5e6-96231b3b80d8
The transferValues() function can now handle both singly and multiply defined
values, as long as the resulting live range is known. Only rematerialized values
have their live range recomputed by extendRange().
The updateSSA() function can now insert PHI values in bulk across multiple
values in multiple target registers in one pass. The list of blocks received
from transferValues() is in layout order which seems to work well for the
iterative algorithm. Blocks from extendRange() are still in reverse BFS order,
but this function is used so rarely now that it doesn't matter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129580 91177308-0d34-0410-b5e6-96231b3b80d8
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129571 91177308-0d34-0410-b5e6-96231b3b80d8
canonical, and generally leads to better code. Found while looking at
an article about saturating arithmetic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129545 91177308-0d34-0410-b5e6-96231b3b80d8
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129508 91177308-0d34-0410-b5e6-96231b3b80d8
instruction around, reducing work.
Greatly simplify handling of debug instructions. There is no need to
build up a vector of them and then move them into the one predecessor
if we're processing a block. Instead just rescan the block and *copy*
them into the pred. If a block gets merged into multiple preds, this
will retain more debug info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129502 91177308-0d34-0410-b5e6-96231b3b80d8
the same allocation size but different primitive sizes(e.g., <3xi32> and
<4xi32>). When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129472 91177308-0d34-0410-b5e6-96231b3b80d8
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129466 91177308-0d34-0410-b5e6-96231b3b80d8
will allow multiple context with different loop unroll parameters to run. This is a minor change and no effect
on existing application.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129449 91177308-0d34-0410-b5e6-96231b3b80d8
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129432 91177308-0d34-0410-b5e6-96231b3b80d8
alignment for its type, use the minimum of the specified alignment and the ABI
alignment. This fixes <rdar://problem/9275290>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129428 91177308-0d34-0410-b5e6-96231b3b80d8
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129421 91177308-0d34-0410-b5e6-96231b3b80d8
The ARMARM specifies these instructions as unpredictable when storing the
writeback register. This shouldn't affect code generation much since storing a
pointer to itself is quite rare.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129409 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129401 91177308-0d34-0410-b5e6-96231b3b80d8
In addition, the base register is not rGPR, but GPR with th exception that:
if n == 15 then UNPREDICTABLE
rdar://problem/9273836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129391 91177308-0d34-0410-b5e6-96231b3b80d8
Use a Bitvector instead, we didn't need the smaller memory footprint anyway.
This makes the greedy register allocator 10% faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129390 91177308-0d34-0410-b5e6-96231b3b80d8
Add handling for tracking the relocations on symbols and resolving them.
Keep track of the relocations even after they are resolved so that if
the RuntimeDyld client moves the object, it can update the address and any
relocations to that object will be updated.
For our trival object file load/run test harness (llvm-rtdyld), this enables
relocations between functions located in the same object module. It should
be trivially extendable to load multiple objects with mutual references.
As a simple example, the following now works (running on x86_64 Darwin 10.6):
$ cat t.c
int bar() {
return 65;
}
int main() {
return bar();
}
$ clang t.c -fno-asynchronous-unwind-tables -o t.o -c
$ otool -vt t.o
t.o:
(__TEXT,__text) section
_bar:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movl $0x00000041,%eax
0000000000000009 popq %rbp
000000000000000a ret
000000000000000b nopl 0x00(%rax,%rax)
_main:
0000000000000010 pushq %rbp
0000000000000011 movq %rsp,%rbp
0000000000000014 subq $0x10,%rsp
0000000000000018 movl $0x00000000,0xfc(%rbp)
000000000000001f callq 0x00000024
0000000000000024 addq $0x10,%rsp
0000000000000028 popq %rbp
0000000000000029 ret
$ llvm-rtdyld t.o -debug-only=dyld ; echo $?
Function sym: '_bar' @ 0
Function sym: '_main' @ 16
Extracting function: _bar from [0, 15]
allocated to 0x100153000
Extracting function: _main from [16, 41]
allocated to 0x100154000
Relocation at '_main' + 16 from '_bar(Word1: 0x2d000000)
Resolving relocation at '_main' + 16 (0x100154010) from '_bar (0x100153000)(pcrel, type: 2, Size: 4).
loaded '_main' at: 0x100154000
65
$
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129388 91177308-0d34-0410-b5e6-96231b3b80d8
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129383 91177308-0d34-0410-b5e6-96231b3b80d8
This merges the behavior of splitSingleBlocks into splitAroundRegion, so the
RS_Region and RS_Block register stages can be coalesced. That means the leftover
intervals after region splitting go directly to spilling instead of a second
pass of per-block splitting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129379 91177308-0d34-0410-b5e6-96231b3b80d8
Use debug info in the IR to find the directory/file:line:col. Each time that location changes, bump a counter.
Unlike the existing profiling system, we don't try to look at argv[], and thusly don't require main() to be present in the IR. This matches GCC's technique where you specify the profiling flag when producing each .o file.
The runtime library is minimal, currently just calling printf at program shutdown time. The API is designed to make it possible to emit GCOV data later on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129340 91177308-0d34-0410-b5e6-96231b3b80d8
reassociation opportunities are exposed. This fixes a bug where
the nested reassociation expects to be the IR to be consistent,
but it isn't, because the outer reassociation has disconnected
some of the operands. rdar://9167457
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129324 91177308-0d34-0410-b5e6-96231b3b80d8
.long 80+08
go ahead and assume that if we've got an Error token that we handled it
already. Otherwise if it's a token we can't handle then go ahead and
return the default error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129322 91177308-0d34-0410-b5e6-96231b3b80d8
has some bugs. If this is interesting functionality, it should be
reimplemented in the argpromotion pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129314 91177308-0d34-0410-b5e6-96231b3b80d8
mean that it has to be ConstantArray of ConstantStruct. We might have
ConstantAggregateZero, at either level, so don't crash on that.
Also, semi-deprecate the sentinal value. The linker isn't aware of sentinals so
we end up with the two lists appended, each with their "sentinals" on them.
Different parts of LLVM treated sentinals differently, so make them all just
ignore the single entry and continue on with the rest of the list.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129307 91177308-0d34-0410-b5e6-96231b3b80d8
the 'unwind' instruction. However, later on that instruction was converted into
a jump to the basic block it was located in, causing an infinite loop when we
get there.
It turns out, we get there if the _Unwind_Resume_or_Rethrow call returns (which
it's not supposed to do). It returns if it cannot find a place to unwind
to. Thus we would get what appears to be a "hang" when in reality it's just that
the EH couldn't be propagated further along.
Instead of infinitely looping (or calling `unwind', which none of our back-ends
support (it's lowered into nothing...)), call the @llvm.trap() intrinsic
instead. This may not conform to specific rules of a particular language, but
it's rather better than infinitely looping.
<rdar://problem/9175843&9233582>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129302 91177308-0d34-0410-b5e6-96231b3b80d8
disassembler API. Hooked this up to the ARM target so such tools as Darwin's
otool(1) can now print things like branch targets for example this:
blx _puts
instead of this:
blx #-36
And even print the expression encoded in the Mach-O relocation entried for
things like this:
movt r0, :upper16:((_foo-_bar)+1234)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129284 91177308-0d34-0410-b5e6-96231b3b80d8
Both coalescing and register allocation already check aliases for interference,
so these extra segments are only slowing us down.
This speeds up both linear scan and the greedy register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129283 91177308-0d34-0410-b5e6-96231b3b80d8
--- Reverse-merging r129235 into '.':
D test/Feature/bb_attrs.ll
U include/llvm/BasicBlock.h
U include/llvm/Bitcode/LLVMBitCodes.h
U lib/VMCore/AsmWriter.cpp
U lib/VMCore/BasicBlock.cpp
U lib/AsmParser/LLParser.cpp
U lib/AsmParser/LLLexer.cpp
U lib/AsmParser/LLToken.h
U lib/Bitcode/Reader/BitcodeReader.cpp
U lib/Bitcode/Writer/BitcodeWriter.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129259 91177308-0d34-0410-b5e6-96231b3b80d8
* Add a "landing pad" attribute to the BasicBlock.
* Modify the bitcode reader and writer to handle said attribute.
Later: The verifier will ensure that the landing pad attribute is used in the
appropriate manner. I.e., not applied to the entry block, and applied only to
basic blocks that are branched to via a `dispatch' instruction.
(This is a work-in-progress.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129235 91177308-0d34-0410-b5e6-96231b3b80d8
delete the instruction pointed to by CGP's current instruction
iterator, leading to a crash on the testcase. This fixes PR9578.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129200 91177308-0d34-0410-b5e6-96231b3b80d8
It is common for large live ranges to have few basic blocks with register uses
and many live-through blocks without any uses. This approach grows the Hopfield
network incrementally around the use blocks, completely avoiding checking
interference for some through blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129188 91177308-0d34-0410-b5e6-96231b3b80d8
error stream, in cases where the AsmParser is
being invoked by EDDisassembler. Before, they
were being sent to errs() because no error handler
was installed in the SourceMgr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129177 91177308-0d34-0410-b5e6-96231b3b80d8
can be used even when main() isn't present in the Module, but it means that you
don't get to read argv[].
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129163 91177308-0d34-0410-b5e6-96231b3b80d8
If lower bound is more then upper bound then consider it is an unbounded array.
An array is unbounded if non-zero lower bound is same as upper bound.
If lower bound and upper bound are zero than array has one element.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129156 91177308-0d34-0410-b5e6-96231b3b80d8
The previous cleanup of LDRD got overzealous and removed it, causing post-RA
scheduling to get overzealous in breaking antidependencies and invalidate these instructions. Hilarity and invalid assembly ensued.
rdar://9244161
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129144 91177308-0d34-0410-b5e6-96231b3b80d8
Teach 32-bit section loading to use the Memory Manager interface, just like
the 64-bit loading does. Tidy up a few other things here and there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129138 91177308-0d34-0410-b5e6-96231b3b80d8
with the newer, cleaner model. It uses the IAPrinter class to hold the
information that is needed to match an instruction with its alias. This also
takes into account the available features of the platform.
There is one bit of ugliness. The way the logic determines if a pattern is
unique is O(N**2), which is gross. But in reality, the number of items it's
checking against isn't large. So while it's N**2, it shouldn't be a massive time
sink.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129110 91177308-0d34-0410-b5e6-96231b3b80d8