- There is a minor semantic change here (evidenced by the test change) for
Darwin triples that have no version component. I debated changing the default
behavior of isOSVersionLT, but decided it made more sense for triples to be
explicit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129802 91177308-0d34-0410-b5e6-96231b3b80d8
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.
This is currently disabled by default. We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129772 91177308-0d34-0410-b5e6-96231b3b80d8
en-mass for C++ PODs. On my c++ test file, this cuts the fast isel rejects by 10x
and shrinks the generated .s file by 5%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129755 91177308-0d34-0410-b5e6-96231b3b80d8
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.
Fixes rdar://9207598
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129711 91177308-0d34-0410-b5e6-96231b3b80d8
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129666 91177308-0d34-0410-b5e6-96231b3b80d8
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129662 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129656 91177308-0d34-0410-b5e6-96231b3b80d8
which don't need to check for falling off the end of a block *and* end of phi
nodes, since terminators are never phis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129655 91177308-0d34-0410-b5e6-96231b3b80d8
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129650 91177308-0d34-0410-b5e6-96231b3b80d8
The transferValues() function can now handle both singly and multiply defined
values, as long as the resulting live range is known. Only rematerialized values
have their live range recomputed by extendRange().
The updateSSA() function can now insert PHI values in bulk across multiple
values in multiple target registers in one pass. The list of blocks received
from transferValues() is in layout order which seems to work well for the
iterative algorithm. Blocks from extendRange() are still in reverse BFS order,
but this function is used so rarely now that it doesn't matter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129580 91177308-0d34-0410-b5e6-96231b3b80d8
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129571 91177308-0d34-0410-b5e6-96231b3b80d8
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129508 91177308-0d34-0410-b5e6-96231b3b80d8
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129421 91177308-0d34-0410-b5e6-96231b3b80d8
Use a Bitvector instead, we didn't need the smaller memory footprint anyway.
This makes the greedy register allocator 10% faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129390 91177308-0d34-0410-b5e6-96231b3b80d8
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129383 91177308-0d34-0410-b5e6-96231b3b80d8
This merges the behavior of splitSingleBlocks into splitAroundRegion, so the
RS_Region and RS_Block register stages can be coalesced. That means the leftover
intervals after region splitting go directly to spilling instead of a second
pass of per-block splitting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129379 91177308-0d34-0410-b5e6-96231b3b80d8