en-mass for C++ PODs. On my c++ test file, this cuts the fast isel rejects by 10x
and shrinks the generated .s file by 5%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129755 91177308-0d34-0410-b5e6-96231b3b80d8
when they are a truncate from something else. This eliminates fully half of all the
fastisel rejections on a test c++ file I'm working with, which should make a substantial
improvement for -O0 compile of c++ code.
This fixed rdar://9297003 - fast isel bails out on all functions taking bools
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129752 91177308-0d34-0410-b5e6-96231b3b80d8
Before we would bail out on i1 arguments all together, now we just bail on
non-constant ones. Also, we used to emit extraneous code. e.g. test12 was:
movb $0, %al
movzbl %al, %edi
callq _test12
and test13 was:
movb $0, %al
xorl %edi, %edi
movb %al, 7(%rsp)
callq _test13f
Now we get:
movl $0, %edi
callq _test12
and:
movl $0, %edi
callq _test13f
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129751 91177308-0d34-0410-b5e6-96231b3b80d8
value constraints on them (when defined as ImmLeaf's). This is particularly important
for X86-64, where almost all reg/imm instructions take a i64immSExt32 immediate operand,
which has a value constraint. Before this patch we ended up iseling the examples into
such amazing code as:
movabsq $7, %rax
imulq %rax, %rdi
movq %rdi, %rax
ret
now we produce:
imulq $7, %rdi, %rax
ret
This dramatically shrinks the generated code at -O0 on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129691 91177308-0d34-0410-b5e6-96231b3b80d8
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129666 91177308-0d34-0410-b5e6-96231b3b80d8
when we have a global variable base an an index. Instead, just give up on
folding the global variable.
Before we'd geenrate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
leaq (%rax), %rax
addq %rdi, %rax
movzbl (%rax), %eax
ret
now we generate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
movzbl (%rax,%rdi), %eax
ret
The difference is even more significant when there is a scale
involved.
This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129664 91177308-0d34-0410-b5e6-96231b3b80d8
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129662 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129656 91177308-0d34-0410-b5e6-96231b3b80d8
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129650 91177308-0d34-0410-b5e6-96231b3b80d8
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129571 91177308-0d34-0410-b5e6-96231b3b80d8
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129508 91177308-0d34-0410-b5e6-96231b3b80d8
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129466 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129401 91177308-0d34-0410-b5e6-96231b3b80d8
There can be multiple defs for a single virtual register when they are defining
sub-registers.
The missing <dead> flag was stopping the inline spiller from eliminating dead
code after rematerialization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128888 91177308-0d34-0410-b5e6-96231b3b80d8
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual register are
usually copies of the stack pointer:
%vreg75<def> = COPY %ESP; GR32:%vreg75
MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
CALLpcrel32 ...
Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.
The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail
because it depends on linear scan spilling a particular register.
I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128845 91177308-0d34-0410-b5e6-96231b3b80d8
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.
Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128642 91177308-0d34-0410-b5e6-96231b3b80d8
becomes reachable when before it wasn't). Check to make sure that it's not null
before trying to use it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128434 91177308-0d34-0410-b5e6-96231b3b80d8
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.
The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128327 91177308-0d34-0410-b5e6-96231b3b80d8
The .dot directives don't need labels, that is a leftover from when we created
line number info manually.
Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128284 91177308-0d34-0410-b5e6-96231b3b80d8
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128181 91177308-0d34-0410-b5e6-96231b3b80d8