Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122995 91177308-0d34-0410-b5e6-96231b3b80d8
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122955 91177308-0d34-0410-b5e6-96231b3b80d8
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122952 91177308-0d34-0410-b5e6-96231b3b80d8
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)
to "ret i64 1000". This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array. This
allows us to transform it into a memset with a constant size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122950 91177308-0d34-0410-b5e6-96231b3b80d8
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122936 91177308-0d34-0410-b5e6-96231b3b80d8
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122806 91177308-0d34-0410-b5e6-96231b3b80d8
prologue and epilogue if the adjustment is 8. Similarly, use pushl / popl if
the adjustment is 4 in 32-bit mode.
In the epilogue, takes care to pop to a caller-saved register that's not live
at the exit (either return or tailcall instruction).
rdar://8771137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122783 91177308-0d34-0410-b5e6-96231b3b80d8
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122727 91177308-0d34-0410-b5e6-96231b3b80d8
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122712 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122710 91177308-0d34-0410-b5e6-96231b3b80d8
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122707 91177308-0d34-0410-b5e6-96231b3b80d8
blocks in a loop, instead of just the header block. This makes it more
aggressive, able to handle Duncan's Ada examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122704 91177308-0d34-0410-b5e6-96231b3b80d8
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122695 91177308-0d34-0410-b5e6-96231b3b80d8
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122685 91177308-0d34-0410-b5e6-96231b3b80d8
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122654 91177308-0d34-0410-b5e6-96231b3b80d8
preprocessed .s files and matches darwin gas. rdar://8798690
Also fix a comment on the next line of AsmParser.cpp after this new code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122531 91177308-0d34-0410-b5e6-96231b3b80d8
If the basic block containing the BCCi64 (or BCCZi64) instruction ends with
an unconditional branch, that branch needs to be deleted before appending
the expansion of the BCCi64 to the end of the block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122521 91177308-0d34-0410-b5e6-96231b3b80d8
See http://caml.inria.fr/mantis/view.php?id=4166
If we call only external functions from a module, then its 'let _' bindings
don't get executed, which means that the exceptions don't get registered for use
in the C code.
This in turn causes llvm_raise to call raise_with_arg() with a NULL pointer and
cause a segmentation fault.
The workaround is to declare all 'external' functions as 'val' in these .mli
files.
Also added a separate testcase (the testcase must call only external functions
for the bug to occur).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122497 91177308-0d34-0410-b5e6-96231b3b80d8
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122472 91177308-0d34-0410-b5e6-96231b3b80d8
the original instruction, half the cases were missed (making it not
wrong but suboptimal). Also correct a typo (A <-> B) in the second
chunk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122414 91177308-0d34-0410-b5e6-96231b3b80d8
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122399 91177308-0d34-0410-b5e6-96231b3b80d8
loads properly. We miscompiled the testcase into:
_test: ## @test
movl $128, (%rdi)
movzbl 1(%rdi), %eax
ret
Now we get a proper:
_test: ## @test
movl $128, (%rdi)
movsbl (%rdi), %eax
movzbl %ah, %eax
ret
This fixes PR8757.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122392 91177308-0d34-0410-b5e6-96231b3b80d8
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122378 91177308-0d34-0410-b5e6-96231b3b80d8
the shift type was needed one place, the shift count
type another. The transform in 123555 had the same
problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122366 91177308-0d34-0410-b5e6-96231b3b80d8
count operand. These should be the same but apparently are
not always, and this is cleaner anyway. This improves the
code in an existing test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122354 91177308-0d34-0410-b5e6-96231b3b80d8
being tested. This ensures that we test the tools just built and not
some random tools that might happen to be in the user's PATH. This
makes LLVM testing much more stable and predictable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122341 91177308-0d34-0410-b5e6-96231b3b80d8
not assume this (for example in case more transforms get added below
it). Suggested by Frits van Bommel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122332 91177308-0d34-0410-b5e6-96231b3b80d8
quite often, but don't make much difference in practice presumably because
instcombine also knows them and more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122328 91177308-0d34-0410-b5e6-96231b3b80d8
a couple of existing transforms. This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122326 91177308-0d34-0410-b5e6-96231b3b80d8
Type legalization splits up i64 values into pairs of i32 values, which leads
to poor quality code when inserting or extracting i64 vector elements.
If the vector element is loaded or stored, it can be treated as an f64 value
and loaded or stored directly from a VPR register. Use the pre-legalization
DAG combiner to cast those vector elements to f64 types so that the type
legalizer won't mess them up. Radar 8755338.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122319 91177308-0d34-0410-b5e6-96231b3b80d8
(they had just been forgotten before). Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122245 91177308-0d34-0410-b5e6-96231b3b80d8
argument. The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122234 91177308-0d34-0410-b5e6-96231b3b80d8
the same as setcc. Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS). This is
a step towards finishing off PR5443. In the testcase in that bug we now get:
movq %rdi, %rax
addq %rsi, %rax
sbbq %rcx, %rcx
testb $1, %cl
setne %dl
ret
instead of:
movq %rdi, %rax
addq %rsi, %rax
movl $0, %ecx
adcq $0, %rcx
testq %rcx, %rcx
setne %dl
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122219 91177308-0d34-0410-b5e6-96231b3b80d8
doesn't, match it back to setb.
On a 64-bit version of the testcase before we'd get:
movq %rdi, %rax
addq %rsi, %rax
sbbb %dl, %dl
andb $1, %dl
ret
now we get:
movq %rdi, %rax
addq %rsi, %rax
setb %dl
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122217 91177308-0d34-0410-b5e6-96231b3b80d8
consistently by moving it out of lowering into dag combine.
Add some missing patterns for matching away extended versions of setcc_c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122201 91177308-0d34-0410-b5e6-96231b3b80d8
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7: ## @test7
addq %rsi, %rdi
cmpq %rdi, %rsi
movl $42, %eax
cmovaq %rsi, %rax
ret
Now we get:
_test7: ## @test7
addq %rsi, %rdi
movl $42, %eax
cmovbq %rsi, %rax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122182 91177308-0d34-0410-b5e6-96231b3b80d8
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:
unsigned char X(char a, char b) {
int res = a+b;
if ((unsigned )(res+128) > 255U)
abort();
return res;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122178 91177308-0d34-0410-b5e6-96231b3b80d8
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.
This is likely to be the fix for 8816.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122177 91177308-0d34-0410-b5e6-96231b3b80d8
isel is *required* to split the edge. PHI values get evaluated
on the edge, not in their predecessor block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122170 91177308-0d34-0410-b5e6-96231b3b80d8
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122164 91177308-0d34-0410-b5e6-96231b3b80d8
It turns out that ppc backend has really weird interdependencies
over different hooks and all stuff is fragile wrt small changes.
This should fix PR8749
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122155 91177308-0d34-0410-b5e6-96231b3b80d8
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.
Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)
<rdar://problem/8782198>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122104 91177308-0d34-0410-b5e6-96231b3b80d8
BUILD_VECTOR operands where the element type is not legal. I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122102 91177308-0d34-0410-b5e6-96231b3b80d8
on the DragonEgg self-host bot. Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122072 91177308-0d34-0410-b5e6-96231b3b80d8
a null endptr argument, because they may write to errno.
This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122014 91177308-0d34-0410-b5e6-96231b3b80d8
dragonegg self-host buildbot. Original commit message:
Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121965 91177308-0d34-0410-b5e6-96231b3b80d8
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
Fixes <rdar://problem/8558713>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121905 91177308-0d34-0410-b5e6-96231b3b80d8
Clang is now providing intrinsics for these and so we need to support them
in the backend. Radar 8068427.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121902 91177308-0d34-0410-b5e6-96231b3b80d8
and "save_volatiles" correctly. This completes the custom calling convention
functionality changes for the MBlaze backend that were started in 121888.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121891 91177308-0d34-0410-b5e6-96231b3b80d8
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121859 91177308-0d34-0410-b5e6-96231b3b80d8
With this we don't need the EffectiveSize field anymore. Without that field
LayoutFragment only updates offsets and we don't need to invalidate the
current fragment when it is relaxed (only the ones following it).
This is also a very small improvement in the accuracy of the layout info as
we now use the after relaxation size immediately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121857 91177308-0d34-0410-b5e6-96231b3b80d8
Since we now don't update addresses so early, we might relax a bit more than
we need to. This is simillar to the issue in PR8467.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121856 91177308-0d34-0410-b5e6-96231b3b80d8
update the condition codes. These come from my test generator and are just
the ones that MC currently assembles correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121830 91177308-0d34-0410-b5e6-96231b3b80d8
regB = move RCX
regA = op regB, regC
RAX = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
movl %edi, %eax
shlq $32, %rcx
leaq (%rcx,%rax), %rax
=>
movl %edi, %eax
shlq $32, %rcx
orq %rcx, %rax
rdar://8762995
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121793 91177308-0d34-0410-b5e6-96231b3b80d8
which is simpler than finding a place to insert in BB.
- Don't perform the 'if condition hoisting' xform on certain
i1 PHIs, as it interferes with switch formation.
This re-fixes "example 7", without breaking the world hopefully.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121764 91177308-0d34-0410-b5e6-96231b3b80d8
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.
second, it doesn't clean up the "if diamond" that it just
eliminated away. This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121762 91177308-0d34-0410-b5e6-96231b3b80d8
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now. It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior. Since that isn't obviously wrong, I've just
changed the test file. This completes the work for Radar 8711675.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121730 91177308-0d34-0410-b5e6-96231b3b80d8
when the wider type is legal. This allows us to compile:
define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
%div = udiv i16 %x, 33
ret i16 %div
}
into:
test1: # @test1
movzwl 4(%esp), %eax
imull $63551, %eax, %eax # imm = 0xF83F
shrl $21, %eax
ret
instead of:
test1: # @test1
movw $-1985, %ax # imm = 0xFFFFFFFFFFFFF83F
mulw 4(%esp)
andl $65504, %edx # imm = 0xFFE0
movl %edx, %eax
shrl $5, %eax
ret
Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320
We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121696 91177308-0d34-0410-b5e6-96231b3b80d8
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud: ## @crud
## BB#0: ## %entry
cmpb $33, %dil
jb LBB0_4
## BB#1: ## %switch.early.test
addb $-34, %dil
cmpb $58, %dil
ja LBB0_3
## BB#2: ## %switch.early.test
movzbl %dil, %eax
movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
btq %rax, %rcx
jb LBB0_4
LBB0_3: ## %lor.rhs
xorl %eax, %eax
ret
LBB0_4: ## %lor.end
movl $1, %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121690 91177308-0d34-0410-b5e6-96231b3b80d8
class A<bit a, bits<3> x, bits<3> y> {
bits<3> z;
let z = !if(a, x, y);
}
The variable z will get the value of x when 'a' is 1 and 'y' when a is '0'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121666 91177308-0d34-0410-b5e6-96231b3b80d8
(x & 2^n) ? 2^m+C : C
we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121609 91177308-0d34-0410-b5e6-96231b3b80d8
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed. Add tests for this, too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121506 91177308-0d34-0410-b5e6-96231b3b80d8
cmake/modules/AddLLVM.cmake: Add empty "phony" target in add_llvm_loadable_module() even if loadable module were not supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121455 91177308-0d34-0410-b5e6-96231b3b80d8
the condition codes. Where the ones that do have an 's' suffix and the ones
that don't don't have the suffix. The trick is if MatchInstructionImpl() fails
we try again after adding a CCOut operand with the correct value and removing
the 's' if present. Four simple test cases added for now, lots more to come.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121401 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121391 91177308-0d34-0410-b5e6-96231b3b80d8
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121358 91177308-0d34-0410-b5e6-96231b3b80d8
Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM
as well as ELF/OBJ (including fixup)
Also added support for ELF::R_ARM_TLS_IE32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121312 91177308-0d34-0410-b5e6-96231b3b80d8
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}
vpush and vpop do not allow gaps in the register list.
rdar://8728956
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121197 91177308-0d34-0410-b5e6-96231b3b80d8
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121111 91177308-0d34-0410-b5e6-96231b3b80d8
as llc + llvm-mc. This time ELF is not changed and I tested that llvm-gcc
bootstrap on darwin10 using darwin9's assembler and linker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121006 91177308-0d34-0410-b5e6-96231b3b80d8
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120974 91177308-0d34-0410-b5e6-96231b3b80d8
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
additional pipeline stall. So it's frequently better to single codegen
vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Work in progress, only A+B are enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120960 91177308-0d34-0410-b5e6-96231b3b80d8
Also add asserts that the indices are valid in InsertValueInst::init(). ExtractValueInst already asserts when constructed with invalid indices.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120956 91177308-0d34-0410-b5e6-96231b3b80d8
result. This allows us to compile:
void *test12(long count) {
return new int[count];
}
into:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
movq $-1, %rdi
cmovnoq %rax, %rdi
jmp __Znam ## TAILCALL
instead of:
test12:
movl $4, %ecx
movq %rdi, %rax
mulq %rcx
seto %cl
testb %cl, %cl
movq $-1, %rdi
cmoveq %rax, %rdi
jmp __Znam
Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120936 91177308-0d34-0410-b5e6-96231b3b80d8
backend that they were all implemented except umul. This one fell back
to the default implementation that did a hi/lo multiply and compared the
top. Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test. Now we compile:
void *func(long count) {
return new int[count];
}
into:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
seto %cl ## encoding: [0x0f,0x90,0xc1]
testb %cl, %cl ## encoding: [0x84,0xc9]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
instead of:
__Z4funcl: ## @_Z4funcl
movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
mulq %rcx ## encoding: [0x48,0xf7,0xe1]
testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2]
movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8]
jmp __Znam ## TAILCALL
Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120935 91177308-0d34-0410-b5e6-96231b3b80d8
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
supports SSE4.2 (nehalem) or SSE4A (barcelona).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120917 91177308-0d34-0410-b5e6-96231b3b80d8
foo = a - b
.long foo
instead of just
.long a - b
First, on darwin9 64 bits the assembler produces the wrong result. Second,
if "a" is the end of the section all darwin assemblers (9, 10 and mc) will not
consider a - b to be a constant but will if the dummy foo is created.
Split how we handle these cases. The first one is something MC should take care
of. The second one has to be handled by the caller.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120889 91177308-0d34-0410-b5e6-96231b3b80d8
doing that if the target is darwin10 or newer.
This fixes
*) Direct object emission was producing objects without the workaround on
darwin9.
*) Assembly printing was producing objects with the workaround on linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120866 91177308-0d34-0410-b5e6-96231b3b80d8
named the same, so it had to qualify type names according to the enclosing
scope to ensure uniqueness. This is no longer needed for correctness (though
it may be helpful when reading the IR), so this test has lost its importance.
Zap it because dragonegg will never be able to produce the qualified type name
since modern gcc zaps language specific info (such as whether a type is nested
inside another - needed to get X::Y here) before dragonegg is reached.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120721 91177308-0d34-0410-b5e6-96231b3b80d8
Lifted adjustFixupValue() from Darwin for sharing w ELF.
Test added
TODO:
refactor ELFObjectWriter::RecordRelocation more.
Possibly share more code with Darwin?
Lots more relocations...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120534 91177308-0d34-0410-b5e6-96231b3b80d8
20040709-1.c from the gcc testsuite. I was using the size of a
pointer instead of the pointee. This fixes rdar://8713376
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120519 91177308-0d34-0410-b5e6-96231b3b80d8