First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created.
Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway.
Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll
Jim has okayed this off-list.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180799 91177308-0d34-0410-b5e6-96231b3b80d8
The `llvm.tls_init_funcs' (created by the front-end) holds pointers to the TLS
initialization functions. These need to be placed into the correct section so
that they are run before `main()'.
<rdar://problem/13733006>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180737 91177308-0d34-0410-b5e6-96231b3b80d8
to determine whether or not we're on a darwin platform for debug code
emitting.
Solves the problem of a module with no triple on the command line
and no triple in the module using non-gdb ok features on darwin. Fix
up the member-pointers test to check the correct things for cross
platform (DW_FORM_flag is a good prefix).
Unfortunately no testcase because I have no ideas how to test something
without a triple and without a triple in the module yet check
precisely on two platforms. Ideas welcome.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180660 91177308-0d34-0410-b5e6-96231b3b80d8
Clarify documentation and API to make the difference between register and
register-indirect addressed locations more explicit. Put in a comment
to point out that with the current implementation we cannot specify
a register-indirect location with offset 0 (a breg 0 in DWARF).
No functionality change intended.
rdar://problem/13658587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180641 91177308-0d34-0410-b5e6-96231b3b80d8
TLVs probably won't be as common as the other types of variables. Check for them
last before defaulting to "DATA".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180631 91177308-0d34-0410-b5e6-96231b3b80d8
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180597 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR15838. Need to check for blocks with nothing but dbg.value.
I'm not sure how to force this situation with a unit test. I tried to
reduce the test case in PR15838 (1k lines of metadata) but gave up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180227 91177308-0d34-0410-b5e6-96231b3b80d8
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.
The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180193 91177308-0d34-0410-b5e6-96231b3b80d8
When MachineScheduler is enabled, this functionality can be
removed. Until then, provide a way to disable it for test cases and
designing MachineScheduler heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180192 91177308-0d34-0410-b5e6-96231b3b80d8
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180129 91177308-0d34-0410-b5e6-96231b3b80d8
1) Disallow 'returned' on parameter that is also 'sret' (no sensible semantics, as far as I can tell).
2) Conservatively disallow tail calls through 'returned' parameters that also are 'zext' or 'sext' (for consistency with treatment of other zero-extending and sign-extending operations in tail call position detection...can be revised later to handle situations that can be determined to be safe).
This is a new attribute that is not yet used, so there is no impact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180118 91177308-0d34-0410-b5e6-96231b3b80d8
Also add a check for llvm.used in the verifier and simplify clients now that
they can assume they have a ConstantArray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180019 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179996 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179940 91177308-0d34-0410-b5e6-96231b3b80d8
trying to move as much FastISel logic as possible out of the main path in
SelectionDAGISel - intermixing them just adds confusion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179902 91177308-0d34-0410-b5e6-96231b3b80d8
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179836 91177308-0d34-0410-b5e6-96231b3b80d8
This is a rework of the broken parts in r179373 which were subsequently reverted in r179374 due to incompatibility with C++98 compilers. This version should be ok under C++98.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179520 91177308-0d34-0410-b5e6-96231b3b80d8
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179449 91177308-0d34-0410-b5e6-96231b3b80d8