register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.
When a i64 sub is expanded to subc + sube.
libcall #1
\
\ subc
\ / \
\ / \
\ / libcall #2
sube
If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.
subc
|
libcall #2
|
libcall #1
|
sube
However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.
The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.
rdar://10019576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138791 91177308-0d34-0410-b5e6-96231b3b80d8
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138621 91177308-0d34-0410-b5e6-96231b3b80d8
SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138317 91177308-0d34-0410-b5e6-96231b3b80d8
Therefore, rather then generate a pseudo instruction, which is later expanded,
generate the necessary instructions in place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138163 91177308-0d34-0410-b5e6-96231b3b80d8
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137986 91177308-0d34-0410-b5e6-96231b3b80d8
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137810 91177308-0d34-0410-b5e6-96231b3b80d8
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137807 91177308-0d34-0410-b5e6-96231b3b80d8
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:
For this shuffle:
shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
<i32 1, i32 0, i32 7, i32 6>
This was expanded to:
vextractf128 $1, %ymm1, %xmm2
vpextrq $0, %xmm2, %rax
vmovd %rax, %xmm1
vpextrq $1, %xmm2, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1
vpextrq $0, %xmm0, %rax
vmovd %rax, %xmm2
vpextrq $1, %xmm0, %rax
vmovd %rax, %xmm0
vpunpcklqdq %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
ret
Now we get:
vshufpd $1, %xmm0, %xmm0, %xmm0
vextractf128 $1, %ymm1, %xmm1
vshufpd $1, %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137733 91177308-0d34-0410-b5e6-96231b3b80d8
Mips1 does not support double precision loads or stores, therefore two single
precision loads or stores must be used in place of these instructions. This
patch treats double precision loads and stores as if they are legal
instructions until MCInstLowering, instead of generating the single precision
instructions during instruction selection or Prolog/Epilog code insertion.
Without the changes made in this patch, llc produces code that has the same
problem described in r137484 or bails out when
MipsInstrInfo::storeRegToStackSlot or loadRegFromStackSlot is called before
register allocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137711 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed. Our register allocator must be awesome!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137551 91177308-0d34-0410-b5e6-96231b3b80d8
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137519 91177308-0d34-0410-b5e6-96231b3b80d8