A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.
radar://14338767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185712 91177308-0d34-0410-b5e6-96231b3b80d8
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring. Extend it to cope with single instructions
that copy from one frame index to another.
The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185705 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes foldMemoryOperandImpl() so that it doesn't create duplicated
frame MMOs. I hadn't realized when writing r185434 that it was the caller's
responsibility to add these.
No behavioural change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185704 91177308-0d34-0410-b5e6-96231b3b80d8
...now that the problem that prompted the restriction has been fixed.
The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185701 91177308-0d34-0410-b5e6-96231b3b80d8
When a target@got@tprel or target@got@tprel@l symbol variant is used in
a fixup_ppc_half16 (*not* fixup_ppc_half16ds) context, we currently fail,
since the corresponding R_PPC64_GOT_TPREL16 / R_PPC64_GOT_TPREL16_LO
relocation types do not exist.
However, since such symbol variants resolve to GOT offsets which are
always 4-aligned, we can simply instead use the _DS variants of the
relocation types, which *do* exist.
The same applies for the @got@dtprel variants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185700 91177308-0d34-0410-b5e6-96231b3b80d8
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.
The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout. I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185698 91177308-0d34-0410-b5e6-96231b3b80d8
SystemZ wants normal register scavenging slots, as close to the stack or
frame pointer as possible. The only reason it was using custom code was
because PrologEpilogInserter assumed an x86-like layout, where the frame
pointer is at the opposite end of the frame from the stack pointer.
This meant that when frame pointer elimination was disabled,
the slots ended up being as close as possible to the incoming
stack pointer, which is the opposite of what we want on SystemZ.
This patch adds a new knob to say which layout is used and converts
SystemZ to use target-independent scavenging slots. It's one of the pieces
needed to support frame-to-frame MVCs, where two slots might be required.
The ABI requires us to allocate 160 bytes for calls, so one approach
would be to use that area as temporary spill space instead. It would need
some surgery to make sure that the slot isn't live across a call though.
I stuck to the "isFPCloseToIncomingSP - ..." style comment on the
"do what the surrounding code does" principle. The FP case is already
covered by several Systemz/frame-* tests, which fail without the
PrologueEpilogueInserter change, so no new ones are needed.
No behavioural change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185696 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the last missing construct to parse TLS-related
assembler code:
add 3, 4, symbol@tls
The ADD8TLS currently hard-codes the @tls into the assembler string.
This cannot be handled by the asm parser, since @tls is parsed as
a symbol variant. This patch changes ADD8TLS to have the @tls suffix
printed as symbol variant on output too, which allows us to remove
the isCodeGenOnly marker from ADD8TLS. This in turn means that we
can add a AsmOperand to accept @tls marked symbols on input.
As a side effect, this means that the fixup_ppc_tlsreg fixup type
is no longer necessary and can be merged into fixup_ppc_nofixup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185692 91177308-0d34-0410-b5e6-96231b3b80d8
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.
In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185688 91177308-0d34-0410-b5e6-96231b3b80d8
This implements a proper PPCAsmBackend::writeNopData routine
that actually writes PowerPC nop instructions.
This fixes the last remaining difference in object file output
(text section) between the integrated assembler and GNU as
that I've seen anywhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185662 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a new decoder table/namespace 'VFPV8', as these instructions have their
top 4 bits as 0b1111, while other Thumb instructions have 0b1110.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185642 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for specifying condition registers and
condition register fields via expressions using the symbols
defined by the PowerISA, like "4*cr2+eq".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185633 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely academic because GHC calls are always tail calls so the register mask will never be used; however, this change makes the code clearer and brings the ARM implementation of the GHC calling convention in line with the X86 implementation. Also, it might save someone else some time trying to figuring out what is happening...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185592 91177308-0d34-0410-b5e6-96231b3b80d8
In the ARM back-end, build_vector nodes are lowered to a target specific
build_vector that uses floating point type.
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between integer unit and floating
point unit that may result in inefficient code.
In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.
In particular, this happens when loads are not aligned.
Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.
The patch uses a vector friendly sequence of code when the inserted bitcasts to
floating point survived DAGCombine.
This is done by a target specific DAGCombine that changes the target specific
build_vector into a sequence of insert_vector_elt that get rid of the bitcasts.
<rdar://problem/14170854>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185587 91177308-0d34-0410-b5e6-96231b3b80d8
Before the fix Thumb2 instructions of type "add rD, rN, #imm" (T3 encoding, see ARM ARM A8.8.4) with rD and rN both being low registers (r0-r7) were classified as having the T4 encoding.
The T4 encoding doesn't have a cc_out operand so for above instructions the operand gets erroneously removed, corrupting the token stream and leading to parse errors later in the process.
This bug prevented "add r1, r7, #0xcbcbcbcb" from being assembled correctly.
Fixes <rdar://problem/14224440>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185575 91177308-0d34-0410-b5e6-96231b3b80d8
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.
Current code however always emits mtcrf. This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible. It does create
inefficient code with the integrated assembler, however.
To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everything. Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.
As a side effect, this allows to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185561 91177308-0d34-0410-b5e6-96231b3b80d8
When accessing just a single CR register, it is always preferable to
use mfocrf instead of mfcr, if the former is available on the CPU.
Current code makes that distinction in many, but not all places
where a single CR register value is retrieved. One missing
location is PPCRegisterInfo::lowerCRSpilling.
To fix this and make this simpler in the future, this patch changes
the bulk of the back-end to always assume mfocrf is available and
simply generate it when needed.
On machines that actually do not support mfocrf, the instruction
is replaced by mfcr at the very end, in EmitInstruction.
This has the additional benefit that we no longer need the
MFCRpseud hack, since before EmitInstruction we always have
a MFOCRF instruction pattern, which already models data flow
as required.
The patch also adds the MFOCRF8 version of the instruction,
which was missing so far.
Except for the PPCRegisterInfo::lowerCRSpilling case, no change
in generated code intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185556 91177308-0d34-0410-b5e6-96231b3b80d8
The subroutine getCRIdxForSetCC has a parameter "Other" and comment:
If this returns with Other != -1, then the returned comparison
is an or of two simpler comparisons.
However for at least the last five years this routine has never
returned a value of Other != -1; these cases are now handled
differently to begin with.
This patch removes the parameter and the code in SelectSETCC that
attempted to handle the Other != -1 case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185541 91177308-0d34-0410-b5e6-96231b3b80d8
A couple of AltiVec patterns are just specialized forms of the
generic instruction pattern, and should therefore be marked
isCodeGenOnly to avoid confusing the asm parser:
VCFSX_0, VCTUXS_0, VCFUX_0, VCTSXS_0, and V_SETALLONES.
Noticed by inspection of the generated PPCGenAsmMatcher.inc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185533 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the generic forms of mtspr/mfspr
for the asm parser. The compiler will continue to use
the specialized patters for mtlr etc. since those are
needed to correctly describe data flow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185532 91177308-0d34-0410-b5e6-96231b3b80d8
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>. Use it to cut down on the number of spill loads.
Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.
This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch. Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185529 91177308-0d34-0410-b5e6-96231b3b80d8
1. it should accept only 4-byte aligned addresses
2. the maximum offset should be 1020
3. it should be encoded with the offset scaled by two bits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185528 91177308-0d34-0410-b5e6-96231b3b80d8
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.
This patch makes use of that quirk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185527 91177308-0d34-0410-b5e6-96231b3b80d8
Rename Function->DispKey and PairType->DispSize. I'd originally used
"Function" because I thought it might be useful for other InstMappings.
However, it turns out that having two very similar instructions with the
same Function makes it pretty useless for anything other than the displacement
size key. Other InstMappings will want the key to be defined for only one
instruction in the pair.
No behavioural change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185526 91177308-0d34-0410-b5e6-96231b3b80d8
Get rid of some old code (and associated FIXME) for handling the
caller-allocated register save area. No behavioural change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185525 91177308-0d34-0410-b5e6-96231b3b80d8
*NOTE* In a recent version of posix, they added the restrict keyword to the
arguments for this function. From some spelunking it seems that on some
platforms, the call has restrict on its arguments and others it does not. Thus I
left off the restrict keyword from the function prototype in the comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185501 91177308-0d34-0410-b5e6-96231b3b80d8
This patch now adds support for recognizing TLS call sequences in
the asm parser. This needs a new pattern BL8_TLS, which is like
BL8_NOP_TLS except without nop. That pattern is used for the
asm parser only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185478 91177308-0d34-0410-b5e6-96231b3b80d8
As part of the global-dynamic and local-dynamic TLS sequences, we need
to use a special form of the call instruction:
bl __tls_get_addr(sym@tlsld)
bl __tls_get_addr(sym@tlsgd)
which generates two fixups. The current implementation of this causes
problems with recognizing this form in the asm parser. To fix this,
this patch reworks operand processing for this special form by using
a single operand to hold both __tls_get_addr and sym@tlsld and defining
a print method to output the above form, and an encoding method to
generate the two fixups.
As a side simplification, the patch replaces the two instruction
patterns BL8_NOP_TLSGD and BL8_NOP_TLSLD by a single BL8_NOP_TLS,
since the patterns already operate in an identical fashion (whether
we have a local-dynamic or global-dynamic symbol is already encoded
in the symbol modifier).
No change in code generation intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185477 91177308-0d34-0410-b5e6-96231b3b80d8
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.
To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout. (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185476 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an implementation of getDebugThreadLocalSymbol for
(64-bit) PowerPC. This needs to return a generic MCExpr
since on ppc64, we need to add a bias of 0x8000 to the
value returned by the R_PPC64_DTPREL64 relocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185461 91177308-0d34-0410-b5e6-96231b3b80d8