"hint" space for Thumb actually overlaps the encoding space of the CPS
instruction. In actuality, hints can be defined as CPS instructions where imod
and M bits are all nil.
Handle decoding of permitted nop-compatible hints (i.e. nop, yield, wfi, wfe,
sev) in DecodeT2CPSInstruction.
This commit adds a proper diagnostic message for Imm0_4 and updates all tests.
Patch by Mihail Popa <Mihail.Popa@arm.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180617 91177308-0d34-0410-b5e6-96231b3b80d8
In the default PowerPC assembler syntax, registers are specified simply
by number, so they cannot be distinguished from immediate values (without
looking at the opcode). This means that the default operand matching logic
for the asm parser does not work, and we need to specify custom matchers.
Since those can only be specified with RegisterOperand classes and not
directly on the RegisterClass, all instructions patterns used by the asm
parser need to use a RegisterOperand (instead of a RegisterClass) for
all their register operands.
This patch adds one RegisterOperand for each RegisterClass, using the
same name as the class, just in lower case, and updates all instruction
patterns to use RegisterOperand instead of RegisterClass operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180611 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180608 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes). Note that apparently
the compiler currently never generates pre-inc instructions
for floating point types for some reason ...
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180607 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180606 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I ran into an error when using a conditional
branch to an external symbol (this doesn't occur in compiler-generated
code) due to missing support in PPCELFObjectWriter::getRelocTypeInner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180605 91177308-0d34-0410-b5e6-96231b3b80d8
Mips have delayslots for certain instructions
like jumps and branches. These are instructions
that follow the branch or jump and are executed
before the jump or branch is completed.
Early Mips compilers could not cope with delayslots
and left them up to the assembler. The assembler would
fill the delayslots with the appropriate instruction,
usually just a nop to allow correct runtime behavior.
The default behavior for this is set with .set reorder.
To tell the assembler that you don't want it to mess with
the delayslot one used .set noreorder.
For backwards compatibility we need to support
.set reorder and have it be the default behavior in the
assembler.
Our support for it is to insert a NOP directly after an
instruction with a delayslot when in .set reorder mode.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180584 91177308-0d34-0410-b5e6-96231b3b80d8
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180573 91177308-0d34-0410-b5e6-96231b3b80d8
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180129 91177308-0d34-0410-b5e6-96231b3b80d8
AArch64 always demands a register-scavenger, so the pointer should never be
NULL. However, in the spirit of paranoia, we'll assert it before use just in
case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180080 91177308-0d34-0410-b5e6-96231b3b80d8
now taken care of by the frontend, which allows us to parse arbitrary C/C++
variables.
Part of rdar://13663589
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180037 91177308-0d34-0410-b5e6-96231b3b80d8
-- C.4 and C.5 statements, when NSAA is not equal to SP.
-- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
variadic procedure.
Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
No exceptions, no CPRCs here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180011 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
With a little help from the frontend, it looks like the standard va_*
intrinsics can do the job.
Also clean up an old bitcast hack in LowerVAARG that dealt with
unaligned double loads. Load SDNodes can specify an alignment now.
Still missing: Calling varargs functions with float arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179961 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.
This removes the invalid 0-offset from the instruction being
produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179956 91177308-0d34-0410-b5e6-96231b3b80d8
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179940 91177308-0d34-0410-b5e6-96231b3b80d8