Although I had added some support for the BDZ/BDNZ branches into the selector
(in r158204), I had not correctly adjusted the condition at the top of the
loop. As a result, these branches were still essentially unsupported.
This fixes PR16086. Unfortunately, any test case would be very large (because
it would need to force the loop backedge to exceed the range of the 16-bit
immediate).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182385 91177308-0d34-0410-b5e6-96231b3b80d8
pic calls. These need to be there so we don't try and use helper
functions when we call those.
As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182343 91177308-0d34-0410-b5e6-96231b3b80d8
By default, a teq instruction is inserted after integer divide. No divide-by-zero
checks are performed if option "-mnocheck-zero-division" is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182306 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the preheader insertion logic in LoopSimplify is externally exposed,
use it, and remove the copy-and-pasted version.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182300 91177308-0d34-0410-b5e6-96231b3b80d8
As the pairing of this instruction form with the bdnz/bdz branches is now
enforced by the verification pass, make it clear from the name that these
are used only for counter-based loops.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182296 91177308-0d34-0410-b5e6-96231b3b80d8
When asserts are enabled, this adds a verification pass for PPC counter-loop
formation. Unfortunately, without sacrificing code quality, there is no better
way of forming counter-based loops except at the (late) IR level. This means
that we need to recognize, at the IR level, anything which might turn into a
function call (or indirect branch). Because this is currently a finite set of
things, and because SelectionDAG lowering is basic-block local, this can be
done. Nevertheless, it is fragile, and failure results in a miscompile. This
verification pass checks that all (reachable) counter-based branches are
dominated by a loop mtctr instruction, and that no instructions in between
clobber the counter register. If these conditions are not satisfied, then an
ICE will be triggered.
In short, this is to help us sleep better at night.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182295 91177308-0d34-0410-b5e6-96231b3b80d8
R600TextureIntrinsicsReplacer.cpp:232: warning: the address of ‘ArgsType’ will always evaluate as ‘true’
This doesn't have any effect on the output as a vararg intrinsic behaves the
same way as a non-vararg one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182293 91177308-0d34-0410-b5e6-96231b3b80d8
This will simplify the instructions and also the pattern definitions.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182288 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to reorder the operands without breaking the
encoding.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182283 91177308-0d34-0410-b5e6-96231b3b80d8
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182274 91177308-0d34-0410-b5e6-96231b3b80d8
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182254 91177308-0d34-0410-b5e6-96231b3b80d8
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:
Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...
FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.
We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.
This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182237 91177308-0d34-0410-b5e6-96231b3b80d8
Also clean up the arguments to all the MOVCC instructions so the
operands always are (true-val, false-val, cond-code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182221 91177308-0d34-0410-b5e6-96231b3b80d8
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182191 91177308-0d34-0410-b5e6-96231b3b80d8
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182184 91177308-0d34-0410-b5e6-96231b3b80d8
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182175 91177308-0d34-0410-b5e6-96231b3b80d8
The errors were:
non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list
and
non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182168 91177308-0d34-0410-b5e6-96231b3b80d8
It should increase PV substitution opportunities and lower gpr
usage (pending computations path are "flushed" sooner)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182128 91177308-0d34-0410-b5e6-96231b3b80d8
Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register
coalescer to remove some unneeded COPY.
This patch also defines some structures/functions that can be used to handle
every vector instructions (CUBE, Cayman special instructions...) in a similar
fashion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182126 91177308-0d34-0410-b5e6-96231b3b80d8
Almost all instructions that takes a 128 bits reg as input (fetch, export...)
have the abilities to swizzle their argument and output. Instead of printing
default swizzle for each 128 bits reg, rename T*.XYZW to T* and let instructions
print potentially optimized swizzles themselves.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182124 91177308-0d34-0410-b5e6-96231b3b80d8
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.
We still prefer palignr over psrldq because it has higher throughput on
sandybridge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182102 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the equivalent change to r182091/r182092
in the old-style code emitter. Instead of having two separate
16-bit immediate encoding routines depending on the instruction,
this patch introduces a single encoder that checks the machine
operand flags to decide whether the low or high half of a
symbol address is required.
Since now both encoders make no further distinction between
"symbolLo" and "symbolHi", the .td operand can now use a
single getS16ImmEncoding method.
Tested by running the old-style JIT tests on 32-bit Linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182097 91177308-0d34-0410-b5e6-96231b3b80d8
Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly
the same everywhere, it no longer makes sense to have two fixup types.
This patch merges them both into a single type fixup_ppc_half16,
and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency.
(The half16 and half16ds names are taken from the description of
relocation types in the PowerPC ABI.)
No change in code generation expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182092 91177308-0d34-0410-b5e6-96231b3b80d8
The current PowerPC MC back end distinguishes between fixup_ppc_ha16
and fixup_ppc_lo16, which are determined by the instruction the fixup
applies to, and uses this distinction to decide whether a fixup ought
to resolve to the high or the low part of a symbol address.
This isn't quite correct, however. It is valid -if unusual- assembler
to use, e.g.
li 1, symbol@ha
or
lis 1, symbol@l
Whether the high or the low part of the address is used depends solely
on the @ suffix, not on the instruction.
In addition, both
li 1, symbol
and
lis 1, symbol
are valid, assuming the symbol address fits into 16 bits; again, both
will then refer to the actual symbol value (so li will load the value
itself, while lis will load the value shifted by 16).
To fix this, two places need to be adapted. If the fixup cannot be
resolved at assembler time, a relocation needs to be emitted via
PPCELFObjectWriter::getRelocType. This routine already looks at
the VK_ type to determine the relocation. The only problem is that
will reject any _LO modifier in a ha16 fixup and vice versa. This
is simply incorrect; any of those modifiers ought to be accepted
for either fixup type.
If the fixup *can* be resolved at assembler time, adjustFixupValue
currently selects the high bits of the symbol value if the fixup
type is ha16. Again, this is incorrect; see the above example
lis 1, symbol
Now, in theory we'd have to respect a VK_ modifier here. However,
in fact common code never even attempts to resolve symbol references
using any nontrivial VK_ modifier at assembler time; it will always
fall back to emitting a reloc and letting the linker handle it.
If this ever changes, presumably there'd have to be a target callback
to resolve VK_ modifiers. We'd then have to handle @ha etc. there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182091 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, three instructions were needed:
trunc.w.s $f0, $f2
mfc1 $4, $f0
sw $4, 0($2)
Now we need only two:
trunc.w.s $f0, $f2
swc1 $f0, 0($2)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182053 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes alias definition for addiu $rs,$imm
and instead uses the TwoOperandAliasConstraint field in
the ArithLogicI instruction class.
This way all instructions that inherit ArithLogicI class
have the same macro defined.
The usage examples are added to test files.
Patch by Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182048 91177308-0d34-0410-b5e6-96231b3b80d8
Some IR-level instructions (such as FP <-> i64 conversions) are not chained
w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
counter register. At the selection-DAG level, these might be reordered with the
mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
preheader has instructions that might use the counter register, create a new
preheader for the mtctr intrinsic. This extra block will be remerged with the
old preheader at the MI level, but will prevent unwanted reordering at the
selection-DAG level.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182045 91177308-0d34-0410-b5e6-96231b3b80d8
invalid instruction sequence.
Rather than emitting an int-to-FP move instruction and an int-to-FP conversion
instruction during instruction selection, we emit a pseudo instruction which gets
expanded post-RA. Without this change, register allocation can possibly insert a
floating point register move instruction between the two instructions, which is not
valid according to the ISA manual.
mtc1 $f4, $4 # int-to-fp move instruction.
mov.s $f2, $f4 # move contents of $f4 to $f2.
cvt.s.w $f0, $f2 # int-to-fp conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182042 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds bnez and beqz instructions which represent alias definitions for bne and beq instructions as follows:
bnez $rs,$imm => bne $rs,$zero,$imm
beqz $rs,$imm => beq $rs,$zero,$imm
The corresponding test cases are added.
Patch by Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182040 91177308-0d34-0410-b5e6-96231b3b80d8
This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.
This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.
The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions. This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).
Like in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.
This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.
This change must be made simultaneously in all places that
access machine operands of this type. However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182032 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC32, i64 FP conversions are implemented using runtime calls (which clobber
the counter register). These must be excluded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182023 91177308-0d34-0410-b5e6-96231b3b80d8
DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to
decompose a memory address into a base/offset pair. It expects the
offset (if constant) to be the true displacement value in order to
perform optional additional optimizations; in particular, to convert
other uses of the original pointer into uses of the new base pointer
after pre-increment.
The PowerPC implementation of getPreIndexedAddressParts, however,
simply calls SelectAddressRegImm, which returns a TargetConstant.
This value is appropriate for encoding into the instruction, but
it is not always usable as true displacement value:
- Its type is always MVT::i32, even on 64-bit, where addresses
ought to be i64 ... this causes the optimization to simply
always fail on 64-bit due to this line in DAGCombiner:
// FIXME: In some cases, we can be smarter about this.
if (Op1.getValueType() != Offset.getValueType()) {
- Its value is truncated to an unsigned 16-bit value if negative.
This causes the above opimization to generate wrong code.
This patch fixes both problems by simply returning the true
displacement value (in its original type). This doesn't
affect any other user of the displacement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182012 91177308-0d34-0410-b5e6-96231b3b80d8
getExceptionHandlingType is not ExceptionHandling::DwarfCFI on xcore, so
etFrameInstructions is never called. There is no point creating cfi
instructions if they are never used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181979 91177308-0d34-0410-b5e6-96231b3b80d8
This creates stubs that help Mips32 functions call Mips16
functions which have floating point parameters that are normally passed
in floating point registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181972 91177308-0d34-0410-b5e6-96231b3b80d8
Increase the number of instructions LLVM recognizes as setting the ZF
flag. This allows us to remove test instructions that redundantly
recalculate the flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181937 91177308-0d34-0410-b5e6-96231b3b80d8
The old PPCCTRLoops pass, like the Hexagon pass version from which it was
derived, could only handle some simple loops in canonical form. We cannot
directly adapt the new Hexagon hardware loops pass, however, because the
Hexagon pass contains a fundamental assumption that non-constant-trip-count
loops will contain a guard, and this is not always true (the result being that
incorrect negative counts can be generated). With this commit, we replace the
pass with a late IR-level pass which makes use of SE to calculate the
backedge-taken counts and safely generate the loop-count expressions (including
any necessary max() parts). This IR level pass inserts custom intrinsics that
are lowered into the desired decrement-and-branch instructions.
The most fragile part of this new implementation is that interfering uses of
the counter register must be detected on the IR level (and, on PPC, this also
includes any indirect branches in addition to function calls). Also, to make
all of this work, we need a variant of the mtctr instruction that is marked
as having side effects. Without this, machine-code level CSE, DCE, etc.
illegally transform the resulting code. Hopefully, this can be improved
in the future.
This new pass is smaller than the original (and much smaller than the new
Hexagon hardware loops pass), and can handle many additional cases correctly.
In addition, the preheader-creation code has been copied from LoopSimplify, and
after we decide on where it belongs, this code will be refactored so that it
can be explicitly shared (making this implementation even smaller).
The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
the new Hexagon pass. There are a few classes of loops that this pass does not
transform (noted by FIXMEs in the files), but these deficiencies can be
addressed within the SE infrastructure (thus helping many other passes as well).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181927 91177308-0d34-0410-b5e6-96231b3b80d8
We want the order to be deterministic on all platforms. NAKAMURA Takumi
fixed that in r181864. This patch is just two small cleanups:
* Move the function to the cpp file. It is only passed to array_pod_sort.
* Remove the ppc implementation which is now redundant
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181910 91177308-0d34-0410-b5e6-96231b3b80d8
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store for
v6+ Darwin as well as for v7+ on other targets.
The distinction is made because v6 doesn't guarantee support (but LLVM assumes
that Apple controls hardware+kernel and therefore have conformant v6 CPUs),
whereas v7 does provide this guarantee (and Linux behaves sanely).
Overall this should slightly improve performance in most cases because of
reduced I$ pressure.
Patch by JF Bastien
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181897 91177308-0d34-0410-b5e6-96231b3b80d8
Now that applyFixup understands differently-sized fixups, we can define
fixup_ppc_lo16/fixup_ppc_lo16_ds/fixup_ppc_ha16 to properly be 2-byte
fixups, applied at an offset of 2 relative to the start of the
instruction text.
This has the benefit that if we actually need to generate a real
relocation record, its address will come out correctly automatically,
without having to fiddle with the offset in adjustFixupOffset.
Tested on both 64-bit and 32-bit PowerPC, using external and
integrated assembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181894 91177308-0d34-0410-b5e6-96231b3b80d8
The PPCAsmBackend::applyFixup routine handles the case where a
fixup can be resolved within the same object file. However,
this routine is currently hard-coded to assume the size of
any fixup is always exactly 4 bytes.
This is sort-of correct for fixups on instruction text; even
though it only works because several of what really would be
2-byte fixups are presented as 4-byte fixups instead (requiring
another hack in PPCELFObjectWriter::adjustFixupOffset to clean
it up).
However, this assumption breaks down completely for fixups
on data, which legitimately can be of any size (1, 2, 4, or 8).
This patch makes applyFixup aware of fixups of varying sizes,
introducing a new helper routine getFixupKindNumBytes (along
the lines of what the ARM back end does). Note that in order
to handle fixups of size 8, we also need to fix the return type
of adjustFixupValue to uint64_t to avoid truncation.
Tested on both 64-bit and 32-bit PowerPC, using external and
integrated assembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181891 91177308-0d34-0410-b5e6-96231b3b80d8
The transformation happening here is that we want to turn a
"mul(ext(X), ext(X))" into a "vmull(X, X)", stripping off the extension. We have
to make sure that X still has a valid vector type - possibly recreate an
extension to a smaller type. In case of a extload of a memory type smaller than
64 bit we used create a ext(load()). The problem with doing this - instead of
recreating an extload - is that an illegal type is exposed.
This patch fixes this by creating extloads instead of ext(load()) sequences.
Fixes PR15970.
radar://13871383
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181842 91177308-0d34-0410-b5e6-96231b3b80d8
The changes to CR spill handling missed a case for 32-bit PowerPC.
The code in PPCFrameLowering::processFunctionBeforeFrameFinalized()
checks whether CR spill has occurred using a flag in the function
info. This flag is only set by storeRegToStackSlot and
loadRegFromStackSlot. spillCalleeSavedRegisters does not call
storeRegToStackSlot, but instead produces MI directly. Thus we don't
see the CR is spilled when assigning frame offsets, and the CR spill
ends up colliding with some other location (generally the FP slot).
This patch sets the flag in spillCalleeSavedRegisters for PPC32 so
that the CR spill is properly detected and gets its own slot in the
stack frame.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181800 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by: Alex Deucher
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
NOTE: This is a candidate for the 3.3 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181792 91177308-0d34-0410-b5e6-96231b3b80d8
The GNU assembler treats things like:
brasl %r14, 100
in the same way as:
brasl %r14, .+100
rather than as a branch to absolute address 100. We implemented this in
LLVM by creating an immediate operand rather than the usual expr operand,
and by handling immediate operands specially in the code emitter.
This was undesirable for (at least) three reasons:
- the specialness of immediate operands was exposed to the backend MC code,
rather than being limited to the assembler parser.
- in disassembly, an immediate operand really is an absolute address.
(Note that this means reassembling printed disassembly can't recreate
the original code.)
- it would interfere with any assembly manipulation that we might
try in future. E.g. operations like branch shortening can change
the relative position of instructions, but any code that updates
sym+offset addresses wouldn't update an immediate "100" operand
in the same way as an explicit ".+100" operand.
This patch changes the implementation so that the assembler creates
a "." label for immediate PC-relative operands, so that the operand
to the MCInst is always the absolute address. The patch also adds
some error checking of the offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181773 91177308-0d34-0410-b5e6-96231b3b80d8
Marking instructions as isAsmParserOnly stops them from being disassembled.
However, in cases where separate asm and codegen versions exist, we actually
want to disassemble to the asm ones.
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181772 91177308-0d34-0410-b5e6-96231b3b80d8
The SystemZ port currently relies on the order of the instruction operands
matching the order of the instruction field lists. This isn't desirable
for disassembly, where the two are matched only by name. E.g. the R1 and R2
fields of an RR instruction should have corresponding R1 and R2 operands.
The main complication is that addresses are compound operands,
and as far as I know there is no mechanism to allow individual
suboperands to be selected by name in "let Inst{...} = ..." assignments.
Luckily it doesn't really matter though. The SystemZ instruction
encoding groups all address fields together in a predictable order,
so it's just as valid to see the entire compound address operand as
a single field. That's the approach taken in this patch.
Matching by name in turn means that the operands to COPY SIGN and
CONVERT TO FIXED instructions can be given in natural order.
(It was easier to do this at the same time as the rename,
since otherwise the intermediate step was too confusing.)
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181771 91177308-0d34-0410-b5e6-96231b3b80d8
The SystemZ port currently relies on the order of the instruction operands
matching the order of the instruction field lists. This isn't desirable
for disassembly, where the two are matched only by name. E.g. the R1 and R2
fields of an RR instruction should have corresponding R1 and R2 operands.
The main complication is that addresses are compound operands,
and as far as I know there is no mechanism to allow individual
suboperands to be selected by name in "let Inst{...} = ..." assignments.
Luckily it doesn't really matter though. The SystemZ instruction
encoding groups all address fields together in a predictable order,
so it's just as valid to see the entire compound address operand as
a single field. That's the approach taken in this patch.
Matching by name in turn means that the operands to COPY SIGN and
CONVERT TO FIXED instructions can be given in natural order.
(It was easier to do this at the same time as the rename,
since otherwise the intermediate step was too confusing.)
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181769 91177308-0d34-0410-b5e6-96231b3b80d8
Mips16/32 floating point interoperability.
When Mips16 code calls external functions that would normally have some
of its parameters or return values passed in floating point registers,
it needs (Mips32) helper functions to do this because while in Mips16 mode
there is no ability to access the floating point registers.
In Pic mode, this is done with a set of predefined functions in libc.
This case is already handled in llvm for Mips16.
In static relocation mode, for efficiency reasons, the compiler generates
stubs that the linker will use if it turns out that the external function
is a Mips32 function. (If it's Mips16, then it does not need the helper
stubs).
These stubs are identically named and the linker knows about these tricks
and will not create multiple copies and will delete them if they are not
needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181753 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds alias for addiu instruction which enables following syntax:
addiu $rs,imm
The macro is translated as:
addiu $rs,$rs,imm
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181729 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes warning messages observed in the oggenc application test in
projects/test-suite. Special handling is needed for the 64-bit
PowerPC SVR4 ABI when a constant is initialized with a pointer to a
function in a shared library. Because a function address is
implemented as the address of a function descriptor, the use of copy
relocations can lead to problems with initialization. GNU ld
therefore replaces copy relocations with dynamic relocations to be
resolved by the dynamic linker. This means the constant cannot reside
in the read-only data section, but instead belongs in .data.rel.ro,
which is designed for constants containing dynamic relocations.
The implementation creates a class PPC64LinuxTargetObjectFile
inheriting from TargetLoweringObjectFileELF, which behaves like its
parent except to place constants of this sort into .data.rel.ro.
The test case is reduced from the oggenc application.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181723 91177308-0d34-0410-b5e6-96231b3b80d8
This option is used when the user wants to avoid emitting double precision FP
loads and stores. Double precision FP loads and stores are expanded to single
precision instructions after register allocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181718 91177308-0d34-0410-b5e6-96231b3b80d8
return values are bitcasts.
The chain had previously been being clobbered with the entry node to
the dag, which sometimes caused other code in the function to be
erroneously deleted when tailcall optimization kicked in.
<rdar://problem/13827621>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181696 91177308-0d34-0410-b5e6-96231b3b80d8
It was just a less powerful and more confusing version of
MCCFIInstruction. A side effect is that, since MCCFIInstruction uses
dwarf register numbers, calls to getDwarfRegNum are pushed out, which
should allow further simplifications.
I left the MachineModuleInfo::addFrameMove interface unchanged since
this patch was already fairly big.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181680 91177308-0d34-0410-b5e6-96231b3b80d8
To add a frame now there is a dedicated addFrameMove which also takes
care of constructing the move itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181657 91177308-0d34-0410-b5e6-96231b3b80d8
mips16/mips32 floating point interoperability.
This patch fixes returns from mips16 functions so that if the function
was in fact called by a mips32 hard float routine, then values
that would have been returned in floating point registers are so returned.
Mips16 mode has no floating point instructions so there is no way to
load values into floating point registers.
This is needed when returning float, double, single complex, double complex
in the Mips ABI.
Helper functions in libc for mips16 are available to do this.
For efficiency purposes, these helper functions have a different calling
convention from normal Mips calls.
Registers v0,v1,a0,a1 are used to pass parameters instead of
a0,a1,a2,a3.
This is because v0,v1,a0,a1 are the natural registers used to return
floating point values in soft float. These values can then be moved
to the appropriate floating point registers with no extra cost.
The only register that is modified is ra in this call.
The helper functions make sure that the return values are in the floating
point registers that they would be in if soft float was not in effect
(which it is for mips16, though the soft float is implemented using a mips32
library that uses hard float).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181641 91177308-0d34-0410-b5e6-96231b3b80d8
The issue was that the MatchingInlineAsm and VariantID args to the
MatchInstructionImpl function weren't being set properly. Specifically, when
parsing intel syntax, the parser thought it was parsing inline assembly in the
at&t dialect; that will never be the case.
The crash was caused when the emitter tried to emit the instruction, but the
operands weren't set. When parsing inline assembly we only set the opcode, not
the operands, which is used to lookup the instruction descriptor.
rdar://13854391 and PR15945
Also, this commit reverts r176036. Now that we're correctly parsing the intel
syntax the pushad/popad don't match properly. I've reimplemented that fix using
a MnemonicAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181620 91177308-0d34-0410-b5e6-96231b3b80d8
This commit implements the AsmParser for fnstart, fnend,
cantunwind, personality, handlerdata, pad, setfp, save, and
vsave directives.
This commit fixes some minor issue in the ARMELFStreamer:
* The switch back to corresponding section after the .fnend
directive.
* Emit the unwind opcode while processing .fnend directive
if there is no .handlerdata directive.
* Emit the unwind opcode to .ARM.extab while processing
.handlerdata even if .personality directive does not exist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181603 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
NOTE: This is a candidate for the 3.3 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181579 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes piglit test for OpenCL builtin mul24, and allows mad24 to run.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
NOTE: This is a candidate for the 3.3 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181578 91177308-0d34-0410-b5e6-96231b3b80d8
v2: Add v4i32 test
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
NOTE: This is a candidate for the 3.3 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181577 91177308-0d34-0410-b5e6-96231b3b80d8