There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241411 91177308-0d34-0410-b5e6-96231b3b80d8
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241394 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.
Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241323 91177308-0d34-0410-b5e6-96231b3b80d8
This function can really fail since the string table offset can be out of
bounds.
Using ErrorOr makes sure the error is checked.
Hopefully a lot of the boilerplate code in tools/* can go away once we have
a diagnostic manager in Object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241297 91177308-0d34-0410-b5e6-96231b3b80d8
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative. We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations. As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result. This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.
I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch. Test based on a
report from Anton Blanchard.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241290 91177308-0d34-0410-b5e6-96231b3b80d8
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register. We can implement this using xxspltd (a special form of
xxpermdi). This patch teaches the swap optimization pass about this
idiom.
As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG. Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register. However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result. In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.
The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element). To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations. That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region. This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI. Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.
A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241285 91177308-0d34-0410-b5e6-96231b3b80d8
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.
This allows us to inline things like:
int foo() { return baz(); }
int __attribute__((target("sse4.2"))) bar() {
return foo();
}
so that generic code can be inlined into specialized functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241221 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
* Add 64-bit address space feature.
* Rename SIMD feature to SIMD128.
* Handle single-thread model with an IR pass (same way ARM does).
* Rename generic processor to MVP, to follow design's lead.
* Add bleeding-edge processors, with all features included.
* Fix a few DEBUG_TYPE to match other backends.
Test Plan: ninja check
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10880
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241211 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
According to PTX ISA:
For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.
So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.
As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]
This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.
Patched by Gang Hu.
Test Plan: Test case attached.
Reviewers: jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D10876
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241191 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241185 91177308-0d34-0410-b5e6-96231b3b80d8
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction. This patch moves the
offending logic so lxvdsx is once again generated.
This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant. Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe). This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.
There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization. However, vsx.ll was only
being tested for POWER7 with big-endian code generation. I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241183 91177308-0d34-0410-b5e6-96231b3b80d8
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present. Currently I'm hitting this at
-O0 due to the clang bug PR24009.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241170 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:
add Rd, Rn, -imm -> sub Rd, Rn, imm
sub Rd, Rn, -imm -> add Rd, Rn, imm
adds Rd, Rn, -imm -> subs Rd, Rn, imm
subs Rd, Rn, -imm -> adds Rd, Rn, imm
cmp Rn, -imm -> cmn Rn, imm
cmn Rn, -imm -> cmp Rn, imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !
This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241166 91177308-0d34-0410-b5e6-96231b3b80d8
Only consider an instruction a candidate for relaxation if the last operand of the
instruction is an expression. We previously checked whether any operand is an expression,
which is useless, since for all instructions concerned, the only operand that may be
affected by relaxation is the last one.
In addition, this removes the check for having RIP as an argument, since it was
plain wrong - even when one of the arguments is RIP, relaxation may still be needed.
This fixes PR9807.
Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10766
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241152 91177308-0d34-0410-b5e6-96231b3b80d8
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.
The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.
The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.
Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D10848
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241125 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.
Patched by Xuetian Weng.
Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, sfantao, jholewinski
Differential Revision: http://reviews.llvm.org/D10844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241099 91177308-0d34-0410-b5e6-96231b3b80d8
Duplicating an FP register "as itself" is a bad idea, since it violates the
invariant that every FP register is mapped to at most one FPU stack slot.
Use the scratch FP register instead.
This fixes PR23957.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241069 91177308-0d34-0410-b5e6-96231b3b80d8
Realistically, this will be returning ErrorOr for some time as refactoring the
user code to check once per section will take some time.
Given that, use it for checking if a relocation has addend or not.
While at it, add ELFRelocationRef to simplify the users.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241028 91177308-0d34-0410-b5e6-96231b3b80d8
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.
This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.
The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:
#pragma comment(linker, "/defaultlib:foo /defaultlib:bar")
The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.
Differential Revision: http://reviews.llvm.org/D10548
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241010 91177308-0d34-0410-b5e6-96231b3b80d8
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.
rdar://21504262
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241003 91177308-0d34-0410-b5e6-96231b3b80d8
This is a new version of http://reviews.llvm.org/D10260.
It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.
This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10813
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241002 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.
Reviewers: rengolin, ranjeet.singh
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10645
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240930 91177308-0d34-0410-b5e6-96231b3b80d8