We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208508 91177308-0d34-0410-b5e6-96231b3b80d8
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208413 91177308-0d34-0410-b5e6-96231b3b80d8
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208289 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
%shr = lshr i64 %x, 4
%and = and i64 %shr, 15
%arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
%0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
lsr x8, x0, #1
and x8, x8, #0x78
add x8, x9, x8
If using ubfx, it only needs 2 instrs:
ubfx x8, x0, #4, #4
add x8, x9, x8, lsl #3
This fixes bug 19589
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207702 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207315 91177308-0d34-0410-b5e6-96231b3b80d8
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206971 91177308-0d34-0410-b5e6-96231b3b80d8
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206684 91177308-0d34-0410-b5e6-96231b3b80d8
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.
At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206485 91177308-0d34-0410-b5e6-96231b3b80d8
This code has been moved to a new function in the TargetLowering
class called expandMUL(). The purpose of this is to be able
to share lowering code between the SelectionDAGLegalize and
DAGTypeLegalizer classes.
No functionality changed intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206036 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205649 91177308-0d34-0410-b5e6-96231b3b80d8
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205453 91177308-0d34-0410-b5e6-96231b3b80d8
There are two general methods for expanding a BUILD_VECTOR node:
1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
them together.
2. Build the vector on the stack and then load it.
Currently, we use a fixed heuristic: If there are only one or two unique
defined values, then we attempt an expansion in terms of SCALAR_TO_VECTOR and
vector shuffles (provided that the required shuffle mask is legal). Otherwise,
always expand via the stack. Even when SCALAR_TO_VECTOR is not legal, this
can still be a good idea depending on what tricks the target can play when
lowering the resulting shuffle. If the target can't do anything special,
however, and if SCALAR_TO_VECTOR is expanded via the stack, this heuristic
leads to sub-optimal code (two stack loads instead of one).
Because only the target knows whether the SCALAR_TO_VECTORs and shuffles for a
build vector of a particular type are likely to be optimial, this adds a new
TLI function: shouldExpandBuildVectorWithShuffles which takes the vector type
and the count of unique defined values. If this function returns true, then
method (1) will be used, subject to the constraint that all of the necessary
shuffles are legal (as determined by isShuffleMaskLegal). If this function
returns false, then method (2) is always used.
This commit does not enhance the current code to support expanding a
build_vector with more than two unique values using shuffles, but I'll commit
an implementation of the more-general case shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205230 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
Given IR like:
%bit = and %val, #imm-with-1-bit-set
%tst = icmp %bit, 0
br i1 %tst, label %true, label %false
some targets can emit just a single instruction (tbz/tbnz in the
AArch64 case). However, with ISel acting at the basic-block level, all
three instructions need to be together for this to be possible.
This adds another transformation to CodeGenPrep to expose these
opportunities, if targets opt in via the hook.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205086 91177308-0d34-0410-b5e6-96231b3b80d8
After some discussion on IRC, emitting a call to the library function seems
like a better default, since it will move from a compiler internal error to
a linker error, that the user can work around until LLVM is fixed.
I'm also adding a note on the responsibility of the user to confirm that
the cache was cleared on platforms where nothing is done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204806 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204802 91177308-0d34-0410-b5e6-96231b3b80d8
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.
In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.
Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203767 91177308-0d34-0410-b5e6-96231b3b80d8
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203204 91177308-0d34-0410-b5e6-96231b3b80d8
Unfortunately, it is currently impossible to use a PatFrag as part of an output
pattern (the part of the pattern that has instructions in it) in TableGen.
Looking at the current implementation, this was clearly intended to work (there
is already code in place to expand patterns in the output DAG), but is
currently broken by the baked-in type-checking assumption and the order in which
the pattern fragments are processed (output pattern fragments need to be
processed after the instruction definitions are processed).
Fixing this is fairly simple, but requires some way of differentiating output
patterns from the existing input patterns. The simplest way to handle this
seems to be to create a subclass of PatFrag, and so that's what I've done here.
As a simple example, this allows us to write:
def crnot : OutPatFrag<(ops node:$in),
(CRNOR $in, $in)>;
def : Pat<(not i1:$in),
(crnot $in)>;
which captures the core use case: handling of repeated subexpressions inside
of complicated output patterns.
This will be used by an upcoming commit to the PowerPC backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202450 91177308-0d34-0410-b5e6-96231b3b80d8
This is a temporary workaround for native arm linux builds:
PR18996: Changing regalloc order breaks "lencod" on native arm linux builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202433 91177308-0d34-0410-b5e6-96231b3b80d8
This replaces the old NoIntegratedAssembler with at TargetOption. This is
more flexible and will be used to forward clang's -no-integrated-as option.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201836 91177308-0d34-0410-b5e6-96231b3b80d8
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.
This fixes pr18900.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201711 91177308-0d34-0410-b5e6-96231b3b80d8
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201700 91177308-0d34-0410-b5e6-96231b3b80d8