vector types to be legal and a ZERO_EXTEND node is encountered.
When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.
This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.
It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.
There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.
However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.
Differential Revision: http://reviews.llvm.org/D4405
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch re-uses the implementation of 'llvm-mc -show-inst' and makes it
available to llc as 'llc -asm-show-inst'.
This is necessary to test parts of MIPS32r6/MIPS64r6 without resorting to
'llc -filetype=obj' tests. For example, on MIPS32r2 and earlier we use the
'jr $rs' instruction for indirect branches and returns. On MIPS32r6, we no
longer have 'jr $rs' and use 'jalr $zero, $rs' instead. The catch is that,
on MIPS32r6, 'jr $rs' is an alias for 'jalr $zero, $rs' and is the preferred
way of writing this instruction. As a result, all MIPS ISA's emit 'jr $rs' in
their assembly output and the assembler encodes this to different opcodes
according to the ISA.
Using this option, we can check that the MCInst really is a JR or a JALR by
matching the emitted comment. This removes the need for a 'llc -filetype=obj'
test.
Reviewers: rafael, dsanders
Reviewed By: dsanders
Subscribers: zoran.jovanovic, llvm-commits
Differential Revision: http://reviews.llvm.org/D4267
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212603 91177308-0d34-0410-b5e6-96231b3b80d8
tracks which elements of the build vector are in fact undef.
This should make actually inpsecting them (likely in my next patch)
reasonably pretty. Also makes the output parameter optional as it is
clear now that *most* users are happy with undefs in their splats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212581 91177308-0d34-0410-b5e6-96231b3b80d8
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.
Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.
This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.
Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212572 91177308-0d34-0410-b5e6-96231b3b80d8
This is and always was strong community consensus. Make this clear in the header
in case newcomers may not be aware.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212570 91177308-0d34-0410-b5e6-96231b3b80d8
nodes about whether they are splats. This is factored out and improved
from r212324 which got reverted as it was far too aggressive. The new
API should help more conservatively handle buildvectors that are
a mixture of splatted and undef values.
No functionality change at this point. The hope is to slowly
re-introduce the undef-tolerant optimization of splats, but each time
being forced to make a concious decision about how to handle the undefs
in a way that doesn't lead to contradicting assumptions about the
collapsed value.
Hal has pointed out in discussions that this may not end up being the
desired API and instead it may be more convenient to get a mask of the
undef elements or something similar. I'm starting simple and will expand
the API as I adapt actual callers and see exactly what they need.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212514 91177308-0d34-0410-b5e6-96231b3b80d8
All blacklisting logic is now moved to the frontend (Clang).
If a function (or source file it is in) is blacklisted, it doesn't
get sanitize_address attribute and is therefore not instrumented.
If a global variable (or source file it is in) is blacklisted, it is
reported to be blacklisted by the entry in llvm.asan.globals metadata,
and is not modified by the instrumentation.
The latter may lead to certain false positives - not all the globals
created by Clang are described in llvm.asan.globals metadata (e.g,
RTTI descriptors are not), so we may start reporting errors on them
even if "module" they appear in is blacklisted. We assume it's fine
to take such risk:
1) errors on these globals are rare and usually indicate wild memory access
2) we can lazily add descriptors for these globals into llvm.asan.globals
lazily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212505 91177308-0d34-0410-b5e6-96231b3b80d8
According to a FIXME in ARMMCTargetDesc.cpp the ARM version parsing should be
in the Triple helper class.
Patch by: Gabor Ballabas
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212479 91177308-0d34-0410-b5e6-96231b3b80d8
lanes in vector splats.
The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.
Original commit message for r212324:
[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can
easily tweak the lowering if they want.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212475 91177308-0d34-0410-b5e6-96231b3b80d8
Use 0 for the invalid buffer instead of -1/~0 and switch to unsigned
representation to enable more idiomatic usage.
Also introduce a trivial SourceMgr::getMainFileID() instead of hard-coding 0/1
to identify the main file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212398 91177308-0d34-0410-b5e6-96231b3b80d8
A number of the ARM intrinsics are aliased with alternative names in MSVC
compatibility mode. This change indicates those intrinsics to permit tablegen
to construct an appropriate list of MSBuiltins. With the corresponding change
in clang, these intrinsics can then be mapped from the frontend.
The tests to validate the intrinsics are aliased correctly will be added with
the corresponding clang change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212377 91177308-0d34-0410-b5e6-96231b3b80d8
The slice(N, M) interface is powerful but not concise when wanting to
drop a few elements off of an ArrayRef, fix this by adding a drop_back
method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212370 91177308-0d34-0410-b5e6-96231b3b80d8
This better aligns with other LLVM-specific and C++ standard library smart
pointer types.
In particular there are at least a few uses of intrusive refcounting in the
frontend where it's worth investigating std::shared_ptr as a more appropriate
alternative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212366 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r212342.
We can get a StringRef into the current Record, but not one in the bitcode
itself since the string is compressed in it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212356 91177308-0d34-0410-b5e6-96231b3b80d8
Add MSBuiltin which is similar in vein to GCCBuiltin. This allows for adding
intrinsics for Microsoft compatibility to individual instructions. This is
needed to permit the creation of ARM specific MSVC extensions.
This is not currently in use, and requires an associated change in clang to
enable use of the intrinsics defined by this new class. This merely sets the
LLVM portion of the infrastructure in place to permit the use of this
functionality. A separate set of changes will enable the new intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212350 91177308-0d34-0410-b5e6-96231b3b80d8
IRObjectFile provides all the logic for producing mangled names and getting
symbols from inline assembly.
LTOModule then adds logic for linking specific tasks, like constructing
llvm.compiler_user or extracting linker options from the bitcode.
The rule of the thumb is that IRObjectFile has the functionality that is
needed by both LTO and llvm-ar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212349 91177308-0d34-0410-b5e6-96231b3b80d8
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212324 91177308-0d34-0410-b5e6-96231b3b80d8
subtarget. This involved having the movt predicate take the current
function - since we care about size in instruction selection for
whether or not to use movw/movt take the function so we can check
the attributes. This required adding the current MachineFunction to
FastISel and propagating through.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212309 91177308-0d34-0410-b5e6-96231b3b80d8
We want to encourage users of the C++ LTO API to reuse memory buffers instead
of repeatedly opening and reading the same file contents.
This reverts commit r212305 and implements a tidier scheme.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212308 91177308-0d34-0410-b5e6-96231b3b80d8
Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions
modelling by adding has-side-effects property.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212276 91177308-0d34-0410-b5e6-96231b3b80d8
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part. When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second. This is true both on big-endian and little-endian
PowerPC systems. (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)
This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:
- When printing an immediate ppcf128 constant to assembler output
in emitGlobalConstantFP, emit the high part first on both big-
and little-endian systems.
- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
(which is used e.g. when generating code to load an argument into a
register pair), use correct low/high part ordering on little-endian
systems.
- In a related issue, because lowering ppcf128 into a pair of f64 must
operate differently from lowering an int128 into a pair of i64,
bitcasts between ppcf128 and int128 must not be optimized away by the
DAG combiner on little-endian systems, but must effect a word-swap.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212274 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a lib/MC/MCAnalysis, the dependency was there just because
of two helper classes. Move the two over to MC.
This will allow IRObjectFile to parse inline assembly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212248 91177308-0d34-0410-b5e6-96231b3b80d8
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.
This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.
This also provides a foundation for other targets to have more granular
control over how vector types are legalized.
Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]
http://reviews.llvm.org/D4322
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212242 91177308-0d34-0410-b5e6-96231b3b80d8
The new library is 150KB on a Release+Asserts build, so it is quiet a bit of
code that regular users of MC don't need to link with now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212209 91177308-0d34-0410-b5e6-96231b3b80d8