There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.
There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The N32/N64 ABI's return f128 values in $f0 and $f2 for hard-float and $v0 and
$a0 for soft-float. The registers used in the soft-float case differ from the
usual $v0, and $v1 specified for return values.
Both cases were previously handled by duplicating the CCState::AnalyzeReturn()
and CCState::AnalyzeCallReturn() functions and modifying them to delegate to
a different assignment function for f128 and further replace the register type
for the hard-float case. There is a simpler way to do both of these.
We now use the common functions and select an initial assignment function based
on whether the original type is f128 or not. We then handle the hard-float case
using CCBitConvertToType<>.
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5269
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218036 91177308-0d34-0410-b5e6-96231b3b80d8
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.
and x1, x1, #0xffffffff ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8
Certain directives are unsupported on Windows (some of which could/should be
supported). We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer. Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218014 91177308-0d34-0410-b5e6-96231b3b80d8
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.
I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.
I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.
As an additional bonus, cleanup the caret diagnostic for the non-MachO case and
avoid the spurious error caused by not discarding the statement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218012 91177308-0d34-0410-b5e6-96231b3b80d8
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218004 91177308-0d34-0410-b5e6-96231b3b80d8
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8
This type isn't owned polymorphically (as demonstrated by making the
dtor protected and everything still compiling) so just address the
warning by protecting the base dtor and making the derived class final.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217990 91177308-0d34-0410-b5e6-96231b3b80d8
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.
This fixes rdar://problem/18224511.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217986 91177308-0d34-0410-b5e6-96231b3b80d8
Try to fold the multiply into the add/sub or logical operations (when
possible).
This is related to rdar://problem/18369687.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217978 91177308-0d34-0410-b5e6-96231b3b80d8
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).
This fixes rdar://problem/18369443.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217977 91177308-0d34-0410-b5e6-96231b3b80d8
This encapsulates how we handle the coverage regions of a file or
function. In the old model, the user had to deal with nested regions,
so they needed to maintain their own auxiliary data structures to get
any useful information out of this. The new API provides a sequence of
non-overlapping coverage segments, which makes it possible to render
coverage information in a single pass and avoids a fair amount of
extra work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217975 91177308-0d34-0410-b5e6-96231b3b80d8
It is breaking the build on the buildbots but works fine on my machine, I revert
while trying to understand what happens (it appears to depend on the compiler used
to build, I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).
This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217973 91177308-0d34-0410-b5e6-96231b3b80d8
This takes advanatage of the CBZ and CBNZ instruction to further optimize the
common null check pattern into a single instruction.
This is related to rdar://problem/18358882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217972 91177308-0d34-0410-b5e6-96231b3b80d8
Since read2 / write2 are emitted for 4-byte aligned 8-byte
accesses, these are seen by the scheduler.
The DAG scheduler is semi-deprecated, so just
ignore these for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217969 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additonal branch
instruction is required.
This also adds unit tests to checks all the different condition codes.
This is related o rdar://problem/18358882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217966 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
Test Plan: Added more cases to atomic-load-store.ll + make check-all
Reviewers: jfb, t.p.northover, luqmana
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5304
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217965 91177308-0d34-0410-b5e6-96231b3b80d8
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.
Does not include f64 tests because folding those inline
immediates currently does not work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217964 91177308-0d34-0410-b5e6-96231b3b80d8
It isn't always useful to skip blank lines, as evidenced by the
somewhat awkward use of line_iterator in llvm-cov. This adds a knob to
control whether or not to skip blanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217960 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions are now generally selected to the e64 forms originally,
and shrunk down later. Rename foldOperands to legalizeOperands,
since that's really most of what it tries to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217959 91177308-0d34-0410-b5e6-96231b3b80d8
This improves other optimizations such as LSR. A sext may be added to the
compare's other operand, but this can often be hoisted outside of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217953 91177308-0d34-0410-b5e6-96231b3b80d8
Example:
define i1 @foo(i32 %a) {
%shr = ashr i32 -9, %a
%cmp = icmp ne i32 %shr, -5
ret i1 %cmp
}
Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.
This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217950 91177308-0d34-0410-b5e6-96231b3b80d8