Commit Graph

5198 Commits

Author SHA1 Message Date
Matt Arsenault
3e8ed89484 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 19:12:07 +00:00
Chandler Carruth
4c27c85cde [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were
not widening the input type to the node sufficiently to let the ext take
place in a register.

This would in turn result in a mysterious bitcast assertion failure
downstream. First change here is to add back the helpful assert I had in
an earlier version of the code to catch this immediately.

Next change is to add support to the type legalization to detect when we
have widened the operand either too little or too much (for whatever
reason) and find a size-matched legal vector type to convert it to
first. This can also fail so we get a new fallback path, but that seems
OK.

With this, we no longer crash on vec_cast2.ll when using widening. I've
also added the CHECK lines for the zero-extend cases here. We still need
to support sign-extend and trunc (or something) to get plausible code
for the other two thirds of this test which is one of the regression
tests that showed the most scalarization when widening was
force-enabled. Slowly closing in on widening being a viable legalization
strategy without it resorting to scalarization at every turn. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212614 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 12:36:54 +00:00
Benjamin Kramer
4afbd3e941 X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.

Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 11:12:39 +00:00
Chandler Carruth
ce184e95f9 [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening
vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Differential Revision: http://reviews.llvm.org/D4405

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:58:18 +00:00
Chandler Carruth
98eac0a244 [x86] Re-apply a variant of the x86 side of r212324 now that the rest
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.

The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.

This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
  (where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
  now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
  a correct check of whether the splat is the only user of a loaded
  value, checking that the splat actually crosses multiple lanes before
  using a broadcast, and handling broadcasts of non-constant splats.

As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:06:58 +00:00
Andrea Di Biagio
b8245a4599 [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches how to fold a shuffle according to rule:
  shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)

We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.

This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.

That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212539 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 15:22:29 +00:00
Chandler Carruth
25b7d54e7f [x86,SDAG] Sink the logic for folding shuffles of splats more
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.

This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.

First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.

Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.

Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.

But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 08:45:38 +00:00
Andrea Di Biagio
cfb83b7bac [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types.
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.

define <4 x float> @test1(<4 x float> %V) {
  %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  ret <4 x float> %2
}



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212498 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 23:25:23 +00:00
Juergen Ributzka
1154be8198 [FastISel][X86] Fix smul.with.overflow.i8 lowering.
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.

This fixes <rdar://problem/17549300>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 21:52:21 +00:00
Chandler Carruth
7fcb422bb2 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212475 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:03:32 +00:00
Chandler Carruth
36ad61f4ea [x86] Teach the new vector shuffle lowering code to handle what is
essentially a DAG combine that never gets a chance to run.

We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.

This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212444 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 09:06:58 +00:00
NAKAMURA Takumi
4a183c6b87 llvm/test/CodeGen/X86/vector-gep.ll: Appease to add -mtriple=i686-linux.
This doesn't pass if stack alignment is not 16, like cygming, *bsd.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212334 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-04 11:55:40 +00:00
Chandler Carruth
f9749edd61 [x86] Relax the line in this check to pacify build bots.
I still don't love testing the comments, but its the only sane way to
check shuffle instructions...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212326 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-04 08:39:30 +00:00
Chandler Carruth
b2a3131c88 [x86] Move some check lines to be slightly easier for me to find.
(meant to put this cleanup in the previous patch, sorry)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212325 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-04 08:19:37 +00:00
Chandler Carruth
a46e60eb2e [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.

This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.

With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.

We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212324 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-04 08:11:49 +00:00
Eric Christopher
3a3941576d Temporarily revert "Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information." as it appears to be breaking some LTO constructs.
This reverts commit r212203.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212298 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 22:24:54 +00:00
Andrea Di Biagio
d4167d0b29 [X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes.
This patch adds tablegen patterns to select F16C float-to-half-float
conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes.

If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32'
are expanded into library calls.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212293 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 21:51:06 +00:00
Sanjay Patel
1a699b6833 bug fix for PR20020: anti-dependency-breaker causes miscompilation
This patch sets the 'KeepReg' bit for any tied and live registers during the PrescanInstruction() phase of the dependency breaking algorithm. It then checks those 'KeepReg' bits during the ScanInstruction() phase to avoid changing any tied registers. For more details, please see comments in:
http://llvm.org/bugs/show_bug.cgi?id=20020

I added two FIXME comments for code that I think can be removed by using register iterators that include self. I don't want to include those code changes with this patch, however, to keep things as small as possible.

The test case is larger than I'd like, but I don't know how to reduce it further and still produce the failing asm.

Differential Revision: http://reviews.llvm.org/D4351



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212275 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 15:19:40 +00:00
Ulrich Weigand
1ef2cec146 Fix ppcf128 component access on little-endian systems
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part.  When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second.  This is true both on big-endian and little-endian
PowerPC systems.  (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)

This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:

- When printing an immediate ppcf128 constant to assembler output
  in emitGlobalConstantFP, emit the high part first on both big-
  and little-endian systems.

- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
  (which is used e.g. when generating code to load an argument into a
  register pair), use correct low/high part ordering on little-endian
  systems.

- In a related issue, because lowering ppcf128 into a pair of f64 must
  operate differently from lowering an int128 into a pair of i64,
  bitcasts between ppcf128 and int128 must not be optimized away by the
  DAG combiner on little-endian systems, but must effect a word-swap.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212274 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 15:06:47 +00:00
NAKAMURA Takumi
693b9ee16f Let llvm/test/CodeGen/X86/lower-bitcast.ll tolerant of win32 calling convention.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212258 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 07:25:00 +00:00
Chandler Carruth
90ac572331 [x86] Fix the completely broken vector widening legalization of bswap.
This operation was classified as a binary operation in the widening
logic for some reason (clearly, untested). It is in fact a unary
operation. Add a RUN line to a test to exercise this for x86.

Note that again the vector widening strategy doesn't regress anything
and in one case removes a totally unecessary instruction that we
couldn't avoid when promoting the element type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 07:04:38 +00:00
Chandler Carruth
7e924a68b5 [x86] Fix crashes in lowering bitcast instructions with the widening
mode.

This also runs the test in that mode which would reproduce the crash.
What I love is that *every single FIXME* in the test is addressed by
switching to widening.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212254 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 03:43:47 +00:00
Adam Nemet
998743e185 [X86] AVX512: Allow writemask argument in vpermt* intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212223 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-02 21:26:01 +00:00
Adam Nemet
df5d431084 [X86] AVX512: Add writemask variants for vperm*2*
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).

<rdar://problem/17492620>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212221 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-02 21:25:54 +00:00
David Blaikie
f9b6d0373a Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information.
If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.

While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.

Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.

Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-02 18:31:35 +00:00
Benjamin Kramer
11be760757 X86: When combining shuffles just remove shuffles that are completely redundant.
CombineTo doesn't allow replacing a node with itself so this would crash if the
combined shuffle is the same as the input shuffle.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212181 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-02 15:09:44 +00:00
Elena Demikhovsky
0780b6db5d AVX-512: dec/inc instructions are slow on KNL
After Alexey Volkov, I'm adding the same property for KNL, that prefers ADD/SUB instead of INC/DEC.
Added a test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212178 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-02 14:11:05 +00:00
Tim Northover
99ec36c684 X86: delegate expanding atomic libcalls to generic code.
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.

X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.

This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212134 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-01 21:44:59 +00:00
Tim Northover
5c8b83eb7a X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions..

rdar://problem/13496295

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212119 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-01 18:53:31 +00:00
Andrea Di Biagio
e6cfdc8471 [X86] Add support for builtin to read performance monitoring counters.
This patch adds support for a new builtin instruction called
__builtin_ia32_rdpmc.

Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can
be used to read performance monitoring counters. It takes as input the index
of the performance counter to read, and returns the value of the specified
performance counter as a 64-bit number.

Calls to this new builtin will map to instruction RDPMC.
The index in input to the builtin call is moved to register %ECX. The result
of the builtin call is the value of the specified performance counter (RDPMC
would return that quantity in registers RDX:RAX).

This patch:
 - Adds builtin int_x86_rdpmc as a GCCBuiltin;
 - Adds a new x86 DAG node called 'RDPMC_DAG';
 - Teaches how to lower this new builtin;
 - Adds an ISel pattern to select instruction RDPMC;
 - Fixes the definition of instruction RDPMC adding %RAX and %RDX as
   implicit definitions, and adding %ECX as implicit use;
 - Adds a LLVM test to verify that the new builtin is correctly selected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212049 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-30 17:14:21 +00:00
Chandler Carruth
3e5215bf73 [x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like
lowering for v16i8.

ASan and some bots caught this bug with existing test cases. Fixing it
even fixed a miscompile with one of the test cases. I'm still a bit
suspicious of this test case as I've not taken a proper amount of time
to think about it, but the fix here is strict goodness.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211976 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-28 05:46:28 +00:00
Chandler Carruth
fe05f61e5d [x86] Add handling for splat-like widenings of v16i8 shuffles.
These show up really frequently, not the least with actual splats. =] We
lowered these quite badly before. The new code path tries to widen i8
shuffles to i16 shuffles in a splat-like way. There are still some
inefficiencies in our i16 splat logic though, so we aren't really done
here.

Also, for certain patterns (bit of a gather-and-splat) we still
generate pretty silly code, and I've left a fixme for addressing it.
However, I'm not actually worried about this code pattern as much. The
old shuffle lowering generates a 29 instruction monstrosity for it that
should execute much more slowly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211974 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-28 05:16:40 +00:00
David Majnemer
be56f16052 This file wasn't supposed to be checked in
This was generated while trying to debug a test, it shouldn't have been
checked in.

Thanks to Alexander Kornienko for spotting this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211973 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-28 01:56:50 +00:00
Chandler Carruth
63195d7e5a [x86] Fix another bug hit when bootstrapping with the new shuffle
lowering.

For maximum irony, I had already discovered this bug, diagnosed it, and
left FIXMEs about it in the test cases. =[ I just failed to go back over
those until after i had reduced a bootstrap miscompile down to a single
TU, stared at the assembly for an hour, and figured out the bug. Again.

Oh well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211955 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 20:07:40 +00:00
Chandler Carruth
75504d45ec [x86] Fix a miscompile in the new shuffle lowering uncovered by
a bootstrap.

I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 18:25:23 +00:00
David Majnemer
c8a1169c93 IR: Add COMDATs to the IR
This new IR facility allows us to represent the object-file semantic of
a COMDAT group.

COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.

This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.

Differential Revision: http://reviews.llvm.org/D4178

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211920 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 18:19:56 +00:00
Chandler Carruth
c5114dbcc3 [x86] Teach the target combine step to aggressively fold pshufd insturcions.
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.

Differential Revision: http://reviews.llvm.org/D4292

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211892 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 11:40:13 +00:00
Chandler Carruth
4363b0729b [x86] Teach the target-specific combining how to aggressively fold
half-shuffles, even looking through intervening instructions in a chain.

Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.

Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.

Differential Revision: http://reviews.llvm.org/D4291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211890 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 11:34:40 +00:00
Chandler Carruth
f91161874e [x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are
trivially redundant.

This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.

I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.

Differential Revision: http://reviews.llvm.org/D4240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211889 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 11:27:52 +00:00
Chandler Carruth
050d187bc8 [x86] Begin a significant overhaul of how vector lowering is done in the
x86 backend.

This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.

Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
  There are cases where a previous shuffle could be modified slightly to
  arrange for elements to be in the correct position and a later shuffle
  eliminated. Doing this eagerly added quite a bit of complexity, and
  so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
  added. This is essential for real world shuffles as it turns out...

Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.

Differential Revision: http://reviews.llvm.org/D4225

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211888 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 11:23:44 +00:00
Juergen Ributzka
b1b6d10d09 [StackMaps] Enable patchpoint liveness analysis per default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211817 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-26 23:39:52 +00:00
Juergen Ributzka
307a6447e5 [Stackmaps] Remove the liveness calculation for stackmap intrinsics.
There is no need to calculate the liveness information for stackmaps. The
liveness information is still available for the patchpoint intrinsic and
that is also the intended usage model.

Related to <rdar://problem/17473725>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211816 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-26 23:39:44 +00:00
Andrea Di Biagio
817812f61c [X86] Improve the selection of SSE3/AVX addsub instructions.
This patch teaches the backend how to canonicalize a shuffle vectors
according to the rule:

 - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
       (shuffle (FSUB A, -B), (FADD A, -B), Mask)

Where 'Mask' is:
  <0,5,2,7>            ;; for v4f32 and v4f64 shuffles.
  <0,3>                ;; for v2f64 shuffles.
  <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.

In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.

This new rule allows to convert a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.

The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211771 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-26 10:45:21 +00:00
Juergen Ributzka
d01f1c4054 [FastISel][X86] Only fold the cmp into the select when both instructions are in the same basic block.
If the cmp is in a different basic block, then it is possible that not all
operands of that compare have defined registers. This can happen when one of
the operands to the cmp is a load and the load gets folded into the cmp. In
this case FastISel will skip the load instruction and the vreg is never
defined.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211730 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 20:06:12 +00:00
Andrea Di Biagio
cae1ea691d [X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should firstly
check if a shuffle performs a blend, and in case, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.

In general:
 - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
 - BLENDPS/D instructions tend to always have better 'reciprocal throughput'
   than the equivalent SHUFPS/D;
 - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
   m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more
   than one execution port.

This patch:
 - Moves the check for 'isBlendMask' immediately before the check for
   'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
 - Updates existing tests for sse/avx shuffle/blend instructions to verify
   that we select (v)blendps/d when possible (instead of (v)shufps/d or
   vperm2f128).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211720 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 17:41:58 +00:00
Chandler Carruth
2edf5e45ec [x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211694 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 13:12:54 +00:00
NAKAMURA Takumi
b720a3d15c Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211691 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 12:41:52 +00:00
Andrea Di Biagio
3e5582cc15 [X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.

The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.

Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211679 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 10:02:21 +00:00
Rafael Espindola
5178d868c8 Fix a regression from r211653.
The method was empty in the null streamer but I mistakenly replaced it with
the aborting one in MCStreamer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211666 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 05:31:22 +00:00
NAKAMURA Takumi
9a18ab013f CodeGen/X86/pr20088.ll: Add -march=x86-64, or llc fails due to non-x86 default target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211659 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 03:05:47 +00:00
Juergen Ributzka
35a6a81407 [FastISel][X86] Fold XALU condition into branch and compare.
Optimize the codegen of select and branch instructions to directly use the
EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211645 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-24 23:51:21 +00:00
David Blaikie
639c71bafb Fix up scoping in a few tests (and delete one that validates unnecessary behavior).
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.

test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-24 20:10:27 +00:00
Robert Khasanov
031ad1b930 vpblend intrinsics combines as shifts intrinsics due to absence return stmt between them
Fix PR20088

Differential Revision: http://reviews.llvm.org/D4277


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211617 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-24 18:08:04 +00:00
Juergen Ributzka
20732d55c2 [FastISel][X86] Lower unsupported selects to control-flow.
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211545 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 21:55:44 +00:00
Juergen Ributzka
d0976a3d20 [FastISel][X86] Add support for floating-point select.
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211544 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 21:55:40 +00:00
Juergen Ributzka
5f4e6e1ec0 [FastISel][X86] Optimize selects when the condition comes from a compare.
Optimize the select instructions sequence to use the EFLAGS directly from a
compare when possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211543 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 21:55:36 +00:00
NAKAMURA Takumi
9124b45918 Revert r211399, "Generate native unwind info on Win64"
It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211480 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-22 22:00:56 +00:00
Filipe Cabecinhas
7798d5992a Fix PR20087 by using the source index when changing the vector load
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211472 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-22 17:21:37 +00:00
Benjamin Kramer
636a9bece4 Legalizer: Add support for splitting insert_subvectors.
We handle this by spilling the whole thing to the stack and doing the
insertion as a store.

PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211435 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-21 12:56:42 +00:00
Andrea Di Biagio
5d0ff9c928 [X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions.
This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions
from a sequence of "vadd + vsub + blend".

Example:

///
typedef float float4 __attribute__((ext_vector_type(4)));

float4 foo(float4 A, float4 B) {
  float4 X = A - B;
  float4 Y = A + B;
  return (float4){X[0], Y[1], X[2], Y[3]};
}
///

Before this patch, (with flag -mcpu=corei7) llc produced the following
assembly sequence:
  movaps  %xmm0, %xmm2
  addps   %xmm1, %xmm2
  subps   %xmm1, %xmm0
  blendps $10, %xmm2, %xmm0


With this patch, we now get a single
  addsubps  %xmm1, %xmm0



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211427 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-21 01:31:15 +00:00
Reid Kleckner
5b8e73ef81 Generate native unwind info on Win64
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211399 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-20 20:35:47 +00:00
Andrea Di Biagio
cfdf805286 [X86] Teach how to combine horizontal binop even in the presence of undefs.
Before this change, the backend was unable to fold a build_vector dag
node with UNDEF operands into a single horizontal add/sub.

This patch teaches how to combine a build_vector with UNDEF operands into a
horizontal add/sub when possible. The algorithm conservatively avoids to combine
a build_vector with only a single non-UNDEF operand.

Added test haddsub-undef.ll to verify that we correctly fold horizontal binop
even in the presence of UNDEFs.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211265 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-19 10:29:41 +00:00
Adam Nemet
f1b790f791 [X86] AVX512: Add non-temporal stores
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores.  These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.

<rdar://problem/17082571>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211176 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 16:51:10 +00:00
Cameron McInally
c52345c0fc Add pattern for unsigned v4i32->v4f64 convert on AVX512.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 14:04:37 +00:00
Tim Northover
5393254646 DAG: move sret demotion into most basic LowerCallTo implementation.
It looks like there are two versions of LowerCallTo here: the
SelectionDAGBuilder one is designed to operate on LLVM IR, and the
TargetLowering one in the case where everything is at DAG level.

Previously, only the SelectionDAGBuilder variant could handle demoting
an impossible return to sret semantics (before delegating to the
TargetLowering version), but this functionality is also useful for
certain libcalls (e.g. 128-bit operations on 32-bit x86).  So this
commit moves the sret handling down a level.

rdar://problem/17242889

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211155 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 11:52:44 +00:00
Louis Gerbarg
41b33299cf Allow X86FastIsel to cope with 64 bit absolute relocations
This patch is a follow up to r211040 & r211052. Rather than bailing out of fast
isel this patch will generate an alternate instruction (movabsq) instead of the
leaq. While this will always have enough room to handle the 64 bit displacment
it is generally over kill for internal symbols (most displacements will be
within 32 bits) but since we have no way of communicating the code model to the
the assmebler in order to avoid flagging an absolute leal/leaq as illegal when
using a symbolic displacement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211130 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-17 23:22:41 +00:00
Juergen Ributzka
e8cb2ee1cd [FastISel][X86] Optimize predicates and fold CMP instructions.
This optimizes predicates for certain compares, such as fcmp oeq %x, %x to
fcmp ord %x, %x. The latter one is more efficient to generate.

The same optimization is applied to conditional branches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211126 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-17 21:55:43 +00:00
Juergen Ributzka
1d5ff6bb7a [FastISel][X86] Fix previous refactoring commit (r211077)
Overlooked that fcmp_une uses an "or" instead of an "and" for combining the
flags.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211104 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-17 14:47:45 +00:00
Juergen Ributzka
408691f967 [FastISel][X86] Refactor the code to get the X86 condition from a helper function. NFC.
Make use of helper functions to simplify the branch and compare instruction
selection in FastISel. Also add test cases for compare and conditonal branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211077 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-16 23:58:24 +00:00
Louis Gerbarg
a564159d85 Fix illegal relocations in X86FastISel
On x86_86  the lea instruction can only use a 32 bit immediate value. When
the code is compiled statically the RIP register is not used, meaning the
immediate is all that can be used for the relocation, which is not sufficient
in the case of targets more than +/- 2GB away. This patch bails out of fast
isel in those cases and reverts to DAG which does the right thing.

Test case included.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211040 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-16 17:35:40 +00:00
Cameron McInally
5d57928a32 Hook up vector int_ctlz for AVX512.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211024 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-16 14:12:28 +00:00
David Blaikie
0fcb9cb1c1 DebugInfo: Following up to r209677, refactor local variable emission to delay the choice between emitting the definition attributes or using DW_AT_abstract_definition
This doesn't fix the abstract variable handling yet, but it introduces a
similar delay mechanism as was added for subprograms, causing
DW_AT_location to be reordered to the beginning of the attribute list
for local variables, and fixes all the test fallout for that.

A subsequent commit will remove the abstract variable handling in
DbgVariable and just do the abstract variable lookup at module end to
ensure that abstract variables introduced after their concrete
counterparts are appropriately referenced by the concrete variable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210943 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 22:18:23 +00:00
Tim Northover
eee7a7a836 X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly
Lowering this new node allows us to fold the almost universal
comparison for success before it's even formed. Instead we can create
a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg"
instructions set the zero-flag to the correct value.

rdar://problem/13201607

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 17:29:39 +00:00
Tim Northover
8f2a85e099 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210903 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 14:24:07 +00:00
Cameron McInally
1d5e37946d Fix bad copy-and-paste from r210652. AVX512 masked leading zero intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210901 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 13:20:01 +00:00
NAKAMURA Takumi
c5865cb4fc llvm/test/CodeGen/X86/fast-isel-args-fail2.ll: Don't expect to fail with -Asserts. It might or might not crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210894 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 12:05:06 +00:00
Juergen Ributzka
e431884ed7 [FastISel][X86] Add support for cvttss2si/cvttsd2si intrinsics.
This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding
insertelement instructions are folded into the conversion instruction (if
possible).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210870 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 02:21:58 +00:00
Juergen Ributzka
7f8d138f50 [FastISel][X86] - Add branch weights
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210863 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-13 00:45:11 +00:00
Juergen Ributzka
4eddf94a14 [FastISel][X86] Add MachineMemOperand to load/store instructions.
This commit adds MachineMemOperands to load and store instructions. This allows
the peephole optimizer to fold load instructions. Unfortunatelly the peephole
optimizer currently doesn't run at -O0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210858 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 23:27:57 +00:00
Juergen Ributzka
cc738f8367 Update test case to use "not" instead of "XFAIL".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210829 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 21:17:40 +00:00
Juergen Ributzka
c7147a3b6d [FastISel][X86] Argument lowering test case
This test case is supposed to xfail, because we do not handle structs or byval
arguments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210816 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 20:34:09 +00:00
Juergen Ributzka
3f2b28dcaf [FastIsel][X86] Add support for lowering the first 8 floating-point arguments.
Recommit with fixed argument attribute checking code, which is required to bail
out of all the cases we don't handle yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210815 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 20:12:34 +00:00
Juergen Ributzka
a15b05e1aa Revert "[FastIsel][X86] Add support for lowering the first 8 floating-point arguments."
Reverting it because it breaks several tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210810 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 19:21:43 +00:00
Tom Stellard
82a51defb6 Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210792 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 16:04:47 +00:00
Andrea Di Biagio
bf4e625cf1 [X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors.
This patch adds target combine rules to match:
 - [AVX] Horizontal add/sub of packed single/double precision floating point
   values from 256-bit vectors;
 - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210761 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 10:53:48 +00:00
Juergen Ributzka
729fc8facc [FastISel][x86] Add testcase for r210719.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210746 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 03:54:05 +00:00
Juergen Ributzka
f66c14f656 [x86] Improve frameaddress test from r210709.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210743 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 03:29:29 +00:00
Juergen Ributzka
2c9a12f081 [FastISel] Add support for the stackmap intrinsic.
This implements target-independent FastISel lowering for the stackmap intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210742 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 03:29:26 +00:00
Juergen Ributzka
02503401b4 [FastISel][X86] Add support for the sqrt intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210720 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-11 23:11:02 +00:00
Juergen Ributzka
a2d36a20fa [FastISel][X86] Add support for the frameaddress intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210709 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-11 21:44:44 +00:00
Cameron McInally
998d8f50a7 Add AVX512 masked leadz instrinsic support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210652 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-11 12:54:45 +00:00
Andrea Di Biagio
a069e64112 [X86] Refactor the logic to select horizontal adds/subs to a helper function.
This patch moves part of the logic implemented by the target specific
combine rules added at r210477 to a separate helper function.
This should make easier to add more rules for matching AVX/AVX2 horizontal
adds/subs.

This patch also fixes a problem caused by a wrong check performed on indices
of extract_vector_elt dag nodes in input to the scalar adds/subs.

New tests have been added to verify that we correctly check indices of
extract_vector_elt dag nodes when selecting a horizontal operation.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210644 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-11 07:57:50 +00:00
Juergen Ributzka
0adbcf3ba9 [FastISel][X86] Extend support for {s|u}{add|sub|mul}.with.overflow intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 23:52:44 +00:00
Andrea Di Biagio
c0edcf7de8 [X86] Improved target combine rules for selecting horizontal add/sub.
This patch slightly changes the algorithm introduced at revision 210477
to fix a problem where the algorithm was producing incorrect code for 
the VEX.256 encoded versions of horizontal add/sub.

For these cases, we now try to split the two 256-bit vectors into
128-bit chunks before emitting horizontal add/sub dag nodes.

Added a new test case into haddsub-2.ll.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210545 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 16:42:57 +00:00
Adam Nemet
8dea1c4167 [X86] AVX512: Add vmovntdqa
Along with the corresponding intrinsic and tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210543 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 16:39:53 +00:00
Tim Northover
efbf7d1ceb Revert "X86: elide comparisons after cmpxchg instructions."
This reverts commit r210523. It was committed prematurely without waiting for
review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210524 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 10:50:11 +00:00
Tim Northover
984ee65445 X86: elide comparisons after cmpxchg instructions.
The C++ and C semantics of the compare_and_swap operations actually
require us to return a boolean "success" value. In LLVM terms this
means a second comparison of the output of "cmpxchg" against the input
desired value.

However, x86's "cmpxchg" instruction sets all flags for the comparison
formed, so we can skip any secondary comparison. (N.b. this isn't true
for cmpxchg8b/16b, which only set ZF).

rdar://problem/13201607

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210523 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 10:49:07 +00:00
Alp Toker
8aeca44558 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210496 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 22:42:55 +00:00
Andrea Di Biagio
9b6992ddc2 [X86] Add target combine rules for horizontal add/sub.
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.

This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:

  (insert_vector_elt (insert_vector_elt A, I0), I1) ->
    (insert_vecto_elt (insert_vector_elt A, I1), I0)

This new canonicalization rule only triggers if the inner insert_vector
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210477 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 16:54:41 +00:00
NAKAMURA Takumi
5b7ac8ba18 llvm/test/CodeGen/X86/2014-05-29-factorial.ll: Relax an expression to match Win32 x64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210471 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 14:20:23 +00:00
Andrea Di Biagio
592d439efe [X86] Avoid emitting unnecessary test instructions.
This patch teaches the backend how to check for the 'NoSignedWrap' flag on
binary operations to improve the emission of 'test' instructions.

If the result of a binary operation is known not to overflow we know that
resetting the Overflow flag is unnecessary and so we can avoid emitting
the test instruction.

Patch by Marcello Maggioni.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210468 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 12:34:50 +00:00
Andrea Di Biagio
bc72c8f0d8 [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.
This patch modifies SelectionDAGBuilder to construct SDNodes with associated
NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator
instructions.

Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing
nsw/nuw/exact flags during codegen.

Patch by Marcello Maggioni.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210467 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 12:32:53 +00:00
Alp Toker
3c9b41b949 Fix typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210401 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-07 21:23:09 +00:00
Benjamin Kramer
0da5960e5b X86: Don't turn shifts into ands if there's another use that may not check for equality.
Fixes PR19964.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-06 21:08:55 +00:00
Filipe Cabecinhas
78cf19b9b9 Fixed a bug in lowering shuffle_vectors to insertps
Summary:
We were being too strict and not accounting for undefs.
Added a test case and fixed another one where we improved codegen.

Reviewers: grosbach, nadav, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4039

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210361 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-06 18:07:06 +00:00
Tom Roeder
5d0f7af3dc Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute.
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.

This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210280 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-05 19:29:43 +00:00
Eric Christopher
957b2cc0f2 Revert r209381 as it isn't a local variable. Add a testcase so that
we know next time this happens.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-03 21:01:39 +00:00
Rafael Espindola
2d21b25393 Allow alias to point to an arbitrary ConstantExpr.
This  patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.

This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like

@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
                                 i32 ptrtoint (i32* @bar to i32)) to i32*)

An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).

Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210062 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-03 02:41:57 +00:00
Andrea Di Biagio
0bedfa456a [X86] Fix checked arithmetic for i8 on X86.
When lowering a ISD::BRCOND into a test+branch, make sure that we
always use the correct condition code to emit the test operation.

This fixes PR19858: "i8 checked mul is wrong on x86".

Patch by Keno Fisher!



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210032 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-02 16:00:27 +00:00
Filipe Cabecinhas
c3648ce2dc Make blend tests more specific
Following the lead set by r209324, I'm making these tests match the whole
instruction, so we can be sure we're lowering them correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209947 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-31 00:52:23 +00:00
Andrea Di Biagio
1726e2ff15 [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.

This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)

2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
        (shuffle (BINOP A, B), Undef, <Mask>).

Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209930 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 23:17:53 +00:00
Filipe Cabecinhas
d99cefbad1 Convert a vselect into a concat_vector if possible
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added a test.

The ConvertSelectToConcatVector is assuming it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those get taken
care of in the checks above its call.

Reviewers: nadav, delena, grosbach, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 23:03:11 +00:00
Filipe Cabecinhas
94141a42ed Separate the check for blend shuffle_vector masks
Summary:
Separate the check for blend shuffle_vector masks into isBlendMask.
This function will also be used to check if a vector shuffle is legal. No
change in functionality was intended, but we ended up improving codegen on
two tests, which were being (more) optimized only if the resulting shuffle
was legal.

Reviewers: nadav, delena, andreadb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3964

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 21:31:21 +00:00
Rafael Espindola
6319c0c520 [pr19636] Fix known bit computation in urem instruction with power of two.
Patch by Andrey Kuharev.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209902 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 15:00:45 +00:00
Adam Nemet
0997206ac4 [X86] Move test from r209863 to CodeGen/X86
We should only run this if X86 is in the targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209866 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-29 23:52:53 +00:00
Adam Nemet
7cd893a985 [X86] Remove AVX1 vbroadcast intrinsics
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h.  This patch removes the LLVM intrinsics from the backend.

We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.

The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics.  As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.

CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change.  It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.

Also verified that there was no assembly change in the test-suite before and
after this patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209864 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-29 23:35:36 +00:00
Filipe Cabecinhas
ade072c1a9 Added tests for shufflevector lowering to blend instrs.
These tests ensure that a change I will propose in clang works as
expected.

Summary:
Added tests for the generation of blend+immediate instructions from a
shufflevector.
These tests were proposed along with a patch that was dropped. I'm
committing the tests anyway to protect against possible regressions in
codegen.

Reviewers: nadav, bkramer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209853 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-29 22:04:42 +00:00
Michael J. Spencer
11ef9456a8 [x86] Fold extract_vector_elt of a load into the Load's address computation.
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-29 01:42:45 +00:00
Rafael Espindola
665d42accf [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209759 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-28 18:15:43 +00:00
Filipe Cabecinhas
c5f611404c Convert some X86 blendv* intrinsics into IR.
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.

This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.

Both the transformation and the lowering of its result to asm got shiny
new tests.

The transformation is a bit convoluted because of blendvp[sd]'s
definition:

Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.

I will send an email to llvm-dev to discuss if we want to change this or
not.

Reviewers: grosbach, delena, nadav

Differential Revision: http://reviews.llvm.org/D3859

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209643 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-27 03:42:20 +00:00
Rafael Espindola
524b8836c2 Just check the entire string.
Thanks to David Blaikie for the suggestion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-26 04:08:51 +00:00
Rafael Espindola
c385367909 Emit data or code export directives based on the type.
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.

With this patch it is now possible to do things like

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-25 12:49:07 +00:00
Rafael Espindola
fba226fc52 Make these CHECKs a bit more strict.
The " at the end of the line makes sure we matched the entire directive.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209599 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-25 12:43:13 +00:00
Rafael Espindola
ef72e73da9 Use alias linkage and visibility to decide tls access mode.
This matches both what we do for the non-thread case and what gcc does.

With this patch clang would match gcc's behaviour in

static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }

if not for pr19843. Manually writing the IL does produce the same access modes.

It is also a step in the direction of fixing pr19844.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209543 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-23 19:16:56 +00:00
Nico Rieck
b16514017b Remove unused CHECK lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209539 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-23 19:06:44 +00:00
Rafael Espindola
c3be377e36 Convert test to use FileCheck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-23 16:51:13 +00:00
Andrea Di Biagio
3957d4245f [X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8.
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).

This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64bits of the resulting vector are 'undef').



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-22 16:21:39 +00:00
Tim Northover
de70176f5f Segmented stacks: omit __morestack call when there's no frame.
Patch by Florian Zeitz

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209436 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-22 13:03:43 +00:00
Quentin Colombet
fd0096a42c [X86] Fix a bug in the lowering of BLENDI introduced in r209043.
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.

The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.

Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).

<rdar://problem/16975435>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209324 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-21 22:00:39 +00:00
Eric Christopher
6a9366c0c6 Move the function and data section flags into the options struct and
make the functions to set them non-static.
Move and rename the llvm specific backend options to avoid conflicting
with the clang option.

Paired with a backend commit to update.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209238 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-20 21:25:34 +00:00
Quentin Colombet
50d4008b47 [LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN.
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more that one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.

<rdar://problem/16731508>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209230 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-20 19:25:04 +00:00
Renato Golin
5d7afdb2ec Avoids DCE on write_register
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209222 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-20 17:40:03 +00:00
Benjamin Kramer
51af588366 Legalizer: Make bswap promotion safe for vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-20 09:42:31 +00:00
David Blaikie
c108a06c86 DebugInfo: Assume all subprogram DIEs have been created before any abstract subprograms are constructed.
Since we visit the whole list of subprograms for each CU at module
start, this is clearly true - don't test for the case, just assert it.

A few old test cases seemed to have incomplete subprogram lists, but any
attempt to reproduce them shows full subprogram lists that even include
entities that have been completely inlined and the out of line
definition removed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209178 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 23:16:19 +00:00
Andrea Di Biagio
8e4a223f7b [X86] Add ISel patterns to improve the selection of TZCNT and LZCNT.
Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always
provide the operand size as output if the input operand is zero.

We can take advantage of this knowledge during instruction selection
stage in order to simplify a few corner case.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209159 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 20:38:59 +00:00
Filipe Cabecinhas
ca162faee2 Added more insertps optimizations
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.

Also added tests for SSE4.1, AVX, and AVX2.

Reviewers: delena, nadav, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3581

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209156 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 19:45:57 +00:00
Benjamin Kramer
bb81d9d5fa SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209123 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 13:12:38 +00:00
Filipe Cabecinhas
cb596baadd Change the blend tests to AVX, not AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209107 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 04:47:12 +00:00
Chandler Carruth
2ff4a49344 [x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflw
were added in SSE2, no SSSE3. Found this while auditing all uses of
SSSE3 in the X86 target. I don't actually expect this to make
a significant difference on anything and I don't have any detailed test
cases but I updated the existing test cases that already covered some of
this code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209056 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-17 03:29:20 +00:00
Filipe Cabecinhas
d77c1c4465 Implemented special cases for PerformVSELECTCombine.
vselects with constant masks, after legalization, will get turned into
specialized shuffle_vectors so they can be matched to blend+imm
instructions.

Fixed some tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209044 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 22:47:54 +00:00
Filipe Cabecinhas
5ea7215050 Lower vselects into X86ISD::BLENDI when appropriate.
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.

This is not enough for all cases. An additional SELECTCombine optimization
will be committed.

Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209043 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 22:47:49 +00:00
Rafael Espindola
27c076ae40 Fix most of PR10367.
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.

To avoid changing all alias related tests in this patches, I kept the common
syntax

@foo = alias i32* @bar

to mean the same as now. The cases that used to use cast now use the more
general syntax

@foo = alias i16, i32* @bar.

Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.

For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.

One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.

A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209007 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 19:35:39 +00:00
David Blaikie
33b37c7b12 DebugInfo: Assume the CU's Subprogram list only contains definitions.
DIBuilder maintains this invariant and the current DwarfDebug code could
end up doing weird things if it contained declarations (such as putting
the definition DIE inside a CU that contained the declaration - this
doesn't seem like a good idea, so rather than adding logic to handle
this case we'll just ban in for now & cross that bridge if we come to
it later).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209004 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 18:26:53 +00:00
NAKAMURA Takumi
70e3aba855 llvm/test/CodeGen/X86/combine-sse41-intrinsics.ll: Add explicit triple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-15 15:45:31 +00:00
Andrea Di Biagio
9836c47ea6 [X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics.
Added target specific combine rules to fold blend intrinsics according
to the following rules:
 1) fold(blend A, A, Mask) -> A;
 2) fold(blend A, B, <allZeros>) -> A;
 3) fold(blend A, B, <allOnes>) -> B.

Added two new tests to verify that the new folding rules work for all
the optimized blend intrinsics.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208895 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-15 15:18:15 +00:00
Jay Foad
6b543713a2 Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-14 21:14:37 +00:00
Benjamin Kramer
202be06318 X86: If we have an instruction that sets a flag and a zero test on the input of that instruction try to eliminate the test.
For example
	tzcntl	%edi, %ebx
	testl %edi, %edi
	je	.label

can be rewritten into
	tzcntl	%edi, %ebx
	jb 	.label

A minor complication is that tzcnt sets CF instead of ZF when the input
is zero, we have to rewrite users of the flags from ZF to CF. Currently
we recognize patterns using lzcnt, tzcnt and popcnt.

Differential Revision: http://reviews.llvm.org/D3454

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-14 16:14:45 +00:00
Joey Gouly
fdae312e65 [CGP] r205941 changed the logic, so that a cast happens *before* 'Result' is
compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is
nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We
should use dyn_cast_or_null here to check 'Result' is not null and it is an
instruction.

Bug found by Mats Petersson, and I reduced his IR to get a test case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-13 15:42:45 +00:00
Reid Kleckner
17335ce80f Try to fix an SDAG dependence issue with sret
r208453 added support for having sret on the second parameter.  In that
change, the code for copying sret into a virtual register was hoisted
into the loop that lowers formal parameters.  This caused a "Wrong
topological sorting" assertion failure during scheduling when a
parameter is passed in memory.  This change undoes that by creating a
second loop that deals with sret.

I'm worried that this fix is incomplete.  I don't fully understand the
dependence issues.  However, with this change we produce the same DAGs
we used to produce, so if they are broken, they are just as broken as
they have always been.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208637 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-12 22:01:27 +00:00
Adam Nemet
45fc47013f [Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg
Tested by comparing make check VERBOSE=1 before and after to make sure
no tests are missed.  (VERBOSE=1 prints the list of tests.)

Only one test :( remains where .cpp is required:

tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp | FileCheck %s --check-prefix=STDOUT

The topic was discussed in this thread:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-12 19:57:31 +00:00