11562 Commits

Author SHA1 Message Date
Sanjay Patel
2a7841dd4d move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclass type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.
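
As a rough illustration of the tradeoff described above, here is a minimal C++ sketch of a flags struct held as a data member. The names `NodeFlags`, `raw`, `fromRaw`, and `BinaryNodeWithFlags` are hypothetical stand-ins chosen for this example; they are not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags struct: one bit per optimization flag.
struct NodeFlags {
  unsigned NoUnsignedWrap : 1;
  unsigned NoSignedWrap : 1;
  unsigned Exact : 1;

  NodeFlags() : NoUnsignedWrap(0), NoSignedWrap(0), Exact(0) {}

  // Pack all flags into one byte so they can be copied or compared together.
  std::uint8_t raw() const {
    return static_cast<std::uint8_t>(NoUnsignedWrap | (NoSignedWrap << 1) |
                                     (Exact << 2));
  }

  // Rebuild the struct from its packed form.
  static NodeFlags fromRaw(std::uint8_t Raw) {
    assert(Raw < 8 && "only three flag bits are defined in this sketch");
    NodeFlags F;
    F.NoUnsignedWrap = Raw & 1;
    F.NoSignedWrap = (Raw >> 1) & 1;
    F.Exact = (Raw >> 2) & 1;
    return F;
  }
};

// Hypothetical node subclass: the flags live in a data member, so adding a
// flag means adding a field to NodeFlags rather than a new get/set pair here.
struct BinaryNodeWithFlags {
  unsigned Opcode = 0;
  NodeFlags Flags;
};
```

Whether the extra member changes the node size depends on the layout of the surrounding class, which is why the message reports measured sizes rather than deriving them.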

Differential Revision: http://reviews.llvm.org/D9325



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235997 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 16:39:12 +00:00
Elena Demikhovsky
83259d70bb Fixed crash of variable shift inst on AVX2
https://llvm.org/bugs/show_bug.cgi?id=22955



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235993 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 14:46:35 +00:00
Sergey Dmitrouk
1f7a90d793 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases the
choice is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not a complete fix, as FastISel contains a workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). That is a somewhat different
issue, though, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235989 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 14:05:47 +00:00
Daniel Jasper
515cc265c9 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235987 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk
716c5d8a30 [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases the
choice is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not a complete fix, as FastISel contains a workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). That is a somewhat different
issue, though, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235977 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 11:56:37 +00:00
Elena Demikhovsky
44a0c9071a AVX-512: Added "pandn" intrinsics set
by Asaf Badouh (asaf.badouh@intel.com)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235971 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-28 08:12:42 +00:00
Sanjay Patel
fd55d49f65 remove obsolete pattern matches for scalar SSE ops
The blendi pattern should always replace the insertps pattern after:
http://reviews.llvm.org/rL232850
http://reviews.llvm.org/rL235124



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235930 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 22:23:17 +00:00
Sanjay Patel
95619d83af fix 80-cols; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235902 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 17:45:44 +00:00
Sanjay Patel
4e9cdece79 fix typos; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235896 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 17:03:31 +00:00
Elena Demikhovsky
f8ae1af2e1 AVX-512: added calling conventions for i1 vectors.
Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235889 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 15:11:19 +00:00
Elena Demikhovsky
17bbdd05dd AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235875 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 12:57:59 +00:00
Simon Pilgrim
6df35e7844 [X86][SSE] Add v16i8/v32i8 multiplication support
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.

The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
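
The lane-wise arithmetic described above can be modeled in scalar C++. This is only a sketch of the widen/multiply/truncate idea, not the lowering code; `mulLane` and `mulLanes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model one i8 lane: sign-extend to 16 bits, do a 16-bit multiply that keeps
// the low 16 bits (as pmullw does), then keep the low byte of that result.
std::uint8_t mulLane(std::uint8_t A, std::uint8_t B) {
  std::uint16_t WideA = static_cast<std::uint16_t>((A & 0x80) ? (0xFF00u | A) : A);
  std::uint16_t WideB = static_cast<std::uint16_t>((B & 0x80) ? (0xFF00u | B) : B);
  // Multiply in 32 bits to avoid signed overflow, then truncate to 16 bits.
  std::uint32_t Product = static_cast<std::uint32_t>(WideA) * WideB;
  std::uint16_t Low16 = static_cast<std::uint16_t>(Product);
  return static_cast<std::uint8_t>(Low16 & 0xFF);
}

// Apply the lane model to whole vectors (16 lanes for v16i8, 32 for v32i8).
std::vector<std::uint8_t> mulLanes(const std::vector<std::uint8_t> &A,
                                   const std::vector<std::uint8_t> &B) {
  assert(A.size() == B.size() && "both operands need the same lane count");
  std::vector<std::uint8_t> Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    Result[I] = mulLane(A[I], B[I]);
  return Result;
}
```

The low byte of the widened product equals the wrapping 8-bit product, which is why truncating after the 16-bit multiply gives the right answer.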

Differential Revision: http://reviews.llvm.org/D9115

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235837 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-27 07:55:46 +00:00
Lang Hames
579cebfb15 [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.
AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a
reference for this is crufty.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235752 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-24 19:11:51 +00:00
Sanjay Patel
3f1f6571cc [x86] Add store-folded memop patterns for vcvtps2ph
Differential Revision: http://reviews.llvm.org/D7296



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235517 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-22 16:11:19 +00:00
Andrea Di Biagio
6c347524e2 [X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259).
This fixes a regression introduced at revision 218263.

On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.

Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.

However, revision 218263 forgot to add an extra fallback pattern for the case
where we have an X86VBroadcast of a loadi64 with multiple uses.

This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235509 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-22 14:53:39 +00:00
Lang Hames
a1c0ce8518 [patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.

The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235483 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-22 06:02:31 +00:00
Sanjay Patel
2b2b3a87da [x86] allow 64-bit extracted vector element integer stores on a 32-bit system
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.

So instead of producing this:

  pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
  movd	%xmm0, (%eax)
  movd	%xmm1, 4(%eax)

We can do:

  movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.

Differential Revision: http://reviews.llvm.org/D9134



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-22 00:24:30 +00:00
Matthias Braun
9e0a1565b9 X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine
There doesn't seem to be a reason to perform this target ISD node matching
in a DAGCombine; moving it to lowering fixes PR23296.

Differential Revision: http://reviews.llvm.org/D9137

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235394 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 17:21:36 +00:00
Elena Demikhovsky
bf704ed348 AVX-512: Added VPMOVx2M instructions for SKX,
fixed encoding of VPMOVM2x.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235385 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 14:38:31 +00:00
Elena Demikhovsky
695922de3d AVX-512: Added VPTESTM and VPTESTNM instructions for SKX
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235383 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 13:13:46 +00:00
Elena Demikhovsky
a1fa0de258 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235375 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 10:27:40 +00:00
Simon Pilgrim
01eaaa72bf [X86][SSE] Provide execution domains for scalar floating point operations
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.

I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.

Differential Revision: http://reviews.llvm.org/D9095

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235372 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 08:40:22 +00:00
Matthias Braun
6fbedc4cfd X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235367 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-21 01:13:41 +00:00
Andrea Di Biagio
14fc08301c [X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.

Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.

So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue by replacing the invalid assertion with an early exit.

Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235295 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-20 11:56:59 +00:00
Simon Pilgrim
ca3e6fafc8 [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235281 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-19 22:16:49 +00:00
Craig Topper
6fa7febee4 Remove unnecessary include and probably a layering violation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235262 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-19 00:57:33 +00:00
Sanjay Patel
c7b16819e8 [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

Review:
http://reviews.llvm.org/D8691



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235210 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-17 17:02:37 +00:00
Rafael Espindola
db244041cd Move AliasedSymbol to MachObjectWriter.
It was only used by MachO.
Part of pr19627.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235185 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-17 12:28:43 +00:00
Sanjay Patel
e3e5fcab94 [X86] add an exedepfix entry for movq == movlps == movlpd
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.

It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.

Differential Revision: http://reviews.llvm.org/D8691



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235014 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-15 15:47:51 +00:00
Sanjay Patel
0332323ab6 [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]
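
For readers who prefer source-level code, the two instruction sequences above correspond roughly to the following C++ sketch. The function names `divideTwice` and `divideOnce` are hypothetical. Note that the two forms can round differently in general, so a rewrite like this is normally only valid under relaxed floating-point rules.

```cpp
#include <cassert>

// Original form: two divisions by the same divisor C.
float divideTwice(float A, float B, float C) {
  return ((A / C) * B) / C;
}

// Transformed form: one division to get the reciprocal, then multiplies.
float divideOnce(float A, float B, float C) {
  assert(C != 0.0f && "this sketch ignores division by zero");
  float Recip = 1.0f / C;
  return ((A * Recip) * B) * Recip;
}
```

With a power-of-two divisor such as 2.0f the reciprocal is exact and both forms agree; with other divisors the results may differ in the last bits.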

Differential Revision: http://reviews.llvm.org/D8941



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235012 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-15 15:22:55 +00:00
Rafael Espindola
c98092e28d Use raw_pwrite_stream in the object writer/streamer.
The ELF object writer will take advantage of that in the next commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234950 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-14 22:14:34 +00:00
Krzysztof Parzyszek
fcc330abfe Allow memory intrinsics to be tail calls
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234764 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-13 17:16:45 +00:00
Alexander Kornienko
c16fc54851 Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234679 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-11 02:11:45 +00:00
Benjamin Kramer
0973b7ddb8 Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234586 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-10 11:24:51 +00:00
Rafael Espindola
7e0993d377 clang-format bits of code to make a followup patch easy to read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234519 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-09 18:32:58 +00:00
Rafael Espindola
e4053cd377 Don't repeat name in comment. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234506 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-09 17:10:57 +00:00
Rafael Espindola
838c24a7c8 Refactor a lot of duplicated code for stub output.
This also moves the stub output earlier so that the stubs are produced before
we print an end symbol for the data section.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234315 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-07 13:42:44 +00:00
Simon Pilgrim
2ec7242600 [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into an XMM register instead of merging pairs of i8 scalars into an i16 and using the SSE2 PINSRW instruction.

This allows folding of byte loads and reduces scalar register usage as well.

Differential Revision: http://reviews.llvm.org/D8839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234193 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-06 18:39:00 +00:00
Craig Topper
4d1e15c54f [X86] Apply AddedComplexity consistently for similar patterns. This keeps them together in the DAGISel tables and reduces table size slightly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234086 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-04 04:22:12 +00:00
Craig Topper
4ed0907298 [X86] Add a comment about the change in r234075.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234079 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-04 02:31:43 +00:00
Craig Topper
b1ff87ec86 [X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead.
Previously the patterns didn't have high enough priority and we would only use the GR32 form if only the upper 32 or 56 bits were zero.

Fixes PR23100.
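
The reason a GR32 instruction is a valid substitute can be checked with a small C++ model. This sketch relies on the x86-64 rule that writing a 32-bit register zeroes the upper 32 bits of the full register; `and64` and `and32ZeroExtended` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Reference behavior: a full 64-bit AND with the immediate.
std::uint64_t and64(std::uint64_t X, std::uint64_t Imm) {
  return X & Imm;
}

// Model of the 32-bit form: AND the low halves, then zero-extend to 64 bits.
std::uint64_t and32ZeroExtended(std::uint64_t X, std::uint64_t Imm) {
  assert((Imm >> 32) == 0 && "only valid when the upper 32 bits are zero");
  std::uint32_t Low = static_cast<std::uint32_t>(X) &
                      static_cast<std::uint32_t>(Imm);
  return Low; // implicit zero-extension to 64 bits
}
```

Because the immediate already clears the upper half of the result, both functions return the same value whenever the precondition holds.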

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234075 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-04 02:08:20 +00:00
David Majnemer
f89ce9a09d [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234061 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-03 22:32:26 +00:00
Duncan P. N. Exon Smith
f4f021c0a4 CodeGen: Assert that inlined-at locations agree
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.

The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.

If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234038 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-03 19:20:26 +00:00
Simon Pilgrim
be149a8148 [X86] Added SSE4.2 CRC32 memory folding patterns + tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234013 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-03 14:24:40 +00:00
Simon Pilgrim
e5ecd32488 [X86][3DNow] Added 3DNow! memory folding patterns + tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234008 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-03 11:50:30 +00:00
Peter Collingbourne
c39f5dd0e2 MC: For variable symbols, maintain MCSymbol::Section as a cache.
Fixes PR19582.

Previously, when an asm assignment (.set or =) was created, we would look up
the section immediately in MCSymbol::setVariableValue. This caused symbols
to receive the wrong section if the RHS of the assignment had not been seen
yet. This had a knock-on effect in the object file emitters, causing them
to emit extra symbols, or to give symbols the wrong visibility or the wrong
section. For example, in the following asm:

.data
.Llocal:

.text
leaq .Llocal1(%rip), %rdi
.Llocal1 = .Llocal2
.Llocal2 = .Llocal

the first assignment would give .Llocal1 a null section, which would never get
fixed up by the second assignment. This would cause the ELF object file emitter
to consider .Llocal1 to be an undefined symbol and give it external linkage,
even though .Llocal1 should not have been emitted at all in the object file.

Or in the following asm:

alias_to_local = Ltmp0
Ltmp0:

the Mach-O object file emitter would give the alias_to_local symbol an n_type
of N_SECT and an n_sect of 0.  This is invalid under the Mach-O specification,
which requires N_SECT symbols to receive a non-zero section number if the
symbol is defined in a section in the object file.

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist

After this change we do not look up the section when the assignment is created,
but instead look it up on demand and store it in Section, which is treated
as a cache if the symbol is a variable symbol.
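
A minimal C++ sketch of the on-demand lookup follows, using the .Llocal1 / .Llocal2 example from above. The `Symbol` struct and its members are hypothetical and much simpler than MCSymbol; the sketch also assumes assignments never form a cycle.

```cpp
#include <cassert>

// Hypothetical symbol: either a label defined in a section, or a variable
// symbol whose value is another symbol.
struct Symbol {
  const char *OwnSection = nullptr;            // set for ordinary labels
  const Symbol *AliasOf = nullptr;             // set by "a = b" assignments
  mutable const char *CachedSection = nullptr; // filled in on first lookup

  // Record the assignment only; the section is not looked up here, so it
  // does not matter whether RHS has been defined yet.
  void assign(const Symbol &RHS) {
    assert(!OwnSection && "a variable symbol has no section of its own");
    AliasOf = &RHS;
    CachedSection = nullptr;
  }

  // Resolve the section on demand and remember a successful answer.
  const char *section() const {
    if (!AliasOf)
      return OwnSection;
    if (!CachedSection)
      CachedSection = AliasOf->section();
    return CachedSection;
  }
};
```

A lookup made before the right-hand side is known returns null without caching anything, so a later lookup still finds the correct section.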

This change also fixes a bug in MCExpr::FindAssociatedSection. Previously,
if we saw a subtraction, we would return the first referenced section, even in
cases where we should have been returning the absolute pseudo-section. Now we
always return the absolute pseudo-section for expressions that subtract two
section-derived expressions. This isn't always correct (e.g. if one of the
sections ends up being laid out at an absolute address), but it's probably
the best we can do without more context.

This allows us to remove code in two places where we appear to have been
working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols
and in X86AsmPrinter::EmitStartOfAsmFile.

Re-applies r233595 (aka D8586), which was reverted in r233898.

Differential Revision: http://reviews.llvm.org/D8798

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233995 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-03 01:46:11 +00:00
Sanjay Patel
5b93ab6cde [AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
	movzwl	(%rdi), %eax
	vmovd	%eax, %xmm0
	vxorps	%xmm1, %xmm1, %xmm1
	vblendps	$15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros are free with the vmovd:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233941 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-02 20:21:52 +00:00
Sanjay Patel
8765e82c83 [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
For code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

    1. We're generating a blend when a move scalar does the same job using 2 fewer instruction bytes (see FIXMEs).
    2. We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.

[1] AddedComplexity has close to no documentation in the source. 
The best we have is this comment: "roughly corresponds to the number of nodes that are covered". 
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). 

I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Differential Revision: http://reviews.llvm.org/D8794



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233931 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-02 17:56:17 +00:00
Elena Demikhovsky
4eb165220f AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB
by Asaf Badouh (asaf.badouh@intel.com)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233906 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-02 10:51:40 +00:00
Peter Collingbourne
a8432640e8 Revert r233595, "MC: For variable symbols, maintain MCSymbol::Section as a cache."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233898 91177308-0d34-0410-b5e6-96231b3b80d8
2015-04-02 07:02:51 +00:00