10498 Commits

Author SHA1 Message Date
Elena Demikhovsky
8a3751f813 AVX-512: minor change in rndscale intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207937 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-04 13:35:37 +00:00
Saleem Abdulrasool
f3b2ed7498 X86: repair export compatibility with MinGW/cygwin
Both MinGW and cygwin (i686) construct export directives without the global
leader prefix.  This is mostly due to the fact that they use GNU ld which does
not correctly handle the export directive.  This apparently has been broken
for a while.  However, this was recently reported as being broken by
mingwandroid and diorcety of the msys2 project.

Remove the global leader prefix if targeting MinGW or cygwin, otherwise, retain
the global leader prefix.  Add an explicit test for cygwin's behaviour of export
directives.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207926 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-04 00:03:48 +00:00
Joey Gouly
72e96a51bf [ARM64] Correctly select ANDWri in FastISel.
http://reviews.llvm.org/D3598


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207917 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-03 17:27:06 +00:00
Tim Northover
b20252764d DAGCombine: prevent formation of illegal ConstantFP nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207850 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 17:25:02 +00:00
Tom Stellard
ab2fed6622 R600: Expand vector sin and cos.
v2: move code to AMDGPUISelLowering.cpp
    squash with tests (both EG and SI)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207845 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 15:41:47 +00:00
Tom Stellard
1d6859256c R600: Expand TruncStore i64 -> {i16,i8}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207844 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 15:41:46 +00:00
Tim Northover
ecc1896600 AArch64/ARM64: add patterns for post-indexed ST1 ops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207840 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 14:54:27 +00:00
Tim Northover
6f86e23c1a AArch64/ARM64: support indexed loads/stores on vector types.
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207838 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 14:54:15 +00:00
Benjamin Kramer
bcf1501839 Allow SelectionDAG::FoldConstantArithmetic to work when it's called with a vector VT but scalar values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207835 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-02 12:35:22 +00:00
Michael J. Spencer
d4b4f2d340 [IR] Make {extract,insert}element accept an index of any integer type.
Given the following C code llvm currently generates suboptimal code for
x86-64:

__m128 bss4( const __m128 *ptr, size_t i, size_t j )
{
    float f = ptr[i][j];
    return (__m128) { f, f, f, f };
}

=================================================

define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {
  %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
  %a2 = load <4 x float>* %a1, align 16, !tbaa !1
  %a3 = trunc i64 %j to i32
  %a4 = extractelement <4 x float> %a2, i32 %a3
  %a5 = insertelement <4 x float> undef, float %a4, i32 0
  %a6 = insertelement <4 x float> %a5, float %a4, i32 1
  %a7 = insertelement <4 x float> %a6, float %a4, i32 2
  %a8 = insertelement <4 x float> %a7, float %a4, i32 3
  ret <4 x float> %a8
}

=================================================

        shlq    $4, %rsi
        addq    %rdi, %rsi
        movslq  %edx, %rax
        vbroadcastss    (%rsi,%rax,4), %xmm0
        retq

=================================================

The movslq is unneeded, but is present because of the trunc to i32 and then
sext back to i64 that the backend adds for vbroadcastss.

We can't remove it because it changes the meaning. The IR that clang
generates is already suboptimal. What clang really should emit is:

  %a4 = extractelement <4 x float> %a2, i64 %j

This patch makes that legal. A separate patch will teach clang to do it.
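
To illustrate why the trunc/sext pair cannot simply be dropped, here is a minimal Python sketch (a model of the integer conversions, not LLVM code). The helper names are made up for this illustration. It shows that truncating a 64-bit index to 32 bits and sign-extending it back is not the identity for every input.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def trunc_i64_to_i32(x):
    """Keep only the low 32 bits, like `trunc i64 %j to i32`."""
    return x & MASK32

def sext_i32_to_i64(x):
    """Sign-extend a 32-bit bit pattern to a 64-bit bit pattern."""
    if x & 0x80000000:
        return (x | ~MASK32) & MASK64
    return x

def round_trip(j):
    """Model the trunc followed by the sext that the backend adds."""
    return sext_i32_to_i64(trunc_i64_to_i32(j))
```

Small indices survive the round trip unchanged, while an index with bits set above bit 31, or with bit 31 itself set, comes back as a different 64-bit value. That is why the movslq is needed as long as the IR contains the trunc.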

Differential Revision: http://reviews.llvm.org/D3519

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 22:12:39 +00:00
Reed Kotler
c02fc3d30d Add basic functionality for assignment of ints.
This creates a lot of core infrastructure in which to add, with little
effort, quite a bit more to mips fast-isel

Test Plan: simplestore.ll

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D3527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207790 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 20:39:21 +00:00
Matt Arsenault
2baa7c53c9 R600/SI: Fix verifier error with pseudo store instructions.
Use i32 instead of specifying SReg_32. When this is
the pseudo INDIRECT_BASE_ADDR, this would give a bogus
verifier error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207770 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 16:37:52 +00:00
Bradley Smith
b378cacf1d [ARM64] Prefer generation of bzero on Darwin only
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207760 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 13:11:59 +00:00
Tim Northover
f2f35a9ca3 AArch64/ARM64: print BFM instructions as BFI or BFXIL
The canonical form of the BFM instruction is always one of the more explicit
extract or insert operations, which makes reading output much easier.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207752 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-01 12:29:38 +00:00
Weiming Zhao
fa1cf8cd68 [ARM64] Prevent bit extraction to be adjusted by following shift
For a pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
  %0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8

If using ubfx, it only needs 2 instrs:
  ubfx  x8, x0, #4, #4
  add x8, x9, x8, lsl #3

This fixes bug 19589
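
The two forms in the message compute the same value. They differ only in how easily the result can be matched as a bitfield extract. A small Python check of that equivalence, assuming 64-bit logical shifts and C1 >= C2 (function names are illustrative, not from the patch):

```python
MASK64 = (1 << 64) - 1

def extract_then_shift(x, c1, mask, c2):
    """((x >> C1) & Mask) << C2, the form that maps onto ubfx plus a shifted add."""
    return (((x >> c1) & mask) << c2) & MASK64

def folded_form(x, c1, mask, c2):
    """(x >> (C1 - C2)) & (Mask << C2), the form produced by shift folding."""
    assert c1 >= c2, "sketch only covers the C1 >= C2 case"
    return ((x >> (c1 - c2)) & (mask << c2)) & MASK64
```

With the values from the example (C1 = 4, Mask = 15, C2 = 3, where the shift by 3 is the scaling by the 8-byte element size), the folded form becomes `(x >> 1) & 0x78`, which matches the `lsr`/`and` pair shown above.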


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207702 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 21:07:24 +00:00
Michael Zolotukhin
c80b103a2b [X86] Never hoist the shift value of a shift instruction.
There is no need to check if we want to hoist the immediate value of a
shift instruction. Simply return TCC_Free right away.

This change is like r206101, but for X86.

rdar://problem/16190769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 19:17:32 +00:00
Tim Northover
b1c1b8a78d ARM64: print fp immediates without using scientific notation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 16:13:34 +00:00
Tom Stellard
bd24b33e57 R600/SI: Use VALU instructions for copying i1 values
We can't use SALU instructions for this since they ignore the EXEC mask
and are always executed.

This fixes several OpenCV tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207661 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 15:31:33 +00:00
Tom Stellard
1d8e31fc7a R600/SI: Teach moveToVALU how to handle some SMRD instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207660 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 15:31:29 +00:00
Chad Rosier
fa2e88da1c [ARM64][fast-isel] Fast-isel doesn't know how to handle f128.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207659 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 15:29:57 +00:00
Sasa Stankovic
fbe7448e5d [mips] Fix MipsLongBranch pass to work when the offset from the branch to the
target cannot be determined accurately. This is the case for NaCl where the
sandboxing instructions are added in MC layer, after the MipsLongBranch pass.
It is also the case when the code has inline assembly. Instead of calculating
offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2)
expressions that are resolved during the fixup.

This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll
and implements microMIPS CHECKs in a much simpler way in a file
test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207656 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 15:06:25 +00:00
Tim Northover
44a2f5610d ARM64: print lsr instead of lsrv for variable shifts (etc)
The canonical syntax for shifts by a variable amount does not end with 'v', but
that syntax should be supported as an alias (presumably for legacy reasons).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207649 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 13:37:07 +00:00
Tim Northover
d805bf8d61 AArch64/ARM64: use HS instead of CS & LO instead of CC.
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207644 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 13:14:03 +00:00
Daniel Sanders
1c8add9978 [mips][msa] Fix vector insertions where the index is variable
Summary:
This isn't supported directly so we rotate the vector by the desired number of
elements, insert to element zero, then rotate back.

The i64 case generates rather poor code on MIPS32. There is an obvious
optimisation to be made in future (do both insert.w's inside a shared 
rotate/unrotate sequence) but for now it's sufficient to select valid code
instead of aborting.
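
The rotate, insert, rotate-back idea can be sketched on a plain Python list. This models only the data movement, not the MSA instructions the patch actually selects, and `insert_variable` is a hypothetical name. It assumes the index is in range.

```python
def insert_variable(vec, value, idx):
    """Insert `value` at a run-time index using only a fixed-position insert.

    Assumes 0 <= idx < len(vec).
    """
    n = len(vec)
    # Rotate so that the element at `idx` moves to position 0.
    rotated = vec[idx:] + vec[:idx]
    # Insert at the fixed position zero.
    rotated[0] = value
    # Rotate back by the same amount in the opposite direction.
    return rotated[n - idx:] + rotated[:n - idx]
```

For example, `insert_variable([10, 20, 30, 40], 99, 2)` yields the same list as assigning `99` to index 2 directly.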

Depends on D3536

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 12:09:32 +00:00
Tim Northover
ebde5a5e49 ARM64: use hex immediates for movz/movk instructions
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.
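
As an illustration of why hex reads well here, the following Python sketch splits a 64-bit constant into 16-bit chunks and prints a movz/movk sequence for it. It is a simplified model with a hypothetical function name. A real compiler may pick other materialization strategies, and the exact operand spelling is an assumption.

```python
def movz_movk_sequence(value, reg="x0"):
    """Return assembly-like lines that build `value` in 16-bit chunks."""
    value &= (1 << 64) - 1
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    # Skip all-zero chunks; a zero constant still needs one movz.
    nonzero = [(s, c) for s, c in chunks if c] or [(0, 0)]
    lines = []
    for i, (shift, chunk) in enumerate(nonzero):
        op = "movz" if i == 0 else "movk"  # movz clears the rest, movk keeps it
        suffix = f", lsl #{shift}" if shift else ""
        lines.append(f"{op} {reg}, #{chunk:#x}{suffix}")
    return lines
```

Each printed immediate corresponds directly to four hex digits of the final constant, which is harder to see when the chunks are printed in decimal.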

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207635 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 11:19:40 +00:00
Tim Northover
87476b607c ARM64: hexify printing various immediate operands
This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they
accept weird shifts which are more naturally understandable in hex notation).

Also changes BRK/HINT etc, which is probably a neutral change, but easier than
the alternative.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 11:19:28 +00:00
Tim Northover
2a2cce79be ARM64: print canonical syntax for add/sub (imm) instructions.
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.

Also, print a comment giving the full addend since it can be helpful.
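
A rough Python sketch of the canonical form described above: an addend is either a plain 12-bit immediate or a 12-bit immediate shifted left by 12. The function names and the `; =N` comment style are assumptions made for this example, not the printer's actual output format.

```python
def split_add_immediate(addend):
    """Return (imm12, shift) for an encodable addend, or None otherwise."""
    if 0 <= addend < (1 << 12):
        return addend, 0
    if addend >= 0 and addend % (1 << 12) == 0 and (addend >> 12) < (1 << 12):
        return addend >> 12, 12
    return None

def format_add_immediate(addend):
    """Render the operand in '#N {, lsl #12}' style with the full addend noted."""
    parts = split_add_immediate(addend)
    if parts is None:
        raise ValueError(f"{addend} is not a valid add/sub immediate")
    imm, shift = parts
    if shift == 0:
        return f"#{imm}"
    # Hypothetical comment style showing the full addend.
    return f"#{imm}, lsl #{shift} ; ={addend}"
```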

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207633 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 11:19:15 +00:00
James Molloy
c447befac4 [ARM64] Ensure arm64_be is dealt with when emitting debug info.
This is a partial port of r204816 (cpirker "Elf support for MC-JIT
runtime dynamic linker") from AArch64 to ARM64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 10:15:35 +00:00
Tim Northover
5b188b1cb8 ARM64: make sure FastISel uses a GPR64 source in 64-bit extensions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207620 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 09:32:01 +00:00
Saleem Abdulrasool
ddbde80aae ARM: support stack probe emission for Windows on ARM
This introduces the stack lowering emission of the stack probe function for
Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where
any page allocation which crosses a page boundary of the following guard page
will cause a page fault. This page fault must be handled by the kernel to
ensure that the page is faulted in. If this does not occur and a write accesses
any memory beyond that, the page fault will go unserviced, resulting in an
abnormal program termination.

The watermark for the stack probe appears to be at 4080 bytes (for
accommodating the stack guard canaries and stack alignment) when SSP is
enabled.  Otherwise, the stack probe is emitted on the page size boundary of
4096 bytes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207615 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 07:05:07 +00:00
Saleem Abdulrasool
745fff806d ARM: partially handle 32-bit relocations for WoA
IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
relocation is not split up and reordered. When expanding the mov32imm
pseudo-instruction, create a bundle if the machine operand is referencing an
address.  This helps ensure that the relocatable address load is not reordered
by subsequent passes.

Unfortunately, this only partially handles the case as the Constant Island Pass
occurs after the instructions are unbundled and does not properly handle
bundles.  That is a more fundamental issue with the pass itself and beyond the
scope of this change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207608 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 04:54:58 +00:00
Reid Kleckner
9902128e2a Implement X86 code generation for musttail
Currently, musttail codegen is relying on sibcall optimization, and
reporting a fatal error if it fails.  Sibcall optimization fails when stack
arguments need to be modified, which is insufficient for musttail.

The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt.  This change merely arranges for musttail
calls to use it.

No functional change for GuaranteedTailCallOpt.

Reviewers: espindola

Differential Revision: http://reviews.llvm.org/D3493

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207598 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 23:55:41 +00:00
Tom Stellard
40e455d992 R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors
SI_IF and SI_ELSE are terminators which also produce a value.  For
these instructions ISel always inserts a COPY to move their value
to another basic block.  This COPY ends up between SI_(IF|ELSE)
and the S_BRANCH* instruction at the end of the block.

This breaks MachineBasicBlock::getFirstTerminator() and also the
machine verifier which assumes that terminators are grouped together at
the end of blocks.

To solve this we coalesce the copy away right after ISel to make sure
there are no instructions in between terminators at the end of blocks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207591 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 23:12:53 +00:00
Tom Stellard
2a90e446c0 R600/SI: Only select SALU instructions in the entry or exit block
SALU instructions ignore control flow, so it is not always safe to use
them within branches.  This is a partial solution to this problem
until we can come up with something better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207590 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 23:12:48 +00:00
Tom Stellard
19a970b2da R600: optimize the UDIVREM 64 algorithm
This is a squash of several optimization commits:
 - calculate DIV_Lo and DIV_Hi separately
 - use BFE_U32 if we are operating on 32bit values
 - use precomputed constants instead of shifting in UDIVREM
 - skip the first 32 iterations of udivrem

v2: Check whether BFE is supported before using it
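
One plausible reading of "calculate DIV_Lo and DIV_Hi separately" and "skip the first 32 iterations" is sketched below in Python. This is an illustration of the general idea under that assumption, not a transcription of the R600 lowering. In a bitwise restoring division of a 64-bit value, the first 32 steps only process the high word of the dividend, so their result can be obtained with a single 32-bit division (or is trivially zero when the divisor does not fit in 32 bits).

```python
MASK32 = (1 << 32) - 1

def udivrem64(n, d):
    """Unsigned 64-bit divide/remainder with only 32 shift-subtract steps."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    n_hi, n_lo = n >> 32, n & MASK32
    if d >> 32 == 0:
        # Divisor fits in 32 bits: the high quotient word is a 32-bit division.
        q_hi, rem = divmod(n_hi, d)
    else:
        # Divisor exceeds any 32-bit value, so the high quotient word is zero.
        q_hi, rem = 0, n_hi
    q_lo = 0
    # Remaining 32 iterations: bring down one bit of the low word at a time.
    for bit in range(31, -1, -1):
        rem = (rem << 1) | ((n_lo >> bit) & 1)
        if rem >= d:
            rem -= d
            q_lo |= 1 << bit
    return (q_hi << 32) | q_lo, rem
```

The result agrees with Python's built-in `divmod` for 64-bit unsigned inputs, which gives a quick way to sanity-check the shortcut.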

Patch by: Jan Vesely

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207589 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 23:12:46 +00:00
Reed Kotler
52c03fbb3b Add Simple return instruction to Mips fast-isel
Reviewers: dsanders

Reviewed by: dsanders

Differential Revision: http://reviews.llvm.org/D3430



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207565 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 17:57:50 +00:00
Daniel Sanders
4551e311c6 [mips][msa] Use CHECK-LABEL in basic_operations*.ll
Differential Revision: http://reviews.llvm.org/D3536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207529 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 14:28:58 +00:00
Daniel Sanders
285c5693b8 [mips][msa] Fix element extraction where the index is variable.
Summary:
This isn't supported directly so we splat the vector element and extract
the most convenient copy.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3530

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207524 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 13:31:37 +00:00
Tim Northover
65baf804ba ARM: fix test after change to indirect symbol emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207519 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 10:13:10 +00:00
Tim Northover
d5d3e188f0 X86: emit hidden stubs into a proper non_lazy_symbol_pointer section.
rdar://problem/16660411

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207518 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 10:06:10 +00:00
Tim Northover
8ea9566fee ARM: emit hidden stubs into a proper non_lazy_symbol_pointer section.
rdar://problem/16660411

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 10:06:05 +00:00
Benjamin Kramer
43705683fd AArch64: Mark vector long multiplication as expand.
There are no patterns for this. This was already fixed for ARM64 but I forgot
to apply it to AArch64 too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207515 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 09:37:54 +00:00
Elena Demikhovsky
e3e08acd09 AVX-512: optimized a shuffle pattern to VINSERTI64x4.
Added intrinsics for VPERMT2PS/PD/D/Q instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207513 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 09:09:15 +00:00
Hao Liu
5bbe6121c3 [ARM64]Fix a bug about incorrect operand order in an EXT instruction, which is introduced by r207485.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207500 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 07:51:19 +00:00
Hao Liu
270f09d712 [ARM64]Fix a bug when lowering shuffle vector to an EXT instruction.
E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207485 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 01:50:36 +00:00
Chad Rosier
2f3691eb61 [ARM64] Fix an issue where we were always assuming a copy was coming from a D subregister.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207423 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-28 16:21:50 +00:00
Hao Liu
0ddc7447d9 [ARM64] Fix a bug where UQSHL/SQSHL cannot be selected with a constant i64 shift amount.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207399 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-28 07:34:27 +00:00
Benjamin Kramer
ad1f916eaf X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts.
This is more expensive than pmuldq but still cheaper than scalarizing the whole thing.
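
The "fix up the high parts" step relies on a standard identity: the signed high half of a 32x32 multiply can be derived from the unsigned high half by subtracting each operand when the other one is negative. A Python sketch of that identity on one lane follows. The function names are illustrative, and the exact instruction sequence used by the patch is not reproduced here.

```python
MASK32 = (1 << 32) - 1

def mulhu32(a, b):
    """High 32 bits of the unsigned 32x32 product (what pmuludq provides per lane)."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def mulhs32_from_unsigned(a, b):
    """Signed high half, computed from the unsigned high half plus a fix-up.

    `a` and `b` are 32-bit bit patterns; the result is a 32-bit bit pattern.
    """
    hi = mulhu32(a, b)
    if a & 0x80000000:  # a is negative as a signed value
        hi -= b
    if b & 0x80000000:  # b is negative as a signed value
        hi -= a
    return hi & MASK32

def to_signed32(x):
    """Interpret a 32-bit bit pattern as a signed integer (reference helper)."""
    return x - (1 << 32) if x & 0x80000000 else x
```

The low halves of the signed and unsigned products are identical, so only the high parts need this correction.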

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207370 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 18:47:41 +00:00
Benjamin Kramer
3fd5902758 Update test not to check for a shuffle of an all-zero vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207354 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 11:54:45 +00:00
Benjamin Kramer
55e03c1992 SelectionDAG: Aggressively fold shuffles of constant splats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207352 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 11:41:06 +00:00