5639 Commits

Author SHA1 Message Date
JF Bastien
13c782674a x86: Emit LOCK prefix after DATA16
Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.

Test Plan: ninja check

Reviewers: craig.topper, jvoung

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6630

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224283 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 22:34:58 +00:00
Duncan P. N. Exon Smith
1ef70ff39b IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 19:07:53 +00:00
Michael Kuperstein
299e0d4c24 [X86] Break false dependencies before partial register updates when the source operand is in memory
Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.

Differential Revision: http://reviews.llvm.org/D6620

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224246 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 13:18:21 +00:00
Elena Demikhovsky
3f2027522c AVX-512: Added EXPAND instructions and intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224241 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 10:03:52 +00:00
Robert Khasanov
5dc8ac87f1 [AVX512] Enabling bit logic lowering
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224132 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 17:02:18 +00:00
Robert Khasanov
a4f5a5525d [AVX512] Enabling MIN/MAX lowering.
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 15:10:43 +00:00
Andrea Di Biagio
de48903a20 Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g."
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224126 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 15:09:58 +00:00
Andrea Di Biagio
bf72e14565 Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
Test 'misched-code-difference-with-debug.ll' was failing on some buildbots.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224121 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 13:34:03 +00:00
Andrea Di Biagio
c15d82e259 [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.

The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.

This patch fixes the issue in MISched and provides a test.

Patch by Russell Gallop!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224118 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 12:41:22 +00:00
Ahmed Bougacha
11fcb48306 [X86] Add a temporary testcase for PR21876/r223996.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224074 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 23:07:52 +00:00
Cameron McInally
14273ae2e4 [AVX512] Add support for 512b variable bit shift intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224028 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 17:13:05 +00:00
Elena Demikhovsky
11fb1d0eb5 AVX-512: Added all forms of COMPRESS instruction
+ intrinsics + tests


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 15:02:24 +00:00
Michael Kuperstein
0378ed4368 Add newline missing in r224010.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 11:30:20 +00:00
Michael Kuperstein
6ad268ca66 [X86] When converting movs to pushes, don't assume MOVmi operand is an actual immediate
This should fix PR21878.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224010 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 11:26:16 +00:00
Elena Demikhovsky
bcb1a626b6 AVX-512: Fixed a bug in lowering setcc for MVT::i1 type
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224008 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 10:21:12 +00:00
Sanjay Patel
3cd5b83bb8 Match new shuffle codegen for MOVHPD patterns
Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen
when storing the high element of a v2f64. The existing patterns were
only checking for an unpckh type of shuffle. 

http://llvm.org/bugs/show_bug.cgi?id=21791

Differential Revision: http://reviews.llvm.org/D6586



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 16:58:54 +00:00
Michael Kuperstein
89db49fb9b [X86] Make a code path in EltsFromConsecutiveLoads work only on vectors it expects
EltsFromConsecutiveLoads was apparently only ever called for 128-bit vectors, and assumed this implicitly. r223518 started calling it for AVX-sized vectors, causing the code path that had this assumption to crash.
This adds a check to make this path fire only for 128-bit vectors.

Differential Revision: http://reviews.llvm.org/D6579

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 08:46:12 +00:00
Robert Khasanov
648f7c7eb1 [AVX512] Added lowering for VBROADCASTSS/SD instructions.
Lowering patterns were written through avx512_broadcast_pat multiclass as pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
Added lowering tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223804 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 18:45:30 +00:00
Chandler Carruth
de0cdb0890 [x86] Fix the test to actually test things for the CPU names, add the
missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.

If anyone wants to clean up the 32-bit only x86 CPUs, go for it.

Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223774 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 14:25:55 +00:00
Chandler Carruth
85bb610daf [x86] Add a test for the CPU names that should have been in r223769.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223770 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 11:19:57 +00:00
Michael Kuperstein
77c1b73211 [X86] Convert esp-relative movs of function arguments into pushes, step 1
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.

Differential Revision: http://reviews.llvm.org/D6503

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223757 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 06:10:44 +00:00
Bruno Cardoso Lopes
43edafcc07 [CompactUnwind] Fix register encoding logic
Fix a compact unwind encoding logic bug which would try to encode
more callee saved registers than it should, leading to early bail out
in the encoding logic and abusive use of DWARF frame mode unnecessarily.

Also remove no-compact-unwind.ll which was testing the wrong thing
based on this bug and move it to valid 'compact unwind' tests. Added
other few more tests too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 18:18:32 +00:00
Andrea Di Biagio
eafdf26d89 [X86] Improved tablegen patters for matching TZCNT/LZCNT.
Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
condition code is X86_COND_NE.
Existing tablegen patterns only allowed to match TZCNT/LZCNT from a
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks if the
condition code is COND_E or COND_NE.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223668 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 17:47:18 +00:00
Andrea Di Biagio
ae16ff1c42 [X86] Improved lowering of packed v8i16 vector shifts by non-constant count.
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.

With this patch the backend checks if the target features sse4.1. If so, then
it lets the shuffle legalizer deal with the expansion of the shift amount.

Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
  %shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
  %shl = shl <8 x i16> %A, %shamt
  ret <8 x i16> %shl
}
;;

Before (with -mattr=+avx):
  vmovd  %xmm1, %eax
  movzwl  %ax, %eax
  vmovd  %eax, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq

Now:
  vpxor  %xmm2, %xmm2, %xmm2
  vpblendw  $1, %xmm1, %xmm2, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223660 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 14:36:51 +00:00
Chandler Carruth
67a6b29f1f [x86] Clean up the SSE1 test to use a slightly different pattern for
matching offsets. I don't expect this to really matter, but its what the
latest incarnation of my script for maintaining these tests happens to
produce, and so its simpler for me if everything matches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223613 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-07 17:16:00 +00:00
Chandler Carruth
b2c7673442 [x86] Switch a constant selection test to use positive assertions and to
store to real pointers so that its clear that the right code is in fact
being generated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223612 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-07 17:15:58 +00:00
Chandler Carruth
d4b678fb76 [x86] Cleanup the combining vector shuffle tests a bit by merging
identical checks for different SSE variants into a single block.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-07 17:15:56 +00:00
Chandler Carruth
48a08feeea [x86] Clean up the shift lowering vector shuffle tests a bit using my
script. Notably this folds all the SSE cases together into a single
FileCheck block. It also adds a vex prefix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-07 17:15:53 +00:00
Hans Wennborg
d7c9f76cee Add a proper triple to switch-jump-table.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223571 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 02:08:16 +00:00
NAKAMURA Takumi
2dd9af9feb llvm/test/CodeGen/X86/switch-jump-table.ll: Add explicit triple. Local labels have a prefix "." for targeting i686-cygming.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 02:03:49 +00:00
Ahmed Bougacha
3b9ac8c7c3 [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
Most patterns will go away once the extload legalization changes land.

Differential Revision: http://reviews.llvm.org/D6125


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223567 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 01:31:07 +00:00
Hans Wennborg
a421ac689f SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.

Differential Revision: http://reviews.llvm.org/D6510

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223566 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 01:28:50 +00:00
Sanjay Patel
ab4ad4f98e Optimize merging of scalar loads for 32-byte vectors [X86, AVX]
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then subsequently
glue them back together), we should look for the easy case where we can just load
all elements in a single op.

An example of the codegen change is:

From:

vmovss  16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps       $16, 20(%rdi), %xmm1, %xmm1
vinsertps       $32, 24(%rdi), %xmm1, %xmm1
vinsertps       $48, 28(%rdi), %xmm1, %xmm1
vinsertf128     $1, %xmm1, %ymm0, %ymm0
retq

To:

vmovups (%rdi), %ymm0
retq

Differential Revision: http://reviews.llvm.org/D6536



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223518 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 21:28:14 +00:00
Jan Wen Voung
a44126f432 Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress.
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617

In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox so that load/stores
off of rbp don't require sandbox adjustment (zeroing the top
32-bits, then filling those by adding r15).

However, llvm.frameaddress returns a pointer and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.

Test Plan: test/CodeGen/X86/frameaddr.ll

Reviewers: dschuff, nadav

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6514

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:55:53 +00:00
Andrea Di Biagio
6a9a49d7ab [X86] Improved lowering of packed vector shifts to vpsllq/vpsrlq.
SSE2/AVX non-constant packed shift instructions only use the lower 64-bit of
the shift count. 

This patch teaches function 'getTargetVShiftNode' how to deal with shifts
where the shift count node is of type MVT::i64.

Before this patch, function 'getTargetVShiftNode' only knew how to deal with
shift count nodes of type MVT::i32. This forced the backend to wrongly
truncate the shift count to MVT::i32, and then zero-extend it back to MVT::i64.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223505 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:02:22 +00:00
Andrea Di Biagio
54529ed1c4 [X86] Avoid introducing extra shuffles when lowering packed vector shifts.
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, then it introduces an extra dag node to
extract the splat value from the shuffle. The splat value is then used
to generate a shift count of a target specific shift.

However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.

Example:

;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
  %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  %shl = shl <4 x i32> %a, %c
  ret <4 x i32> %shl
}
;;

Before this patch, llc generated the following code (-mattr=+avx):
  vpshufd $0, %xmm1, %xmm1   # xmm1 = xmm1[0,0,0,0]
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

With this patch, the redundant splat operation is removed from the code.
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 12:13:30 +00:00
Michael Kuperstein
5e343e6fd0 [X86] Improve a dag-combine that handles a vector extract -> zext sequence.
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.

Differential Revision: http://reviews.llvm.org/D6501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223360 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 13:49:51 +00:00
Patrik Hagglund
cfb121f286 Use DomTree in MachineSink to sink over diamonds.
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:

  x = computation
  if () {} else {}
  use x

The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.

Modified test added in r204522, due to one spill less present.

Minor fixes in comments.

Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223350 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 10:36:42 +00:00
Elena Demikhovsky
73ae1df82c Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 09:40:44 +00:00
Michael Liao
fd0832ea89 [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp
Commit on 

- This patch fixes the bug described in
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html

The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.

Impact at run-time on affected functions is::

  - One extra store in the prologue, The store saves the base pointer.
  - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.

Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.

http://reviews.llvm.org/D6388

Patch by Arch Robison <arch.robison@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223329 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:56:38 +00:00
Quentin Colombet
331ec379a0 [RegAllocFast] Handle implicit definitions conservatively.
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.

This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.

Fixes PR21700.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223317 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 23:38:08 +00:00
Peter Collingbourne
bb660fc192 Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are a two primary usecases
that these attributes aim to serve,

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223189 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 02:08:38 +00:00
Simon Pilgrim
ec49b722fd [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead.

The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Differential Revision: http://reviews.llvm.org/D6458



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 22:31:23 +00:00
Philip Reames
d021bb8003 [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka





git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223137 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 18:50:36 +00:00
Reid Kleckner
03c735b42c Parse 'ghccc' in .ll files as the GHC convention (cc 10)
Previously we just used "cc 10" in the .ll files, but that isn't very
human readable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223076 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 21:04:44 +00:00
Hans Wennborg
c9605b6067 Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223054 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 17:36:43 +00:00
Hans Wennborg
e6f4d335d7 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223049 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 17:08:32 +00:00
Akira Hatanaka
ca780b4578 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222987 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 04:27:03 +00:00
Hans Wennborg
5c6c5e23bd Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222961 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-29 21:17:05 +00:00
Duncan P. N. Exon Smith
54786a0936 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 21:29:14 +00:00