26998 Commits

Author SHA1 Message Date
Cameron McInally
6c8faddaf5 Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load.
Patch by Aleksey Bader.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196435 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 00:11:25 +00:00
Kevin Enderby
f50f3a3bb9 Fix a bug in darwin's 32-bit X86 handling of evaluating fixups.
Where it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24-bits.

The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
it changes reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry.  The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we are returning false in that case.

rdar://15526046


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196432 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 23:36:24 +00:00
David Peixotto
0fc8c68b11 Add support for parsing ARM symbol variants on ELF targets
ARM symbol variants are written with parens instead of @ like this:

  .word __GLOBAL_I_a(target1)

This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.

The MCAsmInfo parens flag is enabled only for ARM on ELF.

By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.

To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.

Updated Tests:
  Changed case of symbol variant to match the generic kind.
  test/CodeGen/ARM/tls-models.ll
  test/CodeGen/ARM/tls1.ll
  test/CodeGen/ARM/tls2.ll
  test/CodeGen/Thumb2/tls1.ll
  test/CodeGen/Thumb2/tls2.ll

PR18080


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196424 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 22:43:20 +00:00
Cameron McInally
6d3d93c40b Fix assembly syntax for AVX512 vector blend instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196393 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 18:05:36 +00:00
Michael Liao
a2f6f94b74 [X86] Check YMM31/ZMM31 as well
- No test case as there's no calling convention preserve YMM31/ZMM31 only



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196391 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 17:44:22 +00:00
Chad Rosier
e834f41f7a Update the UseFusedMAC definition to directly specify its dependence on having
VFP4.
Patch by Daniel Stewart!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196390 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 17:16:36 +00:00
Cameron McInally
80955805e4 Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.
Patch by Aleksey Bader.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196386 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 14:52:33 +00:00
Kevin Qin
dd302615b1 [AArch64 Neon] Add ACLE intrinsic vceqz_f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196362 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 08:02:34 +00:00
Kevin Qin
c7f14e3d8c [AArch64 NEON] Add missing compare intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196360 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 07:53:28 +00:00
Juergen Ributzka
6abfcbdfc8 [Stackmap] Emit multi-byte nops for X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196334 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-04 00:39:08 +00:00
Reed Kotler
4f47f014cd final patch for very long conditional branches for mips16 constant islands.
this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196331 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 23:42:51 +00:00
Rafael Espindola
21a9fd247e Fix mingw32 thiscall + sret.
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)

This fixes, for example, calling stringstream::str.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196312 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 20:51:23 +00:00
James Molloy
616c94ba87 Addrspacecasts are no-ops on ARM.
Testcase added.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196269 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 11:23:11 +00:00
Richard Sandiford
90a34679ef [SystemZ] Fix choice of known-zero mask in insertion optimization
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.

Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy.  The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196267 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 11:01:54 +00:00
Michael Liao
239ffb30b0 Enhance the fix of PR17631
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
  not be issued before 'call' insn. There're other cases where helper
  calls will be inserted not limited to epilog. These helper calls do
  not follow the standard calling convention and won't clobber any YMM
  registers. (So far, all call conventions will clobber any or part of
  YMM registers.)
  This patch enhances the previous fix to cover more cases 'vzerosupper' should
  not be inserted by checking if that function call won't clobber any YMM
  registers and skipping it if so.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196261 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 09:17:32 +00:00
Hao Liu
1296bb3ba6 [AArch64]Add missing floating point convert, round and misc intrinsics.
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196210 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 06:06:55 +00:00
Hao Liu
5025a48f68 AArch64: add missing ACLE intrinsics mapping to general arithmetic operation from VFP instructions.
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196208 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 05:58:30 +00:00
NAKAMURA Takumi
b26e6ecd8d Whitespace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196203 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 05:28:27 +00:00
Hao Liu
3d69ff4d07 AArch64: Add missing scalar pair intrinsics.
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s".


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196198 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 03:39:47 +00:00
Jiangning Liu
bbc450c5cf Add some missing pattern matches for AArch64 Neon intrinsics like vuqadd_s64 and friends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196192 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 01:33:52 +00:00
Jiangning Liu
7f1f8d4146 Add some missing pattern matches for AArch64 Neon intrinsics like vmull_high_n_s16 and friends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196190 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 01:29:32 +00:00
Rafael Espindola
387d4ea375 Don't set PrivateGlobalPrefix for NVPTX and R600.
These targets have special asm printers that don't use these.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196187 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-03 01:03:35 +00:00
Hal Finkel
ff40d5e6d4 Remove PPCScoreboardHazardRecognizer
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196171 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 23:52:46 +00:00
Rafael Espindola
9472fd7403 Refactor the setting of PrivateGlobalPrefix.
No functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196170 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 23:39:26 +00:00
Rafael Espindola
485da3ca84 Don't set PrivateGlobalPrefix twice in the same function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196169 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 23:26:31 +00:00
Rafael Espindola
29a0d2abfe Convert two char* that are only ever used as booleans to bool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196168 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 23:04:51 +00:00
Chad Rosier
d4809bb0e3 [AArch64] Implemented vcopy_lane patterns using scalar DUP instruction.
Patch by Ana Pazos!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196151 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 21:05:16 +00:00
Vincent Lejeune
7043c7a35e R600: Workaround for cayman loop bug
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196121 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 17:29:37 +00:00
Rafael Espindola
cce5873de3 Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.
This allows it to be used in TargetLoweringObjectFileImpl.cpp.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196117 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 16:25:47 +00:00
Alp Toker
4f9dd99c28 Introduce poor man's consumeToken() in X86AsmParser
This makes the code a little more idiomatic.

No change in behaviour.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196113 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 16:06:06 +00:00
Rafael Espindola
15945a0b70 Remove dead code.
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.

Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196111 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 15:36:37 +00:00
Tim Northover
ad249171e4 ARM: decide whether to use movw/movt based on "minsize" attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196102 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 14:46:26 +00:00
NAKAMURA Takumi
e126dcbff9 XCoreFrameLowering.cpp: Use [in,out] instead of [in] [out]. [-Wdocumentation]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196094 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 11:31:25 +00:00
Robert Lytton
f19c6f5763 XCore target: Make handling of large frames not dependent upon an FP.
eliminateFrameIndex() has been reworked to handle both small & large frames
with either a FP or SP.
An additional Slot is required for Scavenging spills when not using FP for large frames.
Reworked the handling of Register Scavenging.

Whether we are using an FP or not, whether it is a large frame or not,
and whether we are using a large code model or not are now independent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196091 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 11:05:28 +00:00
Tim Northover
f715d51769 ARM: add pseudo-instructions for lit-pool global materialisation
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.

With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196090 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:35:41 +00:00
Benjamin Kramer
ee97bbfd5c XCore: Unbreak C++11 build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196089 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:29:26 +00:00
Robert Lytton
41d4ed4ed0 XCore target: fix large code model 'select' indirect address handling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196088 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:18:37 +00:00
Robert Lytton
883abacb5b XCore target: Add large code model
When using large code model:
Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large"
The folded global address of such objects are lowered into the const pool.

During inspection it was noted that LowerConstantPool() was using a default offset of zero.
A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental.

Correct the flags emitted for explicitly specified sections.

We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize.
To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196087 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:18:31 +00:00
Robert Lytton
25464b948f XCore target: Fix eliminateFrameIndex() to handle large frames
Large frame offsets are loaded from the ConstantPool.
Where possible, offsets are encoded using the smaller MKMSK instruction.
Large frame offsets can only be used when there is a frame-pointer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196085 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:18:19 +00:00
Robert Lytton
1326c6f14b XCore target: Enable frames larger than 65535 to be lowered
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196084 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 10:18:14 +00:00
Rafael Espindola
e5c7512e70 Remove leftovers from a non-MC asm printer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196068 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 05:42:16 +00:00
Rafael Espindola
0619f0b345 Remove #if 0 declarations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196067 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 05:24:28 +00:00
Rafael Espindola
4a6855441c Change the default of AsmWriterClassName and isMCAsmWriter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196065 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 04:55:42 +00:00
Rafael Espindola
a9f21604b4 Remove dead declarations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196063 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 04:18:19 +00:00
Rafael Espindola
ae289dd322 Refactor for clarity and efficiency.
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196059 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-02 03:26:43 +00:00
Tim Northover
e54f6dca50 ARM: fix bug in -Oz stack adjustment folding
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.

This should fix PR18081.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196046 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-01 14:16:24 +00:00
Benjamin Kramer
e1139818e8 Revamp error checking in the ms inline asm parser.
- Actually abort when an error occurred.
- Check that the frontend lookup worked when parsing length/size/type operators.

Tested by a clang test. PR18096.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196044 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-01 11:47:42 +00:00
Hal Finkel
15c773b6f2 Add a scheduling model (with itinerary) for the PPC POWER7
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).

One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.

Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:

MultiSource/Benchmarks/BitBench/drop3/drop3
  with AA: N/A
  without AA: -28.7614% +/- 19.8356%
(significantly against AA)

MultiSource/Benchmarks/FreeBench/neural/neural
  with AA: -17.7406% +/- 11.2712%
  without AA: N/A
(significantly in favor of AA)

MultiSource/Benchmarks/SciMark2-C/scimark2
  with AA: -11.2079% +/- 1.80543%
  without AA: -11.3263% +/- 2.79651%

MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
  with AA: -41.8649% +/- 17.0053%
  without AA: -34.5256% +/- 23.7072%

MultiSource/Benchmarks/mafft/pairlocalalign
  with AA: 25.3016% +/- 17.8614%
  without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)

MultiSource/Benchmarks/sim/sim
  with AA: N/A
  without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)

SingleSource/Benchmarks/BenchmarkGame/Large/fasta
  with AA: 15.0664% +/- 6.70216%
  without AA: 12.7747% +/- 8.43043%

SingleSource/Benchmarks/BenchmarkGame/puzzle
  with AA: 82.2713% +/- 26.3567%
  without AA: 75.7525% +/- 41.1842%

SingleSource/Benchmarks/Misc/flops-2
  with AA: -37.1621% +/- 20.7964%
  without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)

These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195981 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-30 20:55:12 +00:00
Hal Finkel
bc0bdb26da Split some PPC itinerary classes
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.

No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195980 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-30 20:41:13 +00:00
Zoran Jovanovic
304964c103 Fixed issue with microMIPS long branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195975 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-30 19:12:28 +00:00