r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
This fixes PR22708.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232889 91177308-0d34-0410-b5e6-96231b3b80d8
of this add a test that shows we can generate code for functions
that specifically enable a subtarget feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232884 91177308-0d34-0410-b5e6-96231b3b80d8
of this add a test that shows we can generate code with
for functions that differ by subtarget feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232882 91177308-0d34-0410-b5e6-96231b3b80d8
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232879 91177308-0d34-0410-b5e6-96231b3b80d8
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232872 91177308-0d34-0410-b5e6-96231b3b80d8
vperm2* intrinsics are just shuffles.
In a few special cases, they're not even shuffles.
Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:
1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
really angry (and so I have regrets about some patches from last week).
2. Doing mask conversion logic in header files is hard to write and
subsequently read.
There are a couple of TODOs in this patch to complete this optimization.
Differential Revision: http://reviews.llvm.org/D8486
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232852 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch, for this one exact case, we'll generate:
blendps %xmm0, %xmm1, $1
instead of:
insertps %xmm0, %xmm1, $0
If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.
The detailed performance data motivation for this may be found in D7866;
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.
Differential Revision: http://reviews.llvm.org/D8332
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232850 91177308-0d34-0410-b5e6-96231b3b80d8
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232842 91177308-0d34-0410-b5e6-96231b3b80d8
These are causing crashes in `DebugInfoFinder` after a WIP patch to
increase strictness of `DIDescriptor` accessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232839 91177308-0d34-0410-b5e6-96231b3b80d8
The main differences are:
* Split in 32 and 64 bit functions.
* First switch on the Modifier so that we have only one non fully covered
switch.
* Map the fixup kind first to a x86_64 (or i386) specific enum, to make
it easy to handle cases like X86::reloc_riprel_4byte_movq_load.
* Switch on IsPCRel last, which reduces code duplication.
Fixes pr22308.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232837 91177308-0d34-0410-b5e6-96231b3b80d8
A WIP patch makes `DIDescriptor` accessors more strict, which in turn
causes the `DebugInfoFinder` to crash on wrongly typed `!dbg`
attachments. Catch that error up front in
`Verifier::visitInstruction()`.
Also remove a test that we "handle" invalid `!dbg` attachments, added
back in r99938. We don't want to handle those anymore.
Note: I'm *not* recursing and verifying the debug info graph reachable
from this node; that work is already done by `verifyDebugInfo()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232834 91177308-0d34-0410-b5e6-96231b3b80d8
This test is supposed to be testing whether metadata attachments to
instructions work, but it was using invalid debug info to do so. (This
was causing assertion failures in the `DebugInfoFinder` with a WIP patch
to be more strict about `DIDescriptor` accessors.)
Rather than fix the debug info -- which is better tested elsewhere --
just test the IR feature directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232828 91177308-0d34-0410-b5e6-96231b3b80d8
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232827 91177308-0d34-0410-b5e6-96231b3b80d8
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232825 91177308-0d34-0410-b5e6-96231b3b80d8
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232802 91177308-0d34-0410-b5e6-96231b3b80d8
This works in a similar way to the gold plugin tests. We search for a compatible
linker on $PATH and use it to run tests against our just-built libLTO. To start
with, test the just added opt level functionality.
Differential Revision: http://reviews.llvm.org/D8472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232785 91177308-0d34-0410-b5e6-96231b3b80d8
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232780 91177308-0d34-0410-b5e6-96231b3b80d8
This switches the sense of the i32 values and updates the test cases.
We can also use CHECK-SAME to clean up some tests, and reduce the visual
noise from bitcasts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232774 91177308-0d34-0410-b5e6-96231b3b80d8
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.
Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.
Differential Revision: http://reviews.llvm.org/D8366
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232773 91177308-0d34-0410-b5e6-96231b3b80d8
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
Differential Revision: http://reviews.llvm.org/D8455
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232770 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect
and that triggers an assertion in nvvm_reflect optimization pass. This
change allows nvvm_reflect pass to deal with both old and new ways to
pass an argument to __nvvm_reflect.
Test Plan: ninja check-all
Reviewers: eliben, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232732 91177308-0d34-0410-b5e6-96231b3b80d8
This replaces the -no-color flag with a -color={auto|always|never}
option, with auto as the default, which is much saner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232693 91177308-0d34-0410-b5e6-96231b3b80d8
This should bring the windows bots back.
It is a bit ugly, but it is better than what we had before: The triple would
say that the object format was COFF, but llc/llvm-mc would produce an ELF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232683 91177308-0d34-0410-b5e6-96231b3b80d8
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.
This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).
Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.
Differential Revision: http://reviews.llvm.org/D8416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232660 91177308-0d34-0410-b5e6-96231b3b80d8
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.
This fixes problems when coalescing towards a subclass with additional
subregisters available.
The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232652 91177308-0d34-0410-b5e6-96231b3b80d8
The checks here were so vague that we could nuke intrinsics
from existence and still pass the test because we'd match
the function name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232647 91177308-0d34-0410-b5e6-96231b3b80d8
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half,
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.
The lax checking in this file will be addressed in
another commit. There are bigger problems here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232644 91177308-0d34-0410-b5e6-96231b3b80d8
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232629 91177308-0d34-0410-b5e6-96231b3b80d8