system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206838 91177308-0d34-0410-b5e6-96231b3b80d8
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206822 91177308-0d34-0410-b5e6-96231b3b80d8
With this MC is able to handle _GLOBAL_OFFSET_TABLE_ in 64 bit mode, which is
needed for medium and large code models.
This fixes pr19470.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206793 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The INSERTPS pattern fragment was called insrtps (mising 'e'), which
would make it harder to grep for the patterns related to this instruction.
Renaming it to use the proper instruction name.
Reviewers: nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3443
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206779 91177308-0d34-0410-b5e6-96231b3b80d8
It can be reverted a few days later, after X86Disassembler.d is updated not to contain "X86Disassembler.c".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206758 91177308-0d34-0410-b5e6-96231b3b80d8
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
<rdar://problem/15480077>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206738 91177308-0d34-0410-b5e6-96231b3b80d8
reason to expose a global symbol 'decodeInstruction' nor to pollute the global
scope with a bunch of external linkage entities (some of which conflict with
others elsewhere in LLVM).
This is just the initial transition to C++; more cleanups to follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206717 91177308-0d34-0410-b5e6-96231b3b80d8
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206684 91177308-0d34-0410-b5e6-96231b3b80d8
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps. Consider:
(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
(extract_vector_elt %vreg0, Constant<2>),
(extract_vector_elt %vreg0, Constant<3>),
(extract_vector_elt %vreg0, Constant<4>),
(extract_vector_elt %vreg0, Constant<5>),
(extract_vector_elt %vreg0, Constant<6>),
(extract_vector_elt %vreg0, Constant<7>),
%vreg1))
a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.
b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.
c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.
Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:
(v4f32 (BUILD_VECTOR (extract_vector_elt
(vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
(extract_vector_elt
(vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
Constant<0>),
%vreg1))
In order to figure out the underlying vector and their identity we need to see
through the shuffles.
[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.
Fixes <rdar://problem/16296956>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206634 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.
When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immedate count
followed by a MOVSS/MOVSD.
Example
(v4i32 (srl A, (build_vector < X, Y, Y, Y>)))
Can be rewritten as:
(v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))
[with X and Y ConstantInt]
The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206316 91177308-0d34-0410-b5e6-96231b3b80d8
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.
This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206241 91177308-0d34-0410-b5e6-96231b3b80d8
*not* Subtarget->hasSSE1()
*but* __SSE__, the flag that LLVM libraries are compiled
The callback calls internal LLVM JIT libraries. It may be built with -msse (or above).
FIXME: JIT may use "host" instead of "generic" by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206240 91177308-0d34-0410-b5e6-96231b3b80d8
I found this from a particular GDB test suite case of inlining
(something similar is provided as a test case) but came across a few
other related cases (other callers of the same functions, and one other
instance of the same coding mistake in a separate function).
I'm not sure what the best way to test this is (let alone to cover the
other cases I discovered), so hopefully this sufficies - open to ideas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206130 91177308-0d34-0410-b5e6-96231b3b80d8
This logic is properly in the realm of whatever is creating the
TargetMachine. This makes plain 'llc foo.ll' consistent across
heterogenous machines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206094 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.
The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.
Patch by Louis Gerbarg <lgg@apple.com>
rdar://16355124
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205938 91177308-0d34-0410-b5e6-96231b3b80d8
Before, we would have conditional operators where one side of the
operator would be of type RelocationTypeAMD64 and the other is of type
RelocationTypeI386. GCC would noisly warn with -Wenum-compare
diagnostic.
Instead, refactor the code so it is more like the X86 ELF object writer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205752 91177308-0d34-0410-b5e6-96231b3b80d8
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205649 91177308-0d34-0410-b5e6-96231b3b80d8
TargetInstrInfo::findCommutedOpIndices to enable VFMA*231 commutation, rather
than abusing commuteInstruction.
Thanks very much for the suggestion guys!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205489 91177308-0d34-0410-b5e6-96231b3b80d8
on FMA3 memory operands. FMA3 instructions are VEX encoded, so they can load
from unaligned memory.
Testcase to follow, along with related patch.
<rdar://problem/16478629>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205472 91177308-0d34-0410-b5e6-96231b3b80d8
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.
Suggestion by Saleem Abdulrasool!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205393 91177308-0d34-0410-b5e6-96231b3b80d8
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
ControlLoops-dbl - 13% speedup
ControlLoops-flt - 15% speedup
Reductions-dbl - 7.5% speedup
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205348 91177308-0d34-0410-b5e6-96231b3b80d8
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205342 91177308-0d34-0410-b5e6-96231b3b80d8