I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195929 91177308-0d34-0410-b5e6-96231b3b80d8
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195927 91177308-0d34-0410-b5e6-96231b3b80d8
MO_ConstantPoolIndex is handled in printLeaMemReference.
MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195847 91177308-0d34-0410-b5e6-96231b3b80d8
It is only used for asm printing.
On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.
MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.
The MO_GlobalAddress and MO_Register cases were not tested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195824 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
lowering where we need to check whether x is a vector type (in-reg
type) of i8, i16 or i32; otherwise, that optimization is not valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195779 91177308-0d34-0410-b5e6-96231b3b80d8
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195712 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.
The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
Crusoe, Microsoft VirtualBox - see
https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
Via c3 and c3-Nehemiah don't have nopl
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195679 91177308-0d34-0410-b5e6-96231b3b80d8
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.
rdar://15386341
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195496 91177308-0d34-0410-b5e6-96231b3b80d8
- When simplifying the mask generation for BLEND, check whether that mask is
also consumed by other non-BLEND insns. If true, skip that simplification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195476 91177308-0d34-0410-b5e6-96231b3b80d8
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.
It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195383 91177308-0d34-0410-b5e6-96231b3b80d8
clang optimizes tail calls, as in this example:
int foo(void);
int bar(void) {
return foo();
}
where the call is transformed to:
calll .L0$pb
.L0$pb:
popl %eax
.Ltmp0:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
movl foo@GOT(%eax), %eax
popl %ebp
jmpl *%eax # TAILCALL
However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.
This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.
Patch by Dimitry Andric!
PR15086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195318 91177308-0d34-0410-b5e6-96231b3b80d8
Hard-coded operand indices were scattered throughout lowering stages
and layers. It was super bug prone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195093 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195064 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r190888, to fix PR17967. The original change wasn't
the right way to get @feat.00 into the object file. The right fix is to
make @feat.00 be a global symbol.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195053 91177308-0d34-0410-b5e6-96231b3b80d8
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194997 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194942 91177308-0d34-0410-b5e6-96231b3b80d8
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194840 91177308-0d34-0410-b5e6-96231b3b80d8
If a null call target is provided, don't emit a dummy call. This
allows the runtime to reserve as little nop space as it needs without
the requirement of emitting a call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194676 91177308-0d34-0410-b5e6-96231b3b80d8
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194542 91177308-0d34-0410-b5e6-96231b3b80d8