In gcov, the -o flag can accept either a directory or a file name.
When given a directory, the gcda and gcno files are expected to be in
that directory. When given a file, the gcda and gcno files are
expected to be named based on the stem of that file. Non-existent
paths are treated as files.
This implements compatible behaviour.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201555 91177308-0d34-0410-b5e6-96231b3b80d8
Introducing llvm-profdata, a tool for merging profile data generated by
PGO instrumentation in clang.
- The name indicates a file extension of <name>.profdata. Eventually
profile data output by clang should be changed to that extension.
- llvm-profdata merges two profiles. However, the name is more general,
since it will likely pick up more tasks (such as summarizing a single
profile).
- llvm-profdata parses the current text-based format, but will be
updated once we settle on a binary format.
<rdar://problem/15949645>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201535 91177308-0d34-0410-b5e6-96231b3b80d8
ldrd r6, r7 [r2, #15]
simply gives an error and does not triggers an assertion.
As Jim points out, the diagnostic is really strange here,
but fixing that would be more complicated. The missing
comma results in the parser expecting a construct like r2[2],
which is the vector index thing the error message is talking
about. That's not what the user intended, though, and there's
nothing else in the instruction that looks at all like a vector.
Yet more fallout from not having a real parser here and trying
to do context-free generic matching for addressing modes.
rdar://15097243
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201531 91177308-0d34-0410-b5e6-96231b3b80d8
Until this point only macro definition with named parameters were parsed but the
names were ignored. This adds support for using that information for named
parameter instantiation.
In order to support the full semantics of the keyword arguments, the arguments
are no longer lazily initialised since the keyword arguments can be specified
out of order and partially if they are defaulted. Prepopulate the arguments
with the default value for any defaulted parameters, and then parse the
specified arguments.
This simplies some of the handling of the arguments in the inner loop since
empty arguments simply increment the parameter index and move on.
Note that keyword and positional arguments cannot be mixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201499 91177308-0d34-0410-b5e6-96231b3b80d8
NaCl's ARM ABI uses 16 byte stack alignment, so set that in
ARMSubtarget.cpp.
Using 16 byte alignment exposes an issue in code generation in which a
varargs function leaves a 4 byte gap between the values of r1-r3 saved
to the stack and the following arguments that were passed on the
stack. (Previously, this code only needed to support 4 byte and 8
byte alignment.)
With this issue, llc generated:
varargs_func:
sub sp, sp, #16
push {lr}
sub sp, sp, #12
add r0, sp, #16 // Should be 20
stm r0, {r1, r2, r3}
ldr r0, .LCPI0_0 // Address of va_list
add r1, sp, #16
str r1, [r0]
bl external_func
Fix the bug by checking for "Align > 4". Also simplify the code by
using OffsetToAlignment(), and update comments.
Differential Revision: http://llvm-reviews.chandlerc.com/D2677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201497 91177308-0d34-0410-b5e6-96231b3b80d8
During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop. This patch tries to base such derived
induction variables of the preceeding loop's induction variable.
This helps twolf on arm and seems to help scimark2 on x86.
Reapply with a fix for the case of a value derived from a pointer.
radar://15970709
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201496 91177308-0d34-0410-b5e6-96231b3b80d8
alongside DIEBlock and replace uses accordingly. Use DW_FORM_exprloc
in DWARF4 and later code. Update testcases.
Adding a DIELoc instead of using extra forms inside DIEBlock so
that we can keep location expressions separate from other uses. No
direct use at the moment, however, it's not a lot of code and
using a separately named class keeps it somewhat more obvious
what's going on in various locations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201481 91177308-0d34-0410-b5e6-96231b3b80d8
The Linux kernel defines empty macros for compatibility with ARM UAL syntax.
The comma after the name is optional, and if present can be safely lexed. This
improves compatibility with the GNU assembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201474 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a partial implementation of the .arch_extension directive to the
integrated ARM assembler. There are a number of limitations to this
implementation arising from the target backend support rather than the
implementation itself. Namely, iWMMXT (v1 and v2), Maverick, and XScale support
is not present in the ARM backend. Currently, there is no check for A-class
only (needed for virt), and no ARMv6k detection (needed for os and sec). The
remainder of the extensions are fully supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201471 91177308-0d34-0410-b5e6-96231b3b80d8
This broke in r185459 while TLS support was being generalized to handle
non-symbol TLS representations.
I thought about/tried having an enum rather than a bool to track the
TLS-ness of the address table entry, but namespaces and naming seemed
more hassle than it was worth for only one caller that needed to specify
this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201469 91177308-0d34-0410-b5e6-96231b3b80d8
During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop. This patch tries to base such derived
induction variables of the preceeding loop's induction variable.
This helps twolf on arm and seems to help scimark2 on x86.
radar://15970709
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201465 91177308-0d34-0410-b5e6-96231b3b80d8
Type units will share the statement list of their defining compile unit.
This is a tradeoff that reduces .o debug info size at the cost of some
linked debug info size (since the contents of those string tables won't
be deduplicated along with the type unit) which seems right for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201445 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting r201380 (reverted in r201389)
Recommitting r201351 and r201355 (reverted in r201351 and r201355)
We weren't emitting the an empty (header only) line table when the line
table was empty - this made the DWARF invalid (the compile unit would
point to the zero-size debug_lines section where there should've been an
empty line table but there was nothing at all). Fix that, and as a
consequence this works around/addresses PR18809.
Also, we emit a non-empty line table to workaround a darwin linker bug,
so XFAILing on darwin too.
Also, mark the test as 'REQUIRES: object-emission' because it does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201429 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds support for emitting DWARF path discriminator values in
the object streamer. It also changes the DWARF dumper to show
discriminator values in the line table output.
Reviewers: echristo
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201427 91177308-0d34-0410-b5e6-96231b3b80d8
1) Fix a specific bug when certain conversion functions are called in a program compiled as mips16 with hard float and
the program is linked as c++. There are two libraries that are reversed in the link order with gcc/g++ and clang/clang++ for
mips16 in this case and the proper stubs will then not be called. These stubs are normally handled in the Mips16HardFloat pass
but in this case we don't know at that time that we need to generate the stubs. This must all be handled later in code generation
and we have moved this functionality to MipsAsmPrinter. When linked as C (gcc or clang) the proper stubs are linked in from libc.
2) Set up the infrastructure to handle 90% of what is in the Mips16HardFloat pass in this new area of MipsAsmPrinter. This is a more
logical place to handle this and we have known for some time that we needed to move the code later and not implement it using
inline asm as we do now but it was not clear exactly where to do this and what mechanism should be used. Now it's clear to us
how to do this and this patch contains the infrastructure to move most of this to MipsAsmPrinter but the actual moving will be done
in a follow on patch. The same infrastructure is used to fix this current bug as described in #1. This change was requested by the list
during the original putback of the Mips16HardFloat pass but was not practical for us do at that time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201426 91177308-0d34-0410-b5e6-96231b3b80d8
As v1i1 is illegal, the type legalizer tries to scalarize such node. But if the type operands of SETCC is legal, the scalarization algorithm will cause an assertion failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201381 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting r201351 and r201355 (reverted in r201351 and r201355)
We weren't emitting the an empty (header only) line table when the line
table was empty - this made the DWARF invalid (the compile unit would
point to the zero-size debug_lines section where there should've been an
empty line table but there was nothing at all). Fix that, and as a
consequence this works around/addresses PR18809.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201380 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There should be a space before each of these two keywords to avoid
generating invalid assembly files.
NOTE: I could not find an obvious maintainers in CODE_OWNERS.TXT, but
this seems related to debug info.
Reviewers: echristo
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201359 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201333 91177308-0d34-0410-b5e6-96231b3b80d8
The front-end is now generating the generic @llvm.fabs for this
operation now, so the extra patterns are no longer needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201314 91177308-0d34-0410-b5e6-96231b3b80d8
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by an constant expression that was folded to a constant.
This fixes <rdar://problem/16050719>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201291 91177308-0d34-0410-b5e6-96231b3b80d8
As defined in LangRef, aliases do not have sections. However, LLVM's
GlobalAlias class inherits from GlobalValue, which means we can read and
set its section. We should probably ban that as a separate change,
since it doesn't make much sense for an alias to have a section that
differs from its aliasee.
Fixes PR18757, where the section was being lost on the global in code
from Clang like:
extern "C" {
__attribute__((used, section("CUSTOM"))) static int in_custom_section;
}
Reviewers: rafael.espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D2758
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201286 91177308-0d34-0410-b5e6-96231b3b80d8
logical operations on the i1's driving them. This is a bad idea for every
target I can think of (confirmed with micro tests on all of: x86-64, ARM,
AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into
a general purpose register, whereas consuming it directly into a select generally
allows it to exist only transiently in a predicate or flags register.
Chandler ran a set of performance tests with this change, and reported no
measurable change on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201275 91177308-0d34-0410-b5e6-96231b3b80d8
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.
The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.
With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.
This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in
X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
between non-uniform and uniform constant operands.
Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of expanding a packed shift into a sequence of scalar shifts,
the backend now tries (when possible) to convert the vector shift into a
vector multiply.
Before this change, a shift of a MVT::v8i16 vector by a
build_vector of constants was always scalarized into a long sequence of "vector
extracts + scalar shifts + vector insert".
With this change, if there is SSE2 support, we emit a single vector multiply.
This change also affects SSE4.1, AVX, AVX2 shifts:
- A shift of a MVT::v4i32 vector by a build_vector of non uniform constants
is now lowered when possible into a single SSE4.1 vector multiply.
- Packed v16i16 shift left by constant build_vector are now expanded when
possible into a single AVX2 vpmullw.
This change also improves the lowering of AVX512f vector shifts.
Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected
by this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201271 91177308-0d34-0410-b5e6-96231b3b80d8
Since I just discovered this while poking at other things, here's the
test case so I have it to come back to later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201267 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201237 91177308-0d34-0410-b5e6-96231b3b80d8
There's still one piece missing here, which is adding the
DW_AT_stmt_list to the type unit that refer's to the compile unit's line
table. Working on that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201198 91177308-0d34-0410-b5e6-96231b3b80d8
* CPRCs may be allocated to co-processor registers or the stack – they may never be allocated to core registers
* When a CPRC is allocated to the stack, all other VFP registers should be marked as unavailable
The difference is only noticeable in rare cases where there are a large number of floating point arguments (e.g.
7 doubles + additional float, double arguments). Although it's probably still better to avoid vmov as it can cause
stalls in some older ARM cores. The other, more subtle benefit, is to minimize difference between the various
calling conventions.
rdar://16039676
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201193 91177308-0d34-0410-b5e6-96231b3b80d8
Debug info: Emit values in subregisters that do not have a separate
DWARF register number by emitting a super-register + DW_OP_bit_piece.
This is necessary because on x86_64, there are no DWARF register numbers
for i386-style subregisters.
Fixes a bunch of FIXMEs.
rdar://problem/16015314
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201190 91177308-0d34-0410-b5e6-96231b3b80d8
This comes up in empty files or files containing #file directives that
never reference the actual source file name. Came up in a small test of
line tables I was playing with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201187 91177308-0d34-0410-b5e6-96231b3b80d8
These tests were unnecessarily sensitive to the presence and ordering of
elements in the line table file_names list which will break on a future
change I'm working on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201185 91177308-0d34-0410-b5e6-96231b3b80d8
DWARF register number by emitting a super-register + DW_OP_bit_piece.
This is necessary because on x86_64, there are no DWARF register numbers
for i386-style subregisters.
Fixes a bunch of FIXMEs.
rdar://problem/16015314
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201180 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a slightly overzealous destination register restriction for the
'without .w' alias. Add some explicit testcases.
rdar://16033140
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201173 91177308-0d34-0410-b5e6-96231b3b80d8
BUILD_VECTOR nodes, e.g.:
(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)
This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201158 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR18753 and PR18782.
This is necessary for LICM to preserve LCSSA correctly and efficiently.
There is still some active discussion about whether we should be using
LCSSA, but we can't just immediately stop using it and we *need* LICM to
preserve it while we are using it. We can restore the old SSAUpdater
driven code if and when there is a serious effort to remove the reliance
on LCSSA from all of the loop passes.
However, this also serves as a great example of why LCSSA is very nice
to have. This change significantly simplifies the process of sinking
instructions for LICM, and makes it quite a bit less expensive.
It wouldn't even be as complex as it is except that I had to start the
process of removing the big recursive LCSSA formation hammer in order to
switch even this much of the re-forming code to asserting that LCSSA was
preserved. I'll fully remove that next just to tidy things up until the
LCSSA debate settles one way or the other.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201148 91177308-0d34-0410-b5e6-96231b3b80d8
Xcore target ABI requires const data that is externally visible
to be handled differently if it has C-language linkage rather than
C++ language linkage.
Clang now emits ".cp.rodata" section information.
All other externally visible constant data will be placed in the DP section.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201144 91177308-0d34-0410-b5e6-96231b3b80d8
profitability check due to some other checks in the addressing
mode matcher. I.e., test case for commit r201121.
<rdar://problem/16020230>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201132 91177308-0d34-0410-b5e6-96231b3b80d8
DS instructions that access local memory can only uses addresses that
are less than or equal to the value of M0. When M0 is uninitialized,
then we experience undefined behavior.
This patch also changes the behavior to emit S_WQM_B64 on pixel shaders
no matter what kind of DS instruction is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201097 91177308-0d34-0410-b5e6-96231b3b80d8
Similarly to the vshrn instructions, these are simple zext/sext + trunc
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201093 91177308-0d34-0410-b5e6-96231b3b80d8
For A- and R-class processors, r12 is not normally callee-saved, but is for
interrupt handlers. See AAPCS, 5.3.1.1, "Use of IP by the linker".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201089 91177308-0d34-0410-b5e6-96231b3b80d8
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201085 91177308-0d34-0410-b5e6-96231b3b80d8
Original commits messages:
Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code.
Simplify a bunch of code by removing the need for the x86 disassembler table builder to know about extended opcodes. The modrm forms are sufficient to convey the information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201065 91177308-0d34-0410-b5e6-96231b3b80d8
This enables a slightly odd feature of gas. The macro is defined when
the outermost macro is instantiated.
PR18599
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201045 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the tests more readable by using the -arm-attributes decoding support
in llvm-readobj since that is now available. Change the invocation commands to
be similar to other test and use a more precise triple (the tests only require
ARM EABI support).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201029 91177308-0d34-0410-b5e6-96231b3b80d8
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!
Fixes PR18724.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201028 91177308-0d34-0410-b5e6-96231b3b80d8
The bitcast instruction during constant materialization was not placed correcly
in the presence of phi nodes. This commit fixes the insertion point to be in the
idom instead.
This fixes PR18768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201009 91177308-0d34-0410-b5e6-96231b3b80d8
This is a small simplification and a small step in fixing pr18743 since
private functions on MachO should be using a 'l' prefix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200994 91177308-0d34-0410-b5e6-96231b3b80d8
Stores of <4 x i64> do work (although they do expand to 4 stores
instead of 2), but 3 x i64 vectors fail to select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200989 91177308-0d34-0410-b5e6-96231b3b80d8
According to the AAPCS, when a CPRC is allocated to the stack, all other
VFP registers should be marked as unavailable.
I have also modified the rules for allocating non-CPRCs to the stack, to make
it more explicit that all GPRs must be made unavailable. I cannot think of a
case where the old version would produce incorrect answers, so there is no test
for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200970 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a bug triggered in IfConverterTriangle when CvtBB has multiple predecessors
by getting the weights before removing a successor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200958 91177308-0d34-0410-b5e6-96231b3b80d8
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200957 91177308-0d34-0410-b5e6-96231b3b80d8
mode.
Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep
Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep
That way the computation can be folded into the addressing mode.
This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.
<rdar://problem/15519855>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200947 91177308-0d34-0410-b5e6-96231b3b80d8
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200932 91177308-0d34-0410-b5e6-96231b3b80d8
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.
It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.
Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.
rdar://problem/15981990
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.
However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:
- Be lazy about scanning function definitions. The existing pass eagerly
scans the entire module to build the initial graph. This new pass is
significantly more lazy, and I plan to push this even further to
maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
indirect call from functions whose address is taken. This node creates
a huge choke-point which would preclude good parallelization across
the fanout of the SCC graph when we got to the point of looking at
such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
rather than value handles and tracking call instructions. This will
require explicit update calls instead of some updates working
transparently, but should end up being significantly more efficient.
The explicit update calls ended up being needed in many cases for the
existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
for an SCC which is stable across updates. This is essential for the
new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
an SCC friendly order. This is a much simpler graph structure and
should be more memory dense. It does limit the ways in which it is
appropriate to use this analysis. I wish I had a better name than
"call graph". I've commented extensively this aspect.
This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.
Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.
A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200903 91177308-0d34-0410-b5e6-96231b3b80d8
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200900 91177308-0d34-0410-b5e6-96231b3b80d8
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.
As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.
r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.
The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.
Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200898 91177308-0d34-0410-b5e6-96231b3b80d8
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200892 91177308-0d34-0410-b5e6-96231b3b80d8
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200886 91177308-0d34-0410-b5e6-96231b3b80d8
find a register.
The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
the neighbors of the current allocated variable.
E.g.,
vA can use {R1, R2 }
vB can use { R2, R3}
vC can use {R1 }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.
vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.
Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.
<rdar://problem/15947839>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200883 91177308-0d34-0410-b5e6-96231b3b80d8
Small code model (and default reloc model) set Reloc::PIC_ in this test,
and PIC is not yet supported in MCJIT for MIPS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200852 91177308-0d34-0410-b5e6-96231b3b80d8
In Thumb1 mode, bl instruction might be selected for branches between
basic blocks in the function if the offset is greater than 2KB.
However, this might cause SEGV because the destination symbol
is not marked as thumb function and the execution mode will be reset
to ARM mode.
Since we are sure that these symbols are in the same data fragment, we
can simply resolve these local symbols, and don't emit any relocation
information for this bl instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200842 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the ldr-pseudo implementation to work when used in
inline assembly. The fix is to move arm assembler constant pools
from the ARMAsmParser class to the ARMTargetStreamer class.
Previously we kept the assembler generated constant pools in the
ARMAsmParser object. This does not work for inline assembly because
a new parser object is created for each blob of inline assembly.
This patch moves the constant pools to the ARMTargetStreamer class
so that the constant pool will remain alive for the entire code
generation process.
An ARMTargetStreamer class is now required for the arm backend.
There was no existing implementation for MachO, only Asm and ELF.
Instead of creating an empty MachO subclass, we decided to make the
ARMTargetStreamer a non-abstract class and provide default
(llvm_unreachable) implementations for the non constant-pool related
methods.
Differential Revision: http://llvm-reviews.chandlerc.com/D2638
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200777 91177308-0d34-0410-b5e6-96231b3b80d8
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."
Patch by: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200773 91177308-0d34-0410-b5e6-96231b3b80d8
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200768 91177308-0d34-0410-b5e6-96231b3b80d8
Until now, when a path in a gcno file included a directory, we would
emit our .gcov file in that directory, whereas gcov always emits the
file in the current directory. In doing so, this implements gcov's
strange name-mangling -p flag, which is needed to avoid clobbering
files when two with the same name exist in different directories.
The path mangling is a bit ugly and only handles unix-like paths, but
it's simple, and it doesn't make any guesses as to how it should
behave outside of what gcov documents. If we decide this should be
cross platform later, we can consider the compatibility implications
then.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200754 91177308-0d34-0410-b5e6-96231b3b80d8
Missing braces on if meant we inserted both ARM and Thumb load for a litpool
entry. This didn't end well.
rdar://problem/15959157
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200752 91177308-0d34-0410-b5e6-96231b3b80d8
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.
Also add a pattern for (fneg (fabs ...)).
Fixes a bunch of bit encoding piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200743 91177308-0d34-0410-b5e6-96231b3b80d8
When gcov is run without gcda data, it acts as if the counts are all
zero and labels the file as - to indicate that there was no data. We
should do the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200740 91177308-0d34-0410-b5e6-96231b3b80d8
Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
and remove the ToDo comment.
Reviewer: Duncan P.N. Exan Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200736 91177308-0d34-0410-b5e6-96231b3b80d8
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200731 91177308-0d34-0410-b5e6-96231b3b80d8
It disturbs the layout of the parameters in memory and registers,
leading to problems in the backend.
The plan for optimizing internal inalloca functions going forward is to
essentially SROA the argument memory and demote any captured arguments
(things that aren't trivially written by a load or store) to an indirect
pointer to a static alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200717 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.
This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200706 91177308-0d34-0410-b5e6-96231b3b80d8
LowerExpectIntrinsic previously only understood the idiom of an expect
intrinsic followed by a comparison with zero. For llvm.expect.i1, the
comparison would be stripped by the early-cse pass.
Patch by Daniel Micay.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200664 91177308-0d34-0410-b5e6-96231b3b80d8
unrolling heuristic per default
Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200621 91177308-0d34-0410-b5e6-96231b3b80d8
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200619 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minimal implementation which accepts only constants rather than
full expressions, but that should be perfectly sufficient for all known
users for now.
Patch from PaX Team <pageexec@freemail.hu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200614 91177308-0d34-0410-b5e6-96231b3b80d8
LCSSA when we promote to SSA registers inside of LICM.
Currently, this is actually necessary. The promotion logic in LICM uses
SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
Teaching it to do so would be a very significant undertaking. It may be
worthwhile and I've left a FIXME about this in the code as well as
starting a thread on llvmdev to try to figure out the right long-term
solution.
For now, the PR needs to be fixed. Short of using the promition
SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
I don't see a cleaner or cheaper way of achieving this. Fortunately,
LCSSA is relatively lazy and sparse -- it should only update
instructions which need it. We can also skip the recursive variant when
we don't promote to SSA values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200612 91177308-0d34-0410-b5e6-96231b3b80d8
cost so that they don't impact the vector bonus. Fundamentally, counting
unsimplified instructions is just *wrong*; it will continue to introduce
instability as things which do not generate code bizarrely impact
inlining. For example, sufficiently nested inlined functions could turn
off the vector bonus with lifetime markers just like the debug
intrinsics do. =/
This is a short-term tactical fix. Long term, I think we need to remove
the vector bonus entirely. That's a separate patch and discussion
though.
The patch to fix this provided by Dario Domizioli. I've added some
comments about the planned direction and used a heavily pruned form of
debug info intrinsics for the test case. While this debug info doesn't
work or "do" anything useful, it lets us easily test all manner of
interference easily, and I suspect this will not be the last time we
want to craft a pattern where debug info interferes with the inliner in
a problematic way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200609 91177308-0d34-0410-b5e6-96231b3b80d8
Per the GAS documentation, .fill should permit pattern widths that
aren't a power of two. While I was in the neighborhood, I added some
sanity checking. This change was motivated by a use of this construct
in the Linux Kernel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200606 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r200576. It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200602 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200601 91177308-0d34-0410-b5e6-96231b3b80d8
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200596 91177308-0d34-0410-b5e6-96231b3b80d8
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block. Also add an
assertion in x86 fastisel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200593 91177308-0d34-0410-b5e6-96231b3b80d8
To remove this one simply move the end of file logic from the asm printer to
the target mc streamer.
This removes the last call to hasRawTextSupport from lib/Target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200590 91177308-0d34-0410-b5e6-96231b3b80d8
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).
<rdar://problem/15611947>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200577 91177308-0d34-0410-b5e6-96231b3b80d8
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.
Patch by Raul Silvera! (Much delayed due to my not running dcommit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200576 91177308-0d34-0410-b5e6-96231b3b80d8
This ensures DWARF consumers don't confuse these references for
definitions. I'd argue it might be nice to improve debuggers so we don't
need this, but it's just one field in an abbreviation anyway - so it
doesn't seem worth the fight.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200569 91177308-0d34-0410-b5e6-96231b3b80d8