It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.
None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass. No functionality change for other targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224625 91177308-0d34-0410-b5e6-96231b3b80d8
This also fixes problems with undef copies of subregisters. I can't
attach a testcase for that as none of the targets in trunk has
subregister liveness tracking enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224560 91177308-0d34-0410-b5e6-96231b3b80d8
- This also fixes a bug introduced in r223880 where values were not
correctly marked as Dead anymore.
- Cleanup computeDeadValues(): split up SubRange code variant, simplify
arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224538 91177308-0d34-0410-b5e6-96231b3b80d8
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.
Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224492 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a problem where stripCopies() would switch to values in the
main liverange when it crossed a copy instruction. However when joining
subranges we need to stay in the respective subregister ranges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224461 91177308-0d34-0410-b5e6-96231b3b80d8
The ExecutionDepsFix previously mapped each register to 1 or zero
registers of the register class it was called with and therefore
simulating liveness for. This was problematic for cases involving wider
registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx
registers. In these cases the wide register would get mapped to the last
matching D register, while it should have been all matching D registers.
This commit changes the AliasMap to use a SmallVector to map registers
to potentially multiple destination regclass registers. This is required
to avoid regressions with subregister liveness tracking enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224447 91177308-0d34-0410-b5e6-96231b3b80d8
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.
This fixes PR15872 and provides a partial fix for PR21711.
Differential Revision: http://reviews.llvm.org/D6678
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224429 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.
This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).
This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224425 91177308-0d34-0410-b5e6-96231b3b80d8
The type promotion helper does not support vector type, so when make
such it does not kick in in such cases.
Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.
Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.
** Context **
Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.
** Motivating Example **
Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
%ld = load i8* %addr1
%zextld = zext i8 %ld to i32
%ld2 = load i32* %addr2
%add = add nsw i32 %ld2, %zextld
%sextadd = sext i32 %add to i64
%zexta = zext i8 %a to i32
%addza = add nsw i32 %zexta, %zextld
%sextaddza = sext i32 %addza to i64
%addb = add nsw i32 %b, %zextld
%sextaddb = sext i32 %addb to i64
call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
ret void
}
As it is, this IR generates the following assembly on x86_64:
[...]
movzbl (%rdi), %eax # zero-extended load
movl (%rsi), %es # plain load
addl %eax, %esi # 32-bit add
movslq %esi, %rdi # sign extend the result of add
movzbl %dl, %edx # zero extend the first argument
addl %eax, %edx # 32-bit add
movslq %edx, %rsi # sign extend the result of add
addl %eax, %ecx # 32-bit add
movslq %ecx, %rdx # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
Now, by promoting the additions to form more extended loads we would generate:
[...]
movzbl (%rdi), %eax # zero-extended load
movslq (%rsi), %rdi # sign-extended load
addq %rax, %rdi # 64-bit add
movzbl %dl, %esi # zero extend the first argument
addq %rax, %rsi # 64-bit add
movslq %ecx, %rdx # sign extend the second argument
addq %rax, %rdx # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.
Note: The throughput numbers are similar on Sandy Bridge and Haswell.
** Proposed Solution **
To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))
The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.
** Performance **
Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar: ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%
The results are consistent for both O3 and Os.
<rdar://problem/18310086>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224402 91177308-0d34-0410-b5e6-96231b3b80d8
SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases
seems unnecessary.
Also fix a missing CHECK in the test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224393 91177308-0d34-0410-b5e6-96231b3b80d8
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address.
This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.
The codegen change in the new test case improves from:
vmovsd 32(%rdi), %xmm0
vmovsd 48(%rdi), %xmm1
vmovhpd 56(%rdi), %xmm1, %xmm1
vmovhpd 40(%rdi), %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
To:
vmovups 32(%rdi), %ymm0
An existing test case is also improved from:
vmovsd (%rdi), %xmm0
vmovsd 16(%rdi), %xmm1
vmovsd 24(%rdi), %xmm2
vunpcklpd %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd 8(%rdi), %xmm1, %xmm3
To:
vmovsd (%rdi), %xmm0
vmovsd 16(%rdi), %xmm1
vmovhpd 24(%rdi), %xmm0, %xmm0
vmovhpd 8(%rdi), %xmm1, %xmm1
This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).
Differential Revision: http://reviews.llvm.org/D6642
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224379 91177308-0d34-0410-b5e6-96231b3b80d8
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.
Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.
** Context **
Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.
** Motivating Example **
Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
%ld = load i8* %addr1
%zextld = zext i8 %ld to i32
%ld2 = load i32* %addr2
%add = add nsw i32 %ld2, %zextld
%sextadd = sext i32 %add to i64
%zexta = zext i8 %a to i32
%addza = add nsw i32 %zexta, %zextld
%sextaddza = sext i32 %addza to i64
%addb = add nsw i32 %b, %zextld
%sextaddb = sext i32 %addb to i64
call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
ret void
}
As it is, this IR generates the following assembly on x86_64:
[...]
movzbl (%rdi), %eax # zero-extended load
movl (%rsi), %es # plain load
addl %eax, %esi # 32-bit add
movslq %esi, %rdi # sign extend the result of add
movzbl %dl, %edx # zero extend the first argument
addl %eax, %edx # 32-bit add
movslq %edx, %rsi # sign extend the result of add
addl %eax, %ecx # 32-bit add
movslq %ecx, %rdx # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
Now, by promoting the additions to form more extended loads we would generate:
[...]
movzbl (%rdi), %eax # zero-extended load
movslq (%rsi), %rdi # sign-extended load
addq %rax, %rdi # 64-bit add
movzbl %dl, %esi # zero extend the first argument
addq %rax, %rsi # 64-bit add
movslq %ecx, %rdx # sign extend the second argument
addq %rax, %rdx # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.
Note: The throughput numbers are similar on Sandy Bridge and Haswell.
** Proposed Solution **
To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))
The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.
** Performance **
Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar: ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%
The results are consistent for both O3 and Os.
<rdar://problem/18310086>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224351 91177308-0d34-0410-b5e6-96231b3b80d8
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224313 91177308-0d34-0410-b5e6-96231b3b80d8
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224294 91177308-0d34-0410-b5e6-96231b3b80d8
Revert until I find out why non-subreg enabled targets break.
This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224278 91177308-0d34-0410-b5e6-96231b3b80d8
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224272 91177308-0d34-0410-b5e6-96231b3b80d8
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224255 91177308-0d34-0410-b5e6-96231b3b80d8
options.
This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.
This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.
#include "llvm/IR/LegacyPassNameParser.h"
static llvm:🆑:list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm:🆑:desc("Optimizations available:"));
rdar://problem/19212448
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224186 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224126 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.
The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.
This patch fixes the issue in MISched and provides a test.
Patch by Russell Gallop!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224118 91177308-0d34-0410-b5e6-96231b3b80d8
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.
Added DW_OP_stack_value to the stack.
Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the the failure (PR21881)
is being investigated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224098 91177308-0d34-0410-b5e6-96231b3b80d8
Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224086 91177308-0d34-0410-b5e6-96231b3b80d8
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224084 91177308-0d34-0410-b5e6-96231b3b80d8
Since `MachineInstr` is required to have a trivial destructor, it cannot
remove itself from `LeakDetection`. Remove the calls.
As it happens, this requirement is because `MachineFunction` allocates
all `MachineInstr`s in a custom allocator; when the `MachineFunction` is
destroyed they're dropped of the edge. There's no benefit to detecting
leaks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224061 91177308-0d34-0410-b5e6-96231b3b80d8
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.
To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.
This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224059 91177308-0d34-0410-b5e6-96231b3b80d8
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.
To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224042 91177308-0d34-0410-b5e6-96231b3b80d8
Properly determine whether or not a phi was added by splitting.
Check against the current VNInfo of OrigLI instead of against the
OrigVNI argument.
Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224009 91177308-0d34-0410-b5e6-96231b3b80d8
The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of the bug in PPC code optimization or generation). I will file a bug agains PPC compiler, but meanwhile, until PPC bug is fixed, I will have to revert my change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224000 91177308-0d34-0410-b5e6-96231b3b80d8
This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a safepoint under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy which was in turned owned by GCModuleInfo. This made GCStrategy module specific which is 'surprising' given it's name and other purposes.
There's a few more changes needed, but we're getting towards the point we can reuse GCStrategy for gc.statepoint as well.
p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fixup spacing, naming, comments etc.
Differential Revision: http://reviews.llvm.org/D6587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223994 91177308-0d34-0410-b5e6-96231b3b80d8
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.
Added DW_OP_stack_value to the stack.
-This line, and those below, will be ignored--
M lib/CodeGen/AsmPrinter/DwarfDebug.cpp
A test/DebugInfo/incorrect-variable-debugloc1.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223981 91177308-0d34-0410-b5e6-96231b3b80d8
We can't mark partially undefined registers, so we have to allow reading
a register in the machine verifier if just parts of a register are
defined.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223896 91177308-0d34-0410-b5e6-96231b3b80d8
In the subregister liveness tracking case we do not create implicit
reads on partial register writes anymore, still we need to produce a new
SSA value for partial writes so the live segment has to end.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223895 91177308-0d34-0410-b5e6-96231b3b80d8
Adding the implicit defs/uses to the superregisters is semantically questionable
but was not dangerous before as the register allocator never assigned the same
register to two overlapping LiveIntervals even when the actually live
subregisters do not overlap. With subregister liveness tracking enabled this
does actually happen and leads to subsequent bugs if we don't stop adding
the superregister defs/uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223892 91177308-0d34-0410-b5e6-96231b3b80d8