POP instructions are aliased to the ARM LDM variants but have different syntax.
This caused two problems: we tried to access a non-existent operand to annotate
the '!', and the error message didn't make much sense.
With some vigorous hand-waving in the error message both problems can be
fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193322 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193317 91177308-0d34-0410-b5e6-96231b3b80d8
When generating the IfTrue basic block during the F128CSEL pseudo-instruction
handling, the NZCV live-in for the newly created BB wasn't being added. This
caused a fault during MI-sched/live range calculation when the predecessor
for the fall-through BB didn't have a live-in for phys-reg as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193316 91177308-0d34-0410-b5e6-96231b3b80d8
On sandy bridge (PR17654) we now get
vpxor %xmm1, %xmm1, %xmm1
vpunpckhbw %xmm1, %xmm0, %xmm2
vpunpcklbw %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
On haswell it's a simple
vpmovzxbw %xmm0, %ymm0
There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193262 91177308-0d34-0410-b5e6-96231b3b80d8
- Skip instructions added in prolog. For specific targets, prolog may
insert helper function calls (e.g. _chkstk will be called when
there're more than 4K bytes allocated on stack). However, these
helpers don't use/def YMM/XMM registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193261 91177308-0d34-0410-b5e6-96231b3b80d8
Major steps include:
1). introduces a not-addr-taken bit-field in GlobalVariable
2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable
dosen't have its address taken.
3). AA use this info for disambiguation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193251 91177308-0d34-0410-b5e6-96231b3b80d8
This provides rudimentary testing of the llvm-c api.
The following commands are implemented:
* --module-dump
Read bytecode from stdin - print ir
* --module-list-functions
Read bytecode from stdin - list summary of functions
* --module-list-globals
Read bytecode from stdin - list summary of globals
* --targets-list
List available targets
* --object-list-sections
Read object file from stdin - list sections
* --object-list-symbols
Read object file from stdin - list symbols (like nm)
* --disassemble
Read lines of triple, hex ascii machine code from stdin - print disassembly
* --calc
Read lines of name, rpn from stdin - print generated module ir
Differential-Revision: http://llvm-reviews.chandlerc.com/D1776
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193233 91177308-0d34-0410-b5e6-96231b3b80d8
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193215 91177308-0d34-0410-b5e6-96231b3b80d8
The set of circumstances where the writeback register is allowed to be in the
list of registers is rather baroque, but I think this implements them all on
the assembly parsing side.
For disassembly, we still warn about an ARM-mode LDM even if the architecture
revision is < v7 (the required architecture information isn't available). It's
a silly instruction anyway, so hopefully no-one will mind.
rdar://problem/15223374
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193185 91177308-0d34-0410-b5e6-96231b3b80d8
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
MOV instructions that use indirect addressing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193179 91177308-0d34-0410-b5e6-96231b3b80d8
These branches have a 16-bit offset (R_MIPS_PC16).
List of conditional branch instructions:
bnz.{b,h,w,d}
bnz.v
bz.{b,h,w,d}
bz.v
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193157 91177308-0d34-0410-b5e6-96231b3b80d8
We can have a struct type with a single field and the field does not start
with 0. In that case, we should correctly update the offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193137 91177308-0d34-0410-b5e6-96231b3b80d8
The test before wasn't successfully testing this
since it was missing the datalayout piece to change
the size of the second address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193102 91177308-0d34-0410-b5e6-96231b3b80d8
the instruction defenitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Fixes <rdar://problem/14968098>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193096 91177308-0d34-0410-b5e6-96231b3b80d8
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.
This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193090 91177308-0d34-0410-b5e6-96231b3b80d8
Found while adding type safety to the various DWARF enumerations (form,
attribute, tag, etc) that caused Clang to warn on an incompletely
covered switch. Converting the comment to a default/unreachable
uncovered this case of an unsupported form encoding. Seems we were
skipping fission strings entirely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193089 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are logically related as they allow read/write of MSA control registers.
Currently MSA control registers are emitted by number but hopefully that will change as soon
as GAS starts accepting them by name as that would make the assembly easier to read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193078 91177308-0d34-0410-b5e6-96231b3b80d8
The second parameter of the SLD intrinsic is the number of columns (GPR) to
slide left the source array.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193076 91177308-0d34-0410-b5e6-96231b3b80d8
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, it
will create a basic block that violates this constraint. It then leads to other
problems down the line if it tries to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.
PR17621
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193064 91177308-0d34-0410-b5e6-96231b3b80d8
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:
1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.
The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.
If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.
This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.
rdar://15268442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193045 91177308-0d34-0410-b5e6-96231b3b80d8
This ensures that the prefix data is treated as part of the function for
the purpose of debug info. This provides a better debugging experience,
among other things by allowing a debug info client to correctly look up
a function in debug info given a function pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193042 91177308-0d34-0410-b5e6-96231b3b80d8
If the predecessor's being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193035 91177308-0d34-0410-b5e6-96231b3b80d8
SCEV currently fails to compute loop counts for nonunit stride
loops. This comes up frequently. It prevents loop optimization and
forces vectorization to insert extra loop checks.
For example:
void foo(int n, int *x) {
for (int i = 0; i < n; i += 3) {
x[i] = i;
x[i+1] = i+1;
x[i+2] = i+2;
}
}
We need to properly handle the case in which limit > INT_MAX-stride. In
the above case: n > INT_MAX-3. In this case the loop counter will step
beyond the limit and overflow at the same time. However, knowing that
signed integer overlow in undefined, we can assume the loop test
behavior is arbitrary after overflow. This obeys both C undefined
behavior rules, and the more strict LLVM poison value rules.
I'm finally fixing this in response to Hal Finkel's persistence.
The most probable reason that we never optimized this before is that
we were being careful to handle case where the developer expected a
side-effect free infinite loop relying on overflow:
for (int i = 0; i < n; i += s) {
++j;
}
return j;
If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might
expect an infinite loop. However there are plenty of ways to achieve
this effect without relying on undefined behavior of signed overflow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193015 91177308-0d34-0410-b5e6-96231b3b80d8
This is another (final?) stab at making us able to parse our own asm output
on Windows.
Symbols on Windows often contain @'s and ?'s in their names. Our asm parser
didn't like this. ?'s were not allowed, and @'s were intepreted as trying to
reference PLT/GOT/etc.
We can't just add quotes around the bad names, since e.g. for MinGW, we use gas
to assemble, and it doesn't like quotes in some places (notably in .def
directives).
This commit makes us allow ?'s in symbol names, and @'s in symbol names for MS
assembly.
Differential Revision: http://llvm-reviews.chandlerc.com/D1978
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193000 91177308-0d34-0410-b5e6-96231b3b80d8
PR17168 describes a test case that fails when compiling for debug with
fast-isel. Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.
For this problem to occur requires the following:
* Compile for debug
* Compile with fast-isel
* In a block B, fast-isel must partially succeed before punting to DAG-isel
* B must start with a PHI
* The first unhandled node in the DAG must not generate a machine instruction
* A debug value with an order less than that of that first node exists
When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire. Currently it tests whether the block is empty,
or whether the last instruction generated is a phi. When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi. This patch adds that check, and adds the test case from the
PR as a regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192976 91177308-0d34-0410-b5e6-96231b3b80d8
This caused the clang-native-mingw32-win7 buildbot to break.
The assembler was complaining about the following lines that were showing up
in the asm for CrashRecoveryContext.cpp:
movl $"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4", 4(%eax)
calll "_AddVectoredExceptionHandler@8"
.def "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4";
"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4":
calll "_RemoveVectoredExceptionHandler@4"
Reverting for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192940 91177308-0d34-0410-b5e6-96231b3b80d8
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192916 91177308-0d34-0410-b5e6-96231b3b80d8
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192908 91177308-0d34-0410-b5e6-96231b3b80d8
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192883 91177308-0d34-0410-b5e6-96231b3b80d8
Consider the following:
typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
aligned(2)));
typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
typedef int int4 __attribute__((ext_vector_type(4)));
int4 __bbase_cvt_int(ushort4 v) {
ushort8 a;
a.lo = v;
return _mm_cvtepu16_epi32(a);
}
This generates the, not unreasonable, IR:
define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
%tmp = bitcast double %v.coerce to <4 x i16>
%tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32
%0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
ret <4 x i32> %tmp2
}
The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
_foo0:
movsd %xmm0, -8(%rsp) <== Spill to the stack.
movq -8(%rsp), %xmm0 <== Reload it right back out.
pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for.
pblendw $1, %xmm1, %xmm0 <== We don't need this at all
pmovzxwd %xmm0, %xmm0 <== We already did this
ret
The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).
To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
_foo0:
pmovzxwd %xmm0, %xmm0
ret
rdar://15245794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192866 91177308-0d34-0410-b5e6-96231b3b80d8
like C++ should be the fully qualified names for the type.
Add a routine that does a language specific context walk to build
up the qualified name and use it when we add types/names to the
tables. Expand the gnu pubnames testcase as it's the most complex
to make sure that qualified types are also being added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192865 91177308-0d34-0410-b5e6-96231b3b80d8
The reason this got reverted was that the @feat.00 symbol which was emitted
for every TU became quoted, and on cygwin/mingw we use the gas assembler which
couldn't handle the quotes.
This commit fixes the problem by only emitting @feat.00 for win32, where we use
clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no
loss there.
With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since
it uses the Itanium ABI, which doesn't put these funny characters in symbols.
> Because of win32 mangling, we produce symbol and section names with
> funny characters in them, most notably @ characters.
>
> MC would choke on trying to parse its own assembly output. This patch addresses
> that by:
>
> - Making @ trigger quoting of symbol names
> - Also quote section names in the same way
> - Just parse section names like other identifiers (to allow for quotes)
> - Don't assume @ signifies a symbol variant if it is in a string.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192859 91177308-0d34-0410-b5e6-96231b3b80d8
We were calling llvm_unreachable() when failing to optimize the
branch into if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192813 91177308-0d34-0410-b5e6-96231b3b80d8
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192795 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Given a global array G[N], which is declared in this CU and has static initializer
avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N.
Also add a bit of stats.
This eliminates ~1% of instrumentations on SPEC2006
and also partially helps when asan is being run together with coverage.
Reviewers: samsonov
Reviewed By: samsonov
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1947
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192794 91177308-0d34-0410-b5e6-96231b3b80d8
The input to an RxSBG operation can be narrower as long as the upper bits
are don't care. This fixes a FIXME added in r192783.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192790 91177308-0d34-0410-b5e6-96231b3b80d8
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192784 91177308-0d34-0410-b5e6-96231b3b80d8
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.
The motivating examples are the first two in the testcase, which occur
in llvmpipe output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192783 91177308-0d34-0410-b5e6-96231b3b80d8
GNU AS didn't like quotes in symbol names.
Error: junk at end of line, first unrecognized character is `"'
.def "@feat.00";
"@feat.00" = 1
Reproduced on Cygwin's 2.23.52.20130309 and mingw32's 2.20.1.20100303.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192775 91177308-0d34-0410-b5e6-96231b3b80d8
1) Make sure we emit static member variables by checking
at the end of createGlobalVariableDIE rather than piecemeal
in the function.
(As a note, createGlobalVariableDIE needs rewriting.)
2) Make sure we use the definition rather than declaration DIE
for two things: a) determining linkage for gnu pubnames, and b)
as the address of the DIE for global variables.
(As a note, createGlobalVariableDIE really needs rewriting.)
Adjust the testcase to make sure we're checking the correct DIEs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192761 91177308-0d34-0410-b5e6-96231b3b80d8
Because of win32 mangling, we produce symbol and section names with
funny characters in them, most notably @ characters.
MC would choke on trying to parse its own assembly output. This patch addresses
that by:
- Making @ trigger quoting of symbol names
- Also quote section names in the same way
- Just parse section names like other identifiers (to allow for quotes)
- Don't assume @ signifies a symbol variant if it is in a string.
Differential Revision: http://llvm-reviews.chandlerc.com/D1945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192758 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.
Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.
On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.
Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.
The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.
Unit tests are updated so they don't depend on SD scheduling heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192750 91177308-0d34-0410-b5e6-96231b3b80d8
- Type of index used in extract_vector_elt or insert_vector_elt supposes
to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd
better to truncate (or zero-extend in case it's changed later) it to
mask element type to guarantee they are matching instead of asserting
that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192722 91177308-0d34-0410-b5e6-96231b3b80d8
- Lower signed division by constant powers-of-2 to target-independent
DAG operators instead of target-dependent ones to support them better
on targets where vector types are legal but shift operators on that
types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16>
though <16 x i16> is a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192721 91177308-0d34-0410-b5e6-96231b3b80d8
rdar:15221834 False AVX register dependencies cause 5x slowdown on
flops-5/6 and significant slowdown on several others.
This was blocking the switch to MI-Sched.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192669 91177308-0d34-0410-b5e6-96231b3b80d8
through bitcast, ptrtoint, and inttoptr instructions. This is valid
only if the related instructions are in that same basic block, otherwise
we may reference variables that were not live accross basic blocks
resulting in undefined virtual registers.
The bug was exposed when both SDISel and FastISel were used within the same
function, i.e., one basic block is issued with FastISel and another with SDISel,
as demonstrated with the testcase.
<rdar://problem/15192473>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192636 91177308-0d34-0410-b5e6-96231b3b80d8
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
to be generated.
I've also added a fixme for the test case that we should improve
the code generated, it should look something like is documented
in the tls abi document.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192631 91177308-0d34-0410-b5e6-96231b3b80d8
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192608 91177308-0d34-0410-b5e6-96231b3b80d8
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192599 91177308-0d34-0410-b5e6-96231b3b80d8
INSERT is the first type of MSA instruction that requires a change to the way
MSA registers are parsed. This happens because MSA registers may be suffixed by
an index in the form of an immediate or a general purpose register. The changes
to parseMSARegs reflect that requirement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192582 91177308-0d34-0410-b5e6-96231b3b80d8
The alignment of allocated space was wrong, see Bugzila 17345.
Done by Zvi Rackover <zvi.rackover@intel.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192573 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.
This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.
radar://15024459
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192537 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a problem from a previous check-in where a return value was omitted.
Previously the remote/stubs-remote.ll and remote/stubs-sm-pic.ll tests were reporting passes, but they should have been failing. Those tests attempt to link against an external symbol and remote symbol resolution is not supported. The old RemoteMemoryManager implementation resulted in local symbols being used for resolution and the child process crashed but the test didn't notice. With this check-in remote symbol resolution fails, and so the test (correctly) fails.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192514 91177308-0d34-0410-b5e6-96231b3b80d8
When if converting something like:
true:
... = R0<kill>
false:
... = R0<kill>
then the instructions of the true block must not have a <kill> flag
anymore, as the instruction of the false block follow and do still read
the R0 value.
Specifically this patch determines the set of register live-in in the
false block (possibly after simulating the liveness changes of the
duplicated instructions). Each of these live-in registers mustn't be
killed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192482 91177308-0d34-0410-b5e6-96231b3b80d8