We don't need to represent UnwindHelp in IR. Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234061 91177308-0d34-0410-b5e6-96231b3b80d8
Check that the `MDLocalVariable::getInlinedAt()` in a debug info
intrinsic's variable always matches the `MDLocation::getInlinedAt()` of
its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), since it's expensive and unnecessary, but I'll let
this verifier check bake for a while (a week maybe?) first. I've
updated the testcases that had the wrong value for `inlinedAt:`.
This checks that things are sane in the IR, but currently things go out
of whack in a few places in the backend. I'll follow shortly with
assertions in the backend (with code fixes).
If you have out-of-tree testcases that just started failing, here's how
I updated these ones:
1. The verifier check gives you the basic block, function, instruction,
and relevant metadata arguments (metadata numbering doesn't
necessarily match the source file, unfortunately).
2. Look at the `@llvm.dbg.*()` instruction, and compare the
`inlinedAt:` fields of the variable argument (second `metadata`
argument) and the `!dbg` attachment.
3. Figure out based on the variable `scope:` chain and the functions in
the file whether the variable has been inlined (and into what), so
you can determine which `inlinedAt:` is actually correct. In all of
the in-tree testcases, the `!MDLocation()` was correct and the
`!MDLocalVariable()` was wrong, but YMMV.
4. Duplicate the metadata that you're going to change, and add/drop the
`inlinedAt:` field from one of them. Be careful that the other
references to the same metadata node point at the correct one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234021 91177308-0d34-0410-b5e6-96231b3b80d8
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced. I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian. With this changed to correctly identify the target
endianness, the optimizations work as expected.
I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was
an orphaned method with no callers, so I've just removed it.
The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.
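As a standalone illustration of why the endianness flag matters (a toy C++ model, not LLVM's actual isConstantSplat implementation): the same vector elements combine into different wide splat constants depending on element order.
#include <cstdint>
#include <cstdio>

// Combine two adjacent 16-bit vector elements into one 32-bit value.
// Element 0 lands in the low half on little endian and the high half on
// big endian, so querying a splat with the wrong endianness yields the
// wrong splat constant.
uint32_t combineElems(uint16_t e0, uint16_t e1, bool bigEndian) {
  return bigEndian ? ((uint32_t)e0 << 16) | e1
                   : ((uint32_t)e1 << 16) | e0;
}

int main() {
  // A <4 x i16> constant <1, 0, 1, 0> viewed as a v2i32 splat:
  std::printf("LE splat value: %#010x\n", (unsigned)combineElems(1, 0, false)); // 0x00000001
  std::printf("BE splat value: %#010x\n", (unsigned)combineElems(1, 0, true));  // 0x00010000
  return 0;
}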
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234011 91177308-0d34-0410-b5e6-96231b3b80d8
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.
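A toy C++ model of the fold (hypothetical names; the real code operates on SDValues): a shuffle whose inputs are scalar-source nodes just re-selects the scalar operands, so the shuffle itself can disappear.
#include <array>
#include <cstdio>

// Model: shuffling two BUILD_VECTORs with a constant mask is the same as
// building one vector directly from the selected scalar operands.
std::array<int, 4> foldShuffleOfBuildVectors(const std::array<int, 4> &a,
                                             const std::array<int, 4> &b,
                                             const std::array<int, 4> &mask) {
  std::array<int, 4> result{};
  for (int i = 0; i < 4; ++i)
    result[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4]; // pick operand, no shuffle
  return result;
}

int main() {
  auto r = foldShuffleOfBuildVectors({0, 1, 2, 3}, {4, 5, 6, 7}, {0, 4, 1, 5});
  std::printf("%d %d %d %d\n", r[0], r[1], r[2], r[3]); // 0 4 1 5
  return 0;
}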
Differential Revision: http://reviews.llvm.org/D8516
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234004 91177308-0d34-0410-b5e6-96231b3b80d8
Register coalescing can change the target of a RegPair hint to a
physreg; we should not crash on this. This also slightly improves the
way ARMBaseRegisterInfo::updateRegAllocHint() works.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233987 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to use the same representation of llvm.eh.actions
in outlined handlers as we use in the parent function because i32's are
just constants that can be copied freely between functions.
I had to add a sentinel alloca to the list of child allocas so that we
don't try to sink the catch object into the handler. Normally, one would
use nullptr for this kind of thing, but TinyPtrVector doesn't support
null elements. More than that, its elements have to have a suitable
alignment. Therefore, I settled on this for my sentinel:
AllocaInst *getCatchObjectSentinel() {
  return static_cast<AllocaInst *>(nullptr) + 1;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233947 91177308-0d34-0410-b5e6-96231b3b80d8
Without this patch, we split the 256-bit vector into halves and produced something like:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
Now, we eliminate the xor and blend because those zeros are free with the vmovd:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233941 91177308-0d34-0410-b5e6-96231b3b80d8
For code like this:
define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}
We produce this AVX code:
_load_v8i32: ## @load_v8i32
movl $7, %eax
vmovd %eax, %xmm0
vxorps %ymm1, %ymm1, %ymm1
vblendps $1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
retq
There are at least 2 bugs in play here:
1. We're generating a blend when a scalar move does the same job using
2 fewer instruction bytes (see FIXMEs).
2. We're not matching an existing pattern that would eliminate the xor
and blend entirely; the zero bytes are free with vmovd.
The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.
[1] AddedComplexity has close to no documentation in the source.
The best we have is this comment: "roughly corresponds to the number of nodes that are covered".
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!).
I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ
Differential Revision: http://reviews.llvm.org/D8794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233931 91177308-0d34-0410-b5e6-96231b3b80d8
I'm playing with supporting custom stack map formats with statepoints. While
doing so, I noticed that the existing implementation didn't indicate inherently
unsized frames. This change essentially just ports the functionality that already
exists for the default StackMaps section to custom stackmaps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233891 91177308-0d34-0410-b5e6-96231b3b80d8
addl has higher throughput, and this was needlessly picking a suboptimal
encoding, causing PR23098.
I wish there were a way of doing this without further duplicating
tablegen-generated patterns, but so far I haven't found one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233832 91177308-0d34-0410-b5e6-96231b3b80d8
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.
Fixes PR22664.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233775 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us catch exceptions in simple cases.
N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious; we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
that we generate.
- IP-to-State tables are sensitive to the order of emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233767 91177308-0d34-0410-b5e6-96231b3b80d8
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.
Related to PR22754.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233755 91177308-0d34-0410-b5e6-96231b3b80d8
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.
rdar://problem/19267165
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233753 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:
1. If the preferred type is a floating-point value, do the mul-splat trick on
the corresponding integer type and then bitcast.
2. If the preferred type is a vector, do the mul-splat trick on one vector
element, and then build a vector out of them.
Fixes PR22754 (although, we should also turn off use of vector types at -O0).
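A toy C++ illustration of the mul-splat-then-bitcast idea from (1), outside the SelectionDAG (the helper name is made up):
#include <cstdint>
#include <cstdio>
#include <cstring>

// Replicate the fill byte into every byte of a 32-bit lane by multiplying
// with 0x01010101, then reinterpret the bits as the preferred FP type.
float splatByteAsFloat(uint8_t fill) {
  uint32_t lane = (uint32_t)fill * 0x01010101u; // mul-splat
  float f;
  std::memcpy(&f, &lane, sizeof f);             // the "bitcast"
  return f;
}

int main() {
  float f = splatByteAsFloat(0xAB);
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  std::printf("%#010x\n", (unsigned)bits); // 0xabababab
  return 0;
}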
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233749 91177308-0d34-0410-b5e6-96231b3b80d8
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)
It improves the v4i64 case although not optimally. This AVX codegen:
vmovq {{.*#+}} xmm0 = mem[0],zero
vxorpd %ymm1, %ymm1, %ymm1
vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
Becomes:
vmovsd {{.*#+}} xmm0 = mem[0],zero
Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:
We're not handling v32i8 / v16i16.
We're not getting the FP / int domains right for instruction selection.
But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64,
I'm submitting this patch ahead of fixing the above.
Differential Revision: http://reviews.llvm.org/D8341
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233704 91177308-0d34-0410-b5e6-96231b3b80d8
So far, we do not yet support any instruction specific to zEC12.
Most of the facilities added with zEC12 are indeed not very useful
to compiler code generation, but there is one exception: the
miscellaneous-extensions facility provides the RISBGN instruction,
which is a variant of RISBG that does not set the condition code.
Add support for this facility, MC support for RISBGN, and CodeGen
support for preferring RISBGN over RISBG on zEC12, unless we can
actually make use of the condition code set by RISBG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233690 91177308-0d34-0410-b5e6-96231b3b80d8
We already exploit a number of instructions specific to z196,
but not yet POPCNT. Add support for the population-count
facility, MC support for the POPCNT instruction, CodeGen
support for using POPCNT, and implement the getPopcntSupport
TargetTransformInfo hook.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233689 91177308-0d34-0410-b5e6-96231b3b80d8
This hooks up the TargetTransformInfo machinery for SystemZ,
and provides an implementation of getIntImmCost.
In addition, the patch adds the isLegalICmpImmediate and
isLegalAddImmediate TargetLowering overrides, and updates
a couple of test cases where we now generate slightly
better code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233688 91177308-0d34-0410-b5e6-96231b3b80d8
Factor out SplitVecOp_TRUNCATE's recursive splitting logic and use it more liberally.
SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT, FP_TO_SINT, SINT_TO_FP, UINT_TO_FP, and FTRUNC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233681 91177308-0d34-0410-b5e6-96231b3b80d8
We used to miss non-Q YMM integer vectors and non-Q/D XMM integer
vectors.
While there, change the v4i32 patterns to prefer MOVNTDQ.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233668 91177308-0d34-0410-b5e6-96231b3b80d8
We used to match the register variant before the immediate when the register
argument could be implicitly zero-extended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233653 91177308-0d34-0410-b5e6-96231b3b80d8
Generate tables in the .xdata section representing what actions to take
when an exception is thrown. This currently fills in state for
cleanups, catch handlers are still unfinished.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233636 91177308-0d34-0410-b5e6-96231b3b80d8
When we expand the RET_ReallyLR pseudo instruction we also need to transfer the
implicit operands.
The return register is an implicit operand and without it the liveness
calculation generates an incorrect live-out set for the patchpoint.
This fixes rdar://problem/19068476.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233635 91177308-0d34-0410-b5e6-96231b3b80d8
BPF has cpu_to_be and cpu_to_le instructions.
For now assume little endian and generate cpu_to_be for ISD::BSWAP.
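A minimal sketch of the reasoning, assuming a little-endian host (cpu_to_be64 here is a stand-in, not the kernel macro): converting to big-endian on a little-endian machine is exactly a byte swap, which is why ISD::BSWAP can lower to cpu_to_be.
#include <cstdint>
#include <cstdio>

// On a little-endian CPU, "convert to big-endian" is a full byte swap.
uint64_t cpu_to_be64(uint64_t x) {
  return __builtin_bswap64(x); // GCC/Clang builtin
}

int main() {
  std::printf("%016llx\n",
              (unsigned long long)cpu_to_be64(0x0102030405060708ULL));
  // prints 0807060504030201: the bytes reversed
  return 0;
}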
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233620 91177308-0d34-0410-b5e6-96231b3b80d8
Adds a test to verify the behavior that r233153 restored: 'optnone'
does not spuriously disable the DAG combiner, and in fact there are
cases where the DAG combiner must run (even at -O0 or 'optnone') in
order for codegen to succeed.
Differential Revision: http://reviews.llvm.org/D8614
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233584 91177308-0d34-0410-b5e6-96231b3b80d8
When a new SM architecture is introduced, it is only supported by the
current PTX version and later. Make sure we are using at least the
minimum PTX version for the target architecture.
This also removes support for PTX ISA < 3.2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233583 91177308-0d34-0410-b5e6-96231b3b80d8
Compiling the following function with -O0 would crash, since LLVM would
hit an assertion in getTestUnderMaskCond:
int test(unsigned long x)
{
  return x >= 0 && x <= 15;
}
Fixed by detecting the case in the caller of getTestUnderMaskCond.
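For context, a sketch of why this shape of code reaches the test-under-mask path at all: for an unsigned x, the x >= 0 half is vacuously true, so the whole range check reduces to a single mask test (toy C++ check below; the helper names are made up).
#include <cassert>
#include <cstdint>

bool inRange(uint64_t x) { return x <= 15; }                 // x >= 0 is always true
bool testUnderMask(uint64_t x) { return (x & ~15ULL) == 0; } // equivalent mask form

int main() {
  for (uint64_t x : {0ULL, 1ULL, 15ULL, 16ULL, ~0ULL})
    assert(inRange(x) == testUnderMask(x));
  return 0;
}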
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233541 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The 'R' constraint is actually supposed to be much more complicated than
this and is defined in terms of whether it will cause macro expansion in
the assembler. 'R' is getting less useful due to architecture changes and
ought to be replaced by other constraints. We therefore implement 9-bit
offsets which will work for all subtargets and all instructions.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8440
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233537 91177308-0d34-0410-b5e6-96231b3b80d8
They're harmless and it's easy to generate them from clang, leading to
a crash in LLVM. Found by afl-fuzz.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233500 91177308-0d34-0410-b5e6-96231b3b80d8
Fix testcases that don't pass the verifier after a WIP patch to check
`MDSubprogram` operands more effectively. I found the following issues:
- When `isDefinition: false`, the `variables:` field might point at
`!{i32 786468}`, or at a tuple that pointed at an empty tuple with
the comment "previously: invalid DW_TAG_base_type" (I vaguely recall
adding those comments during an upgrade script). In these cases, I
just dropped the array.
- The `variables:` field might point at something like `!{!{!8}}`,
where `!8` was an `MDLocation`. I removed the extra layer of
indirection.
- Invalid `type:` (not an `MDSubroutineType`).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233466 91177308-0d34-0410-b5e6-96231b3b80d8
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one (or off-by-ten).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233415 91177308-0d34-0410-b5e6-96231b3b80d8
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.
Explicit sret counterpart to r233409.
Differential Revision: http://reviews.llvm.org/D8510
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233410 91177308-0d34-0410-b5e6-96231b3b80d8
Expose bpf pseudo load instruction via intrinsic. It is used by front-ends that
can encode file descriptors directly into IR instead of relying on relocations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233396 91177308-0d34-0410-b5e6-96231b3b80d8
When a node is terminal, it is now pushed to the end of the list of
copies to coalesce instead of being completely ignored. In effect, this
lowers its priority relative to non-terminal nodes.
Because of that, we do not miss the rematerialization opportunities, nor the
copies that can be merged with more complex, than the terminal rule,
interference checks.
Related to PR22768.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233395 91177308-0d34-0410-b5e6-96231b3b80d8
Change `LLParser` to require a non-null `scope:` field for both
`MDLocation` and `MDLocalVariable`. There's no need to wait for the
verifier for this check. This also allows their `::getImpl()` methods
to assert that the incoming scope is non-null.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233394 91177308-0d34-0410-b5e6-96231b3b80d8
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233366 91177308-0d34-0410-b5e6-96231b3b80d8
This was discussed a while back, and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, it's time to actually make this mandatory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233357 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint. This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure. The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like. One major option is to give up on doing spilling in the backend and do it at the IR level instead. We'd give up the ability to have gc values in registers, but that's a minor cost in practice. We are not necessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infrastructure. Given that the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.
I'm holding off on the documentation for the new feature until I'm reasonably sure it is going to stick around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233356 91177308-0d34-0410-b5e6-96231b3b80d8
This test returns non-native integer types, which aren't supported on
all targets. The real issue with the SelectionDAG scheduler is with x86
EFLAGS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233355 91177308-0d34-0410-b5e6-96231b3b80d8
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233354 91177308-0d34-0410-b5e6-96231b3b80d8
It can happen (via the line `CurSU->isPending = true; // This SU is not
in AvailableQueue right now.`) that a SUnit is marked as available but
is not in the AvailableQueue. For a SUnit to be selected for
scheduling, both conditions must be met.
This patch mainly adds a defensive check against invalidly removing a
node from a queue. Sometimes nodes are marked isAvailable but are not
in the queue because they have been deferred due to some hazard.
Patch by Pawel Bylica!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233351 91177308-0d34-0410-b5e6-96231b3b80d8
Fix testcases whose variables are invalid. I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:
- `scope:` fields need to point at `MDLocalScope` and can't be null.
- `file:` fields need to point at `MDFile`.
- `inlinedAt:` fields need to point at `MDLocation`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233349 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The ARM backend can use a loop to implement copying byval parameters before
a call. In non-thumb2 mode it uses a constant pool load to materialize the
trip count. For targets that need movt instead (e.g. Native Client), use
the same code as in thumb2 mode to materialize the trip count.
Reviewers: jfb, t.p.northover
Differential Revision: http://reviews.llvm.org/D8442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233324 91177308-0d34-0410-b5e6-96231b3b80d8
those are in the same basic block.
The previous approach was the topological order of the basic block.
By default this rule is disabled.
Related to PR22768.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233241 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233224 91177308-0d34-0410-b5e6-96231b3b80d8
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233209 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds Hardware Transactional Memory (HTM) support as defined by ISA 2.07
(POWER8). The intrinsic support is based on the GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Functions' are implemented.
The HTM instructions follow the RC-form ones, and the transaction initiation result
is set in CR0 (with the exception of tcheck). The current approach is to create a
register copy from CR0 to a GPR and compare it. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates correct
code for both the test-and-branch and the plain return-value cases. A possible
future optimization would be to eliminate the MFCR instruction and branch directly.
HTM usage requires a recent kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is sent along with a clang patch that enables the builtins and the option
switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233204 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows AVX blend instructions to handle insertion into the low
element of a 256-bit vector for the appropriate data types.
For f32, instead of:
vblendps $1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
vblendps $15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]
we get:
vblendps $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]
For f64, instead of:
vmovsd %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1]
vblendpd $3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]
we get:
vblendpd $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]
For the hardware-neglected integer data types, I left a TODO comment in the
code and added regression tests for a follow-on patch.
Differential Revision: http://reviews.llvm.org/D8609
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233199 91177308-0d34-0410-b5e6-96231b3b80d8
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233198 91177308-0d34-0410-b5e6-96231b3b80d8
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233153 91177308-0d34-0410-b5e6-96231b3b80d8
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximation of that. It fails to handle cases
with interestingly large stack alignments, which are pretty uncommon on
Win64 and remain a TODO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233137 91177308-0d34-0410-b5e6-96231b3b80d8
V_FRACT is buggy on SI.
R600-specific code is left intact.
v2: drop the multiclass, use complex VOP3 patterns
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233075 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previous behaviour of 'R' and 'm' has been preserved for now. They will be
improved in subsequent commits.
The offset permitted by ZC varies according to the subtarget since it is
intended to match the restrictions of the pref, ll, and sc instructions.
The restrictions on these instructions are:
* For microMIPS: 12-bit signed offset.
* For Mips32r6/Mips64r6: 9-bit signed offset.
* Otherwise: 16-bit signed offset.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8414
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233063 91177308-0d34-0410-b5e6-96231b3b80d8
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
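The same distinction in plain C++, for reference: the conversion result depends on whether the source bits are treated as signed or unsigned.
#include <cstdint>
#include <cstdio>

int main() {
  int8_t  s = -1;   // bit pattern 0xFF
  uint8_t u = 0xFF; // same bit pattern
  std::printf("sitofp-like: %f\n", (double)s); // -1.000000
  std::printf("uitofp-like: %f\n", (double)u); // 255.000000
  return 0;
}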
Differential Revision: http://reviews.llvm.org/D8560
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233033 91177308-0d34-0410-b5e6-96231b3b80d8
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233030 91177308-0d34-0410-b5e6-96231b3b80d8
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve; on code that uses few
globals (e.g., the odd file- or function-static), it is more often than
not degraded by the optimization. A lengthy discussion can be found on
llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233024 91177308-0d34-0410-b5e6-96231b3b80d8
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k - 1)
when this makes k no longer an inline immediate value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232988 91177308-0d34-0410-b5e6-96231b3b80d8
This change is incorrect since it converts double rounding into single rounding,
which can produce different results. Instead this optimization will be done by
modifying Clang's codegen to not produce double rounding in the first place.
This reverts commit r232954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232962 91177308-0d34-0410-b5e6-96231b3b80d8
This function assumed that SMRD instructions always have immediate
offsets, which is not always the case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232957 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically, when the conversion is done in two steps, f16 -> f32 -> f64.
For example:
  %1 = tail call float @llvm.convert.from.fp16.f32(i16 %0)
  %conv = fpext float %1 to double
becomes:
  vcvtb.f64.f16
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232954 91177308-0d34-0410-b5e6-96231b3b80d8
Fixing sign extension in makeLibCall for MIPS64. In the MIPS64
architecture, all 32-bit arguments (int, unsigned int, float 32 (soft
float)) must be sign extended. This fixes the test
"MultiSource/Applications/oggenc/".
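A toy C++ illustration of the required extension (the function name is made up): even a value with the high bit set gets sign-extended, not zero-extended, when widened to 64 bits for the MIPS64 calling convention.
#include <cstdint>
#include <cstdio>

// MIPS64 passes 32-bit integer arguments sign-extended in 64-bit registers.
uint64_t extendArgForMips64(uint32_t arg) {
  return (uint64_t)(int64_t)(int32_t)arg; // sign-extend, not zero-extend
}

int main() {
  std::printf("%016llx\n",
              (unsigned long long)extendArgForMips64(0x80000000u));
  // ffffffff80000000: the high 32 bits replicate bit 31
  return 0;
}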
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232943 91177308-0d34-0410-b5e6-96231b3b80d8
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232935 91177308-0d34-0410-b5e6-96231b3b80d8
As part of this, add a test that shows we can generate code for
functions that specifically enable a subtarget feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232884 91177308-0d34-0410-b5e6-96231b3b80d8
As part of this, add a test that shows we can generate code for
functions that differ by subtarget feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232882 91177308-0d34-0410-b5e6-96231b3b80d8
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232879 91177308-0d34-0410-b5e6-96231b3b80d8
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can fall
through (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232872 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch, for this one exact case, we'll generate:
blendps %xmm0, %xmm1, $1
instead of:
insertps %xmm0, %xmm1, $0
If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.
The detailed performance data motivation for this may be found in D7866;
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.
Differential Revision: http://reviews.llvm.org/D8332
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232850 91177308-0d34-0410-b5e6-96231b3b80d8
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232842 91177308-0d34-0410-b5e6-96231b3b80d8
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232825 91177308-0d34-0410-b5e6-96231b3b80d8
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is no longer done for short optional branches. A
short optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3).
Everything is still guarded under -outline-optional-branches.
Outlining a short branch can't significantly improve code locality. It
can, however, decrease performance because of the additional jmp,
especially in cases where the optional branch is hot. This fixes a
compile time regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232802 91177308-0d34-0410-b5e6-96231b3b80d8
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232780 91177308-0d34-0410-b5e6-96231b3b80d8
This switches the sense of the i32 values and updates the test cases.
We can also use CHECK-SAME to clean up some tests, and reduce the visual
noise from bitcasts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232774 91177308-0d34-0410-b5e6-96231b3b80d8
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.
Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.
Differential Revision: http://reviews.llvm.org/D8366
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232773 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect
and that triggers an assertion in nvvm_reflect optimization pass. This
change allows nvvm_reflect pass to deal with both old and new ways to
pass an argument to __nvvm_reflect.
Test Plan: ninja check-all
Reviewers: eliben, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232732 91177308-0d34-0410-b5e6-96231b3b80d8
This should bring the windows bots back.
It is a bit ugly, but it is better than what we had before: The triple would
say that the object format was COFF, but llc/llvm-mc would produce an ELF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232683 91177308-0d34-0410-b5e6-96231b3b80d8
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.
This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).
Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.
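A sketch of the technique with SSE2 intrinsics (an illustrative reimplementation, not the patch's DAG code): shift the whole vector once per lane's amount, then blend the two partial results, i.e., 2 shifts + 2 shuffles.
#include <emmintrin.h>
#include <cstdint>
#include <cstdio>

// v2i64 << v2i64 using only SSE2: each _mm_sll_epi64 shifts both lanes by
// one lane's amount; a final blend keeps lane 0 of one result and lane 1
// of the other.
__m128i shl_v2i64(__m128i v, __m128i amt) {
  __m128i lo = _mm_sll_epi64(v, amt);                          // both lanes << amt[0]
  __m128i hi = _mm_sll_epi64(v, _mm_unpackhi_epi64(amt, amt)); // both lanes << amt[1]
  // keep lane 0 of lo and lane 1 of hi
  return _mm_castpd_si128(
      _mm_move_sd(_mm_castsi128_pd(hi), _mm_castsi128_pd(lo)));
}

int main() {
  __m128i v = _mm_set_epi64x(3, 5); // lanes: [5, 3]
  __m128i s = _mm_set_epi64x(4, 1); // shift amounts: [1, 4]
  int64_t out[2];
  _mm_storeu_si128((__m128i *)out, shl_v2i64(v, s));
  std::printf("%lld %lld\n", (long long)out[0], (long long)out[1]); // 10 48
  return 0;
}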
Differential Revision: http://reviews.llvm.org/D8416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232660 91177308-0d34-0410-b5e6-96231b3b80d8
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.
This fixes problems when coalescing towards a subclass with additional
subregisters available.
The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232652 91177308-0d34-0410-b5e6-96231b3b80d8
The checks here were so vague that we could nuke intrinsics
from existence and still pass the test because we'd match
the function name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232647 91177308-0d34-0410-b5e6-96231b3b80d8
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half,
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.
The lax checking in this file will be addressed in
another commit. There are bigger problems here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232644 91177308-0d34-0410-b5e6-96231b3b80d8
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232629 91177308-0d34-0410-b5e6-96231b3b80d8
Memcpy and other memory intrinsics typically try to use LDM/STM if the
source and target addresses are 4-byte aligned. In CodeGenPrepare, look
for calls to memory intrinsics and, if the object is on the stack,
4-byte align it if it's large enough that we expect memcpy would want
to use LDM/STM to copy it.
Differential Revision: http://reviews.llvm.org/D7908
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232627 91177308-0d34-0410-b5e6-96231b3b80d8
These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous. From a private patch equivalent to r189742/PR17028.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232580 91177308-0d34-0410-b5e6-96231b3b80d8
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x). This saves an instruction on
architectures like X86 and POWER(64).
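The identity in plain C++ (a quick check, using the standard rotate-left idiom since this predates std::rotl):
#include <cassert>
#include <cstdint>

// XOR (SHL 1, x), -1  ==  ROTL ~1, x: shifting the single 1-bit and then
// inverting is the same as rotating the single 0-bit of ~1 into position.
uint32_t viaXorShl(uint32_t x) { return (1u << x) ^ ~0u; }
uint32_t viaRotl(uint32_t x) {
  const uint32_t v = ~1u;                   // all ones except bit 0
  return (v << x) | (v >> ((32 - x) & 31)); // rotate-left idiom, safe at x == 0
}

int main() {
  for (uint32_t x = 0; x < 32; ++x)
    assert(viaXorShl(x) == viaRotl(x));
  return 0;
}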
Differential Revision: http://reviews.llvm.org/D8350
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232572 91177308-0d34-0410-b5e6-96231b3b80d8
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Differential Revision: http://reviews.llvm.org/D8394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232570 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8369
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232562 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8374
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232539 91177308-0d34-0410-b5e6-96231b3b80d8
The input offset to needsFrameBaseReg is a negative value below the top of the
stack frame, but when converting to a positive offset from the bottom of the
stack frame this value was negated, causing the final offset to be too large
by twice the input offset's magnitude. Fix that by not negating the offset.
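A toy numeric illustration of the double-counting (made-up numbers): the input offset is negative, so converting to an offset from the bottom of the frame should add it to the frame size; negating it instead overshoots by twice its magnitude.
#include <cstdio>

int main() {
  int frameSize = 64;
  int inputOffset = -8;                // negative: below the top of the frame
  int buggy = frameSize - inputOffset; // 72: too large by 2 * |inputOffset|
  int fixed = frameSize + inputOffset; // 56: the intended positive offset
  std::printf("buggy=%d fixed=%d\n", buggy, fixed);
  return 0;
}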
Patch by John Brawn
Differential Revision: http://reviews.llvm.org/D8316
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232513 91177308-0d34-0410-b5e6-96231b3b80d8
The VSX stores are sometimes generated with an undefined index register,
causing %noreg to be used and R0 to be emitted later on. The semantics of
the VSX store (e.g. stxsdx) require R0 to be used as the base register if
we want zero to be used in the computation of the effective address
instead of the contents of R0. This patch checks whether no index register
was generated and, if so, forces R0 to be used as the base address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232486 91177308-0d34-0410-b5e6-96231b3b80d8
ARMv6K is another layer between ARMV6 and ARMV6T2. This is the LLVM
side of the changes.
The ARMV6 family as implemented in LLVM:
+-------------------------------------+
|                ARMV6                |
+----------------+--------------------+
| ARMV6M (thumb) | ARMV6K (arm,thumb) |
+----------------+--------------------+
|      ARMV6T2 (arm,thumb,thumb2)     |
+-------------------------------------+
|       ARMV7 (arm,thumb,thumb2)      |
+-------------------------------------+
ARMV6K and ARMV6M processors have support for hint instructions
(SEV/WFE/WFI/NOP/YIELD). These can be either real or default to NOP;
the two processors also use different encodings for them.
Patch by Vinicius Tinti.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232468 91177308-0d34-0410-b5e6-96231b3b80d8
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality, e.g.,
  (v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
                         (v2i16 (truncate (v2i64)))))
->
  (v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
                                          (v2i32 (truncate (v2i64)))))))
This isn't really target-specific, and, as such, would best go in the
DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232459 91177308-0d34-0410-b5e6-96231b3b80d8
We removed @llvm.eh.typeid.for.i32 and replaced it with
@llvm.eh.typeid.for quite some time ago. Fix up some test cases which
never got updated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232421 91177308-0d34-0410-b5e6-96231b3b80d8
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:
- Empty `filename:` fields in `MDFile`s. Compile units and some types
require non-empty filenames. A number of testcases have empty
filenames, probably due to hand-reduction of testcases.
- Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in
the debug info schema to `!{}`. They cause problems for
`!MDSubroutineType`'s `types:` array, since it requires all operands
to be valid types. (Note that `!{null}` is the correct type array
for functions that take no arguments and return `void`.)
- Significantly bitrotted testcases. Nodes got left behind a few
upgrades ago because of missing or invalid tags.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232415 91177308-0d34-0410-b5e6-96231b3b80d8
There are no opcodes for this. This also adds a test case.
v2: make test more robust
Patch by: Grigori Goronzy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232386 91177308-0d34-0410-b5e6-96231b3b80d8
Fix justify error for small structures bigger than 32 bits in fixed
arguments for MIPS64 big endian. There was a problem when small structures
are passed as fixed arguments. The structures that are bigger than 32 bits
but smaller than 64 bits were not left justified properly on MIPS64 big
endian. This is fixed by shifting the value to make it left justified when
appropriate.
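A toy C++ illustration of the left justification (function name made up): on big-endian MIPS64, a sub-register-sized aggregate must sit in the high-order bits of its 64-bit register.
#include <cstdint>
#include <cstdio>

// A 48-bit struct in a 64-bit register must be shifted left by
// 64 - 48 = 16 bits so its data occupies the high-order bytes.
uint64_t leftJustify(uint64_t raw, unsigned structBits) {
  return raw << (64 - structBits);
}

int main() {
  uint64_t raw = 0x0000112233445566ULL; // 48 significant bits
  std::printf("%016llx\n", (unsigned long long)leftJustify(raw, 48));
  // 1122334455660000: data now left-justified in the register
  return 0;
}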
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D8174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232382 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically, if there are copy-like instructions in the loop header
they are moved into the loop close to their uses. This reduces the live
intervals of the values and can avoid register spills.
This is working towards a fix for http://llvm.org/PR22230.
Review: http://reviews.llvm.org/D7259
Next steps:
- Find a better cost model (which non-copy instructions should be sunk?)
- Make this dependent on register pressure
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232262 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes pr22854.
The core issue in the bug is that there are multiple instructions that
print the same in assembly. In fact, there doesn't seem to be any
syntax for specifying that a constant that fits in 8 bits should use a
32-bit immediate.
The attached patch changes fast isel to consider i16immSExt8,
i32immSExt8, and i64immSExt8. They were disabled because fast isel
didn't know to call the predicate back in the day.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232223 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to the gep (r230786) and load (r230794) changes.
A similar migration script can be used to update test cases; it
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manual changes in Clang.
(This script reads the contents of stdin and massages it into stdout;
wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases.)
import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
    line = match.group(1)
    line += match.group(4)
    line += ", "
    line += match.group(2)
    return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
    sys.stdout.write(line[off:match.start()])
    sys.stdout.write(conv(match))
    off = match.end()
sys.stdout.write(line[off:])
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232184 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a bug in the shuffle lowering logic implemented by function
'lowerV2X128VectorShuffle'.
There are a few cases where function 'lowerV2X128VectorShuffle' wrongly expands a
shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR
nodes. The problematic expansion only occurs when the shuffle mask M has an
'undef' element at position 2, and M is equivalent to mask <0,1,4,5>.
In that case, the algorithm propagates the wrong vector to one of the two
new EXTRACT_SUBVECTOR nodes.
Example:
;;
define <4 x double> @test(<4 x double> %A, <4 x double> %B) {
entry:
  %0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32> <i32 undef, i32 1, i32 undef, i32 5>
  ret <4 x double> %0
}
;;
Before this patch, llc (-mattr=+avx) generated:
vinsertf128 $1, %xmm0, %ymm0, %ymm0
With this patch, llc correctly generates:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
Added test lower-vec-shuffle-bug.ll
Differential Revision: http://reviews.llvm.org/D8259
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232179 91177308-0d34-0410-b5e6-96231b3b80d8
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.
This should complete the removal of insert/extract128 intrinsics.
The Clang precursor patch for this change was checked in at r232109.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232120 91177308-0d34-0410-b5e6-96231b3b80d8
Instead print them as part of the $dst operand. The AsmMatcher
requires the 32-bit and 64-bit encodings have the same mnemonic in
order to parse them correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232105 91177308-0d34-0410-b5e6-96231b3b80d8
The permps and permd instructions have their operands swapped compared to the
intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP
category.
I did not create a new category for those two, as AFAICT they are the
only ones in that case.
<rdar://problem/20108262>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232085 91177308-0d34-0410-b5e6-96231b3b80d8
Part of the folding logic implemented by function 'PerformISDSETCCCombine'
only worked under the assumption that the condition code in input could have
been either SETNE or SETEQ.
Unfortunately that assumption was incorrect, and in some cases the algorithm
ended up incorrectly folding SETCC nodes.
The incorrect folding only affected SETCC dag nodes where:
- one of the operands was a build_vector of all zeroes;
- the other operand was a SIGN_EXTEND from a vector of MVT::i1 elements;
- the condition code was neither SETNE nor SETEQ.
Example:
(setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge)
Before this patch, the entire dag node sequence from the example was
incorrectly folded to node %A.
With this patch, the dag node sequence is folded to a
(xor %A, (v4i1 VectorOfAllOnes)).
Added test setcc-combine.ll.
Thanks to Greg Bedwell for spotting this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232046 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we've replaced the vinsertf128 intrinsics,
do the same for their extract twins.
This is very much like D8086 (checked in at r231794):
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is also the LLVM sibling to the cfe D8275 patch.
Differential Revision: http://reviews.llvm.org/D8276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232045 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, run both EH preparation passes, and have them both ignore
functions with unrecognized EH personalities. Pass delegation involved
some hacky code for creating an AnalysisResolver that we don't need now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231995 91177308-0d34-0410-b5e6-96231b3b80d8
CodeGen incorrectly ignored (hitting an assert in APInt) a constant
index bigger than 2^64 in a getelementptr instruction. This adds a test
and a fix for that.
Patch by Paweł Bylica!
Reviewed By: rnk
Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits
Differential Revision: http://reviews.llvm.org/D8219
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231984 91177308-0d34-0410-b5e6-96231b3b80d8
If a function is placed in a unique section (because of
-ffunction-sections, for example), putting a jump table in .rodata will
keep .rodata alive, and that will keep alive any other function that
also has a jump table.
Instead, put the jump table in a unique section that is associated with the
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231961 91177308-0d34-0410-b5e6-96231b3b80d8
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231959 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The generic ELF TargetObjectFile defaults to .ctors, but Linux's
defaults to .init_array by calling InitializeELF with the value of
UseInitArray from TargetMachine. Make NaCl's behavior match.
Reviewers: jvoung
Differential Revision: http://reviews.llvm.org/D8240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231934 91177308-0d34-0410-b5e6-96231b3b80d8
- Use TargetLowering to check for the actual cost of each extension.
- Provide a factorized method to check for the cost of an extension:
TargetLowering::isExtFree.
- Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able
to tune the cost of non-free extensions.
This refactoring offers a better granularity to model what really happens on
different targets.
No performance changes and very few code differences.
Part of <rdar://problem/19267165>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231855 91177308-0d34-0410-b5e6-96231b3b80d8
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
  (v4i32 (uaddv ...))
is the same as
  (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
  (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
                        (i32 (int_aarch64_neon_uaddv ...)), ssub))
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
  return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231840 91177308-0d34-0410-b5e6-96231b3b80d8
Also, it extracts a getCopyFromRegs helper function in SelectionDAGBuilder, as we need to be able to customize the type of the register exported from the basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce the buildbot failure.)
Differential Revision: http://reviews.llvm.org/D7760
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231800 91177308-0d34-0410-b5e6-96231b3b80d8
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
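For illustration, the generic form we prefer is a plain shufflevector (made-up example):
define <4 x i32> @concat_rotate(<4 x i32> %a, <4 x i32> %b) {
  ; mask index 4 refers to element 0 of %b, giving a palignr-style
  ; rotation across the concatenation of %a and %b
  %r = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
  ret <4 x i32> %r
}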
This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088
Differential Revision: http://reviews.llvm.org/D8086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231794 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a subtle issue that was introduced in r205153.
When reusing a store for the extractelement expansion (to load directly
from it, instead of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.
To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.
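The shape of the hazard, sketched in IR (made-up example; the actual reuse happens at the DAG level):
define i64 @f(<2 x i64> %v, <2 x i64> %w, <2 x i64>* %p) {
  store <2 x i64> %v, <2 x i64>* %p
  %e = extractelement <2 x i64> %v, i32 1 ; expansion may reload this from %p
  store <2 x i64> %w, <2 x i64>* %p       ; must stay ordered after that reload
  ret i64 %e
}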
rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231721 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.
This reverts commit r229952.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231719 91177308-0d34-0410-b5e6-96231b3b80d8
In the case where only jump tables are part of the function section, this
produces more readable assembly by avoiding a switch to the EH section and
back to .text.
This would also break with non-unique section names, as trying to switch to
a unique section actually creates a new one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231677 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Code is mostly copied from AArch64 port and modified where needed for Mips.
This handles the non-legal cases of logical ops; the legal cases are handled by tablegen patterns.
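For illustration, a logical op on a non-legal type of the sort now handled (made-up example):
define i16 @f(i16 %a, i16 %b) {
  %r = xor i16 %a, %b ; i16 is not a legal type on Mips, so fast-isel must handle it
  ret i16 %r
}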
Test Plan:
A 'make check' test, logopm.ll.
All of test-suite passes at O0/O2 on mips32 r1/r2 with this change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: echristo, llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D6599
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231665 91177308-0d34-0410-b5e6-96231b3b80d8
Also, replaced the 'target triple' line with an -mtriple flag on the RUN line.
Removed the data layout string, as it is not needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231654 91177308-0d34-0410-b5e6-96231b3b80d8
There were cases where the backend computed a wrong permute mask for a VPERM2X128 node.
Example:
define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
  %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
  ret <8 x float> %shuffle
}
Before this patch, llc (with -mattr=+avx) emitted the following vperm2f128:
vperm2f128 $0, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[0,1,0,1]
With this patch, llc emits a vperm2f128 with a correct permute mask:
vperm2f128 $17, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[2,3,2,3]
Differential Revision: http://reviews.llvm.org/D8119
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231601 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the logic in the DAGCombiner that folds an AND node according
to the rule: (and (X (load V)), C) -> (X (load V))
An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' first computes the splat value 'S'
from C, and then checks whether S has the lower 'B' bits set (where B is the
size in bits of the vector element type). The algorithm also takes into
account the 'undef' bits in the splat mask.
Unfortunately, the algorithm only worked under the assumption that the size of
S is a multiple of the size of the vector element type. With this patch, we
conservatively avoid folding the AND if the splat bits are not compatible with
the vector element type.
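For illustration, an AND that is provably redundant and safe to fold (made-up example):
define <4 x i16> @f(<4 x i16>* %p) {
  %v = load <4 x i16>, <4 x i16>* %p
  ; every element of the mask is all-ones, so the AND is a no-op on each lane
  %r = and <4 x i16> %v, <i16 -1, i16 -1, i16 -1, i16 -1>
  ret <4 x i16> %r
}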
Added X86 test and-load-fold.ll
Differential Revision: http://reviews.llvm.org/D8085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231563 91177308-0d34-0410-b5e6-96231b3b80d8
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.
This prevents many cases of transferring scalar data between the GPR and SIMD registers.
At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).
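For illustration, the kind of pattern targeted (made-up example):
define <4 x float> @f(<4 x float> %a) {
  %e = extractelement <4 x float> %a, i32 2
  ; the SCALAR_TO_VECTOR of %e can instead become a shuffle of %a
  %v = insertelement <4 x float> undef, float %e, i32 0
  ret <4 x float> %v
}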
Differential Revision: http://reviews.llvm.org/D8132
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231554 91177308-0d34-0410-b5e6-96231b3b80d8
to disable lane switching if we don't actually have the instruction
set we want to switch to. Models the earlier check above the
conditional for the pass.
The testcase is one that triggered the assert that's added as part of the
fix; use it rather than adding a new testcase, as it highlights the same
problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231539 91177308-0d34-0410-b5e6-96231b3b80d8
Teach the load/store optimizer how to sign-extend the result of a load pair
when doing so helps create more pairs.
The rationale is that loads are more expensive than sign extensions, so
combining several into one instruction is a win.
<rdar://problem/20072968>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231527 91177308-0d34-0410-b5e6-96231b3b80d8
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))
Many targets cannot perform and/or on the CPU flags, and for them the
right-hand form should be chosen to avoid materializing the i1 flags in an
integer register. If the target can perform this operation efficiently,
we normalize to the left-hand form.
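For illustration, the two forms written out in IR (made-up example):
define i32 @left(i1 %c0, i1 %c1, i32 %x, i32 %y) {
  %c = and i1 %c0, %c1               ; materializes the i1 AND
  %r = select i1 %c, i32 %x, i32 %y
  ret i32 %r
}
define i32 @right(i1 %c0, i1 %c1, i32 %x, i32 %y) {
  %inner = select i1 %c1, i32 %x, i32 %y
  %r = select i1 %c0, i32 %inner, i32 %y   ; no i1 arithmetic needed
  ret i32 %r
}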
Differential Revision: http://reviews.llvm.org/D7622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231507 91177308-0d34-0410-b5e6-96231b3b80d8
Though such shifts are usually optimized away by the combiner, we can still
encounter them after a vector shift is legalized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231443 91177308-0d34-0410-b5e6-96231b3b80d8
This patch reduces code size for all AVX targets and increases speed for some chips.
SSE4.1 introduced the useless (see code comments) 2-register form of BLENDV,
and only in the packed float/double flavors.
AVX subsequently made the instruction useful by adding a 4-register operand form.
So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.
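For illustration, a scalar select of the kind that can now use blendv (made-up example):
define double @f(double %a, double %b, double %x, double %y) {
  %c = fcmp olt double %a, %b
  %r = select i1 %c, double %x, double %y
  ret double %r
}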
This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter
No new test cases with this patch because:
1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.
http://llvm.org/bugs/show_bug.cgi?id=22483
Differential Revision: http://reviews.llvm.org/D8063
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231408 91177308-0d34-0410-b5e6-96231b3b80d8
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the integration
tests it fires only in a big-endian testcase, where it removes a
pointless AND.
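For illustration, a legal-typed load + extension pair that can become a single extending vector load (made-up example):
define <4 x i32> @f(<4 x i16>* %p) {
  %v = load <4 x i16>, <4 x i16>* %p
  %w = zext <4 x i16> %v to <4 x i32>
  ret <4 x i32> %w
}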
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231396 91177308-0d34-0410-b5e6-96231b3b80d8
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced between shuffle nodes (e.g. during x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
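For illustration, the kind of chain this now handles (made-up example):
define <4 x i32> @f(<4 x i32> %a) {
  %s1 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %b = bitcast <4 x i32> %s1 to <2 x i64>
  %s2 = shufflevector <2 x i64> %b, <2 x i64> undef, <2 x i32> <i32 1, i32 0>
  %r = bitcast <2 x i64> %s2 to <4 x i32>
  ret <4 x i32> %r
}
The two shuffles merge through the bitcasts into a single shuffle on the smaller element type.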
Dropped the old ShuffleToZext test: this patch removes the use of the zext, and vector-zext.ll covers those cases anyhow.
Differential Revision: http://reviews.llvm.org/D7939
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231380 91177308-0d34-0410-b5e6-96231b3b80d8
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors;
this is needed to pass all masked_memop.ll tests for SKX.
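For illustration, a concat of two mask vectors of the sort this lowers (made-up example):
define <16 x i1> @f(<8 x i1> %a, <8 x i1> %b) {
  ; selecting all of %a then all of %b is a CONCAT_VECTORS at the DAG level
  %c = shufflevector <8 x i1> %a, <8 x i1> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  ret <16 x i1> %c
}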
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231371 91177308-0d34-0410-b5e6-96231b3b80d8
Also extracts a getCopyFromRegs helper function in SelectionDAGBuilder, as we need to be able to customize the type of the register exported from a basic block when lowering gc.result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231366 91177308-0d34-0410-b5e6-96231b3b80d8
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231332 91177308-0d34-0410-b5e6-96231b3b80d8
Improve test robustness in preparation of coming commits:
- Avoid undefs, which may get propagated too much.
- Remove several pointless 'add 0' instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231307 91177308-0d34-0410-b5e6-96231b3b80d8