This patch corresponds to review:
http://reviews.llvm.org/D9941
It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238578 91177308-0d34-0410-b5e6-96231b3b80d8
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238144 91177308-0d34-0410-b5e6-96231b3b80d8
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238097 91177308-0d34-0410-b5e6-96231b3b80d8
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8. Thus LLVM crashes with an "unable to select" message. We
observed this since one of our buildbots is configured to generate
code for a POWER7.
This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.
I've added a test case variant for the vpkudum pattern when the
instruction isn't available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237952 91177308-0d34-0410-b5e6-96231b3b80d8
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237937 91177308-0d34-0410-b5e6-96231b3b80d8
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...
When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:
.Ltmp526:
addis 3, 2, .LC12@toc@ha
.Ltmp1628:
std 2, 40(1)
ld 5, 0(27)
ld 2, 8(27)
ld 11, 16(27)
ld 3, .LC12@toc@l(3)
rldicl 4, 4, 0, 32
mtctr 5
bctrl
ld 2, 40(1)
And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
nop
std r2,40(r1)
ld r5,0(r27)
ld r2,8(r27)
ld r11,16(r27)
ld r3,-32472(r2)
clrldi r4,r4,32
mtctr r5
bctrl
ld r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.
This is done as a separate pass because:
1. These extra r2 dependencies are not really properties of the
instructions, but rather due to a linker bug, and maybe one day we'll be
able to get rid of them when targeting linkers without this bug (and,
thus, keeping the logic centralized here will make that
straightforward).
2. There are ISel-level peephole optimizations that propagate the @toc@l
relocations to some user instructions, and so the exta dependencies do
not apply only to a fixed set of instructions (without undesirable
definition replication).
The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237556 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for the following new instructions in the
Power ISA 2.07:
vpksdss
vpksdus
vpkudus
vpkudum
vupkhsw
vupklsw
These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces. These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.
The first three instructions perform saturating pack operations. The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated. The other
instructions are only generated via built-in support for now.
Appropriate tests have been added.
There is a companion patch to clang for the rest of this support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237499 91177308-0d34-0410-b5e6-96231b3b80d8
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.
Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.
2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236850 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D9440
It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236755 91177308-0d34-0410-b5e6-96231b3b80d8
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive. For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations. This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.
Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat. When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector. We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.
A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.
An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization. From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236606 91177308-0d34-0410-b5e6-96231b3b80d8
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236120 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235910 91177308-0d34-0410-b5e6-96231b3b80d8
So long as the choice between printing msync and sync is not ambiguous, we can
print 'sync 0' and just 'sync'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235663 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.
Thus, after some hours of updating test cases...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235616 91177308-0d34-0410-b5e6-96231b3b80d8
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235608 91177308-0d34-0410-b5e6-96231b3b80d8
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235560 91177308-0d34-0410-b5e6-96231b3b80d8
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235497 91177308-0d34-0410-b5e6-96231b3b80d8
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:
({ \
register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \
register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \
register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \
register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \
__asm__ __volatile__ \
("sc" \
: "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \
: "0"(r0), "1"(r3), "2"(r4), "3"(r5) \
: "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \
r3; \
})
Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).
Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235283 91177308-0d34-0410-b5e6-96231b3b80d8
See r230786 and r230794 for similar changes to gep and load
respectively.
Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.
When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.
This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.
This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).
No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.
This leaves /only/ the varargs case where the explicit type is required.
Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.
About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.
import fileinput
import sys
import re
pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")
def conv(match, line):
if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
return line
return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]
for line in sys.stdin:
sys.stdout.write(conv(re.search(pat, line), line))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235145 91177308-0d34-0410-b5e6-96231b3b80d8
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tre, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.
By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.
This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.
This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.
For example, this will enable us to balance the tree based on profile info
in the future.
The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst-case. This
doesn't seem to be major problem in practice, e.g. compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
does turn out to be a problem, we could limit the search space of the algorithm.
This commit also disables all optimizations during switch lowering in -O0.
Differential Revision: http://reviews.llvm.org/D8649
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235101 91177308-0d34-0410-b5e6-96231b3b80d8
Many of these predate llvm-readobj. With elf-dump we had to match
a relocation to symbol number and symbol number to symbol name or
section number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235015 91177308-0d34-0410-b5e6-96231b3b80d8
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234705 91177308-0d34-0410-b5e6-96231b3b80d8
As it turns out, even though these are part of ISA 2.06, the P7 does not
support them (or, at least, not any P7s we're tested so far).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234686 91177308-0d34-0410-b5e6-96231b3b80d8
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234670 91177308-0d34-0410-b5e6-96231b3b80d8
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234603 91177308-0d34-0410-b5e6-96231b3b80d8
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Fixes PR23173.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234561 91177308-0d34-0410-b5e6-96231b3b80d8
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced. I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian. With this changed to correctly identify the target
endianness, the optimizations work as expected.
I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was
an orphaned method with no callers, so I've just removed it.
The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234011 91177308-0d34-0410-b5e6-96231b3b80d8
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.
Differential Revision: http://reviews.llvm.org/D8516
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234004 91177308-0d34-0410-b5e6-96231b3b80d8
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.
Fixes PR22664.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233775 91177308-0d34-0410-b5e6-96231b3b80d8
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.
Related to PR22754.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233755 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:
1. If the preferred type is a floating-point value, do the mul-splat trick on
the corresponding integer type and then bitcast.
2. If the preferred type is a vector, do the mul-splat trick on one vector
element, and then build a vector out of them.
Fixes PR22754 (although, we should also turn off use of vector types at -O0).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233749 91177308-0d34-0410-b5e6-96231b3b80d8
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one or (off-by-ten).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233415 91177308-0d34-0410-b5e6-96231b3b80d8
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233366 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.
The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.
The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is send along a clang patch to enabled the builtins and option switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233204 91177308-0d34-0410-b5e6-96231b3b80d8
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232935 91177308-0d34-0410-b5e6-96231b3b80d8
of this add a test that shows we can generate code with
for functions that differ by subtarget feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232882 91177308-0d34-0410-b5e6-96231b3b80d8
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232780 91177308-0d34-0410-b5e6-96231b3b80d8
The VSX stores are sometimes generated with a undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) requires R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232486 91177308-0d34-0410-b5e6-96231b3b80d8