This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.
In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations.
The optimal balanced binary tree would reduce the chain to log2(N).
The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.
We can extend this patch to cover other associative operations such as fmul, fmax, fmin,
integer add, integer mul.
This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305
and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768https://llvm.org/bugs/show_bug.cgi?id=23116
The issue also came up in:
http://reviews.llvm.org/D8941
Differential Revision: http://reviews.llvm.org/D9232
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236031 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't seem to need to assert here, since this function's callers expect
to get a nullptr on error. This way we don't assert on user input.
Bug found with AFL fuzz.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9308
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236027 91177308-0d34-0410-b5e6-96231b3b80d8
We need to track if an AddrSpaceCast expression was seen when
generating an MCExpr for a ConstantExpr. This change introduces a
custom lowerConstant method to the NVPTX asm printer that will create
NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode
the information that a given symbol needs to be casted to a generic
address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236000 91177308-0d34-0410-b5e6-96231b3b80d8
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235989 91177308-0d34-0410-b5e6-96231b3b80d8
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235977 91177308-0d34-0410-b5e6-96231b3b80d8
As a space optimization, this instruction would just encode the pointer
type of the first operand and use the knowledge that the second and
third operands would be of the pointee type of the first. When typed
pointers go away, this assumption will no longer be available - so
encode the type of the second operand explicitly and rely on that for
the third.
Test case added to demonstrate the backwards compatibility concern,
which only comes up when the definition of the second operand comes
after the use (hence the weird basic block sequence) - at which point
the type needs to be explicitly encoded in the bitcode and the record
length changes to accommodate this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235966 91177308-0d34-0410-b5e6-96231b3b80d8
This matches other assemblers and is less unexpected (e.g. PR23227).
On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both
agree on LShr. On COFF, I couldn't get my hands on an assembler yet,
so don't change the behavior. For now, don't change it on non-AArch64
Darwin either, as the other assembler is gas v1.38, which does an AShr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235963 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the code would try to put a fall-through case last,
even if that meant moving a case with much higher branch weight
further down the chain.
Ordering by branch weight is most important, putting a fall-through
block last is secondary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235942 91177308-0d34-0410-b5e6-96231b3b80d8
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.
This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.
Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235922 91177308-0d34-0410-b5e6-96231b3b80d8
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235917 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235910 91177308-0d34-0410-b5e6-96231b3b80d8
Use a loop instruction with a constant extender for a hardware
loop instruction that is too far away from the start of the loop.
This is cheaper than changing the SA register value.
Differential Revision: http://reviews.llvm.org/D9262
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235882 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Changed the warning message to show the current value of $at, similar to what clang does for typedef's, and renamed warnIfAssemblerTemporary to a more descriptive name.
I also changed the type of variables which store registers from int to unsigned, updated the relevant test and tried to make the related comments clearer.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8479
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235881 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r235194, which was reverted in r235495 because it was causing a
failure in our out-of-tree buildbots for MIPS. With the sign-extension patch
in r235718, this patch doesn't cause any problem any more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235878 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When used, it is substituted with the number of .macro instantiations we've done up to that point in time.
So if this is the 1st time we've instantiated a .macro (any .macro, regardless of name), \@ will instantiate to 0, if it's the 2nd .macro instantiation, it will instantiate to 1 etc.
It can only be used inside a .macro definition, an .irp definition or an .irpc definition (those last 2 uses are undocumented).
Reviewers: echristo, rafael
Reviewed By: rafael
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D9197
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235862 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds constant folding of insertelement instruction to undef value when index operand is constant and is not less than vector size or is undef.
InstCombine does not support this case, but I'm happy to add it there also if this change is accepted.
Test Plan: Unittests and regression tests for ConstProp pass.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9287
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235854 91177308-0d34-0410-b5e6-96231b3b80d8
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.
The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
Differential Revision: http://reviews.llvm.org/D9115
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235837 91177308-0d34-0410-b5e6-96231b3b80d8
There can be various constant pointers in the IR which do not get relocated at a safepoint. One example is the address of a global variable. Another example is a pointer created via inttoptr. Note that the optimizer itself likes to create such inttoptrs when locally propagating constants through dynamically dead code.
To deal with this, we need to exclude uses of constants from contributing to the liveness of a safepoint which might reach that use. At some later date, it might be worth exploring what could be done to support the relocation of various special types of "constants", but that's future work.
Differential Revision: http://reviews.llvm.org/D9236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235821 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).
In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.
Differential Revision: http://reviews.llvm.org/D9257
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235810 91177308-0d34-0410-b5e6-96231b3b80d8
Update `lib/Linker` to handle `Function` metadata attachments. The
attachments stick with the function body.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235786 91177308-0d34-0410-b5e6-96231b3b80d8
Add serialization support for function metadata attachments (added in
r235783). The syntax is:
define @foo() !attach !0 {
Metadata attachments are only allowed on functions with bodies. Since
they come before the `{`, they're not really part of the body; since
they require a body, they're not really part of the header. In
`LLParser` I gave them a separate function called from `ParseDefine()`,
`ParseOptionalFunctionMetadata()`.
In bitcode, I'm using the same `METADATA_ATTACHMENT` record used by
instructions. Instruction metadata attachments are included in a
special "attachment" block at the end of a `Function`. The attachment
records are laid out like this:
InstID (KindID MetadataID)+
Note that these records always have an odd number of fields. The new
code takes advantage of this to recognize function attachments (which
don't need an instruction ID):
(KindID MetadataID)+
This means we can use the same attachment block already used for
instructions.
This is part of PR23340.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235785 91177308-0d34-0410-b5e6-96231b3b80d8
When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.
Test case by Mark Lacey.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235771 91177308-0d34-0410-b5e6-96231b3b80d8
This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered
by copying the EAX value live into whatever basic block it is called
from. Obviously, this only works if you insert it late during codegen,
because otherwise mid-level passes might reschedule it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235768 91177308-0d34-0410-b5e6-96231b3b80d8
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235755 91177308-0d34-0410-b5e6-96231b3b80d8
Check that `@main` is calling `@foo2` (the renamed internal function),
not the `@foo` with external linkage that's been pulled in from the
override file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235730 91177308-0d34-0410-b5e6-96231b3b80d8