I spotted this by inspection when debugging something else, so I have no
test case what-so-ever, and am not even sure it is possible to
realistically trigger the bug. But this is what was intended here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218565 91177308-0d34-0410-b5e6-96231b3b80d8
and in the target shuffle combining when trying to widen vector
elements.
Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.
There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.
I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.
This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
Users of getSectionContents shouldn't try to pass in BSS or virtual
sections. In all instances, this is a bug in the code calling this
routine.
N.B. Some COFF implementations (like CL) will mark their BSS sections as
taking space on disk. This would confuse COFFObjectFile into thinking
the section is larger than the file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218549 91177308-0d34-0410-b5e6-96231b3b80d8
that managed to elude all of my fuzz testing historically. =/
Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.
This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8
The base implementation of commuteInstruction is used
in some cases, but it turns out this has been broken for a
long time since modifiers were inserted between the real operands.
The base implementation of commuteInstruction also fails on immediates,
which also needs to be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218530 91177308-0d34-0410-b5e6-96231b3b80d8
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8
This needs a test, but I'm not sure if it is currently possible and
I originally hit it due to a bug. Right now the only global address
operands have no reason to be VALU instructions, although it
theoretically could be a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218528 91177308-0d34-0410-b5e6-96231b3b80d8
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.
When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8
No tests hit this, and I don't see any way a GlobalAddress
node would survive beyond lowering on SI. It it would, the
move should probably be inserted by selection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218526 91177308-0d34-0410-b5e6-96231b3b80d8
The annotation instructions are dropped during codegen and have no
impact on size. In some cases, the annotations were preventing the
unroller from unrolling a loop because the annotation calls were
pushing the cost over the unrolling threshold.
Differential Revision: http://reviews.llvm.org/D5335
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218525 91177308-0d34-0410-b5e6-96231b3b80d8
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8
AVX support.
New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[
These were all detected by fuzz-testing. (I <3 fuzz testing.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.
Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.
Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.
The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.
Patch by Ranjeet Singh.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218521 91177308-0d34-0410-b5e6-96231b3b80d8
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218517 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r218513.
Buildbots using libstdc++ issue an error when trying to copy
SmallVector<std::unique_ptr<>>. Revert the commit until we have a fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218514 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There will be multiple TypeUnits in an unlinked object that will be extracted
from different sections. Now that we have DWARFUnitSection that is supposed
to represent an input section, we need a DWARFUnitSection<TypeUnit> per
input .debug_types section.
Once this is done, the interface is homogenous and we can move the Section
parsing code into DWARFUnitSection.
Reviewers: samsonov, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5482
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218513 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This will allow us to handle f128 arguments without duplicating code from
CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands().
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5292
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218509 91177308-0d34-0410-b5e6-96231b3b80d8
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.
Things to note:
a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.
b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.
c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.
Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.
The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218480 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class. The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts. The second
level provides the register and memory variants and some more Pat<>s.
I've compared the td.expanded files before and after. One change is that
ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a
bugfix.
(BTW, this is the change that was blocked on the recent tablegen fix. The
class-instance values X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218478 91177308-0d34-0410-b5e6-96231b3b80d8
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable. This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.
Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.
<rdar://problem/18021659>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218472 91177308-0d34-0410-b5e6-96231b3b80d8
llvm::format() is somewhat unsafe. The compiler does not check that integer
parameter size matches the %x or %d size and it does not complain when a
StringRef is passed for a %s. And correctly using a StringRef with format() is
ugly because you have to convert it to a std::string then call c_str().
The cases where llvm::format() is useful is controlling how numbers and
strings are printed, especially when you want fixed width output. This
patch adds some new formatting functions to raw_streams to format numbers
and StringRefs in a type safe manner. Some examples:
OS << format_hex(255, 6) => "0x00ff"
OS << format_hex(255, 4) => "0xff"
OS << format_decimal(0, 5) => " 0"
OS << format_decimal(255, 5) => " 255"
OS << right_justify(Str, 5) => " foo"
OS << left_justify(Str, 5) => "foo "
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218463 91177308-0d34-0410-b5e6-96231b3b80d8
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.
This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218458 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).
This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
Details of the optimization are discussed in D5091.
Test Plan: make check-all + a new test
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5422
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions do not indicate they are extendable or the
number of bits in the extendable operand. Rename to match
architected names. Add a testcase for the intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8