Fixing MinSize attribute handling was discussed in D11363.
This is a prerequisite patch to doing that.
The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).
The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize
attribute must also exist.
Fixing this more generally will be a follow-on patch or two.
Differential Revision: http://reviews.llvm.org/D11568
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243693 91177308-0d34-0410-b5e6-96231b3b80d8
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.
Differential Revision: http://reviews.llvm.org/D11439
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243569 91177308-0d34-0410-b5e6-96231b3b80d8
This fix was suggested as part of D11345 and is part of fixing PR24141.
With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.
There is no NFC-intended other than that.
Differential Revision: http://reviews.llvm.org/D11531
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243498 91177308-0d34-0410-b5e6-96231b3b80d8
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243438 91177308-0d34-0410-b5e6-96231b3b80d8
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.
Split off from D11518.
Differential Revision: http://reviews.llvm.org/D11541
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243395 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243361 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r243146.
Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243282 91177308-0d34-0410-b5e6-96231b3b80d8
We had a few places where we did
for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
but those could instead do
for (auto *EltTy : STy->elements()) {
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243136 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243114 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11103
(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243083 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.
Differential Revision: http://reviews.llvm.org/D11413
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242994 91177308-0d34-0410-b5e6-96231b3b80d8
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242992 91177308-0d34-0410-b5e6-96231b3b80d8
SKX supports conversion for all FP types. Integer types include doublewords and quardwords.
I added "Legal" status for these nodes and a bunch of tests.
I added "NoVLX" for AVX DAG selection to force VLX instructions selection when VLX is supported.
Differential Revision: http://reviews.llvm.org/D11255
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242637 91177308-0d34-0410-b5e6-96231b3b80d8
In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch.
I temporary removed the old intrinsics test (just to split this patch).
Half types are not covered here.
Differential Revision: http://reviews.llvm.org/D11134
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242023 91177308-0d34-0410-b5e6-96231b3b80d8
While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized.
This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together.
Differential Revision: http://reviews.llvm.org/D11063
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241989 91177308-0d34-0410-b5e6-96231b3b80d8
The runtime does not restore CSRs when transferring control back to the
function handling the exception. According to the experts on IRC, LLVM's
register allocator has no way to model register clobbers that only
happen on one edge of the CFG. For now, don't worry about trying to use
the meager three CSRs available on 32-bit X86 and just say that such
invokes preserve nothing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241865 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.
Differential Revision: http://reviews.llvm.org/D10977
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241827 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.
Reviewers: nadav, majnemer, sanjoy, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: http://reviews.llvm.org/D10767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.
Reviewers: nadav, majnemer, sanjoy, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: http://reviews.llvm.org/D10767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241790 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11042
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241779 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11040
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241778 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11038
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241777 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11037
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241776 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11028
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241775 91177308-0d34-0410-b5e6-96231b3b80d8
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241648 91177308-0d34-0410-b5e6-96231b3b80d8
Clang uses this for SEH finally. The new intrinsic will produce the
right value when stack realignment is required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241643 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241633 91177308-0d34-0410-b5e6-96231b3b80d8
This type of prologue isn't supported yet. Implementing it should be a
matter of copying the adjusted incoming EBP into ESI (the base pointer)
instead of EBP. The original EBP can be saved and restored from other
memory afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241597 91177308-0d34-0410-b5e6-96231b3b80d8
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.
As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.
Differential Revision: http://reviews.llvm.org/D10593
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241516 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.
As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.
Differential Revision: http://reviews.llvm.org/D10146
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241508 91177308-0d34-0410-b5e6-96231b3b80d8