The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241648 91177308-0d34-0410-b5e6-96231b3b80d8
Clang uses this for SEH finally. The new intrinsic will produce the
right value when stack realignment is required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241643 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241633 91177308-0d34-0410-b5e6-96231b3b80d8
This type of prologue isn't supported yet. Implementing it should be a
matter of copying the adjusted incoming EBP into ESI (the base pointer)
instead of EBP. The original EBP can be saved and restored from other
memory afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241597 91177308-0d34-0410-b5e6-96231b3b80d8
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.
As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.
Differential Revision: http://reviews.llvm.org/D10593
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241516 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.
As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.
Differential Revision: http://reviews.llvm.org/D10146
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241508 91177308-0d34-0410-b5e6-96231b3b80d8
With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations.
This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code.
Differential Revision: http://reviews.llvm.org/D10947
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241506 91177308-0d34-0410-b5e6-96231b3b80d8
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241413 91177308-0d34-0410-b5e6-96231b3b80d8
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241411 91177308-0d34-0410-b5e6-96231b3b80d8
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241394 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.
Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241323 91177308-0d34-0410-b5e6-96231b3b80d8
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present. Currently I'm hitting this at
-O0 due to the clang bug PR24009.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241170 91177308-0d34-0410-b5e6-96231b3b80d8
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.
The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.
The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.
Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D10848
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241125 91177308-0d34-0410-b5e6-96231b3b80d8
This is a new version of http://reviews.llvm.org/D10260.
It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.
This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10813
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241002 91177308-0d34-0410-b5e6-96231b3b80d8
We don't always have FMA, for example when using 'clang -mavx512f'
without an explicit CPU.
Also check for an explicit +avx512f instead of CPUs in a couple
related tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240616 91177308-0d34-0410-b5e6-96231b3b80d8
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.
This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.
Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240300 91177308-0d34-0410-b5e6-96231b3b80d8
Added explicit sign extension for v4i16/v8i16 to v4i32/v8i32 before conversion to floats. Matches existing support for v4i8/v8i8.
Follow up to D10433
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239966 91177308-0d34-0410-b5e6-96231b3b80d8
There is a one-to-one relationship between X86Subtarget and
X86FrameLowering, but every frame lowering method would previously pull
the subtarget off the MachineFunction and query some subtarget
properties.
Over time, these locals began to grow in complexity and it became
important to keep their names and meaning in sync across all of the
frame lowering methods, leading to duplication. We can eliminate that
duplication by computing them once in the constructor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239948 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.
Differential Revision: http://reviews.llvm.org/D10433
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239855 91177308-0d34-0410-b5e6-96231b3b80d8
When we multiply two 64-bit vectors, we extract lower and upper part and use the PMULUDQ instruction.
When one of the operands is a constant, the upper part may be zero, we know this at compile time.
Example: %a = mul <4 x i64> %b, <4 x i64> < i64 5, i64 5, i64 5, i64 5>.
I'm checking the value of the upper part and prevent redundant "multiply", "shift" and "add" operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239802 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic is like framerecover plus a load. It recovers the EH
registration stack allocation from the parent frame and loads the
exception information field out of it, giving back a pointer to an
EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
expressions instead of accessing the EXCEPTION_POINTERS parameter that
is available on x64.
This required a minor change to MC to allow defining a label variable to
another absolute framerecover label variable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239567 91177308-0d34-0410-b5e6-96231b3b80d8
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:
1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.
The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.
Tested on SSE2, SSE41 and AVX machines.
Differential Revision: http://reviews.llvm.org/D9474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239509 91177308-0d34-0410-b5e6-96231b3b80d8
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.
The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239433 91177308-0d34-0410-b5e6-96231b3b80d8
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239427 91177308-0d34-0410-b5e6-96231b3b80d8
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10260
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239309 91177308-0d34-0410-b5e6-96231b3b80d8