Change inalloca rules to make it only apply to the last parameter

This makes things a lot easier, because we can now talk about the
"argument allocation", which allocates all the memory for the call in
one shot.

The only functional change is to the verifier for a feature that hasn't
shipped yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199434 91177308-0d34-0410-b5e6-96231b3b80d8
Reid Kleckner
2014-01-16 22:59:24 +00:00
parent 9b24eeee01
commit ad60d3c304
6 changed files with 124 additions and 105 deletions

@@ -7,19 +7,19 @@ Introduction
.. Warning:: This feature is unstable and not fully implemented.
The :ref:`attr_inalloca` attribute is designed to allow taking the
address of an aggregate argument that is being passed by value through
memory. Primarily, this feature is required for compatibility with the
Microsoft C++ ABI. Under that ABI, class instances that are passed by
value are constructed directly into argument stack memory. Prior to the
addition of inalloca, calls in LLVM were indivisible instructions.
There was no way to perform intermediate work, such as object
construction, between the first stack adjustment and the final control
transfer. With inalloca, each argument is modelled as an alloca, which
can be stored to independently of the call. Unfortunately, this
complicated feature comes with a large set of restrictions designed to
bound the lifetime of the argument memory around the call, which are
explained in this document.
The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.
For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
@@ -30,48 +30,60 @@ passing by value with a copy.
Intended Usage
==============
In the example below, ``f`` is attempting to pass a default-constructed
``Foo`` object to ``g`` by value.
The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.
.. code-block:: c++
// Foo is non-trivial.
struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
void g(Foo a, Foo b);
void f() {
g(Foo(), Foo());
}
.. code-block:: llvm
%Foo = type { i32, i32 }
%struct.Foo = type { i32, i32 }
%callframe.f = type { %struct.Foo, %struct.Foo }
declare void @Foo_ctor(%Foo* %this)
declare void @g(%Foo* inalloca %arg)
declare void @Foo_dtor(%struct.Foo* %this)
declare void @g(%callframe.f* inalloca %memargs)
define void @f() {
...
bb1:
entry:
%base = call i8* @llvm.stacksave()
%arg = alloca %Foo
invoke void @Foo_ctor(%Foo* %arg)
%memargs = alloca %callframe.f
%b = getelementptr %callframe.f* %memargs, i32 0, i32 1
%a = getelementptr %callframe.f* %memargs, i32 0, i32 0
call void @Foo_ctor(%struct.Foo* %b)
; If a's ctor throws, we must destruct b.
invoke void @Foo_ctor(%struct.Foo* %a)
to label %invoke.cont unwind %invoke.unwind
invoke.cont:
call void @g(%Foo* inalloca %arg)
call void @g(%callframe.f* inalloca %memargs)
call void @llvm.stackrestore(i8* %base)
...
invoke.unwind:
call void @Foo_dtor(%struct.Foo* %b)
call void @llvm.stackrestore(i8* %base)
...
}
The alloca in this example is dynamic, meaning it is not in the entry
block, and it can be executed more than once. Due to the restrictions
against allocas between an alloca used with inalloca and its associated
call site, all allocas used with inalloca are considered dynamic.
To avoid any stack leakage, the frontend saves the current stack pointer
with a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it
allocates the argument stack space with alloca and calls the default
constructor. One important consideration is that the default
constructor could throw an exception, so the frontend has to create a
landing pad. At this point, if there were any other inalloca arguments,
the frontend would have to destruct them before restoring the stack
pointer. If the constructor does not unwind, ``g`` is called, and then
the stack is restored.
To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
argument stack space with alloca and calls the default constructor for
each argument, starting with ``b``. The constructor call for ``a`` could
throw an exception, so the frontend has to create a landing pad that
destroys the already constructed argument ``b`` before restoring the
stack pointer. If the constructor does not unwind, ``g`` is called. In
the Microsoft C++ ABI, ``g`` will destroy its arguments, and then the
stack is restored in ``f``.
Design Considerations
=====================
@@ -81,31 +93,43 @@ Lifetime
The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory that is at the end of the call
frame to pass arguments. We cannot vend pointers to that memory at
function entry because after code generation they will alias. In the
current design, the rule against allocas between the inalloca alloca
values and the call site avoids this problem, but it creates a cleanup
problem. Cleanup and lifetime are handled explicitly with stack save and
restore calls. In the future, we may be able to avoid this by using
:ref:`llvm.lifetime.start <int_lifestart>` and :ref:`llvm.lifetime.end
<int_lifeend>` instead.
because all calls need to use the memory at the top of the stack to pass
arguments. We cannot vend pointers to that memory at function entry
because after code generation they will alias.
The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem. Cleanup and
lifetime are handled explicitly with stack save and restore calls. In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack adjusting cleanup is less
powerful than a full stack save and restore.
Nested Calls and Copy Elision
-----------------------------
The next consideration is the ability for the frontend to perform copy
elision in the face of nested calls. Consider the evaluation of
``foo(foo(Bar()))``, where ``foo`` takes and returns a ``Bar`` object by
value and ``Bar`` has non-trivial constructors. In this case, we want
to be able to elide copies into ``foo``'s argument slots. That means we
need to have more than one set of argument frames active at the same
time. First, we need to allocate the frame for the outer call so we can
pass it in as the hidden struct return pointer to the middle call. Then
we do the same for the middle call, allocating a frame and passing its
address to ``Bar``'s default constructor. By wrapping the evaluation of
the inner ``foo`` with stack save and restore, we can have multiple
overlapping active call frames.
We also want to be able to support copy elision into these argument
slots. This means we have to support multiple live argument
allocations.
Consider the evaluation of:
.. code-block:: c++
// Foo is non-trivial.
struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
Foo bar(Foo b);
int main() {
bar(bar(Foo()));
}
In this case, we want to be able to elide copies into ``bar``'s argument
slots. That means we need to have more than one set of argument frames
active at the same time. First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the middle call. Then we do the same for the middle call, allocating a
frame and passing its address to ``Foo``'s default constructor. By
wrapping the evaluation of the inner ``bar`` with stack save and
restore, we can have multiple overlapping active call frames.
Callee-cleanup Calling Conventions
----------------------------------