===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. Patch points look
much like calls to LLVM. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project, see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. `GCFunctionMetadata` provides the location of
pointers into a collected heap captured by the `GCRoot` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the `GCFunctionMetadata` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the ``live
values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

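Because LLVM does not check IDs for uniqueness, a runtime that reuses
an ID may find several records carrying it. The following is a minimal
Python sketch of how a runtime might index records by ID and notice
duplicates. The ``(id, offset)`` tuples and both function names are
hypothetical and stand in for whatever the runtime's own parser
produces.

.. code-block:: python

  from collections import defaultdict

  def index_by_id(records):
      """Group instruction offsets by stack map ID."""
      index = defaultdict(list)
      for stackmap_id, offset in records:
          index[stackmap_id].append(offset)
      return dict(index)

  def duplicate_ids(index):
      """Return the sorted IDs that appear on more than one record."""
      return sorted(i for i, offsets in index.items() if len(offsets) > 1)

  # Hypothetical records: ID 77 was used at two different call sites.
  records = [(77, 0x05), (78, 0x20), (77, 0x41)]
  index = index_by_id(records)
  dups = duplicate_ids(index)

A runtime that relies on IDs alone to find a record would want ``dups``
to be empty; otherwise it also needs the instruction offset to
disambiguate.
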
LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                         i64* %ptr)
  %val = load i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           i8* <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

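The rule that every reserved byte must be rewritten can be sketched in
Python as follows. This is only an illustration under stated
assumptions: the code lives in a writable ``bytearray``, the target
uses the single-byte x86 nop ``0x90``, and ``patch_region`` and the
placeholder instruction bytes are hypothetical names and values, not
part of LLVM.

.. code-block:: python

  NOP = 0x90  # single-byte x86 nop; an assumption about the target

  def patch_region(code, offset, num_bytes, new_code):
      """Overwrite all num_bytes reserved bytes at offset.

      The new sequence is padded with nops so that no byte of the code
      LLVM originally emitted in the reserved space survives.
      """
      if len(new_code) > num_bytes:
          raise ValueError("patch does not fit in the reserved space")
      if offset + num_bytes > len(code):
          raise ValueError("reserved region extends past the code buffer")
      padding = bytes([NOP]) * (num_bytes - len(new_code))
      code[offset:offset + num_bytes] = bytes(new_code) + padding
      return code

  # A 20-byte buffer whose first 15 bytes are a reserved patch point.
  code = bytearray(b"\xcc" * 20)
  placeholder = b"\xaa\xbb\xcc\xdd"  # stands in for real instruction bytes
  patch_region(code, 0, 15, placeholder)

After the call, the first four bytes hold the new sequence, the next
eleven are nops, and the bytes after the reserved region are untouched.
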
This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi``, and a return value in ``%rax`` per native calling convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15 bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: none

  %val = call anyregcc i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1,
                                             i64* %ptr)

The stack map now indicates the location of the ``%ptr`` argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9         <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                        <--- end of reserved 15 bytes
  0x0f addq   $0x3, %r9
  0x10 movq   %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  uint32 : Reserved (header)
  uint32 : NumConstants
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  uint32 : NumRecords
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (location flags)
      uint16 : Dwarf RegNum
      int32  : Offset or SmallConstant
    }
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
======== ============= =================== ===========================
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================

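The layout above can be read with a short parser. The following Python
sketch follows the field list exactly as given in this section. It
assumes little-endian byte order and no padding between fields, neither
of which is stated here, and ``parse_stackmap_section`` and the sample
``blob`` are hypothetical.

.. code-block:: python

  import struct

  LOCATION_TYPES = {1: "Register", 2: "Direct", 3: "Indirect",
                    4: "Constant", 5: "ConstantIndex"}

  def parse_stackmap_section(data):
      """Parse a stack map section (assumed little-endian, unpadded)."""
      pos = 0

      def read(fmt):
          # Unpack one group of fields and advance the cursor.
          nonlocal pos
          values = struct.unpack_from("<" + fmt, data, pos)
          pos += struct.calcsize("<" + fmt)
          return values

      _reserved, num_constants = read("II")
      constants = [read("Q")[0] for _ in range(num_constants)]
      (num_records,) = read("I")
      records = []
      for _ in range(num_records):
          stackmap_id, offset, _flags, num_locations = read("QIHH")
          locations = []
          for _ in range(num_locations):
              kind, _loc_flags, regnum, value = read("BBHi")
              locations.append((LOCATION_TYPES[kind], regnum, value))
          (num_liveouts,) = read("H")
          liveouts = []
          for _ in range(num_liveouts):
              regnum, _res, size = read("HBB")
              liveouts.append((regnum, size))
          records.append({"id": stackmap_id, "offset": offset,
                          "locations": locations, "liveouts": liveouts})
      return constants, records

  # Hand-built sample: one large constant and one record with two
  # locations and one liveout register.
  blob = struct.pack("<IIQI", 0, 1, 0x1122334455667788, 1)
  blob += struct.pack("<QIHH", 77, 5, 0, 2)
  blob += struct.pack("<BBHi", 1, 0, 5, 0)    # Register, DWARF reg 5
  blob += struct.pack("<BBHi", 3, 0, 6, -16)  # Indirect, [reg 6 - 16]
  blob += struct.pack("<H", 1) + struct.pack("<HBB", 7, 0, 8)
  constants, records = parse_stackmap_section(blob)

Since the format is experimental, a real runtime should check this
sketch against the LLVM revision it builds with before relying on it.
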
In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.

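The five location types can be turned into values with one small
function. The following Python sketch assumes the runtime has already
captured the register file as a mapping from DWARF register number to
value and can read a word of stack memory. ``resolve_location``,
``registers``, and ``read_memory`` are hypothetical names.

.. code-block:: python

  def resolve_location(kind, regnum, offset, registers, read_memory,
                       constants):
      """Return the value described by one stack map location."""
      if kind == "Register":
          return registers[regnum]                       # Reg
      if kind == "Direct":
          return registers[regnum] + offset              # Reg + Offset
      if kind == "Indirect":
          return read_memory(registers[regnum] + offset)  # [Reg + Offset]
      if kind == "Constant":
          return offset                                  # small constant
      if kind == "ConstantIndex":
          return constants[offset]                       # Constants[Offset]
      raise ValueError("unknown location type: %r" % (kind,))

  # Hypothetical machine state: DWARF register 6 is the frame pointer
  # on x86-64, and one spilled value sits 16 bytes below it.
  registers = {0: 42, 6: 0x7000}
  memory = {0x7000 - 16: 99}
  constants = [0x1122334455667788]

  spilled = resolve_location("Indirect", 6, -16, registers,
                             memory.__getitem__, constants)
  frame_addr = resolve_location("Direct", 6, -16, registers,
                                memory.__getitem__, constants)

Note the difference between the last two calls: the ``Indirect``
location yields the value stored in the stack slot, while the
``Direct`` location yields the slot's address.
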
At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

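One way to apply this conservative reading is to round each liveout
entry up to a width the runtime knows how to save. The Python sketch
below assumes the x86-64 System V DWARF numbering, in which registers
0 through 15 are the general-purpose registers and 17 through 32 are
the vector registers. ``save_width`` is a hypothetical helper.

.. code-block:: python

  def save_width(dwarf_regnum, size_in_bytes):
      """Return how many bytes of the register the runtime should save."""
      if 0 <= dwarf_regnum <= 15:
          # General-purpose register: always save all 8 bytes, since
          # the entry does not say which subregister holds the value.
          return 8
      if 17 <= dwarf_regnum <= 32:
          # Vector register: the low 16 bytes (xmm) suffice unless the
          # entry reports a wider live value.
          return 16 if size_in_bytes <= 16 else 32
      raise ValueError("unhandled DWARF register %d" % dwarf_regnum)

  one_byte_at_rax = save_width(0, 1)
  half_of_ymm0 = save_width(17, 16)
  all_of_ymm0 = save_width(17, 32)

The first call corresponds to the ``%al`` or ``%ah`` case above and the
second to saving only ``%xmm0``.
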
The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

On Darwin, the stack map section name is "__llvm_stackmaps". The
segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, which LLVM preserves.

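Since only the order of the locations carries meaning, the runtime has
to keep its own table saying what each operand position stands for. A
minimal Python sketch of that bookkeeping follows. The table contents,
the names in it, and ``label_locations`` are all hypothetical.

.. code-block:: python

  # Hypothetical runtime-side table, filled in when the IR is generated:
  # for each stack map ID, what each live value operand means, in order.
  LIVE_VALUE_NAMES = {77: ["receiver", "local0", "bytecode_index"]}

  def label_locations(stackmap_id, locations):
      """Pair each location with the meaning the runtime recorded for it."""
      names = LIVE_VALUE_NAMES[stackmap_id]
      if len(names) != len(locations):
          raise ValueError("record does not match the runtime's table")
      return dict(zip(names, locations))

  # Locations as (type, DWARF register, offset) in operand order.
  locations = [("Register", 3, 0), ("Indirect", 6, -8), ("Constant", 0, 12)]
  labeled = label_locations(77, locations)

The length check is a cheap guard against the table and the emitted
code getting out of sync.
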
Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. To address this problem,
metadata could be added to the intrinsic call to express aliasing,
thereby allowing optimizations to hoist certain loads above stack
maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.

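The check described above might look like the following Python sketch.
It assumes the runtime remembers which operand positions it passed as
entry-block allocas. ``non_direct_allocas`` and the sample data are
hypothetical.

.. code-block:: python

  def non_direct_allocas(locations, alloca_positions):
      """Return the operand positions that should be Direct but are not."""
      return [i for i in sorted(alloca_positions)
              if locations[i][0] != "Direct"]

  # Locations as (type, DWARF register, offset) in operand order. The
  # runtime passed allocas as operands 0 and 2.
  locations = [("Direct", 6, -24), ("Register", 3, 0), ("Indirect", 6, -32)]
  problems = non_direct_allocas(locations, {0, 2})

Here operand 2 comes back as ``Indirect``, which tells the runtime that
the alloca was replaced by some other value before code emission.
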