210 строки
7.6 KiB
ReStructuredText
210 строки
7.6 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
.. iommu:
|
|
|
|
=====================================
|
|
IOMMU Userspace API
|
|
=====================================
|
|
|
|
IOMMU UAPI is used for virtualization cases where communications are
|
|
needed between physical and virtual IOMMU drivers. For baremetal
|
|
usage, the IOMMU is a system device which does not need to communicate
|
|
with userspace directly.
|
|
|
|
The primary use cases are guest Shared Virtual Address (SVA) and
|
|
guest IO virtual address (IOVA), wherein the vIOMMU implementation
|
|
relies on the physical IOMMU and for this reason requires interactions
|
|
with the host driver.
|
|
|
|
.. contents:: :local:
|
|
|
|
Functionalities
|
|
===============
|
|
Communications of user and kernel involve both directions. The
|
|
supported user-kernel APIs are as follows:
|
|
|
|
1. Bind/Unbind guest PASID (e.g. Intel VT-d)
|
|
2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
|
|
3. Invalidate IOMMU caches upon guest requests
|
|
4. Report errors to the guest and serve page requests
|
|
|
|
Requirements
|
|
============
|
|
The IOMMU UAPIs are generic and extensible to meet the following
|
|
requirements:
|
|
|
|
1. Emulated and para-virtualised vIOMMUs
|
|
2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
|
|
3. Extensions to the UAPI shall not break existing userspace
|
|
|
|
Interfaces
|
|
==========
|
|
Although the data structures defined in IOMMU UAPI are self-contained,
|
|
there are no user API functions introduced. Instead, IOMMU UAPI is
|
|
designed to work with existing user driver frameworks such as VFIO.
|
|
|
|
Extension Rules & Precautions
|
|
-----------------------------
|
|
When IOMMU UAPI gets extended, the data structures can *only* be
|
|
modified in two ways:
|
|
|
|
1. Adding new fields by re-purposing the padding[] field. No size change.
|
|
2. Adding new union members at the end. May increase the structure sizes.
|
|
|
|
No new fields can be added *after* the variable sized union in that it
|
|
will break backward compatibility when offset moves. A new flag must
|
|
be introduced whenever a change affects the structure using either
|
|
method. The IOMMU driver processes the data based on flags which
|
|
ensures backward compatibility.
|
|
|
|
Version field is only reserved for the unlikely event of UAPI upgrade
|
|
at its entirety.
|
|
|
|
It's *always* the caller's responsibility to indicate the size of the
|
|
structure passed by setting argsz appropriately.
|
|
Though at the same time, argsz is user provided data which is not
|
|
trusted. The argsz field allows the user app to indicate how much data
|
|
it is providing; it's still the kernel's responsibility to validate
|
|
whether it's correct and sufficient for the requested operation.
|
|
|
|
Compatibility Checking
|
|
----------------------
|
|
When IOMMU UAPI extension results in some structure size increase,
|
|
IOMMU UAPI code shall handle the following cases:
|
|
|
|
1. User and kernel has exact size match
|
|
2. An older user with older kernel header (smaller UAPI size) running on a
|
|
newer kernel (larger UAPI size)
|
|
3. A newer user with newer kernel header (larger UAPI size) running
|
|
on an older kernel.
|
|
4. A malicious/misbehaving user passing illegal/invalid size but within
|
|
range. The data may contain garbage.
|
|
|
|
Feature Checking
|
|
----------------
|
|
While launching a guest with vIOMMU, it is strongly advised to check
|
|
the compatibility upfront, as some subsequent errors happening during
|
|
vIOMMU operation, such as cache invalidation failures cannot be nicely
|
|
escalated to the guest due to IOMMU specifications. This can lead to
|
|
catastrophic failures for the users.
|
|
|
|
User applications such as QEMU are expected to import kernel UAPI
|
|
headers. Backward compatibility is supported per feature flags.
|
|
For example, an older QEMU (with older kernel header) can run on newer
|
|
kernel. Newer QEMU (with new kernel header) may refuse to initialize
|
|
on an older kernel if new feature flags are not supported by older
|
|
kernel. Simply recompiling existing code with newer kernel header should
|
|
not be an issue in that only existing flags are used.
|
|
|
|
IOMMU vendor driver should report the below features to IOMMU UAPI
|
|
consumers (e.g. via VFIO).
|
|
|
|
1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
|
|
2. IOMMU_NESTING_FEAT_BIND_PGTBL
|
|
3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
|
|
4. IOMMU_NESTING_FEAT_CACHE_INVLD
|
|
5. IOMMU_NESTING_FEAT_PAGE_REQUEST
|
|
|
|
Take VFIO as example, upon request from VFIO userspace (e.g. QEMU),
|
|
VFIO kernel code shall query IOMMU vendor driver for the support of
|
|
the above features. Query result can then be reported back to the
|
|
userspace caller. Details can be found in
|
|
Documentation/driver-api/vfio.rst.
|
|
|
|
|
|
Data Passing Example with VFIO
|
|
------------------------------
|
|
As the ubiquitous userspace driver framework, VFIO is already IOMMU
|
|
aware and shares many key concepts such as device model, group, and
|
|
protection domain. Other user driver frameworks can also be extended
|
|
to support IOMMU UAPI but it is outside the scope of this document.
|
|
|
|
In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
|
|
IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
|
|
transport, capability checking, security, and life cycle management of
|
|
process address space ID (PASID).
|
|
|
|
VFIO layer conveys the data structures down to the IOMMU driver. It
|
|
follows the pattern below::
|
|
|
|
struct {
|
|
__u32 argsz;
|
|
__u32 flags;
|
|
__u8 data[];
|
|
};
|
|
|
|
Here data[] contains the IOMMU UAPI data structures. VFIO has the
|
|
freedom to bundle the data as well as parse data size based on its own flags.
|
|
|
|
In order to determine the size and feature set of the user data, argsz
|
|
and flags (or the equivalent) are also embedded in the IOMMU UAPI data
|
|
structures.
|
|
|
|
A "__u32 argsz" field is *always* at the beginning of each structure.
|
|
|
|
For example:
|
|
::
|
|
|
|
struct iommu_cache_invalidate_info {
|
|
__u32 argsz;
|
|
#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
|
|
__u32 version;
|
|
/* IOMMU paging structure cache */
|
|
#define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
|
|
#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
|
|
#define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */
|
|
#define IOMMU_CACHE_INV_TYPE_NR (3)
|
|
__u8 cache;
|
|
__u8 granularity;
|
|
__u8 padding[6];
|
|
union {
|
|
struct iommu_inv_pasid_info pasid_info;
|
|
struct iommu_inv_addr_info addr_info;
|
|
} granu;
|
|
};
|
|
|
|
VFIO is responsible for checking its own argsz and flags. It then
|
|
invokes appropriate IOMMU UAPI functions. The user pointers are passed
|
|
to the IOMMU layer for further processing. The responsibilities are
|
|
divided as follows:
|
|
|
|
- Generic IOMMU layer checks argsz range based on UAPI data in the
|
|
current kernel version.
|
|
|
|
- Generic IOMMU layer checks content of the UAPI data for non-zero
|
|
reserved bits in flags, padding fields, and unsupported version.
|
|
This is to ensure not breaking userspace in the future when these
|
|
fields or flags are used.
|
|
|
|
- Vendor IOMMU driver checks argsz based on vendor flags. UAPI data
|
|
is consumed based on flags. Vendor driver has access to
|
|
unadulterated argsz value in case of vendor specific future
|
|
extensions. Currently, it does not perform the copy_from_user()
|
|
itself. A __user pointer can be provided in some future scenarios
|
|
where there's vendor data outside of the structure definition.
|
|
|
|
IOMMU code treats UAPI data in two categories:
|
|
|
|
- structure contains vendor data
|
|
(Example: iommu_uapi_cache_invalidate())
|
|
|
|
- structure contains only generic data
|
|
(Example: iommu_uapi_sva_bind_gpasid())
|
|
|
|
|
|
|
|
Sharing UAPI with in-kernel users
|
|
---------------------------------
|
|
For UAPIs that are shared with in-kernel users, a wrapper function is
|
|
provided to distinguish the callers. For example,
|
|
|
|
Userspace caller ::
|
|
|
|
int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
|
|
struct device *dev,
|
|
void __user *udata)
|
|
|
|
In-kernel caller ::
|
|
|
|
int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
|
|
struct device *dev, ioasid_t ioasid);
|