.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time.  While the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
CPU denies access to the region from outside the enclave, and the region is
encrypted before it leaves the last-level cache (LLC).

Whether the processor reports SGX support can be determined by

	``grep sgx /proc/cpuinfo``

SGX must both be supported in the processor and enabled by the BIOS.  If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS.  If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".
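
The check above can also be scripted. The following is a minimal sketch, not
part of the kernel; the helper names ``cpu_has_flag`` and ``read_cpuinfo`` are
hypothetical. Unlike a plain ``grep``, it matches the ``sgx`` flag exactly, so
a related flag such as ``sgx_lc`` alone does not count as SGX support.

```python
def cpu_has_flag(cpuinfo_text, flag="sgx"):
    """Return True if any 'flags' line in cpuinfo_text lists exactly `flag`."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "flags" lines carry CPU feature names, separated by whitespace.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; return an empty string if it is unavailable."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    supported = cpu_has_flag(read_cpuinfo())
    print("sgx flag present" if supported else "sgx flag not present")
```

A missing flag does not distinguish between a processor without SGX and SGX
being disabled in the BIOS, so the BIOS setting still needs to be checked by
hand.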

Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside of the enclave during enclave construction with special, limited SGX
instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similar to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   The enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.
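
The slot count follows from simple arithmetic. The sketch below assumes a
4096-byte EPC page and 8-byte version slots; the slot size is not stated in
this document and the helper names are hypothetical. It also shows how many
VA pages are needed to hold the versions of a given number of evicted pages.

```python
PAGE_SIZE = 4096     # bytes in one EPC page
VA_SLOT_SIZE = 8     # bytes per version slot (assumption, see lead-in)


def va_slots_per_page(page_size=PAGE_SIZE, slot_size=VA_SLOT_SIZE):
    """Number of version slots that fit in one Version Array page."""
    return page_size // slot_size


def va_pages_needed(evicted_pages):
    """VA pages required to hold one version slot per evicted page."""
    slots = va_slots_per_page()
    # Ceiling division: a partially filled VA page still occupies a full page.
    return (evicted_pages + slots - 1) // slots
```

Under these assumptions, ``va_slots_per_page()`` evaluates to 512, which
matches the slot count given above.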

Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*.  The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only.  EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.
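
One way to picture this rule is as an intersection of two permission sets.
The following is a conceptual model only, not how the hardware or the kernel
represents permissions; the ``Perm`` type and ``effective_permissions``
function are hypothetical names.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Read, write and execute permission bits of the conceptual model."""
    R = 1
    W = 2
    X = 4


def effective_permissions(page_table, epcm):
    """An access is allowed only if both the page tables and the EPCM allow it."""
    return page_table & epcm


# The kernel maps a page read-write, but the enclave declared it read-only:
# the write permission granted by the page tables has no effect.
allowed = effective_permissions(Perm.R | Perm.W, Perm.R)
```

In this model the EPCM can remove permissions granted by the page tables but
can never add permissions that the page tables withhold.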

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will.  This requires that software be prepared to
handle an EPCM fault at any time.  In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process.  Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device.  Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process.  Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions.  This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO.  The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME.  The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller.  This avoids
the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes.  Enclave memory is typically ready
for use when the processor powers on or resets.  However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state.  This might
occur after a crash and kexec() cycle, for instance.  At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by going through EPC address space and applying the
EREMOVE function to each physical page. Some enclave pages, like SECS pages,
have hardware dependencies on other pages which prevent EREMOVE from
functioning.  Executing two EREMOVE passes removes the dependencies.
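
The effect of the second pass can be illustrated with a small simulation. This
is a simplified model and not the kernel's implementation: the EPC is a
dictionary keyed by page address, each page records the address of its owning
SECS page (or ``None`` for a SECS page itself), and the modelled EREMOVE fails
for a SECS page that still has child pages. All names are hypothetical.

```python
def eremove(page, epc):
    """Model of EREMOVE: fails for a SECS page that still has child pages."""
    if any(other["secs"] == page for other in epc.values()):
        return False
    del epc[page]
    return True


def sanitize(epc, passes=2):
    """Walk the EPC in address order and retry pages whose EREMOVE failed.

    Returns the list of pages that could not be removed.
    """
    dirty = sorted(epc)
    for _ in range(passes):
        # Keep only the pages for which EREMOVE failed during this pass.
        dirty = [page for page in dirty if not eremove(page, epc)]
    return dirty


def example_epc():
    """One SECS page at 0x1000 followed by two of its child pages."""
    return {
        0x1000: {"secs": None},
        0x2000: {"secs": 0x1000},
        0x3000: {"secs": 0x1000},
    }
```

In this model the SECS page is visited before its children, so the first pass
removes only the children and the second pass removes the SECS page.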

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory.  If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement.  The
function checks that the measurement is correct and that the signature was made
with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration in order to give the kernel full
control over the launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.
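
The relationship between a signing key and the four MSR values can be sketched
as follows. This is an illustration under two assumptions that are not stated
in this document and should be checked against the Intel SDM: the hash input
is the raw modulus bytes of the signing key as stored in the enclave signature
structure, and MSR *n* holds the *n*-th 64-bit little-endian word of the
digest. The function name is hypothetical.

```python
import hashlib
import struct


def lepubkeyhash_values(modulus):
    """Split the SHA-256 digest of a key modulus into four 64-bit values.

    `modulus` is the raw modulus as bytes (384 bytes for RSA-3072).
    The returned list is ordered as the MSRs 0, 1, 2, 3 (assumed layout).
    """
    digest = hashlib.sha256(modulus).digest()   # always 32 bytes
    return list(struct.unpack("<4Q", digest))   # four little-endian u64 words


# Placeholder modulus for demonstration only; a real key modulus is not zero.
example_values = lepubkeyhash_values(bytes(384))
```

Because SHA-256 produces 256 bits, the digest fills exactly four 64-bit
registers, which is why there are four MSRs.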

Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with root in
SRAM to maintain integrity of the encrypted data. This provides integrity and
anti-replay protection but does not scale to large memory sizes because the time
required to update the Merkle tree grows logarithmically in relation to the
memory size.

CPUs starting from Ice Lake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay-attacks are not mitigated.  However, it includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, users should stop running any new
SGX workloads (or just any new workloads) and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.


Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting the
total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.
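
The reservation described above is a subtraction, sketched here as a planning
helper. The function name and the example sizes are hypothetical, and treating
an oversubscribed configuration as an error is a policy choice of this sketch,
not something the kernel enforces.

```python
MIB = 1 << 20


def host_epc_budget(physical_epc, guest_vepc_sizes):
    """EPC left for host enclaves after reserving virtual EPC for all guests.

    All sizes are in bytes. Virtual EPC pages cannot be reclaimed, so the
    total given to guests is treated as permanently unavailable to the host.
    """
    reserved = sum(guest_vepc_sizes)
    if reserved > physical_epc:
        raise ValueError("guest virtual EPC exceeds the physical EPC size")
    return physical_epc - reserved


# Example: 128 MiB of physical EPC shared with two guests (32 MiB and 64 MiB).
remaining = host_epc_budget(128 * MIB, [32 * MIB, 64 * MIB])
```

Whether the remaining amount is enough for acceptable performance depends on
the working set of the host enclaves, which this sketch does not model.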