Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the PMCR_EL0 reset value after the PMU rework

   - Correctly handle S2 fault triggered by a S1 page table walk by not
     always classifying it as a write, as this breaks on R/O memslots

   - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
     fault from a S1 PTW on a R/O memslot

   - Put the Apple M2 on the naughty list for not being able to
     correctly implement the vgic SEIS feature, just like the M1 before
     it

   - Reviewer updates: Alex is stepping down, replaced by Zenghui

  x86:

   - Fix various rare locking issues in Xen emulation and teach lockdep
     to detect them

   - Documentation improvements

   - Do not return host topology information from
     KVM_GET_SUPPORTED_CPUID"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
  KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
  Documentation: kvm: fix SRCU locking order docs
  KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  KVM: nSVM: clarify recalc_intercepts() wrt CR8
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: PMU: Fix PMCR_EL0 reset value
Commit 92783a90bc
@@ -1354,6 +1354,14 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
 
+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.
+
 4.36 KVM_SET_TSS_ADDR
 ---------------------
 
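For reference, the kind of slot this note concerns is one registered with
KVM_MEM_READONLY. A minimal userspace sketch (slot number, addresses and
size are illustrative; error handling omitted):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/* Register a read-only memslot. Ordinary guest writes to it exit
	 * with KVM_EXIT_MMIO; per the note above, a write generated by
	 * the arm64 S1 page-table walker injects an abort instead. */
	static int set_ro_memslot(int vm_fd, void *host_mem)
	{
		struct kvm_userspace_memory_region region = {
			.slot = 1,
			.flags = KVM_MEM_READONLY,
			.guest_phys_addr = 0x80000000,
			.memory_size = 0x10000,
			.userspace_addr = (unsigned long)host_mem,
		};

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}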
@@ -8310,6 +8318,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
 It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
 has enabled in-kernel emulation of the local APIC.
 
+CPU topology
+~~~~~~~~~~~~
+
+Several CPUID values include topology information for the host CPU:
+0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems. Different
+versions of KVM return different values for this information and userspace
+should not rely on it. Currently they return all zeroes.
+
+If userspace wishes to set up a guest topology, it should be careful that
+the values of these three leaves differ for each CPU. In particular,
+the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
+for 0x8000001e; the latter also encodes the core id and node id in bits
+7:0 of EBX and ECX respectively.
+
 Obsolete ioctls and capabilities
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
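Since these leaves now come back zeroed, a VMM that wants a topology has to
fill them in itself before issuing KVM_SET_CPUID2. A sketch following the
rules above (helper name and the entries/nent source are assumptions):

	/* Give each vCPU a distinct APIC ID in the topology leaves, using
	 * entries previously obtained from KVM_GET_SUPPORTED_CPUID. */
	static void patch_topology(struct kvm_cpuid_entry2 *entries, int nent,
				   unsigned int vcpu_id)
	{
		for (int i = 0; i < nent; i++) {
			if (entries[i].function == 0x0b ||
			    entries[i].function == 0x1f)
				entries[i].edx = vcpu_id;	/* x2APIC ID */
			if (entries[i].function == 0x8000001e)
				entries[i].eax = vcpu_id;	/* APIC ID */
		}
		/* then: ioctl(vcpu_fd, KVM_SET_CPUID2, ...) */
	}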
@@ -24,21 +24,22 @@ The acquisition orders for mutexes are as follows:
 
 For SRCU:
 
-- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
-  the kvm->slots_lock critical section, therefore kvm->slots_lock
-  cannot be taken inside a kvm->srcu read-side critical section.
-  Instead, kvm->slots_arch_lock is released before the call
-  to ``synchronize_srcu()`` and _can_ be taken inside a
-  kvm->srcu read-side critical section.
-
-- kvm->lock is taken inside kvm->srcu, therefore
-  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
-  a kvm->lock critical section.  If you cannot delay the
-  call until after kvm->lock is released, use ``call_srcu``.
+- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
+  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
+  be taken inside a kvm->srcu read-side critical section; that is, the
+  following is broken::
+
+      srcu_read_lock(&kvm->srcu);
+      mutex_lock(&kvm->slots_lock);
+
+- kvm->slots_arch_lock instead is released before the call to
+  ``synchronize_srcu()``.  It _can_ therefore be taken inside a
+  kvm->srcu read-side critical section, for example while processing
+  a vmexit.
 
 On x86:
 
-- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
+- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock and kvm->arch.xen.xen_lock
 
 - kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock and
   kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and
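Spelled out in code fragments, the rule this documentation change states
(and which the lockdep patches below enforce) is (sketch, not from the
patch itself):

	/* Broken: kvm->lock nests inside kvm->srcu here, while elsewhere
	 * synchronize_srcu(&kvm->srcu) runs under kvm->lock -> deadlock. */
	idx = srcu_read_lock(&kvm->srcu);
	mutex_lock(&kvm->lock);

	/* Fine: kvm->slots_arch_lock is always dropped before
	 * synchronize_srcu() is called, so this nesting is safe. */
	idx = srcu_read_lock(&kvm->srcu);
	mutex_lock(&kvm->slots_arch_lock);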
@@ -11356,9 +11356,9 @@ F: virt/kvm/*
 KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
 M:	Marc Zyngier <maz@kernel.org>
 R:	James Morse <james.morse@arm.com>
-R:	Alexandru Elisei <alexandru.elisei@arm.com>
 R:	Suzuki K Poulose <suzuki.poulose@arm.com>
 R:	Oliver Upton <oliver.upton@linux.dev>
+R:	Zenghui Yu <yuzenghui@huawei.com>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 L:	kvmarm@lists.linux.dev
 L:	kvmarm@lists.cs.columbia.edu (deprecated, moderated for non-subscribers)
@@ -124,6 +124,8 @@
 #define APPLE_CPU_PART_M1_FIRESTORM_PRO	0x025
 #define APPLE_CPU_PART_M1_ICESTORM_MAX	0x028
 #define APPLE_CPU_PART_M1_FIRESTORM_MAX	0x029
+#define APPLE_CPU_PART_M2_BLIZZARD	0x032
+#define APPLE_CPU_PART_M2_AVALANCHE	0x033
 
 #define AMPERE_CPU_PART_AMPERE1	0xAC3
 
@@ -177,6 +179,8 @@
 #define MIDR_APPLE_M1_FIRESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_PRO)
 #define MIDR_APPLE_M1_ICESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM_MAX)
 #define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_MAX)
+#define MIDR_APPLE_M2_BLIZZARD MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD)
+#define MIDR_APPLE_M2_AVALANCHE MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE)
 #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)
 
 /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
@@ -114,6 +114,15 @@
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
+#define ESR_ELx_FSC_SEA_TTW0	(0x14)
+#define ESR_ELx_FSC_SEA_TTW1	(0x15)
+#define ESR_ELx_FSC_SEA_TTW2	(0x16)
+#define ESR_ELx_FSC_SEA_TTW3	(0x17)
+#define ESR_ELx_FSC_SECC	(0x18)
+#define ESR_ELx_FSC_SECC_TTW0	(0x1c)
+#define ESR_ELx_FSC_SECC_TTW1	(0x1d)
+#define ESR_ELx_FSC_SECC_TTW2	(0x1e)
+#define ESR_ELx_FSC_SECC_TTW3	(0x1f)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
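These are values of the ESR_ELx.FSC field, extracted with the pre-existing
ESR_ELx_FSC mask in the same header. A small sketch of how a fault status
code is classified with the new names (illustrative helper, not from the
patch):

	static bool fsc_is_sea_on_walk(unsigned long esr)
	{
		unsigned long fsc = esr & ESR_ELx_FSC;

		/* SEA_TTW0..TTW3 are contiguous (0x14-0x17) */
		return fsc >= ESR_ELx_FSC_SEA_TTW0 &&
		       fsc <= ESR_ELx_FSC_SEA_TTW3;
	}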
@@ -319,21 +319,6 @@
 			 BIT(18) |		\
 			 GENMASK(16, 15))
 
-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT	ESR_ELx_FSC_FAULT
-#define FSC_ACCESS	ESR_ELx_FSC_ACCESS
-#define FSC_PERM	ESR_ELx_FSC_PERM
-#define FSC_SEA		ESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0	(0x14)
-#define FSC_SEA_TTW1	(0x15)
-#define FSC_SEA_TTW2	(0x16)
-#define FSC_SEA_TTW3	(0x17)
-#define FSC_SECC	(0x18)
-#define FSC_SECC_TTW0	(0x1c)
-#define FSC_SECC_TTW1	(0x1d)
-#define FSC_SECC_TTW2	(0x1e)
-#define FSC_SECC_TTW3	(0x1f)
-
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK	(~UL(0xf))
 /*
@@ -349,16 +349,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
 	switch (kvm_vcpu_trap_get_fault(vcpu)) {
-	case FSC_SEA:
-	case FSC_SEA_TTW0:
-	case FSC_SEA_TTW1:
-	case FSC_SEA_TTW2:
-	case FSC_SEA_TTW3:
-	case FSC_SECC:
-	case FSC_SECC_TTW0:
-	case FSC_SECC_TTW1:
-	case FSC_SECC_TTW2:
-	case FSC_SECC_TTW3:
+	case ESR_ELx_FSC_EXTABT:
+	case ESR_ELx_FSC_SEA_TTW0:
+	case ESR_ELx_FSC_SEA_TTW1:
+	case ESR_ELx_FSC_SEA_TTW2:
+	case ESR_ELx_FSC_SEA_TTW3:
+	case ESR_ELx_FSC_SECC:
+	case ESR_ELx_FSC_SECC_TTW0:
+	case ESR_ELx_FSC_SECC_TTW1:
+	case ESR_ELx_FSC_SECC_TTW2:
+	case ESR_ELx_FSC_SECC_TTW3:
 		return true;
 	default:
 		return false;
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_abt_iss1tw(vcpu))
-		return true;
+	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		/*
+		 * Only a permission fault on a S1PTW should be
+		 * considered as a write. Otherwise, page tables baked
+		 * in a read-only memslot will result in an exception
+		 * being delivered in the guest.
+		 *
+		 * The drawback is that we end-up faulting twice if the
+		 * guest is using any of HW AF/DB: a translation fault
+		 * to map the page containing the PT (read only at
+		 * first), then a permission fault to allow the flags
+		 * to be set.
+		 */
+		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+		case ESR_ELx_FSC_PERM:
+			return true;
+		default:
+			return false;
+		}
+	}
 
 	if (kvm_vcpu_trap_is_iabt(vcpu))
 		return false;
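The effect of the change, restated outside the diff (illustrative helper,
not code from the patch):

	/* A translation or access-flag fault raised by the S1 walker only
	 * needs the PT page mapped, which a read-only mapping satisfies;
	 * only a permission fault means the walker needs write access,
	 * e.g. to set the hardware-managed AF/DB bits. */
	static bool s1ptw_fault_is_write(u8 fault_type)
	{
		return fault_type == ESR_ELx_FSC_PERM;
	}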
@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
 	 */
 	if (!(esr & ESR_ELx_S1PTW) &&
 	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+	     (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
 		if (!__translate_far_to_hpfar(far, &hpfar))
 			return false;
 	} else {
@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
 		bool valid;
 
-		valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+		valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT &&
 			kvm_vcpu_dabt_isvalid(vcpu) &&
 			!kvm_vcpu_abt_issea(vcpu) &&
 			!kvm_vcpu_abt_iss1tw(vcpu);
@@ -1212,7 +1212,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
 	VM_BUG_ON(write_fault && exec_fault);
 
-	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
+	if (fault_status == ESR_ELx_FSC_PERM && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}

@@ -1277,7 +1277,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
+	if (fault_status != ESR_ELx_FSC_PERM ||
+	    (logging_active && write_fault)) {
 		ret = kvm_mmu_topup_memory_cache(memcache,
 						 kvm_mmu_cache_min_pages(kvm));
 		if (ret)

@@ -1342,7 +1343,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
-		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
+		if (fault_status == ESR_ELx_FSC_PERM &&
+		    fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
 			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,

@@ -1350,7 +1352,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 								   &fault_ipa);
 	}
 
-	if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
+	if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (kvm_vma_mte_allowed(vma)) {
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);

@@ -1376,7 +1378,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
+	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	else
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,

@@ -1441,7 +1443,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
-	if (fault_status == FSC_FAULT) {
+	if (fault_status == ESR_ELx_FSC_FAULT) {
 		/* Beyond sanitised PARange (which is the IPA limit) */
 		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
 			kvm_inject_size_fault(vcpu);

@@ -1476,8 +1478,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
 
 	/* Check the stage-2 fault is trans. fault or write fault */
-	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-	    fault_status != FSC_ACCESS) {
+	if (fault_status != ESR_ELx_FSC_FAULT &&
+	    fault_status != ESR_ELx_FSC_PERM &&
+	    fault_status != ESR_ELx_FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),

@@ -1539,7 +1542,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
-	if (fault_status == FSC_ACCESS) {
+	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
 		ret = 1;
 		goto out_unlock;
|
@ -646,7 +646,7 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
|
|||
return;
|
||||
|
||||
/* Only preserve PMCR_EL0.N, and reset the rest to 0 */
|
||||
pmcr = read_sysreg(pmcr_el0) & ARMV8_PMU_PMCR_N_MASK;
|
||||
pmcr = read_sysreg(pmcr_el0) & (ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
|
||||
if (!kvm_supports_32bit_el0())
|
||||
pmcr |= ARMV8_PMU_PMCR_LC;
|
||||
|
||||
|
|
|
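The bug is easiest to see with the mask values from the kernel's PMU
definitions (ARMV8_PMU_PMCR_N_SHIFT is 11 and ARMV8_PMU_PMCR_N_MASK is
0x1f, so PMCR_EL0.N occupies bits [15:11]):

	u64 pmcr = 0x8000;	/* example PMCR_EL0 value: N = 16 */

	/* old code: masks bits [4:0], i.e. the E/P/C/D/X control bits,
	 * so the N field that was meant to be preserved was lost */
	u64 wrong = pmcr & ARMV8_PMU_PMCR_N_MASK;			/* 0 */

	/* fixed code: masks bits [15:11], keeping N = 16 */
	u64 right = pmcr & (ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);	/* 0x8000 */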
@@ -616,6 +616,8 @@ static const struct midr_range broken_seis[] = {
 	MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_PRO),
 	MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_MAX),
 	MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
+	MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD),
+	MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE),
 	{},
 };
 
@@ -1111,6 +1111,7 @@ struct msr_bitmap_range {
 
 /* Xen emulation context */
 struct kvm_xen {
+	struct mutex xen_lock;
 	u32 xen_version;
 	bool long_mode;
 	bool runstate_update_flag;
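Per the locking.rst hunk above, xen_lock nests inside kvm->lock and
vcpu->mutex and is never held around anything that waits on them or on
synchronize_srcu(), so paths such as event-channel delivery no longer
need kvm->lock at all. Every conversion below has the same shape
(sketch):

	mutex_lock(&kvm->arch.xen.xen_lock);	/* was: mutex_lock(&kvm->lock) */
	/* ... touch kvm->arch.xen state only ... */
	mutex_unlock(&kvm->arch.xen.xen_lock);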
@@ -770,15 +770,21 @@ struct kvm_cpuid_array {
 	int nent;
 };
 
-static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
-					      u32 function, u32 index)
+static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
 {
-	struct kvm_cpuid_entry2 *entry;
-
 	if (array->nent >= array->maxnent)
 		return NULL;
 
-	entry = &array->entries[array->nent++];
+	return &array->entries[array->nent++];
+}
+
+static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
+					      u32 function, u32 index)
+{
+	struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
+
+	if (!entry)
+		return NULL;
 
 	memset(entry, 0, sizeof(*entry));
 	entry->function = function;

@@ -956,22 +962,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx = edx.full;
 		break;
 	}
-	/*
-	 * Per Intel's SDM, the 0x1f is a superset of 0xb,
-	 * thus they can be handled by common code.
-	 */
 	case 0x1f:
 	case 0xb:
 		/*
-		 * Populate entries until the level type (ECX[15:8]) of the
-		 * previous entry is zero.  Note, CPUID EAX.{0x1f,0xb}.0 is
-		 * the starting entry, filled by the primary do_host_cpuid().
+		 * No topology; a valid topology is indicated by the presence
+		 * of subleaf 1.
 		 */
-		for (i = 1; entry->ecx & 0xff00; ++i) {
-			entry = do_host_cpuid(array, function, i);
-			if (!entry)
-				goto out;
-		}
+		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0xd: {
 		u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();

@@ -1202,6 +1199,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x8000001e:
+		/* Do not return host topology information.  */
+		entry->eax = entry->ebx = entry->ecx = 0;
+		entry->edx = 0; /* reserved */
 		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
@@ -138,15 +138,13 @@ void recalc_intercepts(struct vcpu_svm *svm)
 		c->intercepts[i] = h->intercepts[i];
 
 	if (g->int_ctl & V_INTR_MASKING_MASK) {
-		/* We only want the cr8 intercept bits of L1 */
-		vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
-		vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
-
 		/*
-		 * Once running L2 with HF_VINTR_MASK, EFLAGS.IF does not
-		 * affect any interrupt we may want to inject; therefore,
-		 * interrupt window vmexits are irrelevant to L0.
+		 * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
+		 * does not affect any interrupt we may want to inject;
+		 * therefore, writes to CR8 are irrelevant to L0, as are
+		 * interrupt window vmexits.
 		 */
+		vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
 		vmcb_clr_intercept(c, INTERCEPT_VINTR);
 	}
 
@@ -271,7 +271,15 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 	 * Attempt to obtain the GPC lock on *both* (if there are two)
 	 * gfn_to_pfn caches that cover the region.
 	 */
-	read_lock_irqsave(&gpc1->lock, flags);
+	if (atomic) {
+		local_irq_save(flags);
+		if (!read_trylock(&gpc1->lock)) {
+			local_irq_restore(flags);
+			return;
+		}
+	} else {
+		read_lock_irqsave(&gpc1->lock, flags);
+	}
 	while (!kvm_gpc_check(gpc1, user_len1)) {
 		read_unlock_irqrestore(&gpc1->lock, flags);
 
@@ -304,9 +312,18 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 		 * The guest's runstate_info is split across two pages and we
 		 * need to hold and validate both GPCs simultaneously. We can
 		 * declare a lock ordering GPC1 > GPC2 because nothing else
-		 * takes them more than one at a time.
+		 * takes them more than one at a time. Set a subclass on the
+		 * gpc1 lock to make lockdep shut up about it.
 		 */
-		read_lock(&gpc2->lock);
+		lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_);
+		if (atomic) {
+			if (!read_trylock(&gpc2->lock)) {
+				read_unlock_irqrestore(&gpc1->lock, flags);
+				return;
+			}
+		} else {
+			read_lock(&gpc2->lock);
+		}
 
 		if (!kvm_gpc_check(gpc2, user_len2)) {
 			read_unlock(&gpc2->lock);
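Two things are going on in these two hunks, restated in isolation
(sketch): in atomic context the code must not sleep, hence read_trylock()
with a graceful bail-out; and the two caches share a lock class, so
gpc1's lockdep subclass is promoted before gpc2 is nested inside it:

	if (!read_trylock(&gpc1->lock))		/* atomic: never sleep */
		return;
	lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_);
	read_lock(&gpc2->lock);		/* same class, subclass 0: ordered
					 * nesting, not a recursion */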
@@ -590,26 +607,26 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
 			r = -EINVAL;
 		} else {
-			mutex_lock(&kvm->lock);
+			mutex_lock(&kvm->arch.xen.xen_lock);
 			kvm->arch.xen.long_mode = !!data->u.long_mode;
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->arch.xen.xen_lock);
 			r = 0;
 		}
 		break;
 
 	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		break;
 
 	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
 		if (data->u.vector && data->u.vector < 0x10)
 			r = -EINVAL;
 		else {
-			mutex_lock(&kvm->lock);
+			mutex_lock(&kvm->arch.xen.xen_lock);
 			kvm->arch.xen.upcall_vector = data->u.vector;
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->arch.xen.xen_lock);
 			r = 0;
 		}
 		break;

@@ -619,9 +636,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		break;
 
 	case KVM_XEN_ATTR_TYPE_XEN_VERSION:
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		kvm->arch.xen.xen_version = data->u.xen_version;
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		r = 0;
 		break;
 

@@ -630,9 +647,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 			r = -EOPNOTSUPP;
 			break;
 		}
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		kvm->arch.xen.runstate_update_flag = !!data->u.runstate_update_flag;
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		r = 0;
 		break;
 

@@ -647,7 +664,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	int r = -ENOENT;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	switch (data->type) {
 	case KVM_XEN_ATTR_TYPE_LONG_MODE:

@@ -686,7 +703,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		break;
 	}
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return r;
 }
 

@@ -694,7 +711,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 {
 	int idx, r = -ENOENT;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	switch (data->type) {

@@ -922,7 +939,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 	}
 
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
 	return r;
 }
 

@@ -930,7 +947,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 {
 	int r = -ENOENT;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
 
 	switch (data->type) {
 	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:

@@ -1013,7 +1030,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		break;
 	}
 
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
 	return r;
 }
 

@@ -1106,7 +1123,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 	    xhc->blob_size_32 || xhc->blob_size_64))
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
 		static_branch_inc(&kvm_xen_enabled.key);

@@ -1115,7 +1132,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 
 	memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc));
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return 0;
 }
 
@@ -1658,15 +1675,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		mm_borrowed = true;
 	}
 
-	/*
-	 * For the irqfd workqueue, using the main kvm->lock mutex is
-	 * fine since this function is invoked from kvm_set_irq() with
-	 * no other lock held, no srcu. In future if it will be called
-	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
-	 * then it may need to switch to using a leaf-node mutex for
-	 * serializing the shared_info mapping.
-	 */
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	/*
 	 * It is theoretically possible for the page to be unmapped

@@ -1695,7 +1704,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		srcu_read_unlock(&kvm->srcu, idx);
 	} while(!rc);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	if (mm_borrowed)
 		kthread_unuse_mm(kvm->mm);

@@ -1811,7 +1820,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 	int ret;
 
 	/* Protect writes to evtchnfd as well as the idr lookup.  */
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
 
 	ret = -ENOENT;

@@ -1842,7 +1851,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 	}
 	ret = 0;
 out_unlock:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return ret;
 }
 
@@ -1905,10 +1914,10 @@ static int kvm_xen_eventfd_assign(struct kvm *kvm,
 		evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority;
 	}
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	ret = idr_alloc(&kvm->arch.xen.evtchn_ports, evtchnfd, port, port + 1,
 			GFP_KERNEL);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	if (ret >= 0)
 		return 0;
 

@@ -1926,9 +1935,9 @@ static int kvm_xen_eventfd_deassign(struct kvm *kvm, u32 port)
 {
 	struct evtchnfd *evtchnfd;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	evtchnfd = idr_remove(&kvm->arch.xen.evtchn_ports, port);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	if (!evtchnfd)
 		return -ENOENT;

@@ -1946,7 +1955,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
 	int i;
 	int n = 0;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	/*
 	 * Because synchronize_srcu() cannot be called inside the

@@ -1958,7 +1967,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
 
 	all_evtchnfds = kmalloc_array(n, sizeof(struct evtchnfd *), GFP_KERNEL);
 	if (!all_evtchnfds) {
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		return -ENOMEM;
 	}
 

@@ -1967,7 +1976,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
 		all_evtchnfds[n++] = evtchnfd;
 		idr_remove(&kvm->arch.xen.evtchn_ports, evtchnfd->send_port);
 	}
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	synchronize_srcu(&kvm->srcu);
 

@@ -2069,6 +2078,7 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 
 void kvm_xen_init_vm(struct kvm *kvm)
 {
+	mutex_init(&kvm->arch.xen.xen_lock);
 	idr_init(&kvm->arch.xen.evtchn_ports);
 	kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
 }
@@ -3954,6 +3954,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	}
 
 	mutex_lock(&kvm->lock);
+
+#ifdef CONFIG_LOCKDEP
+	/* Ensure that lockdep knows vcpu->mutex is taken *inside* kvm->lock */
+	mutex_lock(&vcpu->mutex);
+	mutex_unlock(&vcpu->mutex);
+#endif
+
 	if (kvm_get_vcpu_by_id(kvm, id)) {
 		r = -EEXIST;
 		goto unlock_vcpu_destroy;
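The trick, in isolation: no real work needs both locks at this point, but
taking and immediately releasing vcpu->mutex while kvm->lock is held
records the ordering with lockdep, so any later attempt to take kvm->lock
while holding vcpu->mutex is flagged even if the deadlocking path itself
is never exercised (sketch):

	mutex_lock(&kvm->lock);
	mutex_lock(&vcpu->mutex);	/* lockdep learns: vcpu->mutex inside kvm->lock */
	mutex_unlock(&vcpu->mutex);
	/* ... real vCPU creation work ... */
	mutex_unlock(&kvm->lock);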