WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Radim Krčmář	9ea369b032	KVM: x86: fix mixed APIC mode broadcast Broadcast allowed only one global APIC mode, but mixed modes are theoretically possible. x2APIC IPI doesn't mean 0xff as broadcast, the rest does. x2APIC broadcasts are accepted by xAPIC. If we take SDM to be logical, even addreses beginning with 0xff should be accepted, but real hardware disagrees. This patch aims for simple code by considering most of real behavior as undefined. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-3-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:00 +02:00
Radim Krčmář	03d2249ea6	KVM: x86: use MDA for interrupt matching In mixed modes, we musn't deliver xAPIC IPIs like x2APIC and vice versa. Instead of preserving the information in apic_send_ipi(), we regain it by converting all destinations into correct MDA in the slow path. This allows easier reasoning about subsequent matching. Our kvm_apic_broadcast() had an interesting design decision: it didn't consider IOxAPIC 0xff as broadcast in x2APIC mode ... everything worked because IOxAPIC can't set that in physical mode and logical mode considered it as a message for first 8 VCPUs. This patch interprets IOxAPIC 0xff as x2APIC broadcast. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-2-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:59 +02:00
Paolo Bonzini	bf0fb67cf9	KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVHQ0kAAoJECPQ0LrRPXpDHKQQALjw6STaZd7n20OFopNgHd4P qVeWYEKBxnsiSvL4p3IOSlZlEul+7x08aZqtyxWQRQcDT4ggTI+3FKKfc+8yeRpH WV6YJP0bGqz7039PyMLuIgs48xkSZtntePw69hPJfHZh4C1RBlP5T2SfE8mU8VZX fWToiU3W12QfKnmN7JFgxZopnGhrYCrG0EexdTDziAZu0GEMlDrO4wnyTR60WCvT 4TEF73R0kpAz4yplKuhcDHuxIG7VFhQ4z7b09M1JtR0gQ3wUvfbD3Wqqi49SwHkv NQOStcyLsIlDosSRcLXNCwb3IxjObXTBcAxnzgm2Aoc1xMMZX1ZPQNNs6zFZzycb 2c6QMiQ35zm7ellbvrG+bT+BP86JYWcAtHjWcaUFgqSJjb8MtqcMtsCea/DURhqx /kictqbPYBBwKW6SKbkNkisz59hPkuQnv35fuf992MRCbT9LAXLPRLbcirucCzkE p1MOotsWoO3ldJMZaVn0KYk3sQf6mCIfbYPEdOcw3fhJlvyy3NdjVkLOFbA5UUg1 rQ7Ru2rTemBc0ExVrymngNTMpMB4XcEeJzXfhcgMl3DWbDj60Ku/O26sDtZ6bsFv JuDYn8FVDHz9gpEQHgiUi1YMsBKXLhnILa1ppaa6AflykU3BRfYjAk1SXmX84nQK mJUJEdFuxi6pHN0UKxUI =avA4 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups	2015-04-07 18:09:20 +02:00
Nikolay Nikolaev	e32edf4fd0	KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:11 +00:00
Radim Krčmář	f563db4bdb	KVM: SVM: fix interrupt injection (apic->isr_count always 0) In commit `b4eef9b36d`, we started to use hwapic_isr_update() != NULL instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not change apic->isr_count, because it should always be 1. The initial value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm), which is always 0 for SVM, so KVM could have injected interrupts when it shouldn't. Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the initial isr_count depend on hwapic_isr_update() for good measure. Fixes: `b4eef9b36d` ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv") Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-02 19:04:40 -03:00
Linus Torvalds	b9085bcbf5	Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: the highlights are support for GICv3 emulation and dirty page tracking s390: several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. ARM has other conflicts where functions are added in the same place by 3.19-rc and 3.20 patches. These are not large though, and entirely within KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak= =7gdx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...	2015-02-13 09:55:09 -08:00
Wincy Van	705699a139	KVM: nVMX: Enable nested posted interrupt processing If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:15:08 +01:00
Marcelo Tosatti	f933986038	KVM: x86: fix lapic_timer_int_injected with APIC-v With APICv, LAPIC timer interrupt is always delivered via IRR: apic_find_highest_irr syncs PIR to IRR. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:25 +01:00
Radim Krčmář	df04d1d191	KVM: x86: check LAPIC presence when building apic_map We forgot to re-check LAPIC after splitting the loop in commit `173beedc16` (KVM: x86: Software disabled APIC should still deliver NMIs, 2014-11-02). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: `173beedc16` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:28:31 +01:00
Radim Krčmář	8a395363e2	KVM: x86: fix x2apic logical address matching We cannot hit the bug now, but future patches will expose this path. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	3697f302ab	KVM: x86: replace 0 with APIC_DEST_PHYSICAL To make the code self-documenting. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	9368b56762	KVM: x86: cleanup kvm_apic_match_*() The majority of this patch turns result = 0; if (CODE) result = 1; return result; into return CODE; because we return bool now. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Radim Krčmář	52c233a440	KVM: x86: return bool from kvm_apic_match*() And don't export the internal ones while at it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Nicholas Krause	bab5bb3982	kvm: x86: Remove kvm_make_request from lapic.c Adds a function kvm_vcpu_set_pending_timer instead of calling kvm_make_request in lapic.c. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:48:08 +01:00
Marcelo Tosatti	6c19b7538f	KVM: x86: add tracepoint to wait_lapic_expire Add tracepoint to wait_lapic_expire. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> [Remind reader if early or late. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:48:01 +01:00
Marcelo Tosatti	d0659d946b	KVM: x86: add option to advance tscdeadline hrtimer expiration For the hrtimer which emulates the tscdeadline timer in the guest, add an option to advance expiration, and busy spin on VM-entry waiting for the actual expiration time to elapse. This allows achieving low latencies in cyclictest (or any scenario which requires strict timing regarding timer expiration). Reduces average cyclictest latency from 12us to 8us on Core i5 desktop. Note: this option requires tuning to find the appropriate value for a particular hardware/guest combination. One method is to measure the average delay between apic_timer_fn and VM-entry. Another method is to start with 1000ns, and increase the value in say 500ns increments until avg cyclictest numbers stop decreasing. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:47:30 +01:00
Tiejun Chen	b4eef9b36d	kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv In most cases calling hwapic_isr_update(), we always check if kvm_apic_vid_enabled() == 1, but actually, kvm_apic_vid_enabled() -> kvm_x86_ops->vm_has_apicv() -> vmx_vm_has_apicv() or '0' in svm case -> return enable_apicv && irqchip_in_kernel(kvm) So its a little cost to recall vmx_vm_has_apicv() inside hwapic_isr_update(), here just NULL out hwapic_isr_update() in case of !enable_apicv inside hardware_setup() then make all related stuffs follow this. Note we don't check this under that condition of irqchip_in_kernel() since we should make sure definitely any caller don't work without in-kernel irqchip. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:17 +01:00
Radim Krčmář	45c3094a64	KVM: x86: allow 256 logical x2APICs again While fixing an x2apic bug, `17d68b7` KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) we've made only one cluster available. This means that the amount of logically addressible x2APICs was reduced to 16 and VCPUs kept overwriting themselves in that region, so even the first cluster wasn't set up correctly. This patch extends x2APIC support back to the logical_map's limit, and keeps the CVE fixed as messages for non-present APICs are dropped. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:08 +01:00
Radim Krčmář	25995e5b4a	KVM: x86: check bounds of APIC maps They can't be violated now, but play it safe for the future. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:08 +01:00
Radim Krčmář	fa834e9197	KVM: x86: fix APIC physical destination wrapping x2apic allows destinations > 0xff and we don't want them delivered to lower APICs. They are correctly handled by doing nothing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:07 +01:00
Radim Krčmář	085563fb04	KVM: x86: deliver phys lowest-prio Physical mode can't address more than one APIC, but lowest-prio is allowed, so we just reuse our paths. SDM 10.6.2.1 Physical Destination: Also, for any non-broadcast IPI or I/O subsystem initiated interrupt with lowest priority delivery mode, software must ensure that APICs defined in the interrupt address are present and enabled to receive interrupts. We could warn on top of that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:06 +01:00
Radim Krčmář	698f9755d9	KVM: x86: don't retry hopeless APIC delivery False from kvm_irq_delivery_to_apic_fast() means that we don't handle it in the fast path, but we still return false in cases that were perfectly handled, fix that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:06 +01:00
Radim Krčmář	decdc28382	KVM: x86: use MSR_ICR instead of a number 0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:05 +01:00
Nadav Amit	c69d3d9bc1	KVM: x86: Fix reserved x2apic registers x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC Register Address Space"). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:05 +01:00
Paolo Bonzini	2b4a273b42	kvm: x86: avoid warning about potential shift wrapping bug cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Nadav Amit	f210f7572b	KVM: x86: Fix lost interrupt on irr_pending race apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost. commit `56cc2406d6` ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so irr_pending is cleared after setting APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR vector set, and irr_pending cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic_clear_vector(...); vec = apic_search_irr(apic); // => vec == -1 apic_set_vector(...); apic->irr_pending = (vec != -1); // => apic->irr_pending == false Nonetheless, it appears the race might even occur prior to this commit: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic->irr_pending = false; apic_clear_vector(...); if (apic_search_irr(apic) != -1) apic->irr_pending = true; // => apic->irr_pending == false apic_set_vector(...); Fixing this issue by: 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending. 2. On apic_set_irr: first call apic_set_vector, then set irr_pending. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:20 +01:00
Paolo Bonzini	a3e339e1ce	KVM: compute correct map even if all APICs are software disabled Logical destination mode can be used to send NMI IPIs even when all APICs are software disabled, so if all APICs are software disabled we should still look at the DFRs. So the DFRs should all be the same, even if some or all APICs are software disabled. However, the SDM does not say this, so tweak the logic as follows: - if one APIC is enabled and has LDR != 0, use that one to build the map. This picks the right DFR in case an OS is only setting it for the software-enabled APICs, or in case an OS is using logical addressing on some APICs while leaving the rest in reset state (using LDR was suggested by Radim). - if all APICs are disabled, pick a random one to build the map. We use the last one with LDR != 0 for simplicity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:19 +01:00
Nadav Amit	173beedc16	KVM: x86: Software disabled APIC should still deliver NMIs Currently, the APIC logical map does not consider VCPUs whose local-apic is software-disabled. However, NMIs, INIT, etc. should still be delivered to such VCPUs. Therefore, the APIC mode should first be determined, and then the map, considering all VCPUs should be constructed. To address this issue, first find the APIC mode, and only then construct the logical map. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:19 +01:00
Nadav Amit	db324fe6f2	KVM: x86: Warn on APIC base relocation APIC base relocation is unsupported by KVM. If anyone uses it, the least should be to report a warning in the hypervisor. Note that KVM-unit-tests uses this feature for some reason, so running the tests triggers the warning. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:51 +01:00
Wei Wang	4114c27d45	KVM: x86: reset RVI upon system reset A bug was reported as follows: when running Windows 7 32-bit guests on qemu-kvm, sometimes the guests run into blue screen during reboot. The problem was that a guest's RVI was not cleared when it rebooted. This patch has fixed the problem. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Tested-by: Rongrong Liu <rongrongx.liu@intel.com>, Da Chun <ngugc@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:00 +01:00
Radim Krčmář	f30ebc312c	KVM: x86: optimize some accesses to LVTT and SPIV We mirror a subset of these registers in separate variables. Using them directly should be faster. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:32 +01:00
Radim Krčmář	a323b40982	KVM: x86: detect LVTT changes under APICv APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This means that apic_reg_write cannot use kvm_apic_get_reg to omit timer cancelation when mode changes. timer_mode_mask shouldn't be changing as it depends on cpuid. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:32 +01:00
Radim Krčmář	e462755cae	KVM: x86: detect SPIV changes under APICv APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This caused a bug if you used KVM_SET_IRQCHIP to set the SW-enabled bit in the SPIV register. The chain of events is as follows: * When the irqchip is added to the destination VM, the apic_sw_disabled static key is incremented (1) * When the KVM_SET_IRQCHIP ioctl is invoked, it is decremented (0) * When the guest disables the bit in the SPIV register, e.g. as part of shutdown, apic_set_spiv does not notice the change and the static key is _not_ incremented. * When the guest is destroyed, the static key is decremented (-1), resulting in this trace: WARNING: at kernel/jump_label.c:81 __static_key_slow_dec+0xa6/0xb0() jump label: negative count! [<ffffffff816bf898>] dump_stack+0x19/0x1b [<ffffffff8107c6f1>] warn_slowpath_common+0x61/0x80 [<ffffffff8107c76c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff811931e6>] __static_key_slow_dec+0xa6/0xb0 [<ffffffff81193226>] static_key_slow_dec_deferred+0x16/0x20 [<ffffffffa0637698>] kvm_free_lapic+0x88/0xa0 [kvm] [<ffffffffa061c63e>] kvm_arch_vcpu_uninit+0x2e/0xe0 [kvm] [<ffffffffa05ff301>] kvm_vcpu_uninit+0x21/0x40 [kvm] [<ffffffffa067cec7>] vmx_free_vcpu+0x47/0x70 [kvm_intel] [<ffffffffa061bc50>] kvm_arch_vcpu_free+0x50/0x60 [kvm] [<ffffffffa061ca22>] kvm_arch_destroy_vm+0x102/0x260 [kvm] [<ffffffff810b68fd>] ? synchronize_srcu+0x1d/0x20 [<ffffffffa06030d1>] kvm_put_kvm+0xe1/0x1c0 [kvm] [<ffffffffa06036f8>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff81215c62>] __fput+0x102/0x310 [<ffffffff81215f4e>] ____fput+0xe/0x10 [<ffffffff810ab664>] task_work_run+0xb4/0xe0 [<ffffffff81083944>] do_exit+0x304/0xc60 [<ffffffff816c8dfc>] ? _raw_spin_unlock_irq+0x2c/0x50 [<ffffffff810fd22d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [<ffffffff8108432c>] do_group_exit+0x4c/0xc0 [<ffffffff810843b4>] SyS_exit_group+0x14/0x20 [<ffffffff816d33a9>] system_call_fastpath+0x16/0x1b Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:31 +01:00
Radim Krčmář	1e0ad70cc1	KVM: x86: fix deadline tsc interrupt injection The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a situation where we lose a pending deadline timer in a MSR write. Losing it is fine, because it effectively occurs before the timer fired, so we should be able to cancel or postpone it. Another problem comes from interaction with QEMU, or other userspace that can set deadline MSR without a good reason, when timer is already pending: one guest's deadline request results in more than one interrupt because one is injected immediately on MSR write from userspace and one through hrtimer later. The solution is to remove the injection when replacing a pending timer and to improve the usual QEMU path, we inject without a hrtimer when the deadline has already passed. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:28 +01:00
Radim Krčmář	5d87db7119	KVM: x86: add apic_timer_expired() Make the code reusable. If the timer was already pending, we shouldn't be waiting in a queue, so wake_up can be skipped, simplifying the path. There is no 'reinject' case => the comment is removed. Current race behaves correctly. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:27 +01:00
Nadav Amit	394457a928	KVM: x86: some apic broadcast modes does not work KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes." In addition, the local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all destination bits to one." This patch enables cluster mode broadcast. The fix tries to combine broadcast in different modes through a unified code. One rare case occurs when the source of IPI has its APIC disabled. In such case, the source can still issue IPIs, but since the source is not obliged to have the same LAPIC mode as the enabled ones, we cannot rely on it. Since it is a rare case, it is unoptimized and done on the slow-path. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from kvm_apic_broadcast. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:22 +01:00
Paolo Bonzini	a183b638b6	KVM: x86: make apic_accept_irq tracepoint more generic Initially the tracepoint was added only to the APIC_DM_FIXED case, also because it reported coalesced interrupts that only made sense for that case. However, the coalesced argument is not used anymore and tracing other delivery modes is useful, so hoist the call out of the switch statement. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-11 11:51:02 +02:00
Nadav Amit	1e1b6c2644	KVM: x86: recalculate_apic_map after enabling apic Currently, recalculate_apic_map ignores vcpus whose lapic is software disabled through the spurious interrupt vector. However, once it is re-enabled, the map is not recalculated. Therefore, if the guest OS configured DFR while lapic is software-disabled, the map may be incorrect. This patch recalculates apic map after software enabling the lapic. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:29 +02:00
Nadav Amit	fae0ba2157	KVM: x86: Clear apic tsc-deadline after deadline Intel SDM 10.5.4.1 says "When the timer generates an interrupt, it disarms itself and clears the IA32_TSC_DEADLINE MSR". This patch clears the MSR upon timer interrupt delivery which delivered on deadline mode. Since the MSR may be reconfigured while an interrupt is pending, causing the new value to be overriden, pending timer interrupts are checked before setting a new deadline. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:29 +02:00
Wanpeng Li	56cc2406d6	KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use After commit `77b0f5d` (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to), "Acknowledge interrupt on exit" behavior can be emulated. To do so, KVM will ask the APIC for the interrupt vector if during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv, kvm_get_apic_interrupt would return -1 and give the following WARNING: Call Trace: [<ffffffff81493563>] dump_stack+0x49/0x5e [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96 [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17 [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel] [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm] [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel] [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm] [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm] To fix this, we cannot rely on the processor's virtual interrupt delivery, because "acknowledge interrupt on exit" must only update the virtual ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR) but it should not deliver the interrupt through the IDT. Thus, KVM has to deliver the interrupt "by hand", similar to the treatment of EOI in commit `fc57ac2c9c` (KVM: lapic: sync highest ISR to hardware apic on EOI, 2014-05-14). The patch modifies kvm_cpu_get_interrupt to always acknowledge an interrupt; there are only two callers, and the other is not affected because it is never reached with kvm_apic_vid_enabled() == true. Then it modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition to the registers. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Tested-by: Liu, RongrongX <rongrongx.liu@intel.com> Tested-by: Felipe Reyes <freyes@suse.com> Fixes: `77b0f5d67f` Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-05 15:00:24 +02:00
Nadav Amit	98eff52ab5	KVM: x86: Fix lapic.c debug prints In two cases lapic.c does not use the apic_debug macro correctly. This patch fixes them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-09 18:09:57 +02:00
Paolo Bonzini	fc57ac2c9c	KVM: lapic: sync highest ISR to hardware apic on EOI When Hyper-V enlightenments are in effect, Windows prefers to issue an Hyper-V MSR write to issue an EOI rather than an x2apic MSR write. The Hyper-V MSR write is not handled by the processor, and besides being slower, this also causes bugs with APIC virtualization. The reason is that on EOI the processor will modify the highest in-service interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of the SDM; every other step in EOI virtualization is already done by apic_send_eoi or on VM entry, but this one is missing. We need to do the same, and be careful not to muck with the isr_count and highest_isr_cache fields that are unused when virtual interrupt delivery is enabled. Cc: stable@vger.kernel.org Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-27 10:21:09 +02:00
Linus Torvalds	7ebd3faa9b	First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS3TVKAAoJEBvWZb6bTYbyIFgP/2cmt4ifCuFMaZv4+G1S8jZU uC9ZB/+7vzht/p6zAy+4BxurKbHmSBFkC1OKcxYuy7yB4CQkHabzj4V2vRtqFdwH 5lExP9qh3kqaVLuhnvxLTmkktR3EW4PFy6OI53l5kRNktOXSuZ0aN6K3V7tCg/X0 iL7ASo4bJKlxeWcDpmuVrNgAajmZVfXrjKY7robgBQno+yIsgKhRZRBQHjozA6B8 FpCo/k48RZd/EzIbV/PDDRI4hmmry/lgrO9SKjzq56wSqff2bd/k/KYze4dbAPfd Ps60enPTuHmeEjjb4MMMU4EKHVdTQFUMx/xZCmT4xzoh8s4of6RHphXbfE0SUznQ dTveyEQAR7E3JNS0k1+3WEX5fWlFesp0hO2NeE0wzUq4TAr9ztgVO9NQ6Si15e7Z 2HysO0T5Ojtt0lY08/PvS6i48eCAuuBomrejJS8hLW4SUZ5adn+yW4Qo7Fp9JeBR l9a3LsVT8BZMtUWrUuFcVhlM4MbzElUPjDbgWhR8UYU/kpfVZOQu8qWgGKR4UWXy X7/t9l/tjR99CmfMJBAOzJid+ScSpAfg77BdaKiQrVfVIJmsjEjlO8vUMyj5b1HF hPX5wNyJjHAOfridLeHSs4Rdm4a8sk8Az5d4h76pLVz8M4jyTi2v0rO3N4/dU/pu x7N8KR5hAj+mLBoM9/Al =8sYU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Andrew Jones	0dce7cd67f	kvm: x86: fix apic_base enable check Commit `e66d2ae7c6` moved the assignment vcpu->arch.apic_base = value above a condition with (vcpu->arch.apic_base ^ value), causing that check to always fail. Use old_value, vcpu->arch.apic_base's old value, in the condition instead. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 13:42:14 +01:00
Marcelo Tosatti	9ed96e87c5	KVM: x86: limit PIT timer frequency Limit PIT timer frequency similarly to the limit applied by LAPIC timer. Cc: stable@kernel.org Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 12:43:54 +01:00
Chen Fan	96893977b8	KVM: x86: Fix debug typo error in lapic fix the 'vcpi' typos when apic_debug is enabled. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:09:57 -02:00
Jan Kiszka	e66d2ae7c6	KVM: x86: Fix APIC map calculation after re-enabling Update arch.apic_base before triggering recalculate_apic_map. Otherwise the recalculation will work against the previous state of the APIC and will fail to build the correct map when an APIC is hardware-enabled again. This fixes a regression of `1e08ec4a13`. Cc: stable@vger.kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-12-30 18:58:17 -02:00
Gleb Natapov	17d68b763f	KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request an IPI, while in x2apic mode the following things happen, the destination is read from ICR2, which is a register that the guest can control. kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the cluster id. A BUG_ON is triggered, which is a protection against accessing map->logical_map with an out-of-bounds access and manages to avoid that anything really unsafe occurs. The logic in the code is correct from real HW point of view. The problem is that KVM supports only one cluster with ID 0 in clustered mode, but the code that has the bug does not take this into account. Reported-by: Lars Bull <larsbull@google.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:46:18 +01:00
Andy Honig	fda4e2e855	KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: `b93463aa59` ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:46 +01:00
Andy Honig	b963a22e6d	KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:45 +01:00

1 2 3 4

173 Коммитов