Граф коммитов

56 Коммитов

Автор SHA1 Сообщение Дата
Dor Laor 9b22bf5783 KVM: Fix guest register corruption on paravirt hypercall
The hypercall code mixes up the ->cache_regs() and ->decache_regs()
callbacks, resulting in guest register corruption.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:22 +03:00
Avi Kivity ca45aaae1e KVM: Unset kvm_arch_ops if arch module loading failed
Otherwise, the core module thinks the arch module is loaded, and won't
let you reload it after you've fixed the bug.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-18 10:49:06 +02:00
Andrew Morton e9cdb1e330 KVM: Move kvmfs magic number to <linux/magic.h>
Use the standard magic.h for kvmfs.

Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:43 +02:00
Avi Kivity 58e690e6fd KVM: Fix bogus failure in kvm.ko module initialization
A bogus 'return r' can cause an otherwise successful module load to fail.
This both denies users the use of kvm, and it also denies them the use of
their machine, as it leaves a filesystem registered with its callbacks
pointing into now-freed module memory.

Fix by returning a zero like a good module.

Thanks to Richard Lucassen <mailinglists@lucassen.org> (?) for reporting
the problem and for providing access to a machine which exhibited it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:43 +02:00
Uri Lublin ff990d5952 KVM: Remove write access permissions when dirty-page-logging is enabled
Enabling dirty page logging is done using KVM_SET_MEMORY_REGION ioctl.
If the memory region already exists, we need to remove write accesses,
so writes will be caught, and dirty pages will be logged.

Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:43 +02:00
Uri Lublin 02b27c1f80 kvm: move do_remove_write_access() up
To be called from kvm_vm_ioctl_set_memory_region()

Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:43 +02:00
Uri Lublin cd1a4a982a KVM: Fix dirty page log bitmap size/access calculation
Since dirty_bitmap is an unsigned long array, the alignment and size need
to take that into account.

Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Uri Lublin ab51a434c5 KVM: Add missing calls to mark_page_dirty()
A few places where we modify guest memory fail to call mark_page_dirty(),
causing live migration to fail.  This adds the missing calls.

Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Avi Kivity bccf2150fe KVM: Per-vcpu inodes
Allocate a distinct inode for every vcpu in a VM.  This has the following
benefits:

 - the filp cachelines are no longer bounced when f_count is incremented on
   every ioctl()
 - the API and internal code are distinctly clearer; for example, on the
   KVM_GET_REGS ioctl, there is no need to copy the vcpu number from
   userspace and then copy the registers back; the vcpu identity is derived
   from the fd used to make the call

Right now the performance benefits are completely theoretical since (a) we
don't support more than one vcpu per VM and (b) virtualization hardware
inefficiencies completely everwhelm any cacheline bouncing effects.  But
both of these will change, and we need to prepare the API today.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Avi Kivity c5ea766006 KVM: Move kvm_vm_ioctl_create_vcpu() around
In preparation of some hacking.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Avi Kivity 2c6f5df979 KVM: Rename some kvm_dev_ioctl_*() functions to kvm_vm_ioctl_*()
This reflects the changed scope, from device-wide to single vm (previously
every device open created a virtual machine).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Avi Kivity f17abe9a44 KVM: Create an inode per virtual machine
This avoids having filp->f_op and the corresponding inode->i_fop different,
which is a little unorthodox.

The ioctl list is split into two: global kvm ioctls and per-vm ioctls.  A new
ioctl, KVM_CREATE_VM, is used to create VMs and return the VM fd.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:42 +02:00
Avi Kivity 37e29d906c KVM: Add internal filesystem for generating inodes
The kvmfs inodes will represent virtual machines and vcpus, as necessary,
reducing cacheline bouncing due to inodes and filps being shared.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:41 +02:00
Avi Kivity 19d1408dfd KVM: More 0 -> NULL conversions
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:41 +02:00
Avi Kivity 270fd9b96f KVM: Wire up hypercall handlers to a central arch-independent location
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:41 +02:00
Ingo Molnar 102d8325a1 KVM: add MSR based hypercall API
This adds a special MSR based hypercall API to KVM. This is to be
used by paravirtual kernels and virtual drivers.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:40 +02:00
Markus Rechberger 5972e9535e KVM: Use page_private()/set_page_private() apis
Besides using an established api, this allows using kvm in older kernels.

Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:39 +02:00
Avi Kivity d27d4aca18 KVM: Cosmetics
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:39 +02:00
Jeremy Katz 43934a38d7 KVM: Move virtualization deactivation from CPU_DEAD state to CPU_DOWN_PREPARE
This gives it more chances of surviving suspend.

Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04 11:12:39 +02:00
Avi Kivity 59ae6c6b87 [PATCH] KVM: Host suspend/resume support
Add the necessary callbacks to suspend and resume a host running kvm.  This is
just a repeat of the cpu hotplug/unplug work.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:41 -08:00
Avi Kivity 774c47f1d7 [PATCH] KVM: cpu hotplug support
On hotplug, we execute the hardware extension enable sequence.  On unplug, we
decache any vcpus that last ran on the exiting cpu, and execute the hardware
extension disable sequence.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:41 -08:00
Avi Kivity 133de9021d [PATCH] KVM: Add a global list of all virtual machines
This will allow us to iterate over all vcpus and see which cpus they are
running on.

[akpm@osdl.org: use standard (ugly) initialisers]
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Ingo Molnar 1e8ba6fba5 [PATCH] kvm: fix vcpu freeing bug
vcpu_load() can return NULL and it sometimes does in failure paths (for
example when the userspace ABI version is too old) - causing a preemption
count underflow in the ->vcpu_free() later on.  So check for NULL.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Dor Laor 54810342f1 [PATCH] kvm: Two-way apic tpr synchronization
We report the value of cr8 to userspace on an exit.  Also let userspace change
cr8 when we re-enter the guest.  The lets 64-bit guest code maintain the tpr
correctly.

Thanks for Yaniv Kamay for the idea.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Al Viro 8b6d44c7bd [PATCH] kvm: NULL noise removal
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Al Viro 2f36698799 [PATCH] kvm: __user annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Avi Kivity 6f00e68f21 [PATCH] KVM: Emulate IA32_MISC_ENABLE msr
This allows netbsd 3.1 i386 to get further along installing.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:57 -08:00
Avi Kivity 084384754e [PATCH] KVM: make sure there is a vcpu context loaded when destroying the mmu
This makes the vmwrite errors on vm shutdown go away.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:06 -08:00
Ingo Molnar d21225ee2b [PATCH] KVM: Make loading cr3 more robust
Prevent the guest's loading of a corrupt cr3 (pointing at no guest phsyical
page) from crashing the host.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:28 -08:00
Avi Kivity cc1d8955cb [PATCH] KVM: Add missing 'break'
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:28 -08:00
Avi Kivity 86a2b42e81 [PATCH] KVM: Initialize vcpu->kvm a little earlier
Fixes oops on early close of /dev/kvm.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:28 -08:00
Avi Kivity 9ede74e0af [PATCH] KVM: MMU: Destroy mmu while we still have a vcpu left
mmu_destroy flushes the guest tlb (indirectly), which needs a valid vcpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:27 -08:00
Avi Kivity 714b93da1a [PATCH] KVM: MMU: Replace atomic allocations by preallocated objects
The mmu sometimes needs memory for reverse mapping and parent pte chains.
however, we can't allocate from within the mmu because of the atomic context.

So, move the allocations to a central place that can be executed before the
main mmu machinery, where we can bail out on failure before any damage is
done.

(error handling is deffered for now, but the basic structure is there)

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:27 -08:00
Avi Kivity 32b3562735 [PATCH] KVM: MMU: Fix cmpxchg8b emulation
cmpxchg8b uses edx:eax as the compare operand, not edi:eax.

cmpxchg8b is used by 32-bit pae guests to set page table entries atomically,
and this is emulated touching shadowed guest page tables.

Also, implement it for 32-bit hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:26 -08:00
Avi Kivity 5f015a5b28 [PATCH] KVM: MMU: Remove invlpg interception
Since we write protect shadowed guest page tables, there is no need to trap
page invalidations (the guest will always change the mapping before issuing
the invlpg instruction).

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:25 -08:00
Avi Kivity a436036baf [PATCH] KVM: MMU: If emulating an instruction fails, try unprotecting the page
A page table may have been recycled into a regular page, and so any
instruction can be executed on it.  Unprotect the page and let the cpu do its
thing.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:25 -08:00
Avi Kivity da4a00f002 [PATCH] KVM: MMU: Support emulated writes into RAM
As the mmu write protects guest page table, we emulate those writes.  Since
they are not mmio, there is no need to go to userspace to perform them.

So, perform the writes in the kernel if possible, and notify the mmu about
them so it can take the approriate action.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:25 -08:00
Avi Kivity 1b0973bd8f [PATCH] KVM: MMU: Use the guest pdptrs instead of mapping cr3 in pae mode
This lets us not write protect a partial page, and is anyway what a real
processor does.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:24 -08:00
Avi Kivity 1342d3536d [PATCH] KVM: MMU: Load the pae pdptrs on cr3 change like the processor does
In pae mode, a load of cr3 loads the four third-level page table entries in
addition to cr3 itself.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:24 -08:00
Avi Kivity cd4a4e5374 [PATCH] KVM: MMU: Implement simple reverse mapping
Keep in each host page frame's page->private a pointer to the shadow pte which
maps it.  If there are multiple shadow ptes mapping the page, set bit 0 of
page->private, and use the rest as a pointer to a linked list of all such
mappings.

Reverse mappings are needed because we when we cache shadow page tables, we
must protect the guest page tables from being modified by the guest, as that
would invalidate the cached ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:24 -08:00
Avi Kivity 399badf315 [PATCH] KVM: Prevent stale bits in cr0 and cr4
Hardware virtualization implementations allow the guests to freely change some
of the bits in cr0 and cr4, but trap when changing the other bits.  This is
useful to avoid excessive exits due to changing, for example, the ts flag.

It also means the kvm's copy of cr0 and cr4 may be stale with respect to these
bits.  most of the time this doesn't matter as these bits are not very
interesting.  Other times, however (for example when returning cr0 to
userspace), they are, so get the fresh contents of these bits from the guest
by means of a new arch operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:23 -08:00
Dor Laor c1150d8cf9 [PATCH] KVM: Improve interrupt response
The current interrupt injection mechanism might delay an interrupt under
the following circumstances:

 - if injection fails because the guest is not interruptible (rflags.IF clear,
   or after a 'mov ss' or 'sti' instruction).  Userspace can check rflags,
   but the other cases or not testable under the current API.
 - if injection fails because of a fault during delivery.  This probably
   never happens under normal guests.
 - if injection fails due to a physical interrupt causing a vmexit so that
   it can be handled by the host.

In all cases the guest proceeds without processing the interrupt, reducing
the interactive feel and interrupt throughput of the guest.

This patch fixes the situation by allowing userspace to request an exit
when the 'interrupt window' opens, so that it can re-inject the interrupt
at the right time.  Guest interactivity is very visibly improved.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:22 -08:00
Yoshimi Ichiyanagi e097f35ce5 [PATCH] KVM: Recover after an arch module load failure
If we load the wrong arch module, it leaves behind kvm_arch_ops set, which
prevents loading of the correct arch module later.

Fix be not setting kvm_arch_ops until we're sure it's good.

Signed-off-by: Yoshimi Ichiyanagi <ichiyanagi.yoshimi@lab.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:22 -08:00
Ingo Molnar 8018c27b26 [PATCH] kvm: fix GFP_KERNEL allocation in atomic section in kvm_dev_ioctl_create_vcpu()
fix an GFP_KERNEL allocation in atomic section: kvm_dev_ioctl_create_vcpu()
called kvm_mmu_init(), which calls alloc_pages(), while holding the vcpu.

The fix is to set up the MMU state in two phases: kvm_mmu_create() and
kvm_mmu_setup().

(NOTE: free_vcpus does an kvm_mmu_destroy() call so there's no need for any
extra teardown branch on allocation/init failure here.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity 55a54f79e0 [PATCH] KVM: Fix oops on oom
__free_page() doesn't like a NULL argument, so check before calling it.  A
NULL can only happen if memory is exhausted during allocation of a memory
slot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity a8d13ea28b [PATCH] KVM: More msr misery
These msrs are referenced by benchmarking software when pretending to be an
Intel cpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity 3bab1f5dda [PATCH] KVM: Move common msr handling to arch independent code
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Yoshimi Ichiyanagi 09db28b8a3 [PATCH] KVM: Initialize kvm_arch_ops on unload
The latest version of kvm doesn't initialize kvm_arch_ops in kvm_init(), which
causes an error with the following sequence.

1. Load the supported arch's module.
2. Load the unsupported arch's module.$B!!(B(loading error)
3. Unload the unsupported arch's module.

You'll get the following error message after step 3.  "BUG: unable to handle
to handle kernel paging request at virtual address xxxxxxxx"

The problem here is that the unsupported arch's module overwrites kvm_arch_ops
of the supported arch's module at step 2.

This patch initializes kvm_arch_ops upon loading architecture specific kvm
module, and prevents overwriting kvm_arch_ops when kvm_arch_ops is already set
correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity a9058ecd3c [PATCH] KVM: Simplify is_long_mode()
Instead of doing tricky stuff with the arch dependent virtualization
registers, take a peek at the guest's efer.

This simlifies some code, and fixes some confusion in the mmu branch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity 0b76e20b27 [PATCH] KVM: API versioning
Add compile-time and run-time API versioning.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:46 -08:00