The KVM emulator code assumes that the guest virtual address space (in 64-bit)
is 48-bits wide. Fail the KVM_SET_CPUID and KVM_SET_CPUID2 ioctl if
userspace tries to create a guest that does not obey this restriction.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
/me got confused between the kernel and QEMU. In the kernel, you can
only have one module_init function, and it will prevent unloading the
module unless you also have the corresponding module_exit function.
So, commit 80ce163972 (KVM: VFIO: register kvm_device_ops dynamically,
2014-09-02) broke unloading of the kvm module, by adding a module_init
function and no module_exit.
Repair it by making kvm_vfio_ops_init weak, and checking it in
kvm_init.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Alex Williamson <Alex.Williamson@redhat.com>
Fixes: 80ce163972
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit c77dcac (KVM: Move more code under CONFIG_HAVE_KVM_IRQFD) added
functionality that depends on definitions in ioapic.h when
__KVM_HAVE_IOAPIC is defined.
At the same time, kvm-arm commit 0ba0951 (KVM: EVENTFD: remove inclusion
of irq.h) removed the inclusion of irq.h, an architecture-specific header
that is not present on ARM but which happened to include ioapic.h on x86.
Include ioapic.h directly in eventfd.c if __KVM_HAVE_IOAPIC is defined.
This fixes x86 and lets ARM use eventfd.c.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and ret is redundant.
This patch removes the redundant variable, and makes init_rmode_identity_map()
return 0 on success, -errno on failure.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.
In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized
Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.
NOTE: In the original code, ept identity pagetable page is pinned in memroy.
As a result, it cannot be migrated/hot-removed. After this patch, since
kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
is no longer pinned in memory. And it can be migrated/hot-removed.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a dynamic means to register kvm_device_ops, use that
for the VFIO kvm device, instead of relying on the static table.
This is achieved by a module_init call to register the ops with KVM.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <Alex.Williamson@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using the new kvm_register_device_ops() interface makes us get rid of
an #ifdef in common code.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a dynamic means to register kvm_device_ops, use that
for the ARM VGIC, instead of relying on the static table.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_ioctl_create_device currently has knowledge of all the device types
and their associated ops. This is fairly inflexible when adding support
for new in-kernel device emulations, so move what we currently have out
into a table, which can support dynamic registration of ops by new
drivers for virtual hardware.
Cc: Alex Williamson <Alex.Williamson@redhat.com>
Cc: Alex Graf <agraf@suse.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, we call ioapic_service() immediately when we find the irq is still
active during eoi broadcast. But for real hardware, there's some delay between
the EOI writing and irq delivery. If we do not emulate this behavior, and
re-inject the interrupt immediately after the guest sends an EOI and re-enables
interrupts, a guest might spend all its time in the ISR if it has a broken
handler for a level-triggered interrupt.
Such livelock actually happens with Windows guests when resuming from
hibernation.
As there's no way to recognize the broken handle from new raised ones, this patch
delays an interrupt if 10.000 consecutive EOIs found that the interrupt was
still high. The guest can then make a little forward progress, until a proper
IRQ handler is set or until some detection routine in the guest (such as
Linux's note_interrupt()) recognizes the situation.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch replace the set_bit method by kvm_make_request
to make code more readable and consistent.
Signed-off-by: Guo Hui Liu <liuguohui@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Initially the tracepoint was added only to the APIC_DM_FIXED case,
also because it reported coalesced interrupts that only made sense
for that case. However, the coalesced argument is not used anymore
and tracing other delivery modes is useful, so hoist the call out
of the switch statement.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of
apic access page. So use this macro.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1. Crypto/CPACF support: To enable the MSA4 instructions we have to
provide a common control structure for each SIE control block
2. Two cleanups found by a static code checker: one redundant assignment
and one useless if
3. Fix the page handling of the diag10 ballooning interface. If the
guest freed the pages at absolute 0 some checks and frees were
incorrect
4. Limit guests to 16TB
5. Add __must_check to interrupt injection code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJUEFCCAAoJEBF7vIC1phx8HgYP/RQLmy5cD8EJCUnETCjnj9+E
5AhAtHVMAgjTuKdAIIrTbcdA3od8BjnYgfHi7zczG9XSOTA6Mq+suelTwkoWWQmU
xA4Q4/zuliU5w2LgFYDeSQIrn4faNrCut0eDd2ViQhKL2m8PYBA7O4CllsDo1o8l
B8q4cHTinXSU4fbppKaohDrdoNvSJVSiNeIy/doAxYLx/OKhlxmGd+dFXQYtUIvP
/aO31h5BhFkoljRi0AU11GC6yzIHqwJtjRjnV5k2BWmPi8kyeJe8mAbkE0+M10uI
OSVC34BHh7p+O9vNUZHKBzhZnqvyB4qc+s8Vp5wXwG0mWjWPkaevZgxOLvccOBi1
1SPKv4ic2aFdfcSiMrP+EEL45OS/ISely2o1bgNVWHxSdgsnMir5H3Frohn+pWaZ
y6Fw9jaXKTy5f25/IzTteENEvURQVfNjAHUQO03oXu/L+G8MVhzY6g7WiK+tRHDo
anM3fUThkPwBPdMVpDsBTsCoTOTVTJqMHvnxJujEWYJmVmnnzZzMyUDIUK9qqzVi
jZDpP8NEkV5LbWMr240DW1XteRGFKWtPZ176+NJUfut40jQWTiDLhBf1wG1xBRsF
40JYBqATgaLM1Ggk4r3ykd1U9jrX3oqxMT7XJsc111m5ieaaiD1F5SJPXw07Rvug
cSJI39G+9Df7Jz4R3hwv
=jt+E
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20140910' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
KVM: s390: Fixes and features for next (3.18)
1. Crypto/CPACF support: To enable the MSA4 instructions we have to
provide a common control structure for each SIE control block
2. Two cleanups found by a static code checker: one redundant assignment
and one useless if
3. Fix the page handling of the diag10 ballooning interface. If the
guest freed the pages at absolute 0 some checks and frees were
incorrect
4. Limit guests to 16TB
5. Add __must_check to interrupt injection code
The old handling of prefix pages was broken in the diag10 ballooner.
We now rely on gmap_discard to check for start > end and do a
slow path if the prefix swap pages are affected:
1. discard the pages from start to prefix
2. discard the absolute 0 pages
3. discard the pages after prefix swap to end
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Due to the earlier check we know that ipte_lock_count must be 0.
No need to add a useless if. Let's make clear that we are going
to always wakeup when we execute that code.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We must not fallthrough if the conditions for external call are not met.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Currently we fill up a full 5 level page table to hold the guest
mapping. Since commit "support gmap page tables with less than 5
levels" we can do better.
Having more than 4 TB might be useful for some testing scenarios,
so let's just limit ourselves to 16TB guest size.
Having more than that is totally untested as I do not have enough
swap space/memory.
We continue to allow ucontrol the full size.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
We now propagate interrupt injection errors back to the ioctl. We
should mark functions that might fail with __must_check.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
We have to provide a per guest crypto block for the CPUs to
enable MSA4 instructions. According to icainfo on z196 or
later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192
and CMAC-AES-256.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
__kvm_set_memory_region sets r to EINVAL very early.
Doing it again is not necessary. The same is true later on, where
r is assigned -ENOMEM twice.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The first statement of kvm_dev_ioctl is
long r = -EINVAL;
No need to reassign the same value.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The expression `vcpu->spin_loop.in_spin_loop' is always true,
because it is evaluated only when the condition
`!vcpu->spin_loop.in_spin_loop' is false.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault. To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.
Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a nested page fault happens during emulation, we will inject a vmexit,
not a page fault. However because writeback happens after the injection,
we will write ctxt->eip from L2 into the L1 EIP. We do not write back
if an instruction caused an interception vmexit---do the same for page
faults.
Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is similar to what the EPT code does with the exit qualification.
This allows the guest to see a valid value for bits 33:32.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
page table entries. Intel ignores it; AMD ignores it in PDEs, but reserves it
in PDPEs and PML4Es. The SVM test is relying on this behavior, so enforce it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:
(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.
(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.
(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.
This patch fixes the issue by using the memslot generation number
to validate the mmio cache.
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.
If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.
We can prevent the following case:
vcpu (CPU 0) | thread (CPU 1)
--------------------------------------------+--------------------------
1 vm exit |
2 srcu_read_unlock(&kvm->srcu) |
3 decide to cache something based on |
old memslots |
4 | change memslots
| (increments generation)
5 | synchronize_srcu(&kvm->srcu);
6 retrieve generation # from new memslots |
7 tag cache with new memslot generation |
8 srcu_read_unlock(&kvm->srcu) |
... |
<action based on cache occurs even |
though the caching decision was based |
on the old memslots> |
... |
<action *continues* to occur until next |
memslot generation change, which may |
be never> |
|
By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).
Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots. It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.
To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE. This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited. Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The next patch will give a meaning (a la seqcount) to the low bit of the
generation number. Ensure that it matches between kvm->memslots->generation
and kvm_current_mmio_generation().
Cc: stable@vger.kernel.org
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The check introduced in commit d7a2a246a1 (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19)
will break if the guest maxphyaddr is higher than the host's (which
sometimes happens depending on your hardware and how QEMU is
configured).
To fix this, use cpuid_maxphyaddr similar to how the APIC_BASE MSR
does already.
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
Remove unnecessary arguments that stem from this.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)
This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
e790d9ef6 KVM: add kvm_arch_sched_in
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).
Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD
exception in real-mode or vm86. However, the emulator considers all these
instructions the same for the matter of mode checks, and emulation upon exit
due to #UD exception.
As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH
or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these
instruction and inject #GP to the guest instead of #UD.
This patch creates a new group for these instructions and mark only VMCALL as
an instruction which can be emulated.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sparse reports the following easily fixed warnings:
arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer
arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static?
arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer
arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static?
arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static?
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411
TPR shadow/threshold feature is important to speed up the Windows guest.
Besides, it is a must feature for certain VMM.
We map virtual APIC page address and TPR threshold from L1 VMCS. If
TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested
in, we inject it into L1 VMM for handling.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
[Add PAGE_ALIGNED check, do not write useless virtual APIC page address
if TPR shadowing is disabled. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce function nested_get_vmcs12_pages() to check the valid
of nested apic access page and virtual apic page earlier.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that
userspace can, at run-time, determine if a feature is supported or not.
This allows KVM to being supporting a new feature with a new kernel
version without any need to update user space. Unfortunately, since the
definition of KVM_CAP_USER_NMI was guarded by #ifdef
__KVM_HAVE_USER_NMI, such discovery still required a user space update.
Therefore, unconditionally export KVM_CAP_USER_NMI and change the
the typo in the comment for the IOCTL number definition as well.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that
userspace can, at run-time, determine if a feature is supported or not.
This allows KVM to being supporting a new feature with a new kernel
version without any need to update user space. Unfortunately, since the
definition of KVM_CAP_READONLY_MEM was guarded by #ifdef
__KVM_HAVE_READONLY_MEM, such discovery still required a user space
update.
Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the
in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit ab3f285f22 ("KVM: s390/mm: try a cow on read only pages for
key ops")' misaligned a code block. Let's fixup the indentation.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1. The usual cleanups: get rid of duplicate code, use defines, factor
out the sync_reg handling, additional docs for sync_regs, better
error handling on interrupt injection
2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes
3. Additional registers for kvm_run sync regs. This is usually not
needed in the fast path due to eventfd/irqfd, but kvm stat claims
that we reduced the overhead of console output by ~50% on my system
4. A rework of the gmap infrastructure. This is the 2nd step towards
host large page support (after getting rid of the storage key
dependency). We introduces two radix trees to store the guest-to-host
and host-to-guest translations. This gets us rid of most of
the page-table walks in the gmap code. Only one in __gmap_link is left,
this one is required to link the shadow page table to the process page
table. Finally this contains the plumbing to support gmap page tables
with less than 5 levels.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJT/EEwAAoJEBF7vIC1phx8UOQQAKifhckkQi39bVlw75y6Is0u
YGeBp63zJqJ4mqZXUc3CL7GOKp9beps+uHj9KqlHIPJMg6oytWf+6fgr2kygQZFh
kMdlITphF5AezklAtocVu4LumRjslAhRkc6kWE2J21w9xUeSExzpavUc9kYmj8W8
81BJMcG0/xRiZSQ+GynRPn6tk9+zgIMEmUmBQoHXfWElBGNUhIJi9xSfoKOrlDho
on2PgDYSfnwfftKDaq7ttPA4ApHLxyiOpoWXnldy1SSiy1MdZpXNbKLEiuRf5g9R
2k3sJmvBxNfb3CRJuhyKAqvDbt+u+NLEktSJcky61H1R5J23oobfVBDsD2vum8Ah
ZJIwSM9H/Hi6FqJQCxywkyU1Vj+Wn7U2NYPrFLpi4apfSVi9uxXCFtWzV3WPSay4
mM87ZRd8mFQRz5DTTfK5VNAraNk3m7XTWzZyo8vQ350vk2xQDIw7X6PB+7YqAmu6
99ikYtAN/0fAUIXXx8ORbcjhPLJzEyfylJMQ2Dz+aGWS7wEdb7P+9xSAJfFzMbJQ
DWwlJVKZuc4OP3gsEyYQlB2EI+P2iJIIA5uyIPcEOPZeUHpZ1skD/ked0LHBZkwm
jj5eIcbjaiPbIcwaOlPyF5H68O/XPE7TnSSdJhpa6vp6uVMUZ0O9XWjJE+eeqKv2
X5j8yraVuyTJfEZ9RGQP
=KrOE
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20140825' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and features for 3.18 part 1
1. The usual cleanups: get rid of duplicate code, use defines, factor
out the sync_reg handling, additional docs for sync_regs, better
error handling on interrupt injection
2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes
3. Additional registers for kvm_run sync regs. This is usually not
needed in the fast path due to eventfd/irqfd, but kvm stat claims
that we reduced the overhead of console output by ~50% on my system
4. A rework of the gmap infrastructure. This is the 2nd step towards
host large page support (after getting rid of the storage key
dependency). We introduces two radix trees to store the guest-to-host
and host-to-guest translations. This gets us rid of most of
the page-table walks in the gmap code. Only one in __gmap_link is left,
this one is required to link the shadow page table to the process page
table. Finally this contains the plumbing to support gmap page tables
with less than 5 levels.
The radix tree rework removed all code that uses the gmap_rmap
and gmap_pgtable data structures. Remove these outdated definitions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add an addressing limit to the gmap address spaces and only allocate
the page table levels that are needed for the given limit. The limit
is fixed and can not be changed after a gmap has been created.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>