No need to record the gfn to verifier the pte has the same mode as
current vcpu, it's because we only speculatively update the pte only
if the pte and vcpu have the same mode
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by
kvm->slots_lock, so move it out of mmu spinlock
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Set spte accessed bit only if guest_initiated == 1 that means the really
accessed
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The hash array of async gfns may still contain some left gfns after
kvm_clear_async_pf_completion_queue() called, need to clear them.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When rmode.vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The changelog of 104f226 said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Code under this lock requires non-preemptibility. Ensure this also over
-rt by converting it to raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
isr_ack logic was added by e48258009d to avoid unnecessary IPIs. Back
then it made sense, but now the code checks that vcpu is ready to accept
interrupt before sending IPI, so this logic is no longer needed. The
patch removes it.
Fixes a regression with Debian/Hurd.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch fixes the logic used to detect whether BIOS has disabled VMX, for
the case where VMX is enabled only under SMX, but tboot is not active.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Code under this lock requires non-preemptibility. Ensure this also over
-rt by converting it to raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we enable an NMI window, we ask for an IRET intercept, since
the IRET re-enables NMIs. However, the IRET intercept happens before
the instruction executes, while the NMI window architecturally opens
afterwards.
To compensate for this mismatch, we only open the NMI window in the
following exit, assuming that the IRET has by then executed; however,
this assumption is not always correct; we may exit due to a host interrupt
or page fault, without having executed the instruction.
Fix by checking for forward progress by recording and comparing the IRET's
rip. This is somewhat of a hack, since an unchaging rip does not mean that
no forward progress has been made, but is the simplest fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
The interrupt injection logic looks something like
if an nmi is pending, and nmi injection allowed
inject nmi
if an nmi is pending
request exit on nmi window
the problem is that "nmi is pending" can be set asynchronously by
the PIT; if it happens to fire between the two if statements, we
will request an nmi window even though nmi injection is allowed. On
SVM, this has disasterous results, since it causes eflags.TF to be
set in random guest code.
The fix is simple; make nmi_pending synchronous using the standard
vcpu->requests mechanism; this ensures the code above is completely
synchronous wrt nmi_pending.
Signed-off-by: Avi Kivity <avi@redhat.com>
Use the new support in the emulator, and drop the ad-hoc code in x86.c.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mark some instructions as vendor specific, and allow the caller to request
emulation only of vendor specific instructions. This is useful in some
circumstances (responding to a #UD fault).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
x86_decode_insn() doesn't return X86EMUL_* values, so the check
for X86EMUL_PROPOGATE_FAULT will always fail. There is a proper
check later on, so there is no need for a replacement for this
code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This warning was once used for debugging QEMU user space. Though
uncommon, it is actually possible to send an INIT request to a running
VCPU. So better drop this warning before someone misuses it to flood
kernel logs this way.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When a vcpu is reset, kvmclock page keeps being written to this days.
This is wrong and inconsistent: a cpu reset should take it to its
initial state.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make __get_user_pages return -EHWPOISON for HWPOISON page only if
FOLL_HWPOISON is specified. With this patch, the interested callers
can distinguish HWPOISON pages from general FAULT pages, while other
callers will still get -EFAULT for all these pages, so the user space
interface need not to be changed.
This feature is needed by KVM, where UCR MCE should be relayed to
guest for HWPOISON page, while instruction emulation and MMIO will be
tried for general FAULT page.
The idea comes from Andrew Morton.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
A correction to Intel cpu model CPUID data (patch queued)
caused winxp to BSOD when booted with a Penryn model.
This was traced to the CPUID "model" field correction from
6 -> 23 (as is proper for a Penryn class of cpu). Only in
this case does the problem surface.
The cause for this failure is winxp accessing the BBL_CR_CTL3
MSR which is unsupported by current kvm, appears to be a
legacy MSR not fully characterized yet existing in current
silicon, and is apparently carried forward in MSR space to
accommodate vintage code as here. It is not yet conclusive
whether this MSR implements any of its legacy functionality
or is just an ornamental dud for compatibility. While I
found no silicon version specific documentation link to
this MSR, a general description exists in Intel's developer's
reference which agrees with the functional behavior of
other bootloader/kernel code I've examined accessing
BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19
called out as "reserved" in the above document.
So to minimally accommodate this MSR, kvm msr get will provide
the equivalent mock data and kvm msr write will simply toss the
guest passed data without interpretation. While this treatment
of BBL_CR_CTL3 addresses the immediate problem, the approach may
be modified pending clarification from Intel.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently we keep track of only two states: guest mode and host
mode. This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.
Also
1: No need atomically to read/write ->mode in vcpu's thread
2: reorganize struct kvm_vcpu to make ->mode and ->requests
in the same cache line explicitly
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This case is a pure user space error we do not need to record. Moreover,
it can be misused to flood the kernel log. Remove it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of exchanging the guest and host rcx, have separate storage
for each. This allows us to avoid using the xchg instruction, which
is is a little slower than normal operations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change
push top-of-stack
pop guest-rcx
pop dummy
to
pop guest-rcx
which is the same thing, only simpler.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger
PLE exits, even with the minimalistic PLE test from kvm-unit-tests.
http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545
For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in
order to get pause loop exits:
# modprobe kvm_intel ple_gap=47
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
-device testdev,chardev=log -chardev stdio,id=log \
-kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 58298446
# rmmod kvm_intel
# modprobe kvm_intel ple_gap=48
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
-device testdev,chardev=log -chardev stdio,id=log \
-kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 36616
Increase the ple_gap to 128 to be on the safe side.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Zhai, Edwin <edwin.zhai@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the necessary code to run perf-kvm on AMD
machines.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When emulating real mode, we fake some state:
- tr.base points to a fake vm86 tss
- segment registers are made to conform to vm86 restrictions
change vmx_get_segment() not to expose this fake state to userspace;
instead, return the original state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When emulating real mode we play with tr hidden state, but leave
tr.selector alone. That works well, except for save/restore, since
loading TR writes it to the hidden state in vmx->rmode.
Fix by also saving and restoring the tr selector; this makes things
more consistent and allows migration to work during the early
boot stages of Windows XP.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Previously SPRGs 4-7 were improperly read and written in
kvm_arch_vcpu_ioctl_get_regs() and kvm_arch_vcpu_ioctl_set_regs();
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
WARNING: arch/x86/built-in.o(.text+0x1bb74): Section mismatch in reference from the function kvm_guest_cpu_online() to the function .cpuinit.text:kvm_guest_cpu_init()
The function kvm_guest_cpu_online() references
the function __cpuinit kvm_guest_cpu_init().
This is often because kvm_guest_cpu_online lacks a __cpuinit
annotation or the annotation of kvm_guest_cpu_init is wrong.
This patch fixes the warning.
Tested with linux-next (next-20101231)
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead, drop large mappings, which were the reason we dropped shadow.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/block: amiflop - Remove superfluous amiga_chip_alloc() cast
m68k/atari: ARAnyM - Add support for network access
m68k/atari: ARAnyM - Add support for console access
m68k/atari: ARAnyM - Add support for block access
m68k/atari: Initial ARAnyM support
m68k: Kconfig - Remove unneeded "default n"
m68k: Makefiles - Change to new flags variables
m68k/amiga: Reclaim Chip RAM for PPC exception handlers
m68k: Allow all kernel traps to be handled via exception fixups
m68k: Use base_trap_init() to initialize vectors
m68k: Add helper function handle_kernel_fault()
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (91 commits)
ARM: 6806/1: irq: introduce entry and exit functions for chained handlers
ARM: 6781/1: Thumb-2: Work around buggy Thumb-2 short branch relocations in gas
ARM: 6747/1: P2V: Thumb2 support
ARM: 6798/1: aout-core: zero thread debug registers in a.out core dump
ARM: 6796/1: Footbridge: Fix I/O mappings for NOMMU mode
ARM: 6784/1: errata: no automatic Store Buffer drain on Cortex-A9
ARM: 6772/1: errata: possible fault MMU translations following an ASID switch
ARM: 6776/1: mach-ux500: activate fix for errata 753970
ARM: 6794/1: SPEAr: Append UL to device address macros.
ARM: 6793/1: SPEAr: Remove unused *_SIZE macros from spear*.h files
ARM: 6792/1: SPEAr: Replace SIZE macro's with SZ_4K macros
ARM: 6791/1: SPEAr3xx: Declare device structures after shirq code
ARM: 6790/1: SPEAr: Clock Framework: Rename usbd clock and align apb_clk entry
ARM: 6789/1: SPEAr3xx: Rename sdio to sdhci
ARM: 6788/1: SPEAr: Include mach/hardware.h instead of mach/spear.h
ARM: 6787/1: SPEAr: Reorder #includes in .h & .c files.
ARM: 6681/1: SPEAr: add debugfs support to clk API
ARM: 6703/1: SPEAr: update clk API support
ARM: 6679/1: SPEAr: make clk API functions more generic
ARM: 6737/1: SPEAr: formalized timer support
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] tioca: Fix assignment from incompatible pointer warnings
[IA64] mca.c: Fix cast from integer to pointer warning
[IA64] setup.c Typo fix "Architechtuallly"
[IA64] Add CONFIG_MISC_DEVICES=y to configs that need it.
[IA64] disable interrupts at end of ia64_mca_cpe_int_handler()
[IA64] Add DMA_ERROR_CODE define.
pstore: fix build warning for unused return value from sysfs_create_file
pstore: X86 platform interface using ACPI/APEI/ERST
pstore: new filesystem interface to platform persistent storage
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
bonding: enable netpoll without checking link status
xfrm: Refcount destination entry on xfrm_lookup
net: introduce rx_handler results and logic around that
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
bonding: wrap slave state work
net: get rid of multiple bond-related netdevice->priv_flags
bonding: register slave pointer for rx_handler
be2net: Bump up the version number
be2net: Copyright notice change. Update to Emulex instead of ServerEngines
e1000e: fix kconfig for crc32 dependency
netfilter ebtables: fix xt_AUDIT to work with ebtables
xen network backend driver
bonding: Improve syslog message at device creation time
bonding: Call netif_carrier_off after register_netdevice
bonding: Incorrect TX queue offset
net_sched: fix ip_tos2prio
xfrm: fix __xfrm_route_forward()
be2net: Fix UDP packet detected status in RX compl
Phonet: fix aligned-mode pipe socket buffer header reserve
netxen: support for GbE port settings
...
Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (76 commits)
pch_uart: reference clock on CM-iTC
pch_phub: add new device ML7213
n_gsm: fix UIH control byte : P bit should be 0
n_gsm: add a documentation
serial: msm_serial_hs: Add MSM high speed UART driver
tty_audit: fix tty_audit_add_data live lock on audit disabled
tty: move cd1865.h to drivers/staging/tty/
Staging: tty: fix build with epca.c driver
pcmcia: synclink_cs: fix prototype for mgslpc_ioctl()
Staging: generic_serial: fix double locking bug
nozomi: don't use flush_scheduled_work()
tty/serial: Relax the device_type restriction from of_serial
MAINTAINERS: Update HVC file patterns
tty: phase out of ioctl file pointer for tty3270 as well
tty: forgot to remove ipwireless from drivers/char/pcmcia/Makefile
pch_uart: Fix DMA channel miss-setting issue.
pch_uart: fix exclusive access issue
pch_uart: fix auto flow control miss-setting issue
pch_uart: fix uart clock setting issue
pch_uart : Use dev_xxx not pr_xxx
...
Fix up trivial conflicts in drivers/misc/pch_phub.c (same patch applied
twice, then changes to the same area in one branch)
* 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (172 commits)
USB: Add support for SuperSpeed isoc endpoints
xhci: Clean up cycle bit math used during stalls.
xhci: Fix cycle bit calculation during stall handling.
xhci: Update internal dequeue pointers after stalls.
USB: Disable auto-suspend for USB 3.0 hubs.
USB: Remove bogus USB_PORT_STAT_SUPER_SPEED symbol.
xhci: Return canceled URBs immediately when host is halted.
xhci: Fixes for suspend/resume of shared HCDs.
xhci: Fix re-init on power loss after resume.
xhci: Make roothub functions deal with device removal.
xhci: Limit roothub ports to 15 USB3 & 31 USB2 ports.
xhci: Return a USB 3.0 hub descriptor for USB3 roothub.
xhci: Register second xHCI roothub.
xhci: Change xhci_find_slot_id_by_port() API.
xhci: Refactor bus suspend state into a struct.
xhci: Index with a port array instead of PORTSC addresses.
USB: Set usb_hcd->state and flags for shared roothubs.
usb: Make core allocate resources per PCI-device.
usb: Store bus type in usb_hcd, not in driver flags.
usb: Change usb_hcd->bandwidth_mutex to a pointer.
...