WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Avi Kivity	0b3c933302	KVM: MMU: Simplify spte fetch() function Partition the function into three sections: - fetching indirect shadow pages (host_level > guest_level) - fetching direct shadow pages (page_level < host_level <= guest_level) - the final spte (page_level == host_level) Instead of the current spaghetti. A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect. [xiao: fix regression caused by validate_direct_spte() called too late] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:45 +03:00
Avi Kivity	39c8c672a1	KVM: MMU: Add gpte_valid() helper Move the code to check whether a gpte has changed since we fetched it into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:44 +03:00
Avi Kivity	a357bd229c	KVM: MMU: Add validate_direct_spte() helper Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:43 +03:00
Avi Kivity	a3aa51cfaa	KVM: MMU: Add drop_large_spte() helper To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:42 +03:00
Avi Kivity	121eee97a7	KVM: MMU: Use __set_spte to link shadow pages To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link shadow pages together. (not technically required since shadow pages are __GFP_KERNEL, so upper 32 bits are always clear) Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:41 +03:00
Avi Kivity	32ef26a359	KVM: MMU: Add link_shadow_page() helper To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:40 +03:00
Avi Kivity	908e75f3e7	KVM: Expose MCE control MSRs to userspace Userspace needs to reset and save/restore these MSRs. The MCE banks are not exposed since their number varies from vcpu to vcpu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:36 +03:00
Xiao Guangrong	aea924f606	KVM: PIT: stop vpit before freeing irq_routing Fix: general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ...... Call Trace: [<ffffffffa0159bd1>] ? kvm_set_irq+0xdd/0x24b [kvm] [<ffffffff8106ea8b>] ? trace_hardirqs_off_caller+0x1f/0x10e [<ffffffff813ad17f>] ? sub_preempt_count+0xe/0xb6 [<ffffffff8106d273>] ? put_lock_stats+0xe/0x27 ... RIP [<ffffffffa0159c72>] kvm_set_irq+0x17e/0x24b [kvm] This bug is triggered when guest is shutdown, is because we freed irq_routing before pit thread stopped Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:35 +03:00
Gleb Natapov	a6f177efaa	KVM: Reenter guest after emulation failure if due to access to non-mmio address When shadow pages are in use sometimes KVM try to emulate an instruction when it accesses a shadowed page. If emulation fails KVM un-shadows the page and reenter guest to allow vcpu to execute the instruction. If page is not in shadow page hash KVM assumes that this was attempt to do MMIO and reports emulation failure to userspace since there is no way to fix the situation. This logic has a race though. If two vcpus tries to write to the same shadowed page simultaneously both will enter emulator, but only one of them will find the page in shadow page hash since the one who founds it also removes it from there, so another cpu will report failure to userspace and will abort the guest. Fix this by checking (in addition to checking shadowed page hash) that page that caused the emulation belongs to valid memory slot. If it is then reenter the guest to allow vcpu to reexecute the instruction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:34 +03:00
Gleb Natapov	edba23e515	KVM: Return EFAULT from kvm ioctl when guest accesses bad area Currently if guest access address that belongs to memory slot but is not backed up by page or page is read only KVM treats it like MMIO access. Remove that capability. It was never part of the interface and should not be relied upon. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:33 +03:00
Jiri Slaby	673813e81d	KVM: fix lock imbalance in kvm_create_pit() Stanse found that there is an omitted unlock in kvm_create_pit in one fail path. Add proper unlock there. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Gleb Natapov <gleb@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gregory Haskins <ghaskins@novell.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:31 +03:00
Avi Kivity	f59c1d2ded	KVM: MMU: Keep going on permission error Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:30 +03:00
Avi Kivity	b0eeec29fe	KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabled Bit 4 of the page fault error code is set only if EFER.NX is set. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:29 +03:00
Wei Yongjun	5d55f299f9	KVM: x86 emulator: re-implementing 'mov AL,moffs' instruction decoding This patch change to use DstAcc for decoding 'mov AL, moffs' and introduced SrcAcc for decoding 'mov moffs, AL'. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:27 +03:00
Wei Yongjun	07cbc6c185	KVM: x86 emulator: fix cli/sti instruction emulation If IOPL check fail, the cli/sti emulate GP and then we should skip writeback since the default write OP is OP_REG. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:26 +03:00
Wei Yongjun	b16b2b7bb5	KVM: x86 emulator: fix 'mov rm,sreg' instruction decoding The source operand of 'mov rm,sreg' is segment register, not general-purpose register, so remove SrcReg from decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:25 +03:00
Wei Yongjun	e97e883f8b	KVM: x86 emulator: fix 'and AL,imm8' instruction decoding 'and AL,imm8' should be mask as ByteOp, otherwise the dest operand length will no correct and we may fill the full EAX when writeback. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:24 +03:00
Wei Yongjun	ce7a0ad3bd	KVM: x86 emulator: fix the comment of out instruction Fix the comment of out instruction, using the same style as the other instructions. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:23 +03:00
Wei Yongjun	a5046e6c7d	KVM: x86 emulator: fix 'mov sreg,rm16' instruction decoding Memory reads for 'mov sreg,rm16' should be 16 bits only. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:22 +03:00
Avi Kivity	b79b93f92c	KVM: MMU: Don't drop accessed bit while updating an spte __set_spte() will happily replace an spte with the accessed bit set with one that has the accessed bit clear. Add a helper update_spte() which checks for this condition and updates the page flag if needed. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:21 +03:00
Avi Kivity	a9221dd5ec	KVM: MMU: Atomically check for accessed bit when dropping an spte Currently, in the window between the check for the accessed bit, and actually dropping the spte, a vcpu can access the page through the spte and set the bit, which will be ignored by the mmu. Fix by using an exchange operation to atmoically fetch the spte and drop it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:20 +03:00
Avi Kivity	ce061867aa	KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte() Since we need to make the check atomic, move it to the place that will set the new spte. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:18 +03:00
Avi Kivity	be38d276b0	KVM: MMU: Introduce drop_spte() When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:17 +03:00
Xiao Guangrong	dd180b3e90	KVM: VMX: fix tlb flush with invalid root Commit 341d9b535b6c simplify reload logic while entry guest mode, it can avoid unnecessary sync-root if KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_SYNC both set. But, it cause a issue that when we handle 'KVM_REQ_TLB_FLUSH', the root is invalid, it is triggered during my test: Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable] ...... Fixed by directly return if the root is not ready. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:16 +03:00
Joerg Roedel	828554136b	KVM: Remove unnecessary divide operations This patch converts unnecessary divide and modulo operations in the KVM large page related code into logical operations. This allows to convert gfn_t to u64 while not breaking 32 bit builds. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:30 +03:00
Xiao Guangrong	84754cd8fc	KVM: MMU: cleanup FNAME(fetch)() functions Cleanup this function that we are already get the direct sp's access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:26 +03:00
Xiao Guangrong	9e7b0e7fba	KVM: MMU: fix direct sp's access corrupted If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:25 +03:00
Xiao Guangrong	5fd5387c89	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> \|---\| P[W] \| \| LPA \ PDE2 -> \|---\| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:23 +03:00
Xiao Guangrong	36a2e6774b	KVM: MMU: fix writable sync sp mapping While we sync many unsync sp at one time(in mmu_sync_children()), we may mapping the spte writable, it's dangerous, if one unsync sp's mapping gfn is another unsync page's gfn. For example: SP1.pte[0] = P SP2.gfn's pfn = P [SP1.pte[0] = SP2.gfn's pfn] First, we write protected SP1 and SP2, but SP1 and SP2 are still the unsync sp. Then, sync SP1 first, it will detect SP1.pte[0].gfn only has one unsync-sp, that is SP2, so it will mapping it writable, but we plan to sync SP2 soon, at this point, the SP2->unsync is not reliable since later we sync SP2 but SP2->gfn is already writable. So the final result is: SP2 is the sync page but SP2.gfn is writable. This bug will corrupt guest's page table, fixed by mark read-only mapping if the mapped gfn has shadow pages. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:22 +03:00
Sheng Yang	f5f48ee15c	KVM: VMX: Execute WBINVD to keep data consistency with assigned devices Some guest device driver may leverage the "Non-Snoop" I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:21 +03:00
Avi Kivity	3e00750947	KVM: Simplify vcpu_enter_guest() mmu reload logic slightly No need to reload the mmu in between two different vcpu->requests checks. kvm_mmu_reload() may trigger KVM_REQ_TRIPLE_FAULT, but that will be caught during atomic guest entry later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:19 +03:00
Chris Lalancette	529df65e39	KVM: Search the LAPIC's for one that will accept a PIC interrupt Older versions of 32-bit linux have a "Checking 'hlt' instruction" test where they repeatedly call the 'hlt' instruction, and then expect a timer interrupt to kick the CPU out of halt. This happens before any LAPIC or IOAPIC setup happens, which means that all of the APIC's are in virtual wire mode at this point. Unfortunately, the current implementation of virtual wire mode is hardcoded to only kick the BSP, so if a crash+kexec occurs on a different vcpu, it will never get kicked. This patch makes pic_unlock() do the equivalent of kvm_irq_delivery_to_apic() for the IOAPIC code. That is, it runs through all of the vcpus looking for one that is in virtual wire mode. In the normal case where LAPICs and IOAPICs are configured, this won't be used at all. In the bootstrap phase of a modern OS, before the LAPICs and IOAPICs are configured, this will have exactly the same behavior as today; VCPU0 is always looked at first, so it will always get out of the loop after the first iteration. This will only go through the loop more than once during a kexec/kdump, in which case it will only do it a few times until the kexec'ed kernel programs the LAPIC and IOAPIC. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:17 +03:00
Sheng Yang	6c3f604117	KVM: x86: Enable AVX for guest Enable Intel(R) Advanced Vector Extension(AVX) for guest. The detection of AVX feature includes OSXSAVE bit testing. When OSXSAVE bit is not set, even if AVX is supported, the AVX instruction would result in UD as well. So we're safe to expose AVX bits to guest directly. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:10 +03:00
Avi Kivity	7ac77099ce	KVM: Prevent internal slots from being COWed If a process with a memory slot is COWed, the page will change its address (despite having an elevated reference count). This breaks internal memory slots which have their physical addresses loaded into vmcs registers (see the APIC access memory slot). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:08 +03:00
Avi Kivity	a8eeb04a44	KVM: Add mini-API for vcpu->requests Makes it a little more readable and hackable. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:05 +03:00
Avi Kivity	36633f32ba	KVM: i8259: simplify pic_irq_request() calling sequence Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:04 +03:00
Avi Kivity	073d46133a	KVM: i8259: reduce excessive abstraction for pic_irq_request() Part of the i8259 code pretends it isn't part of kvm, but we know better. Reduce excessive abstraction, eliminating callbacks and void pointers. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:03 +03:00
Avi Kivity	b74a07beed	KVM: Remove kernel-allocated memory regions Equivalent (and better) functionality is provided by user-allocated memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:01 +03:00
Avi Kivity	a1f4d39500	KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:00 +03:00
Avi Kivity	d1ac91d8a2	KVM: Consolidate load/save temporary buffer allocation and freeing Instead of three temporary variables and three free calls, have one temporary variable (with four names) and one free call. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:57 +03:00
Avi Kivity	a1a005f36e	KVM: Fix xsave and xcr save/restore memory leak We allocate temporary kernel buffers for these structures, but never free them. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:56 +03:00
Wei Yongjun	7d5993d63f	KVM: x86 emulator: fix group3 instruction decoding Group 3 instruction with ModRM reg field as 001 is defined as test instruction under AMD arch, and emulate_grp3() is ready for emulate it, so fix the decoding. static inline int emulate_grp3(...) { ... switch (c->modrm_reg) { case 0 ... 1: /* test */ emulate_2op_SrcV("test", c->src, c->dst, ctxt->eflags); ... } Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:55 +03:00
Chris Lalancette	e7dca5c0eb	KVM: x86: Allow any LAPIC to accept PIC interrupts If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:50 +03:00
Chris Lalancette	33572ac0ad	KVM: x86: Introduce a workqueue to deliver PIT timer interrupts We really want to "kvm_set_irq" during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:49 +03:00
Wei Yongjun	c37eda1384	KVM: x86 emulator: fix pusha instruction emulation emulate pusha instruction only writeback the last EDI register, but the other registers which need to be writeback is ignored. This patch fixed it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:48 +03:00
Zachary Amsden	bd371396b3	KVM: x86: fix -DDEBUG oops Fix a slight error with assertion in local APIC code. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:46 +03:00
Xiao Guangrong	1047df1fb6	KVM: MMU: don't walk every parent pages while mark unsync While we mark the parent's unsync_child_bitmap, if the parent is already unsynced, it no need walk it's parent, it can reduce some unnecessary workload Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:45 +03:00
Xiao Guangrong	7a8f1a74e4	KVM: MMU: clear unsync_child_bitmap completely In current code, some page's unsync_child_bitmap is not cleared completely in mmu_sync_children(), for example, if two PDPEs shard one PDT, one of PDPE's unsync_child_bitmap is not cleared. Currently, it not harm anything just little overload, but it's the prepare work for the later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:44 +03:00
Xiao Guangrong	ebdea638df	KVM: MMU: cleanup for __mmu_unsync_walk() Decrease sp->unsync_children after clear unsync_child_bitmap bit Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:43 +03:00
Xiao Guangrong	be71e061d1	KVM: MMU: don't mark pte notrap if it's just sync transient If the sync-sp just sync transient, don't mark its pte notrap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:42 +03:00

1 2 3 4 5 ...

1293 Коммитов