WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
David Hildenbrand	5c6705aa47	drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() [ Upstream commit 3d6586008f7b638f91f3332602592caa8b00b559 ] Patch series "mm: follow_pte() improvements and acrn follow_pte() fixes". Patch #1 fixes a bunch of issues I spotted in the acrn driver. It compiles, that's all I know. I'll appreciate some review and testing from acrn folks. Patch #2+#3 improve follow_pte(), passing a VMA instead of the MM, adding more sanity checks, and improving the documentation. Gave it a quick test on x86-64 using VM_PAT that ends up using follow_pte(). This patch (of 3): We currently miss handling various cases, resulting in a dangerous follow_pte() (previously follow_pfn()) usage. (1) We're not checking PTE write permissions. Maybe we should simply always require pte_write() like we do for pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for ACRN_MEM_ACCESS_WRITE for now. (2) We're not rejecting refcounted pages. As we are not using MMU notifiers, messing with refcounted pages is dangerous and can result in use-after-free. Let's make sure to reject them. (3) We are only looking at the first PTE of a bigger range. We only lookup a single PTE, but memmap->len may span a larger area. Let's loop over all involved PTEs and make sure the PFN range is actually contiguous. Reject everything else: it couldn't have worked either way, and rather made use access PFNs we shouldn't be accessing. Link: https://lkml.kernel.org/r/20240410155527.474777-1-david@redhat.com Link: https://lkml.kernel.org/r/20240410155527.474777-2-david@redhat.com Fixes: `8a6e85f75a` ("virt: acrn: obtain pa from VMA with PFNMAP flag") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Fei Li <fei1.li@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Yonghua Huang <yonghua.huang@intel.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:32 +02:00
Christoph Hellwig	5283ac5e07	virt: acrn: stop using follow_pfn [ Upstream commit 1b265da7ea1e1ae997fa119c2846bb389eb39c6b ] Patch series "remove follow_pfn". This series open codes follow_pfn in the only remaining caller, although the code there remains questionable. It then also moves follow_phys into the only user and simplifies it a bit. This patch (of 3): Switch from follow_pfn to follow_pte so that we can get rid of follow_pfn. Note that this doesn't fix any of the pre-existing raciness and lack of permission checking in the code. Link: https://lkml.kernel.org/r/20240324234542.2038726-1-hch@lst.de Link: https://lkml.kernel.org/r/20240324234542.2038726-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fei Li <fei1.li@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 3d6586008f7b ("drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:32 +02:00
Len Baker	af453dfb45	virt: acrn: Prefer array_size and struct_size over open coded arithmetic [ Upstream commit `746f1b0ac5` ] As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the array_size() helper to do the arithmetic instead of the argument "count * size" in the vzalloc() function. Also, take the opportunity to add a flexible array member of struct vm_memory_region_op to the vm_memory_region_batch structure. And then, change the code accordingly and use the struct_size() helper to do the arithmetic instead of the argument "size + size * count" in the kzalloc function. This code was detected with the help of Coccinelle and audited and fixed manually. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Acked-by: Fei Li <fei1.li@intel.com> Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Stable-dep-of: 3d6586008f7b ("drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:31 +02:00
Pascal Terjan	348274a6bb	vboxguest: Do not use devm for irq [ Upstream commit `6169525b76` ] When relying on devm it doesn't get freed early enough which causes the following warning when unloading the module: [249348.837181] remove_proc_entry: removing non-empty directory 'irq/20', leaking at least 'vboxguest' [249348.837219] WARNING: CPU: 0 PID: 6708 at fs/proc/generic.c:715 remove_proc_entry+0x119/0x140 [249348.837379] Call Trace: [249348.837385] unregister_irq_proc+0xbd/0xe0 [249348.837392] free_desc+0x23/0x60 [249348.837396] irq_free_descs+0x4a/0x70 [249348.837401] irq_domain_free_irqs+0x160/0x1a0 [249348.837452] mp_unmap_irq+0x5c/0x60 [249348.837458] acpi_unregister_gsi_ioapic+0x29/0x40 [249348.837463] acpi_unregister_gsi+0x17/0x30 [249348.837467] acpi_pci_irq_disable+0xbf/0xe0 [249348.837473] pcibios_disable_device+0x20/0x30 [249348.837478] pci_disable_device+0xef/0x120 [249348.837482] vbg_pci_remove+0x6c/0x70 [vboxguest] Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Pascal Terjan <pterjan@google.com> Link: https://lore.kernel.org/r/20220612133744.4030602-1-pterjan@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-25 11:40:33 +02:00
Xiaolong Huang	1d5103d9bb	virt: acrn: fix a memory leak in acrn_dev_ioctl() [ Upstream commit `ecd1735f14` ] The vm_param and cpu_regs need to be freed via kfree() before return -EINVAL error. Fixes: `9c5137aedd` ("virt: acrn: Introduce VM management interfaces") Fixes: `2ad2aaee1b` ("virt: acrn: Introduce an ioctl to set vCPU registers state") Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Signed-off-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20220308092047.1008409-1-butterflyhuangxx@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:50 +02:00
Yonghua Huang	b9c43aa0b1	virt: acrn: obtain pa from VMA with PFNMAP flag [ Upstream commit `8a6e85f75a` ] acrn_vm_ram_map can't pin the user pages with VM_PFNMAP flag by calling get_user_pages_fast(), the PA(physical pages) may be mapped by kernel driver and set PFNMAP flag. This patch fixes logic to setup EPT mapping for PFN mapped RAM region by checking the memory attribute before adding EPT mapping for them. Fixes: `88f537d5e8` ("virt: acrn: Introduce EPT mapping management") Signed-off-by: Yonghua Huang <yonghua.huang@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20220228022212.419406-1-yonghua.huang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:50 +02:00
Andra Paraschiv	90d2beed5e	nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert commit `3a0152b219` upstream. After commit `5b78ed24e8` ("mm/pagemap: add mmap_assert_locked() annotations to find_vma()"), the call to get_user_pages() will trigger the mmap assert. static inline void mmap_assert_locked(struct mm_struct mm) { lockdep_assert_held(&mm->mmap_lock); VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); } [ 62.521410] kernel BUG at include/linux/mmap_lock.h:156! ........................................................... [ 62.538938] RIP: 0010:find_vma+0x32/0x80 ........................................................... [ 62.605889] Call Trace: [ 62.608502] <TASK> [ 62.610956] ? lock_timer_base+0x61/0x80 [ 62.614106] find_extend_vma+0x19/0x80 [ 62.617195] __get_user_pages+0x9b/0x6a0 [ 62.620356] __gup_longterm_locked+0x42d/0x450 [ 62.623721] ? finish_wait+0x41/0x80 [ 62.626748] ? __kmalloc+0x178/0x2f0 [ 62.629768] ne_set_user_memory_region_ioctl.isra.0+0x225/0x6a0 [nitro_enclaves] [ 62.635776] ne_enclave_ioctl+0x1cf/0x6d7 [nitro_enclaves] [ 62.639541] __x64_sys_ioctl+0x82/0xb0 [ 62.642620] do_syscall_64+0x3b/0x90 [ 62.645642] entry_SYSCALL_64_after_hwframe+0x44/0xae Use get_user_pages_unlocked() when setting the enclave memory regions. That's a similar pattern as mmap_read_lock() used together with get_user_pages(). Fixes: `5b78ed24e8` ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()") Cc: stable@vger.kernel.org Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20211220195856.6549-1-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-05 12:42:39 +01:00
Shuo Liu	4c4c1257b8	virt: acrn: Do hcall_destroy_vm() before resource release The ACRN hypervisor has scenarios which could run a real-time guest VM. The real-time guest VM occupies dedicated CPU cores, be assigned with dedicated PCI devices. It can run without the Service VM after boot up. hcall_destroy_vm() returns failure when a real-time guest VM refuses. The clearing of flag ACRN_VM_FLAG_DESTROYED causes some kernel resource double-freed in a later acrn_vm_destroy(). Do hcall_destroy_vm() before resource release to drop this chance to destroy the VM if hypercall fails. Fixes: `9c5137aedd` ("virt: acrn: Introduce VM management interfaces") Cc: stable <stable@vger.kernel.org> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20210722062736.15050-1-fei1.li@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-27 16:48:45 +02:00
Longpeng(Mike)	d874742f6a	nitro_enclaves: Set Bus Master for the NE PCI device Enable Bus Master for the NE PCI device, according to the PCI spec for submitting memory or I/O requests: Master Enable – Controls the ability of a PCI Express Endpoint to issue Memory and I/O Read/Write Requests, and the ability of a Root or Switch Port to forward Memory and I/O Read/Write Requests in the Upstream direction Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Alexandru Vasile <lexnv@amazon.com> Cc: Alexandru Ciobotaru <alcioa@amazon.com> Reviewed-by: Andra Paraschiv <andraprs@amazon.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20210621004046.1419-1-longpeng2@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-24 15:48:27 +02:00
Mathias Krause	f1ce3986ba	nitro_enclaves: Fix stale file descriptors on failed usercopy A failing usercopy of the slot uid will lead to a stale entry in the file descriptor table as put_unused_fd() won't release it. This enables userland to refer to a dangling 'file' object through that still valid file descriptor, leading to all kinds of use-after-free exploitation scenarios. Exchanging put_unused_fd() for close_fd(), ksys_close() or alike won't solve the underlying issue, as the file descriptor might have been replaced in the meantime, e.g. via userland calling close() on it (leading to a NULL pointer dereference in the error handling code as 'fget(enclave_fd)' will return a NULL pointer) or by dup2()'ing a completely different file object to that very file descriptor, leading to the same situation: a dangling file descriptor pointing to a freed object -- just in this case to a file object of user's choosing. Generally speaking, after the call to fd_install() the file descriptor is live and userland is free to do whatever with it. We cannot rely on it to still refer to our enclave object afterwards. In fact, by abusing userfaultfd() userland can hit the condition without any racing and abuse the error handling in the nitro code as it pleases. To fix the above issues, defer the call to fd_install() until all possible errors are handled. In this case it's just the usercopy, so do it directly in ne_create_vm_ioctl() itself. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210429165941.27020-2-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-04-29 19:06:49 +02:00
Shuo Liu	1c15b334ca	virt: acrn: Fix document of acrn_msi_inject() This fixes below build warning with extra build checks. $ make W=1 ../drivers/virt/acrn/vm.c:105: warning: expecting prototype for acrn_inject_msi(). Prototype was for acrn_msi_inject() instead Fixes: `c7cf8d2724` ("virt: acrn: Introduce interrupt injection interfaces") Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210311015206.19715-1-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-24 08:26:30 +01:00
Shuo Liu	1201d68f47	virt: acrn: Correct type casting of argument of copy_from_user() hsm.c:336:50: warning: incorrect type in argument 2 (different address spaces) hsm.c:336:50: expected void const [noderef] __user from hsm.c:336:50: got void This patch fixes above sparse warning. Fixes: `3d679d5aec` ("virt: acrn: Introduce interfaces to query C-states and P-states allowed by hypervisor") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210310153708.17451-1-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-10 16:59:50 +01:00
Yejune Deng	a758b7c4c6	virt: acrn: Use EPOLLIN instead of POLLIN This fixes the following sparse warning: "sparse warnings: (new ones prefixed by >>)" >> drivers/virt/acrn/irqfd.c:163:13: sparse: sparse: restricted __poll_t degrades to integer Fixes: `dcf9625f2a` ("virt: acrn: Use vfs_poll() instead of f_op->poll()") Reported-by: kernel test robot <lkp@intel.com> Acked-by: Shuo Liu <shuo.a.liu@intel.com> Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Link: https://lore.kernel.org/r/20210310074901.7486-1-yejune.deng@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-10 09:50:23 +01:00
Yejune Deng	dcf9625f2a	virt: acrn: Use vfs_poll() instead of f_op->poll() Use a more advanced function vfs_poll() in acrn_irqfd_assign(). At the same time, modify the definition of events. Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210221133306.33530-1-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-10 09:16:09 +01:00
Shuo Liu	e54b788869	virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU Without cpu hotplug support, vCPU cannot be removed from a Service VM. Don't expose remove_cpu sysfs when CONFIG_HOTPLUG_CPU disabled. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Qais Yousef <qais.yousef@arm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210221134339.57851-2-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-10 09:16:09 +01:00
Shuo Liu	279dcf693a	virt: acrn: Introduce an interface for Service VM to control vCPU ACRN supports partition mode to achieve real-time requirements. In partition mode, a CPU core can be dedicated to a vCPU of User VM. The local APIC of the dedicated CPU core can be passthrough to the User VM. The Service VM controls the assignment of the CPU cores. Introduce an interface for the Service VM to remove the control of CPU core from hypervisor perspective so that the CPU core can be a dedicated CPU core of User VM. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-18-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	aa3b483ff1	virt: acrn: Introduce irqfd irqfd is a mechanism to inject a specific interrupt to a User VM using a decoupled eventfd mechanism. Vhost is a kernel-level virtio server which uses eventfd for interrupt injection. To support vhost on ACRN, irqfd is introduced in HSM. HSM provides ioctls to associate a virtual Message Signaled Interrupt (MSI) with an eventfd. The corresponding virtual MSI will be injected into a User VM once the eventfd got signal. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-17-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	d8ad515156	virt: acrn: Introduce ioeventfd ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a User VM. ACRN userspace can register any arbitrary I/O address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Vhost is a kernel-level virtio server which uses eventfd for signalling. To support vhost on ACRN, ioeventfd is introduced in HSM. A new I/O client dedicated to ioeventfd is associated with a User VM during VM creation. HSM provides ioctls to associate an I/O region with a eventfd. The I/O client signals a eventfd once its corresponding I/O region is matched with an I/O request. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-16-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	5a0c9f176f	virt: acrn: Introduce I/O ranges operation interfaces An I/O request of a User VM, which is constructed by hypervisor, is distributed by the ACRN Hypervisor Service Module to an I/O client corresponding to the address range of the I/O request. I/O client maintains a list of address ranges. Introduce acrn_ioreq_range_{add,del}() to manage these address ranges. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-15-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	3d679d5aec	virt: acrn: Introduce interfaces to query C-states and P-states allowed by hypervisor The C-states and P-states data are used to support CPU power management. The hypervisor controls C-states and P-states for a User VM. ACRN userspace need to query the data from the hypervisor to build ACPI tables for a User VM. HSM provides ioctls for ACRN userspace to query C-states and P-states data obtained from the hypervisor. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-14-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	c7cf8d2724	virt: acrn: Introduce interrupt injection interfaces ACRN userspace need to inject virtual interrupts into a User VM in devices emulation. HSM needs provide interfaces to do so. Introduce following interrupt injection interfaces: ioctl ACRN_IOCTL_SET_IRQLINE: Pass data from userspace to the hypervisor, and inform the hypervisor to inject a virtual IOAPIC GSI interrupt to a User VM. ioctl ACRN_IOCTL_INJECT_MSI: Pass data struct acrn_msi_entry from userspace to the hypervisor, and inform the hypervisor to inject a virtual MSI to a User VM. ioctl ACRN_IOCTL_VM_INTR_MONITOR: Set a 4-Kbyte aligned shared page for statistics information of interrupts of a User VM. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-13-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	ce011e1363	virt: acrn: Introduce interfaces for PCI device passthrough PCI device passthrough enables an OS in a virtual machine to directly access a PCI device in the host. It promises almost the native performance, which is required in performance-critical scenarios of ACRN. HSM provides the following ioctls: - Assign - ACRN_IOCTL_ASSIGN_PCIDEV Pass data struct acrn_pcidev from userspace to the hypervisor, and inform the hypervisor to assign a PCI device to a User VM. - De-assign - ACRN_IOCTL_DEASSIGN_PCIDEV Pass data struct acrn_pcidev from userspace to the hypervisor, and inform the hypervisor to de-assign a PCI device from a User VM. - Set a interrupt of a passthrough device - ACRN_IOCTL_SET_PTDEV_INTR Pass data struct acrn_ptdev_irq from userspace to the hypervisor, and inform the hypervisor to map a INTx interrupt of passthrough device of User VM. - Reset passthrough device interrupt - ACRN_IOCTL_RESET_PTDEV_INTR Pass data struct acrn_ptdev_irq from userspace to the hypervisor, and inform the hypervisor to unmap a INTx interrupt of passthrough device of User VM. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-12-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	3c4c331667	virt: acrn: Introduce PCI configuration space PIO accesses combiner A User VM can access its virtual PCI configuration spaces via port IO approach, which has two following steps: 1) writes address into port 0xCF8 2) put/get data in/from port 0xCFC To distribute a complete PCI configuration space access one time, HSM need to combine such two accesses together. Combine two paired PIO I/O requests into one PCI I/O request and continue the I/O request distribution. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-11-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	72f293de3f	virt: acrn: Introduce I/O request management An I/O request of a User VM, which is constructed by the hypervisor, is distributed by the ACRN Hypervisor Service Module to an I/O client corresponding to the address range of the I/O request. For each User VM, there is a shared 4-KByte memory region used for I/O requests communication between the hypervisor and Service VM. An I/O request is a 256-byte structure buffer, which is 'struct acrn_io_request', that is filled by an I/O handler of the hypervisor when a trapped I/O access happens in a User VM. ACRN userspace in the Service VM first allocates a 4-KByte page and passes the GPA (Guest Physical Address) of the buffer to the hypervisor. The buffer is used as an array of 16 I/O request slots with each I/O request slot being 256 bytes. This array is indexed by vCPU ID. An I/O client, which is 'struct acrn_ioreq_client', is responsible for handling User VM I/O requests whose accessed GPA falls in a certain range. Multiple I/O clients can be associated with each User VM. There is a special client associated with each User VM, called the default client, that handles all I/O requests that do not fit into the range of any other I/O clients. The ACRN userspace acts as the default client for each User VM. The state transitions of a ACRN I/O request are as follows. FREE -> PENDING -> PROCESSING -> COMPLETE -> FREE -> ... FREE: this I/O request slot is empty PENDING: a valid I/O request is pending in this slot PROCESSING: the I/O request is being processed COMPLETE: the I/O request has been processed An I/O request in COMPLETE or FREE state is owned by the hypervisor. HSM and ACRN userspace are in charge of processing the others. The processing flow of I/O requests are listed as following: a) The I/O handler of the hypervisor will fill an I/O request with PENDING state when a trapped I/O access happens in a User VM. b) The hypervisor makes an upcall, which is a notification interrupt, to the Service VM. c) The upcall handler schedules a worker to dispatch I/O requests. d) The worker looks for the PENDING I/O requests, assigns them to different registered clients based on the address of the I/O accesses, updates their state to PROCESSING, and notifies the corresponding client to handle. e) The notified client handles the assigned I/O requests. f) The HSM updates I/O requests states to COMPLETE and notifies the hypervisor of the completion via hypercalls. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-10-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	88f537d5e8	virt: acrn: Introduce EPT mapping management The HSM provides hypervisor services to the ACRN userspace. While launching a User VM, ACRN userspace needs to allocate memory and request the ACRN Hypervisor to set up the EPT mapping for the VM. A mapping cache is introduced for accelerating the translation between the Service VM kernel virtual address and User VM physical address. >From the perspective of the hypervisor, the types of GPA of User VM can be listed as following: 1) RAM region, which is used by User VM as system ram. 2) MMIO region, which is recognized by User VM as MMIO. MMIO region is used to be utilized for devices emulation. Generally, User VM RAM regions mapping is set up before VM started and is released in the User VM destruction. MMIO regions mapping may be set and unset dynamically during User VM running. To achieve this, ioctls ACRN_IOCTL_SET_MEMSEG and ACRN_IOCTL_UNSET_MEMSEG are introduced in HSM. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-9-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	2ad2aaee1b	virt: acrn: Introduce an ioctl to set vCPU registers state A virtual CPU of User VM has different context due to the different registers state. ACRN userspace needs to set the virtual CPU registers state (e.g. giving a initial registers state to a virtual BSP of a User VM). HSM provides an ioctl ACRN_IOCTL_SET_VCPU_REGS to do the virtual CPU registers state setting. The ioctl passes the registers state from ACRN userspace to the hypervisor directly. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-8-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:19 +01:00
Shuo Liu	9c5137aedd	virt: acrn: Introduce VM management interfaces The VM management interfaces expose several VM operations to ACRN userspace via ioctls. For example, creating VM, starting VM, destroying VM and so on. The ACRN Hypervisor needs to exchange data with the ACRN userspace during the VM operations. HSM provides VM operation ioctls to the ACRN userspace and communicates with the ACRN Hypervisor for VM operations via hypercalls. HSM maintains a list of User VM. Each User VM will be bound to an existing file descriptor of /dev/acrn_hsm. The User VM will be destroyed when the file descriptor is closed. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-7-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:18 +01:00
Shuo Liu	666834c47d	virt: acrn: Introduce ACRN HSM basic driver ACRN Hypervisor Service Module (HSM) is a kernel module in Service VM which communicates with ACRN userspace through ioctls and talks to ACRN Hypervisor through hypercalls. Add a basic HSM driver which allows Service VM userspace to communicate with ACRN. The following patches will add more ioctls, guest VM memory mapping caching, I/O request processing, ioeventfd and irqfd into this module. HSM exports a char device interface (/dev/acrn_hsm) to userspace. Cc: Dave Hansen <dave.hansen@intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Yu Wang <yu1.wang@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Link: https://lore.kernel.org/r/20210207031040.49576-6-shuo.a.liu@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-02-09 10:58:18 +01:00
Hans de Goede	c35901b39d	virt: vbox: Do not use wait_event_interruptible when called from kernel context Do not use wait_event_interruptible when vbg_hgcm_call() gets called from kernel-context, such as it being called by the vboxsf filesystem code. This fixes some filesystem related system calls on shared folders unexpectedly failing with -EINTR. Fixes: `0532a1b0d0` ("virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x") Reported-by: Ludovic Pouzenc <bugreports@pouzenc.fr> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210121150754.147598-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-27 14:37:02 +01:00
Andra Paraschiv	d9109fe0f3	nitro_enclaves: Fixup type and simplify logic of the poll mask setup Update the assigned value of the poll result to be EPOLLHUP instead of POLLHUP to match the __poll_t type. While at it, simplify the logic of setting the mask result of the poll function. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20201102173622.32169-1-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-11-09 18:20:36 +01:00
Qinglang Miao	03c95e591a	virt: vbox: simplify the return expression of vbg_input_open() Simplify the return expression. Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Link: https://lore.kernel.org/r/20200921131113.93459-1-miaoqinglang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-10-02 11:36:13 +02:00
Andra Paraschiv	0f5c7b7484	nitro_enclaves: Add Makefile for the Nitro Enclaves driver Add Makefile for the Nitro Enclaves driver, considering the option set in the kernel config. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Remove -Wall flags, could use W=1 as an option for this. v7 -> v8 * No changes. v6 -> v7 * No changes. v5 -> v6 * No changes. v4 -> v5 * No changes. v3 -> v4 * No changes. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Update path to Makefile to match the drivers/virt/nitro_enclaves directory. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-16-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	07499cc661	nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Add kernel config entry for Nitro Enclaves, including dependencies. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * No changes. v7 -> v8 * No changes. v6 -> v7 * Remove, for now, the dependency on ARM64 arch. x86 is currently supported, with Arm to come afterwards. The NE kernel driver can be built for aarch64 arch. v5 -> v6 * No changes. v4 -> v5 * Add arch dependency for Arm / x86. v3 -> v4 * Add PCI and SMP dependencies. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Update path to Kconfig to match the drivers/virt/nitro_enclaves directory. * Update help in Kconfig. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-15-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	9c8eb50fe9	nitro_enclaves: Add logic for terminating an enclave An enclave is associated with an fd that is returned after the enclave creation logic is completed. This enclave fd is further used to setup enclave resources. Once the enclave needs to be terminated, the enclave fd is closed. Add logic for enclave termination, that is mapped to the enclave fd release callback. Free the internal enclave info used for bookkeeping. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE PCI device. v7 -> v8 * No changes. v6 -> v7 * Remove the pci_dev_put() call as the NE misc device parent field is used now to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update documentation to kernel-doc format. * Use directly put_page() instead of unpin_user_pages(), to match the get_user_pages() calls. v4 -> v5 * Release the reference to the NE PCI device on enclave fd release. * Adapt the logic to cpumask enclave vCPU ids and CPU cores. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Remove the BUG_ON calls. * Update goto labels to match their purpose. * Add early exit in release() if there was a slot alloc error in the fd creation path. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-14-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	111c775a5f	nitro_enclaves: Add logic for starting an enclave After all the enclave resources are set, the enclave is ready for beginning to run. Add ioctl command logic for starting an enclave after all its resources, memory regions and CPUs, have been set. The enclave start information includes the local channel addressing - vsock CID - and the flags associated with the enclave. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE PCI device. v7 -> v8 * Add check for invalid enclave CID value e.g. well-known CIDs and parent VM CID. * Add custom error code for incorrect flag in enclave start info and invalid enclave CID. v6 -> v7 * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Check for invalid enclave start flags. * Update documentation to kernel-doc format. v4 -> v5 * Add early exit on enclave start ioctl function call error. * Move sanity checks in the enclave start ioctl function, outside of the switch-case block. * Remove log on copy_from_user() / copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update the naming for the ioctl command from metadata to info. * Check for minimum enclave memory size. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. v1 -> v2 * Add log pattern for NE. * Check if enclave state is init when starting an enclave. * Remove the BUG_ON calls. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-13-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	7dc9d4309f	nitro_enclaves: Add logic for setting an enclave memory region Another resource that is being set for an enclave is memory. User space memory regions, that need to be backed by contiguous memory regions, are associated with the enclave. One solution for allocating / reserving contiguous memory regions, that is used for integration, is hugetlbfs. The user space process that is associated with the enclave passes to the driver these memory regions. The enclave memory regions need to be from the same NUMA node as the enclave CPUs. Add ioctl command logic for setting user space memory region for an enclave. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE PCI device. v7 -> v8 * Add early check, while getting user pages, to be multiple of 2 MiB for the pages that back the user space memory region. * Add custom error code for incorrect user space memory region flag. * Include in a separate function the sanity checks for each page of the user space memory region. v6 -> v7 * Update check for duplicate user space memory regions to cover additional possible scenarios. v5 -> v6 * Check for max number of pages allocated for the internal data structure for pages. * Check for invalid memory region flags. * Check for aligned physical memory regions. * Update documentation to kernel-doc format. * Check for duplicate user space memory regions. * Use directly put_page() instead of unpin_user_pages(), to match the get_user_pages() calls. v4 -> v5 * Add early exit on set memory region ioctl function call error. * Remove log on copy_from_user() failure. * Exit without unpinning the pages on NE PCI dev request failure as memory regions from the user space range may have already been added. * Add check for the memory region user space address to be 2 MiB aligned. * Update logic to not have a hardcoded check for 2 MiB memory regions. v3 -> v4 * Check enclave memory regions are from the same NUMA node as the enclave CPUs. * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. * Check if enclave max memory regions is reached when setting an enclave memory region. * Check if enclave state is init when setting an enclave memory region. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-12-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	988b7a4717	nitro_enclaves: Add logic for getting the enclave image load info Before setting the memory regions for the enclave, the enclave image needs to be placed in memory. After the memory regions are set, this memory cannot be used anymore by the VM, being carved out. Add ioctl command logic to get the offset in enclave memory where to place the enclave image. Then the user space tooling copies the enclave image in the memory using the given memory offset. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * No changes. v7 -> v8 * Add custom error code for incorrect enclave image load info flag. v6 -> v7 * No changes. v5 -> v6 * Check for invalid enclave image load flags. v4 -> v5 * Check for the enclave not being started when invoking this ioctl call. * Remove log on copy_from_user() / copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Set enclave image load offset based on flags. * Update the naming for the ioctl command from metadata to info. v2 -> v3 * No changes. v1 -> v2 * New in v2. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-11-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	ff8a4d3e3a	nitro_enclaves: Add logic for setting an enclave vCPU An enclave, before being started, has its resources set. One of its resources is CPU. A NE CPU pool is set and enclave CPUs are chosen from it. Offline the CPUs from the NE CPU pool during the pool setup and online them back during the NE CPU pool teardown. The CPU offline is necessary so that there would not be more vCPUs than physical CPUs available to the primary / parent VM. In that case the CPUs would be overcommitted and would change the initial configuration of the primary / parent VM of having dedicated vCPUs to physical CPUs. The enclave CPUs need to be full cores and from the same NUMA node. CPU 0 and its siblings have to remain available to the primary / parent VM. Add ioctl command logic for setting an enclave vCPU. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE PCI device. v7 -> v8 * No changes. v6 -> v7 * Check for error return value when setting the kernel parameter string. * Use the NE misc device parent field to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. * Calculate the number of threads per core and not use smp_num_siblings that is x86 specific. v5 -> v6 * Check CPUs are from the same NUMA node before going through CPU siblings during the NE CPU pool setup. * Update documentation to kernel-doc format. v4 -> v5 * Set empty string in case of invalid NE CPU pool. * Clear NE CPU pool mask on pool setup failure. * Setup NE CPU cores out of the NE CPU pool. * Early exit on NE CPU pool setup if enclave(s) already running. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. * Add check for maximum vCPU id possible before looking into the CPU pool. * Remove log on copy_from_user() / copy_to_user() failure and on admin capability check for setting the NE CPU pool. * Update the ioctl call to not create a file descriptor for the vCPU. * Split the CPU pool usage logic in 2 separate functions - one to get a CPU from the pool and the other to check the given CPU is available in the pool. v3 -> v4 * Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter. * Check enclave CPUs to be from the same NUMA node. * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). * Remove file ops that do nothing for now - open, ioctl and release. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. * Check if enclave state is init when setting enclave vCPU. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-10-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	38907e1240	nitro_enclaves: Add logic for creating an enclave VM Add ioctl command logic for enclave VM creation. It triggers a slot allocation. The enclave resources will be associated with this slot and it will be used as an identifier for triggering enclave run. Return a file descriptor, namely enclave fd. This is further used by the associated user space enclave process to set enclave resources and trigger enclave termination. The poll function is implemented in order to notify the enclave process when an enclave exits without a specific enclave termination command trigger e.g. when an enclave crashes. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE PCI device. v7 -> v8 * No changes. v6 -> v7 * Use the NE misc device parent field to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update the code base to init the ioctl function in this patch. * Update documentation to kernel-doc format. v4 -> v5 * Release the reference to the NE PCI device on create VM error. * Close enclave fd on copy_to_user() failure; rename fd to enclave fd while at it. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. * Remove log on copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. * Add metadata for the NUMA node for the enclave memory and CPUs. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). * Remove file ops that do nothing for now - open. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-9-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	bd47c995c0	nitro_enclaves: Init misc device providing the ioctl interface The Nitro Enclaves driver provides an ioctl interface to the user space for enclave lifetime management e.g. enclave creation / termination and setting enclave resources such as memory and CPU. This ioctl interface is mapped to a Nitro Enclaves misc device. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the ne_devs data structure to get the refs for the NE misc device in the NE PCI device driver logic. v7 -> v8 * Add define for the CID of the primary / parent VM. * Update the NE PCI driver shutdown logic to include misc device deregister. v6 -> v7 * Set the NE PCI device the parent of the NE misc device to be able to use it in the ioctl logic. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Remove the ioctl to query API version. * Update documentation to kernel-doc format. v4 -> v5 * Update the size of the NE CPU pool string from 4096 to 512 chars. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Remove the NE CPU pool init during kernel module loading, as the CPU pool is now setup at runtime, via a sysfs file for the kernel parameter. * Add minimum enclave memory size definition. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. * Remove the WARN_ON calls. * Remove linux/bug and linux/kvm_host includes that are not needed. * Remove "ratelimited" from the logs that are not in the ioctl call paths. * Remove file ops that do nothing for now - open and release. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Update ne_cpu_pool data structure to include the global mutex. * Update NE misc device mode to 0660. * Check if the CPU siblings are included in the NE CPU pool, as full CPU cores are given for the enclave(s). Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-8-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	e5d616d8bb	nitro_enclaves: Handle out-of-band PCI device events In addition to the replies sent by the Nitro Enclaves PCI device in response to command requests, out-of-band enclave events can happen e.g. an enclave crashes. In this case, the Nitro Enclaves driver needs to be aware of the event and notify the corresponding user space process that abstracts the enclave. Register an MSI-X interrupt vector to be used for this kind of out-of-band events. The interrupt notifies that the state of an enclave changed and the driver logic scans the state of each running enclave to identify for which this notification is intended. Create an workqueue to handle the out-of-band events. Notify user space enclave process that is using a polling mechanism on the enclave fd. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Use the reference to the pdev directly from the ne_pci_dev instead of the one from the enclave data structure. v7 -> v8 * No changes. v6 -> v7 * No changes. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Return IRQ_NONE when interrupts are not handled. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Remove "ratelimited" from the logs that are not in the ioctl call paths. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-7-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:41 +02:00
Andra Paraschiv	ad2b6980d0	nitro_enclaves: Handle PCI device command requests The Nitro Enclaves PCI device exposes a MMIO space that this driver uses to submit command requests and to receive command replies e.g. for enclave creation / termination or setting enclave resources. Add logic for handling PCI device command requests based on the given command type. Register an MSI-X interrupt vector for command reply notifications to handle this type of communication events. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * No changes. v7 -> v8 * Update function signature for submit request and retrive reply functions as they only returned 0, no error code. * Include command type value in the error logs of ne_do_request(). v6 -> v7 * No changes. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Return IRQ_NONE when interrupts are not handled. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Remove "ratelimited" from the logs that are not in the ioctl call paths. v1 -> v2 * Add log pattern for NE. * Remove the BUG_ON calls. * Update goto labels to match their purpose. * Add fix for kbuild report: https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/ Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-6-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:40 +02:00
Andra Paraschiv	89308c11ae	nitro_enclaves: Init PCI device driver The Nitro Enclaves PCI device is used by the kernel driver as a means of communication with the hypervisor on the host where the primary VM and the enclaves run. It handles requests with regard to enclave lifetime. Setup the PCI device driver and add support for MSI-X interrupts. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Init the reference to the ne_pci_dev in the ne_devs data structure. v7 -> v8 * Add NE PCI driver shutdown logic. v6 -> v7 * No changes. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update NE PCI driver name to "nitro_enclaves". v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. * Remove the WARN_ON calls. * Remove linux/bug include that is not needed. * Update static calls sanity checks. * Remove "ratelimited" from the logs that are not in the ioctl call paths. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Update PCI device setup functions to receive PCI device data structure and then get private data from it inside the functions logic. * Remove the BUG_ON calls. * Add teardown function for MSI-X setup. * Update goto labels to match their purpose. * Implement TODO for NE PCI device disable state check. * Update function name for NE PCI device probe / remove. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-5-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:40 +02:00
Andra Paraschiv	1df624892c	nitro_enclaves: Define enclave info for internal bookkeeping The Nitro Enclaves driver keeps an internal info per each enclave. This is needed to be able to manage enclave resources state, enclave notifications and have a reference of the PCI device that handles command requests for enclave lifetime management. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Add data structure to keep references to both Nitro Enclaves misc and PCI devices. v7 -> v8 * No changes. v6 -> v7 * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update documentation to kernel-doc format. * Include in the enclave memory region data structure the user space address and size for duplicate user space memory regions checks. v4 -> v5 * Include enclave cores field in the enclave metadata. * Update the vCPU ids data structure to be a cpumask instead of a list. v3 -> v4 * Add NUMA node field for an enclave metadata as the enclave memory and CPUs need to be from the same NUMA node. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Add enclave memory regions and vcpus count for enclave bookkeeping. * Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming update. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-4-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:40 +02:00
Andra Paraschiv	0a44561768	nitro_enclaves: Define the PCI device interface The Nitro Enclaves (NE) driver communicates with a new PCI device, that is exposed to a virtual machine (VM) and handles commands meant for handling enclaves lifetime e.g. creation, termination, setting memory regions. The communication with the PCI device is handled using a MMIO space and MSI-X interrupts. This device communicates with the hypervisor on the host, where the VM that spawned the enclave itself runs, e.g. to launch a VM that is used for the enclave. Define the MMIO space of the NE PCI device, the commands that are provided by this device. Add an internal data structure used as private data for the PCI device driver and the function for the PCI device command requests handling. Changelog v9 -> v10 * Update commit message to include the changelog before the SoB tag(s). v8 -> v9 * Fix indent for the NE PCI device command types enum. v7 -> v8 * No changes. v6 -> v7 * Update the documentation to include references to the NE PCI device id and MMIO bar. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Add a TODO for including flags in the request to the NE PCI device to set a memory region for an enclave. It is not used for now. v3 -> v4 * Remove the "packed" attribute and include padding in the NE data structures. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Update path naming to drivers/virt/nitro_enclaves. * Update NE_ENABLE_OFF / NE_ENABLE_ON defines. Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-3-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-22 13:58:40 +02:00
Souptick Joarder	7f360bec37	drivers/virt/fsl_hypervisor: Fix error handling path First, when memory allocation for sg_list_unaligned failed, there is a bug of calling put_pages() as we haven't pinned any pages. Second, if get_user_pages_fast() failed we should unpin num_pinned pages. This will address both. As part of these changes, minor update in documentation. Fixes: `6db7199407` ("drivers/virt: introduce Freescale hypervisor management driver") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: https://lore.kernel.org/r/1598995271-6755-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-07 14:57:31 +02:00
Denis Efremov	da0d6b3a53	virt: vbox: Use current_uid() in vbg_misc_device_requestor() Modify vbg_misc_device_requestor() to use current_uid() wrapper. Signed-off-by: Denis Efremov <efremov@linux.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200824125330.487083-1-efremov@linux.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-08-28 12:22:50 +02:00
Hans de Goede	729082ed9b	virt: vbox: Fix some comments which talk about the "session spinlock" The session lock is a mutex, not a spinlock, fix the comments to match. Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200709120858.63928-9-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-10 13:45:32 +02:00
Hans de Goede	5bc117a27f	virt: vbox: Log unknown ioctl requests as error Every now and then upstream adds new ioctls without notifying us, log unknown ioctl requests as an error to catch these. Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200709120858.63928-8-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-10 13:45:32 +02:00
Hans de Goede	316b003540	virt: vbox: Add a few new vmmdev request types to the userspace whitelist Upstream VirtualBox has defined and is using a few new request types for vmmdev requests passed through /dev/vboxguest to the hypervisor. Add the defines for these to vbox_vmmdev_types.h and add add them to the whitelists of vmmdev requests which userspace is allowed to make. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545 Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200709120858.63928-7-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-10 13:45:32 +02:00

1 2

95 Коммитов