This copies the same support from pci-stub for exactly the same
purpose, enabling a set of PCI IDs to be automatically added to the
driver's dynamic ID table at module load time. The code here is
pretty simple and both vfio-pci and pci-stub are fairly unique in
being meta drivers, capable of attaching to any device, so there's no
attempt made to generalize the code into pci-core.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If VFIO VGA access is disabled for the user, either by CONFIG option
or module parameter, we can often opt-out of VGA arbitration. We can
do this when PCI bridge control of VGA routing is possible. This
means that we must have a parent bridge and there must only be a
single VGA device below that bridge. Fortunately this is the typical
case for discrete GPUs.
Doing this allows us to minimize the impact of additional GPUs, in
terms of VGA arbitration, when they are only used via vfio-pci for
non-VGA applications.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a module option so that we don't require a CONFIG change and
kernel rebuild to disable VGA support. Not only can VGA support be
troublesome in itself, but by disabling it we can reduce the impact
to host devices by doing a VGA arbitration opt-out.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
An unintended consequence of commit 42ac9bd18d ("vfio: initialize
the virqfd workqueue in VFIO generic code") is that the vfio module
is renamed to vfio_core so that it can include both vfio and virqfd.
That's a user visible change that may break module loading scritps
and it imposes eventfd support as a dependency on the core vfio code,
which it's really not. virqfd is intended to be provided as a service
to vfio bus drivers, so instead of wrapping it into vfio.ko, we can
make it a stand-alone module toggled by vfio bus drivers. This has
the additional benefit of removing initialization and exit from the
core vfio code.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now we have finally completely decoupled virqfd from VFIO_PCI. We can
initialize it from the VFIO generic code, in order to safely use it from
multiple independent VFIO bus drivers.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virqfd functionality that is used by VFIO_PCI to implement interrupt
masking and unmasking via an eventfd, is generic enough and can be reused
by another driver. Move it to a separate file in order to allow the code
to be shared.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO_PCI passes the VFIO device structure *vdev via eventfd to the handler
that implements masking/unmasking of IRQs via an eventfd. We can replace
it in the virqfd infrastructure with an opaque type so we can make use
of the mechanism from other VFIO bus drivers.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The Virqfd code needs to keep accesses to any struct *virqfd safe, but
this comes into play only when creating or destroying eventfds, so sharing
the same spinlock with the VFIO bus driver is not necessary.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The functions vfio_pci_virqfd_init and vfio_pci_virqfd_exit are not really
PCI specific, since we plan to reuse the virqfd code with more VFIO drivers
in addition to VFIO_PCI.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
[Baptiste Reynal: Move rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit
from "vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export"]
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We want to reuse virqfd functionality in multiple VFIO drivers; before
moving these functions to core VFIO, add the vfio_ prefix to the
virqfd_enable and virqfd_disable functions, and export them so they can
be used from other modules.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This adds a missing break statement to VFIO_DEVICE_SET_IRQS handler
without which vfio_pci_set_err_trigger() would never be called.
While we are here, add another "break" to VFIO_PCI_REQ_IRQ_INDEX case
so if we add more indexes later, we won't miss it.
Fixes: 6140a8f562 ("vfio-pci: Add device request interface")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Userspace can opt to receive a device request notification,
indicating that the device should be released. This is setup
the same way as the error IRQ and also supports eventfd signaling.
Future support may forcefully remove the device from the user if
the request is ignored.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We want another single vector IRQ index to support signaling of
the device request to userspace. Generalize the error reporting
IRQ index to avoid code duplication.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Current vfio-pci just supports normal pci device, so vfio_pci_probe() will
return if the pci device is not a normal device. While current code makes a
mistake. PCI_HEADER_TYPE is the offset in configuration space of the device
type, but we use this value to mask the type value.
This patch fixs this by do the check directly on the pci_dev->hdr_type.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org # v3.6+
- s390 support (Frank Blaschka)
- Enable iommu-type1 for ARM SMMU (Will Deacon)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUka+sAAoJECObm247sIsiiy0P/iqrQpv94Z7rUKRlV3K8YtJj
8Oi5fLLnT2by9v5mS+KMElnQ5gLU/C5B/QGLMNrF2uQl8lguWSXJw37r7MkbIkpN
RDLx1NhetLqbJ4CYLjyv/Jx3vl+Wr/2nNWRVIS5ajBmMjEgKVLvjYs4SaXELc3a8
a3YzcGW10BrVFlCJgUYqYIFGS1BmKjf7fbD5YBocj8tPv6NAlCiNNYYr+0pzW8Lf
GTi39JlZ2t06hDq33eiUkrySWNjrIBn4g4PfAl7HBAscsZKS1w18MD1qVw4UXXa2
15+CBbsHU7ZLVo6G7vuZeJNCX9tdQ0WIZWQzHstQa914l86WYImTJ2tyH7Rn0ZcQ
3Mu9fzef9JgjkI56ol2zDwuOs+qttOYaLWjhhHiW4jkIxdnljnesIFlvmM3XeDGz
3Zowg09HzE3K+dt8265jVKkcNJbPLzspLvF27nPMudZNHBozcoPE0jKxU7QC3eMT
Ij36+puQq+jccUic3Np6rxk5tzTHEat1a7w3IUwXCCUP5P5QW+kuuIjbd4hqQkHn
VDRjnT6MWC3GguUCXR5VyO0zezpI20pTbWwE8u2qwnE349m0Eq/vxytj2lCLYLPR
Jjtdduf1/Ppam7tATd3PwTu6KljY3dJiUUikyOc1J0KmkgkSMw+BtR6G7qytyW4q
/fhClcsaNtxSheVh0N+b
=1L2V
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- s390 support (Frank Blaschka)
- Enable iommu-type1 for ARM SMMU (Will Deacon)
* tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio:
drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU
vfio: make vfio run on s390
Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add Kconfig switch to hide INTx
add Kconfig switch to let vfio announce PCI BARs are not mapable
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Locking both the remove() and release() path results in a deadlock
that should have been obvious. To fix this we can get and hold the
vfio_device reference as we evaluate whether to do a bus/slot reset.
This will automatically block any remove() calls, allowing us to
remove the explict lock. Fixes 61d792562b.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org [3.17]
The MSIx vector table lives in device memory, which may be cleared as
part of a backdoor device reset. This is the case on the IBM IPR HBA
when the BIST is run on the device. When assigned to a QEMU guest,
the guest driver does a pci_save_state(), issues a BIST, then does a
pci_restore_state(). The BIST clears the MSIx vector table, but due
to the way interrupts are configured the pci_restore_state() does not
restore the vector table as expected. Eventually this results in an
EEH error on Power platforms when the device attempts to signal an
interrupt with the zero'd table entry.
Fix the problem by restoring the host cached MSI message prior to
enabling each vector.
Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask,
and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2,
bit 0 was redefined to be "Undefined."
Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change.
No functional change.
[bhelgaas: changelog]
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The existing vfio_pci_open() fails upon error returned from
vfio_spapr_pci_eeh_open(), which breaks POWER7's P5IOC2 PHB
support which this patch brings back.
The patch fixes the issue by dropping the return value of
vfio_spapr_pci_eeh_open().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Each time a device is released, mark whether a local reset was
successful or whether a bus/slot reset is needed. If a reset is
needed and all of the affected devices are bound to vfio-pci and
unused, allow the reset. This is most useful when the userspace
driver is killed and releases all the devices in an unclean state,
such as when a QEMU VM quits.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Serializing open/release allows us to fix a refcnt error if we fail
to enable the device and lets us prevent devices from being unbound
or opened, giving us an opportunity to do bus resets on release. No
restriction added to serialize binding devices to vfio-pci while the
mutex is held though.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Our current open/release path looks like this:
vfio_pci_open
vfio_pci_enable
pci_enable_device
pci_save_state
pci_store_saved_state
vfio_pci_release
vfio_pci_disable
pci_disable_device
pci_restore_state
pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device
comes to us with it enabled, it persists through the open and gets
stored as part of the device saved state. We then restore that saved
state when released, which can allow the device to attempt to continue
to do DMA. When the group is disconnected from the domain, this will
get caught by the IOMMU, but if there are other devices in the group,
the device may continue running and interfere with the user. Even in
the former case, IOMMUs don't necessarily behave well and a stream of
blocked DMA can result in unpleasant behavior on the host.
Explicitly disable Bus Master as we're enabling the device and
slightly re-work release to make sure that pci_disable_device() is
the last thing that touches the device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
According PCI local bus specification, the register of Message
Control for MSI (offset: 2, length: 2) has bit#0 to enable or
disable MSI logic and it shouldn't be part contributing to the
calculation of MSI interrupt count. The patch fixes the issue.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There's nothing we can do different if pci_load_and_free_saved_state()
fails, other than maybe print some log message, but the actual re-load
of the state is an unnecessary step here since we've only just saved
it. We can cleanup a coverity warning and eliminate the unnecessary
step by freeing the state ourselves.
Detected by Coverity: CID 753101
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When sizing the TPH capability we store the register containing the
table size into the 'dword' variable, but then use the uninitialized
'byte' variable to analyze the size. The table size is also actually
reported as an N-1 value, so correct sizing to account for this.
The round_up() for both TPH and DPA is unnecessary, remove it.
Detected by Coverity: CID 714665 & 715156
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
pci_enable_msix() and pci_enable_msi_block() have been deprecated; use
pci_enable_msix_range() and pci_enable_msi_range() instead.
[bhelgaas: changelog]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
- Remove unnecessary and dangerous use of device_lock
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJS4T1aAAoJECObm247sIsiLMkQAKn11qSLk880dJO+cfW2VpqL
vm3WreShOxNsKTEQfcbrnJtS+kYGkiyo8+Vve377QDIg+xeXpNZa0PbAEfh4HyuJ
NH+vn0FYDf3rzF4Z9snm2HOHWCAsqLCE0guyMwwhUYH+fks4+YTJWCm4YWZOXbIS
aI++G+5LBxGzMRbotoovqUhQy474fpQN/GFIlIZHP2bGtdeXiRuH0qH3BxVKafOF
50RtAgJsLm1Nb+S20Umsz+llL3sop1wcPLKOlgCXqd+545CJAoq/qqEPjUtOrr+S
qXvXhb0XM7zo73H2WzkVmS09HZWX4sMoHBYp8D6ToOIFF01LOVsXBKduPCPpTD7E
zohhjgag9/RQ99l2B153IIIv0Bsg7b4YXBQ6qkWFoJolPL94LTq1PXk0f9mNhyPo
mSqatcAeI5qtjqu9ncSd4YYTdBgp7SJcgwji8fI44tvzzpz8iheVUrpwcU1UumAK
TpXLBoTLXllK1op3u9xgFrF4ISBMl7lZeZp3c1/1YRga7Ch6SdlB0tcLPfuSBRF0
1Qb5jQrWz4qt/oZSkapcFALXQDgwoK8am9I0a5YXdhy9gSYm+o/TLwG5pTEnF4In
xxuibmmJAmNcTJ9q6rUKjLLU/TqRKin+vyqW6is41IredgvNX8XDS3YokA5Fgv01
mu1s7odFi08xjPRuYvmq
=t9+R
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio
Pull vfio update from Alex Williamson:
- convert to misc driver to support module auto loading
- remove unnecessary and dangerous use of device_lock
* tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio:
vfio-pci: Don't use device_lock around AER interrupt setup
vfio: Convert control interface to misc driver
misc: Reserve minor for VFIO
PCI resets will attempt to take the device_lock for any device to be
reset. This is a problem if that lock is already held, for instance
in the device remove path. It's not sufficient to simply kill the
user process or skip the reset if called after .remove as a race could
result in the same deadlock. Instead, we handle all resets as "best
effort" using the PCI "try" reset interfaces. This prevents the user
from being able to induce a deadlock by triggering a reset.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
device_lock is much too prone to lockups. For instance if we have a
pending .remove then device_lock is already held. If userspace
attempts to modify AER signaling after that point, a deadlock occurs.
eventfd setup/teardown is already protected in vfio with the igate
mutex. AER is not a high performance interrupt, so we can also use
the same mutex to protect signaling versus setup races.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
These are set of two capability registers, it's pretty much given that
they're registers, so reflect their purpose in the name.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The current VFIO_DEVICE_RESET interface only maps to PCI use cases
where we can isolate the reset to the individual PCI function. This
means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
transition, device specific reset, or be a singleton device on a bus
for a secondary bus reset. FLR does not have widespread support,
PM reset is not very reliable, and bus topology is dictated by the
system and device design. We need to provide a means for a user to
induce a bus reset in cases where the existing mechanisms are not
available or not reliable.
This device specific extension to VFIO provides the user with this
ability. Two new ioctls are introduced:
- VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
- VFIO_DEVICE_PCI_HOT_RESET
The first provides the user with information about the extent of
devices affected by a hot reset. This is essentially a list of
devices and the IOMMU groups they belong to. The user may then
initiate a hot reset by calling the second ioctl. We must be
careful that the user has ownership of all the affected devices
found via the first ioctl, so the second ioctl takes a list of file
descriptors for the VFIO groups affected by the reset. Each group
must have IOMMU protection established for the ioctl to succeed.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Having PCIe/PCI-X capability isn't enough to assume that there are
extended capabilities. Both specs define that the first capability
header is all zero if there are no extended capabilities. Testing
for this avoids an erroneous message about hiding capability 0x0 at
offset 0x100.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
eventfd_fget() tests to see whether the file is an eventfd file, which
we then immediately pass to eventfd_ctx_fileget(), which again tests
whether the file is an eventfd file. Simplify slightly by using
fdget() so that we only test that we're looking at an eventfd once.
fget() could also be used, but fdget() makes use of fget_light() for
another slight optimization.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If an attempt is made to unbind a device from vfio-pci while that
device is in use, the request is blocked until the device becomes
unused. Unfortunately, that unbind path still grabs the device_lock,
which certain things like __pci_reset_function() also want to take.
This means we need to try to acquire the locks ourselves and use the
pre-locked version, __pci_reset_function_locked().
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRgox9AAoJECObm247sIsizJwP/23KprR6BuRt168Obo+xC/lp
kj1E4Hd4XHx6T/5XRexa4GqhmyLqgoqbS19uj/K6ebY9rouVKo0V6OYNFDQiFR/F
yEGjYIBKV9/eBZMQI4RbZpZKGDXTeFrXyjsac1m51JrR3st8zsvZlNPE/TjzhSfz
jMnsCC99ZwIFjmptw/yWwgInswy1n9H9iICS14Xn1x7v71iyOE32reTG6M9HsPHe
Xm5F0K5iLQX8qoISHAomrTnFmCLxp5Y2qhh7nZWmC2gCbPBEnGdqx4prwzJayAaC
DAN8osaYldoKQuwI4wFYMICHxxCEFk8xU58FeKCmeQJZgomefVAoRKf86gAEsSLR
XHptBQOktWVKNFhzZWyOuAX3iua/hmWcOa05I6inycV1x2ZAGlNhvJnj1iKIphLk
+neBFAD9wDcAJZdXkfudq0jJ9jJTSAWBv8d+hozNfkjTVhF+wRnhZ7V6dZfb9+W6
kPGbgwqKApLoOabQbxbZ5Ftr6S7344prB/HAMywGa+xZfoxoYPxBHCi0rSTvUTWy
oWLtzrKRbjBDc0qF4eEK9RZlmVt2ZmCUnUB3kbWED4mrJAHu4LlFghnAloePrIZ3
kXlMARWbyw/E+1WIMO4Ito5+1s3zVsJJgzVsAQDJkTQw2aYDhns73/Y25Dj7x7QS
BKebrXnh2xVCN6OIu+eX
=F0Cc
-----END PGP SIGNATURE-----
Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
"Changes include extension to support PCI AER notification to
userspace, byte granularity of PCI config space and access to
unarchitected PCI config space, better protection around IOMMU driver
accesses, default file mode fix, and a few misc cleanups."
* tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio:
vfio: Set container device mode
vfio: Use down_reads to protect iommu disconnects
vfio: Convert container->group_lock to rwsem
PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register
vfio-pci: Enable raw access to unassigned config space
vfio-pci: Use byte granularity in config map
vfio: make local function vfio_pci_intx_unmask_handler() static
VFIO-AER: Vfio-pci driver changes for supporting AER
VFIO: Wrapper for getting reference to vfio_device
We now cache the MSI/MSI-X capability offsets in the struct pci_dev,
so no need to find the capabilities again.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register,
because PCI-E Capabilities Register bits are almost read-only. This patch
use pcie_caps_reg() instead of another access PCI-E Capabilities Register.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Devices like be2net hide registers between the gaps in capabilities
and architected regions of PCI config space. Our choices to support
such devices is to either build an ever growing and unmanageable white
list or rely on hardware isolation to protect us. These registers are
really no different than MMIO or I/O port space registers, which we
don't attempt to regulate, so treat PCI config space in the same way.
Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
The config map previously used a byte per dword to map regions of
config space to capabilities. Modulo a bug where we round the length
of capabilities down instead of up, this theoretically works well and
saves space so long as devices don't try to hide registers in the gaps
between capabilities. Unfortunately they do exactly that so we need
byte granularity on our config space map. Increase the allocation of
the config map and split accesses at capability region boundaries.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both
of which are unsigned. We attempt to bounds check these, but fail to
account for the case where start is a very large number, allowing
start + count to wrap back into the valid range. Bounds check both
start and start + count.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_pci_intx_unmask_handler() was not declared. It should be static.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The vfio drivers call kmalloc or kzalloc, but do not
include <linux/slab.h>, which causes build errors on
ARM.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when
an error occurs in the vfio_pci_device
- Register pci_error_handler for the vfio_pci driver
- When the device encounters an error, the error handler registered by
the vfio_pci driver gets invoked by the AER infrastructure
- In the error handler, signal the eventfd registered for the device.
- This results in the qemu eventfd handler getting invoked and
appropriate action taken for the guest.
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
and MMIO address 0xa0000. As these are non-overlapping, we can ignore
the I/O port vs MMIO difference and expose them both in a single
region. We make use of the VGA arbiter around each access to
configure chipset access as necessary.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We give the user access to change the power state of the device but
certain transitions result in an uninitialized state which the user
cannot resolve. To fix this we need to mark the PowerState field of
the PMCSR register read-only and effect the requested change on behalf
of the user. This has the added benefit that pdev->current_state
remains accurate while controlled by the user.
The primary example of this bug is a QEMU guest doing a reboot where
the device it put into D3 on shutdown and becomes unusable on the next
boot because the device did a soft reset on D3->D0 (NoSoftRst-).
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We can actually handle MMIO and I/O port from the same access function
since PCI already does abstraction of this. The ROM BAR only requires
a minor difference, so it gets included too. vfio_pci_config_readwrite
gets renamed for consistency.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The read and write functions are nearly identical, combine them
and convert to a switch statement. This also makes it easy to
narrow the scope of when we use the io/mem accessors in case new
regions are added.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
A read from a range hidden from the user (ex. MSI-X vector table)
attempts to fill the user buffer up to the end of the excluded range
instead of up to the requested count. Fix it.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Devices making use of PM reset are getting incorrectly identified as
not supporting reset because pci_pm_reset() fails unless the device is
in D0 power state. When first attached to vfio_pci devices are
typically in an unknown power state. We can fix this by explicitly
setting the power state or simply calling pci_enable_device() before
attempting a pci_reset_function(). We need to enable the device
anyway, so move this up in our vfio_pci_enable() function, which also
simplifies the error path a bit.
Note that pci_disable_device() does not explicitly set the power
state, so there's no need to re-order vfio_pci_disable().
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The two labels for error recovery in function vfio_pci_init() is out of
order, so fix it.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Move the device reset to the end of our disable path, the device
should already be stopped from pci_disable_device(). This also allows
us to manipulate the save/restore to avoid the save/reset/restore +
save/restore that we had before.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virq_disabled flag tracks the userspace view of INTx masking
across interrupt mode changes, but we're not consistently applying
this to the interrupt and masking handler notion of the device.
Currently if the user sets DisINTx while in MSI or MSIX mode, then
returns to INTx mode (ex. rebooting a qemu guest), the hardware has
DisINTx+, but the management of INTx thinks it's enabled, making it
impossible to actually clear DisINTx. Fix this by updating the
handler state when INTx is re-enabled.
Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We need to be ready to recieve an interrupt as soon as we call
request_irq, so our eventfd context setting needs to be moved
earlier. Without this, an interrupt from our device or one
sharing the interrupt line can pass a NULL into eventfd_signal
and oops.
Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Our mmap path mistakely relied on vma->vm_pgoff to get set in
remap_pfn_range. After b3b9c293, that path only applies to
copy-on-write mappings. Set it in our own code.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The VM_RESERVED flag was killed off in commit 314e51b985 ("mm: kill
vma flag VM_RESERVED and mm->reserved_vm counter"), and replaced by the
proper semantic flags (eg "don't core-dump" etc). But there was a new
use of VM_RESERVED that got missed by the merge.
Fix the remaining use of VM_RESERVED in the vfio_pci driver, replacing
the VM_RESERVED flag with VM_DONTEXPAND | VM_DONTDUMP.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation,org>
vfoi-pci supports a mechanism like KVM's irqfd for unmasking an
interrupt through an eventfd. There are two ways to shutdown this
interface: 1) close the eventfd, 2) ioctl (such as disabling the
interrupt). Both of these do the release through a workqueue,
which can result in a segfault if two jobs get queued for the same
virqfd.
Fix this by protecting the pointer to these virqfds by a spinlock.
The vfio pci device will therefore no longer have a reference to it
once the release job is queued under lock. On the ioctl side, we
still flush the workqueue to ensure that any outstanding releases
are completed.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add PCI device support for VFIO. PCI devices expose regions
for accessing config space, I/O port space, and MMIO areas
of the device. PCI config access is virtualized in the kernel,
allowing us to ensure the integrity of the system, by preventing
various accesses while reducing duplicate support across various
userspace drivers. I/O port supports read/write access while
MMIO also supports mmap of sufficiently sized regions. Support
for INTx, MSI, and MSI-X interrupts are provided using eventfds to
userspace.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>