idxd_device_clear_state() now grabs the idxd->dev_lock
itself, so don't grab the lock prior to calling it.
This was seen in testing after dmar fault occurred on system,
resulting in lockup stack traces.
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: dmaengine@vger.kernel.org
Fixes: cf4ac3fef3 ("dmaengine: idxd: fix lockdep warning on device driver removal")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20220823163709.2102468-1-jsnitsel@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Change the driver where WQ interrupt is requested only when wq is being
enabled. This new scheme set things up so that request_threaded_irq() is
only called when a kernel wq type is being enabled. This also sets up for
future interrupt request where different interrupt handler such as wq
occupancy interrupt can be setup instead of the wq completion interrupt.
Not calling request_irq() until the WQ actually needs an irq also prevents
wasting of CPU irq vectors on x86 systems, which is a limited resource.
idxd_flush_pending_descs() is moved to device.c since descriptor flushing
is now part of wq disable rather than shutdown().
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163942149487.2412839.6691222855803875848.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
With irq_entry already being associated with the wq in a 1:1 relationship,
embed the irq_entry in the idxd_wq struct and remove back pointers for
idxe_wq and idxd_device. In the process of this work, clean up the interrupt
handle assignment so that there's no decision to be made during submit
call on where interrupt handle value comes from. Set the interrupt handle
during irq request initialization time.
irq_entry 0 is designated as special and is tied to the device itself.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163942148362.2412839.12055447853311267866.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add a sysfs knob to allow tuning of retries for the kernel ENQCMDS
descriptor submission. While on host, it is not as likely that ENQCMDS
return busy during normal operations due to the driver controlling the
number of descriptors allocated for submission. However, when the driver is
operating as a guest driver, the chance of retry goes up significantly due
to sharing a wq with multiple VMs. A default value is provided with the
system admin being able to tune the value on a per WQ basis.
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163820629464.2702134.7577370098568297574.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Dan reports that smatch has found idxd_wq_quiesce() is being called inside
the idxd->dev_lock. idxd_wq_quiesce() calls wait_for_completion() and
therefore it can sleep. Move the call outside of the spinlock as it does
not need device lock.
Fixes: 5b0c68c473 ("dmaengine: idxd: support reporting of halt interrupt")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163716858508.1721911.15051495873516709923.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
"Interrupt handle revoked" is an event that happens when the driver is
running on a guest kernel and the VM is migrated to a new machine.
The device will trigger an interrupt that signals to the guest driver
that the interrupt handles need to be replaced.
The misc irq thread function calls a helper function to handle the
event. The function uses the WQ percpu_ref to quiesce the kernel
submissions. It then replaces the interrupt handles by requesting
interrupt handle command for each I/O MSIX vector. Once the handle is
updated, the driver will unblock the submission path to allow new
submissions.
The submitter will attempt to acquire a percpu_ref before submission. When
the request fails, it will wait on the wq_resurrect 'completion'.
The driver does anticipate the possibility of descriptors being submitted
before the WQ percpu_ref is killed. If a descriptor has already been
submitted, it will return with incorrect interrupt handle status. The
descriptor will be re-submitted with the new interrupt handle on the
completion path. For descriptors with incorrect interrupt handles,
completion interrupt won't be triggered.
At the completion of the interrupt handle refresh, the handling function
will call idxd_int_handle_refresh_drain() to issue drain descriptors to
each of the wq with associated interrupt handle. The drain descriptor will have
interrupt request set but without completion record. This will ensure all
descriptors with incorrect interrupt completion handle get drained and
a completion interrupt is triggered for the guest driver to process them.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Co-Developed-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163528420189.3925689.18212568593220415551.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Handle a descriptor that has been marked with invalid interrupt handle
error in status. Create a work item that will resubmit the descriptor. This
typically happens when the driver has handled the revoke interrupt handle
event and has a new interrupt handle.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163528419601.3925689.4166517602890523193.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The helper is called at the completion of the interrupt handle refresh
event. It issues drain descriptors to each of the wq with associated
interrupt handle. The drain descriptor will have interrupt request set but
without completion record. This will ensure all descriptors with incorrect
interrupt completion handle get drained and a completion interrupt is
triggered for the guest driver to process them.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163528418315.3925689.7944718440052849626.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Refactor the completion function to allow skipping of descriptor freeing on
the submission failure path. This completely removes descriptor freeing
from the submit failure path and leave the responsibility to the caller.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/163528416222.3925689.12859769271667814762.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Using list_move_tail() instead of list_del() + list_add_tail()
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20210908092826.67765-1-cuibixuan@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The spinlock is not being used in hard interrupt context. There is no need
to disable irq when acquiring the lock. The interrupt thread handler also
is not in bottom half context, therefore we can also remove disabling of
the bh. Convert all dev_lock acquisition to plain spin_lock() calls.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162984026772.1939166.11504067782824765879.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The list lock is never acquired in interrupt context. Therefore there is no
need to disable interrupts. Remove interrupt flags for lock operations.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162826417450.3454650.3733188117742416238.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The interrupt thread handler currently loops forever to process outstanding
completions. This causes either an "irq X: nobody cared" kernel splat or
the NMI watchdog kicks in due to running too long in the function. The irq
thread handler is expected to run again after exiting if there are
interrupts fired while the thread handler is running. So the handler code
can process all the completed I/O in a single pass and exit without losing
the follow on completed I/O.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162802977005.3084234.11836261157026497585.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Coverity static analysis of linux-next found issue.
The check (status == IDXD_COMP_DESC_ABORT) is always false since status
was previously masked with 0x7f and IDXD_COMP_DESC_ABORT is 0xff.
Fixes: 6b4b87f2c3 ("dmaengine: idxd: fix submission race window")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162698465160.3560828.18173186265683415384.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Kernel memory are pinned and will not cause faults. Since the driver
does not support interrupts for user descriptors, no fault errors are
expected to come through the misc interrupt. Remove dead code.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162630502789.631986.10591230961790023856.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add a 'struct idxd_dev' that wraps the 'struct device' for idxd conf_dev
that registers with the dsa bus. This is introduced in order to deal with
multiple different types of 'devices' that are registered on the dsa_bus
when the compat driver needs to route them to the correct driver to attach.
The bind() call now can determine the type of device and then do the
appropriate driver matching.
Reviewed-by Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162637460065.744545.584492831446090984.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Konstantin observed that when descriptors are submitted, the descriptor is
added to the pending list after the submission. This creates a race window
with the slight possibility that the descriptor can complete before it
gets added to the pending list and this window would cause the completion
handler to miss processing the descriptor.
To address the issue, the addition of the descriptor to the pending list
must be done before it gets submitted to the hardware. However, submitting
to swq with ENQCMDS instruction can cause a failure with the condition of
either wq is full or wq is not "active".
With the descriptor allocation being the gate to the wq capacity, it is not
possible to hit a retry with ENQCMDS submission to the swq. The only
possible failure can happen is when wq is no longer "active" due to hw
error and therefore we are moving towards taking down the portal. Given
this is a rare condition and there's no longer concern over I/O
performance, the driver can walk the completion lists in order to retrieve
and abort the descriptor.
The error path will set the descriptor to aborted status. It will take the
work list lock to prevent further processing of worklist. It will do a
delete_all on the pending llist to retrieve all descriptors on the pending
llist. The delete_all action does not require a lock. It will walk through
the acquired llist to find the aborted descriptor while add all remaining
descriptors to the work list since it holds the lock. If it does not find
the aborted descriptor on the llist, it will walk through the work
list. And if it still does not find the descriptor, then it means the
interrupt handler has removed the desc from the llist but is pending on
the work list lock and will process it once the error path releases the
lock.
Fixes: eb15e7154f ("dmaengine: idxd: add interrupt handle request and release support")
Reported-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162628855747.360485.10101925573082466530.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The previous state cleanup patch only performed wq state cleanups. This
does not go far enough as when device is disabled or reset, the state
for groups and engines must also be cleaned up. Add additional state
cleanup beyond wq cleanup. Tie those cleanups directly to device
disable and reset, and wq disable and reset.
Fixes: da32b28c95 ("dmaengine: idxd: cleanup workqueue config after disabling")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/162285154108.2096632.5572805472362321307.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add the code needed in the main IDXD driver to interface with the IDXD
perfmon implementation.
[ Based on work originally by Jing Lin. ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: https://lore.kernel.org/r/a5564a5583911565d31c2af9234218c5166c4b2c.1619276133.git.zanussi@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Remove interrupt masking and just let the hard irq handler keep
firing for new events. This is less of a performance impact vs
the MMIO readback inside the pci_msi_{mask,unmas}_irq(). Especially
with a loaded system those flushes can be stuck behind large amounts
of MMIO writes to flush. When guest kernel is running on top of VFIO
mdev, mask/unmask causes a vmexit each time and is not desirable.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/161894523436.3210025.1834640110556139277.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The char device setup and cleanup has device lifetime issues regarding when
parts are initialized and cleaned up. The initialization of struct device is
done incorrectly. device_initialize() needs to be called on the 'struct
device' and then additional changes can be added. The ->release() function
needs to be setup via device_type before dev_set_name() to allow proper
cleanup. The change re-parents the cdev under the wq->conf_dev to get
natural reference inheritance. No known dependency on the old device path exists.
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Fixes: 42d279f913 ("dmaengine: idxd: add char driver to expose submission portal to userland")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/161852987721.2203940.1478218825576630810.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Remove devm_* allocation and fix wq->conf_dev 'struct device' lifetime.
Address issues flagged by CONFIG_DEBUG_KOBJECT_RELEASE. Add release
functions in order to free the allocated memory for the wq context at
device destruction time.
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Fixes: bfe1d56091 ("dmaengine: idxd: Init and probe for Intel data accelerators")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161852985907.2203940.6840120734115043753.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Current code blindly writes over the SWERR and the OVERFLOW bits. Write
back the bits actually read instead so the driver avoids clobbering the
OVERFLOW bit that comes after the register is read.
Fixes: bfe1d56091 ("dmaengine: idxd: Init and probe for Intel data accelerators")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161352082229.3511254.1002151220537623503.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Nikhil reported the misc interrupt handler can sometimes miss handling
the command interrupt when an error interrupt happens near the same time.
Have the irq handling thread continue to process the misc interrupts until
all interrupts are processed. This is a low usage interrupt and is not
expected to handle high volume traffic. Therefore there is no concern of
this thread running for a long time.
Fixes: 0d5c10b4c8 ("dmaengine: idxd: add work queue drain support")
Reported-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161074755329.2183844.13295528344116907983.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add code to "complete" a descriptor when the descriptor or its completion
address hit a fault error when SVA mode is being used. This error can be
triggered due to bad programming by the user. A lock is introduced in order
to protect the descriptor completion lists since the fault handler will run
from the system work queue after being scheduled in the interrupt handler.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/160382008092.3911367.12766483427643278985.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Switch driver to use MSIX mask and unmask instead of the ignore bit.
When ignore bit is cleared, we must issue an MMIO read to ensure writes
have all arrived and check and process any additional completions. The
ignore bit does not queue up any pending MSIX interrupts. The mask bit
however does. Use API call from interrupt subsystem to mask MSIX
interrupt since the hardware does not have convenient mask bit register.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/159319517621.70410.11816465052708900506.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add wq drain support. When a wq is being released, it needs to wait for
all in-flight operation to complete. A device control function
idxd_wq_drain() has been added to facilitate this. A wq drain call
is added to the char dev on release to make sure all user operations are
complete. A wq drain is also added before the wq is being disabled.
A drain command can take an unpredictable period of time. Interrupt support
for device commands is added to allow waiting on the command to
finish. If a previous command is in progress, the new submitter can block
until the current command is finished before proceeding. The interrupt
based submission will submit the command and then wait until a command
completion interrupt happens to complete. All commands are moved to the
interrupt based command submission except for the device reset during
probe, which will be polled.
Fixes: 42d279f913 ("dmaengine: idxd: add char driver to expose submission portal to userland")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/159319502515.69593.13451647706946040301.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Fix unmasking of misc interrupt handler when completing normal. It exits
early and skips the unmasking with the current implementation. Fix to
unmask interrupt when exiting normally.
Fixes: bfe1d56091 ("dmaengine: idxd: Init and probe for Intel data accelerators")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/159311256528.855.11527922406329728512.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The current implementation may miss completions after we unmask the
interrupt. In order to make sure we process all competions, we need to:
1. Do an MMIO read from the device as a barrier to ensure that all PCI
writes for completions have arrived.
2. Check for any additional completions that we missed.
Fixes: 8f47d1a5e5 ("dmaengine: idxd: connect idxd to dmaengine subsystem")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/158834641769.35613.1341160109892008587.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Create a char device region that will allow acquisition of user portals in
order to allow applications to submit DMA operations. A char device will be
created per work queue that gets exposed. The workqueue type "user"
is used to mark a work queue for user char device. For example if the
workqueue 0 of DSA device 0 is marked for char device, then a device node
of /dev/dsa/wq0.0 will be created.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965026985.73301.976523230037106742.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add plumbing for dmaengine subsystem connection. The driver register a DMA
device per DSA device. The channels are dynamically registered when a
workqueue is configured to be "kernel:dmanegine" type.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965026376.73301.13867988830650740445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The idxd driver introduces the Intel Data Stream Accelerator [1] that will
be available on future Intel Xeon CPUs. One of the kernel access
point for the driver is through the dmaengine subsystem. It will initially
provide the DMA copy service to the kernel.
Some of the main functionality introduced with this accelerator
are: shared virtual memory (SVM) support, and descriptor submission using
Intel CPU instructions movdir64b and enqcmds. There will be additional
accelerator devices that share the same driver with variations to
capabilities.
This commit introduces the probe and initialization component of the
driver.
[1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>