As of now SCSI initiated error handling is broken because,
the reset APIs don't try to bring back the device initialized and
ready for further transfers.
In case of timeouts, the scsi error handler takes care of handling aborts
and resets. Improve the error handling in such scenario by resetting the
device and host and re-initializing them in proper manner.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Reviewed-by: Yaniv Gardi <ygardi@codeaurora.org>
Tested-by: Dolev Raviv <draviv@codeaurora.org>
Acked-by: Vinayak Holikatti <vinholikatti@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There is a possible race condition in the hardware when the abort
command is issued to terminate the ongoing SCSI command as described
below:
- A bit in the door-bell register is set in the controller for a
new SCSI command.
- In some rare situations, before controller get a chance to issue
the command to the device, the software issued an abort command.
- If the device recieves abort command first then it returns success
because the command itself is not present.
- Now if the controller commits the command to device it will be
processed.
- Software thinks that command is aborted and proceed while still
the device is processing it.
- The software, controller and device may go out of sync because of
this race condition.
To avoid this, query task presence in the device before sending abort
task command so that after the abort operation, the command is guaranteed
to be non-existent in both controller and the device.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Reviewed-by: Yaniv Gardi <ygardi@codeaurora.org>
Tested-by: Dolev Raviv <draviv@codeaurora.org>
Acked-by: Vinayak Holikatti <vinholikatti@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently, sending Task Management (TM) command to the card might
be broken in some scenarios as listed below:
Problem: If there are more than 8 TM commands the implementation
returns error to the caller.
Fix: Wait for one of the slots to be emptied and send the command.
Problem: Sometimes it is necessary for the caller to know the TM service
response code to determine the task status.
Fix: Propogate the service response to the caller.
Problem: If the TM command times out no proper error recovery is
implemented.
Fix: Clear the command in the controller door-bell register, so that
further commands for the same slot don't fail.
Problem: While preparing the TM command descriptor, the task tag used
should be unique across SCSI/NOP/QUERY/TM commands and not the
task tag of the command which the TM command is trying to manage.
Fix: Use a unique task tag instead of task tag of SCSI command.
Problem: Since the TM command involves H/W communication, abruptly ending
the request on kill interrupt signal might cause h/w malfunction.
Fix: Wait for hardware completion interrupt with TASK_UNINTERRUPTIBLE
set.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Reviewed-by: Yaniv Gardi <ygardi@codeaurora.org>
Tested-by: Dolev Raviv <draviv@codeaurora.org>
Acked-by: Vinayak Holikatti <vinholikatti@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix many warnings with incorrect endian assumptions
which makes the code unportable to new architectures.
The UFS specification defines the byte order as big-endian
for UPIU structure and little-endian for the host controller
transfer/task management descriptors.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Acked-by: Vinayak Holikatti <vinholikatti@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
rescan_hba_mode was defined as a u8 so could never be less than zero:
rescan_hba_mode = hpsa_hba_mode_enabled(h);
if (rescan_hba_mode < 0)
goto out;
Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The sun3 drivers suffer from a whole bunch of duplicated code. Fix this
by following the g_NCR5380_mmio example. (Notionally, sun3_scsi relates to
sun3_scsi_vme in the same way that g_NCR5380 relates to g_NCR5380_mmio.)
Dead code is also removed: we now have working debug macros so
SUN3_SCSI_DEBUG is undesirable. Dead code within #ifdef OLD_DMA is also
dropped, consistent with sun3_scsi_vme.c.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Move the #include "NCR5380.h" out of the sun3_scsi.h header file and into
the driver .c files, like all the other NCR5380 drivers in the tree.
This improves uniformity and reduces the depth of nested includes. The
sequence of #include's, #define's and #if's no longer does my head in.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the unused (and divergent) debugging macro definitions from
the sun3_NCR5380 and atari_NCR5380 drivers. These drivers have been
converted to use the common macros in NCR5380.h.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Acked-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
All three NCR5380 core driver implementations share the same NCR5380.h
header file so they need to agree on certain macro definitions.
The flag bit used by the NDEBUG_MERGING macro in atari_NCR5380 and
sun3_NCR5380 collides with the bit used by NDEBUG_LISTS.
Moreover, NDEBUG_ABORT appears in NCR5380.c so it should be defined in
NCR5380.h rather than in each of the many drivers using that core.
An undefined NDEBUG_ABORT macro caused compiler errors and led to dodgy
workarounds in the core driver that can now be removed.
(See commits f566a576bc and
185a7a1cd79b9891e3c17abdb103ba1c98d6ca7a.)
Move all of the NDEBUG_ABORT, NDEBUG_TAGS and NDEBUG_MERGING macro
definitions into NCR5380.h where all the other NDEBUG macros live.
Also, incorrect "#ifdef NDEBUG" becomes "#if NDEBUG" to fix the warning:
drivers/scsi/mac_scsi.c: At top level:
drivers/scsi/NCR5380.c:418: warning: 'NCR5380_print' defined but not used
drivers/scsi/NCR5380.c:459: warning: 'NCR5380_print_phase' defined but not used
The debugging code is now enabled when NDEBUG != 0.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
All NCR5380 drivers already include the NCR5380.h header. Better to
adopt those macros rather than have three variations on them.
Moreover, the macros in NCR5380.h are preferable because the atari_NCR5380
and sun3_NCR5380 versions are inflexible. For example, they can't accomodate
dprintk(NDEBUG_MAIN | NDEBUG_QUEUES, ...)
Replace the *_PRINTK macros from atari_NCR5380.h and sun3_NCR5380.h with
the equivalent macros from NCR5380.h.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Acked-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
All NCR5380 drivers already include the NCR5380.h header. Better to
adopt those macros rather than have three variations on them.
Moreover, the macros in NCR5380.h are preferable anyway: the atari_NCR5380
and sun3_NCR5380 versions are inflexible. For example, they can't accomodate
NCR5380_dprint(NDEBUG_MAIN | NDEBUG_QUEUES, ...)
Replace the NCR_PRINT* macros from atari_NCR5380.h and sun3_NCR5380.h with
the equivalent macros from NCR5380.h.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Acked-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are three implementations of the core NCR5380 driver and three sets
of debugging macro definitions. And all three implementations use the
NCR5380.h header as well.
Two of the definitions of the dprintk macro accept a variable argument list
whereas the third does not. Standardize on the variable argument list.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The change from cmd->target to cmd->device->id was apparently the purpose of
commit a7f251228390e87d86c5e3846f99a455517fdd8e in
kernel/git/tglx/history.git but some instances have been missed.
Also fix the "NDEBUG_LAST_WRITE_SENT" and "NDEBUG_ALL" typo's.
Also fix some format strings (%ul becomes %lu) that caused compiler warnings.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Only the NCR5380_dprint() macro should invoke the NCR5380_print() function.
That's why NCR5380.c only defines the function #if NDEBUG. Use the standard
macro.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
HOSTS_C is always undefined. There is no hosts.c anymore.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
BOARD_NORMAL is completely unused and BOARD_NCR53C400 is used only by
g_NCR5380 internally. Remove the unused definitions.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add an option to only transfer half the data for every n-th command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Taken almost entirely from Nicholas Bellinger's scsi-mq conversion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
In case of error, the bnx2fc_allocate_hash_table() didn't free
all the memory it allocated.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If bnx2fc_allocate_hash_table() for some reasons fails, it is possible that the
hash_tbl_segments or the hash_tbl_pbl pointers are NULL.
In this case bnx2fc_free_hash_table() will panic the system.
this patch also fixes a memory leak, the hash_tbl_segments pointer was never
freed.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
hash_table_size is not used by the bnx2fc_free_hash_table() function.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
And while we're at it fix a magic number
Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Teel <scott.teel@hp.com>
Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Checking for a NULL return from a kzalloc call in hpsa_get_pdisk_of_ioaccel2.
Signed-off-by: Scott Teel <scott.teel@hp.com>
Signed-off-by: Joe Handzik <joseph.t.handzik@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Checking return value for the memory allocattion and freeing it
while exiting the function
Signed-off-by: Viswas G <Viswas.G@pmcs.com>
Signed-off-by: Suresh Thiagarajan <Suresh.Thiagarajan@pmcs.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Jack Wang <xjtuwjp@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Access to tgt->req_vq is strictly serialized by spin_lock
of tgt->tgt_lock, so the ACCESS_ONCE() isn't necessary.
smp_read_barrier_depends() in virtscsi_req_done was introduced
to order reading req_vq and decreasing tgt->reqs, but it isn't
needed now because req_vq is read from
scsi->req_vqs[vq->index - VIRTIO_SCSI_VQ_BASE] instead of
tgt->req_vq, so remove the unnecessary barrier.
Also remove related comment about the barrier.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a memory barrier to ensure the valid bit is read before
any of the cqe payload is read. This fixes an issue seen
on Power where the cqe payload was getting loaded before
the valid bit. When this occurred, we saw an iotag out of
range error when a command completed, but since the iotag
looked invalid the command didn't get completed to scsi core.
Later we hit the command timeout, attempted to abort the command,
then waited for the aborted command to get returned. Since the
adapter already returned the command, we timeout waiting,
and end up escalating EEH all the way to host reset. This
patch fixes this issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Smart <james.smart@emulex.com>
---
lpfc_sli.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Resend of earlier patch - added equivalent changes to sun3 NCR5380 code]
The abort/reset lowlevel return codes had changed with the new
error SCSI handling - update Atari and Sun3 NCR5380 drivers to reflect this.
Change reset handling for Atari to clear queues only, do not attempt
to call done() on each command aborted by the reset. The EH code
should do that for us. Queues _must_ be cleared, otherwise
atari_scsi_bus_reset will not release the ST-DMA lock, deadlocking
further error recovery.
Update the Sun3 NCR5380 driver as well - the Sun3 driver was
derived from the Atari one. Kudos to Finn Thain for the Sun3 part
and cleaning up the header files. After the header cleanup, the
initio.h include (!) can be dropped from sun3_scsi.h now.
Signed-off-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Sam Creasey <sammy@sammy.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: James E.J. Bottomley <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
The format strings for various printk()s make use of a temporary
variable that is declared 'static'. This is probably not intended,
so fix those.
Found in the PaX patch, written by the PaX Team.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Minor fix for a message in the driver so that it matches the function name.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Bradley Grove <bgrove@attotech.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
bfa_fcb_pbc_vport_create() is called only from bfa_fcs_pbc_vport_init(),
that is called only from bfad_drv_start() with bfad_lock spinlock held.
So the patch replaces GFP_KERNEL with GFP_ATOMIC to avoid
sleeping in atomic spinlock context.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Anil Gurumurthy <anil.gurumurthy@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix following smatch warning:-
drivers/scsi/qla4xxx/ql4_os.c:1752 qla4xxx_get_ep_param() warn: variable dereferenced before check 'qla_ep' (see line 1745)
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Issue:
modprobe qla4xxx is killed by systemd due to timeout.
Solution:
The exporting of sysfs DDBs from qla4xxx_probe_adapter added delay of
approximately 15s due to which system-udevd killed the modprobe of the
driver. Added fix to export the sysfs DDBs from the DPC handler.
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use correct goto statement to free dma memory in case of
failure in function qla4_84xx_config_acb()
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Issue:
System crash while target discovery for ISP40XX
Root cause:
Function qla4xxx_init_rings() is not called for ISP40XX
Fix:
Call function qla4xxx_init_rings() for ISP40XX from
qla4xxx_start_firmware().
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Check for correct return status in function -
qla4_8xxx_minidump_pex_dma_read
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Updated driver with new opcode (RDDFE, RDMDIO and POLLWR) which are
added with latest firmware minidump template
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>