We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.
The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.
If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.
Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.
[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The virtio-scsi HBA is the basis of an alternative storage stack
for QEMU-based virtual machines (including KVM). Compared to
virtio-blk it is more scalable, because it supports many LUNs
on a single PCI slot), more powerful (it more easily supports
passthrough of host devices to the guest) and more easily
extensible (new SCSI features implemented by QEMU should not
require updating the driver in the guest).
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Some other older controllers also do have problems to perform a kdump.
Adding controllers to this list means that the driver will signal
this non-ability via a resettable flag correctly.
The unsupported list was created after a consultation with HP.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Permanent target failures are non-retryable and should be classified as
TARGET_ERROR; otherwise dm-multipath will retry an IO request that will
always fail at the target.
A SCSI command that fails with ILLEGAL_REQUEST sense and Additional
sense 0x20, 0x21, 0x24 or 0x26 represents a permanent TARGET_ERROR.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The provisioning_mode parameter in sysfs did not get updated in the
SD_LBP_DISABLE case. Make sure the provisioning mode is always set
correctly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The error reported up the stack for a discard failure did not clearly
indicate that the command was processed and subsequently failed by the
target device.
Return -EREMOTEIO so multipathing does not classify this condition as a
path failure.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The __get_free_pages can fail, so the return value should be checked.
Spotted thanks to Stanislaw.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added ping support for network connection diagnostics.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added ping support for iscsi adapter, application can use this
interface for diagnostic network connection.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added support to post kernel host event to application using
netlink interface.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added support to post kernel host event to application using
netlink interface.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
On ROM lock acquiring timeout failure, driver spews lot of warning
messages in a for loop, remove the unwanted warning message to reduce
kernel messages clutter.
Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
In some configurations user may not have boot targets configured.
In such cases the debug messages printed out by driver look like
some kind of failure happening. However this could be a valid
case, so modified the messages to appear as warning messages
versus failure messages.
Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
qla4xxx_verify_boot_idx can falsely report a DDB to be boot target
if ha->pri_ddb_idx and ha->sec_ddb_idx are not initialized correctly.
What this could cause is if there is DDB entry in FLash at index 0, then
qla4xxx_verify_boot_idx would return wrong result as ha->pri_ddb_idx is not
set correctly. Fixed the qla4xxx_get_boot_info to set the ha->pri_ddb_idx and
ha->sec_ddb_idx correctly.
Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Fix the un-necessary wait for completion of a sendtarget on an
invalid DDB entry. The state of an invalid DDB entry is 0 (unassigned)
This will also avoid the delays during system boot.
Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This code initially added for FW debugging, we don't need this
code now so taking it out.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
While we wait for GPN_FT response, if the ctlr link goes down, the stack
generates a completion for GPN_FT with error FC_EXCH_CLOSED, and reports a
discovery error. Discovery is not retried in this case, and rightly so.
However, the 'pending' flag stays set, which does not allow subsequent
discovery to succeed as GPN_FT will never be issued. Fix it by clearing the
pending flag when the discovery fails due to GPN_FT failure.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Adding and removing the host into the zone causes this panic.
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffa0491707>] fc_exch_recv+0xc57/0xe70 [libfc]
Call Trace:
[<ffffffffa050e04b>] bnx2fc_l2_rcv_thread+0x37b/0x430 [bnx2fc]
[<ffffffffa050dcd0>] ? bnx2fc_l2_rcv_thread+0x0/0x430 [bnx2fc]
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
During fc_exch_reset, the active exchanges are aborted and the exch is deleted.
As part of processing ABTS response, due to 'ep' being NULL, any access to ep in
fc_exch_recv_bls() causes this panic. Fixed to access 'ep' only if non-NULL.
Reviewed-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The reference counting was necessary on these instances
because it was possible for NPIV ports to be destroyed
after the N_Port. A previous patch ensures that all NPIV
ports are destroyed before the N_Port making the need to
track references on the interface unnecessary.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Currently all port deletion is routed though the FCoE
workqueue (fcoe_wq). When fc_remove_host is called on
an N_Port (for example, from fcoe_destroy) the vports
are queued into a FC Transport workqueue. fc_remove_host
flushes that queue and each vport is passed to fcoe's
fcoe_vport_destroy, which simply queues the associated
fcoe_ports for later deletion. This queue cannot be
flushed within the N_Ports destroy path because of
circular locking issues. The result is that the NPIV
ports are destroyed after the N_Port, which is reverse
of how they are created.
This quirk causes fcoe to keep references on the
fcoe_interface shared by each of these ports (N_Port
and NPIV). Changing the ordering such that NPIV ports
are destroyed before the N_Port will allow us to remove
reference counting on the fcoe_interface instances.
This patch simply allows fcoe_vport_destory to destroy
NPIV ports without deferring them to a workqueue context.
This ensures that when fc_remove_host is called the
NPIV ports will be destroyed first before the N_Port and
allows reference counting on the fcoe's fcoe_interface
to be remove in a later patch.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The label implies that it should be called when
there is 'nomod.' I read that to mean that the
module reference 'get' failed. However, it's only
called when the module reference 'get' succeeded.
I think it makes more sense to name the label,
'out_putmod' since it should be called when we
need to 'put' the module reference taken in the
routine before returning.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Allow FDMI attributes to be exposed via the fc_host
class object for the fcoe driver.
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This is more of a debug statement. As a KERN_ERR we generate
log entries anytime any netdev goes up or down, so when booting
there are notification log entries for all system interfaces
including 'lo'. This is too much. Let's just log when necessary.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This allows the controller to do WRITE_INSERT and READ_STRIP for SAS
disks that support protection information. SAS disks must be formatted
with protection information to use this feature via sg_format.
sg3_utils-1.32 -- sg_format version 1.19 20110730
sg_format usage:
sg_format --format --verbose --pinfo /dev/sda
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
A device logout sent in the delete path of a fcport would clear the
port handle binding inside the firmware. This could lead to queued
work items for the fcport, if any, getting incorrect results. This
patch fixes the issue by checking for device name changes after a
call to get port database.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Add a field to the qla_hw_data struct to allow us to set the maximum number of
fabric devices on a per adapter basis based on ISP type.
[jejb: fix up missing rval = QLA_SUCCESS to prevent uninit var warning]
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Rather than continuously allocating and freeing swl within the discovery
process, simply pre-allocate it the first time that it's needed, cache it
through the rest of the lifecycle of the driver and free it at module unload.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Don't use default 30 second mailbox-command timeout for these
serial requests, instead, limit the TMO to the standard 2*RATOV
plus some fudge-factor.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
During command failure/non-recognition, the upper-layer
FC-transport expects the drivers to set
job-reply->reply_payload_rcv_len. Do this in a consistent manner
to avoid duplication.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
During rport tear-down, make sure we do an implicit LOGO of the fcport in our
firmware to try to clear any residual commands associated with that fcport.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Make sure that all calls to ha->isp_ops->fabric_login() check the
return value for failure.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When thermal temperature initially fails, return a blank string to the
sysfs interface. This fixes the initial display of 0.00 followed by
subsequent display of blank line; the initial 0.00 should have not
displayed for cards that do not support thermal temperature.
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Instead of processing each RSCN individually, use only the name server results
from the switch to tell the existance of a given fcport.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>