This patch improves handling of TASK ABORTED status by Linux SCSI
mid-layer. Currently, command returned with this status considered
failed and returned to upper layers. It leads to additional error
recovery load on file systems and block layer, which sometimes can
cause undesired side effects, like I/O errors and file systems
corruptions. See http://lkml.org/lkml/2008/11/1/38, for instance.
From other side, TASK ABORTED status is returned by SCSI target if the
corresponding command was aborted by another initiator and the target
has TAS bit set in the control mode page. So, in the majority of cases
commands with TASK ABORTED status should be simply retried. In other
cases, maybe_retry path will not retry if no retries are allowed.
This patch implement suggestion by James Bottomley from
http://marc.info/?l=linux-scsi&m=121932916906009&w=2.
Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
...and the list of recent breakage goes on and on, this time
it's 242f9dcb8b (block: unify request timeout handling)
which broke it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
scsi_eh_try_stu() was still using the timeout parameter in the device
which is now not set (i.e. zero filled) meaning that it waited no time
at all for the start unit command to complete (leading the routine to
conclude failure every time). This lead to a 2.6.27 regression:
http://bugzilla.kernel.org/show_bug.cgi?id=12120
Where firewire devices that were non spec compliant wouldn't spin up.
Fix this by using the block queue timeout value instead.
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Drivers want to be able to return DID_TRANSPORT_DISRUPTED and
have it do the right thing for commands like tape and passthrouh
as far as retries go. The LLDs previously used DID_BUS_BUSY or DID_ERROR
which followed the cmd->retries limit, but DID_TRANSPORT_DISRUPTED
was skipping that check so it could have caused a problem with tape
commands.
This patch has DID_TRANSPORT_DISRUPTED check the cmd->retries/cmd->allowed.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There's a target reset bug.
This loop:
for (id = 0; id <= shost->max_id; id++) {
Never terminates if shost->max_id is set to ~0, like aic94xx does.
It's also pretty inefficient since you mostly have compact target
numbers, but the max_id can be very high. The best way would be to
sort the recovery list by target id and skip them if they're equal,
but even a worst case O(N^2) traversal is probably OK here, so fix it
by finding the next highest target number (assuming n+1) and
terminating when there isn't one.
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This checks the errors the scsi-ml determined were retryable
and returns if we should fast fail it based on the request
fail fast flags.
Without the patch, drivers like lpfc, qla2xxx and fcoe would return
DID_ERROR for what it determines is a temporary communication problem.
There is no loss of connectivity at that time and the driver thinks
that it would be fast to retry at the driver level. SCSI-ml will however
sees fast fail on the request and DID_ERROR and will fast fail the io.
This will then cause dm-multipath to fail the path and possibley switch
target controllers when we should be retrying at the scsi layer.
We also were fast failing device errors to dm multiapth when
unless the scsi_dh modules think otherwis we want to retry at
the scsi layer because multipath can only retry the IO like scsi
should have done. multipath is a little dumber though because it
does not what the error was for and assumes that it should fail
the paths.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently, if there is a transport problem the iscsi drivers will return
outstanding commands (commands being exeucted by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
When the recovery_timeout fires, the iscsi drivers then fail IO with
DID_NO_CONNECT.
For fcp, some drivers will fail some outstanding IO (disk but possibly not
tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
and hits the scsi_error.c failfast check, block the rport, and commands
caught in the race are failed with DID_IMM_RETRY. Other drivers, may
hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
to be called.
The following patches attempt to unify what upper layers will see drivers
like multipath can make a good guess. This relies on drivers being
hooked into their transport class.
This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for
when the fast_io_fail_tmo fires.
The idea is that if the LLD/class detects a problem and is going to block
a rport/session, then if the LLD wants or must return the command to scsi-ml,
then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
the IO into the same scsi queue it came from, until the fast io fail timer
fires and the class decides what to do.
When using multipath and the fast_io_fail_tmo fires then the class
can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
the equivlent in iscsi if we ever implement more advanced recovery methods.
A LLD, like lpfc, could continue to return DID_ERROR and then it will hit
the normal failfast path, so drivers do not have fully be ported to
work better. The point of the patches is that upper layers will
not see a failure that could be recovered from while the rport/session is
blocked until fast_io_fail_tmo/recovery_timeout fires.
V3
Remove some comments.
V2
Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
DID_TRANSPORT_DISRUPTED.
V1
initial patch.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.
Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Change scsi_check_sense HARDWARE_ERROR check to return ADD_TO_MLQUEUE
if device->retry_hwerror is set to allow retries to occur without
restriction of blk_noretry_request check.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[jejb: fixed up a ton of missed conversions.
All of you are on notice this has happened, driver trees will now
need to be rebased]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: SCSI List <linux-scsi@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch (as1116) fixes a bug in scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd(). These routines are supposed to save any
values they change and restore them later, but someone forgot to
save & restore scmd->underflow.
This fixes part of the problem reported in Bugzilla #9638.
[jejb: fix up rejections around DIF/DIX]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If initiator or target reject the I/O due to DIF errors there is no
point in retrying.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Controllers that support DMA of protection information must be told
explicitly how to handle the I/O. The controller has no knowledge of
the protection capabilities of the target device so this information
must be passed in the scsi_cmnd.
- The protection operation tells the HBA whether to generate, strip or
verify protection information.
- The protection type tells the HBA which layout the target is
formatted with. This is necessary because the controller must be
able to correctly interpret the included protection information in
order to verify it.
- When a scsi_cmnd is reused for error handling the protection
operation must be cleared and saved while error handling is in
progress.
- prot_op and prot_type are placed in an existing hole in scsi_cmnd
and don't cause the structure to grow.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some of the storage devices (that can be accessed through multiple paths),
do need some special handling for
1. Activating the passive path of the storage access.
2. Decode and handle the special sense codes returned by the devices.
3. Handle the I/Os being sent to the passive path, especially
during the device probe time.
when accessed through multiple paths.
As of today this special device handling is done at the dm-multipath
layer using dm-handlers. That works well for (1); for (2) to be handled
at dm layer, scsi sense information need to be exported from SCSI to dm-layer,
which is not very attractive; (3) cannot be done at all at the dm layer.
Device handler has been moved to SCSI mainly to handle (2) and (3) properly.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic94xx: fix section mismatch
[SCSI] u14-34f: Fix 32bit only problem
[SCSI] dpt_i2o: sysfs code
[SCSI] dpt_i2o: 64 bit support
[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to dma_alloc_coherent
[SCSI] dpt_i2o: use standard __init / __exit code
[SCSI] megaraid_sas: fix suspend/resume sections
[SCSI] aacraid: Add Power Management support
[SCSI] aacraid: Fix jbod operations scan issues
[SCSI] aacraid: Fix warning about macro side-effects
[SCSI] add support for variable length extended commands
[SCSI] Let scsi_cmnd->cmnd use request->cmd buffer
[SCSI] bsg: add large command support
[SCSI] aacraid: Fix down_interruptible() to check the return value correctly
[SCSI] megaraid_sas; Update the Version and Changelog
[SCSI] ibmvscsi: Handle non SCSI error status
[SCSI] bug fix for free list handling
[SCSI] ipr: Rename ipr's state scsi host attribute to prevent collisions
[SCSI] megaraid_mbox: fix Dell CERC firmware problem
- struct scsi_cmnd had a 16 bytes command buffer of its own.
This is an unnecessary duplication and copy of request's
cmd. It is probably left overs from the time that scsi_cmnd
could function without a request attached. So clean that up.
- Once above is done, few places, apart from scsi-ml, needed
adjustments due to changing the data type of scsi_cmnd->cmnd.
- Lots of drivers still use MAX_COMMAND_SIZE. So I have left
that #define but equate it to BLK_MAX_CDB. The way I see it
and is reflected in the patch below is.
MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB
as per the SCSI standard and is not related
to the implementation.
BLK_MAX_CDB. - The allocated space at the request level
- I have audit all ISA drivers and made sure none use ->cmnd in a DMA
Operation. Same audit was done by Andi Kleen.
(*)fixed-length here means commands that their size can be determined
by their opcode and the CDB does not carry a length specifier, (unlike
the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly
true and the SCSI standard also defines extended commands and
vendor specific commands that can be bigger than 16 bytes. The kernel
will support these using the same infrastructure used for VARLEN CDB's.
So in effect MAX_COMMAND_SIZE means the maximum size command
scsi-ml supports without specifying a cmd_len by ULD's
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Any path needs to call it to initialize the request.
This is a preparation for large command support, which needs to
initialize the request in a proper way (that is, just doing a memset()
will not work).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds scsi_build_sense_buffer, a simple helper function to build
sense data in a buffer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The problem is that serveral drivers are sending a target reset from the
device reset handler, and if we have multiple devices a target reset gets
sent for each device when only one would be sufficient. And if we do a target
reset it affects all the commands on the target so the device reset handler
code only cleaning up one devices's commands makes programming the driver a
little more difficult than it should be.
This patch adds a target reset handler, which drivers can use to send
a target reset. If successful it cleans up the commands for a devices
accessed through that starget.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
several LLDs. It's a preparation for the future changes to remove
sense_buffer array in scsi_cmnd structure.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- Change title to remove "Mid-Layer" since the doc is about all of the
SCSI layers.
- Use "SCSI" instead of "scsi" in docbook text.
- Use "*/" to end kernel-doc notation blocks.
- A few other minor typo fixes.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add Documentation/DocBook/scsi_midlayer.tmpl, add to Makefile, and update
lots of kerneldoc comments in drivers/scsi/*.
Updated with comments from Stefan Richter, Stephen M. Cameron,
James Bottomley and Randy Dunlap.
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This reverts commit ac40532ef0, which gets
us back the original cleanup of 6f5391c283.
It turns out that the bug that was triggered by that commit was
apparently not actually triggered by that commit at all, and just the
testing conditions had changed enough to make it appear to be due to it.
The real problem seems to have been found by Peter Osterlund:
"pktcdvd sets it [block device size] when opening the /dev/pktcdvd
device, but when the drive is later opened as /dev/scd0, there is
nothing that sets it back. (Btw, 40944 is possible if the disk is a
CDRW that was formatted with "cdrwtool -m 10236".)
The problem is that pktcdvd opens the cd device in non-blocking mode
when pktsetup is run, and doesn't close it again until pktsetup -d is
run. The effect is that if you meanwhile open the cd device,
blkdev.c:do_open() doesn't call bd_set_size() because
bdev->bd_openers is non-zero."
In particular, to repeat the bug (regardless of whether commit
6f5391c283 is applied or not):
" 1. Start with an empty drive.
2. pktsetup 0 /dev/scd0
3. Insert a CD containing an isofs filesystem.
4. mount /dev/pktcdvd/0 /mnt/tmp
5. umount /mnt/tmp
6. Press the eject button.
7. Insert a DVD containing a non-writable filesystem.
8. mount /dev/scd0 /mnt/tmp
9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null
10. If the DVD contains data beyond the physical size of a CD, you
get I/O errors in the terminal, and dmesg reports lots of
"attempt to access beyond end of device" errors."
which in turn is because the nested open after the media change won't
cause the size to be set properly (because the original open still holds
the block device, and we only do the bd_set_size() when we don't have
other people holding the device open).
The proper fix for that is probably to just do something like
bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9;
in fs/block_dev.c:do_open() even for the cases where we're not the
original opener (but *not* call bd_set_size(), since that will also
change the block size of the device).
Cc: Peter Osterlund <petero2@telia.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 6f5391c283 ("[SCSI]
Get rid of scsi_cmnd->done") that was supposed to be a cleanup commit,
but apparently it causes regressions:
Bug 9370 - v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
http://bugzilla.kernel.org/show_bug.cgi?id=9370
this patch should be reintroduced in a more split-up form to make
testing of it easier.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spotted by Paul Jackson <pj@sgi.com>
The error handler rework moved the scatterlist into a globally exposed
structure in scsi_eh.h; unfortunately, the scatterlist include needs
to move from scsi_error.c to scsi_eh.h to allow this to compile
universally.
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- Drivers/transports that want to send a synchronous REQUEST_SENSE command
as part of their .queuecommand sequence, have 2 new API's that facilitate
in doing so and abstract them from scsi-ml internals.
void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,
struct scsi_eh_save *sesci, unsigned char *cmnd,
int cmnd_size, int sense_bytes)
Will hijack a command and prepare it for request sense if needed.
And will save any later needed info into a scsi_eh_save structure.
void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd,
struct scsi_eh_save *sesci);
Will undo any changes done to a command by above function. Making
it ready for completion.
- Re-factor scsi_send_eh_cmnd() to use above APIs
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- regrouped variables for easier reviewing of next patch
- Support of cmnd==NULL in call to scsi_send_eh_cmnd()
- In the @sense_bytes case set transfer size to the minimum
size of sense_buffer and passed @sense_bytes. cmnd[4] is
set accordingly.
- REQUEST_SENSE is set into cmnd[0] so if @sense_bytes is
not Zero passed @cmnd should be NULL.
- Also save/restore resid of failed command.
- Adjust caller
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The ULD ->done callback moves into the scsi_driver. By moving the call
to scsi_io_completion() from scsi_blk_pc_done() to scsi_finish_command(),
we can eliminate the latter entirely. By returning 'good_bytes' from
the ->done callback (rather than invoking scsi_io_completion()), we can
stop exporting scsi_io_completion().
Also move the prototypes from sd.h to sd.c as they're all internal anyway.
Rename sd_rw_intr to sd_done and rw_intr to sr_done.
Inspired-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pid field is a duplicate of the serial_number field and has been
scheduled for removal for a long time. A few drivers were still using
it, so just change them to use serial_number instead.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current code prints:
scsi 13:0:4:0: scsi: Device offlined - not ready after error recovery
which is repetitively redundant. This patch changes that message to:
scsi 6:0:6:0: Device offlined - not ready after error recovery
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Every file should #include the headers containing the prototypes for its
global functions (in this case for scsi_schedule_eh()).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the block submission path correctly bounces, we can simply
use the command sense_buffer to send to retrieve sense information and
junk the unnecessary page allocation.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use the sysfs configurable timeout when issuing a START_UNIT
command from the scsi error handler. This is needed for devices which
take longer than thirty seconds to respond to the start
unit. The problem was observed when sending a start unit
to a disk array device in an ipr RAID adapter, which results
in the adapter firmware sending potentially multiple commands
to physical devices as a result of this command, which ended
up timing out sometimes. This patch does not change the default
value used for this command.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, the scsi error handler will issue a START_UNIT
command if the drive indicates it needs its motor started
and the allow_restart flag is set in the scsi_device. If,
after the scsi error handler invokes a host adapter reset
due to error recovery, a device is in a unit attention
state AND also needs a START_UNIT, that device will be placed
offline. The disk array devices on an ipr RAID adapter
will do exactly this when in a dual initiator configuration.
This patch adds a single retry to the EH initiated
START_UNIT.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Patch modified and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes a regression caused by commit:
2dc611de5a
The sense buffer code in scsi_send_eh_cmnd was changed to use
alloc_page() and a scatter list, but the sense data copy was not
updated to match so what we actually get in the sense buffer is total
grabage starting with the kernel address of the struct page we got.
Basically the stack frame of scsi_send_eh_cmd() is what ends up
in the sense buffer.
Depending upon how pointers look on a given platform, you can
end up getting sr_ioctl.c errors when you mount a cdrom. If
the CDROM gives a check condition for GPCMD_GET_CONFIGURATION issued
by drivers/cdrom/cdrom.c:cdrom_mmc_profile(), sr_ioctl will
spit out this error message in sr_do_ioctl() with the way pointers
are on sparc64:
default:
printk(KERN_ERR "%s: CDROM (ioctl) error, command: ", cd->cdi.name);
__scsi_print_command(cgc->cmd);
scsi_print_sense_hdr("sr", &sshdr);
err = -EIO;
This is the error Tom Callaway reported in:
http://marc.info/?l=linux-sparc&m=117407453208101&w=2
Anyways, fix this by using page_address(sgl.page) which is OK
because we know this is low-mem due to GFP_ATOMIC.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Christoph Hellwig <hch@lst.de>
It looks like megaraid_sas at least needs this to throttle its commands
as they begin to time out. The code keeps the existing transport
template use of eh_timed_out (and allows the transport to override the
host if they both have this callback).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If an EH command times out today, the LLDD's abort handler
will be called to abort the command. It is assumed that this
completes successfully, which can result in the command getting
completed later resulting in an oops. Improve the current
implementation by escalating all the way to host reset if
necessary in order to clean up the EH command.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
1) If the device reports an uncorrectable MEDIUM ERROR, such
as SK MEDIUM ERROR, ASC UNRECOVERED READ ERR, AMNF DATA
FIELD or RECORD NOT FOUND, then: In scsi_check_sense()
return SUCCESS so as to not retry -- the error is
uncorrectable -- this speeds up total processing time.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Extracted the MEDIUM ERROR piece and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export a couple of functions from scsi_error that are needed to handle
failed SCSI commands from the SAS EH.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
make exports GPL and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_send_eh_cmnd is the last user of non-sg commands currently.
This patch switches it to a one-element SG list. Also updates the
kerneldoc comment for scsi_send_eh_cmnd to reflect reality while we're
at it.
Test on my mptsas card, but this should get testing with as many
drivers as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The callers of scsi_send_eh_cmnd are setting the cmnd buffer,
and then scsi_send_eh_cmnd is copying that updated buffer to
the old_cmnd variable. Then after the command runs, we end up
copying that old_cmnd var which has the new cmnd to the scsi
command buffer. When this command gets recent, all types of fun
things happen like getting TUR or START_STOP commands with
data and scatterlists.
This patch made against scsi-rc-fixes, has the callers of
scsi_send_eh_cmnd pass in the command so scsi_send_eh_cmnd
can do the right thing. This should go into 2.6.18 since this
fixes a regression added when we removed some of the scsi_cmnd
fields and replaced them with local variables.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently struct scsi_cmnd has various fields that are used to backup
original data after the corresponding fields have been overridden for
EH commands. This means drivers can easily get at it and misuse it.
Due to the old_ naming this doesn't happen for most of them, but two
that have different names have been used wrong a lot (see previous
patch). Another downside is that they unessecarily bloat the scsi_cmnd
size.
This patch moves them onstack in scsi_send_eh_cmnd to fix those two
issues aswell as allowing future EH fixes like moving the EH command
submissions to use SG lists like everything else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The scsi midlayer portion of the patch
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The RQ_SCSI_* flags are a vestiage of a long past history. The EH code
still sets them but we never make use of that information. The other
users is pluto.c which never had a chance to work but needs to be kept
compiling to keep Davem happy, so copy over the definition there.
We could probably get rid of RQ_ACTIVE/RQ_INACTIVE aswell with some
work, there's only two more or less bogus looking uses in ubd and scsi.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
With Achim patch the last user (gdth) is switched away from scsi_request
so we an kill it now. Also disables some code in i2o_scsi that was
broken since the sg driver stopped using scsi_requests.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
libata implemented a feature to schedule EH without an associated EH
by manipulating shost->host_eh_scheduled in ata_scsi_schedule_eh()
directly. Move this function to scsi_error.c and rename it to
scsi_schedule_eh(). It is now an exported API for SCSI transports and
exported via new header file drivers/scsi/scsi_transport_api.h
This patch also de-export scsi_eh_wakeup() which was exported
specifically for ata_scsi_schedule_eh().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata needs to invoke EH without scmd. This patch adds
shost->host_eh_scheduled to implement such behavior.
Currently the only user of this feature is libata and no general
interface is defined. This patch simply adds handling for
host_eh_scheduled where needed and exports scsi_eh_wakeup() to
modules. The rest is upto libata. This is the result of the
following discussion.
http://thread.gmane.org/gmane.linux.scsi/23853/focus=9760
In short, SCSI host is not supposed to know about exceptions unrelated
to specific device or command. Such exceptions should be handled by
transport layer proper. However, the distinction is not essential to
ATA and libata is planning to depart from SCSI, so, for the time
being, libata will be using SCSI EH to handle such exceptions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This moves the eh_timed_out functionality from the scsi_host_template
to the transport_template. Given that this is now a transport function,
the EH_RESET_TIMER case no longer caps the timer reschedulings. The
transport guarantees that this is not an infinite condition.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix up an off by one error in calculating retries for scsi
commands. This bug was discovered when an SG_IO request
was sent to scsi core with retries = 0, causing the overall
timeout check to go off in scsi_softirq_done.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export two SCSI EH command handling functions. To be used by libata EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
When the scsi_execute_async interface was added it ended up reducing
the flexibility of userspace to send arbitrary scsi commands through
sg using SG_IO. The SG_IO interface allows userspace to specify the
CDB length. This is now ignored in scsi_execute_async and it is
guessed using the COMMAND_SIZE macro, which is not always correct,
particularly for vendor specific commands. This patch adds a cmd_len
parameter to the scsi_execute_async interface to allow the caller
to specify the length of the CDB.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This merge is pretty extensive. The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The eh_action semaphore in scsi_eh_send_command is cleared after a
command timeout. The command is subsequently aborted and the abort
will try to call scsi_done() on it. Unfortunately, the scsi_eh_done()
routine unconditinally completes the semaphore (which is now null).
Fix this race by makiong the scsi_eh_done() routine check that the
semaphore is non null before completing it (mirroring the ordinary
command done/timeout logic).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_send_eh_cmnd currently uses a semaphore and an overload of eh_timer
to either get a completion for a command for a timeout.
Switch to using a completion and wait_for_completion_timeout to simply
the code and not having to deal with the races ourselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
now that the abuse in qla2xxx is gone this field can be remove.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
adjust comments, remove a useless cast and remove a write-only variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Wrap a highly common idiom. Makes the code easier to read, helps pave
the way for sdev->{id,channel} removal, and adds a token that can easily
by grepped-for in the future.
There are a couple sdev_id() and scmd_printk() updates thrown in as well.
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This should eliminate (at least in the mid layer) to make numeric
assumptions about any of the enumeration variables. As a side effect,
it will also make all the messages consistent and line us up nicely for
the error logging strategy (if it ever shows itself again).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Found in the -rt patch set. The scsi_error thread likely will be in the
TASK_INTERRUPTIBLE state upon exit. This patch fixes this bug.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Alan Stern <stern@rowland.harvard.edu>
This patch (as561) fixes the error handler's thread-exit code. The
kthread_stop call won't wake the thread from a down_interruptible, so
the patch gets rid of the semaphore and simply does
set_current_state(TASK_INTERRUPTIBLE);
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Modified to simplify the termination loop and correct the sleep condition.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We fix the oops by enforcing the host state model. There have also
been two extra states added: SHOST_CANCEL_RECOVERY and
SHOST_DEL_RECOVERY so we can take the model through host removal while
the recovery thread is active.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The problem lies in the way the error handler uses TEST UNIT READY to
tell whether error recovery has succeeded. The scsi_eh_tur function
gives up after one round of retrying; after that it decides that more
error recovery is needed.
However TUR is liable to report sense data indicating a retry is needed
when in fact error recovery has succeeded. A typical example might be
SK=2, ASC=4, ASCQ=1 (Logical unit in process of becoming ready). The mere
fact that we were able to get a sensible reply to the TUR should indicate
that the device is working well enough to stop error recovery.
I ran across a case back in January where this happened. A CD-ROM drive
timed out the INQUIRY command, and a device reset fixed the blockage.
But then the drive kept responding with 2/4/1 -- because it was spinning
up I suppose -- until the error handler gave up and placed it offline.
If the initial INQUIRY had received the 2/4/1 instead, everything would
have worked okay. It doesn't seem reasonable for things to fail just
because the error handler had started running.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This one's slightly more difficult. The transport class uses
REQ_FAILFAST, so another interface (scsi_execute) had to be invented to
take the extra flag. Also, the sense functions are shifted around to
allow spi_execute to place data directly into a struct scsi_sense_hdr.
With this change, there's probably a lot of unnecessary sense buffer
allocation going on which we can fix later.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Migrate the current SCSI host state model to a model like SCSI
device is using.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We never look at it except for the old megaraid driver that abuses it
for sending internal commands. That usage can be fixed easily because
those internal commands are single-threaded by a mutex and we can easily
use a completion there.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Save and restore the scmd->result, so that timed out commands do not
return the result of the TEST UNIT READY or the start/stop commands. Code
is already in place to save and restore the result for the request sense
case.
The previous version of this patch erroneously removed the "if" check,
instead add a comment as to why the "if" is needed.
Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
'if' tests which check if eh_action isn't NULL in both
functions are always true. Remove the redundant if's as it
can give wrong impressions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_reset_provider() calls scsi_delete_timer() on exit which
isn't necessary. Remove it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Somebody forgot that | has higher priority than ?:. As the result,
allocation is done with bogus flags - instead of GFP_ATOMIC + possibly
GFP_DMA we always get GFP_DMA and no GFP_ATOMIC.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes scsi_send_eh_cmnd() use sdev and shost instead of
referencing them through scmd-> everytime.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have a DID_IMM_RETRY to require a retry at once, but we could do with
a DID_REQUEUE to instruct the mid-layer to treat this command in the
same manner as QUEUE_FULL or BUSY (i.e. halt the submission until
another command returns ... or the queue pressure builds if there are no
outstanding commands).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
anymore. All serial_number == serial_number_at_timeout tests
are always true in abort callbacks. Kill the field. Also, as
->pid always equals ->serial_number and ->serial_number
doesn't have any special meaning anymore, update comments
above ->serial_number accordingly. Once we remove all uses of
this field from all lldd's, this field should go.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_cmnd->internal_timeout field doesn't have any meaning
anymore. Kill the field.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!