Attached is a patch that should limit a possible recursion that can
lead to a stack overflow like the following:
Kernel stack overflow.
CPU: 3 Not tainted
Process zfcperp0.0.d819
(pid: 13897, task: 000000003e0d8cc8, ksp: 000000003499dbb8)
Krnl PSW : 0404000180000000 000000000030f8b2 (get_device+0x12/0x48)
Krnl GPRS: 00000000135a1980 000000000030f758 000000003ed6c1e8 0000000000000005
0000000000000000 000000000044a780 000000003dbf7000 0000000034e15800
000000003621c048 070000003499c108 000000003499c1a0 000000003ed6c000
0000000040895000 00000000408ab630 000000003499c0a0 000000003499c0a0
Krnl Code: a7 fb ff e8 a7 19 00 00 b9 02 00 22 e3 e0 f0 98 00 24 a7 84
Call Trace:
([<000000004089edc2>] scsi_request_fn+0x13e/0x650 [scsi_mod])
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
...
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089fa9e>] scsi_run_host_queues+0x196/0x230 [scsi_mod]
[<00000000409eba28>] zfcp_erp_thread+0x2638/0x3080 [zfcp]
[<0000000000107462>] kernel_thread_starter+0x6/0xc
[<000000000010745c>] kernel_thread_starter+0x0/0xc
<0>Kernel panic - not syncing: Corrupt kernel stack, can't continue.
This stack overflow occurred during tests on s390 using zfcp.
Recursion depth for this panic was 19.
Usually recursion between blk_run_queue and a request_fn is avoided
using QUEUE_FLAG_REENTER. But this does not help if the scsi stack
tries to flush the starved_list of a scsi_host.
Limit recursion depth when flushing the starved_list
of a scsi_host.
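A minimal user-space sketch of the idea, not the code from the patch: a
depth counter bounds how far the flush may re-enter itself, and any work
left over is finished by a shallower caller. All names here (struct host,
MAX_FLUSH_DEPTH, flush_starved) are hypothetical, and the limit of 4 is an
arbitrary value chosen for illustration.

```c
#include <assert.h>

#define MAX_FLUSH_DEPTH 4   /* hypothetical limit, not the value used by the patch */

struct host {
    int starved;    /* number of starved devices still waiting */
    int depth;      /* current nesting level of flush_starved() */
    int max_seen;   /* deepest nesting observed, kept for illustration */
};

/* Each flushed device may trigger another flush; refuse to nest past the limit. */
static void flush_starved(struct host *h)
{
    assert(h->depth >= 0);
    if (h->depth >= MAX_FLUSH_DEPTH)
        return;                 /* leave the rest to a shallower caller */

    h->depth++;
    if (h->depth > h->max_seen)
        h->max_seen = h->depth;

    while (h->starved > 0) {
        h->starved--;
        flush_starved(h);       /* models request_fn re-entering the flush */
    }
    h->depth--;
}
```

With 19 starved devices (the recursion depth seen in the panic above), the
sketch still drains the whole list but never nests deeper than the limit.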
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently struct scsi_cmnd has various fields that are used to back up
original data after the corresponding fields have been overridden for
EH commands. This means drivers can easily get at it and misuse it.
Due to the old_ naming this doesn't happen for most of them, but two
that have different names have been misused a lot (see previous
patch). Another downside is that they unnecessarily bloat the scsi_cmnd
size.
This patch moves them onto the stack in scsi_send_eh_cmnd to fix those two
issues, as well as allowing future EH fixes like moving the EH command
submissions to use SG lists like everything else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There was a logic fault in scsi_io_completion() where zero transfer
commands that complete successfully were sent to the block layer as
not up to date. This patch removes the if (good_bytes > 0) gate
around the successful completion, since zero transfer commands do have
good_bytes == 0.
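The logic change can be sketched as two small predicates. This is an
illustration under the assumption that "up to date" is a simple boolean
decision; the function names are hypothetical and do not appear in
scsi_io_completion().

```c
#include <assert.h>
#include <stdbool.h>

/* Old gate: a successful command counted as up to date only if it moved data. */
static bool uptodate_old(bool success, unsigned int good_bytes)
{
    return success && good_bytes > 0;
}

/* New behaviour: success alone decides; good_bytes may legitimately be 0. */
static bool uptodate_new(bool success, unsigned int good_bytes)
{
    (void)good_bytes;
    return success;
}
```

For a successful zero-transfer command the old predicate reports "not up to
date" while the new one reports success; for commands that moved data the
two agree.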
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If a device gets offlined as a result of the Inquiry sent
during scanning, the following oops can occur. After the
disk gets put into the SDEV_OFFLINE state, the error handler
sends back the failed inquiry, which wakes the thread doing
the scan. This starts a race between the scanning thread
freeing the scsi device and the error handler calling
scsi_run_host_queues to restart the host. Since the disk
is in the SDEV_OFFLINE state, scsi_device_get will still
work, which results in __scsi_iterate_devices getting
a reference to the scsi disk when it shouldn't.
The following execution thread causes the oops:
CPU 0 (scan)                          CPU 1 (eh)
---------------------------------------------------------
scsi_probe_and_add_lun
....
                                      scsi_eh_offline_sdevs
                                      scsi_eh_flush_done_q
scsi_destroy_sdev
scsi_device_dev_release
                                      scsi_restart_operations
                                      scsi_run_host_queues
                                      __scsi_iterate_devices
                                      get_device
scsi_device_dev_release_usercontext
                                      scsi_run_queue
                                      <---OOPS--->
The patch fixes this by changing the state of the sdev to SDEV_DEL
before doing the final put_device, which should prevent the race
from occurring.
Original oops follows:
Badness in kref_get at lib/kref.c:32
Call Trace:
[C00000002F4476D0] [C00000000000EE20] .show_stack+0x68/0x1b0 (unreliable)
[C00000002F447770] [C00000000037515C] .program_check_exception+0x1cc/0x5a8
[C00000002F447840] [C00000000000446C] program_check_common+0xec/0x100
Exception: 700 at .kref_get+0x10/0x28
LR = .kobject_get+0x20/0x3c
[C00000002F447B30] [C00000002F447BC0] 0xc00000002f447bc0 (unreliable)
[C00000002F447BB0] [C000000000254BDC] .get_device+0x20/0x3c
[C00000002F447C30] [D000000000063188] .scsi_device_get+0x34/0xdc [scsi_mod]
[C00000002F447CC0] [D0000000000633EC] .__scsi_iterate_devices+0x50/0xbc [scsi_mod]
[C00000002F447D60] [D00000000006A910] .scsi_run_host_queues+0x34/0x5c [scsi_mod]
[C00000002F447DF0] [D000000000069054] .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[C00000002F447EE0] [C00000000007B4E0] .kthread+0x128/0x178
[C00000002F447F90] [C000000000025E84] .kernel_thread+0x4c/0x68
Unable to handle kernel paging request for <7>PCI: Enabling device: (0002:41:01.1), cmd 143
data at address 0x000001b8
Faulting instruction address: 0xd0000000000698e4
sym1: <1010-66> rev 0x1 at pci 0002:41:01.1 irq 216
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi2 : sym-2.2.2
cpu 0x0: Vector: 300 (Data Access) at [c00000002f447a30]
pc: d0000000000698e4: .scsi_run_queue+0x2c/0x218 [scsi_mod]
lr: d00000000006a904: .scsi_run_host_queues+0x28/0x5c [scsi_mod]
sp: c00000002f447cb0
msr: 9000000000009032
dar: 1b8
dsisr: 40000000
current = 0xc0000000045fecd0
paca = 0xc00000000048ee80
pid = 1123, comm = scsi_eh_1
enter ? for help
[c00000002f447d60] d00000000006a904 .scsi_run_host_queues+0x28/0x5c [scsi_mod]
[c00000002f447df0] d000000000069054 .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[c00000002f447ee0] c00000000007b4e0 .kthread+0x128/0x178
[c00000002f447f90] c000000000025e84 .kernel_thread+0x4c/0x68
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have to be able to remove SCSI devices even when they are suspended, so
QUIESCE -> CANCEL must be a legal state transition. This patch (as727)
adds the transition to the state machine.
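As a rough illustration of what "adding a transition" means, here is a
reduced transition check. The state set is deliberately cut down and the
names (enum dev_state, transition_allowed) are hypothetical; the real
state machine has more states and more legal transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, illustrative state set; the real enum has more members. */
enum dev_state { DEV_RUNNING, DEV_QUIESCE, DEV_CANCEL, DEV_DEL };

/* Returns true if moving from 'from' to 'to' is legal in this sketch. */
static bool transition_allowed(enum dev_state from, enum dev_state to)
{
    switch (to) {
    case DEV_QUIESCE:
        return from == DEV_RUNNING;
    case DEV_RUNNING:
        return from == DEV_QUIESCE;
    case DEV_CANCEL:
        /* QUIESCE is the newly permitted source state. */
        return from == DEV_RUNNING || from == DEV_QUIESCE;
    case DEV_DEL:
        return from == DEV_CANCEL;
    }
    return false;
}
```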
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch simplifies "good_bytes" computation in sd_rw_intr().
sd: "good_bytes" computation is always done in terms of the resolution
of the device's medium, since after that it is the number of good bytes
we pass around and other layers/contexts (as opposed to sd) can translate
that to their own resolution (block layer:512). It also makes
scsi_io_completion() processing more straightforward, eliminating the
3rd argument to the function.
It also fixes a couple of bugs like not checking return value,
using "break" instead of "return;", etc.
I've been running with this patch for some time now on a
test (do-it-all) system.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
With Achim's patch the last user (gdth) is switched away from scsi_request
so we can kill it now. Also disables some code in i2o_scsi that was
broken since the sg driver stopped using scsi_requests.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove
duplicates of the macro.
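For reference, a stand-alone sketch of the two spellings. The macro is
redefined locally so the example compiles outside the kernel, and the
array and helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel macro, redefined here so the sketch stands alone. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

static const int pool_sizes[] = { 8, 16, 32, 64, 128 };

/* Open-coded element count, as found before the cleanup. */
static size_t count_open_coded(void)
{
    return sizeof(pool_sizes) / sizeof(pool_sizes[0]);
}

/* Same computation expressed through the macro. */
static size_t count_with_macro(void)
{
    return ARRAY_SIZE(pool_sizes);
}
```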
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The calculation of nr_pages in scsi_req_map_sg() doesn't account for
the fact that the first page could have an offset that pushes the end
of the buffer onto a new page.
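The arithmetic can be shown with a small sketch. It assumes 4 KiB pages and
an offset measured from the start of the first page; the helper names are
hypothetical and this is not the code from scsi_req_map_sg().

```c
#include <assert.h>

#define PAGE_SHIFT 12                   /* assumes 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Ignores the starting offset, so it can undercount by one page. */
static unsigned long nr_pages_naive(unsigned long len)
{
    return (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Counts the pages touched by the byte range [offset, offset + len). */
static unsigned long nr_pages_with_offset(unsigned long offset, unsigned long len)
{
    return (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

A 200-byte buffer starting at offset 4000 ends at byte 4200, which lies on
the second page: the naive count gives 1, the offset-aware count gives 2.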
Signed-off-by: Bryan Holty <lgeek@frontiernet.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
libata needs to invoke EH without scmd. This patch adds
shost->host_eh_scheduled to implement such behavior.
Currently the only user of this feature is libata and no general
interface is defined. This patch simply adds handling for
host_eh_scheduled where needed and exports scsi_eh_wakeup() to
modules. The rest is up to libata. This is the result of the
following discussion.
http://thread.gmane.org/gmane.linux.scsi/23853/focus=9760
In short, SCSI host is not supposed to know about exceptions unrelated
to specific device or command. Such exceptions should be handled by
transport layer proper. However, the distinction is not essential to
ATA and libata is planning to depart from SCSI, so, for the time
being, libata will be using SCSI EH to handle such exceptions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Some Pioneer DVDs are apparently returning odd "not ready" status
codes that the mid-layer doesn't recognise and so passes back to the
user as errors.
This patch overhauls our not-ready handling and adds transparent retries for:
format in progress
rebuild in progress
recalculation in progress
operation in progress
Long write in progress
self test in progress
The Pioneer was actually returning "long write in progress"
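A sketch of how such a classification might look, assuming the standard
SPC additional sense code assignments for "logical unit not ready"
(ASC 0x04) and its qualifiers. The function name is hypothetical and the
mid-layer's actual handling may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

#define ASC_LUN_NOT_READY 0x04

/* Returns true for "not ready" conditions that should clear up by themselves. */
static bool not_ready_is_transient(unsigned char asc, unsigned char ascq)
{
    if (asc != ASC_LUN_NOT_READY)
        return false;

    switch (ascq) {
    case 0x04: /* format in progress */
    case 0x05: /* rebuild in progress */
    case 0x06: /* recalculation in progress */
    case 0x07: /* operation in progress */
    case 0x08: /* long write in progress */
    case 0x09: /* self-test in progress */
        return true;
    default:
        return false;
    }
}
```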
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/scsi/scsi_lib.c: In function `scsi_kmap_atomic_sg':
drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 3)
drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 4)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current dc395x driver uses PIO to transfer up to 4 bytes which do not
get transferred by DMA (under unclear circumstances). For this the driver
uses page_address() which is broken on highmem. Apart from this the
actual calculation of the virtual address is wrong (even without highmem).
So, e.g., for reading it reads bytes from the driver to a wrong address
and returns wrong data; I guess that for writing it would just output random
data to the device.
The proper fix, as suggested by many, is to dynamically map data using
kmap_atomic(page, KM_BIO_SRC_IRQ) / kunmap_atomic(virt). The reason why it
has not been done until now, although I did some preliminary patches
more than a year ago, is that nobody interested in fixing this problem was
able to reliably reproduce it. That has now changed: with the help of
Sebastian Frei (CC'ed) I was able to trigger the PIO path. Thus, I was
also able to test and debug it.
There are 4 cases when PIO is used in dc395x - data-in / -out with and
without scatter-gather. I was able to reproduce and test only data-in with
and without SG. So, the data-out path is still untested, but it is also
somewhat simpler than the data-in. Fredrik Roubert (also CC'ed) also had
PIO triggering on his system, and in his case it was data-out without SG.
It would be great if he could test the attached patch on his system, but
even if he cannot, I would still request to apply the patch and just wait
if anybody cries...
Implementation: I put 2 new functions in scsi_lib.c and their declarations
in scsi_cmnd.h. I exported them without _GPL, although, I don't feel
strongly about that - not many drivers are likely to use them. But there
is at least one more - I want to use them in tmscsim.c. Whether these are
the right files for the functions and their declarations - not sure
either. Actually, they are not scsi-specific, so, might go somewhere
around other scattergather magic? They are not platform specific either,
and most SG functions are defined under arch/*/... When these issues were
discussed previously, some more routines were suggested to manipulate
scattergather buffers; I think some of them were needed around
crypto code... So, might a common place be reasonable, like
lib/scattergather.c? I am open here.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_kill_request() completes requests via normal SCSI completion path
which decrements busy counts; however, requests which get passed to
scsi_kill_request() aren't holding busy counts and scsi_kill_request()
doesn't increment them before invoking completion path resulting in
incorrect busy counts. Bump up busy counts before invoking completion
path.
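The accounting problem can be modelled with a single counter. This is a
deliberately simplified sketch with hypothetical names; the real code
tracks busy counts on more than one object.

```c
#include <assert.h>

struct host { int busy; };

/* Normal completion path: every completed request gives back one busy count. */
static void complete_request(struct host *h)
{
    h->busy--;
}

/* Killing a request that never took a busy count, without compensation. */
static void kill_request_unbalanced(struct host *h)
{
    complete_request(h);
}

/* Take the count first so the completion path's decrement balances out. */
static void kill_request_balanced(struct host *h)
{
    h->busy++;
    complete_request(h);
}
```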
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order to use the new execute_in_process_context() API, you have to
provide it with the work storage, which I do in SCSI in scsi_device and
scsi_target, but which also means that we can no longer queue up the
target reaps, so instead I moved the target to a state model which
allows target_alloc to detect if we've received a dying target and wait
for it to be gone. Hopefully, this should also solve the target
namespace race.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Regardless of what mode page was asked for, Initio INIC-14x0 and
INIC-2430 always return page 6 without mode page headers. Try to
recognise this as a special case in scsi_mode_sense and set the
mode sense headers accordingly.
Signed-off-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the core SCSI code to use kzalloc rather than kmalloc+memset
where possible.
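A user-space analogue of the conversion, with calloc() standing in for
kzalloc() and malloc() for kmalloc(). The struct and function names are
made up for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cmd {
    int result;
    unsigned char cdb[16];
};

/* Two-step pattern being replaced: allocate, then clear. */
static struct cmd *alloc_two_step(void)
{
    struct cmd *c = malloc(sizeof(*c));

    if (c)
        memset(c, 0, sizeof(*c));
    return c;
}

/* Single zeroing allocation; calloc() plays the role of kzalloc() here. */
static struct cmd *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct cmd));
}
```

Both helpers return the same all-zero object; the second form is shorter
and cannot forget the memset.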
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix up an off by one error in calculating retries for scsi
commands. This bug was discovered when an SG_IO request
was sent to scsi core with retries = 0, causing the overall
timeout check to go off in scsi_softirq_done.
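One way such an off-by-one can arise is sketched below. This is an
assumption about the shape of the bug, not a copy of the patch: if the
overall time budget is computed as retries times the per-command timeout,
a command with zero retries gets no time at all, even though it is still
entitled to its first attempt. The function names are hypothetical.

```c
#include <assert.h>

/* 'allowed' is the number of retries permitted after the first attempt. */

/* Off by one: forgets the initial attempt, so allowed == 0 yields a zero budget. */
static unsigned long time_budget_old(unsigned int allowed, unsigned long timeout)
{
    return allowed * timeout;
}

/* Counts the initial attempt plus every allowed retry. */
static unsigned long time_budget_new(unsigned int allowed, unsigned long timeout)
{
    return (allowed + 1) * timeout;
}
```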
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have several points in the SCSI stack (primarily for our device
functions) where we need to guarantee process context, but (given the
place where the last reference was released) we cannot guarantee this.
This API gets around the issue by executing the function directly if
the caller has process context, but scheduling a workqueue to execute
in process context if the caller doesn't have it. Unfortunately, it
requires memory allocation in interrupt context, but it's better than
what we had previously. The true solution will require a bit of
re-engineering, so isn't appropriate for 2.6.16.
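The control flow described above can be modelled in user space as
follows. This is only a sketch: the context test is a plain boolean
argument, a fixed array stands in for the workqueue, and all names
(run_in_process_context, drain_pending, bump) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void *data);

struct deferred {
    work_fn fn;
    void *data;
};

static struct deferred pending[8];      /* stand-in for a workqueue */
static size_t npending;

/* Returns 0 if fn ran immediately, 1 if it was queued, -1 if the queue is full. */
static int run_in_process_context(bool in_process_ctx, work_fn fn, void *data)
{
    if (in_process_ctx) {
        fn(data);
        return 0;
    }
    if (npending == sizeof(pending) / sizeof(pending[0]))
        return -1;
    pending[npending].fn = fn;
    pending[npending].data = data;
    npending++;
    return 1;
}

/* Called later from a context that is allowed to sleep. */
static void drain_pending(void)
{
    for (size_t i = 0; i < npending; i++)
        pending[i].fn(pending[i].data);
    npending = 0;
}

/* Example work item: increments the int that data points to. */
static void bump(void *data)
{
    ++*(int *)data;
}
```

The sketch preallocates its queue slots, so it sidesteps the memory
allocation in interrupt context that the text mentions as a drawback.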
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When the scsi_execute_async interface was added it ended up reducing
the flexibility of userspace to send arbitrary scsi commands through
sg using SG_IO. The SG_IO interface allows userspace to specify the
CDB length. This is now ignored in scsi_execute_async and it is
guessed using the COMMAND_SIZE macro, which is not always correct,
particularly for vendor specific commands. This patch adds a cmd_len
parameter to the scsi_execute_async interface to allow the caller
to specify the length of the CDB.
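To see why guessing fails, here is a sketch of an opcode-group lookup in
the style of COMMAND_SIZE. The table values are what I understand the
conventional group lengths to be; treat them, and the helper names, as
assumptions. The fallback in pick_cdb_len() is my own illustration, not
behaviour described by the patch.

```c
#include <assert.h>

/*
 * CDB length by opcode group (top three opcode bits). Groups 6 and 7 are
 * vendor specific, so their entries are only guesses.
 */
static const unsigned char cdb_len_by_group[8] = { 6, 10, 10, 12, 16, 12, 10, 10 };

/* Guess the CDB length from the opcode alone. */
static unsigned int guess_cdb_len(unsigned char opcode)
{
    return cdb_len_by_group[(opcode >> 5) & 7];
}

/* Prefer the caller's length; fall back to the guess only when none is given. */
static unsigned int pick_cdb_len(unsigned char opcode, unsigned int caller_len)
{
    return caller_len ? caller_len : guess_cdb_len(opcode);
}
```

A vendor-specific opcode such as 0xC1 is guessed as 10 bytes regardless of
its real length, which is why the caller needs a way to pass cmd_len.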
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
LLDDs should never see REQ_BLOCK_PC requests, we can handle them just
fine in the core code. There is a small behaviour change in that some
checks in sr's rw_intr are bypassed, but I consider the old behaviour
a bug.
Mike found this cleanup opportunity and provided early patches, so all
the credit goes to him, even if I redid the patches from scratch because
that was easier than forward-porting the old patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch moves the SCSI softirq handling to the block layer version.
There should be no functional changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
All ordered request related stuff is delegated to HLD. Midlayer
now doesn't deal with ordered setting or prepare_flush
callback. sd.c updated to deal with blk_queue_ordered
setting. Currently, ordered tag isn't used as SCSI midlayer
cannot guarantee request ordering.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
add @uptodate argument to end_that_request_last() and @error
to rq_end_io_fn(). there's no generic way to pass error code
to request completion function, making generic error handling
of non-fs request difficult (rq->errors is driver-specific and
each driver uses it differently). this patch adds @uptodate
to end_that_request_last() and @error to rq_end_io_fn().
for fs requests, this doesn't really matter, so just using the
same uptodate argument used in the last call to
end_that_request_first() should suffice. imho, this can also
help the generic command-carrying request jens is working on.
Signed-off-by: tejun heo <htejun@gmail.com>
Signed-Off-By: Jens Axboe <axboe@suse.de>
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- separate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if an LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to an artificially low
value to overcome memory and feedback issues.
Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add kmemcache of scsi io contexts.
In the future when we finalize on where these functions will live
we can add a mempool for it and do a bioset for our REQ_BLOCK_PC
bios. This is needed because the dm-multipath handlers will
want to use the scsi_execute* functions for failover and we cannot
have them and the bio device allocating from the same mempool.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sd does not allow scsi_io_completion to retry commands for
SG_IO requests, and it makes sense that it should not happen for st
SG_IO commands too. If for st we hit the bottom of scsi_io_completion
we will probably screw things up pretty bad. This patch returns to the
block layer that the whole command completed and relies on the caller to check
the request errors field. For initialization commands like in sd, this adds
the previous behavior where scsi_io_completion did not process the error.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For tape we need to control the retries. This patch adds a retries
counter on the request for REQ_BLOCK_PC commands originating from
scsi_execute* to use. REQ_BLOCK_PC commands coming from the block
layer SG_IO path continue to use the retries set in the ULD init_command.
(scsi_execute* does not set the gendisk so we do not execute
the init_command in that path).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This reverts commit 1b0997f561, which in
turn reverted 34ea80ec6a (which is thus
re-instated).
Quoth James Bottomley:
"All it's doing is deferring the device_put() from the
scsi_put_command() to after the scsi_run_queue(), which doesn't fix
the sleep while atomic problem of the device release method. In both
cases we still get the semaphore in atomic context problem which is
caused by scsi_reap_target() doing a device_del(), which I assumed
(wrongly) was valid from atomic context."
who also promised to fix scsi_reap_target().
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 34ea80ec6a.
It does a put_device() from softirq context, which is bad since it gets
a semaphore for reading.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem is that scsi_run_queue is called from scsi_next_command()
after doing a scsi_put_command. If the command was the only thing
holding the reference on the scsi_device then the resulting device put
will tear down the block queue. Fix this by taking a reference to the
device and holding it around scsi_run_queue()
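A toy reference-count model of the fix, with hypothetical names. The
"released" flag stands in for the device release tearing down the block
queue; nothing here is taken from the actual scsi_next_command() code.

```c
#include <assert.h>
#include <stdbool.h>

struct device {
    int refs;
    bool released;
};

static void dev_get(struct device *d)
{
    assert(!d->released);
    d->refs++;
}

static void dev_put(struct device *d)
{
    assert(d->refs > 0);
    if (--d->refs == 0)
        d->released = true;     /* the real release would tear down the queue */
}

/* Returns true if the queue was still alive at the point where it is run. */
static bool next_command(struct device *d, bool hold_ref)
{
    bool alive;

    if (hold_ref)
        dev_get(d);             /* extra reference held across the queue run */
    dev_put(d);                 /* models the command dropping its reference */
    alive = !d->released;       /* models the queue run touching the queue */
    if (hold_ref)
        dev_put(d);
    return alive;
}
```

Without the extra reference the last put happens before the queue is run;
with it, the release is deferred until after the run.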
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This function has been superseded by the block request based interfaces
and is unused (except for the uncompilable cpqfc driver).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This should eliminate (at least in the mid layer) the need to make numeric
assumptions about any of the enumeration variables. As a side effect,
it will also make all the messages consistent and line us up nicely for
the error logging strategy (if it ever shows itself again).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When a request is deferred in scsi_init_io because the sg table could not
be allocated, the associated scsi_cmnd is not released and the request is
not marked with REQ_DONTPREP. When the command is retried, if
scsi_prep_fn decides to kill it then the scsi_cmnd will never be released.
This patch (as573) changes scsi_init_io so that it calls scsi_put_command
before deferring a request.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We fix the oops by enforcing the host state model. There have also
been two extra states added: SHOST_CANCEL_RECOVERY and
SHOST_DEL_RECOVERY so we can take the model through host removal while
the recovery thread is active.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I found one other thing that needs to be fixed. The call to
scsi_release_buffers in scsi_unprep_request causes an oops, because the
sgtable has already been freed in scsi_io_completion. The following patch
is needed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On Wed, 2005-09-14 at 18:06 +1000, Anton Blanchard wrote:
> And in particular it looks like the scsi_unprep_request in
> scsi_queue_insert is causing it. The following patch fixes the boot
> problems on the vscsi machine:
OK, my fault. Your fix is almost correct .. I was going to do this
eventually, honest, because there's no need to unprep and reprep a
command that comes in through scsi_queue_insert().
However, I decided to leave it in to exercise the scsi_unprep_request()
path just to make sure it was working. What's happening, I think, is
that we also use this path for retries. Since we kill and reget the
command each time, the retries decrement is never seen, so we're
retrying forever.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes an issue in scsi command initialization from a request
where sd, sr, st, and scsi_lib all fail to copy the request's
cmd_len to the scsi command's cmd_len field.
Signed-off-by: Timothy Thelin <timothy.thelin@wdc.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
set DID_NO_CONNECT for the BLKPREP_KILL case and correct a few
BLKPREP_DEFER cases that weren't checking for the need to plug the
queue.
Signed-Off-By: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Actually, just one problem and one cosmetic fix:
1) We need to dequeue for the loop and kill case (it seems easiest
simply to dequeue in the scsi_kill_request() routine)
2) There's no real need to drop the queue lock. __scsi_done() is lock
agnostic, so since there's no requirement, let's just leave it in to
avoid any locking issues.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: Alan Stern <stern@rowland.harvard.edu>
This patch (as559b) adds a new routine, scsi_unprep_request, which
gets called every place a request is requeued. (That includes
scsi_queue_insert as well as scsi_requeue_command.) It also changes
scsi_kill_requests to make it call __scsi_done with result equal to
DID_NO_CONNECT << 16. (I'm not sure if it's necessary to call
scsi_init_cmd_errh here; maybe you can check on that.) Finally, the
patch changes the return value from scsi_end_request, to avoid
returning a stale pointer in the case where the request was requeued.
Fortunately the return value is used in only one place, and the change
actually simplified it.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If a filesystem, while writing out data, decides that it is good
to issue a cache flush on a SCSI drive (or other 'sd' device), it will
call blkdev_issue_flush which calls ->issue_flush_fn which is
scsi_issue_flush_fn.
This calls sd_issue_flush which calls sd_sync_cache, which calls
scsi_execute_request.
This will (as sshdr != NULL) call
kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL)
If memory is tight, the presence of GFP_KERNEL may cause write
requests to be sent to some filesystem to free up memory, however if
that filesystem is waiting for the issue_flush_fn to complete, you
could get a deadlock.
I wonder if it might be more appropriate to use GFP_NOIO as in the
following patch.
I wonder if it might be even more appropriate to cope better with a
kmalloc failure, especially as in this use, sd_sync_cache only will
use the sense information to print out a more informative error
message.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_io_completion() can be a bit noisy about certain conditions.
Previously this wasn't a problem for internally generated commands,
since they never hit it. However, since we do all SCSI commands via
bios, now they do. User CD testers like magicdev are now getting not
ready messages every time they touch the CD to see if there's anything
in it.
Fix this by making all scsi_execute commands REQ_QUIET and making
scsi_finish_io() not say anything for REQ_QUIET.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The new bio code was incorrectly converted from stack allocated to
kmalloc'd buffer handling. There are two places where it incorrectly
uses sizeof(*sense) to get the size of the sense buffer. This
actually produces one, so no sense data was ever getting back, causing
failure in things like disk spin up.
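The pitfall is easy to reproduce in isolation. In this sketch the buffer
size of 96 and the function names are illustrative assumptions, and
memcpy() is used as a stand-in for whatever operation the original code
applied with the wrong size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SENSE_BUFFER_SIZE 96    /* illustrative size for this sketch */

/* Buggy: sizeof(*dst) is sizeof(unsigned char), i.e. 1, not the buffer size. */
static size_t copy_sense_buggy(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, sizeof(*dst));
    return sizeof(*dst);
}

/* Fixed: a heap buffer carries no size information, so pass it explicitly. */
static size_t copy_sense_fixed(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, SENSE_BUFFER_SIZE);
    return SENSE_BUFFER_SIZE;
}
```

With a stack array, sizeof(array) gives the full size, which is why the
expression worked before the buffer was moved to the heap.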
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Older gcc's require variable definitions at the beginning of a block.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This one removes struct scsi_request entirely from sd. In the process,
I noticed we have no callers of scsi_wait_req who don't immediately
normalise the sense, so I updated the API to make it take a struct
scsi_sense_hdr instead of simply a big sense buffer.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This one's slightly more difficult. The transport class uses
REQ_FAILFAST, so another interface (scsi_execute) had to be invented to
take the extra flag. Also, the sense functions are shifted around to
allow spi_execute to place data directly into a struct scsi_sense_hdr.
With this change, there's probably a lot of unnecessary sense buffer
allocation going on which we can fix later.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
After this, we just have some drivers, all the ULDs and the SPI
transport class using scsi_wait_req().
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Original From: Mike Christie <michaelc@cs.wisc.edu>
Add scsi_execute_req() as a replacement for scsi_wait_req()
Fixed up various pieces (added REQ_SPECIAL and caught req use after
free)
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Here's the proof of concept for this one. It converts scsi_wait_req to
do correct REQ_BLOCK_PC submission (and works nicely in my setup).
The final goal should be to eliminate struct scsi_request, but that
can't be done until the character submission paths of sg and st are also
modified.
There's some loss of functionality to this: retries are no longer
controllable (except by setting REQ_FAILFAST) and the wait_req API needs
to be altered, but it looks very nice.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Migrate the current SCSI host state model to a model like SCSI
device is using.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_init_io calls scsi_alloc_sgtable and then calls blk_rq_map_sg
to initialize the scatterlist structure. blk_rq_map_sg() already
memsets the structure for every new segment. That makes the memset
in scsi_alloc_sgtable unnecessary.
Patch to delete the extra memset in scsi_alloc_sgtable. Tested on
a x86_64 machine. Looks stable to me.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The check in
627 BUG_ON(index > SG_MEMPOOL_NR);
with SG_MEMPOOL_NR defined in
32 #define SG_MEMPOOL_NR (sizeof(scsi_sg_pools)/sizeof(struct scsi_host_sg_pool))
was not sufficient.
sgp, set in
629 sgp = scsi_sg_pools + index;
is dereferenced in
630 mempool_free(sgl, sgp->pool);
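The gap in the check can be shown with a stand-alone sketch. The pool
table and names below are hypothetical, and the strict variant is my
reading of what a sufficient check looks like rather than a quote from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sg_pool { size_t size; };

static struct sg_pool pools[] = { {8}, {16}, {32}, {64}, {128} };

#define POOL_NR (sizeof(pools) / sizeof(pools[0]))

/* Original-style check: lets index == POOL_NR through, one past the end. */
static bool index_ok_loose(size_t index)
{
    return !(index > POOL_NR);
}

/* Tightened check: valid indexes are 0 .. POOL_NR - 1. */
static bool index_ok_strict(size_t index)
{
    return index < POOL_NR;
}
```

An index equal to the element count passes the loose check even though
pools + index then points past the last element.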
Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We never look at it except for the old megaraid driver that abuses it
for sending internal commands. That usage can be fixed easily because
those internal commands are single-threaded by a mutex and we can easily
use a completion there.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
scsi_queue_insert() has four callers. Three callers call with
timer disabled and one (the second invocation in
scsi_dispatch_cmd()) calls with timer activated.
scsi_queue_insert() used to always call scsi_delete_timer()
and ignore the return value. This results in race with timer
expiration. Remove scsi_delete_timer() call from
scsi_queue_insert() and make the caller delete timer and check
the return value.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_queue_insert() used to use blk_insert_request() for requeueing
requests. This depends on the unobvious behavior of
blk_insert_request() setting REQ_SPECIAL and REQ_SOFTBARRIER when
requeueing. This patch makes scsi_queue_insert() use
blk_requeue_request(). As REQ_SPECIAL means special requests and
REQ_SOFTBARRIER is automatically handled by blk layer now, no flag
needs to be set.
Note that scsi_queue_insert() now calls scsi_run_queue() itself, and
the prototype of the function is added right above
scsi_queue_insert(). This is temporary, as later requeue path
consolidation patchset removes scsi_queue_insert(). By adding
temporary prototype, we can do away with unnecessarily moving
functions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_requeue_request() used to use blk_insert_request() for requeueing
requests. This depends on the unobvious behavior of
blk_insert_request() setting REQ_SPECIAL and REQ_SOFTBARRIER when
requeueing. This patch makes scsi_requeue_request() use
blk_requeue_request(). As REQ_SPECIAL means special requests and
REQ_SOFTBARRIER is automatically handled by blk layer now, no flag
needs to be set.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
blk_insert_request() has an unobvious feature of requeuing a
request setting REQ_SPECIAL|REQ_SOFTBARRIER. SCSI midlayer
was the only user and as previous patches removed the usage,
remove the feature from blk_insert_request(). Only special
requests should be queued with blk_insert_request(). All
requeueing should go through blk_requeue_request().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_init_io() used to set REQ_SPECIAL when it fails sg
allocation before requeueing the request by returning
BLKPREP_DEFER. REQ_SPECIAL is being updated to mean special
requests. So, remove REQ_SPECIAL setting.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
anymore. All serial_number == serial_number_at_timeout tests
are always true in abort callbacks. Kill the field. Also, as
->pid always equals ->serial_number and ->serial_number
doesn't have any special meaning anymore, update comments
above ->serial_number accordingly. Once we remove all uses of
this field from all lldd's, this field should go.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_cmnd->internal_timeout field doesn't have any meaning
anymore. Kill the field.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current problem seen is that the queue lock is actually in the
SCSI device structure, so when that structure is freed on device
release, we go boom if the queue tries to access the lock again.
The fix here is to move the lock from the scsi_device to the queue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!