Pull core block updates from Jens Axboe:
"This first core part of the block IO changes contains:
- Cleanup of the bio IO error signaling from Christoph. We used to
rely on the uptodate bit and passing around of an error, now we
store the error in the bio itself.
- Improvement of the above from myself, by shrinking the bio size
down again to fit in two cachelines on x86-64.
- Revert of the max_hw_sectors cap removal from a revision again,
from Jeff Moyer. This caused performance regressions in various
tests. Reinstate the limit, bump it to a more reasonable size
instead.
- Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
Most devices have huge trim limits, which can cause nasty latencies
when deleting files. Enable the admin to configure the size down.
We will look into having a more sane default instead of UINT_MAX
sectors.
- Improvement of the SGP gaps logic from Keith Busch.
- Enable the block core to handle arbitrarily sized bios, which
enables a nice simplification of bio_add_page() (which is an IO hot
path). From Kent.
- Improvements to the partition io stats accounting, making it
faster. From Ming Lei.
- Also from Ming Lei, a basic fixup for overflow of the sysfs pending
file in blk-mq, as well as a fix for a blk-mq timeout race
condition.
- Ming Lin has been carrying Kents above mentioned patches forward
for a while, and testing them. Ming also did a few fixes around
that.
- Sasha Levin found and fixed a use-after-free problem introduced by
the bio->bi_error changes from Christoph.
- Small blk cgroup cleanup from Viresh Kumar"
* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
blk: Fix bio_io_vec index when checking bvec gaps
block: Replace SG_GAPS with new queue limits mask
block: bump BLK_DEF_MAX_SECTORS to 2560
Revert "block: remove artifical max_hw_sectors cap"
blk-mq: fix race between timeout and freeing request
blk-mq: fix buffer overflow when reading sysfs file of 'pending'
Documentation: update notes in biovecs about arbitrarily sized bios
block: remove bio_get_nr_vecs()
fs: use helper bio_add_page() instead of open coding on bi_io_vec
block: kill merge_bvec_fn() completely
md/raid5: get rid of bio_fits_rdev()
md/raid5: split bio for chunk_aligned_read
block: remove split code in blkdev_issue_{discard,write_same}
btrfs: remove bio splitting and merge_bvec_fn() calls
bcache: remove driver private bio splitting code
block: simplify bio_add_page()
block: make generic_make_request handle arbitrarily sized bios
blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
block: don't access bio->bi_error after bio_put()
block: shrink struct bio down to 2 cache lines again
...
Commit bcdb247c6b ("sd: Limit transfer length") clamped the maximum
size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
LIMITS VPD. This had the unfortunate effect of also limiting the maximum
size of non-filesystem requests sent to the device through sg/bsg.
Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
limit directly.
Also update the comment in blk_limits_max_hw_sectors() to clarify that
max_hw_sectors defines the limit for the I/O controller only.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Brian King <brking@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
If device_add() fails then it should return the error code but instead
the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.
Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The new integrity code did not correctly unregister the profile for SD
disks. Call blk_integrity_unregister() when we release a disk.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The current string_get_size() overflows when the device size goes over
2^64 bytes because the string helper routine computes the suffix from
the size in bytes. However, the entirety of Linux thinks in terms of
blocks, not bytes, so this will artificially induce an overflow on very
large devices. Fix this by making the function string_get_size() take
blocks and the block size instead of bytes. This should allow us to
keep working until the current SCSI standard overflows.
Also fix virtio_blk and mmc (both of which were also artificially
multiplying by the block size to pass a byte side to string_get_size()).
The mathematics of this is pretty simple: we're taking a product of
size in blocks (S) and block size (B) and trying to re-express this in
exponential form: S*B = R*N^E (where N, the exponent is either 1000 or
1024) and R < N. Mathematically, S = RS*N^ES and B=RB*N^EB, so if RS*RB
< N it's easy to see that S*B = RS*RB*N^(ES+EB). However, if RS*BS > N,
we can see that this can be re-expressed as RS*BS = R*N (where R =
RS*BS/N < N) so the whole exponent becomes R*N^(ES+EB+1)
[jejb: fix incorrect 32 bit do_div spotted by kbuild test robot <fengguang.wu@intel.com>]
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The device model already takes care of races between ->remove and
->shutdown vs its other methods, and we now take care about locking
them out for ->rescan as well.
This is a partial revert of commit 39b7f1 ("[SCSI] sd: Fix refcounting").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates. There's also an
update to ufs which adds new phy drivers and finally a new logging
infrastructure for SCSI.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJU2Ty5AAoJEDeqqVYsXL0M9rAH/1xNpAxXuxQq+dW5Z+uOaX60
5RRIu7/xA1HEfzkT5FTHrolmogDjVqawu4PZS66iHDeo05RBVUlbTA8qCK+MlRcN
U6s0cLEw59eH3EaCfOGuYp/MnbhuV0eNxe0btmqJIQwuW3+gwZKGJdOq6LS2YasJ
k/DyIBVmkJAVsN56vm9q2vbtcZp+Bg+ngqBS+SC4TF7vV1WCtFmS6yaUf62PYW3D
+Irx37qHZntDR5wdw3dsuKDi5U8bl6myPjaVLnVJqg/WIF9RlCkjk5xpWT99AmVO
NmtYQxLLBlAQ5K+sIlBUwxZe+8q1l+Aj4TTmJHAfFtyfp25s7JR9I6/QtOyC5Kw=
=odol
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates.
There's also an update to ufs which adds new phy drivers and finally a
new logging infrastructure for SCSI"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
scsi_logging: return void for dev_printk() functions
scsi: print single-character strings with seq_putc
scsi: merge consecutive seq_puts calls
scsi: replace seq_printf with seq_puts
aha152x: replace seq_printf with seq_puts
advansys: replace seq_printf with seq_puts
scsi: remove SPRINTF macro
sg: remove an unused variable
hpsa: Use local workqueues instead of system workqueues
hpsa: add in P840ar controller model name
hpsa: add in gen9 controller model names
hpsa: detect and report failures changing controller transport modes
hpsa: shorten the wait for the CISS doorbell mode change ack
hpsa: refactor duplicated scan completion code into a new routine
hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
hpsa: do not use function pointers in fast path command submission
hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
hpsa: return failed from device reset/abort handlers
hpsa: check for ctlr lockup after command allocation in main io path
...
The following patch fixes an issue observed with 4k sector disks
where the max_hw_sectors attribute was getting set too large in
sd_revalidate_disk. Since sdkp->max_xfer_blocks is in units
of SCSI logical blocks and queue_max_hw_sectors is in units of
512 byte blocks, on a 4k sector disk, every time we went through
sd_revalidate_disk, we were taking the current value of
queue_max_hw_sectors and increasing it by a factor of 8. Fix
this by only shifting sdkp->max_xfer_blocks.
Cc: stable@vger.kernel.org
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Convert sense buffer logging to use the per-cpu buffer to avoid line
breakup.
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7985090aa0 changed the discard heuristics to give preference to the
WRITE SAME commands that (unlike UNMAP) guarantee deterministic results.
Ming Lei discovered that QEMU SCSI's WRITE SAME implementation
internally relied on limits that were only communicated for the UNMAP
case. And therefore discard commands backed by WRITE SAME would fail.
Tweak the heuristics so we still pick UNMAP in the LBPRZ=0 case and only
prefer the WRITE SAME variants if the device has the LBPRZ flag set.
Reported-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The driver core driver structure has grown an owner field and now
requires it to be set for all modular drivers. Set it up for
all scsi_driver instances and get rid of the now superflous
scsi_driver owner field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Shane M Seymour <shane.seymour@hp.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
There is no reason for ULDs to pass in a flag on how to allocate the S/G
lists. While we don't need GFP_ATOMIC for the blk-mq case because we
don't hold locks, that decision can be made way down the chain without
having to pass a pointless gfp_mask argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
The T10 SBC UNMAP command does not provide any hard guarantees that
blocks will return zeroes on a subsequent READ. This is due to the fact
that the device server is free to silently ignore all or parts of the
request.
The only way to ensure that a block consistently returns zeroes after
being unmapped is to use WRITE SAME with the UNMAP bit set. Should the
device be unable to unmap one or more blocks described by the command it
is required to manually write zeroes to them.
Until now we have preferred UNMAP over the WRITE SAME variants to
accommodate thinly provisioned devices that predated the final SBC-3
spec. This patch changes the heuristic so that we favor WRITE SAME(16)
or (10) over UNMAP if these commands are marked as supported in the
Logical Block Provisioning VPD page.
The patch also disables discard_zeroes_data for devices operating in
UNMAP mode.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No need to verify the passthrough ioctls, the real handler will
take care of that. Also make sure not to block for resets on
O_NONBLOCK fds.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.
Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl. All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.
Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Open-code scsi_print_result in sd.c, and cleanup logging to
not print duplicate informations.
Also remove the call to scsi_show_result() in ufshcd.c
to be consistent with other callers of scsi_execute().
With that we can remove scsi_show_result in constants.c
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We should be using the standard dev_printk() variants for
sense code printing.
[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
sd_done() was calling scsi_print_sense() for a sense code
of 'NO_SENSE'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull block layer driver update from Jens Axboe:
"This is the block driver pull request for 3.18. Not a lot in there
this round, and nothing earth shattering.
- A round of drbd fixes from the linbit team, and an improvement in
asender performance.
- Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and
hd from Michael Opdenacker.
- Disable entropy collection from flash devices by default, from Mike
Snitzer.
- A small collection of xen blkfront/back fixes from Roger Pau Monné
and Vitaly Kuznetsov"
* 'for-3.18/drivers' of git://git.kernel.dk/linux-block:
block: disable entropy contributions for nonrot devices
xen, blkfront: factor out flush-related checks from do_blkif_request()
xen-blkback: fix leak on grant map error path
xen/blkback: unmap all persistent grants when frontend gets disconnected
rsxx: Remove deprecated IRQF_DISABLED
block: hd: remove deprecated IRQF_DISABLED
drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks
drbd: compute the end before rb_insert_augmented()
drbd: Add missing newline in resync progress display in /proc/drbd
drbd: reduce lock contention in drbd_worker
drbd: Improve asender performance
drbd: Get rid of the WORK_PENDING macro
drbd: Get rid of the __no_warn and __cond_lock macros
drbd: Avoid inconsistent locking warning
drbd: Remove superfluous newline from "resync_extents" debugfs entry.
drbd: Use consistent names for all the bi_end_io callbacks
drbd: Use better variable names
Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.
- blk-mq timeout updates and fixes from Christoph.
- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.
- blk integrity updates and fixes from Martin and Gu Zheng.
- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.
- Improve the error printing, from Rob Elliott.
- Backing device improvements and cleanups from Tejun.
- Fixup of a misplaced rq_complete() tracepoint from Hannes.
- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.
- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.
- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.
- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"
* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.
Historically, all block devices have automatically made entropy
contributions. But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
- On SSD disks, the completion times aren't as random as they
are for rotational drives. So it's questionable whether they
should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
There are more reliable sources for randomness than non-rotational block
devices. From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.
Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
SCSI Well-known logical units generally don't have any scsi driver
associated with it which means no one will call scsi_autopm_put_device()
on these wlun scsi devices and this would result in keeping the
corresponding scsi device always active (hence LLD can't be suspended as
well). Same exact problem can be seen for other scsi device representing
normal logical unit whose driver is yet to be loaded. This patch fixes
the above problem with this approach:
- make the scsi_autopm_put_device call at the end of scsi_sysfs_add_sdev
to make it balance out the get earlier in the function.
- let drivers do paired get/put calls in their probe methods.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The SYNCHRONIZE_CACHE command is a medium write command and hence can
fail when the device is write protected. Avoid sending such commands by
making sure that write-cache-enable is disabled even though the device
claim to support it.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit ID: 7e660100d8 added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.
Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.
Reported-by: KY Srinivasan <kys@microsoft.com>
Tested-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.
Just leave this entirely to the CDROM driver to make everyones life
simpler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Factor out a function to initialize regular read/write commands and leave
sd_init_command as a simple dispatcher to the different prepare routines.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Currently cmd->allowed is initialized from rq->retries for discard
commands, but retries is always 0 for non-BLOCK_PC requests. Set it
to the standard number of retries instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Currently cmd->allowed is initialized from rq->retries for write same
commands, but retries is always 0 for non-BLOCK_PC requests. Set it
to the standard number of retries instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Simplify handling of discard requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Simplify handling of write same requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Simplify handling of flush requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.
Also rename scsi_setup_flush_cmnd to sd_setup_flush_cmnd for consistency.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
The data direction fiel in the SCSI command is derived only from the block
request structure. Move setting it up into common code instead of
duplicating it in the ULDs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
We should call the device handler prep_fn for all TYPE_FS requests,
not just simple read/write calls that are handled by the disk driver.
Restructure the common I/O code to call the prep_fn handler and zero
out the CDB, and just leave the call to scsi_init_io to the ULDs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Until now the per-command transfer length has exclusively been gated by
the max_sectors parameter in the scsi_host template. Given that the size
of this parameter has been bumped to an unsigned int we have to be
careful not to exceed the target device's capabilities.
If the if the device specifies a Maximum Transfer Length in the Block
Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
devices that have use_16_for_rw set and 0xffff for the rest. We then
combine the chosen disk limit with max_sectors in the host template. The
smaller of the two will be used to set the max_hw_sectors queue limit.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In init_sd function, if kmem_cache_create or mempool_create_slab_pools
calls fail, the error will not be correclty reported because
class_register previously set the value of err to 0.
Signed-off-by: Clément Calmels <clement.calmels@free.fr>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This is a fix for commit 39c60a0948
"sd: fix array cache flushing bug causing performance problems"
We must notify the block layer via q->flush_flags after a temporary change
of the cache_type to write through. Without this, a SYNCHRONIZE CACHE
command will still be generated. This patch factors out a helper that
can be called from sd_revalidate_disk and cache_type_store.
Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This change makes the scsi disk driver handle the requests whose
transfer length is greater than 0xffff with READ_16 or WRITE_16.
However, this is a preparation for extending the data type of
max_sectors in struct Scsi_Host and scsi_host_template. So, it is
impossible to happen this condition for now, because SCSI low-level
drivers can not specify max_sectors greater than 0xffff due to the
data type limitation.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs. This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc,
be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status
of a long neglected driver for older hardware. In addition there are a lot of
minor fixes and cleanups and some more updates to make scsi mq ready.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJTlcwiAAoJEDeqqVYsXL0M5qsIALzVPLd4yxA16zCiaPQUeIV5
mfYmwISFlN+qW3AcUeSH4D13YgegCjEBfqaDMWvIkgouxLy/7jpxtChutq3MCzUE
cDT1B9+ZrzoqBISRNHEh/gx5F1MOF2VPuqG2pe0J90wyRCNzJscB6PbtWMAo86CA
2eu7wq3K9FXxCC1qY0PzwBLXHqUcgk5GWiK9CM/k4W0NiTVeNmwPeh5i91IQnBHx
E2l7NAXgNLyCf5tyeswvZ4pW0T3hlaswNmBB4qC8oJm4U6UqMN+tk4ML63Pz7uPe
4mlHG0uI8Vbdi13iv1EDUZ9Vo8iqVrzP2UAhakgP9poKSGqE4d/MD0EKNGQB2Es=
=cBY8
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This patch consists of the usual driver updates (qla2xxx, qla4xxx,
lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to
maintained status of a long neglected driver for older hardware. In
addition there are a lot of minor fixes and cleanups and some more
updates to make scsi mq ready"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits)
include/scsi/osd_protocol.h: remove unnecessary __constant
mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed"
mptfusion: fix msgContext in mptctl_hp_hostinfo
acornscsi: remove linked command support
scsi/NCR5380: dprintk macro
fusion: Remove use of DEF_SCSI_QCMD
fusion: Add free msg frames to the head, not tail of list
mpt2sas: Add free smids to the head, not tail of list
mpt2sas: Remove use of DEF_SCSI_QCMD
mpt2sas: Remove uses of serial_number
mpt3sas: Remove use of DEF_SCSI_QCMD
mpt3sas: Remove uses of serial_number
qla2xxx: Use kmemdup instead of kmalloc + memcpy
qla4xxx: Use kmemdup instead of kmalloc + memcpy
qla2xxx: fix incorrect debug printk
be2iscsi: Bump the driver version
be2iscsi: Fix processing cqe for cxn whose endpoint is freed
be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
be2iscsi: Fix memory corruption in MBX path
...
There is an error with the medium access timeout feature of the sd driver. The
sdkp->medium_access_timed_out value is reset to zero in sd_done() in the wrong
place. Currently it is reset to zero only when a command returns sense data.
This can result in cases where the medium access check falsely triggers from
timed out commands which are hours or days apart.
For example, an I/O command times out and is aborted. It then retries and
succeeds. But with no sense data generated and returned, the
medium_access_timed_out value is not reset. If no sd command returns sense
data, then the next command to time out (however far in time from the first
failure) will trigger the medium access timeout and put the device offline.
The resetting of sdkp->medium_access_timed_out should occur before the check
for sense data.
To reproduce using scsi_debug, use SCSI_DEBUG_OPT_TIMEOUT or
SCSI_DEBUG_OPT_MAC_TIMEOUT to force an I/O command to timeout. Then, remove
the opt value so the I/O will succeed on retry. Perform more I/O as desired.
Finally, repeat the process to make a new I/O command time out. Without the
patch, the device will be marked offline even though many I/O commands have
succeeded between the 2 instances of timed out commands.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of letting the ULD play games with the prep_fn move back to
the model of a central prep_fn with a callback to the ULD. This
already cleans up and shortens the code by itself, and will be required
to properly support blk-mq in the SCSI midlayer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Store the pointer to the page there, so we can always safely
reference it from end_io context where ->bio may have been
cleared.
Signed-off-by: Jens Axboe <axboe@fb.com>
This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.
Convert old style drivers to just use bio_data().
For the discard payload use case, just reference the page
in the bio.
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull async SCSI resume support from Dan Williams:
"Allow disks and other devices to resume in parallel.
This provides a tangible speed up for a non-esoteric use case (laptop
resume):
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach"
* 'async-scsi-resume' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci:
scsi: async sd resume
async_schedule() sd resume work to allow disks and other devices to
resume in parallel.
This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command. For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).
It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations. Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev. The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.
We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL. With this in place the type of resume
operation is encoded in the async function identifier.
There is a concern that async resume could trigger PSU overload. In the
enterprise, storage enclosures enforce staggered spin-up regardless of
what the kernel does making async scanning safe by default. Outside of
that context a user can disable asynchronous scanning via a kernel
command line or CONFIG_SCSI_SCAN_ASYNC. Honor that setting when
deciding whether to do resume asynchronously.
Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>