Convert to proper kernel-doc format.
Some have extra blank lines (not allowed immed. after the function name)
or need blank lines (after all parameters). Function summary must be only
one line.
Colon (":") in a function description does weird things (causes kernel-doc
to think that it's a new section head sadly).
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce the notion of cooperating processes (those that submit requests
close to one another), and use these statistics to make better choices about
whether or not to do anticipatory waiting.
Help and analysis from Seetharami Seelam <seelam@cs.utep.edu>
Performance testing from Seelam:
I set up my system and executed a couple of tests that I used for OLS. I
tested with AS, cooperative process patch merged in -mm tree (which I called
Nick, below) and the cooperative patch with modifications to as_update_iohist
(which I called Seelam).
I used a dual-processor (2.28GHz Pentium 4 Xeon) system, with 1 GB main memory
and 1 MB L2 cache, running Linux 2.6.9. Only a single processor is used for
the experiments. I used 7.2K RPM Maxtor 10GB drive configured with ext2 file
system.
Experiment 1 (ex1) consists of reading one Linux source trees using
find . -type f -exec cat '{}' ';' > /dev/null.
Experiment 2 (ex2) consists of reading two disjoint Linux source trees
using
find . -type f -exec cat '{}' ';' > /dev/null.
Experiment 3 (ex3) consists of streaming read of a 2GB file in the background
and 1 instance of the chunk reads in Experiment 1.
Timings for reading the Linux source are shown below:
AS Nick Seelam
ex1: 0m25.813s 0m27.859s 0m27.640s
ex2: 1m11.468s 1m13.918s 1m5.869s
ex3: 81m44.352s 10m38.572s 6m47.994s
The difference between the numbers in Experiment 3 must be due to the code in
as_update_iohist. (akpm: that's not part of this patch. So this patch is
"Nick").
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds SCSI error handling code to the SCSI portion
of the cciss driver.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/block/ is right now a mix of core and driver parts. Lets move
the core parts to a new top level directory. Al will move the fs/
related block parts to block/ next.
Signed-off-by: Jens Axboe <axboe@suse.de>
cfq's add_req_fn callback may invoke q->request_fn directly and
depending on low-level driver used and timing, a queued request may be
finished & deallocated before add_req_fn callback returns. So,
__elv_add_request must not access rq after it's passed to add_req_fn
callback.
This patch moves rq_mergeable test above add_req_fn(). This may
result in q->last_merge pointing to REQ_NOMERGE request if add_req_fn
callback sets it but as RQ_NOMERGE is checked again when blk layer
actually tries to merge requests, this does not cause any problem.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).
Signed-off-by: Jens Axboe <axboe@suse.de>
Right now we do it at queueing time, which works alright for reads
(since they are usually sync), but not for async writes since we can
queue io a lot faster than we can complete it. This makes the vmstat
output look extremely bursty.
Signed-off-by: Jens Axboe <axboe@suse.de>
Tejun Heo notes:
"I'm currently debugging this. The problem is that we are using the
generic dispatch queue directly in the noop sched and merging is NOT
allowed on dispatch queues but generic handling of last_merge tries
to merge requests. I'm still trying to verify this, so I'll be back
with results soon."
In the meantime, disable merging for noop by setting REQ_NOMERGE in
elevator_noop_add_request().
Eventually, we should add a noop_list and do the dispatching like in the
other io schedulers. Merging is still beneficial for noop (and it has
always done it).
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't clear ->elevator_data on exit, if we are switching queues we are
overwriting the data of the new io scheduler.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the requested I/O scheduler is already in place, elevator_switch simply
leaves the queue alone, and returns. However, it forgets to call
elevator_put, so
'echo [current_sched] > /sys/block/[dev]/queue/scheduler'
will leak a reference, causing the current_sched module to be permanently
pinned in memory.
Signed-off-by: Nate Diller <nate@namesys.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a kconfig submenu to select the default I/O scheduler, in case
anticipatory is not compiled in or another default is preferred. Also,
since no-op is always available, we should use it whenever the selected
default is not.
Signed-off-by: Nate Diller <nate@namesys.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last patch from Jens Axboe for drivers/block/paride/pf.c introduced
pf_end_request() which sets pf_req to NULL.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're trying to get rid of as much as possible tasklist walks, or at
least moving them to core code. This patch falls into the second
category.
Instead of walking the tasklist in cfq-iosched move that into
elv_unregister. The added benefit is that with this change the as
ioscheduler might be might unloadable more easily aswell.
The new code uses read_lock instead of read_lock_irq because the
tasklist_lock only needs irq disabling for writers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert everyone who uses platform_bus_type to include
linux/platform_device.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
as-iosched deals with aliased requests differently from other ioscheds.
It links together aliased requests using rq->queuelist instead of
spilling alises to dispatch queue like other ioscheds do. Requests
linked in this way cannot be merged.
Unfortunately, generic q->last_merge handling patch didn't take this
into account and q->last_merge could be set to an aliased request
resulting in Badness, corrupt list and eventually panic.
This explicitly marks aliased requests to be unmergeable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When building on a 64-bit platform, gcc produces a warning
"cast of a pointer to an integer of a different size".
The scatterlist.offset on the LHS is unsigned int, so I used
that originally.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/block/ub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A "coldplug + udevstart" can be simple like this:
for i in /sys/block/*/*/uevent; do echo 1 > $i; done
for i in /sys/class/*/*/uevent; do echo 1 > $i; done
for i in /sys/bus/*/devices/*/uevent; do echo 1 > $i; done
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use get_unaligned for possibly-unaligned multi-byte accesses to the
ATA device identify response buffer.
- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which doesn't have REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have its REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
its elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS. New implementation is much simpler and main code
paths are less cluttered, IMHO.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch kills max_back_kb handling from elv_dispatch_sort() and
kills max_back_kb field from struct request_queue.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge. This
and the following patches move last_merge handling into
generic elevator code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch updates all four ioscheds to use generic dispatch
queue. There's one behavior change in as-iosched.
* In as-iosched, when force dispatching
(ELEVATOR_INSERT_BACK), batch_data_dir is reset to REQ_SYNC
and changed_batch and new_batch are cleared to zero. This
prevernts AS from doing incorrect update_write_batch after
the forced dispatched requests are finished.
* In cfq-iosched, cfqd->rq_in_driver currently counts the
number of activated (removed) requests to determine
whether queue-kicking is needed and cfq_max_depth has been
reached. With generic dispatch queue, I think counting
the number of dispatched requests would be more appropriate.
* cfq_max_depth can be lowered to 1 again.
Original from Tejun Heo, modified version applied.
Signed-off-by: Jens Axboe <axboe@suse.de>
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch removes try_module_get race in elevator_find.
try_module_get should always be called with the spinlock protecting
what the module init/cleanup routines register/unregister to held. In
the case of elevators, we should be holding elv_list to avoid it going
away between spin_unlock_irq and try_module_get.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Just set the name field directly in the device_driver structure
contained in the vio_driver struct.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
* Newer hardware doesn't corrupt data when the queue depth
is greater than one. Rather than force the user to recompile
with a greater queue depth, make it a module parameter.
* update copyright date
* add MODULE_VERSION()
* trim trailing whitespace
* move CARM_SG_BOUNDARY to a separate enum, since its unsigned long
* bump to version 1.0
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should not be warning about commands that we allow, even if they are
unknown. So move the if-root-allow check up a notch.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This code appears to be more trouble than it's worth, considering that
no normal users reload drivers. So, we comment it for now. It is not
removed outright for the benefit of hackers (that is, myself).
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a few problems with ub and cleans up a couple of things:
- Bump UB_MAX_REQ_SG, this allows to burn CDs
- Drop initialization of urb.transfer_flags,
now that URB_UNLINK_ASYNC is gone
- Add forgotten processing of stalls at GetMaxLUN
- Remove a few more P3-tagged printks whose time has come
- Correct comment about ZIP-100
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/block/ub.c | 53 +++++++++++++++++++++++++++--------------------------
1 file changed, 27 insertions(+), 26 deletions(-)
This function was removed a while ago, but crept in again via a recent
scsi merge.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes the problem Bjorn reported. The busy_initializing flag
should have cleared before going into the for loop.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here's the patch from
http://bugzilla.kernel.org/show_bug.cgi?id=4853
It is a feeble attempt at fixing the request handling in pf, it is totally
foobar right now.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add WRITE_LONG_2 as write safe commands, which which allows normal users to
make a c1-, c2- and cu-scan (so called cxscan) with readcd on
cxscan-capable cd/dvd-writers
Signed-off-by: Jens Axboe <axboe@suse.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some redundant BUG_ON() statements in pktcdvd and move one run-time
check to compile-time.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use kcalloc and kzalloc in pktcdvd.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the /proc statistics, only count writes that upper layers have requested.
Don't count additional writes created inside the packet driver to satisfy the
requirement to only write full packets.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the "theory of operation" description.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the packet writing driver, if the drive reports a packet size larger than
the driver can handle, bail out safely instead of triggering a BUG_ON.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add SCSI host and device info not elsewhere available to /proc/scsi/cciss/*
Namely, connect cciss device instance with scsi host number, and give scsi
host number, bus, target, lun, devicetype, and 8-byte cciss LUNID for each
tapedrive/medium changer attached to a controller
For instance:
# cat /proc/scsi/cciss/2
cciss0: SCSI host: 2
c2b0t0l0 01 0x0000000000000001
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for "One Button Disaster Recovery" devices to the
cciss driver. (OBDR devices are tape drives which can pretend to be cd-rom
devices temporarily. Once booted the device can be reverted to a tape drive
and data recovery operations can be automatically begun.)
This is an enhancement request by a vendor/partner working on One Button
Disaster Recovery.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The CCISS driver seems to loose track of DMA mappings created by it's
fill_cmd() routine. Neither callers of this routine are extracting the DMA
address created in order to do the unmap.
Instead, they simply try to unmap 0x0. It's easy to see this problem on an
x86_64 system when using the "swiotlb=force" boot option. In this case, the
driver is leaking resources of the swiotlb and not causing a sync of the
bounce buffer.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a bug in cciss_remove_one. A set of braces was missing for
the if statement causing an Oops on driver unload.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes the way we complete commands. In the old method when we
got a completion we searched our command list from the top until we find it.
This method uses a tag associated with each command (not SCSI command tagging)
to index us directly to the completed command. This helps performance.
Signed-off-by: Don Brace <dab@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes a couple of functions dealing with configuration and
replaces them with new functions. This implementation fixes some bugs
associated with the ACUXE. It also allows a logical volume to be removed from
the middle without deleting all volumes behind it.
If a user has 5 logical volumes and decides he wants to reconfigure volume
number 3, he can now do that without removing volumes 4 & 5 first. This code
has been tested in our labs against all application software.
Signed-off-by: Chase Maupin <chase.maupin@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a flag called busy_initializing. If there are multiple
controllers in a server AND the HP agents are running it's possible the agents
may try to poll a card that is still initializing if the driver is removed and
then added again.
Signed-off-by: Don Brace <dab@hp.com>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds new PCI and subsystem ID's that finally made the spec. It
also include a name change for one controller. I know there's a lot of
duplicat names but the fw folks wanted this for the different implementations.
Even though the same ASIC is used it may be embedded on some platforms,
standup card in others, and a mezzanine in other servers.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The reference count fix merged isn't fully bug free. It doesn't leak
now, but instead it crashes due to looking at freed memory. So for now,
lets reverse the change and I'll fix it for real next week.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use msleep() or msleep_interruptible() [as appropriate] instead of
schedule_timeout() to gurantee the task delays as expected. As a result
changed the units of the timeout variable from jiffies to msecs.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch does a full cleanup of 'NULL checks before vfree', and a partial
cleanup of calls to kfree for all of drivers/ - the kfree bit is partial in
that I only did the files that also had vfree calls in them. The patch
also gets rid of some redundant (void *) casts of pointers being passed to
[vk]free, and a some tiny whitespace corrections also crept in.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The soon to be released smartmontools 5.34 uses the
READ DEFECT DATA command on SCSI disks. A disk that
has defect list entries (or worse, an increasing number
of them) is at risk.
Currently the first invocation of smartctl causes this:
scsi: unknown opcode 0x37
message to appear the console and in the log.
The READ DEFECT DATA SCSI command does not change
the state of a disk. Its opcode (0x37) is valid for
SBC devices (e.g. disks) and SMC-2 devices (media
changers) where it is called INITIALIZE STATUS ELEMENT
WITH RANGE and again doesn't change the external state
of the device.
Changelog:
- mark SCSI opcode 0x37 (READ DEFECT DATA) as
safe_for_read
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the number of supported AoE slot addresses per AoE shelf
address to 16.
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Clean up timer initialization by introducing DEFINE_TIMER a'la
DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been
been in the -RT tree for some time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
That ?: trick gives us the creeps.
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
29 July 2005, Cambridge, MA:
This afternoon Alan Stern submitted a patch to remove the URB_ASYNC_UNLINK
flag from the Linux kernel. Mr. Stern explained, "This flag is a relic
from an earlier, less-well-designed system. For over a year it hasn't
been used for anything other than printing warning messages."
An anonymous spokesman for the Linux kernel development community
commented, "This is exactly the sort of thing we see happening all the
time. As the kernel evolves, support for old techniques and old code can
be jettisoned and replaced by newer, better approaches. Proprietary
operating systems do not have the freedom or flexibility to change so
quickly."
Mr. Stern, a staff member at Harvard University's Rowland Institute who
works on Linux only as a hobby, noted that the patch (labelled as548) did
not update two files, keyspan.c and option.c, in the USB drivers' "serial"
subdirectory. "Those files need more extensive changes," he remarked.
"They examine the status field of several URBs at times when they're not
supposed to. That will need to be fixed before the URB_ASYNC_UNLINK flag
is removed."
Greg Kroah-Hartman, the kernel maintainer responsible for overseeing all
of Linux's USB drivers, did not respond to our inquiries or return our
calls. His only comment was "Applied, thanks."
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Back out Axboe-style quasi-S/G and replace it with one command and
repeated URBs. This is similar to what usb-storage does, only instead
of a few URBs allocated together, one URB is reused.
Jens's idea was very nice, but it collapsed when I had to support
packet commads for CD burning. I cannot issue two or more packet
commands where application expected only one.
However, burning does not work completely yet. The cdrecord starts,
recognizes the device, then aborts without writing a TOC.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When Al Viro saw the ub.c, he observed that it was a proof positive of
Linus not reading patches anymore: names like fo_ob_ar_ba_2 used to
cause serious fireworks. In my defence, any good scheme can be pushed
to the realm of absurd if pushed far enough.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Evidently, Yani Ioannou's display is wider than mine.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This the quasi-S/G patch for ub as suggested by Jens Axboe at OLS and
implemented that night before 4 a.m. Surprisingly, it worked right away...
Alas, I had to skip some OLS partying, but it was for the good cause.
Now the speed of ub is quite acceptable even on partitions with small
block size.
The ub does not really support S/G. Instead, it just tells the block
layer that it does. Then, most of the time, the block layer merges
requests and passes single-segmnent requests down to ub; everything
works as before. Very rarely ub gets an unmerged S/G request. In such
case, it issues several commands to the device.
I added a small array of counters to monitor the merging (sg_stat).
This may be dropped later.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sanitized and fixed floppy dependencies: split the messy dependencies for
BLK_DEV_FD by introducing a new symbol (ARCH_MAY_HAVE_PC_FDC), making
BLK_DEV_FD depend on that one and taking declarations of ARCH_MAY_HAVE_PC_FDC
to arch/*/Kconfig. While we are at it, fixed several obvious cases when
BLK_DEV_FD should have been excluded (architectures lacking asm/floppy.h
are *not* going to have floppy.c compile, let alone work).
If you can come up with better name for that ("this architecture might
have working PC-compatible floppy disk controller"), you are more than
welcome - just s/ARCH_MAY_HAVE_PC_FDC/your_prefered_name/g in the patch
below...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts kcalloc(1, ...) calls to use the new kzalloc() function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I ran across a memory leak related to the cfq scheduler. The cfq
init function increments the refcnt of the associated request_queue.
This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
only calls the elevator exit function when its refcnt goes to zero, the
request_q never gets cleaned up. It didn't look like other io schedulers were
incrementing this refcnt, so I removed the refcnt increment and it fixed the
memory leak for me.
To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
attribute to scan "- - -" repeatedly on a scsi host and watch the memory
vanish.
Signed-off-by: Brian King <brking@us.ibm.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Per-queue parameters should be updated using the appropriate blk_queue_xxx
functions.
Signed-off-by: Stuart McLaren <stuart.mclaren@hp.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to clean up missing overflow check in get_blkdev_list. The printf
which adds the "Block Devices" string in /proc/devices can overflow the
presented page if get_chrdev_list eats up the entire 4k space.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cleanup of deadline_dispatch_requests():
- replace drq selection with hopefully clearer while semantically the
same construct: take write request, if there is any, otherwise take read
one, or NULL if none exist.
- kill unused other_dir.
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fiddle with coding style a bit.
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently only a device 'fdX' shows up in sysfs; the other possible
device for this drive (like fd0h1440 etc) must be guessed from there.
This patch corrects the floppy driver to create a platform device for
each floppy found; each platform device also has an attribute 'cmos'
which represents the cmos type for this drive. From this attribute the
other possible device types can be computed.
From: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch goes through the current users of the crypto layer and sets
CRYPTO_TFM_REQ_MAY_SLEEP at crypto_alloc_tfm() where all crypto operations
are performed in process context.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding just wants the device before the skb_bond()
decapsulation occurs, so simply pass that original
device into packet_type->func() as an argument.
It remains to be seen whether we can use this same
exact thing to get rid of skb->input_dev as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
One critical fix and two minor fixes for 2.6.13-rc7:
- Max depth must currently be 2 to allow barriers to function on SCSI
- Prefer sync request over async in choosing the next request
- Never allow async request to preempt or disturb the "anticipation" for
a single cfq process context. This is as-designed, the code right now
is buggy in that area.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move initramfs options from Device Drivers | Block Drivers to General Setup
This is a more natural place for this option.
Furthermore separate out intramfs options to usr/Kconfig
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
My patch in commit fa72b903f7 incorrectly
removed blk_queue_tag->real_max_depth.
The original resize implementation was incorrect in the following
points.
* actual allocation size of tag_index was shorter than real_max_size,
but assumed to be of the same size, possibly causing memory access
beyond the allocated area.
* bits in tag_map between max_deptn and real_max_depth were
initialized to 1's, making the tags permanently reserved.
In an attempt to fix above two bugs, I had removed allocation optimization
in init_tag_map and real_max_size. Tag map/index were allocated and freed
immediately during resize.
Unfortunately, I wasn't considering that tag map/index can be resized
dynamically with tags beyond new_depth active. This led to accessing
freed area after shrinking tags and led to the following bug reporting
thread on linux-scsi.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112319898111885&w=2
To fix the problem, I've revived real_max_depth without allocation
optimization in init_tag_map, and Andrew Vasquez confirmed that the
problem was fixed. As Jens is not going to be available for a week, he
asked me to make sure that this patch reaches you.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112325778530886&w=2
Also, a comment was added to make sure that real_max_size is needed for
dynamic shrinking.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CFQ will currently stall when using write barriers and the default
max_depth setting of 1, since we artificially need a depth of 2 when
pre-pending the first flush. So never deny the barrier request going to
the device.
This is a regression since 2.6.12, it was found in SUSE testing.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds per disk queue functionality to cciss. Sometime back I
submitted a patch but it looks like only part of what I needed. In the 2.6
kernel if we have more than one logical volume the driver will Oops during
rmmod. It seems all of the queues actually point back to the same queue.
So after deleting the first volume you hit a null pointer on the second
one.
This has been tested in our labs. There is no difference in performance,
it just fixes the Oops.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a microcode lockup in my CD-ROM adapters when a blank CD
is inserted. However, do not try to burn CDs yet! I'm pretty sure that
trying it will end in coasters.
- Fix a few cases where we were unable to resynchronize with replies
for previous commands. The main thing is to keep reading replies
in case of a stall. This is done with the new state CLRRS.
- Since I am forgetting the basic state machine already, document it.
- Move counter increments in the looping path in its own function.
- Fix a harmless buglet in case CSW read fails to submit: do not
override state.
- Implement the Alan Stern's idea for adaptive signature checking.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AS is doing internal msec<->jiffies conversions twice, so the sysfs tunables
which represent time are coming out wrong. The switch from HZ=1000 exposed
this.
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_request is now expected to be holding on to queue_lock, with interrupts
disabled, when it returns NULL; but one path forgot that, causing all kinds
of nastiness under swap load - badness backtraces, strange failures, BUGs.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_io_context needlessly turned off interrupts and checked for racing io
context creations. Both of which aren't needed, because the io context can
only be created while in process context of the current process.
Also, split the function in 2. A light version, current_io_context does not
elevate the reference count specifically, but can be used when in process
context, because the process holds a reference itself.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change around locking a bit for a result of 1-2 less spin lock unlock pairs in
request submission paths.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case where the request is not able to be merged by the elevator, don't
retake the lock and retry the merge mechanism after allocating a new request.
Instead assume that the chance of a merge remains slim, and now that we've
done most of the work allocating a request we may as well just go with it.
Also be rid of the GFP_ATOMIC allocation: we've got working mempools for the
block layer now, so let's save atomic memory for things like networking.
Lastly, in get_request_wait, do an initial get_request call before going into
the waitqueue. This is reported to help efficiency.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we cap request allocations at q->nr_requests, but we allow a
batching io context to allocate up to 32 more (default setting). This
can flood the queue with request allocations, with only a few batching
processes. The real fix would be to limit the number of batchers, but
as that isn't currently tracked, I suggest we just cap the maximum
number of allocated requests to eg 50% over the limit.
This was observed in real life, users typically see this as vmstat bo
numbers going off the wall with seconds of no queueing afterwards.
Behaviour this bursty is not beneficial.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/block/cfq-iosched.c: In function 'cfq_put_queue':
drivers/block/cfq-iosched.c:303: sorry, unimplemented: inlining failed in call to 'cfq_pending_requests': function body not available
drivers/block/cfq-iosched.c:1080: sorry, unimplemented: called from here
drivers/block/cfq-iosched.c: In function '__cfq_may_queue':
drivers/block/cfq-iosched.c:1955: warning: the address of 'cfq_cfqq_must_alloc_slice', will always evaluate as 'true'
make[1]: *** [drivers/block/cfq-iosched.o] Error 1
make: *** [drivers/block/cfq-iosched.o] Error 2
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fulfills a promise I made to Christoph sometime back. I am
removing the partition info from the CCISS_GETLUNINFO ioctl as I was informed
my "driver had no damn business reading that structure." ;)
The application folks are to use /proc or /sys for partition info from now on.
I am only aware of a few apps that use this ioctl and I'm not sure they ever
used the partition info.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is pass 2 of my patch to add pci domain info to an existing ioctl. This
time I insert the domain between dev_fn and board_id as Willy suggested and
change the var to unsigned short to ease Christoph's concerns. Although I
thought unsigned int was the correct var type for this. I also thought it
didn't matter where I inserted it in the structure.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a PCI ID I got wrong before. It also adds support for
another new SAS controller due out this summer. I didn't have a marketing
name prior to my last submission. Also modifies the copyright date range.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes CONFIG_PMAC_PBOOK (PowerBook support). This is now
split into CONFIG_PMAC_MEDIABAY for the actual hotswap bay that some
powerbooks have, CONFIG_PM for power management related code, and just left
out of any CONFIG_* option for some generally useful stuff that can be used
on non-laptops as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If cfq is managing a queue and a new scheduler is later selected, it is
possible for the cfqd unplug_work work to be queued after the kblockd
work struct has been flushed. The problem is the ordering of
cfq_shutdown_timer_wq() and blk_put_queue() in cfq_put_cfqd(). The
latter may rearm the work, leaving cfq_kick_queue() with dead data.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Adjust slice values
- Instead of one async queue, one is defined per priority level. This
prevents kernel threads (such as reiserfs/x and others) that run at
higher io priority from conflicting with others. Previously, it was a
coin toss what io prio the async queue got, it was defined by who
first set up the queue.
- Let a time slice only begin, when the previous slice is completely
done. Previously we could be somewhat unfair to a new sync slice, if
the previous slice was async and had several ios queued. This might
need a little tweaking if throughput suffers a little due to this,
allowing perhaps an overlap of a single request or so.
- Optimize the calling of kblockd_schedule_work() by doing it only when
it is strictly necessary (no requests in driver and work left to do).
- Correct sync vs async logic. A 'normal' process can be purely async as
well, and a flusher can be purely sync as well. Sync or async is now a
property of the class defined and requests pending. Previously writers
could be considered sync, when they were really async.
- Get rid of the bit fields in cfqq and crq, use flags instead.
- Various other cleanups and fixes
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In cfq_find_next_crq(), cfq tries to find the next request by choosing
one of two requests before and after the current one. Currently, when
choosing the next request, if there's no next request, the next
candidate is NULL, resulting in selection of the previous request. This
results in weird scheduling. Once we reach the end, we always seek
backward.
The correct behavior is using the first request as the next candidate.
cfq_choose_req() already has logics for handling wrapped requests.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the DMA_{64,32}BIT_MASK constants from dma-mapping.h when calling
pci_set_dma_mask() or pci_set_consistent_dma_mask()
These patches include dma-mapping.h explicitly because it caused errors
on some architectures otherwise.
See http://marc.theaimsgroup.com/?t=108001993000001&r=1&w=2 for details
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Domen Puncer <domen@coderock.org>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global code static
- remove the following unused global functions:
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
- remove the following unused EXPORT_SYMBOL's:
- blk_phys_contig_segment
- blk_hw_contig_segment
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows block device drivers to convert their ioctl functions to
unlocked_ioctl() like character devices and other subsystems. All
functions that were called with the BKL held before are still used that
way, but I would not be surprised if it could be removed from the ioctl
functions in drivers/block/ioctl.c themselves.
As a side note, I found that compat_blkdev_ioctl() acquires the BKL as
well, which looks like a bug. I have checked that every user of
disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so
it could easily be removed from compat_blkdev_ioctl().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch improves write performance for the CD/DVD packet writing driver.
The logic for switching between reading and writing has been changed so
that streaming writes are no longer interrupted by read requests.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads
of /proc/devices from spilling over the provided page if more than 4096
bytes of string data are generated from all the registered character and
block devices in a system
Signed-off-by: Neil Horman <nhorman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like locking can be optimised quite a lot. Increase lock widths
slightly so lo_lock is taken fewer times per request. Also it was quite
trivial to cover lo_pending with that lock, and remove the atomic
requirement. This also makes memory ordering explicitly correct, which is
nice (not that I particularly saw any mem ordering bugs).
Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
block size, 4K page size). System is 2 socket Xeon with HT (4 thread).
intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh
Before:
0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k
After:
0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sprinkle around a few branch hints in the block layer.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This memory barrier is not needed because the waitqueue will only get waiters
on it in the following situations:
rq->count has exceeded the threshold - however all manipulations of ->count
are performed under the runqueue lock, and so we will correctly pick up any
waiter.
Memory allocation for the request fails. In this case, there is no additional
help provided by the memory barrier. We are guaranteed to eventually wake up
waiters because the request allocation mempool guarantees that if the mem
allocation for a request fails, there must be some requests in flight. They
will wake up waiters when they are retired.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add KERN_ERR and __FUNCTION__ to generic tag error messages, and add a comment
in blk_queue_end_tag() which explains the silent failure path.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
blk_queue_tag->real_max_depth was used to optimize out unnecessary
allocations/frees on tag resize. However, the whole thing was very broken -
tag_map was never allocated to real_max_depth resulting in access beyond the
end of the map, bits in [max_depth..real_max_depth] were set when initializing
a map and copied when resizing resulting in pre-occupied tags.
As the gain of the optimization is very small, well, almost nill, remove the
whole thing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
blk_queue_start_tag() hand-coded searching for the first zero bit in the tag
map. Replace it with find_first_zero_bit(). With this patch,
blk_queue_star_tag() doesn't need to fill remains of tag map with 1, thus
allowing it to work properly with the next remove_real_max_depth patch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to allocate the control structures for for ide devices on the node of
the device itself (for NUMA systems). The patch depends on the Slab API
change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
posted today.
Does some realignment too.
Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org>
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Pravin Shelar <pravin@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs: fix drivers/block so if an attribute doesn't implement
show or store method read/write will return -EIO
instead of 0 or -EINVAL.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The recent mapping changes didn't update the kerneldoc appropriately.
Original from Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
Original From: Mike Christie <michaelc@cs.wisc.edu>
Modified to split out block changes (this patch) and SCSI pieces.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the blk_rq_map_user() and blk_rq_map_kern() interface to require
a previously allocated request to be passed in. This is both more efficient
for multiple iterations of mapping data to the same request, and it is also
a much nicer API.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add blk_rq_map_kern which takes a kernel buffer and maps it into
a request and bio. This can be used by the dm hw_handlers, old
sg_scsi_ioctl, and one day scsi special requests so all requests
comming into scsi will have bios. All requests having bios
should allow scsi to use scatter lists for all IO and allow it
to use block layer functions.
Signed-off-by: Jens Axboe <axboe@suse.de>
__cfq_get_queue(). __cfq_get_queue() finds an existing queue (struct
cfq_queue) of the current process for the device and returns it. If it's not
found, __cfq_get_queue() creates and returns a new one if __cfq_get_queue() is
called with __GFP_WAIT flag, or __cfq_get_queue() returns NULL (this means that
get_request() fails) if no __GFP_WAIT flag.
On the other hand, in __make_request(), get_request() is called without
__GFP_WAIT flag at the first time. Thus, the get_request() fails when there is
no existing queue, typically when it's called for the first I/O request of the
process to the device.
Though it will be followed by get_request_wait() for general case,
__make_request() will just end the I/O with an error (EWOULDBLOCK) when the
request was for read-ahead.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
__elv_add_request(). rq.count[READ] + rq.count[WRITE] can increase
more than one if another thread has allocated a request after the
current request is allocated or in_flight could have changed resulting
in larger-than-one change of nrq, thus breaking the threshold
mechanism.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Patch removes our homegrown DMA masks and uses the ones defined in the kernel.
This patch replaces the broken one I sent in earlier. It has been tested and works. Please discard the first submission.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This smoothes two imperfections:
- Increase number of LUNs per device from 4 to 9. The best solution
would be to remove this limit altogether, but that has to wait until
the time when more than 26 hosts are allowed.
- Replace mdelay with msleep in a probing routine.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If you tried to open a packet device first in read-only mode and then a
second time in read-write mode, the second open succeeded even though the
device was not correctly set up for writing. If you then tried to write
data to the device, the writes would fail with I/O errors.
This patch prevents that problem by making the second open fail with
-EBUSY.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
blk_insert_request() has a unobivous feature of requeuing a
request setting REQ_SPECIAL|REQ_SOFTBARRIER. SCSI midlayer
was the only user and as previous patches removed the usage,
remove the feature from blk_insert_request(). Only special
requests should be queued with blk_insert_request(). All
requeueing should go through blk_requeue_request().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is the reworked version of the patch. It sets REQ_SOFTBARRIER
in two places - in elv_next_request() on BLKPREP_DEFER and in
blk_requeue_request().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I found a bug in the packet writing driver that could cause data
corruption. The problem arised if the driver got a write request for a
sector in a "zone" it was already working on. In that case it was supposed
to queue the write request until it was done processing earlier requests
for the same zone, and instead work on some other zone in the mean time.
However, if there was no other zone to work on, the driver would initiate
two packet_data objects for the same zone, causing unpredictable things to
happen.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioctl_by_bdev may only be used INSIDE the kernel. If the "arg" argument
refers to memory that is accessed by put_user/get_user in the ioctl
function, the memory needs to be in the kernel address space (that's the
set_fs(KERNEL_DS) doing in the ioctl_by_bdev). This works on i386 because
even with set_fs(KERNEL_DS) the user space memory is still accessible with
put_user/get_user. That is not true for s390. In short the ioctl
implementation of the pktcdvd device driver is horribly broken.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[Patch] Fix raw device ioctl pass-through
Raw character devices are supposed to pass ioctls through to the block
devices they are bound to. Unfortunately, they are using the wrong
function for this: ioctl_by_bdev(), instead of blkdev_ioctl().
ioctl_by_bdev() performs a set_fs(KERNEL_DS) before calling the ioctl,
redirecting the user-space buffer access to the kernel address space.
This is, needless to say, a bad thing.
This was noticed first on s390, where raw IO was non-functioning. The
s390 driver config does not actually allow raw IO to be enabled, which
was the first part of the problem. Secondly, the s390 kernel address
space is distinct from user, causing legal raw ioctls to fail. I've
reproduced this on a kernel built with 4G:4G split on x86, which fails
in the same way (-EFAULT if the address does not exist kernel-side;
returns success without actually populating the user buffer if it does.)
The patch below fixes both the config and address-space problems. It's
based closely on a patch by Jan Glauber <jang@de.ibm.com>, which has
been tested on s390 at IBM. I've tested it on x86 4G:4G (split address
space) and x86_64 (common address space).
Kernel-address-space access has been assigned CAN-2005-1264.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I somehow missed that there is external usage of rd_size on some
architectures.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes some needlessly global identifiers static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjanv@infradead.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only caller that ever sets it can call fsync_bdev itself easily. Also
update some comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for a new class of DAC960 controllers. It's based
on the GPLed idac320 driver from IBM for Linux 2.4.18. That driver is a
fork of the 2.4.18 version of DAC960 that adds support for this new type of
controllers (internally called "GEM Series"), that differ from other DAC960
V2 firmware controllers only in the register offsets and removes support
for all others.
This patch instead integrates support for these controllers into the DAC960
driver.
Thanks to Anders Norrbring for pointing me to the idac320 driver and
testing this patch.
No Signed-Off: line because all code is either copy & pasted from IBM's
idac320 driver or support for other controllers in the 2.6 DAC960 driver.
Note: the really odd formating matches the rest of the DAC960 driver.
Cc: Dave Olien <dmo@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Drivers that expect ISA DMA API are marked as such in Kconfig.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
allow multiple aoe devices to have the same mac
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -u b/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
This patches adds the "nbds_max" parameter to the nbd kernel module, which
limits the number of nbds allocated. Previously, always all 128 entries
were allocated unconditionally, which used to waste resources and
needlessly flood the hotplug system with events. (Defaults to 16 now.)
Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Profiling hit rates on merging shows that the last merge hint works
extremely well for most work loads. So lets kill the linear merge scan in
noop-iosched, so it provides O(1) run time for any operation.
Testing credits go to Ken Chen from Intel.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(!ARCH_S390 && !M68K && !IA64 && !UML) is obviously always true on ARM.
Intended behaviour for ARM is "absent unless we are on RiscPC or
EBSA285". So what we want is added && !ARM in the first term - without
it the last part (|| ARCH_RPC || ARCH_EBSA285, that is) doesn't do
anything.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I can't use list.h, since sk_buff doesn't have a list_head but instead
has two struct sk_buff pointers, and I want to avoid any extra memory
allocation.
send outgoing packets in order
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current problem seen is that the queue lock is actually in the
SCSI device structure, so when that structure is freed on device
release, we go boom if the queue tries to access the lock again.
The fix here is to move the lock from the scsi_device to the queue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Both the RiscPC and (optionally) EBSA285 have floppy disk support. Allow this
option to be selected on these ARM platforms again.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In function __generic_unplug_device(), kernel can use a cheaper function
elv_queue_empty() instead of more expensive elv_next_request to find
whether the queue is empty or not. blk_run_queue can also made conditional
on whether queue's emptiness before calling request_fn().
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a possibility that a bio will be accessed after it has been freed
on SCSI. It happens if you submit a bio with BIO_SYNC marked and the
auto-unplugging kicks the request_fn, SCSI re-enables interrupts in-between
so if the request completes between the add_request() in __make_request()
and the bio_sync() call, we could be looking at a dead bio. It's a slim
race, but it has been triggered in the Real World.
So assign bio_sync() to a local variable instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!