* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fixes sparse warning:
zram_drv.c:666:6: warning: symbol 'zram_slot_free_notify' was not
declared. Should it be static?
Also, max_zpage_size is now size_t just to be consistent with data-type
of other variables maintaining sizes of various kinds.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the allocation of zram->table fails, we set zram->disksize to zero
to prevent accessing the unallocated table entries during cleanup.
However, we currently don't take this precaution when the initialization
fails earlier.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently init_lock only prevents concurrent execution of zram_init_device()
and zram_reset_device() but not zram_make_request() nor sysfs store functions.
This patch changes init_lock into a rw_semaphore. A write lock is taken by
init, reset and store functions, a read lock is taken by zram_make_request().
Also, avoids to release the lock before calling __zram_reset_device() for
cleaning after a failed init, thus preventing any concurrent task to see an
inconsistent state of zram.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "num_devices" is too general to be
global. This patch switches the name to be "zram_num_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "devices" is too general to be global.
This patch switches the name to be "zram_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the unmapping order of KM_USER0/1 in
handle_uncompressed_page() and zram_read() so that kmap()/kunmap() calls
are correctly nested.
Reported-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, nothing protects zram table from concurrent access.
For instance, ZRAM_UNCOMPRESSED bit can be cleared by zram_free_page()
called from a concurrent write between the time ZRAM_UNCOMPRESSED has
been set and the time it is tested to unmap KM_USER0 in
zram_bvec_write(). This ultimately leads to kernel panic.
Also, a read request can occurs when the page has been freed by a
running write request and before it has been updated, leading to
zero filled block being incorrectly read and "Read before write"
error message.
This patch replace the current mutex by a rw_semaphore. It extends
the protection to zram table (currently, only compression buffers are
protected) and read requests (currently, only write requests are
protected).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 7b19b8d45b (zram: Prevent overflow
in logical block size) introduced ZRAM_LOGICAL_BLOCK_SIZE constant to
prevent overflow of logical block size on 64k page kernel.
However, the current implementation of zram only allow operation on block
of the same size as a page. That makes theorically legit 4k requests fail
on 64k page kernel.
This patch makes zram allow operation on partial pages. Basically, it
means we still do operations on full pages internally, but only copy the
relevent segments from/to the user memory.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactor the code of zram_read/write() functions. It does
not removes a lot of duplicate code alone, but is mostly a helper for
the third patch of this series (Staging: zram: allow partial page
operations).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The offset of uncompressed page is always zero: handle_uncompressed_page()
doesn't have to care about it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Both zram and zcache use xvmalloc allocator. If xvmalloc
is compiled separately for both of them, we will get linker
error if they are both selected as "built-in". We can also
get linker error regarding missing xvmalloc symbols if zram
is not built.
So, we now compile xvmalloc separately and export its symbols
which are then used by both of zram and zcache.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently the device is initialized when first write is done to the
device. Any read attempt before the first write would fail, including
"hidden" read the user may not know about (as for example if he tries
to write a partial block).
This patch initializes the device on first request, whether read or
write.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is to resolve a merge conflict with:
drivers/staging/zram/zram_drv.c
as pointed out by Stephen Rothwell
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In zram_read() and zram_write() we were not incrementing the
index number and thus were reading/writing values from/to
incorrect sectors on zram disk, resulting in data corruption.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch eliminates duplicate code. The remove_block_head function
is a special case of remove_block which can be contained in remove_block
without confusion.
The portion of code in remove_block_head which was noted as "DEBUG ONLY"
is now mandatory. Doing this provides consistent management of the double
linked list of blocks under a freelist and makes this consolidation
of delete block code safe. The first and last blocks will have NULL
pointers in their previous and next page pointers respectively.
Additionally, any time a block is removed from a free list the next and
previous pointers will be set to NULL to avoid misuse outside xvmalloc.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently zram will do nothing to the page in the bvec when that page
has not been previously written. This allows random data to leak to
user space. That can be seen by doing the following:
## Load the module and create a 256Mb zram device called /dev/zram0
# modprobe zram
# echo $((256*1024*1024)) > /sys/class/block/zram0/disksize
## Initialize the device by writing zero to the first block
# dd if=/dev/zero of=/dev/zram0 bs=512 count=1
## Read ~256Mb of memory into a file and hope for something interesting
# dd if=/dev/zram0 of=file
This patch will treat an unwritten page as a zero-filled page. If a
page is read before a write has occurred the data returned is all 0's.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By swapping the total_pages statistic with the lock we close a
hole in the structure for 64-bit CPUs.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a debug config flag to enable debug printk output and future
debug code.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change is in a conditional block which is entered only when there is
an existing data block on the freelist where the insert has taken place.
The new block is pushed onto the freelist stack and this conditional block
is updating links in the prior stack head to point to the new stack head.
After this conditional block the first-/second-level indices are updated
to indicate that there is a free block at this location.
This patch adds an immediate return from the conditional block to avoid
setting bits again to indicate a free block on this freelist. The bits
would already be set because there was an existing free block on this
freelist.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a 64K page kernel, the value PAGE_SIZE passed to
blk_queue_logical_block_size would overflow the logical block size
argument (resulting in setting it to 0).
This patch sets the logical block size to 4096, using a new
ZRAM_LOGICAL_BLOCK_SIZE constant.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
xvmalloc will not currently function with 64K pages. Newly allocated
pages will be inserted at an offset beyond the end of the first-level
index. This tuning is needed to properly size the allocator for 64K
pages.
The default 3 byte shift results in a second level list size which can not
be indexed using the 64 bits of the flbitmap in the xv_pool structure.
The shift must increase to 4 bytes between second level list entries to
fit the size of the first level bitmap.
Here are a few statistics for structure sizes on 32- and 64-bit CPUs
with 4KB and 64KB page sizes.
bits_per_long 32 64 64
page_size 4,096 4,096 65,535
xv_align 4 8 8
fl_delta 3 3 4
num_free_lists 508 508 4,094
xv_pool size 4,144b 8,216b 66,040b
per object overhead 32 64 64
zram struct 0.5GB disk 512KB 1024KB 64KB
This patch maintains the current tunings for 4K pages, adds an optimal
sizing for 64K pages and adds a safe tuning for any other page sizes.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
zram_read() and zram_write() always return zero, so make them return
void to simplify the code.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Not exactly sure if this is a typo or not, due to my search
results comming up with not that many hits. Either its dereferenceable
or dereferencable from the two I choose the later. if it's wrong let me know
and I'll resend.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make zram_read() return a bio error if the device is not initialized
instead of pretending nothing happened.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently disksize_store() round down the disk size provided by user.
This is probably not what one would expect, so round up instead.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can not configure zram device without sysfs anyway, so make zram
depends on it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 7e24cce38a because it
was never appropriate for mainline.
Do not check for init flag before starting I/O - zram module is unusable
without this fix.
The oops mentioned in the reverted commit message was actually a problem
only with the zram version as present in project's own repository where
we allocate struct zram_stats_cpu upon device initialization. OTOH, In
mainline/staging version of zram, we allocate struct stats upfront, so
this oops cannot happen in mainline version.
Checking for init_done flag in zram_make_request() results in a *no-op*
for any I/O operation since we simply always return success. This flag
is actually set when the first write occurs on a zram disk which
triggers its initialization.
Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=25722
Reported-by: Dennis Jansen <dennis.jansen@web.de>
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the warning generated by sparse: "Using plain integer
as NULL pointer" by replacing the offending 0s with NULL.
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This was done to handle a number of conflicts in the batman-adv
and winbond drivers properly. It also now allows us to fix up the sysfs
attributes properly that were not in the .37 release due to them being
only in this tree at the time.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They should be writable by root, not readable.
Doh, stupid me with the wrong flags.
Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They should not be writable by any user
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This merges the staging-next tree to Linus's tree and resolves
some conflicts that were present due to changes in other trees that were
affected by files here.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I'm getting an oops when running mkfs on zram:
NIP [d0000000030e0340] .zram_inc_stat+0x58/0x84 [zram]
[c00000006d58f720] [d0000000030e091c] .zram_make_request+0xa8/0x6a0 [zram]
[c00000006d58f840] [c00000000035795c] .generic_make_request+0x390/0x434
[c00000006d58f950] [c000000000357b14] .submit_bio+0x114/0x140
[c00000006d58fa20] [c000000000361778] .blkdev_issue_discard+0x1ac/0x250
[c00000006d58fb10] [c000000000361f68] .blkdev_ioctl+0x358/0x7fc
[c00000006d58fbd0] [c0000000001c1c1c] .block_ioctl+0x6c/0x90
[c00000006d58fc70] [c0000000001984c4] .do_vfs_ioctl+0x660/0x6d4
[c00000006d58fd70] [c0000000001985a0] .SyS_ioctl+0x68/0xb0
Since disksize no longer starts as 0 it looks like we can call
zram_make_request before the device has been initialised. The patch below
fixes the immediate problem but this would go away if we move the
initialisation function elsewhere (as suggested in another thread).
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, the user has to explicitly write a positive value to
initstate sysfs node before the device can be used. This event
triggers allocation of per-device metadata like memory pool,
table array and so on.
We do not pre-initialize all zram devices since the 'table' array,
mapping disk blocks to compressed chunks, takes considerable amount
of memory (8 bytes per page). So, pre-initializing all devices will
be quite wasteful if only few or none of the devices are actually
used.
This explicit device initialization from user is an odd requirement and
can be easily avoided. We now initialize the device when first write is
done to the device.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Creates per-device sysfs nodes in /sys/block/zram<id>/
Currently following stats are exported:
- disksize
- num_reads
- num_writes
- invalid_io
- zero_pages
- orig_data_size
- compr_data_size
- mem_used_total
By default, disksize is set to 0. So, to start using
a zram device, fist write a disksize value and then
initialize device by writing any positive value to
initstate. For example:
# initialize /dev/zram0 with 50MB disksize
echo 50*1024*1024 | bc > /sys/block/zram0/disksize
echo 1 > /sys/block/zram0/initstate
When done using a disk, issue reset to free its memory
by writing any positive value to reset node:
echo 1 > /sys/block/zram0/reset
This change also obviates the need for 'rzscontrol' utility.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix 49 zram build errors in one swoop. Examples:
drivers/staging/zram/zram_drv.c:225: error: dereferencing pointer to incomplete type
drivers/staging/zram/zram_drv.c:226: error: implicit declaration of function 'bio_for_each_segment'
drivers/staging/zram/zram_drv.c:226: error: expected ';' before '{' token
drivers/staging/zram/zram_drv.c:281: error: implicit declaration of function 'bio_endio'
drivers/staging/zram/zram_drv.c:285: error: implicit declaration of function 'bio_io_error'
drivers/staging/zram/zram_drv.c:545: error: implicit declaration of function 'set_capacity'
drivers/staging/zram/zram_drv.c:548: error: implicit declaration of function 'queue_flag_set_unlocked'
drivers/staging/zram/zram_drv.c:548: error: 'QUEUE_FLAG_NONROT' undeclared (first use in this function)
drivers/staging/zram/zram_drv.c:548: error: dereferencing pointer to incomplete type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Related changes:
- Included example to show usage as generic
(non-swap) disk with ext4 filesystem.
- Renamed rzscontrol to zramconfig to match
with new device naming.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Related changes:
- Modify revelant Kconfig and Makefile accordingly.
- Change include filenames in code.
- Remove dependency on CONFIG_SWAP in Kconfig as zram usage
is no longer limited to swap disks.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>