Граф коммитов

732 Коммитов

Автор SHA1 Сообщение Дата
Linus Torvalds a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Jan Kara 397162ffa2 mm: remove nr_pages argument from pagevec_lookup{,_range}()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.

Just drop the argument.

Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Jan Kara d72dc8a25a mm: make pagevec_lookup() update index
Make pagevec_lookup() (and underlying find_get_pages()) update index to
the next page where iteration should continue.  Most callers want this
and also pagevec_lookup_tag() already does this.

Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
Christoph Hellwig 74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
David Howells bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Linus Torvalds 9bd42183b9 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Add the SYSTEM_SCHEDULING bootup state to move various scheduler
     debug checks earlier into the bootup. This turns silent and
     sporadically deadly bugs into nice, deterministic splats. Fix some
     of the splats that triggered. (Thomas Gleixner)

   - A round of restructuring and refactoring of the load-balancing and
     topology code (Peter Zijlstra)

   - Another round of consolidating ~20 of incremental scheduler code
     history: this time in terms of wait-queue nomenclature. (I didn't
     get much feedback on these renaming patches, and we can still
     easily change any names I might have misplaced, so if anyone hates
     a new name, please holler and I'll fix it.) (Ingo Molnar)

   - sched/numa improvements, fixes and updates (Rik van Riel)

   - Another round of x86/tsc scheduler clock code improvements, in hope
     of making it more robust (Peter Zijlstra)

   - Improve NOHZ behavior (Frederic Weisbecker)

   - Deadline scheduler improvements and fixes (Luca Abeni, Daniel
     Bristot de Oliveira)

   - Simplify and optimize the topology setup code (Lauro Ramos
     Venancio)

   - Debloat and decouple scheduler code some more (Nicolas Pitre)

   - Simplify code by making better use of llist primitives (Byungchul
     Park)

   - ... plus other fixes and improvements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  sched/cputime: Refactor the cputime_adjust() code
  sched/debug: Expose the number of RT/DL tasks that can migrate
  sched/numa: Hide numa_wake_affine() from UP build
  sched/fair: Remove effective_load()
  sched/numa: Implement NUMA node level wake_affine()
  sched/fair: Simplify wake_affine() for the single socket case
  sched/numa: Override part of migrate_degrades_locality() when idle balancing
  sched/rt: Move RT related code from sched/core.c to sched/rt.c
  sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
  sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
  sched/fair: Spare idle load balancing on nohz_full CPUs
  nohz: Move idle balancer registration to the idle path
  sched/loadavg: Generalize "_idle" naming to "_nohz"
  sched/core: Drop the unused try_get_task_struct() helper function
  sched/fair: WARN() and refuse to set buddy when !se->on_rq
  sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
  sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
  sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
  sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
  sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
  ...
2017-07-03 13:08:04 -07:00
Ingo Molnar 2055da9738 sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
So I've noticed a number of instances where it was not obvious from the
code whether ->task_list was for a wait-queue head or a wait-queue entry.

Furthermore, there's a number of wait-queue users where the lists are
not for 'tasks' but other entities (poll tables, etc.), in which case
the 'task_list' name is actively confusing.

To clear this all up, name the wait-queue head and entry list structure
fields unambiguously:

	struct wait_queue_head::task_list	=> ::head
	struct wait_queue_entry::task_list	=> ::entry

For example, this code:

	rqw->wait.task_list.next != &wait->task_list

... is was pretty unclear (to me) what it's doing, while now it's written this way:

	rqw->wait.head.next != &wait->entry

... which makes it pretty clear that we are iterating a list until we see the head.

Other examples are:

	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
	list_for_each_entry(wq, &fence->wait.task_list, task_list) {

... where it's unclear (to me) what we are iterating, and during review it's
hard to tell whether it's trying to walk a wait-queue entry (which would be
a bug), while now it's written as:

	list_for_each_entry_safe(pos, next, &x->head, entry) {
	list_for_each_entry(wq, &fence->wait.head, entry) {

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:19:14 +02:00
Ingo Molnar ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Christoph Hellwig 4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Jan Kara c1844d536d fs: Remove SB_I_DYNBDI flag
Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Jan Kara 0546c537b1 nilfs2: Convert to properly refcounting bdi
Similarly to set_bdev_super() NILFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.

CC: linux-nilfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Ingo Molnar 174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Geliang Tang f3048d17d1 nilfs2: use i_blocksize()
Since i_blocksize() helper has been defined in fs.h, use it instead of
open-coding.

Link: http://lkml.kernel.org/r/1485184655-3895-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Geliang Tang 1508ddc39a nilfs2: use nilfs_btree_node_size()
Use nilfs_btree_node_size() instead of open-coding.

Link: http://lkml.kernel.org/r/1485184655-3895-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Fabian Frederick 93407472a2 fs: add i_blocksize()
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Jan Kara dc3b17cc8b block: Use pointer to backing_dev_info from request_queue
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:20:48 -07:00
Linus Torvalds 231753ef78 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull partial readlink cleanups from Miklos Szeredi.

This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.

Miklos and Al are still discussing the rest of the series.

* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  vfs: make generic_readlink() static
  vfs: remove ".readlink = generic_readlink" assignments
  vfs: default to generic_readlink()
  vfs: replace calling i_op->readlink with vfs_readlink()
  proc/self: use generic_readlink
  ecryptfs: use vfs_get_link()
  bad_inode: add missing i_op initializers
2016-12-17 19:16:12 -08:00
Miklos Szeredi dfeef68862 vfs: remove ".readlink = generic_readlink" assignments
If .readlink == NULL implies generic_readlink().

Generated by:

to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09 16:45:04 +01:00
Christoph Hellwig 70fd76140a block,fs: use REQ_* flags directly
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Linus Torvalds 101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Al Viro 3873691e5a Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
Deepa Dinamani 078cd8279e fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:21 -04:00
Miklos Szeredi 2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Miklos Szeredi f03b8ad8d3 fs: support RENAME_NOREPLACE for local filesystems
This is trivial to do:

 - add flags argument to foo_rename()
 - check if flags doesn't have any other than RENAME_NOREPLACE
 - assign foo_rename() to .rename2 instead of .rename

Filesystems converted:

affs, bfs, exofs, ext2, hfs, hfsplus, jffs2, jfs, logfs, minix, msdos,
nilfs2, omfs, reiserfs, sysvfs, ubifs, udf, ufs, vfat.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
2016-09-27 11:03:57 +02:00
Jan Kara 31051c85b5 fs: Give dentry to inode_change_ok() instead of inode
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22 10:56:19 +02:00
Ryusuke Konishi e63e88bc53 nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.

This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h".  The following minor
changes are accompanied by this migration:

 - nilfs_direct_node struct in nilfs2/direct.h is converged to
   nilfs2_ondisk.h because it's an on-disk structure.
 - inline functions nilfs_rec_len_from_disk() and
   nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.

Link: http://lkml.kernel.org/r/1465825507-3407-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:21 -04:00
Ryusuke Konishi 4ce5c3426c nilfs2: use BIT() macro
Replace bit shifts by BIT macro for clarity.

Link: http://lkml.kernel.org/r/1465825507-3407-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:21 -04:00
Ryusuke Konishi ad980c9ab7 nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables.  This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.

Link: http://lkml.kernel.org/r/1465825507-3407-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:20 -04:00
Ryusuke Konishi a7d3f104da nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.

Link: http://lkml.kernel.org/r/1464875891-5443-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:20 -04:00
Ryusuke Konishi aceb4170bb nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.

This removes the following checkpatch.pl warning:

 "WARNING: Using yield() is generally wrong. See yield() kernel-doc
  (sched/core.c)"

Link: http://lkml.kernel.org/r/1464875891-5443-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:19 -04:00
Ryusuke Konishi 39a9dcca61 nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not.  This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.

Link: http://lkml.kernel.org/r/1464875891-5443-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:19 -04:00
Ryusuke Konishi d6517deb01 nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function.  This also removes function names from the
messages unless we embed them explicitly in format strings.  Instead,
some messages are revised to clarify the context.

[arnd@arndb.de: avoid warning about unused variables]
  Link: http://lkml.kernel.org/r/20160615201945.3348205-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1464875891-5443-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:18 -04:00
Ryusuke Konishi feee880fa5 nilfs2: reduce bare use of printk() with nilfs_msg()
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:

  "WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
   then dev_crit(dev, ... then pr_crit(...  to printk(KERN_CRIT ..."

This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.

Link: http://lkml.kernel.org/r/1464875891-5443-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:17 -04:00
Ryusuke Konishi 6625689e15 nilfs2: embed a back pointer to super block instance in nilfs object
Insert a back pointer to super block instance in nilfs object so that
functions of nilfs2 easily refer to the super block instance.  This
simplifies replacement of printk() in the successive change.

Link: http://lkml.kernel.org/r/1464875891-5443-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:17 -04:00
Ryusuke Konishi a66dfb0a91 nilfs2: add nilfs_msg() message interface
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.

__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline.  The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument.  nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.

Link: http://lkml.kernel.org/r/1464875891-5443-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:16 -04:00
Ryusuke Konishi cae3d4ca6f nilfs2: hide function name argument from nilfs_error()
Simplify nilfs_error(), an output function used to report critical
issues in file system.  This renames the original nilfs_error() function
to __nilfs_error() and redefines it as a macro to hide its function name
argument within the macro.

Every call site of nilfs_error() is changed to strip __func__ argument
except nilfs_bmap_convert_error(); nilfs_bmap_convert_error() directly
calls __nilfs_error() because it inherits caller's function name.

Link: http://lkml.kernel.org/r/1464875891-5443-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:16 -04:00
Linus Torvalds d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Torsten Hilbrich 63d2f95d63 fs/nilfs2: fix potential underflow in call to crc32_le
The value `bytes' comes from the filesystem which is about to be
mounted.  We cannot trust that the value is always in the range we
expect it to be.

Check its value before using it to calculate the length for the crc32_le
call.  It value must be larger (or equal) sumoff + 4.

This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1.  This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.

  BUG: unable to handle kernel paging request at ffff88021e600000
  IP:  crc32_le+0x36/0x100
  ...
  Call Trace:
    nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
    nilfs_load_super_block+0x142/0x300 [nilfs2]
    init_nilfs+0x60/0x390 [nilfs2]
    nilfs_mount+0x302/0x520 [nilfs2]
    mount_fs+0x38/0x160
    vfs_kern_mount+0x67/0x110
    do_mount+0x269/0xe00
    SyS_mount+0x9f/0x100
    entry_SYSCALL_64_fastpath+0x16/0x71

Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Mike Christie b2d4586627 nilfs: use bio op accessors
Separate the op from the rq_flag_bits and have nilfs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie 2a222ca992 fs: have submit_bh users pass in op and flags separately
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie 4e49ea4a3d block/fs/drivers: remove rw argument from submit_bio
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.

Signed-off-by: Mike Christie <mchristi@redhat.com>

Fixed up fs/ext4/crypto.c

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Ryusuke Konishi 076a378ba6 nilfs2: fix block comments
This fixes block comments with proper formatting to eliminate the
following checkpatch.pl warnings:

  "WARNING: Block comments use * on subsequent lines"
  "WARNING: Block comments use a trailing */ on a separate line"

Link: http://lkml.kernel.org/r/1462886671-3521-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 80d6505232 nilfs2: remove loops of single statement macros
This fixes checkpatch.pl warning "WARNING: Single statement macros
should not use a do {} while (0) loop".

Link: http://lkml.kernel.org/r/1462886671-3521-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 7f00184e9c nilfs2: remove unnecessary else after return or break
This fixes the checkpatch.pl warning that suggests else is not
generally useful after a break or return.

Link: http://lkml.kernel.org/r/1462886671-3521-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 0c6c44cb9f nilfs2: avoid bare use of 'unsigned'
This fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
bare use of 'unsigned'".

Link: http://lkml.kernel.org/r/1462886671-3521-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 7592ecde65 nilfs2: fix code indent coding style issue
This fixes checkpatch.pl warning "WARNING: suspect code indent for
conditional statements".

Link: http://lkml.kernel.org/r/1462886671-3521-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi c9cb9b5c85 nilfs2: remove space before semicolon
This fixes the checkpatch.pl warning "WARNING: space prohibited before
semicolon" at nilfs_store_magic_and_option().

Link: http://lkml.kernel.org/r/1462886671-3521-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 06f4abf6ca nilfs2: do not emit extra newline on nilfs_warning() and nilfs_error()
This updates call sites of nilfs_warning() and nilfs_error() so that they
don't add a duplicate newline.  These output functions are already
designed to add a trailing newline to the message.

Link: http://lkml.kernel.org/r/1462886671-3521-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi facb9ec5e6 nilfs2: clean trailing semicolons in macros
Remove trailing semicolons from macros, as suggested by checkpatch.pl.

Link: http://lkml.kernel.org/r/1461935747-10380-12-git-send-email-konishi.ryusuke@lab.ntt.co.jp
[konishi.ryusuke@lab.ntt.co.jp: fix style issues]
  Link: http://lkml.kernel.org/r/20160509.231703.1481729973362188932.konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 4ad364ca1c nilfs2: add missing line spacing
Clean up checkpatch.pl warnings "WARNING: Missing a blank line after
declarations" from nilfs2.

Link: http://lkml.kernel.org/r/1461935747-10380-11-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi e7a142aaa0 nilfs2: replace __attribute__((packed)) with __packed
This fixes the following checkpatch.pl warning:

  WARNING: __packed is preferred over __attribute__((packed))
  #23: FILE: export.h:23:
  +} __attribute__ ((packed));

Link: http://lkml.kernel.org/r/1461935747-10380-10-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 2d19961d83 nilfs2: move cleanup code of metadata file from inode routines
Refactor nilfs_clear_inode() and nilfs_i_callback() so that cleanup
code or resource deallocation related to metadata file will be moved
out to mdt.c.

Link: http://lkml.kernel.org/r/1461935747-10380-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 24e20ead2f nilfs2: get rid of nilfs_mdt_mark_block_dirty()
nilfs_mdt_mark_block_dirty() can be replaced with primary functions
like nilfs_mdt_get_block() and mark_buffer_dirty(), and it's used only
by nilfs_ioctl_mark_blocks_dirty().

This gets rid of the function to simplify the interface of metadata
file.

Link: http://lkml.kernel.org/r/1461935747-10380-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 4b420ab4ee nilfs2: clean up old e-mail addresses
E-mail addresses of osrg.net domain are no longer available.  This
removes them from authorship notices and prevents reporters from being
confused.

Link: http://lkml.kernel.org/r/1461935747-10380-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 5726d0b454 nilfs2: remove FSF mailing address from GPL notices
This removes the extra paragraph which mentions FSF address in GPL
notices from source code of nilfs2 and avoids the checkpatch.pl error
related to it.

Link: http://lkml.kernel.org/r/1461935747-10380-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi f19e78dee9 nilfs2: remove space before comma
Fix checkpatch.pl error "ERROR: space prohibited before that ','
(ctx:WxW)" at nilfs_sufile_set_suinfo().

This also fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
bare use of 'unsigned'" at nilfs_sufile_set_suinfo() and
nilfs_sufile_get_suinfo().

Link: http://lkml.kernel.org/r/1461935747-10380-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi 8fa7c32094 nilfs2: fix white space issue in nilfs_mount()
Fix the following checkpatch.pl error and warnings:

  ERROR: code indent should use tabs where possible
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

  WARNING: please, no space before tabs
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

  WARNING: please, no spaces at the start of a line
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

Link: http://lkml.kernel.org/r/1461935747-10380-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Julia Lawall 1c613cb986 nilfs2: constify nilfs_sc_operations structures
The nilfs_sc_operations structures are never modified, so declare them
as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Linus Torvalds c2e7b20705 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
 "More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: use RWF_SYNC
  fs: add RWF_DSYNC aand RWF_SYNC
  ceph: use generic_write_sync
  fs: simplify the generic_write_sync prototype
  fs: add IOCB_SYNC and IOCB_DSYNC
  direct-io: remove the offset argument to dio_complete
  direct-io: eliminate the offset argument to ->direct_IO
  xfs: eliminate the pos variable in xfs_file_dio_aio_write
  filemap: remove the pos argument to generic_file_direct_write
  filemap: remove pos variables in generic_file_read_iter
2016-05-17 15:05:23 -07:00
Al Viro c51da20c48 more trivial ->iterate_shared conversions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-09 11:41:14 -04:00
Al Viro be5b82dbfe make ext2_get_page() and friends work without external serialization
Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:25 -04:00
Al Viro 84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
Christoph Hellwig c8b8e32d70 direct-io: eliminate the offset argument to ->direct_IO
Including blkdev_direct_IO and dax_do_io.  It has to be ki_pos to actually
work, so eliminate the superflous argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Al Viro fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Joonsoo Kim fe896d1878 mm: introduce page reference manipulation functions
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Al Viro 5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Vladimir Davydov 5d097056c9 kmemcg: account certain kmem allocations to memcg
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Linus Torvalds 33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Dmitry Monakhov a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Al Viro fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Al Viro 6b2553918d replace ->follow_link() with new method that could stay in RCU mode
new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:54 -05:00
Al Viro 21fc61c73c don't put symlink bodies in pagecache into highmem
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:36 -05:00
Linus Torvalds 842cf0b952 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - misc stable fixes

 - trivial kernel-doc and comment fixups

 - remove never-used block_page_mkwrite() wrapper function, and rename
   the function that is _actually_ used to not have double underscores.

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: 9p: cache.h: Add #define of include guard
  vfs: remove stale comment in inode_operations
  vfs: remove unused wrapper block_page_mkwrite()
  binfmt_elf: Correct `arch_check_elf's description
  fs: fix writeback.c kernel-doc warnings
  fs: fix inode.c kernel-doc warning
  fs/pipe.c: return error code rather than 0 in pipe_write()
  fs/pipe.c: preserve alloc_file() error code
  binfmt_elf: Don't clobber passed executable's file header
  FS-Cache: Handle a write to the page immediately beyond the EOF marker
  cachefiles: perform test on s_blocksize when opening cache file.
  FS-Cache: Don't override netfs's primary_index if registering failed
  FS-Cache: Increase reference of parent after registering, netfs success
  debugfs: fix refcount imbalance in start_creating
2015-11-11 09:45:24 -08:00
Ross Zwisler 5c50002963 vfs: remove unused wrapper block_page_mkwrite()
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:

commit 24da4fab5a ("vfs: Create __block_page_mkwrite() helper passing
	error values back")

This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:33 -05:00
Yaowei Bai 3348a172be fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Ryusuke Konishi 4f05028f8d nilfs2: fix gcc uninitialized-variable warnings in powerpc build
Some false positive warnings are reported for powerpc build.

The following warnings are reported in
 http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/

   CC      fs/nilfs2/super.o
 fs/nilfs2/super.c: In function 'nilfs_resize_fs':
 fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
   CC      fs/nilfs2/recovery.o
 fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
 fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
 fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
 fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]

Another similar warning is reported in
 http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/

   CC      fs/nilfs2/btree.o
 fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
 include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here

This cleans out these warnings by forcing the variables to be initialized.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi 09ef29e0f6 nilfs2: fix gcc unused-but-set-variable warnings
Fix the following build warnings:

 $ make W=1
 [...]
   CC [M]  fs/nilfs2/btree.o
 fs/nilfs2/btree.c: In function 'nilfs_btree_split':
 fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
   __u64 newptr;
         ^
 fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
   __u64 newkey;
         ^
   CC [M]  fs/nilfs2/dat.o
 fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
 fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
   __u64 start;
         ^
   CC [M]  fs/nilfs2/segment.o
 fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
 fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
   int err;
       ^
   CC [M]  fs/nilfs2/sufile.o
 fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
 fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
   unsigned long nsegments, ncleansegs, nsus, cnt;
                            ^
   CC [M]  fs/nilfs2/alloc.o
 fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
 fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
   unsigned long n, entries_per_group, groups_per_desc_block;
                                       ^

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake a9cd207c23 nilfs2: add tracepoints for analyzing reading and writing metadata files
This patch adds tracepoints for analyzing requests of reading and writing
metadata files.  The tracepoints cover every in-place mdt files (cpfile,
sufile, and datfile).

Example of tracing mdt_insert_new_block():
              cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
              cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
              cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake 83eec5e6dd nilfs2: add tracepoints for analyzing sufile manipulation
This patch adds tracepoints which would be useful for analyzing segment
usage from a perspective of high level sufile manipulation (check, alloc,
free).  sufile is an important in-place updated metadata file, so
analyzing the behavior would be useful for performance turning.

example of usage (a case of allocation):

$ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
        segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
        segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benixon Dhas <benixon.dhas@wdc.com>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake 44fda11460 nilfs2: add a tracepoint for transaction events
This patch adds a tracepoint for transaction events of nilfs.  With the
tracepoint, these events can be tracked: begin, abort, commit, trylock,
lock, and unlock.  Basically, these events have corresponding functions
e.g.  begin event corresponds nilfs_transaction_begin().  The unlock event
is an exception.  It corresponds to the iteration in
nilfs_transaction_lock().

Only one tracepoint is introcued: nilfs2_transaction_transition.  The
above events are distinguished with newly introduced enum.  With this
tracepoint, we can analyse a critical section of segment constructoin.

Sample output by tpoint of perf-tools:
              cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
              cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
        segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
        segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
        segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
        segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
        segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
        segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK

This patch also does trivial cleaning of comma usage in collection stage
transition event for consistent coding style.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake 5849770383 nilfs2: add a tracepoint for tracking stage transition of segment construction
This patch adds a tracepoint for tracking stage transition of block
collection in segment construction.  With the tracepoint, we can analysis
the behavior of segment construction in depth.  It would be useful for
bottleneck detection and debugging, etc.

The tracepoint is created with the standard trace API of linux (like ext3,
ext4, f2fs and btrfs).  So we can analysis with existing tools easily.  Of
course, more detailed analysis will be possible if we can create nilfs
specific analysis tools.

Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools).  Time consumption between
each stage can be obtained.

$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
        segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
        segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
        segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
        segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
        segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
        segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
        segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage.  With this change, every transition of the
stage can produce trace event in a correct manner.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi d0c14a9ee7 nilfs2: free unused dat file blocks during garbage collection
As a nilfs2 volume ages, the amount of available disk space decreases
little by little due to bloat of DAT (disk address translation) metadata
file.  Even if we delete all files in a file system and free their block
addresses from the DAT file through a garbage collection, empty DAT blocks
are not freed.

This fixes the issue by extending the deallocator of block addresses so
that empty data blocks and empty bitmap blocks of DAT are deleted.

The following comparison shows the effect of this patch.  Each shows disk
amount information of a nilfs2 volume that we cleaned out by deleting all
files and running gc after having filled 90% of its capacity.

Before:
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1      500105212  3022844 472072192   1% /test

After:
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1      500105212    16380 475078656   1% /test

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi da019954dd nilfs2: add helper functions to delete blocks from dat file
This adds delete functions for data blocks of metadata files using bitmap
based allocator.  nilfs_palloc_delete_entry_block() deletes an entry block
(e.g.  block storing dat entries), and nilfs_palloc_delete_bitmap_block()
deletes a bitmap block, respectively.

These helpers are intended to be used in the successive change on
deallocator of block addresses ("nilfs2: free unused dat file blocks
during garbage collection").

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi b22580948c nilfs2: get rid of nilfs_palloc_group_is_in()
This unfolds nilfs_palloc_group_is_in() helper function into
nilfs_palloc_freev() function to simplify a range check and an index
calculation repeatedy performed in a loop of the function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi 18c41b37f0 nilfs2: refactor nilfs_palloc_find_available_slot()
The current implementation of nilfs_palloc_find_available_slot() function
is overkill.  The underlying bit search routine is well optimized, so this
uses it more simply in nilfs_palloc_find_available_slot().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi 4e9e63a671 nilfs2: do not call nilfs_mdt_bgl_lock() needlessly
In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper
is frequently used to get a spinlock protecting a target block group.
This reduces its usage and simplifies arguments of some related functions
by directly passing a pointer to the spinlock.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi b7bed712d0 nilfs2: use nilfs_warning() in allocator implementation
This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the
bitmap based allocator implementation of nilfs2.  The warning messages are
modified to include the device name and the inode number in each message.
This makes it clear which metadata file of which device has output
warnings such as "entry number xxxx already freed".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Julia Lawall da80a39fc9 nilfs2: drop null test before destroy functions
Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Michal Hocko c62d25556b mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman 71baba4b92 mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep.  Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep.  The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake.  As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags.  This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kent Overstreet b54ffb73ca block: remove bio_get_nr_vecs()
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13 12:32:04 -06:00
Christoph Hellwig 4246a0b63b block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29 08:55:15 -06:00
Mikulas Patocka 9abea2d64c ioctl_compat: handle FITRIM
The FITRIM ioctl has the same arguments on 32-bit and 64-bit
architectures, so we can add it to the list of compatible ioctls and
drop it from compat_ioctl method of various filesystems.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-09 11:42:21 -07:00
Linus Torvalds 1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds 47a469421d Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - most of the rest of MM

 - lots of misc things

 - procfs updates

 - printk feature work

 - updates to get_maintainer, MAINTAINERS, checkpatch

 - lib/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  exit,stats: /* obey this comment */
  coredump: add __printf attribute to cn_*printf functions
  coredump: use from_kuid/kgid when formatting corename
  fs/reiserfs: remove unneeded cast
  NILFS2: support NFSv2 export
  fs/befs/btree.c: remove unneeded initializations
  fs/minix: remove unneeded cast
  init/do_mounts.c: add create_dev() failure log
  kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
  fs/efs: femove unneeded cast
  checkpatch: emit "NOTE: <types>" message only once after multiple files
  checkpatch: emit an error when there's a diff in a changelog
  checkpatch: validate MODULE_LICENSE content
  checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
  checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
  checkpatch: fix processing of MEMSET issues
  checkpatch: suggest using ether_addr_equal*()
  checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
  checkpatch: remove local from codespell path
  checkpatch: add --showfile to allow input via pipe to show filenames
  ...
2015-06-26 09:52:05 -07:00
NeilBrown f73c2f1f83 NILFS2: support NFSv2 export
The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as that
returned by encode_fh - it may be larger.

With NFSv2, the filehandle is fixed length, so it may appear longer than
expected and be zero-padded.

So we must test that fh_len is at least some value, not exactly equal to
it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Linus Torvalds bfffa1cc9d Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-block
Pull core block IO update from Jens Axboe:
 "Nothing really major in here, mostly a collection of smaller
  optimizations and cleanups, mixed with various fixes.  In more detail,
  this contains:

   - Addition of policy specific data to blkcg for block cgroups.  From
     Arianna Avanzini.

   - Various cleanups around command types from Christoph.

   - Cleanup of the suspend block I/O path from Christoph.

   - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

   - Eliminating atomic inc/dec of both remaining IO count and reference
     count in a bio.  From me.

   - Fixes for SG gap and chunk size support for data-less (discards)
     IO, so we can merge these better.  From me.

   - Small restructuring of blk-mq shared tag support, freeing drivers
     from iterating hardware queues.  From Keith Busch.

   - A few cfq-iosched tweaks, from Tahsin Erdogan and me.  Makes the
     IOPS mode the default for non-rotational storage"

* 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
  cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
  cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
  cfq-iosched: move group scheduling functions under ifdef
  cfq-iosched: fix the setting of IOPS mode on SSDs
  blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
  block, cgroup: implement policy-specific per-blkcg data
  block: Make CFQ default to IOPS mode on SSDs
  block: add blk_set_queue_dying() to blkdev.h
  blk-mq: Shared tag enhancements
  block: don't honor chunk sizes for data-less IO
  block: only honor SG gap prevention for merges that contain data
  block: fix returnvar.cocci warnings
  block, dm: don't copy bios for request clones
  block: remove management of bi_remaining when restoring original bi_end_io
  block: replace trylock with mutex_lock in blkdev_reread_part()
  block: export blkdev_reread_part() and __blkdev_reread_part()
  suspend: simplify block I/O handling
  block: collapse bio bit space
  block: remove unused BIO_RW_BLOCK and BIO_EOF flags
  block: remove BIO_EOPNOTSUPP
  ...
2015-06-25 14:29:53 -07:00
Fabian Frederick b57c2cb9ea pagemap.h: move dir_pages() over there
That function was declared in a lot of filesystems to calculate
directory pages.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:00 -04:00