Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
Now that the type changes are done, here is the final set of
changes to make the quota code work when user namespaces are enabled.
Small cleanups and fixes to make the code build when user namespaces
are enabled.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Convert w_dq_id to be a struct kquid and remove the now unncessary
w_dq_type.
This is a simple conversion and enough other places have already
been converted that this actually reduces the code complexity
by a little bit, when removing now unnecessary type conversions.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change struct dquot dq_id to a struct kqid and remove the now
unecessary dq_type.
Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
ext4, and ocfs2 to deal with the change in quota structures and
signatures. The ocfs2 changes are larger than most because of the
extensive tracing throughout the ocfs2 quota code that prints out
dq_id.
quota_tree.c:get_index is modified to take a struct kqid instead of a
qid_t because all of it's callers pass in dquot->dq_id and it allows
me to introduce only a single conversion.
The rest of the changes are either just replacing dq_type with dq_id.type,
adding conversions to deal with the change in type and occassionally
adding qid_eq to allow quota id comparisons in a user namespace safe way.
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Modify dqget to take struct kqid instead of a type and an identifier
pair.
Modify the callers of dqget in ocfs2 and dquot to take generate
a struct kqid so they can continue to call dqget. The conversion
to create struct kqid should all be the final conversions that
are needed in those code paths.
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Modify quota_send_warning to take struct kqid instead a type and
identifier pair.
When sending netlink broadcasts always convert uids and quota
identifiers into the intial user namespace. There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.
Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota send warning. When
all of the user namespaces convesions are complete a struct kqid
values will be availbe without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Update the quotactl user space interface to successfull compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.
The quota on function is not converted because while it takes a quota
type and an id. The id is the on disk quota format to use, which
is something completely different.
The signature of two struct quotactl_ops methods were changed to take
struct kqid argumetns get_dqblk and set_dqblk.
The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.
This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace. Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
sb->s_dqopt->dqptr_sem is used to serialize ops using pointers from inode to
dquots. But for __dquot_alloc_space(), it could be safely moved down after the
default warn[] array got initialized.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
"Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs. I'm
on vacation and scarcely checking email since we are expecting baby
any day now but these fixes should be safe to go in and I don't want
to delay them unnecessarily."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: avoid info leak on export
isofs: avoid info leak on export
udf: Improve table length check to avoid possible overflow
ext3: Check return value of blkdev_issue_flush()
jbd: Check return value of blkdev_issue_flush()
udf: Do not decrement i_blocks when freeing indirect extent block
udf: Fix memory leak when mounting
ext2: cleanup the confused goto label
UDF: Remove unnecessary variable "offset" from udf_fill_inode
udf: stop using s_dirt
ext3: force ro mount if ext3_setup_super() fails
quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
checkpatch.pl warns:
"WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>"
Below patch fixes it.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
So far i_mutex was ranking above dqonoff_mutex and i_mutex on quota files
was special and ranking below dqonoff_mutex (and several other locks).
However there's no real need for i_mutex on quota files to be special.
IO on quota files is serialized by dqio_mutex anyway so we don't need to
take i_mutex when writing to quota files. Other places where we take i_mutex
on quota file can accomodate standard i_mutex lock ranking, we only need
to change the lock ranking to be dqonoff_mutex > i_mutex which is a matter
of changing documentation because there's no place which would enforce
ordering in the other direction.
Signed-off-by: Jan Kara <jack@suse.cz>
When CONFIG_QUOTA_DEBUG is enabled we call inode_get_rsv_space() from
add_dquot_ref() while holding i_lock. But inode_get_rsv_space() is trying
to get i_lock as well resulting in double lock.
Fix the problem by moving inode_get_rsv_space() call out of i_lock.
Reported-and-analyzed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull ext3, UDF, and quota fixes from Jan Kara:
"A couple of ext3 & UDF fixes and also one improvement in quota
locking."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext3: fix start and len arguments handling in ext3_trim_fs()
udf: Fix deadlock in udf_release_file()
udf: Fix file entry logicalBlocksRecorded
udf: Fix handling of i_blocks
quota: Make quota code not call tty layer with dqptr_sem held
udf: Init/maintain file entry checkpoint field
ext3: Update ctime in ext3_splice_branch() only when needed
ext3: Don't call dquot_free_block() if we don't update anything
udf: Remove unnecessary OOM messages
dqptr_sem can be called from slab reclaim. tty layer uses GFP_KERNEL mask for
allocation so it can end up calling slab reclaim. Given quota code can call
into tty layer to print warning this creates possibility for lock inversion
between tty->atomic_write_lock and dqptr_sem.
Using direct printing of warnings from quota layer is obsolete but since it's
easy enough to change quota code to not hold any locks when printing warnings,
let's just do it. It seems like a good thing to do even when we use netlink
layer to transmit warnings to userspace.
Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Quota tools need to know whether quota is stored in a system file or in
classical aquota.{user|group} files. So pass this information as a flag
in GETINFO quotactl.
Signed-off-by: Jan Kara <jack@suse.cz>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct. This will simplify any further features added w/o
touching each file of shrinker.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Don't write quota info in dquot_commit()
ext3: Fix writepage credits computation for ordered mode
There's no reason to write quota info in dquot_commit(). The writing is a
relict from the old days when we didn't have dquot_acquire() and
dquot_release() and thus dquot_commit() could have created / removed quota
structures from the file. These days dquot_commit() only updates usage counters
/ limits in quota structure and thus there's no need to write quota info.
This also fixes an issue with journaling filesystem which didn't reserve
enough space in the transaction for write of quota info (it could have been
dirty at the time of dquot_commit() because of a race with other operation
changing it).
CC: stable@kernel.org
Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Protect the per-sb inode list with a new global lock
inode_sb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.
This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.
Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As Al Viro pointed out path resolution during Q_QUOTAON calls to quotactl
is prone to deadlocks. We hold s_umount semaphore for reading during the
path resolution and resolution itself may need to acquire the semaphore
for writing when e. g. autofs mountpoint is passed.
Solve the problem by performing the resolution before we get hold of the
superblock (and thus s_umount semaphore). The whole thing is complicated
by the fact that some filesystems (OCFS2) ignore the path argument. So to
distinguish between filesystem which want the path and which do not we
introduce new .quota_on_meta callback which does not get the path. OCFS2
then uses this callback instead of old .quota_on.
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Christoph Hellwig <hch@lst.de>
CC: Ted Ts'o <tytso@mit.edu>
CC: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use %pV in __quota_error so a single printk can not be
interleaved with other logging messages.
Add __attribute__((format (printf, 3, 4))) so format
and arguments can be verified by compiler.
Make sure printk formats and arguments match.
Block # needed a pointer dereference.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When quotaon(8) races with __dquot_initialize() or dqget() fails because
of EIO, ENOSPC, or similar error, we could possibly dereference NULL pointer
in inode->i_dquot[cnt]. Add proper checking.
Reported-by: Dmitry Monakhov <dmonakhov@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
__dquot_transfer accidentally called flush_warnings for a wrong set of
dquots which could result in quota warnings being issued with a wrong
identification. Also when operation fails because of EDQUOT, there's no
need check for issuing information message about user getting below limits
(no transfer has actually happened).
Signed-off-by: Jan Kara <jack@suse.cz>
I've got following lockup:
dquot_disable dquot_transfer
->dqget()
sb_has_quota_active
dqopt->flags &= ~dquot_state_flag(f, cnt) atomic_inc(dq->dq_count)
->drop_dquot_ref(sb, cnt);
down_write(dqptr_sem)
inode->i_dquot[cnt] = NULL ->__dquot_transfer
invalidate_dquots(sb, cnt); down_write(&dqptr_sem)
->wait for dq_wait_unused inode->i_dquot = new_dquot
/* wait forever */ ^^^^New quota user^^^^^^
We cannot allow new references to dquots from inodes after drop_dquot_ref()
has removed them. We have to recheck quota state under dqptr_sem and before
assignment, as we do it in dquot_initialize().
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Fix dirtying of journalled buffers in data=journal mode
ext3: default to ordered mode
quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
quota: Change quota error message to print out disk and function name
MAINTAINERS: Update entries of ext2 and ext3
MAINTAINERS: Update address of Andreas Dilger
ext3: Avoid filesystem corruption after a crash under heavy delete load
ext3: remove vestiges of nobh support
ext3: Fix set but unused variables
quota: clean up quota active checks
quota: Clean up the namespace in dqblk_xfs.h
quota: check quota reservation on remove_dquot_ref
Quota code never touches file data. It just modifies i_blocks + i_bytes
of inodes and inode flags of quota files. So use mark_inode_dirty_sync
instead of mark_inode_dirty.
Signed-off-by: Jan Kara <jack@suse.cz>
The current quota error message doesn't always print the disk name, so
it is hard to identify the "bad" disk when quota error happens.
This patch changes the standardized quota error message to print out disk name
and function name. It also uses a combination of cpp macro and inline function
to provide better type checking and to lower the text size of the message.
[Jan Kara: Export __quota_error]
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The various quota operations check for any quota beeing active on
a superblock, and the inode not having the noquota flag.
Merge these two checks into a dquot_active check and move that
into dquot.c as that's the only place where it's needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Almost all identifiers use the FS_* namespace, so rename the missing few
XFS_* ones to FS_* as well. Without this some people might get upset
about having too many XFS names in generic code.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Reserved space must being claimed before remove_dquot_ref() for a
given inode. Filesystem is responsible for performing force blocks
allocation in case of dealloc in ->quota_off. Let's add sanity check
for that case. Do it similar to add_dquot_ref().
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Convert quota statistics to generic percpu_counter
ext3 uses rb_node = NULL; to zero rb_root.
quota: Fixup dquot_transfer
reiserfs: Fix resuming of quotas on remount read-write
pohmelfs: Remove dead quota code
ufs: Remove dead quota code
udf: Remove dead quota code
quota: rename default quotactl methods to dquot_
quota: explicitly set ->dq_op and ->s_qcop
quota: drop remount argument to ->quota_on and ->quota_off
quota: move unmount handling into the filesystem
quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
quota: move remount handling into the filesystem
ocfs2: Fix use after free on remount read-only
Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
ext4: Make fsync sync new parent directories in no-journal mode
ext4: Drop whitespace at end of lines
ext4: Fix compat EXT4_IOC_ADD_GROUP
ext4: Conditionally define compat ioctl numbers
tracing: Convert more ext4 events to DEFINE_EVENT
ext4: Add new tracepoints to track mballoc's buddy bitmap loads
ext4: Add a missing trace hook
ext4: restart ext4_ext_remove_space() after transaction restart
ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
ext4: Avoid crashing on NULL ptr dereference on a filesystem error
ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
ext4: Use our own write_cache_pages()
ext4: Show journal_checksum option
ext4: Fix for ext4_mb_collect_stats()
ext4: check for a good block group before loading buddy pages
ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
ext4: Remove extraneous newlines in ext4_msg() calls
...
Fixed up trivial conflict in fs/ext4/fsync.c
Generic per-cpu counter has some memory overhead but it is negligible for
modern systems and embedded systems compile without quota support. And code
reuse is a good thing. This patch should fix complain from preemptive kernels
which was introduced by dde9588853.
[Jan Kara: Fixed patch to work on 32-bit archs as well]
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Commit bc8e5f0739 had a typo which caused
quota miscomputation when changing owner group of a file. Linus will hate
me.
Signed-off-by: Jan Kara <jack@suse.cz>
Follow the dquot_* style used elsewhere in dquot.c.
[Jan Kara: Fixed up missing conversion of ext2]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Only set the quota operation vectors if the filesystem actually supports
quota instead of doing it for all filesystems in alloc_super().
[Jan Kara: Export dquot_operations and vfs_quotactl_ops]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Remount handling has fully moved into the filesystem, so all this is
superflous now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Instead of having wrappers in the VFS namespace export the dquot_suspend
and dquot_resume helpers directly. Also rename vfs_quota_disable to
dquot_disable while we're at it.
[Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently, __dquot_transfer() acquires its own references of dquot structures
that will be put into inode. But for OCFS2, this creates a lock inversion
between dq_lock (waited on in dqget) and transaction start (started in
ocfs2_setattr). Currently, deadlock is impossible because dq_lock is acquired
only during dquot_acquire and dquot_release and we already hold a reference to
dquot structures in ocfs2_setattr so neither of these functions can be called
while we call dquot_transfer. But this is rather subtle and it is hard to teach
lockdep about it. So provide __dquot_transfer function that can be passed dquot
references directly. OCFS2 can then pass acquired dquot references directly to
__dquot_transfer with proper locking.
Signed-off-by: Jan Kara <jack@suse.cz>
Quota must being initialized if size or uid/git changes requested.
But initialization performed in two different places:
in case of i_size file system is responsible for dquot init
, but in case of uid/gid init will be called internally in
dquot_transfer().
This ambiguity makes code harder to understand.
Let's move this logic to one common helper function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Pass the larger struct fs_disk_quota to the ->set_dqblk operation so
that the Q_SETQUOTA and Q_XSETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->set_xquota
operation. The additional information (RT-subvolume accounting and
warn counts) are left zero for the VFS quota implementation.
Add new fieldmask values for setting the numer of blocks and inodes
values which is required for the VFS quota, but wasn't for XFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Pass the larger struct fs_disk_quota to the ->get_dqblk operation so
that the Q_GETQUOTA and Q_XGETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->get_xquota
operation. The additional information (RT-subvolume accounting and
warn counts) are left zero for the VFS quota implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Quota stats is mostly writable data structure. Let's alloc percpu
bucket for each value.
NOTE: dqstats_read() function is racy against dqstats_{inc,dec}
and may return inconsistent value. But this is ok since absolute
accuracy is not required.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Suppress compilation warning: "quotatypes" defined but not used.
quotatypes is used only when CONFIG_QUOTA_DEBUG or CONFIG_PRINT_QUOTA_WARNING
is/are defined.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
- Skip locking if quota is dirty already.
- Return old quota state to help fs-specciffic implementation to optimize
case where quota was dirty already.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
To simplify metadata tracking for delalloc writes, ext4
will simply claim metadata blocks at allocation time, without
first speculatively reserving the worst case and then freeing
what was not used.
To do this, we need a mechanism to track allocations in
the quota subsystem, but potentially allow that allocation
to actually go over quota.
This patch adds a DQUOT_SPACE_NOFAIL flag and function
variants for this purpose.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Switch __dquot_alloc_space and __dquot_free_space to take flags
to indicate whether to warn and/or to reserve (or free reserve).
This is slightly more readable at the callpoints, and makes it
cleaner to add a "nofail" option in the next patch.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Make __DQUOT_PARANOIA define from the old days a standard config option
and turn it off by default.
This gets rid of a quota warning about writes before quota is turned on
for systems with ext4 root filesystem. Currently there's no way to legally
solve this because /etc/mtab has to be written before quota is turned on
on most systems.
Signed-off-by: Jan Kara <jack@suse.cz>
dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
For a root filesystem write to the filesystem before quota is turned on happens
regularly and there's no way around it because of writes to syslog, /etc/mtab,
and similar. So the warning is rather pointless for ordinary users. It's
still useful during development so we just hide the warning behind
__DQUOT_PARANOIA config option.
Signed-off-by: Jan Kara <jack@suse.cz>
Just use 0 / -EDQUOT directly - that's what it translates to anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently various places in the VFS call vfs_dq_init directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the initialization. For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.
For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.
For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.
Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.
Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.
Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Sometimes invalidate_bdev() can fail to invalidate a part of block
device cache because of dirty data. If the filesystem has blocksize
smaller than page size, this can happen even for pages containing
quota files and thus kernel would operate on stale data. Fix the
issue by syncing the filesystem before invalidating the cache.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Current quota transfer interface support only uid/gid.
This patch extend interface in order to support various quotas types
The goal is accomplished without changes in most frequently used
vfs_dq_transfer() func.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
- remove hardcoded USRQUOTA/GRPQUOTA flags
- convert int to bool for appropriate functions
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Currenly sync_quota_sb does a lot of sync and truncate action that only
applies to "VFS" style quotas and is actively harmful for the sync
performance in XFS. Move it into vfs_quota_sync and add a wait parameter
to ->quota_sync to tell if we need it or not.
My audit of the GFS2 code says it's also not needed given the way GFS2
implements quotas, but I'd be happy if this can get a detailed review.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
If a delayed-allocation write happens before quota is enabled, the
kernel spits out a warning:
WARNING: at fs/quota/dquot.c:988 dquot_claim_space+0x77/0x112()
because the fact that user has some delayed allocation is not recorded
in quota structure.
Make dquot_initialize() update amount of reserved space for user if it sees
inode has some space reserved. Also make sure that reserved quota space does
not go negative and we warn about the filesystem bug just once.
Signed-off-by: Jan Kara <jack@suse.cz>
Since we implemented generic reserved space management interface,
then it is possible to account reserved space even when quota
is not active (similar to i_blocks/i_bytes).
Without this patch following testcase result in massive comlain from
WARN_ON in dquot_claim_space()
TEST_CASE:
mount /dev/sdb /mnt -oquota
dd if=/dev/zero of=/mnt/test bs=1M count=1
quotaon /mnt
# fs_reserved_spave == 1Mb
# quota_reserved_space == 0, because quota was disabled
dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1
# fs_reserved_spave == 2Mb
# quota_reserved_space == 1Mb
sync # ->dquot_claim_space() -> WARN_ON
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cleanup handling of S_NOQUOTA inode flag and document it a bit. The flag
does not have to be set under dqptr_sem. Only functions modifying inode's
dquot pointers have to check the flag under dqptr_sem before going forward
with the modification. This way we are sure that we cannot add new dquot
pointers to the inode which is just becoming a quota file.
The good thing about this cleanup is that there are no more places in quota
code which enforce i_mutex vs. dqptr_sem lock ordering (in particular that
dqptr_sem -> i_mutex of quota file). This should silence some (false) lockdep
warnings with ext4 + quota and generally make life of some filesystems easier.
Signed-off-by: Jan Kara <jack@suse.cz>
Commit fd8fbfc1 modified the way we find amount of reserved space
belonging to an inode. The amount of reserved space is checked
from dquot_transfer and thus inode_reserved_space gets called
even for filesystems that don't provide get_reserved_space callback
which results in a BUG.
Fix the problem by checking get_reserved_space callback and return 0 if
the filesystem does not provide it.
CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently inode_reservation is managed by fs itself and this
reservation is transfered on dquot_transfer(). This means what
inode_reservation must always be in sync with
dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result
in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be
triggered)
This is not easy because of complex locking order issues
for example http://bugzilla.kernel.org/show_bug.cgi?id=14739
The patch introduce quota reservation field for each fs-inode
(fs specific inode is used in order to prevent bloating generic
vfs inode). This reservation is managed by quota code internally
similar to i_blocks/i_bytes and may not be always in sync with
internal fs reservation.
Also perform some code rearrangement:
- Unify dquot_reserve_space() and dquot_reserve_space()
- Unify dquot_release_reserved_space() and dquot_free_space()
- Also this patch add missing warning update to release_rsv()
dquot_release_reserved_space() must call flush_warnings() as
dquot_free_space() does.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
We should hold i_mutex when looking up quota files for journaled quotas,
otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't
hold i_mutex previously probably could not lead to a real bug since the
filesystem is just being mounted / remounted read-write and thus the
root directory cannot change anyway but it's definitely cleaner with
i_mutex.
Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
Sending a message to userspace in a generic format to warn
of events (e.g. quota exceeded) in the quota subsystem is
a generically useful feature. This patch makes some minor
changes to the send_message function from dquot.c renaming
it quota_send_message, moving it to quota.c and exporting it
for use by filesystems which do not use the dquot code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Commit d01730d74d didn't completely fix
the problem since we still take dqio_mutex and i_mutex in the wrong
order. Move taking of i_mutex further down (luckily it's needed only
for updating inode flags) below where dqio_mutex is taken.
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
The following test script triggers a deadlock on ext2 filesystem:
while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done &
while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done &
I found there is a potential deadlock between quotaon and quotaoff (or
quotasync). Basically, all of quotactl operations need to be protected by
dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write
that needs to grab the i_mutex of the quota file. But in vfs_quota_on_inode
(called from quotaon operation), the current code tries to grab the i_mutex of
the quota file first before getting quonoff_mutex.
Reverse the order in which we take locks in vfs_quota_on_inode().
Jan Kara: Changed changelog to be more readable, made lockdep happy with
I_MUTEX_QUOTA.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock. So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.
So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().
Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara
reminds fixing the other two cases.
Masayoshi MIZUMA has a nice panic flow:
=====================================================================
[process A] | [process B]
| |
| prune_icache() | drop_pagecache()
| spin_lock(&inode_lock) | drop_pagecache_sb()
| inode->i_state |= I_FREEING; | |
| spin_unlock(&inode_lock) | V
| | | spin_lock(&inode_lock)
| V | |
| dispose_list() | |
| list_del() | |
| clear_inode() | |
| inode->i_state = I_CLEAR | |
| | | V
| | | if (inode->i_state & (I_FREEING|I_WILL_FREE))
| | | continue; <==== NOT MATCH
| | |
| | | (DANGER from here on! Accessing disposing inode!)
| | |
| | | __iget()
| | | list_move() <===== PANIC on poisoned list !!
V V |
(time)
=====================================================================
Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
ext2: Zero our b_size in ext2_quota_read()
trivial: fix typos/grammar errors in fs/Kconfig
quota: Coding style fixes
quota: Remove superfluous inlines
quota: Remove uppercase aliases for quota functions.
nfsd: Use lowercase names of quota functions
jfs: Use lowercase names of quota functions
udf: Use lowercase names of quota functions
ufs: Use lowercase names of quota functions
reiserfs: Use lowercase names of quota functions
ext4: Use lowercase names of quota functions
ext3: Use lowercase names of quota functions
ext2: Use lowercase names of quota functions
ramfs: Remove quota call
vfs: Use lowercase names of quota functions
quota: Remove dqbuf_t and other cleanups
quota: Remove NODQUOT macro
quota: Make global quota locks cacheline aligned
quota: Move quota files into separate directory
ext4: quota reservation for delayed allocation
...
Andrew Morton has suggested that three global quota locks can end up in the
same cacheline which can result in bad cacheline ping-pong on SMP machines.
Make locks cacheline aligned so that we avoid this problem (thanks goes to
Andrew for the idea).
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Andrew Morton <akpm@linux-foundation.org>