Граф коммитов

23203 Коммитов

Автор SHA1 Сообщение Дата
David Howells d6e43f751f AFS: Use i_generation not i_version for the vnode uniquifier
Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:48 -04:00
David Howells 2e41ae225f AFS: Set s_id in the superblock to the volume name
Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:47 -04:00
Jan Kara f9f07b6c13 vfs: Fix data corruption after failed write in __block_write_begin()
I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data.  In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and noone had right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:46 -04:00
Anton Blanchard 5e7f23373b afs: afs_fill_page reads too much, or wrong data
afs_fill_page should read the page that is about to be written but
the current implementation has a number of issues. If we aren't
extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
are extending the file we try to read the entire file.

Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
clamped to i_size.

While here, avoid calling afs_fill_page when we are doing a
PAGE_CACHE_SIZE write.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:46 -04:00
Al Viro 8aef188452 VFS: Fix vfsmount overput on simultaneous automount
[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:28:16 -04:00
Török Edwin 50338b889d fix wrong iput on d_inode introduced by e6bc45d65d
Git bisection shows that commit e6bc45d65d causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
[ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
[ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
[ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
[ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
[ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
	struct inode *inode = NULL;
.....
-               if (nd.last.name[nd.last.len])
-                       goto slashes;
                inode = dentry->d_inode;
-               if (inode)
-                       ihold(inode);
+               if (nd.last.name[nd.last.len] || !inode)
+                       goto slashes;
+               ihold(inode)
...
	if (inode)
		iput(inode);	/* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <preining@logic.at>
Reported-by: Török Edwin <edwintorok@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:27:39 -04:00
Linus Torvalds 13fca640bb Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP"
This reverts commit 7f81c8890c.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have

  arch/um/sys-x86_64/shared/sysdep/vm-flags.h:

	#define VM_STACK_DEFAULT_FLAGS \
		(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

	#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 21:53:52 -07:00
Michal Hocko 7f81c8890c fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP
Commit a8bef8ff6e ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap.  The check is a compile time
one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:03:59 -07:00
Eric W. Biederman 793925334f proc: Fix Oops on stat of /proc/<zombie pid>/ns/net
Don't call iput with the inode half setup to be a namespace filedescriptor.
Instead rearrange the code so that we don't initialize ei->ns_ops until
after I ns_ops->get succeeds, preventing us from invoking ns_ops->put
when ns_ops->get failed.

Reported-by: Ingo Saitz <Ingo.Saitz@stud.uni-hannover.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-06-15 14:35:29 -07:00
Josef Bacik ed0ca14021 Btrfs: set no_trans_join after trying to expand the transaction
We can lockup if we try to allow new writers join the transaction and we have
flushoncommit set or have a pending snapshot.  This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set.  So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13.  It cannot be reproduced with this patch.  Thanks,

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:47 -04:00
Josef Bacik 8351583e3f Btrfs: protect the pending_snapshots list with trans_lock
Currently there is nothing protecting the pending_snapshots list on the
transaction.  We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption.  So protect
this list with the trans_lock.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:46 -04:00
Josef Bacik 71d7aed014 Btrfs: fix path leakage on subvol deletion
The delayed ref patch accidently removed the btrfs_free_path in
btrfs_unlink_subvol, this puts it back and means we don't leak a path.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15 13:24:45 -04:00
Steve French 1252b3013b [CIFS] update cifs version to 1.73
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-14 16:19:54 +00:00
Al Viro c46a131c0c xfs: fix ->mknod() return value on xfs_get_acl() failure
->mknod() should return negative on errors and PTR_ERR() gives
already negative value...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-14 11:02:13 -05:00
Steve French 040d15c867 [CIFS] trivial cleanup fscache cFYI and cERROR messages
... for uniformity and cleaner debug logs.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-14 15:51:18 +00:00
Max Asbock 1123d93963 timerfd: Fix wakeup of processes when timer is cancelled on clock change
Currently processes waiting with poll on cancelable timerfd timers are
not woken up when the timers are canceled. When the system time is set
the clock_was_set() function calls timerfd_clock_was_set() to cancel
and wake up processes waiting on potential cancelable timerfd
timers. However the wake up currently has no effect because in the
case of timerfd_read it is dependent on ctx->ticks not being
0. timerfd_poll also requires ctx->ticks being non zero. As a
consequence processes waiting on cancelable timers only get woken up
when the timers expire. This patch fixes this by incrementing
ctx->ticks before calling wake_up.

Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
Cc: kay.sievers@vrfy.org
Cc: virtuoso@slind.org
Cc: johnstul <johnstul@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1307985512.4710.41.camel@w-amax.beaverton.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-06-14 11:46:14 +02:00
Jeff Layton 8d1bca328b cifs: correctly handle NULL tcon pointer in CIFSTCon
Long ago (in commit 00e485b0), I added some code to handle share-level
passwords in CIFSTCon. That code ignored the fact that it's legit to
pass in a NULL tcon pointer when connecting to the IPC$ share on the
server.

This wasn't really a problem until recently as we only called CIFSTCon
this way when the server returned -EREMOTE. With the introduction of
commit c1508ca2 however, it gets called this way on every mount, causing
an oops when share-level security is in effect.

Fix this by simply treating a NULL tcon pointer as if user-level
security were in effect. I'm not aware of any servers that protect the
IPC$ share with a specific password anyway. Also, add a comment to the
top of CIFSTCon to ensure that we don't make the same mistake again.

Cc: <stable@kernel.org>
Reported-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:34 +00:00
Jeff Layton 3e71551364 cifs: show sec= option in /proc/mounts
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:34 +00:00
Jeff Layton 7fdbaa1b8d cifs: don't allow cifs_reconnect to exit with NULL socket pointer
It's possible for the following set of events to happen:

cifsd calls cifs_reconnect which reconnects the socket. A userspace
process then calls cifs_negotiate_protocol to handle the NEGOTIATE and
gets a reply. But, while processing the reply, cifsd calls
cifs_reconnect again.  Eventually the GlobalMid_Lock is dropped and the
reply from the earlier NEGOTIATE completes and the tcpStatus is set to
CifsGood. cifs_reconnect then goes through and closes the socket and sets the
pointer to zero, but because the status is now CifsGood, the new socket
is not created and cifs_reconnect exits with the socket pointer set to
NULL.

Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is
CifsNeedNegotiate, and by making sure that generic_ip_connect is always
called at least once in cifs_reconnect.

Note that this is not a perfect fix for this issue. It's still possible
that the NEGOTIATE reply is handled after the socket has been closed and
reconnected. In that case, the socket state will look correct but it no
NEGOTIATE was performed on it be for the wrong socket. In that situation
though the server should just shut down the socket on the next attempted
send, rather than causing the oops that occurs today.

Cc: <stable@kernel.org> # .38.x: fd88ce9: [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:33 +00:00
Pavel Shilovsky cd51875d53 CIFS: Fix sparse error
cifs_sb_master_tlink was declared as inline, but without a definition.
Remove the declaration and move the definition up.

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-13 20:34:33 +00:00
Jan Kara de1b794130 jbd2: Fix oops in jbd2_journal_remove_journal_head()
jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh->b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head() which adds even larger opportunity for
oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitely via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jdb2_journal_remove_checkpoint() can free journal_head which needs
modification of a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-13 15:38:22 -04:00
Linus Torvalds ffdb8f1bfb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: unwind canceled flock state
  ceph: fix ENOENT logic in striped_read
  ceph: fix short sync reads from the OSD
  ceph: fix sync vs canceled write
  ceph: use ihold when we already have an inode ref
2011-06-13 11:21:50 -07:00
Linus Torvalds c78a9b9b8e Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: Revert 8ab2b7efd ftrace: Remove unnecessary disabling of irqs
  kprobes/trace: Fix kprobe selftest for gcc 4.6
  ftrace: Fix possible undefined return code
  oprofile, dcookies: Fix possible circular locking dependency
  oprofile: Fix locking dependency in sync_start()
  oprofile: Free potentially owned tasks in case of errors
  oprofile, x86: Add comments to IBS LVT offset initialization
2011-06-13 10:45:49 -07:00
Linus Torvalds 33a538833f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix problem in setting checkpoint interval
  nilfs2: fix missing block address termination in btree node shrinking
  nilfs2: fix incorrect block address termination in node concatenation
2011-06-13 10:32:24 -07:00
Chris Mason f4c4401621 Btrfs: drop the delalloc_bytes check in shrink_delalloc
Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-13 11:30:47 -04:00
Chris Mason ac08aedfa5 Btrfs: check the return value from set_anon_super
Al Viro noticed we weren't checking for set_anon_super failures.  This
adds the required checks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-13 11:28:50 -04:00
Tao Ma 1fb74cda1b jbd2: Remove obsolete parameters in the comments for some jbd2 functions
credits isn't a parameter for jbd2_journal_get_write_access and
jbd2_journal_get_undo_access. So remove the corresponding comments.

Acked-by: Jan Kara <jack@suse.cz>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-12 22:44:10 -04:00
Al Viro a685e08987 Delay struct net freeing while there's a sysfs instance refering to it
* new refcount in struct net, controlling actual freeing of the memory
	* new method in kobj_ns_type_operations (->drop_ns())
	* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped.  Method renamed to ->grab_current_ns().
	* old net_free() callers call net_drop_ns() instead.
	* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL.  That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.

	Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called.  The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:41 -04:00
Al Viro dde194a64b afs: fix sget() races, close leak on umount
* set ->s_fs_info in set() callback passed to sget()
* allocate the thing and set it up enough for afs_test_super() before
making it visible
* have it freed in ->kill_sb() (current tree simply leaks it)
* have ->put_super() leave ->s_fs_info->volume alone; it's too early for
dropping it; do that from ->kill_sb() after having called kill_anon_super().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:36 -04:00
Al Viro d251ed271d ubifs: fix sget races
* allocate ubifs_info in ->mount(), fill it enough for sb_test() and
set ->s_fs_info to it in set() callback passed to sget().
* do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
done kill_anon_super().
* don't free it in ubifs_fill_super() either - deactivate_locked_super()
done by caller when ubifs_fill_super() returns an error will take care
of that sucker.
* get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
only ubifs_info.  Which it will find in ->s_fs_info just fine, no need to
reassign anything...

As the result, sb_test() becomes safe to apply to all superblocks that
can be found by sget() (and a kludge with temporary use of ->s_fs_info
to store a pointer to very different structure goes away).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:34 -04:00
Al Viro b1c27ab3f9 ubifs: split allocation of ubifs_info into a separate function
preparation to ubifs sget() race fixes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:32 -04:00
Al Viro ff78fca2a0 fix leak in proc_set_super()
set_anon_super() can fail...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:28 -04:00
Linus Torvalds 3c25fa740e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: use join_transaction in btrfs_evict_inode()
  Btrfs - use %pU to print fsid
  Btrfs: fix extent state leak on failed nodatasum reads
  btrfs: fix unlocked access of delalloc_inodes
  Btrfs: avoid stack bloat in btrfs_ioctl_fs_info()
  btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline
  Btrfs: clear current->journal_info on async transaction commit
  Btrfs: make sure to recheck for bitmaps in clusters
  btrfs: remove unneeded includes from scrub.c
  btrfs: reinitialize scrub workers
  btrfs: scrub: errors in tree enumeration
  Btrfs: don't map extent buffer if path->skip_locking is set
  Btrfs: unlock the trans lock properly
  Btrfs: don't map extent buffer if path->skip_locking is set
  Btrfs: fix duplicate checking logic
  Btrfs: fix the allocator loop logic
  Btrfs: fix bitmap regression
  Btrfs: don't commit the transaction if we dont have enough pinned bytes
  Btrfs: noinline the cluster searching functions
  Btrfs: cache bitmaps when searching for a cluster
2011-06-12 11:06:36 -07:00
Li Zefan 30b4caf5d7 Btrfs: use join_transaction in btrfs_evict_inode()
The WARN_ON() in start_transaction() was triggered while balancing.

The cause is btrfs_relocate_chunk() started a transaction and
then called iput() on the inode that stores free space cache,
and iput() called btrfs_start_transaction() again.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-11 08:31:55 -04:00
Ryusuke Konishi 071d73cfe5 nilfs2: fix problem in setting checkpoint interval
Checkpoint generation interval of nilfs goes wrong after user has
changed the interval parameter with nilfs-tune tool.

 segctord starting. Construction interval = 5 seconds,
 CP frequency < 30 seconds
 segctord starting. Construction interval = 0 seconds,
 CP frequency < 30 seconds

This turned out to be caused by a trivial bug in initialization code
of log writer.  This will fix it.

Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-06-11 15:51:15 +09:00
Ryusuke Konishi d40990537c nilfs2: fix missing block address termination in btree node shrinking
nilfs_btree_delete function does not terminate part of virtual block
addresses when shrinking the last remaining child node into the root
node.  The missing address termination causes that dead btree node
blocks persist and chip away free disk space.

This fixes the leak bug on the btree node deletion.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-06-11 15:51:15 +09:00
Ryusuke Konishi fe744fdb74 nilfs2: fix incorrect block address termination in node concatenation
nilfs_btree_delete function wrongly terminates virtual block address
of the btree node held by its parent at index 0.  When concatenating
the index-0 node with its right sibling node, nilfs_btree_delete
terminates the block address of index-0 node instead of the right
sibling node which should be deleted.

This bug not only wears disk space in the long run, but also causes
file system corruption.  This will fix it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-06-11 15:51:15 +09:00
Ilya Dryomov 22b63a2971 Btrfs - use %pU to print fsid
Get rid of FIXME comment.  Uuids from dmesg are now the same as uuids
given by btrfs-progs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 19:02:04 -04:00
Jan Schmidt 08d2f347e8 Btrfs: fix extent state leak on failed nodatasum reads
When encountering an EIO while reading from a nodatasum extent, we
insert an error record into the inode's failure tree.
btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd
better clear the failure tree in that case, otherwise the kernel
complains about

	BUG extent_state: Objects remaining on kmem_cache_close()

on rmmod.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 19:00:53 -04:00
Chris Mason 0e735872fb Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into for-linus 2011-06-10 18:58:08 -04:00
David Sterba 5be76758f3 btrfs: fix unlocked access of delalloc_inodes
list_splice_init will make delalloc_inodes empty, but without a spinlock
around, this may produce corrupted list head, accessed in many placess,
The race window is very tight and nobody seems to have hit it so far.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 18:57:11 -04:00
Li Zefan 027ed2f004 Btrfs: avoid stack bloat in btrfs_ioctl_fs_info()
The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so
don't declare the variable on stack.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 18:57:10 -04:00
richard kennedy 9eb9104c66 btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline
Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit
builds. This shrinks its size to 128 bytes allowing it to fit into one
fewer cache lines and allows more objects per slab in its kmem_cache.

slabinfo extent_buffer reports :-

 before:-
    Sizes (bytes)     Slabs
    ----------------------------------
    Object :     136  Total  :     123
    SlabObj:     136  Full   :     121
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      30

 after :-
    Object :     128  Total  :       4
    SlabObj:     128  Full   :       2
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      32

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 18:57:10 -04:00
Sage Weil 38e880540f Btrfs: clear current->journal_info on async transaction commit
Normally current->jouranl_info is cleared by commit_transaction.  For an
async snap or subvol creation, though, it runs in a work queue.  Clear
it in btrfs_commit_transaction_async() to avoid leaking a non-NULL
journal_info when we return to userspace.  When the actual commit runs in
the other thread it won't care that it's current->journal_info is already
NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 16:42:29 -04:00
Chris Mason 38e8788066 Btrfs: make sure to recheck for bitmaps in clusters
Josef recently changed the free extent cache to look in
the block group cluster for any bitmaps before trying to
add a new bitmap for the same offset.  This avoids BUG_ON()s due
covering duplicate ranges.

But it didn't go quite far enough.  A given free range might span
between one or more bitmaps or free space entries.  The code has
looping to cover this, but it doesn't check for clustered bitmaps
every time.

This shuffles our gotos to check for a bitmap in the cluster
for every new bitmap entry we try to add.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10 16:36:57 -04:00
Arne Jansen 6eef312588 btrfs: remove unneeded includes from scrub.c
Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10 14:59:52 +02:00
Arne Jansen 632dd772fc btrfs: reinitialize scrub workers
Scrub starts the workers each time a scrub starts and stops them after it
finished. This patch adds an initialization for the workers before each
start, otherwise the workers behave strangely.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10 12:14:13 +02:00
Arne Jansen 8c51032f97 btrfs: scrub: errors in tree enumeration
due to the semantics of btrfs_search_slot the path can point to an
invalid slot when ret > 0. This condition went unnoticed, which in
turn could have led to an incomplete scrubbing.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10 12:14:13 +02:00
Josef Bacik ad3e34bba4 Btrfs: don't map extent buffer if path->skip_locking is set
Arne's scrub stuff exposed a problem with mapping the extent buffer in
reada_for_search.  He searches the commit root with multiple threads and with
skip_locking set, so we can race and overwrite node->map_token since node isn't
locked.  So fix this so that we only map the extent buffer if we don't already
have a map_token and skip_locking isn't set.  Without this patch scrub would
panic almost immediately, with the patch it doesn't panic anymore.  Thanks,

Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-10 12:14:12 +02:00
Mathias Krause dac853ae89 exec: delay address limit change until point of no return
Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function.  This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-09 12:50:05 -07:00
Josef Bacik 3473f3c06a Btrfs: unlock the trans lock properly
In btrfs_wait_for_commit if we came upon a transaction that had committed we
just exited, but that's bad since we are holding the trans_lock.  So break
instead so that the lock is dropped.  Thanks,

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09 10:15:17 -04:00
Josef Bacik 25b8b936ed Btrfs: don't map extent buffer if path->skip_locking is set
Arne's scrub stuff exposed a problem with mapping the extent buffer in
reada_for_search.  He searches the commit root with multiple threads and with
skip_locking set, so we can race and overwrite node->map_token since node isn't
locked.  So fix this so that we only map the extent buffer if we don't already
have a map_token and skip_locking isn't set.  Without this patch scrub would
panic almost immediately, with the patch it doesn't panic anymore.  Thanks,

Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09 10:12:07 -04:00
Ingo Molnar 5f127133ee Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-06-09 09:14:34 +02:00
Linus Torvalds d21131bb0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: trivial: add space in fsc error message
  cifs: silence printk when establishing first session on socket
  CIFS ACL support needs CONFIG_KEYS, so depend on it
  possible memory corruption in cifs_parse_mount_options()
  cifs: make CIFS depend on CRYPTO_ECB
  cifs: fix the kernel release version in the default security warning message
2011-06-08 13:54:29 -07:00
Josef Bacik f6a398298d Btrfs: fix duplicate checking logic
When merging my code into the integration test the second check for duplicate
entries got screwed up.  This patch fixes it by dropping ret2 and just using ret
for the return value, and checking if we got an error before adding the bitmap
to the local list.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 16:37:29 -04:00
Josef Bacik 723bda2083 Btrfs: fix the allocator loop logic
I was testing with empty_cluster = 0 to try and reproduce a problem and kept
hitting early enospc panics.  This was because our loop logic was a little
confused.  So this is what I did

1) Make the loop variable the ultimate decider on wether we should loop again
isntead of checking to see if we had an uncached bg, empty size or empty
cluster.

2) Increment loop before checking to see what we are on to make the loop
definitions make more sense.

3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0
unless we didn't actually allocate a chunk.  If we did allocate a chunk we
should be able to easily setup a new cluster so clearing
empty_size/empty_cluster makes us less efficient.

This kept me from hitting panics while trying to reproduce the other problem.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 16:37:29 -04:00
Josef Bacik 2cdc342c20 Btrfs: fix bitmap regression
In cleaning up the clustering code I accidently introduced a regression by
adding bitmap entries to the cluster rb tree.  The problem is if we've maxed out
the number of bitmaps we can have for the block group we can only add free space
to the bitmaps, but since the bitmap is on the cluster we can't find it and we
try to create another one.  This would result in a panic because the total
bitmaps was bigger than the max bitmaps that were allowed.  This patch fixes
this by checking to see if we have a cluster, and then looking at the cluster rb
tree to see if it has a bitmap entry and if it does and that space belongs to
that bitmap, go ahead and add it to that bitmap.

I could hit this panic every time with an fs_mark test within a couple of
minutes.  With this patch I no longer hit the panic and fs_mark goes to
completion.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 16:37:28 -04:00
Josef Bacik f2bb8f5cfb Btrfs: don't commit the transaction if we dont have enough pinned bytes
I noticed when running an enospc test that we would get stuck committing the
transaction in check_data_space even though we truly didn't have enough space.
So check to see if bytes_pinned is bigger than num_bytes, if it's not don't
commit the transaction.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 15:08:31 -04:00
Josef Bacik 3de85bb95c Btrfs: noinline the cluster searching functions
When profiling the find cluster code it's hard to tell where we are spending our
time because the bitmap and non-bitmap functions get inlined by the compiler, so
make that not happen.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 15:08:30 -04:00
Josef Bacik 86d4a77ba3 Btrfs: cache bitmaps when searching for a cluster
If we are looking for a cluster in a particularly sparse or fragmented block
group, we will do a lot of looping through the free space tree looking for
various things, and if we need to look at bitmaps we will endup doing the whole
dance twice.  So instead add the bitmap entries to a temporary list so if we
have to do the bitmap search we can just look through the list of entries we've
found quickly instead of having to loop through the entire tree again.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08 15:08:28 -04:00
Jeff Layton 83fb086e0e cifs: trivial: add space in fsc error message
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-08 16:03:29 +00:00
Sage Weil 0c1f91f271 ceph: unwind canceled flock state
If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman <brchrisman@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-07 21:36:45 -07:00
Sage Weil 0e98728fa3 ceph: fix ENOENT logic in striped_read
Getting ENOENT is equivalent to reading 0 bytes.  Make that correction
before setting up the hit_stripe and was_short flags.

Fixes the following case:
 dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0
 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct

Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-07 21:34:16 -07:00
Sage Weil c3cd62839a ceph: fix short sync reads from the OSD
If we get a short read from the OSD because the object is small, we need to
zero the remainder of the buffer.  For O_DIRECT reads, the attempted range
is not trimmed to i_size by the VFS, so we were actually looping
indefinitely.

Fix by trimming by i_size, and the unconditionally zeroing the trailing
range.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-07 21:34:14 -07:00
Sage Weil 70b666c3b4 ceph: use ihold when we already have an inode ref
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-07 21:34:11 -07:00
Linus Torvalds 9c125d2af8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: Fix corrupt inode flags when remove ATTR_SYS flag
2011-06-07 19:04:21 -07:00
Linus Torvalds d205df9955 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Processes waiting on inode glock that no processes are holding
2011-06-07 18:44:10 -07:00
Linus Torvalds 8397345172 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  vfs: make unlink() and rmdir() return ENOENT in preference to EROFS
  lmLogOpen() broken failure exit
  usb: remove bad dput after dentry_unhash
  more conservative S_NOSEC handling
2011-06-07 18:36:59 -07:00
Theodore Ts'o e6bc45d65d vfs: make unlink() and rmdir() return ENOENT in preference to EROFS
If user space attempts to remove a non-existent file or directory, and
the file system is mounted read-only, return ENOENT instead of EROFS.
Either error code is arguably valid/correct, but ENOENT is a more
specific error message.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-07 08:51:14 -04:00
Al Viro 9054760ff5 lmLogOpen() broken failure exit
Callers of lmLogOpen() expect it to return -E... on failure exits, which
is what it returns, except for the case of blkdev_get_by_dev() failure.
It that case lmLogOpen() return the error with the wrong sign...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-07 08:50:59 -04:00
Jeff Layton 9c4843ea57 cifs: silence printk when establishing first session on socket
When signing is enabled, the first session that's established on a
socket will cause a printk like this to pop:

    CIFS VFS: Unexpected SMB signature

This is because the key exchange hasn't happened yet, so the signature
field is bogus. Don't try to check the signature on the socket until the
first session has been established. Also, eliminate the specific check
for SMB_COM_NEGOTIATE since this check covers that case too.

Cc: stable@kernel.org
Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-07 00:57:05 +00:00
Casey Bodley 7d751f6f8c nfsd: link returns nfserr_delay when breaking lease
fix for commit 4795bb37ef, nfsd: break
lease on unlink, link, and rename

if the LINK operation breaks a delegation, it returns NFS4ERR_NOENT
(which is not a valid error in rfc 5661) instead of NFS4ERR_DELAY.
the return value of nfsd_break_lease() in nfsd_link() must be
converted from host_err to err

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-06 18:46:56 -04:00
Randy Dunlap be1f4084b4 nfsd: v4 support requires CRYPTO
nfsd V4 support uses crypto interfaces, so select CRYPTO
to fix build errors in 2.6.39:

ERROR: "crypto_destroy_tfm" [fs/nfsd/nfsd.ko] undefined!
ERROR: "crypto_alloc_base" [fs/nfsd/nfsd.ko] undefined!

Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-06 18:37:35 -04:00
J. Bruce Fields b084f598df nfsd: fix dependency of nfsd on auth_rpcgss
Commit b0b0c0a26e "nfsd: add proc file listing kernel's gss_krb5
enctypes" added an nunnecessary dependency of nfsd on the auth_rpcgss
module.

It's a little ad hoc, but since the only piece of information nfsd needs
from rpcsec_gss_krb5 is a single static string, one solution is just to
share it with an include file.

Cc: stable@kernel.org
Reported-by: Michael Guntsche <mike@it-loops.com>
Cc: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-06 15:07:15 -04:00
Darren Salt 243e2dd38e CIFS ACL support needs CONFIG_KEYS, so depend on it
Build fails if CONFIG_KEYS is not selected.

Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-06 16:58:16 +00:00
Vasily Averin 957df4535d possible memory corruption in cifs_parse_mount_options()
error path after mountdata check frees uninitialized mountdata_copy

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-06 15:31:29 +00:00
Lukas Czerner a9c667f8f0 ext4: fixed tracepoints cleanup
While creating fixed tracepoints for ext3, basically by porting them
from ext4, I found a lot of useless retyping, wrong type usage, useless
variable passing and other inconsistencies in the ext4 fixed tracepoint
code.

This patch cleans the fixed tracepoint code for ext4 and also simplify
some of them.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-06 09:51:52 -04:00
Lukas Czerner c03f8aa9ab ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
Currently we are not marking the extent as the last one
(FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is
because we just do not check for it right now and continue searching for
next extent. But at the point we hit the hole at the end of the file, it
is too late.

This commit adds check for the allocated block in subsequent extent and
if there is no more extents (block = EXT_MAX_BLOCKS) just flag the
current one as the last one.

This behaviour has been spotted unintentionally by 252 xfstest, when the
test hangs out, because of wrong loop condition. However on other
filesystems (like xfs) it will exit anyway, because we notice the last
extent flag and exit.

With this patch xfstest 252 does not hang anymore, ext4 fiemap
implementation still reports bad extent type in some cases, however
this seems to be different issue.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-06 00:06:52 -04:00
Lukas Czerner f17722f917 ext4: Fix max file size and logical block counting of extent format file
Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and fill the tail of file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

The root cause of the problem lies in the fact that we specifically set
s_maxbytes so that block at s_maxbytes fit into on-disk extent format,
which is 32 bit long. However, we are not storing start and end block
number, but rather start block number and length in blocks. It means
that in order to cover extent from 0 to EXT_MAX_BLOCK we need
EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) -
and it does not.

The only way to fix it without changing the meaning of the struct
ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
by one fs block so we can cover the whole extent we can get by the
on-disk extent format.

Also in many places EXT_MAX_BLOCK is used as length instead of maximum
logical block number as the name suggests, it is all a bit messy. So
this commit renames it to EXT_MAX_BLOCKS and change its usage in some
places to actually be maximum number of blocks in the extent.

The bug which this commit fixes can be reproduced as follows:

 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
 sync
 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-06 00:05:17 -04:00
Yongqiang Yang 5def136025 ext4: correct comments for ext4_free_blocks()
metadata is not parameter of ext4_free_blocks() any more.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-05 23:26:40 -04:00
Linus Torvalds e6ece70732 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
  btrfs: fix uninitialized variable warning
  btrfs: add helper for fs_info->closing
  Btrfs: add mount -o inode_cache
  btrfs: scrub: add explicit plugging
  btrfs: use btrfs_ino to access inode number
  Btrfs: don't save the inode cache if we are deleting this root
  btrfs: false BUG_ON when degraded
  Btrfs: don't save the inode cache in non-FS roots
  Btrfs: make sure we don't overflow the free space cache crc page
  Btrfs: fix uninit variable in the delayed inode code
  btrfs: scrub: don't reuse bios and pages
  Btrfs: leave spinning on lookup and map the leaf
  Btrfs: check for duplicate entries in the free space cache
  Btrfs: don't try to allocate from a block group that doesn't have enough space
  Btrfs: don't always do readahead
  Btrfs: try not to sleep as much when doing slow caching
  Btrfs: kill BTRFS_I(inode)->block_group
  Btrfs: don't look at the extent buffer level 3 times in a row
  Btrfs: map the node block when looking for readahead targets
  Btrfs: set range_start to the right start in count_range_bits
  ...
2011-06-05 06:17:23 +09:00
David Sterba aa0467d8d2 btrfs: fix uninitialized variable warning
With Linus' tree, today's linux-next build (powercp ppc64_defconfig)
produced this warning:

fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
uninitialized in this function

Introduced by commit 16cdcec736 ("btrfs: implement delayed inode items
operation").

This fixes a bug in btrfs_update_inode(): if the returned value from
btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not
updated and several call paths may hit a BUG_ON or fail with strange
code.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Sterba <dsterba@suse.cz>
2011-06-04 08:11:38 -04:00
David Sterba 7841cb2898 btrfs: add helper for fs_info->closing
wrap checking of filesystem 'closing' flag and fix a few missing memory
barriers.

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-06-04 08:11:22 -04:00
Chris Mason 4b9465cb9e Btrfs: add mount -o inode_cache
This makes the inode map cache default to off until we
fix the overflow problem when the free space crcs don't fit
inside a single page.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:47 -04:00
Arne Jansen e7786c3ae5 btrfs: scrub: add explicit plugging
With the removal of the implicit plugging scrub ends up doing more and
smaller I/O than necessary. This patch adds explicit plugging per chunk.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:46 -04:00
David Sterba a4689d2bd3 btrfs: use btrfs_ino to access inode number
commit 4cb5300bc ("Btrfs: add mount -o auto_defrag") accesses inode
number directly while it should use the helper with the new inode
number allocator.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:46 -04:00
Josef Bacik d132a538d2 Btrfs: don't save the inode cache if we are deleting this root
With xfstest 254 I can panic the box every time with the inode number caching
stuff on.  This is because we clean the inodes out when we delete the subvolume,
but then we write out the inode cache which adds an inode to the subvolume inode
tree, and then when it gets evicted again the root gets added back on the dead
roots list and is deleted again, so we have a double free.  To stop this from
happening just return 0 if refs is 0 (and we're not the tree root since tree
root always has refs of 0).  With this fix 254 no longer panics.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:45 -04:00
Arne Jansen 5f3f302a6f btrfs: false BUG_ON when degraded
In degraded mode the struct btrfs_device of missing devs don't have
device->name set. A kstrdup of NULL correctly returns NULL. Don't
BUG in this case.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:44 -04:00
liubo ca456ae280 Btrfs: don't save the inode cache in non-FS roots
This adds extra checks to make sure the inode map we are caching really
belongs to a FS root instead of a special relocation tree.  It
prevents crashes during balancing operations.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:44 -04:00
Chris Mason 211f96c24f Btrfs: make sure we don't overflow the free space cache crc page
The free space cache uses only one page for crcs right now,
which means we can't have a cache file bigger than the
crcs we can fit in the first page.  This adds a check to
enforce that restriction.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:43 -04:00
Chris Mason 17aca1c987 Btrfs: fix uninit variable in the delayed inode code
The nitems counter needs to start at zero

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04 08:03:43 -04:00
Arne Jansen 1bc8779349 btrfs: scrub: don't reuse bios and pages
The current scrub implementation reuses bios and pages as often as possible,
allocating them only on start and releasing them when finished. This leads
to more problems with the block layer than it's worth. The elevator gets
confused when there are more pages added to the bio than bi_size suggests.
This patch completely rips out the reuse of bios and pages and allocates
them freshly for each submit.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Maosn <chris.mason@oracle.com>
2011-06-04 08:03:17 -04:00
Linus Torvalds 4f1ba49efa Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  block: Use hlist_entry() for io_context.cic_list.first
  cfq-iosched: Remove bogus check in queue_fail path
  xen/blkback: potential null dereference in error handling
  xen/blkback: don't call vbd_size() if bd_disk is NULL
  block: blkdev_get() should access ->bd_disk only after success
  CFQ: Fix typo and remove unnecessary semicolon
  block: remove unwanted semicolons
  Revert "block: Remove extra discard_alignment from hd_struct."
  nbd: adjust 'max_part' according to part_shift
  nbd: limit module parameters to a sane value
  nbd: pass MSG_* flags to kernel_recvmsg()
  block: improve the bio_add_page() and bio_add_pc_page() descriptions
2011-06-04 08:11:26 +09:00
Linus Torvalds 3af91a1256 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix-up free space earlier
  UBIFS: intialize LPT earlier
  UBIFS: assert no fixup when writing a node
  UBIFS: fix clean znode counter corruption in error cases
  UBIFS: fix memory leak on error path
  UBIFS: fix shrinker object count reports
  UBIFS: fix recovery broken by the previous recovery fix
  UBIFS: amend ubifs_recover_leb interface
  UBIFS: introduce a "grouped" journal head flag
  UBIFS: supress false error messages
2011-06-04 07:59:32 +09:00
Al Viro 9e1f1de02c more conservative S_NOSEC handling
Caching "we have already removed suid/caps" was overenthusiastic as merged.
On network filesystems we might have had suid/caps set on another client,
silently picked by this client on revalidate, all of that *without* clearing
the S_NOSEC flag.

AFAICS, the only reasonably sane way to deal with that is
	* new superblock flag; unless set, S_NOSEC is not going to be set.
	* local block filesystems set it in their ->mount() (more accurately,
mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
local block ones clear it)
	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
inode attribute changes are picked from other clients.

It's not an earth-shattering hole (anybody that can set suid on another client
will almost certainly be able to write to the file before doing that anyway),
but it's a bug that needs fixing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-03 18:24:58 -04:00
Suresh Jayaraman 5f0b23eeba cifs: make CIFS depend on CRYPTO_ECB
When CONFIG_CRYPTO_ECB is not set, trying to mount a CIFS share with NTLM
security resulted in mount failure with the following error:
   "CIFS VFS: could not allocate des crypto API"

Seems like a leftover from commit 43988d7.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
CC: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-03 17:38:44 +00:00
Suresh Jayaraman c592a70737 cifs: fix the kernel release version in the default security warning message
When ntlm security mechanim is used, the message that warns about the upgrade
to ntlmv2 got the kernel release version wrong (Blame it on Linus :). Fix it.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-03 15:31:23 +00:00
Ben Gardiner 098011940a UBIFS: fix-up free space earlier
The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:

Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.

Artem: tweaked the patch a bit

Signed-off-by: Ben Gardiner <bengardiner@nanometrics>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03 18:12:31 +03:00
Ben Gardiner 781c5717a9 UBIFS: intialize LPT earlier
The current 'mount_ubifs()' implementation does not initialize the LPT until the
the master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.

Artem: massaged the patch and commit message.

Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03 18:12:31 +03:00
Ben Gardiner 4f1ab9b01d UBIFS: assert no fixup when writing a node
The current free space fixup can result in some writing to the UBI volume
when the space_fixup flag is set.

To catch instances where UBIFS is writing to the NAND while the space_fixup
flag is set, add an assert to ubifs_write_node().

Artem: tweaked the patch, added similar assertion to the write buffer
       write path.

Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03 18:12:31 +03:00