Граф коммитов

18547 Коммитов

Автор SHA1 Сообщение Дата
Eric Paris 4ca763523e fsnotify: add groups to fsnotify_inode_groups when registering inode watch
Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation.  This means, even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event.  This patch adds groups to the global list on
when the first inode mark is added to the group.

Signed-of-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris 36fddebaa8 fsnotify: initialize the group->num_marks in a better place
Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list.  This isn't strictly the case, we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0)  This patch moves the initialization stuff and makes it clear when
it is really being held.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris 19c2a0e1a2 fsnotify: rename fsnotify_groups to fsnotify_inode_groups
Simple renaming patch.  fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris 0d2e2a1d00 fsnotify: drop mask argument from fsnotify_alloc_group
Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
that argument.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris ffab83402f fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
fsnotify_obtain_group was intended to be able to find an already existing
group.  Nothing uses that functionality.  This just renames it to
fsnotify_alloc_group so it is clear what it is doing.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris cd7752ce7c fsnotify: fsnotify_obtain_group kzalloc cleanup
fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0.
This patch just deletes those useless lines.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris 74be0cc828 fsnotify: remove group_num altogether
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added.  I no longer think this is a
necessary thing to do and so we remove the group_num.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris cac69dad32 fsnotify: lock annotation for event replacement
fsnotify_replace_event need to lock both the old and the new event.  This
causes lockdep to get all pissed off since it dosn't know this is safe.
It's safe in this case since the new event is impossible to be reached from
other places in the kernel.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris 1201a5361b fsnotify: replace an event on a list
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event.  This patch implements the replace functionality of that
process.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris b4e4e14073 fsnotify: clone existing events
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller.  Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created.  Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris 74766bbfa9 fsnotify: per group notification queue merge types
inotify only wishes to merge a new event with the last event on the
notification fifo.  fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together.  This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code.  This allows each use of fsnotify to provide
their own merge functionality.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris 28c60e37f8 fsnotify: send struct file when sending events to parents when possible
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:48 -04:00
Eric Paris 2a12a9d781 fsnotify: pass a file instead of an inode to open, read, and write
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process.  Close was the only operation that already was passing
a struct file to the notification hook.  This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:32 -04:00
Eric Paris 8112e2d6a7 fsnotify: include data in should_send calls
fanotify is going to need to look at file->private_data to know if an event
should be sent or not.  This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 7b0a04fbfb fsnotify: provide the data type to should_send_event
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener.  Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris d7f0ce4e43 inotify: do not spam console without limit
inotify was supposed to have a dmesg printk ratelimitor which would cause
inotify to only emit one message per boot.  The static bool was never set
so it kept firing messages.  This patch correctly limits warnings in multiple
places.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 2dfc1cae4c inotify: remove inotify in kernel interface
nothing uses inotify in the kernel, drop it!

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 7050c48826 inotify: do not reuse watch descriptors
Prior to 2.6.31 inotify would not reuse watch descriptors until all of
them had been used at least once.  After the rewrite inotify would reuse
watch descriptors.  The selinux utility 'restorecond' was found to have
problems when watch descriptors were reused.  This patch reverts to the
pre inotify rewrite behavior to not reuse watch descriptors.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:20 -04:00
Eric Paris 6f3a539e3b fsnotify: use kmem_cache_zalloc to simplify event initialization
fsnotify event initialization is done entry by entry with almost everything
set to either 0 or NULL.  Use kmem_cache_zalloc and only initialize things
that need non-zero initialization.  Also means we don't have to change
initialization entries based on the config options.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:20 -04:00
Eric Paris f0553af054 fsnotify: kzalloc fsnotify groups
Use kzalloc for fsnotify_groups so that none of the fields can leak any
information accidentally.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:20 -04:00
Eric Paris 31ddd3268d inotify: use container_of instead of casting
inotify_free_mark casts directly from an fsnotify_mark_entry to an
inotify_inode_mark_entry.  This works, but should use container_of instead
for future proofing.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:19 -04:00
Eric Paris b4277d3dd5 fsnotify: use fsnotify_create_event to allocate the q_overflow event
Currently fsnotify defines a static fsnotify event which is sent when a
group overflows its allotted queue length.  This patch just allocates that
event from the event cache rather than defining it statically.  There is no
known reason that the current implementation is wrong, but this makes sure the
event is initialized and created like any other.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:19 -04:00
Eric Paris 40554c3dae fsnotify: allow addition of duplicate fsnotify marks
This patch allows a task to add a second fsnotify mark to an inode for the
same group.  This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function.   This
is useful if a user wants to add a new mark before removing the old one.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris 9e1c74321d fsnotify: duplicate fsnotify_mark_entry data between 2 marks
Simple copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris b7ba837153 inotify: simplify the inotify idr handling
This patch moves all of the idr editing operations into their own idr
functions.  It makes it easier to prove locking correctness and to to
understand the code flow.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:16 -04:00
Latchesar Ionkov da7ddd3296 9p: Pass the correct end of buffer to p9stat_read
Pass the correct end of the buffer to p9stat_read.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-07-27 14:52:04 -05:00
Eric W. Biederman d33002129e sysfs: allow creating symlinks from untagged to tagged directories
Supporting symlinks from untagged to tagged directories is reasonable,
and needed to support CONFIG_SYSFS_DEPRECATED.  So don't fail a prior
allowing that case to work.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26 12:02:41 -07:00
Eric W. Biederman 521d045354 sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories.
This happens for network devices when SYSFS_DEPRECATED is enabled.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26 12:02:41 -07:00
Eric W. Biederman 96d6523adf sysfs: Don't allow the creation of symlinks we can't remove
Recently my tagged sysfs support revealed a flaw in the device core
that a few rare drivers are running into such that we don't always put
network devices in a class subdirectory named net/.

Since we are not creating the class directory the network devices wind
up in a non-tagged directory, but the symlinks to the network devices
from /sys/class/net are in a tagged directory.  All of which works
until we go to remove or rename the symlink.  When we remove or rename
a symlink we look in the namespace of the target of the symlink.
Since the target of the symlink is in a non-tagged sysfs directory we
don't have a namespace to look in, and we fail to remove the symlink.

Detect this problem up front and simply don't create symlinks we won't
be able to remove later.  This prevents symlink leakage and fails in
a much clearer and more understandable way.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-26 12:02:41 -07:00
David Howells 4c0c03ca54 CIFS: Fix a malicious redirect problem in the DNS lookup code
Fix the security problem in the CIFS filesystem DNS lookup code in which a
malicious redirect could be installed by a random user by simply adding a
result record into one of their keyrings with add_key() and then invoking a
CIFS CFS lookup [CVE-2010-2524].

This is done by creating an internal keyring specifically for the caching of
DNS lookups.  To enforce the use of this keyring, the module init routine
creates a set of override credentials with the keyring installed as the thread
keyring and instructs request_key() to only install lookup result keys in that
keyring.

The override is then applied around the call to request_key().

This has some additional benefits when a kernel service uses this module to
request a key:

 (1) The result keys are owned by root, not the user that caused the lookup.

 (2) The result keys don't pop up in the user's keyrings.

 (3) The result keys don't come out of the quota of the user that caused the
     lookup.

The keyring can be viewed as root by doing cat /proc/keys:

2a0ca6c3 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: 1/4

It can then be listed with 'keyctl list' by root.

	# keyctl list 0x2a0ca6c3
	1 key in keyring:
	726766307: --alswrv     0     0 dns_resolver: foo.bar.com

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <smfrench@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-22 09:42:40 -07:00
Linus Torvalds a4ce96ac35 Fix up trivial spelling errors ('taht' -> 'that')
Pointed out by Lucas who found the new one in a comment in
setup_percpu.c. And then I fixed the others that I grepped
for.

Reported-by: Lucas <canolucas@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-21 09:25:42 -07:00
Linus Torvalds e0959371b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: do not include cap/dentry releases in replayed messages
  ceph: reuse request message when replaying against recovering mds
  ceph: fix creation of ipv6 sockets
  ceph: fix parsing of ipv6 addresses
  ceph: fix printing of ipv6 addrs
  ceph: add kfree() to error path
  ceph: fix leak of mon authorizer
  ceph: fix message revocation
2010-07-20 16:27:58 -07:00
Linus Torvalds 620d0be881 Merge branch 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev
* 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
  xfs: track AGs with reclaimable inodes in per-ag radix tree
  xfs: convert inode shrinker to per-filesystem contexts
  mm: add context argument to shrinker callback
2010-07-19 20:18:24 -07:00
Linus Torvalds ee1039307a Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE
  Btrfs: fix CLONE ioctl destination file size expansion to block boundary
  Btrfs: fix split_leaf double split corner case
2010-07-19 19:33:02 -07:00
Dave Chinner 16fd536737 xfs: track AGs with reclaimable inodes in per-ag radix tree
https://bugzilla.kernel.org/show_bug.cgi?id=16348

When the filesystem grows to a large number of allocation groups,
the summing of recalimable inodes gets expensive. In many cases,
most AGs won't have any reclaimable inodes and so we are wasting CPU
time aggregating over these AGs. This is particularly important for
the inode shrinker that gets called frequently under memory
pressure.

To avoid the overhead, track AGs with reclaimable inodes in the
per-ag radix tree so that we can find all the AGs with reclaimable
inodes via a simple gang tag lookup. This involves setting the tag
when the first reclaimable inode is tracked in the AG, and removing
the tag when the last reclaimable inode is removed from the tree.
Then the summation process becomes a loop walking the radix tree
summing AGs with the reclaim tag set.

This significantly reduces the overhead of scanning - a 6400 AG
filesystea now only uses about 25% of a cpu in kswapd while slab
reclaim progresses instead of being permanently stuck at 100% CPU
and making little progress. Clean filesystems filesystems will see
no overhead and the overhead only increases linearly with the number
of dirty AGs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-20 09:43:39 +10:00
Dave Chinner 70e60ce715 xfs: convert inode shrinker to per-filesystem contexts
Now the shrinker passes us a context, wire up a shrinker context per
filesystem. This allows us to remove the global mount list and the
locking problems that introduced. It also means that a shrinker call
does not need to traverse clean filesystems before finding a
filesystem with reclaimable inodes.  This significantly reduces
scanning overhead when lots of filesystems are present.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-20 08:07:02 +10:00
Dan Rosenberg 2ebc346478 Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE
1.  The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check
whether the donor file is append-only before writing to it.

2.  The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer
overflow that allows a user to specify an out-of-bounds range to copy
from the source file (if off + len wraps around).  I haven't been able
to successfully exploit this, but I'd imagine that a clever attacker
could use this to read things he shouldn't.  Even if it's not
exploitable, it couldn't hurt to be safe.

Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
cc: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-19 16:58:20 -04:00
Sage Weil b5384d48f4 Btrfs: fix CLONE ioctl destination file size expansion to block boundary
The CLONE and CLONE_RANGE ioctls round up the range of extents being
cloned to the block size when the range to clone extends to the end of file
(this is always the case with CLONE).  It was then using that offset when
extending the destination file's i_size.  Fix this by not setting i_size
beyond the originally requested ending offset.

This bug was introduced by a22285a6 (2.6.35-rc1).

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-19 16:15:06 -04:00
Chris Mason 99d8f83c98 Btrfs: fix split_leaf double split corner case
split_leaf was not properly balancing leaves when it was forced to
split a leaf twice.  This commit adds an extra push left and right
before forcing the double split in hopes of getting the slot where
we want to insert at either the start or end of the leaf.

If the extra pushes do work, then we are able to avoid splitting twice
and we keep the tree properly balanced.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-19 16:14:50 -04:00
Peter Oberparleiter cffab6bc55 [S390] dasd: use correct label location for diag fba disks
Partition boundary calculation fails for DASD FBA disks under the
following conditions:
- disk is formatted with CMS FORMAT with a blocksize of more than
  512 bytes
- all of the disk is reserved to a single CMS file using CMS RESERVE
- the disk is accessed using the DIAG mode of the DASD driver

Under these circumstances, the partition detection code tries to
read the CMS label block containing partition-relevant information
from logical block offset 1, while it is in fact located at physical
block offset 1.

Fix this problem by using the correct CMS label block location
depending on the device type as determined by the DASD SENSE ID
information.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-07-19 09:22:50 +02:00
Dave Chinner 7f8275d0d6 mm: add context argument to shrinker callback
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-19 14:56:17 +10:00
Linus Torvalds bea9a6d239 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Silence gcc warning in ocfs2_write_zero_page().
  jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
  ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
  ocfs2: Don't duplicate pages past i_size during CoW.
  ocfs2: tighten up strlen() checking
  ocfs2: Make xattr reflink work with new local alloc reservation.
  ocfs2: make xattr extension work with new local alloc reservation.
  ocfs2: Remove the redundant cpu_to_le64.
  ocfs2/dlm: don't access beyond bitmap size
  ocfs2: No need to zero pages past i_size.
  ocfs2: Zero the tail cluster when extending past i_size.
  ocfs2: When zero extending, do it by page.
  ocfs2: Limit default local alloc size within bitmap range.
  ocfs2: Move orphan scan work to ocfs2_wq.
  fs/ocfs2/dlm: Add missing spin_unlock
2010-07-18 10:09:25 -07:00
Joel Becker 5453258d53 ocfs2: Silence gcc warning in ocfs2_write_zero_page().
ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc
doesn't know that.  Set ret=0 just to make gcc happy.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-16 13:33:39 -07:00
Sage Weil e979cf5039 ceph: do not include cap/dentry releases in replayed messages
Strip the cap and dentry releases from replayed messages.  They can
cause the shared state to get out of sync because they were generated
(with the request message) earlier, and no longer reflect the current
client state.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-16 10:30:18 -07:00
Sage Weil 01a92f174f ceph: reuse request message when replaying against recovering mds
Replayed rename operations (after an mds failure/recovery) were broken
because the request paths were regenerated from the dentry names, which
get mangled when d_move() is called.

Instead, resend the previous request message when replaying completed
operations.  Just make sure the REPLAY flag is set and the target ino is
filled in.

This fixes problems with workloads doing renames when the MDS restarts,
where the rename operation appears to succeed, but on mds restart then
fails (leading to client confusion, app breakage, etc.).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-16 10:30:17 -07:00
Jan Kara 13ceef099e jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
      jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
      any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback.  This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 15:17:47 -07:00
Wengang Wang a39953dd95 ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
dlm_migration_can_proceed() for that purpose.  However, if the node is
down, dlm_migration_can_proceed() will also return "go ahead".  In this
rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
the BUG_ON() that trips over this condition.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 10:56:30 -07:00
Tao Ma f5e27b6ddf ocfs2: Don't duplicate pages past i_size during CoW.
During CoW, the pages after i_size don't contain valid data, so there's
no need to read and duplicate them.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 10:54:28 -07:00
Bob Peterson 728a756b8f GFS2: rename causes kernel Oops
This patch fixes a kernel Oops in the GFS2 rename code.

The problem was in the way the gfs2 directory code was trying
to re-use sentinel directory entries.

In the failing case, gfs2's rename function was renaming a
file to another name that had the same non-trivial length.
The file being renamed happened to be the first directory
entry on the leaf block.

First, the rename code (gfs2_rename in ops_inode.c) found the
original directory entry and decided it could do its job by
simply replacing the directory entry with another.  Therefore
it determined correctly that no block allocations were needed.

Next, the rename code deleted the old directory entry prior to
replacing it with the new name.  Therefore, the soon-to-be
replaced directory entry was temporarily made into a directory
entry "sentinel" or a place holder at the start of a leaf block.

Lastly, it went to re-add the replacement directory entry in
that leaf block.  However, when gfs2_dirent_find_space was
looking for space in the leaf block, it used the wrong value
for the sentinel.  That threw off its calculations so later
it decides it can't really re-use the sentinel and therefore
must allocate a new leaf block.  But because it previously decided
to re-use the directory entry, it didn't waste the time to
grab a new block allocation for the inode.  Therefore, the
inode's i_alloc pointer was still NULL and it crashes trying to
reference it.

In the case of sentinel directory entries, the entire dirent is
reused, not just the "free space" portion of it, and therefore
the function gfs2_dirent_find_space should use the value 0
rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.

Fixing this calculation enables the reproducer programs to work
properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-15 09:07:56 +01:00
Abhijith Das 8b4216018b GFS2: BUG in gfs2_adjust_quota
HighMem pages on i686 do not get mapped to the buffer_heads and this was
causing a NULL pointer dereference when we were trying to memset page buffers
to zero.
We now use zero_user() that kmaps the page and directly manipulates page data.
This patch also fixes a boundary condition that was incorrect.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-15 09:07:16 +01:00