WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Eric Paris	ffab83402f	fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:50 -04:00
Eric Paris	cd7752ce7c	fsnotify: fsnotify_obtain_group kzalloc cleanup fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0. This patch just deletes those useless lines. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:50 -04:00
Eric Paris	74be0cc828	fsnotify: remove group_num altogether The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:50 -04:00
Eric Paris	cac69dad32	fsnotify: lock annotation for event replacement fsnotify_replace_event need to lock both the old and the new event. This causes lockdep to get all pissed off since it dosn't know this is safe. It's safe in this case since the new event is impossible to be reached from other places in the kernel. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:50 -04:00
Eric Paris	1201a5361b	fsnotify: replace an event on a list fanotify would like to clone events already on its notification list, make changes to the new event, and then replace the old event on the list with the new event. This patch implements the replace functionality of that process. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:49 -04:00
Eric Paris	b4e4e14073	fsnotify: clone existing events fsnotify_clone_event will take an event, clone it, and return the cloned event to the caller. Since events may be in use by multiple fsnotify groups simultaneously certain event entries (such as the mask) cannot be changed after the event was created. Since fanotify would like to merge events happening on the same file it needs a new clean event to work with so it can change any fields it wishes. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:49 -04:00
Eric Paris	74766bbfa9	fsnotify: per group notification queue merge types inotify only wishes to merge a new event with the last event on the notification fifo. fanotify is willing to merge any events including by means of bitwise OR masks of multiple events together. This patch moves the inotify event merging logic out of the generic fsnotify notification.c and into the inotify code. This allows each use of fsnotify to provide their own merge functionality. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:49 -04:00
Eric Paris	28c60e37f8	fsnotify: send struct file when sending events to parents when possible fanotify needs a path in order to open an fd to the object which changed. Currently notifications to inode's parents are done using only the inode. For some parental notification we have the entire file, send that so fanotify can use it. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:48 -04:00
Eric Paris	2a12a9d781	fsnotify: pass a file instead of an inode to open, read, and write fanotify, the upcoming notification system actually needs a struct path so it can do opens in the context of listeners, and it needs a file so it can get f_flags from the original process. Close was the only operation that already was passing a struct file to the notification hook. This patch passes a file for access, modify, and open as well as they are easily available to these hooks. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:32 -04:00
Eric Paris	8112e2d6a7	fsnotify: include data in should_send calls fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:31 -04:00
Eric Paris	7b0a04fbfb	fsnotify: provide the data type to should_send_event fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:31 -04:00
Eric Paris	d7f0ce4e43	inotify: do not spam console without limit inotify was supposed to have a dmesg printk ratelimitor which would cause inotify to only emit one message per boot. The static bool was never set so it kept firing messages. This patch correctly limits warnings in multiple places. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:31 -04:00
Eric Paris	2dfc1cae4c	inotify: remove inotify in kernel interface nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:31 -04:00
Eric Paris	7050c48826	inotify: do not reuse watch descriptors Prior to 2.6.31 inotify would not reuse watch descriptors until all of them had been used at least once. After the rewrite inotify would reuse watch descriptors. The selinux utility 'restorecond' was found to have problems when watch descriptors were reused. This patch reverts to the pre inotify rewrite behavior to not reuse watch descriptors. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:20 -04:00
Eric Paris	6f3a539e3b	fsnotify: use kmem_cache_zalloc to simplify event initialization fsnotify event initialization is done entry by entry with almost everything set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things that need non-zero initialization. Also means we don't have to change initialization entries based on the config options. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:20 -04:00
Eric Paris	f0553af054	fsnotify: kzalloc fsnotify groups Use kzalloc for fsnotify_groups so that none of the fields can leak any information accidentally. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:20 -04:00
Eric Paris	31ddd3268d	inotify: use container_of instead of casting inotify_free_mark casts directly from an fsnotify_mark_entry to an inotify_inode_mark_entry. This works, but should use container_of instead for future proofing. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:19 -04:00
Eric Paris	b4277d3dd5	fsnotify: use fsnotify_create_event to allocate the q_overflow event Currently fsnotify defines a static fsnotify event which is sent when a group overflows its allotted queue length. This patch just allocates that event from the event cache rather than defining it statically. There is no known reason that the current implementation is wrong, but this makes sure the event is initialized and created like any other. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:19 -04:00
Eric Paris	40554c3dae	fsnotify: allow addition of duplicate fsnotify marks This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:17 -04:00
Eric Paris	9e1c74321d	fsnotify: duplicate fsnotify_mark_entry data between 2 marks Simple copy fsnotify information from one mark to another in preparation for the second mark to replace the first. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:17 -04:00
Eric Paris	b7ba837153	inotify: simplify the inotify idr handling This patch moves all of the idr editing operations into their own idr functions. It makes it easier to prove locking correctness and to to understand the code flow. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:16 -04:00
Latchesar Ionkov	da7ddd3296	9p: Pass the correct end of buffer to p9stat_read Pass the correct end of the buffer to p9stat_read. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-07-27 14:52:04 -05:00
Eric W. Biederman	d33002129e	sysfs: allow creating symlinks from untagged to tagged directories Supporting symlinks from untagged to tagged directories is reasonable, and needed to support CONFIG_SYSFS_DEPRECATED. So don't fail a prior allowing that case to work. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-07-26 12:02:41 -07:00
Eric W. Biederman	521d045354	sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories. This happens for network devices when SYSFS_DEPRECATED is enabled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-07-26 12:02:41 -07:00
Eric W. Biederman	96d6523adf	sysfs: Don't allow the creation of symlinks we can't remove Recently my tagged sysfs support revealed a flaw in the device core that a few rare drivers are running into such that we don't always put network devices in a class subdirectory named net/. Since we are not creating the class directory the network devices wind up in a non-tagged directory, but the symlinks to the network devices from /sys/class/net are in a tagged directory. All of which works until we go to remove or rename the symlink. When we remove or rename a symlink we look in the namespace of the target of the symlink. Since the target of the symlink is in a non-tagged sysfs directory we don't have a namespace to look in, and we fail to remove the symlink. Detect this problem up front and simply don't create symlinks we won't be able to remove later. This prevents symlink leakage and fails in a much clearer and more understandable way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-07-26 12:02:41 -07:00
David Howells	4c0c03ca54	CIFS: Fix a malicious redirect problem in the DNS lookup code Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524]. This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring. The override is then applied around the call to request_key(). This has some additional benefits when a kernel service uses this module to request a key: (1) The result keys are owned by root, not the user that caused the lookup. (2) The result keys don't pop up in the user's keyrings. (3) The result keys don't come out of the quota of the user that caused the lookup. The keyring can be viewed as root by doing cat /proc/keys: 2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4 It can then be listed with 'keyctl list' by root. # keyctl list 0x2a0ca6c3 1 key in keyring: 726766307: --alswrv 0 0 dns_resolver: foo.bar.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <smfrench@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-22 09:42:40 -07:00
Linus Torvalds	a4ce96ac35	Fix up trivial spelling errors ('taht' -> 'that') Pointed out by Lucas who found the new one in a comment in setup_percpu.c. And then I fixed the others that I grepped for. Reported-by: Lucas <canolucas@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-21 09:25:42 -07:00
Linus Torvalds	e0959371b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: do not include cap/dentry releases in replayed messages ceph: reuse request message when replaying against recovering mds ceph: fix creation of ipv6 sockets ceph: fix parsing of ipv6 addresses ceph: fix printing of ipv6 addrs ceph: add kfree() to error path ceph: fix leak of mon authorizer ceph: fix message revocation	2010-07-20 16:27:58 -07:00
Linus Torvalds	620d0be881	Merge branch 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev * 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev: xfs: track AGs with reclaimable inodes in per-ag radix tree xfs: convert inode shrinker to per-filesystem contexts mm: add context argument to shrinker callback	2010-07-19 20:18:24 -07:00
Linus Torvalds	ee1039307a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE Btrfs: fix CLONE ioctl destination file size expansion to block boundary Btrfs: fix split_leaf double split corner case	2010-07-19 19:33:02 -07:00
Dave Chinner	16fd536737	xfs: track AGs with reclaimable inodes in per-ag radix tree https://bugzilla.kernel.org/show_bug.cgi?id=16348 When the filesystem grows to a large number of allocation groups, the summing of recalimable inodes gets expensive. In many cases, most AGs won't have any reclaimable inodes and so we are wasting CPU time aggregating over these AGs. This is particularly important for the inode shrinker that gets called frequently under memory pressure. To avoid the overhead, track AGs with reclaimable inodes in the per-ag radix tree so that we can find all the AGs with reclaimable inodes via a simple gang tag lookup. This involves setting the tag when the first reclaimable inode is tracked in the AG, and removing the tag when the last reclaimable inode is removed from the tree. Then the summation process becomes a loop walking the radix tree summing AGs with the reclaim tag set. This significantly reduces the overhead of scanning - a 6400 AG filesystea now only uses about 25% of a cpu in kswapd while slab reclaim progresses instead of being permanently stuck at 100% CPU and making little progress. Clean filesystems filesystems will see no overhead and the overhead only increases linearly with the number of dirty AGs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-20 09:43:39 +10:00
Dave Chinner	70e60ce715	xfs: convert inode shrinker to per-filesystem contexts Now the shrinker passes us a context, wire up a shrinker context per filesystem. This allows us to remove the global mount list and the locking problems that introduced. It also means that a shrinker call does not need to traverse clean filesystems before finding a filesystem with reclaimable inodes. This significantly reduces scanning overhead when lots of filesystems are present. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-20 08:07:02 +10:00
Dan Rosenberg	2ebc346478	Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:58:20 -04:00
Sage Weil	b5384d48f4	Btrfs: fix CLONE ioctl destination file size expansion to block boundary The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by `a22285a6` (2.6.35-rc1). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:15:06 -04:00
Chris Mason	99d8f83c98	Btrfs: fix split_leaf double split corner case split_leaf was not properly balancing leaves when it was forced to split a leaf twice. This commit adds an extra push left and right before forcing the double split in hopes of getting the slot where we want to insert at either the start or end of the leaf. If the extra pushes do work, then we are able to avoid splitting twice and we keep the tree properly balanced. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:14:50 -04:00
Peter Oberparleiter	cffab6bc55	[S390] dasd: use correct label location for diag fba disks Partition boundary calculation fails for DASD FBA disks under the following conditions: - disk is formatted with CMS FORMAT with a blocksize of more than 512 bytes - all of the disk is reserved to a single CMS file using CMS RESERVE - the disk is accessed using the DIAG mode of the DASD driver Under these circumstances, the partition detection code tries to read the CMS label block containing partition-relevant information from logical block offset 1, while it is in fact located at physical block offset 1. Fix this problem by using the correct CMS label block location depending on the device type as determined by the DASD SENSE ID information. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-07-19 09:22:50 +02:00
Dave Chinner	7f8275d0d6	mm: add context argument to shrinker callback The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-19 14:56:17 +10:00
Linus Torvalds	bea9a6d239	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Silence gcc warning in ocfs2_write_zero_page(). jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node ocfs2: Don't duplicate pages past i_size during CoW. ocfs2: tighten up strlen() checking ocfs2: Make xattr reflink work with new local alloc reservation. ocfs2: make xattr extension work with new local alloc reservation. ocfs2: Remove the redundant cpu_to_le64. ocfs2/dlm: don't access beyond bitmap size ocfs2: No need to zero pages past i_size. ocfs2: Zero the tail cluster when extending past i_size. ocfs2: When zero extending, do it by page. ocfs2: Limit default local alloc size within bitmap range. ocfs2: Move orphan scan work to ocfs2_wq. fs/ocfs2/dlm: Add missing spin_unlock	2010-07-18 10:09:25 -07:00
Joel Becker	5453258d53	ocfs2: Silence gcc warning in ocfs2_write_zero_page(). ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc doesn't know that. Set ret=0 just to make gcc happy. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-16 13:33:39 -07:00
Sage Weil	e979cf5039	ceph: do not include cap/dentry releases in replayed messages Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-16 10:30:18 -07:00
Sage Weil	01a92f174f	ceph: reuse request message when replaying against recovering mds Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-16 10:30:17 -07:00
Jan Kara	13ceef099e	jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 15:17:47 -07:00
Wengang Wang	a39953dd95	ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set before sending DLM_MIG_LOCKRES_MSG message to the target. We are using dlm_migration_can_proceed() for that purpose. However, if the node is down, dlm_migration_can_proceed() will also return "go ahead". In this rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove the BUG_ON() that trips over this condition. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 10:56:30 -07:00
Tao Ma	f5e27b6ddf	ocfs2: Don't duplicate pages past i_size during CoW. During CoW, the pages after i_size don't contain valid data, so there's no need to read and duplicate them. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 10:54:28 -07:00
Bob Peterson	728a756b8f	GFS2: rename causes kernel Oops This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:07:56 +01:00
Abhijith Das	8b4216018b	GFS2: BUG in gfs2_adjust_quota HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:07:16 +01:00
Bob Peterson	b1becbdee7	GFS2: Fix kernel NULL pointer dereference by dlm_astd This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:06:25 +01:00
Bob Peterson	b7dc2df572	GFS2: recovery stuck on transaction lock This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-07-15 09:05:57 +01:00
Bob Peterson	a8bf2bc212	GFS2: O_TRUNC not working on stuffed files across cluster This patch replaces a statement that got dropped out by accident. Without the patch, truncates on stuffed (very small) files cause those files to have an unpredictable size. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:05:17 +01:00
Dan Carpenter	e372357ba5	ocfs2: tighten up strlen() checking This function is only called from one place and it's like this: dlm_register_domain(conn->cc_name, dlm_key, &fs_version); The "conn->cc_name" is 64 characters long. If strlen(conn->cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb->uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:57:53 -07:00
Tao Ma	121a39bb00	ocfs2: Make xattr reflink work with new local alloc reservation. The new reservation code in local alloc has add the limitation that the caller should handle the case that the local alloc doesn't give use enough contiguous clusters. It make the old xattr reflink code broken. So this patch udpate the xattr reflink code so that it can handle the case that local alloc give us one cluster at a time. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:57:50 -07:00
Tao Ma	a78f9f4668	ocfs2: make xattr extension work with new local alloc reservation. The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-12 13:57:24 -07:00
Tao Ma	0a463b74e7	ocfs2: Remove the redundant cpu_to_le64. In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno. But actually bg->bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:56:18 -07:00
Wengang Wang	f471c9df92	ocfs2/dlm: don't access beyond bitmap size dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:56:14 -07:00
Joel Becker	693c241a5f	ocfs2: No need to zero pages past i_size. When ocfs2 fills a hole, it does so by allocating clusters. When a cluster is larger than the write, ocfs2 must zero the portions of the cluster outside of the write. If the clustersize is smaller than a pagecache page, this is handled by the normal pagecache mechanisms, but when the clustersize is larger than a page, ocfs2's write code will zero the pages adjacent to the write. This makes sure the entire cluster is zeroed correctly. Currently ocfs2 behaves exactly the same when writing past i_size. However, this means ocfs2 is writing zeroed pages for portions of a new cluster that are beyond i_size. The page writeback code isn't expecting this. It treats all pages past the one containing i_size as left behind due to a previous truncate operation. Thankfully, ocfs2 calculates the number of pages it will be working on up front. The rest of the write code merely honors the original calculation. We can simply trim the number of pages to only cover the actual file data. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-12 13:55:27 -07:00
Sage Weil	f91d3471cc	ceph: fix creation of ipv6 sockets Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-09 15:00:20 -07:00
Sage Weil	39139f64e1	ceph: fix parsing of ipv6 addresses Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-09 15:00:18 -07:00
Sage Weil	d06dbaf6c2	ceph: fix printing of ipv6 addrs The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-08 16:49:53 -07:00
Joel Becker	5693486bad	ocfs2: Zero the tail cluster when extending past i_size. ocfs2's allocation unit is the cluster. This can be larger than a block or even a memory page. This means that a file may have many blocks in its last extent that are beyond the block containing i_size. There also may be more unwritten extents after that. When ocfs2 grows a file, it zeros the entire cluster in order to ensure future i_size growth will see cleared blocks. Unfortunately, block_write_full_page() drops the pages past i_size. This means that ocfs2 is actually leaking garbage data into the tail end of that last cluster. This is a bug. We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect when a write or truncate is past i_size. They will use ocfs2_zero_extend() to ensure the data is properly zeroed. Older versions of ocfs2_zero_extend() simply zeroed every block between i_size and the zeroing position. This presumes three things: 1) There is allocation for all of these blocks. 2) The extents are not unwritten. 3) The extents are not refcounted. (1) and (2) hold true for non-sparse filesystems, which used to be the only users of ocfs2_zero_extend(). (3) is another bug. Since we're now using ocfs2_zero_extend() for sparse filesystems as well, we teach ocfs2_zero_extend() to check every extent between i_size and the zeroing position. If the extent is unwritten, it is ignored. If it is refcounted, it is CoWed. Then it is zeroed. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-08 13:25:35 -07:00
Joel Becker	a4bfb4cf11	ocfs2: When zero extending, do it by page. ocfs2_zero_extend() does its zeroing block by block, but it calls a function named ocfs2_write_zero_page(). Let's have ocfs2_write_zero_page() handle the page level. From ocfs2_zero_extend()'s perspective, it is now page-at-a-time. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-08 13:24:49 -07:00
Linus Torvalds	c77e9e6826	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: simplify the write back thread queue writeback: split writeback_inodes_wb writeback: remove writeback_inodes_wbc fs-writeback: fix kernel-doc warnings splice: check f_mode for seekable file splice: direct_splice_actor() should not use pos in sd	2010-07-08 08:06:40 -07:00
Dan Carpenter	b0bbb0be8f	ceph: add kfree() to error path We leak a "pi" on this error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-08 08:03:24 -07:00
Linus Torvalds	1cc9629402	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix crush device 'out' threshold to 1.0, not 0.1 ceph: fix caps usage accounting for import (non-reserved) case ceph: only release clean, unused caps with mds requests ceph: fix crush CHOOSE_LEAF when type is already a leaf ceph: fix crush recursion ceph: fix caps debugfs entry ceph: delay umount until all mds requests drop inode+dentry refs ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate ceph: fix crush map update decoding ceph: fix message memory leak, uninitialized variable ceph: fix map handler error path ceph: some endianity fixes	2010-07-06 17:15:15 -07:00
Christoph Hellwig	83ba7b071f	writeback: simplify the write back thread queue First remove items from work_list as soon as we start working on them. This means we don't have to track any pending or visited state and can get rid of all the RCU magic freeing the work items - we can simply free them once the operation has finished. Second use a real completion for tracking synchronous requests - if the caller sets the completion pointer we complete it, otherwise use it as a boolean indicator that we can free the work item directly. Third unify struct wb_writeback_args and struct bdi_work into a single data structure, wb_writeback_work. Previous we set all parameters into a struct wb_writeback_args, copied it into struct bdi_work, copied it again on the stack to use it there. Instead of just allocate one structure dynamically or on the stack and use it all the way through the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:59:53 +02:00
Christoph Hellwig	edadfb10ba	writeback: split writeback_inodes_wb The case where we have a superblock doesn't require a loop here as we scan over all inodes in writeback_sb_inodes. Split it out into a separate helper to make the code simpler. This also allows to get rid of the sb member in struct writeback_control, which was rather out of place there. Also update the comments in writeback_sb_inodes that explain the handling of inodes from wrong superblocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:54:08 +02:00
Christoph Hellwig	9c3a8ee8a1	writeback: remove writeback_inodes_wbc This was just an odd wrapper around writeback_inodes_wb. Removing this also allows to get rid of the bdi member of struct writeback_control which was rather out of place there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:54:03 +02:00
Sage Weil	22b1de06c9	ceph: fix leak of mon authorizer Fix leak of a struct ceph_buffer on umount. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 15:36:49 -07:00
Sage Weil	ed98adad3d	ceph: fix message revocation A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 12:16:23 -07:00
Sage Weil	153a10939e	ceph: fix crush device 'out' threshold to 1.0, not 0.1 Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in). Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 09:44:17 -07:00
Linus Torvalds	e3668dd83b	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: remove block number from inode lookup code xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED xfs: validate untrusted inode numbers during lookup xfs: always use iget in bulkstat xfs: prevent swapext from operating on write-only files	2010-07-04 20:13:31 -07:00
Randy Dunlap	06d738fa91	fs-writeback: fix kernel-doc warnings Fix kernel-doc to match the function's changed args. Warning(fs/fs-writeback.c:190): No description found for parameter 'args' Warning(fs/fs-writeback.c:190): Excess function parameter 'sb' description in 'bdi_queue_work_onstack' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-01 08:26:34 +02:00
Changli Gao	19c9a49b43	splice: check f_mode for seekable file check f_mode for seekable file As a seekable file is allowed without a llseek function, so the old way isn't work any more. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> ---- fs/splice.c \| 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-30 08:12:37 +02:00
Changli Gao	2cb4b05e76	splice: direct_splice_actor() should not use pos in sd direct_splice_actor() shouldn't use sd->pos, as sd->pos is for file reading, file->f_pos should be used instead. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> ---- fs/splice.c \| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-30 08:12:37 +02:00
Andrew Morton	f4985dc714	fs/fcntl.c:kill_fasync_rcu() fa_lock must be IRQ-safe Fix a lockdep-splat-causing regression introduced by commit `989a297920` ("fasync: RCU and fine grained locking"). kill_fasync() can be called from both process and hard-irq context, so fa_lock must be taken with IRQs disabled. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16230 Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:32 -07:00
Lubomir Rintel	46c23d7f52	sysvfs: fix NULL deref. when allocating new inode A call to sysv_write_inode() in sysv_new_inode() to its new interface that replaced wait flag with writeback structure. This was broken by `a9185b41a4` ("pass writeback_control to ->write_inode"). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.34.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:32 -07:00
Mike Frysinger	2952095c6b	flat: tweak default stack alignment The recent commit `1f0ce8b3dd` ("mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h>") which moved the ARCH_SLAB_MINALIGN default into the global header inadvertently broke FLAT for a bunch of systems. Blackfin systems now fail on any FLAT exec with: Unable to read code+data+bss, errno 14 When your /init is a FLAT binary, obviously this can be annoying ;). This stems from the alignment usage in the FLAT loader. The behavior before was that FLAT would default to ARCH_SLAB_MINALIGN only if it was defined, and this was only defined by arches when they wanted a larger alignment value. Otherwise it'd default to pointer alignment. Arguably, this is kind of hokey that the FLAT is semi-abusing defines it shouldn't. So let's merge the two alignment requirements so the floor is never 0. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: David McCullough <davidm@snapgear.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:31 -07:00
Mike Frysinger	3c26c9d959	nommu: add '[stack]' label to /proc/pid/maps output Add support to the NOMMU /proc/pid/maps file to show which mapping is the stack of the original thread after execve. This is largely based on the MMU code. Subsidiary thread stacks are not indicated. For FDPIC, we now get: root:/> cat /proc/self/maps 02064000-02067ccc rw-p 0004d000 00:01 22 /bin/busybox 0206e000-0206f35c rw-p 00006000 00:01 295 /lib/ld-uClibc.so.0 025f0000-025f6f0c r-xs 00000000 00:01 295 /lib/ld-uClibc.so.0 02680000-026ba6b0 r-xs 00000000 00:01 297 /lib/libc.so.0 02700000-0274d384 r-xs 00000000 00:01 22 /bin/busybox 02816000-02817000 rw-p 00000000 00:00 0 02848000-0284c0d8 rw-p 00000000 00:00 0 02860000-02880000 rw-p 00000000 00:00 0 [stack] The semi-downside here is that for FLAT, we get: root:/> cat /proc/155/maps 029f0000-029f9000 rwxp 00000000 00:00 0 [stack] The reason being that FLAT combines a whole lot of stuff into one map (including the stack). But this isn't any worse than the current output (which is nothing), so screw it. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:30 -07:00
Linus Torvalds	984bc9601f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Don't count_vm_events for discard bio in submit_bio. cfq: fix recursive call in cfq_blkiocg_update_completion_stats() cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n cfq: Don't allow queue merges for queues that have no process references block: fix DISCARD_BARRIER requests cciss: set SCSI max cmd len to 16, as default is wrong cpqarray: fix two more wrong section type cpqarray: fix wrong __init type on pci probe function drbd: Fixed a race between disk-attach and unexpected state changes writeback: fix pin_sb_for_writeback writeback: add missing requeue_io in writeback_inodes_wb writeback: simplify and split bdi_start_writeback writeback: simplify wakeup_flusher_threads writeback: fix writeback_inodes_wb from writeback_inodes_sb writeback: enforce s_umount locking in writeback_inodes_sb writeback: queue work on stack in writeback_inodes_sb writeback: fix writeback completion notifications	2010-06-29 10:42:52 -07:00
npiggin@suse.de	57439f878a	fs: fix superblock iteration race list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. `6754af6` introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 10:38:22 -07:00
Sage Weil	443b3760a0	ceph: fix caps usage accounting for import (non-reserved) case We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-29 09:31:56 -07:00
Sage Weil	ec97f88ba6	ceph: only release clean, unused caps with mds requests We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, we it will revoke and we will flush in the normal fashion. This fixes a possibly loss of metadata. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-29 09:31:55 -07:00
Tejun Heo	327f935a9e	ocfs2: update gfp/slab.h includes Implicit slab.h inclusion via percpu.h is about to go away. Make sure gfp.h or slab.h is included as necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2010-06-28 10:19:19 +10:00
Linus Torvalds	e7865c234f	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: Fix an embarassing typo in encode_attrs() NFSv4: Ensure that /proc/self/mountinfo displays the minor version number NFSv4.1: Ensure that we initialise the session when following a referral SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir() nfs4 use mandatory attribute file type in nfs4_get_root	2010-06-27 09:04:02 -07:00
Linus Torvalds	55982d9400	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: update ctime when changing the file's permission by setfacl ext2: update ctime when changing the file's permission by setfacl	2010-06-27 07:50:47 -07:00
Linus Torvalds	be1d29f59c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: MAINTAINERS: change mailing list address for CIFS cifs: remove bogus first_time check in NTLMv2 session setup code cifs: don't call cifs_new_fileinfo unless cifs_open succeeds cifs: don't ignore cifs_posix_open_inode_helper return value cifs: clean up arguments to cifs_open_inode_helper cifs: pass instantiated filp back after open call cifs: move cifs_new_fileinfo call out of cifs_posix_open cifs: implement drop_inode superblock op cifs: don't attempt busy-file rename unless it's in same directory	2010-06-27 07:34:02 -07:00
Miao Xie	30e2bab2d6	ext3: update ctime when changing the file's permission by setfacl ext3 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext3 must update it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-06-25 01:20:37 +02:00
Jan Kara	523825bc58	ext2: update ctime when changing the file's permission by setfacl ext2 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext2 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. Signed-off-by: Jan Kara <jack@suse.cz>	2010-06-25 01:20:37 +02:00
Sage Weil	a1a31e7342	ceph: fix crush CHOOSE_LEAF when type is already a leaf We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 12:58:14 -07:00
Sage Weil	55bda7aacd	ceph: fix crush recursion There was a longstanding problem with recursion through intervening bucket types on complex hierarchies. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 12:55:48 -07:00
Yehuda Sadeh	bfaf148eb2	ceph: fix caps debugfs entry The ceph client structure was not set correctly. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 09:47:36 -07:00
Dave Chinner	7b6259e7a8	xfs: remove block number from inode lookup code The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:35:17 +10:00
Dave Chinner	1920779e67	xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:47 +10:00
Dave Chinner	7124fe0a5b	xfs: validate untrusted inode numbers during lookup When we decode a handle or do a bulkstat lookup, we are using an inode number we cannot trust to be valid. If we are deleting inode chunks from disk (default noikeep mode), then we cannot trust the on disk inode buffer for any given inode number to correctly reflect whether the inode has been unlinked as the di_mode nor the generation number may have been updated on disk. This is due to the fact that when we delete an inode chunk, we do not write the clusters back to disk when they are removed - instead we mark them stale to avoid them being written back potentially over the top of something that has been subsequently allocated at that location. The result is that we can have locations of disk that look like they contain valid inodes but in reality do not. Hence we cannot simply convert the inode number to a block number and read the location from disk to determine if the inode is valid or not. As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the inode up in the inode allocation btree to determine if the inode number is valid or not. It should be noted even on ikeep filesystems, there is the possibility that blocks on disk may look like valid inode clusters. e.g. if there are filesystem images hosted on the filesystem. Hence even for ikeep filesystems we really need to validate that the inode number is valid before issuing the inode buffer read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:33 +10:00
Christoph Hellwig	7dce11dbac	xfs: always use iget in bulkstat The non-coherent bulkstat versionsthat look directly at the inode buffers causes various problems with performance optimizations that make increased use of just logging inodes. This patch makes bulkstat always use iget, which should be fast enough for normal use with the radix-tree based inode cache introduced a while ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-06-23 18:11:11 +10:00
Dan Rosenberg	1817176a86	xfs: prevent swapext from operating on write-only files This patch prevents user "foo" from using the SWAPEXT ioctl to swap a write-only file owned by user "bar" into a file owned by "foo" and subsequently reading it. It does so by checking that the file descriptors passed to the ioctl are also opened for reading. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 12:07:47 +10:00
Trond Myklebust	d3f6baaa34	NFSv4: Fix an embarassing typo in encode_attrs() Apparently, we have never been able to set the atime correctly from the NFSv4 client. Reported-by: 小倉一夫 <ka-ogura@bd6.so-net.ne.jp> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:22:54 -04:00
Trond Myklebust	0be8189f2c	NFSv4: Ensure that /proc/self/mountinfo displays the minor version number Currently, we do not display the minor version mount parameter in the /proc mount info. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:22:53 -04:00
Trond Myklebust	44950b67a6	NFSv4.1: Ensure that we initialise the session when following a referral Put the code that is common to both the referral and ordinary mount cases into a common helper routine. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:22:53 -04:00
Andy Adamson	f799bdb355	nfs4 use mandatory attribute file type in nfs4_get_root S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits, so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:17:43 -04:00
Sage Weil	17c688c3df	ceph: delay umount until all mds requests drop inode+dentry refs This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-21 16:11:50 -07:00

1 2 3 4 5 ...

18593 Коммитов