WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Linus Torvalds	267aeb6c14	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Fix double page_unlock BUG in write_begin/end	2010-10-09 12:03:23 -07:00
Sunil Mushran	4d94aa1b1d	ocfs2/cluster: Bump up dlm protocol to version 1.1 dlm protocol 1.1. activates messages DLM_QUERY_REGION and DLM_QUERY_NODEINFO that are a must for global heartbeat. It also activates o2hb_global_heartbeat_active(). Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:27:04 -07:00
Boaz Harrosh	f17b1f9f1a	exofs: Fix double page_unlock BUG in write_begin/end This BUG is there since the first submit of the code, but only triggered in last Kernel. It's timing related do to the asynchronous object-creation behaviour of exofs. (Which should be investigated farther) The bug is obvious hence the fixed. Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>	2010-10-08 11:26:54 -04:00
Nicolas Pitre	e4eab08d60	ARM: 6342/1: fix ASLR of PIE executables Since commits `990cb8acf2` and `cc92c28b2d`, it is possible to have full address space layout randomization (ASLR) on ARM. Except that one small change was missing for ASLR of PIE executables. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-10-08 10:02:53 +01:00
Linus Torvalds	5710c2b275	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: properly account for reclaimed inodes	2010-10-07 13:45:26 -07:00
Sage Weil	d91f2438d8	ceph: update issue_seq on cap grant We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:01:50 -07:00
Greg Farnum	21b559de56	ceph: send cap release message early on failed revoke. If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V	bba0cd0e3d	ceph: Update max_len with minimum required size encode_fh on error should update max_len with minimum required size, so that caller can redo the call with the reallocated buffer. This is required with open by handle patch series Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V	92923dcbfc	ceph: Fix return value of encode_fh function encode_fh function should return 255 on error as done by other file system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units and not in bytes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Sage Weil	6bc18876ba	ceph: avoid null deref in osd request error path If we interrupt an osd request, we call __cancel_request, but it wasn't verifying that req->r_osd was non-NULL before dereferencing it. This could cause a crash if osds were flapping and we aborted a request on said osd. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Henry C Chang	936aeb5c4a	ceph: fix list_add usage on unsafe_writes list Fix argument order. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Johannes Weiner	081003fff4	xfs: properly account for reclaimed inodes When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@kernel.org Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-10-06 22:35:48 -05:00
Sunil Mushran	43695d095d	ocfs2/cluster: Show per region heartbeat elapsed time This patch adds a per region debugfs file that shows the elapsed time since the time the o2hb timer was last armed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:09 -07:00
Sunil Mushran	d6aa1c7c9e	ocfs2/cluster: Add mlogs for heartbeat up/down events This patch adds mlogs for o2hb up and down events. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 18:50:50 -07:00
Sunil Mushran	1f28530537	ocfs2/cluster: Create debugfs dir/files for each region This patch creates debugfs directory for each o2hb region and creates files to expose the region number and the per region live node bitmap. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:12 -07:00
Sunil Mushran	a6de013654	ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps This patch prints the bitmaps of live, quorum and failed regions. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:13 -07:00
Sunil Mushran	b1c5ebfbe3	ocfs2/cluster: Maintain bitmap of failed regions In global heartbeat mode, we track the bitmap of regions that have seen heartbeat timeouts. We fence if the number of such regions is greater than or equal to half the number of quorum regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:05:52 -07:00
Sunil Mushran	43182d2a79	ocfs2/cluster: Maintain bitmap of quorum regions o2hb allows online adding of regions. However, a newly added region is not used in quorum calculations unless it has been added on all nodes. This patch tracks a bitmap of such quorum regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:16 -07:00
Sunil Mushran	e7d656baf6	ocfs2/cluster: Track bitmap of live heartbeat regions A heartbeat region becomes live (or active) after a fixed number of (steady) iterations. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:18 -07:00
Sunil Mushran	536f0741f3	ocfs2/cluster: Track number of global heartbeat regions In global heartbeat mode, we have a upper limit for the number of active regions. This patch adds the facility to track the number of active global heartbeat regions and fails to start heartbeat if the number exceeds the maximum. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:03:07 -07:00
Sunil Mushran	823a637ae9	ocfs2/cluster: Maintain live node bitmap per heartbeat region Currently we track a global livenode bitmap that keeps track of all nodes that are heartbeating in all regions. This patch adds the ability to track the livenode bitmap on a per region basis. We will use this facility in a later patch to allow us to withstand the loss of a minority number of regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:21 -07:00
Sunil Mushran	8ca8b0bbd8	ocfs2/cluster: Reorganize o2hb debugfs init o2hb debugfs handling is reorganized to allow for easy expansion. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:01:27 -07:00
Sunil Mushran	0e105d37c2	ocfs2/cluster: Check slots for unconfigured live nodes o2hb currently checks slots for configured nodes only. This patch makes it check the slots for the live nodes too to take care of a race in which a node is removed from the configuration but not from the live map. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:00:16 -07:00
Sunil Mushran	39a298563e	ocfs2/cluster: Print messages when adding/removing nodes Prints messages when the user adds or removes nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:30:17 -07:00
Sunil Mushran	18c50cb0d3	ocfs2/cluster: Print messages when adding/removing heartbeat regions Prints messages when the user adds or removes heartbeat regions in global heartbeat mode. These messages are useful when debugging cluster related issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 18:26:59 -07:00
Sunil Mushran	18cfdf1b1a	ocfs2/dlm: Add message DLM_QUERY_NODEINFO Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all registered nodes. This message is sent if the negotiated dlm protocol is 1.1 or higher. If the information of the joining node does not match that of any existing nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 16:47:03 -07:00
Sunil Mushran	5f3c6d9c61	ocfs2: Print message if user mounts without starting global heartbeat In global heartbeat mode, the heartbeat is started by the user. This patch prints an error if the user attempts to mount a volume without starting the heartbeat. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:29 -07:00
Sunil Mushran	ea2034416b	ocfs2/dlm: Add message DLM_QUERY_REGION Adds new dlm message DLM_QUERY_REGION that sends the names of all active heartbeat regions. This message is only sent in the global heartbeat mode. If the regions in the joining node do not fully match the ones in the active nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:26:23 -07:00
Sunil Mushran	b3c85c4cdf	ocfs2/cluster: Get all heartbeat regions Export function in o2hb to get a list of heartbeat regions. It also adds an upper limit to the length of the heartbeat region name. o2hb_global_heartbeat_active() currently disables global heartbeat. It will be enabled in a later patch after all the code is added. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 14:31:06 -07:00
Sunil Mushran	b1365d0bd1	ocfs2/dlm: Expose dlm_protocol in dlm_state Add dlm_protocol to the list of info shown by the debugfs file, dlm_state. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:34 -07:00
Sunil Mushran	2c442719e9	ocfs2: Add support for heartbeat=global mount option Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 15:23:50 -07:00
Sunil Mushran	98f486f23b	ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_clusterinfo. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:24:46 -07:00
Sunil Mushran	54b5187b5a	ocfs2/cluster: Add heartbeat mode configfs parameter Add heartbeat mode parameter to the configfs tree. This will be used to set/show the heartbeat mode. The user is free to toggle the mode between local and global as long as there is no active heartbeat region. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 15:26:08 -07:00
Linus Torvalds	089eed29b4	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: always use sb->s_bdi for writeback purposes	2010-10-06 11:11:18 -07:00
Linus Torvalds	8fe9793af0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Initialize total_len in fuse_retrieve()	2010-10-06 09:50:41 -07:00
Steven Whitehouse	134669854e	GFS2: Fix type mapping for demote_rq interface Mostly the glock operations follow the type of the glock. The one exception is the transaction glock, so we need to check for that directly. Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-10-06 09:58:44 +01:00
Christoph Hellwig	aaead25b95	writeback: always use sb->s_bdi for writeback purposes We currently use struct backing_dev_info for various different purposes. Originally it was introduced to describe a backing device which includes an unplug and congestion function and various bits of readahead information and VM-relevant flags. We're also using for tracking dirty inodes for writeback. To make writeback properly find all inodes we need to only access the per-filesystem backing_device pointed to by the superblock in ->s_bdi inside the writeback code, and not the instances pointeded to by inode->i_mapping->backing_dev which can be overriden by special devices or might not be set at all by some filesystems. Long term we should split out the writeback-relevant bits of struct backing_device_info (which includes more than the current bdi_writeback) and only point to it from the superblock while leaving the traditional backing device as a separate structure that can be overriden by devices. The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-04 14:25:33 +02:00
Geert Uytterhoeven	0157443c56	fuse: Initialize total_len in fuse_retrieve() fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this function Initialize total_len to zero, else its value will be undefined. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-10-04 10:45:32 +02:00
Linus Torvalds	c6ea21e35b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: prevent infinite recursion in cifs_reconnect_tcon cifs: set backing_dev_info on new S_ISREG inodes	2010-10-01 15:03:37 -07:00
Frederic Weisbecker	9d8117e72b	reiserfs: fix unwanted reiserfs lock recursion Prevent from recursively locking the reiserfs lock in reiserfs_unpack() because we may call journal_begin() that requires the lock to be taken only once, otherwise it won't be able to release the lock while taking other mutexes, ending up in inverted dependencies between the journal mutex and the reiserfs lock for example. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35.4.4a #3 ------------------------------------------------------- lilo/1620 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs] [<c10b6a20>] do_remount_sb+0x60/0x110 [<c10cee25>] do_mount+0x625/0x790 [<c10cf014>] sys_mount+0x84/0xb0 [<c12fca3d>] syscall_call+0x7/0xb -> #0 (&journal->j_mutex){+.+...}: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb other info that might help us debug this: 2 locks held by lilo/1620: #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs] #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3 Call Trace: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: All since 2.6.32 <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Frederic Weisbecker	3f259d092c	reiserfs: fix dependency inversion between inode and reiserfs mutexes The reiserfs mutex already depends on the inode mutex, so we can't lock the inode mutex in reiserfs_unpack() without using the safe locking API, because reiserfs_unpack() is always called with the reiserfs mutex locked. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35c #13 ------------------------------------------------------- lilo/1606 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs] [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs] [<c10b7d17>] get_sb_bdev+0x117/0x170 [<d0313e21>] get_super_block+0x21/0x30 [reiserfs] [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0 [<c10b7659>] do_kern_mount+0x39/0xe0 [<c10cebe0>] do_mount+0x340/0x790 [<c10cf0b4>] sys_mount+0x84/0xb0 [<c12f25cd>] syscall_call+0x7/0xb -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb other info that might help us debug this: 1 lock held by lilo/1606: #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1606, comm: lilo Not tainted 2.6.35c #13 Call Trace: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.32 and later] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Jiri Olsa	3036e7b490	proc: make /proc/pid/limits world readable Having the limits file world readable will ease the task of system management on systems where root privileges might be restricted. Having admin restricted with root priviledges, he/she could not check other users process' limits. Also it'd align with most of the /proc stat files. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Eugene Teo <eugene@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Jeff Layton	f569599ae7	cifs: prevent infinite recursion in cifs_reconnect_tcon cifs_reconnect_tcon is called from smb_init. After a successful reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo. Those functions also call smb_init. It's possible for the session and tcon reconnect to succeed, and then for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call smb_init and cifs_reconnect_tcon again, ad infinitum... Break the infinite recursion by having those functions use a new smb_init variant that doesn't attempt to perform a reconnect. Reported-and-Tested-by: Michal Suchanek <hramrach@centrum.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-01 17:50:08 +00:00
Christoph Hellwig	40de9a7ceb	hfsplus: fix rename over directories When renaming over a directory we need to use hfsplus_rmdir instead of hfsplus_unlink to evict the victim. This makes sure we properly error out on non-empty directory as required by Posix (BZ #16571), and it also makes sure we do the right thing in case i_nlink will every be set correctly for directories on hfsplus. Reported-by: Vlado Plaga <rechner@vlado-do.de> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 09:12:08 +02:00
Thomas Gleixner	467c3d9cd5	hfsplus: convert tree_lock to mutex tree_lock is used as mutex so make it a mutex. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:46:52 +02:00
Christoph Hellwig	7fcc99f4f2	hfsplus: add missing extent locking in hfsplus_write_inode Most of the extent handling code already does proper SMP locking, but hfsplus_write_inode was calling into hfsplus_ext_write_extent without taking the extents_lock. Fix this by splitting hfsplus_ext_write_extent into an internal helper that expects the lock, and a public interface that first acquires it. Also add a few locking asserts and document the locking rules in hfsplus_fs.h. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:46:31 +02:00
Christoph Hellwig	89755dcace	hfsplus: protect readdir against removals from open_dir_list We already have i_mutex for readdir and the namespace operations that add entries to open_dir_list, the only thing that was missing was the removal in hfsplus_dir_release. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:25 +02:00
Christoph Hellwig	84adede312	hfsplus: use atomic bitops for the superblock flags The flags in the HFS+-specific superlock do get modified during runtime, use atomic bitops to make the modifications SMP safe. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:20 +02:00
Christoph Hellwig	7ac9fb9c2a	hfsplus: add per-superblock lock for volume header updates Lock updates to the mutal fields in the volume header, and document the locing in the hfsplus_sb_info structure. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:08 +02:00
Christoph Hellwig	58a818f532	hfsplus: remove the rsrc_inodes list We never walk the list - the only reason for it is to make the resource fork inodes appear hashed to the writeback code. Borrow a trick from JFS to do that without needing a list head. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:44:02 +02:00
Christoph Hellwig	66e5db05bb	hfsplus: do not cache and write next_alloc We never look at it, nor change the next_alloc field in the superblock. So don't bother caching it or writing it out in hfsplus_sync_fs. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:58 +02:00
Christoph Hellwig	f17c89bfcc	hfsplus: fix error handling in hfsplus_symlink We need to free the inode again on a hfsplus_create_cat failure. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:54 +02:00
Christoph Hellwig	30d3abbec7	hfsplus: merge mknod/mkdir/creat Make hfsplus_mkdir and hfsplus_create call hfsplus_mknod instead of duplicating the code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:50 +02:00
Christoph Hellwig	b5080f77ed	hfsplus: clean up hfsplus_write_inode Add a new hfsplus_system_write_inode for writing the special system inodes and streamline the fastpath write_inode code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:43 +02:00
Christoph Hellwig	fc4fff8210	hfsplus: clean up hfsplus_iget Add a new hfsplus_system_read_inode for reading the special system inodes and streamline the fastpath iget code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:41 +02:00
Christoph Hellwig	6af502de22	hfsplus: fix HFSPLUS_I calling convention HFSPLUS_I doesn't return a pointer to the hfsplus-specific inode information like all other FOO_I macros, but dereference the pointer in a way that made it look like a direct struct derefence. This only works as long as the HFSPLUS_I macro is used directly and prevents us from keepig a local hfsplus_inode_info pointer. Fix the calling convention and introduce a local hip variable in all functions that use it constantly. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:31 +02:00
Christoph Hellwig	dd73a01a30	hfsplus: fix HFSPLUS_SB calling convention HFSPLUS_SB doesn't return a pointer to the hfsplus-specific superblock information like all other FOO_SB macros, but dereference the pointer in a way that made it look like a direct struct derefence. This only works as long as the HFSPLUS_SB macro is used directly and prevents us from keepig a local hfsplus_sb_info pointer. Fix the calling convention and introduce a local sbi variable in all functions that use it constantly. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:42:59 +02:00
Christoph Hellwig	e753a62156	hfsplus: remove BKL from hfsplus_put_super Except for ->put_super the BKL is now gone from HFS, which means it's superflous there too as ->put_super is serialized by the VFS. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:53 +02:00
Christoph Hellwig	a9fdbf8c60	hfsplus: use alloc_mutex in hfsplus_sync_fs Use alloc_mutex to protect hfsplus_sync_fs against itself and concurrent allocations, which allows to get rid of lock_super in hfsplus. Note that most fields in the superblock still aren't protected against concurrent allocations, that will follow later. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:50 +02:00
Christoph Hellwig	40bf48afe9	hfsplus: introduce alloc_mutex Use a new per-sb alloc_mutex instead of abusing i_mutex of the alloc_file to protect block allocations. This gets rid of lockdep nesting warnings and prepares for extending the scope of alloc_mutex. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:39 +02:00
Christoph Hellwig	6333816ade	hfsplus: protect setflags using i_mutex Use i_mutex for protecting against concurrent setflags ioctls like in other filesystems and get rid of the BKL in hfsplus_ioctl. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:35 +02:00
Christoph Hellwig	94744567fe	hfsplus: split hfsplus_ioctl Give each ioctl command a function of it's own. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:31 +02:00
Christoph Hellwig	249e635300	hfsplus: fix BKL leak in hfsplus_ioctl Currenly the HFSPLUS_IOC_EXT2_GETFLAGS case never unlocks the BKL, which can lead to easily reproduced lockups when doing multiple GETFLAGS ioctls. Fix this by only taking the BKL for the HFSPLUS_IOC_EXT2_SETFLAGS case as neither HFSPLUS_IOC_EXT2_GETFLAGS not the default error case needs it. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:27 +02:00
Bob Peterson	46290341cd	GFS2 fatal: filesystem consistency error on rename This patch fixes a GFS2 problem whereby the first rename after a mount can result in a file system consistency error being flagged improperly and cause the file system to withdraw. The problem is that the rename code tries to run the rgrp list with function gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in from disk. The patch makes the rename function hold the rindex glock (as the gfs2_unlink code does today) which reads in the rgrp list if need be. There were a total of three places in the rename code that improperly referenced the rgrp list without the rindex glock and this patch fixes all three. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-30 17:23:03 +01:00
Linus Torvalds	0d4911081c	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Don't walk off the end of fast symlinks.	2010-09-29 20:38:07 -07:00
Joel Becker	1fc8a11786	ocfs2: Don't walk off the end of fast symlinks. ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-09-29 17:33:05 -07:00
Jeff Layton	522440ed55	cifs: set backing_dev_info on new S_ISREG inodes Testing on very recent kernel (2.6.36-rc6) made this warning pop: WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x65/0x70() Hardware name: Dirtiable inode bdi default != sb bdi cifs ...the following patch fixes it and seems to be the obviously correct thing to do for cifs. Cc: stable@kernel.org Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:23:23 +00:00
Steven Whitehouse	feb47ca931	GFS2: Improve journal allocation via sysfs Recently a feature was added to GFS2 to allow journal id allocation via sysfs. This patch builds upon that so that a negative journal id will be treated as an error code to be passed back as the return code from mount. This allows termination of the mount process if there is a failure. Also, the process has been updated so that the kernel will wait for a journal id, even in the "spectator" case. This is required in order to avoid mounting a filesystem in case there is an error while joining the cluster. In the spectator case, 0 is written into the file to indicate that all is well, and that mount should continue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 15:04:18 +01:00
Steven Whitehouse	43f74c1995	GFS2: Add "norecovery" mount option as a synonym for "spectator" XFS supports the "norecovery" mount option which is basically the same as the GFS2 spectator mode. This adds support for "norecovery" as a synonym for spectator mode, which is hopefully a more obvious description of what it actually does. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:24:41 +01:00
Steven Whitehouse	c741c45512	GFS2: Fix spectator umount issue The tests further down the recovery function relating to unlocking the journal need to be updated to match the intial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:20:52 +01:00
Dave Chinner	80168676eb	xfs: force background CIL push under sustained load I have been seeing occasional pauses in transaction throughput up to 30s long under heavy parallel workloads. The only notable thing was that the xfsaild was trying to be active during the pauses, but making no progress. It was running exactly 20 times a second (on the 50ms no-progress backoff), and the number of pushbuf events was constant across this time as well. IOWs, the xfsaild appeared to be stuck on buffers that it could not push out. Further investigation indicated that it was trying to push out inode buffers that were pinned and/or locked. The xfsbufd was also getting woken at the same frequency (by the xfsaild, no doubt) to push out delayed write buffers. The xfsbufd was not making any progress because all the buffers in the delwri queue were pinned. This scan- and-make-no-progress dance went one in the trace for some seconds, before the xfssyncd came along an issued a log force, and then things started going again. However, I noticed something strange about the log force - there were way too many IO's issued. 516 log buffers were written, to be exact. That added up to 129MB of log IO, which got me very interested because it's almost exactly 25% of the size of the log. He delayed logging code is suppose to aggregate the minimum of 25% of the log or 8MB worth of changes before flushing. That's what really puzzled me - why did a log force write 129MB instead of only 8MB? Essentially what has happened is that no CIL pushes had occurred since the previous tail push which cleared out 25% of the log space. That caused all the new transactions to block because there wasn't log space for them, but they kick the xfsaild to push the tail. However, the xfsaild was not making progress because there were buffers it could not lock and flush, and the xfsbufd could not flush them because they were pinned. As a result, both the xfsaild and the xfsbufd could not move the tail of the log forward without the CIL first committing. The cause of the problem was that the background CIL push, which should happen when 8MB of aggregated changes have been committed, is being held off by the concurrent transaction commit load. The background push does a down_write_trylock() which will fail if there is a concurrent transaction commit holding the push lock in read mode. With 8 CPUs all doing transactions as fast as they can, there was enough concurrent transaction commits to hold off the background push until tail-pushing could no longer free log space, and the halt would occur. It should be noted that there is no reason why it would halt at 25% of log space used by a single CIL checkpoint. This bug could definitely violate the "no transaction should be larger than half the log" requirement and hence result in corruption if the system crashed under heavy load. This sort of bug is exactly the reason why delayed logging was tagged as experimental.... The fix is to start blocking background pushes once the threshold has been exceeded. Rework the threshold calculations to keep the amount of log space a CIL checkpoint can use to below that of the AIL push threshold to avoid the problem completely. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-09-29 07:51:03 -05:00
Steven Whitehouse	d594845106	GFS2: Fix compiler warning from previous patch This shouldn't really be required, but gcc can't tell that "al" is only accessed when initialised. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 10:17:47 +01:00
Benjamin Marzinski	bf97b6734e	GFS2: reserve more blocks for transactions Some of the functions in GFS2 were not reserving space in the transaction for the resource group header and the resource groups bitblocks that get added when you do allocation. GFS2 now makes sure to reserve space for the resource group header and either all the bitblocks in the resource group, or one for each block that it may allocate, whichever is smaller using the new gfs2_rg_blocks() inline function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 09:44:24 +01:00
Steven Whitehouse	d0795f9123	GFS2: Fix journal check for spectator mounts When checking journals for spectator mounts, we cannot rely on the journal being locked, whatever its jid might be. This patch ensures that we always get the journal locks when checking journals for a spectator mount. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-27 15:58:11 +01:00
Linus Torvalds	d1f3e68efb	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: o2dlm: force free mles during dlm exit ocfs2: Sync inode flags with ext2. ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. ocfs2: update ctime when changing the file's permission by setfacl ocfs2/net: fix uninitialized ret in o2net_send_message_vec() Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. ocfs2: Fix lockdep warning in reflink. ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.	2010-09-24 14:08:15 -07:00
Steven Whitehouse	c80dbb58f9	GFS2: Remove upgrade mount option This option has never done anything useful. Also at the same time this cleans up the sb checks which are done at mount time. The debug option will be accepted, but ignored in future. Since it didn't do anything, there didn't seem much point in retaining it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-24 09:55:07 +01:00
Srinivas Eeda	5dad6c39d1	o2dlm: force free mles during dlm exit While umounting, a block mle doesn't get freed if dlm is shutdown after master request is received but before assert master. This results in unclean shutdown of dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:53 -07:00
Tao Ma	0000b86202	ocfs2: Sync inode flags with ext2. We sync our inode flags with ext2 and define them by hex values. But actually in commit 3669567(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:49 -07:00
Tao Ma	4a452de4fd	ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. The first time I read the function ocfs2_resmap_resv_bits, I consider about what 'wanted' will be used and consider about the comments. Then I find it is only used if the reservation is empty. ;) So we'd better move it to the parens so that it make the code more readable, what's more, ocfs2_resmap_resv_bits is used so frequently and we should save some cpus. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:47 -07:00
Tao Ma	47dea42379	ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:34 -07:00
Tao Ma	12828061cd	ocfs2: update ctime when changing the file's permission by setfacl In commit `30e2bab`, ext3 fixed it. So change it accordingly in ocfs2. Steps to reproduce: # touch aaa # stat -c %Z aaa 1283760364 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1283760364 Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:21 -07:00
Steven Whitehouse	c2048b003c	GFS2: Remove localcaching mount option This option defaulted to on for lock_nolock mounts and off otherwise. The only function was to avoid the revalidation of dentries. In the cluster case, that is entirely pointless and liable to cause coherency problems. The patch changes the revalidation to depend upon whether the fs is a local or cluster fs (i.e. it follows the existing default behaviour). I very much doubt anybody ever used this option as there is no reason to. Even so we will continue to accept it on the mount command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 14:00:31 +01:00
Steven Whitehouse	f57a024ed2	GFS2: Remove ignore_local_fs mount argument This is been a no-op for a very long time now. I'm pretty sure nobody uses it, but just in case we'll still accept it on the command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 13:41:42 +01:00
KOSAKI Motohiro	1c2499ae87	/proc/pid/smaps: fix dirty pages accounting Currently, /proc/<pid>/smaps has wrong dirty pages accounting. Shared_Dirty and Private_Dirty output only pte dirty pages and ignore PG_dirty page flag. It is difference against documentation, but also inconsistent against Referenced field. (Referenced checks both pte and page flags) This patch fixes it. Test program: large-array.c --------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> char array[110241024*1024L]; int main(void) { memset(array, 1, sizeof(array)); pause(); return 0; } --------------------------------------------------- Test case: 1. run ./large-array 2. cat /proc/`pidof large-array`/smaps 3. swapoff -a 4. cat /proc/`pidof large-array`/smaps again Test result: <before patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 218992 kB <-- showed pages as clean incorrectly Private_Dirty: 829584 kB Referenced: 388364 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB <after patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 1048576 kB <-- fixed Referenced: 388480 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00
Jan Kara	a0c42bac79	aio: do not return ERESTARTSYS as a result of AIO OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00
Arnd Bergmann	c227e69028	/proc/vmcore: fix seeking Commit `73296bc611` ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:38 -07:00
Dan Rosenberg	767b68e969	Prevent freeing uninitialized pointer in compat_do_readv_writev In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit `b83733639a` ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Cc: stable@kernel.org (2.6.35) Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:38 -07:00
Linus Torvalds	b68e9d4581	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends char: Mark /dev/zero and /dev/kmem as not capable of writeback bdi: Initialize noop_backing_dev_info properly cfq-iosched: fix a kernel OOPs when usb key is inserted block: fix blk_rq_map_kern bio direction flag cciss: freeing uninitialized data on error path	2010-09-22 09:12:37 -07:00
Jan Kara	692ebd17c2	bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends Inodes of devices such as /dev/zero can get dirty for example via utime(2) syscall or due to atime update. Backing device of such inodes (zero_bdi, etc.) is however unable to handle dirty inodes and thus __mark_inode_dirty complains. In fact, inode should be rather dirtied against backing device of the filesystem holding it. This is generally a good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes in these pseudofilesystems are referenced from ordinary filesystem inodes and carry mapping with real data of the device. Thus for these inodes we have to use inode->i_mapping->backing_dev_info as we did so far. We distinguish these filesystems by checking whether sb->s_bdi points to a non-trivial backing device or not. Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /. There's a device inode A described by a path "/dev/sdb" on this filesystem. This inode will be dirtied against backing device "8:0" after this patch. bdev filesystem contains block device inode B coupled with our inode A. When someone modifies a page of /dev/sdb, it's B that gets dirtied and the dirtying happens against the backing device "8:16". Thus both inodes get filed to a correct bdi list. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-22 09:48:47 +02:00
Jan Kara	371d217ee1	char: Mark /dev/zero and /dev/kmem as not capable of writeback These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-22 09:48:47 +02:00
Linus Torvalds	19746cad00	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: select CRYPTO ceph: check mapping to determine if FILE_CACHE cap is used ceph: only send one flushsnap per cap_snap per mds session ceph: fix cap_snap and realm split ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap ceph: correctly set 'follows' in flushsnap messages ceph: fix dn offset during readdir_prepopulate ceph: fix file offset wrapping at 4GB on 32-bit archs ceph: fix reconnect encoding for old servers ceph: fix pagelist kunmap tail ceph: fix null pointer deref on anon root dentry release	2010-09-21 11:20:10 -07:00
Steven Whitehouse	8d1235852b	GFS2: Make . and .. qstrs constant Rather than calculating the qstrs for . and .. each time we need them, its better to keep a constant version of these and just refer to them when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2010-09-20 11:21:09 +01:00
Steven Whitehouse	9fa0ea9f26	GFS2: Use new workqueue scheme The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-09-20 11:20:36 +01:00
Steven Whitehouse	1fea7c25a0	GFS2: Update handling of DLM return codes to match reality GFS2's idea of which return codes it needs to handle was based upon those listed in dlm.h. Those didn't cover all the possible codes and listed some which never happen. This updates GFS2 to handle all the codes which can actually be returned from the DLM under various circumstances. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:20:12 +01:00
Steven Whitehouse	7b5e3d5fcf	GFS2: Don't enforce min hold time when two demotes occur in rapid succession Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>	2010-09-20 11:19:50 +01:00
Steven Whitehouse	fe08d5a897	GFS2: Fix whitespace in previous patch Removes the offending space Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:35 +01:00
Benjamin Marzinski	3921120e75	GFS2: fallocate support This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:17 +01:00
Steven Whitehouse	9a3f236d40	GFS2: Add a bug trap in allocation code This adds a check to ensure that if we reach the block allocator that we don't try and proceed if there is no alloc structure hanging off the inode. This should only happen if there is a bug in GFS2. The error return code is distinctive in order that it will be easily spotted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:59 +01:00
Steven Whitehouse	820969f353	GFS2: No longer experimental I think the time has arrvied to remove the experimental tag from GFS2. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:46 +01:00
Steven Whitehouse	a2e0f79939	GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:29 +01:00

1 2 3 4 5 ...

19587 Коммитов