WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
matt mooney	0ccd234ca0	fs: change to new flag variable Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y for cleaner conditional inclusion. Signed-off-by: matt mooney <mfm@muteddisk.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Michal Marek <mmarek@suse.cz>	2011-03-17 14:02:57 +01:00
Jens Axboe	95f28604a6	fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away We don't have proper reference counting for this yet, so we run into cases where the device is pulled and we OOPS on flushing the fs data. This happens even though the dirty inodes have already been migrated to the default_backing_dev_info. Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 11:13:12 +01:00
Martin K. Petersen	a91a2785b2	block: Require subsystems to explicitly allocate bio_set integrity mempool MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snizer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 11:11:05 +01:00
Jens Axboe	82f04ab47e	jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging 'write_op' was still used, even though it was always WRITE_SYNC now. Add plugging around the cases where it submits IO, and flush them before we end up waiting for that IO. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 11:01:52 +01:00
Jens Axboe	65ab80279d	jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging 'write_op' was still used, even though it was always WRITE_SYNC now. Add plugging around the cases where it submits IO, and flush them before we end up waiting for that IO. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 10:56:45 +01:00
Jens Axboe	4ee2491ed8	fs: make fsync_buffers_list() plug It used WRITE_SYNC_PLUG before and potentially submits a batch of IO, so lets enable plugging for this case. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 10:51:40 +01:00
Linus Torvalds	054cfaacf8	Merge branch 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: bury ->get_sb() nfs: switch NFS from ->get_sb() to ->mount() nfs: stop mangling ->mnt_devname on NFS vfs: new superblock methods to override /proc/*/mount{s,info} nfs: nfs_do_{ref,sub}mount() superblock argument is redundant nfs: make nfs_path() work without vfsmount nfs: store devname at disconnected NFS roots nfs: propagate devname to nfs{,4}_get_root()	2011-03-16 19:09:57 -07:00
Linus Torvalds	242e5d06be	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] tioca: Fix assignment from incompatible pointer warnings [IA64] mca.c: Fix cast from integer to pointer warning [IA64] setup.c Typo fix "Architechtuallly" [IA64] Add CONFIG_MISC_DEVICES=y to configs that need it. [IA64] disable interrupts at end of ia64_mca_cpe_int_handler() [IA64] Add DMA_ERROR_CODE define. pstore: fix build warning for unused return value from sysfs_create_file pstore: X86 platform interface using ACPI/APEI/ERST pstore: new filesystem interface to platform persistent storage	2011-03-16 19:01:29 -07:00
Linus Torvalds	f74b944419	Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: That's all, folks fs/locks.c: Remove stale FIXME left over from BKL conversion ipx: remove the BKL appletalk: remove the BKL x25: remove the BKL ufs: remove the BKL hpfs: remove the BKL drivers: remove extraneous includes of smp_lock.h tracing: don't trace the BKL adfs: remove the big kernel lock	2011-03-16 17:21:00 -07:00
Linus Torvalds	a5e6b135bd	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (50 commits) printk: do not mangle valid userspace syslog prefixes efivars: Add Documentation efivars: Expose efivars functionality to external drivers. efivars: Parameterize operations. efivars: Split out variable registration efivars: parameterize efivars efivars: Make efivars bin_attributes dynamic efivars: move efivars globals into struct efivars drivers:misc: ti-st: fix debugging code kref: Fix typo in kref documentation UIO: add PRUSS UIO driver support Fix spelling mistakes in Documentation/zh_CN/SubmittingPatches firmware: Fix unaligned memory accesses in dmi-sysfs firmware: Add documentation for /sys/firmware/dmi firmware: Expose DMI type 15 System Event Log firmware: Break out system_event_log in dmi-sysfs firmware: Basic dmi-sysfs support firmware: Add DMI entry types to the headers Driver core: convert platform_{get,set}_drvdata to static inline functions Translate linux-2.6/Documentation/magic-number.txt into Chinese ...	2011-03-16 15:05:40 -07:00
Theodore Ts'o	688f869ce3	ext4: Initialize fsync transaction ids in ext4_new_inode() When allocating a new inode, we need to make sure i_sync_tid and i_datasync_tid are initialized. Otherwise, one or both of these two values could be left initialized to zero, which could potentially result in BUG_ON in jbd2_journal_commit_transaction. (This could happen by having journal->commit_request getting set to zero, which could wake up the kjournald process even though there is no running transaction, which then causes a BUG_ON via the J_ASSERT(j_ruinning_transaction != NULL) statement. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-03-16 17:16:31 -04:00
Al Viro	1a102ff925	vfs: bury ->get_sb() This is an ex-parrot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	011949811b	nfs: switch NFS from ->get_sb() to ->mount() The last remaining instances of ->get_sb() can be converted ->mount() now - nothing in them uses new vfsmount anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	fd462fb51d	nfs: stop mangling ->mnt_devname on NFS now we can do that - nobody cares about its value anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	c7f404b40a	vfs: new superblock methods to override /proc/*/mount{s,info} a) ->show_devname(m, mnt) - what to put into devname columns in mounts, mountinfo and mountstats b) ->show_path(m, mnt) - what to put into relative path column in mountinfo Leaving those NULL gives old behaviour. NFS switched to using those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	f8ad9c4bae	nfs: nfs_do_{ref,sub}mount() superblock argument is redundant It's always equal to dentry->d_sb Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	b514f872f8	nfs: make nfs_path() work without vfsmount part 3: now we have everything to get nfs_path() just by dentry - just follow to (disconnected) root and pick the rest of the thing there. Start killing propagation of struct vfsmount * on the paths that used to bring it to nfs_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:47:55 -04:00
Al Viro	b1942c5f8c	nfs: store devname at disconnected NFS roots part 2: make sure that disconnected roots have corresponding mnt_devname values stashed into them. Have nfs_get_root() stuff a copy of devname into ->d_fsdata of the found root, provided that it is disconnected. Have ->d_release() free it when dentry goes away. Have the places where NFS uses ->d_fsdata for sillyrename (and that can never* happen to a disconnected root - dentry will be attached to its parent) free old devname copies if they find those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:44:24 -04:00
Al Viro	0d5839ad05	nfs: propagate devname to nfs{,4}_get_root() step 1 of ->mnt_devname fixes: make sure we have the value of devname available in ..._get_root(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:27:04 -04:00
Linus Torvalds	2e270d8422	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fix cdev leak on O_PATH final fput()	2011-03-16 13:26:17 -07:00
Miklos Szeredi	60ed8cf78f	fix cdev leak on O_PATH final fput() __fput doesn't need a cdev_put() for O_PATH handles. Signed-off-by: mszeredi@suse.cz Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:18:39 -04:00
Tony Luck	afe997a183	Pull pstorev4 into release branch	2011-03-16 09:58:31 -07:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Linus Torvalds	3ae2a1ce2e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Don't use _raw version of RCU dereference GFS2: Adding missing unlock_page() GFS2: Update to AIL list locking GFS2: introduce AIL lock GFS2: fix block allocation check for fallocate GFS2: Optimize glock multiple-dequeue code GFS2: Remove potential race in flock code GFS2: Fix glock deallocation race GFS2: quota allows exceeding hard limit GFS2: deallocation performance patch GFS2: panics on quotacheck update GFS2: Improve cluster mmap scalability GFS2: Fix glock queue trace point GFS2: Post-VFS scale update for RCU path walk GFS2: Use RCU for glock hash table	2011-03-16 08:58:43 -07:00
Linus Torvalds	26a992dbc2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (46 commits) fs/9p: Make the writeback_fid owned by root fs/9p: Writeback dirty data before setattr fs/9p: call vmtruncate before setattr 9p opeation fs/9p: Properly update inode attributes on link fs/9p: Prevent multiple inclusion of same header fs/9p: Workaround vfs rename rehash bug fs/9p: Mark directory inode invalid for many directory inode operations fs/9p: Add . and .. dentry revalidation flag fs/9p: mark inode attribute invalid on rename, unlink and setattr fs/9p: Add support for marking inode attribute invalid fs/9p: Initialize root inode number for dotl fs/9p: Update link count correctly on different file system operations fs/9p: Add drop_inode 9p callback fs/9p: Add direct IO support in cached mode fs/9p: Fix inode i_size update in file_write fs/9p: set default readahead pages in cached mode fs/9p: Move writeback fid to v9fs_inode fs/9p: Add v9fs_inode fs/9p: Don't set stat.st_blocks based on nrpages fs/9p: Add inode hashing ...	2011-03-16 08:58:09 -07:00
Linus Torvalds	bd2895eead	Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER	2011-03-16 08:20:19 -07:00
Mi Jinlong	d2b217439f	nfs41: make sure nfs server return right ca_maxresponsesize_cached According to rfc5661, ca_maxresponsesize_cached: Like ca_maxresponsesize, but the maximum size of a reply that will be stored in the reply cache (Section 2.10.6.1). For each channel, the server MAY decrease this value, but MUST NOT increase it. the latest kernel(2.6.38-rc8) may increase the value for ignoring request's ca_maxresponsesize_cached value. We should not ignore it. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-16 11:10:22 -04:00
Linus Torvalds	34d211a2d5	Increase OSF partition limit from 8 to 18 It turns out that while a maximum of 8 partitions may be what people "should" have had, you can actually fit up to 18 entries() in a sector. And some people clearly were taking advantage of that, like Michael Cree, who had ten partitions on one of his OSF disks. () The OSF partition data starts at byte offset 64 in the first sector, and the array of 16-byte partition entries start at offset 148 in the on-disk partition structure. Reported-by: Michael Cree <mcree@orcon.net.nz> Cc: stable@kernel.org (v2.6.38) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-16 08:04:07 -07:00
Christoph Hellwig	bab1d9444d	prune back iprune_sem iprune_sem is continously giving us lockdep warnings because we do take it in read mode in the reclaim path, but we're also doing non-NOFS allocations under it taken in write mode. Taking a bit deeper look at it I think it's fixable quite trivially: - for invalidate_inodes we do not need iprune_sem at all. We have an active reference on the superblock, so the filesystem is not going away until it has finished. - for evict_inodes we do need it, to make sure prune_icache has done it's work before we tear down the superblock. But there is no reason to hold it over the actual reclaim operation - it's enough to cycle through it after the actual reclaim to make sure we wait for any pending prune_icache to complete. We just have to remove the WARN_ON for otherwise busy inodes as they can actually happen now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 09:56:03 -04:00
Artem Bityutskiy	5d630e4328	UBIFS: clean-up commentaries Clean-up commentaries in debug.h and remove references to non-existing symblols. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:25 +02:00
Artem Bityutskiy	7c83cc91ab	UBIFS: save 128KiB or more RAM When debugging is enabled, we allocate a buffer of PEB size for various debugging purposes. However, now all users of this buffer are gone and we can safely remove it and save 128KiB or more RAM. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:25 +02:00
Artem Bityutskiy	f5cf319cf3	UBIFS: allocate orphans scan buffer on demand Instead of using pre-allocated 'c->dbg->buf' buffer in 'dbg_scan_orphans()', dynamically allocate it when needed. The intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and save 128KiB of RAM (or more if PEB size is larger). Indeed, currently we allocate this memory even if the user never enables any self-check, which is wasteful. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:25 +02:00
Artem Bityutskiy	cab95d446c	UBIFS: allocate lpt dump buffer on demand Instead of using pre-allocated 'c->dbg->buf' buffer in 'dump_lpt_leb()', dynamically allocate it when needed. The intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and save 128KiB of RAM (or more if PEB size is larger). Indeed, currently we allocate this memory even if the user never enables any self-check, which is wasteful. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:25 +02:00
Artem Bityutskiy	6fb324a4b0	UBIFS: allocate ltab checking buffer on demand Instead of using pre-allocated 'c->dbg->buf' buffer in 'dbg_check_ltab_lnum()', dynamically allocate it when needed. The intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and save 128KiB of RAM (or more if PEB size is larger). Indeed, currently we allocate this memory even if the user never enables any self-check, which is wasteful. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:24 +02:00
Artem Bityutskiy	cd5f7485bb	UBIFS: allocate scanning buffer on demand Instead of using pre-allocated 'c->dbg->buf' buffer in 'scan_check_cb()', dynamically allocate it when needed. The intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and save 128KiB of RAM (or more if PEB size is larger). Indeed, currently we allocate this memory even if the user never enables any self-check, which is wasteful. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:24 +02:00
Artem Bityutskiy	73d9aec3fd	UBIFS: allocate dump buffer on demand Instead of using pre-allocated 'c->dbg->buf' buffer in 'dbg_dump_leb()', dynamically allocate it when needed. The intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and save 128KiB of RAM (or more if PEB size is larger). Indeed, currently we allocate this memory even if the user never enables any self-check, which is wasteful. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-16 14:05:24 +02:00
Al Viro	0e794589e5	fix follow_link() breakage commit `574197e0de` had a missing piece, breaking the loop detection ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 04:57:03 -04:00
Phillip Lougher	44cff8a9ee	Squashfs: handle corruption of directory structure Handle the rare case where a directory metadata block is uncompressed and corrupted, leading to a kernel oops in directory scanning (memcpy). Normally corruption is detected at the decompression stage and dealt with then, however, this will not happen if: - metadata isn't compressed (users can optionally request no metadata compression), or - the compressed metadata block was larger than the original, in which case the uncompressed version was used, or - the data was corrupt after decompression This patch fixes this by adding some sanity checks against known maximum values. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-03-16 01:04:18 +00:00
Linus Torvalds	422e6c4bc4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc/<pid>/mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ...	2011-03-15 15:48:13 -07:00
Trond Myklebust	c83ce989cb	VFS: Fix the nfs sillyrename regression in kernel 2.6.38 The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename because the latter relies on being able to determine the parent directory of the dentry in the ->iput() callback in order to send the appropriate unlink rpc call. Looking at the code that cares about races with dput(), there doesn't seem to be anything that specifically uses d_parent as a test for whether or not there is a race: - __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent - shrink_dcache_for_umount() is safe since nothing else can rearrange the dentries in that super block. - have_submount(), select_parent() and d_genocide() can test for a deletion if we set the DCACHE_DISCONNECTED flag when the dentry is removed from the parent's d_subdirs list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org (2.6.38, needs commit `c826cb7dfc` "dcache.c: create helper function for duplicated functionality" ) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-15 15:46:11 -07:00
James Morris	a002951c97	Merge branch 'next' into for-linus	2011-03-16 09:41:17 +11:00
Linus Torvalds	c826cb7dfc	dcache.c: create helper function for duplicated functionality This creates a helper function for he "try to ascend into the parent directory" case, which was written out in triplicate before. With all the locking and subtle sequence number stuff, we really don't want to duplicate that kind of code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-15 15:29:21 -07:00
Al Viro	574197e0de	tidy the trailing symlinks traversal up * pull the handling of current->total_link_count into __do_follow_link() * put the common "do ->put_link() if needed and path_put() the link" stuff into a helper (put_link(nd, link, cookie)) * rename __do_follow_link() to follow_link(), while we are at it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:25 -04:00
Al Viro	b356379a02	Turn resolution of trailing symlinks iterative everywhere The last remaining place (resolution of nested symlink) converted to the loop of the same kind we have in path_lookupat() and path_openat(). Note that we still do have a recursion in pathname resolution; can't avoid it, really. However, it's strictly for nested symlinks now - i.e. ones in the middle of a pathname. link_path_walk() has lost the tail now - it always walks everything except the last component. do_follow_link() renamed to nested_symlink() and moved down. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:25 -04:00
Al Viro	ce0525449d	simplify link_path_walk() tail Now that link_path_walk() is called without LOOKUP_PARENT only from do_follow_link(), we can simplify the checks in last component handling. First of all, checking if we'd arrived to a directory is not needed - the caller will check it anyway. And LOOKUP_FOLLOW is guaranteed to be there, since we only get to that place with nd->depth > 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:25 -04:00
Al Viro	bd92d7fed8	Make trailing symlink resolution in path_lookupat() iterative Now the only caller of link_path_walk() that does not pass LOOKUP_PARENT is do_follow_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:25 -04:00
Al Viro	b21041d0f7	update nd->inode in __do_follow_link() instead of after do_follow_link() ... and note that we only need to do it for LAST_BIND symlinks Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:25 -04:00
Al Viro	ce57dfc179	pull handling of one pathname component into a helper new helper: walk_component(). Handles everything except symlinks; returns negative on error, 0 on success and 1 on symlinks we decided to follow. Drops out of RCU mode on such symlinks. link_path_walk() and do_last() switched to using that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 17:16:20 -04:00
Aneesh Kumar K.V	11a7b371b6	fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH We don't want to allow creation of private hardlinks by different application using the fd passed to them via SCM_RIGHTS. So limit the null relative name usage in linkat syscall to CAP_DAC_READ_SEARCH Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-03-15 17:16:05 -04:00
Sage Weil	09adc80c61	ceph: preserve I_COMPLETE across rename d_move puts the renamed dentry at the end of d_subdirs, screwing with our cached dentry directory offsets. We were just clearing I_COMPLETE to avoid any possibility of trouble. However, assigning the renamed dentry an offset at the end of the directory (to match it's new d_subdirs position) is sufficient to maintain correct behavior and hold onto I_COMPLETE. This is especially important for workloads like rsync, which renames files into place. Before, we would lose I_COMPLETE and do MDS lookups for each file. With this patch we only talk to the MDS on create and rename. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-15 09:14:03 -07:00
Aneesh Kumar K.V	7c9e592e1f	fs/9p: Make the writeback_fid owned by root Changes to make sure writeback fid is owned by root Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:42 -05:00
Aneesh Kumar K.V	3dc5436aa5	fs/9p: Writeback dirty data before setattr change file attribute can result in making the file readonly. So flush the dirty pages before that. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:42 -05:00
Aneesh Kumar K.V	f10fc50f1a	fs/9p: call vmtruncate before setattr 9p opeation We need to call vmtruncate before 9p setattr operation, otherwise we could write back some dirty pages between setattr with ATTR_SIZE and vmtruncate causing some truncated pages to be written back to server Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:42 -05:00
Aneesh Kumar K.V	c06c066a08	fs/9p: Properly update inode attributes on link With caching enabled, we need to make sure we don't update inode->i_size via stat2inode because we could have dirty data which is not yet written to the server Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:42 -05:00
Aneesh Kumar K.V	e0459f57b8	fs/9p: Prevent multiple inclusion of same header Add necessary #ifndef #endif blocks to avoid mulitple inclusion of same headers Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:41 -05:00
Aneesh Kumar K.V	23b08e97f2	fs/9p: Workaround vfs rename rehash bug This is similar to what ceph, ocfs2 and nfs does http://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/18/1498534 May be we should get vfs fixed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:41 -05:00
Aneesh Kumar K.V	d28c61f0e0	fs/9p: Mark directory inode invalid for many directory inode operations One successfull directory operation we would have changed directory inode attribute. So mark them invalid Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:41 -05:00
Aneesh Kumar K.V	823fcfd422	fs/9p: Add . and .. dentry revalidation flag We need to revalidate . and .. entries also Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:41 -05:00
Aneesh Kumar K.V	3bc86de317	fs/9p: mark inode attribute invalid on rename, unlink and setattr rename, unlink and setattr can result in update of inode attribute. So mark the cached copy invalid Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:41 -05:00
Aneesh Kumar K.V	b3cbea03b4	fs/9p: Add support for marking inode attribute invalid With cached mode some of the file system operation result in updating inode attributes (ctime). Add support for marking inode attribute invalid in such cases so that we fetch the updated inode attribute on dentry revalidation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	0e432703aa	fs/9p: Initialize root inode number for dotl Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	b271ec47bc	fs/9p: Update link count correctly on different file system operations Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	edd73cf544	fs/9p: Add drop_inode 9p callback We want to immediately drop the inode in non cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	e959b54901	fs/9p: Add direct IO support in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	fa6ea16160	fs/9p: Fix inode i_size update in file_write Only update inode i_size when we write towards end of file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	6b365604ca	fs/9p: set default readahead pages in cached mode We want to enable readahead in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	6b39f6d22f	fs/9p: Move writeback fid to v9fs_inode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	a78ce05d5d	fs/9p: Add v9fs_inode Switch to the fscache code to v9fs_inode. We will later use v9fs_inode in cache=loose mode to track the inode cache validity timeout. Ie if we find an inode in cache older that a specific jiffie range we will consider it stale Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	a12119087b	fs/9p: Don't set stat.st_blocks based on nrpages simple_getattr does set stat.st_blocks to a value derived from nrpages. That is not correct with 9p Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	5ffc0cb308	fs/9p: Add inode hashing We didn't add the inode to inode hash in 9p. We need to do that to get sync to work, otherwise __mark_inode_dirty will not add the inode to super block's dirty list. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	62d810b424	fs/9p: We need not writeback dirty pages during close Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	00ea2df43e	fs/9p: Implement syncfs call back for 9Pfs FIXME!! what about dotu ? Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	db5841d4a5	fs/9p: Mark file system with MS_SYNCHRONOUS only if it is not cached mode We should not mark file system synchronous if mounted cache=* option Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	a950a65264	fs/9p: Clarify cached dentry delete operation Update the comment to indicate that we don't want to cache negative dentries. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	7263cebed9	fs/9p: Add buffered write support for v9fs. We can now support writeable mmaps. Based on the original patch from Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	3cf387d780	fs/9p: Add fid to inode in cached mode The fid attached to inode will be opened O_RDWR mode and is used for dirty page writeback only. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	17311779ac	fs/9p: Add read write helper function We add read write helper function here which will be used later by the mmap patch Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	2efda7998b	fs/9p: [fscache] wait for page write in cached mode We need to call fscache_wait_on_page_write in launder_page for fscache Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	20656a49ef	fs/9p: increment inode->i_count in cached mode. We need to ihold even in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Aneesh Kumar K.V	46848de024	fs/9p: set fs cache cookie in create path also We need to call v9fs_cache_inode_set_cookie in create path also Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Aneesh Kumar K.V	29236f4e18	fs/9p: set the cached file_operations struct during inode init With the old code we were not setting the file->f_op with cached file operations during creat. (format correction by jvrao@linux.vnet.ibm.com) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Venkateswararao Jujjuri (JV)	6752a1ebd1	[fs/9p] Make access=client default in 9p2000.L protocol Current code sets access=user as default for all protocol versions. This patch chagnes it to "client" only for dotl. User can always specify particular access mode with -o access= option. No change there. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	e782ef7109	[fs/9P] Add posixacl mount option The mount option access=client is overloaded as it assumes acl too. Adding posixacl option to enable POSIX ACLs makes it explicit and clear. Also it is convenient in the future to add other types of acls like richacls. Ideally, the access mode 'client' should be just like V9FS_ACCESS_USER except it underscores the location of access check. Traditional 9P protocol lets the server perform access checks but with this mode, all the access checks will be performed on the client itself. Server just follows the client's directive. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	9332685dff	[fs/9p] Ignore acl mount option when CONFIG_9P_FS_POSIX_ACL is not defined. If the kernel is not compiled with CONFIG_9P_FS_POSIX_ACL and the mount option is specified to enable ACLs current code fails the mount. This patch brings the behavior inline with other filesystems like ext3 by proceeding with the mount and log a warning to syslog. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	d344b0fb72	[fs/9p] Initialze cached acls both in cached/uncached mode. With create/mkdir/mknod in non cached mode we initialize the inode using v9fs_get_inode. v9fs_get_inode doesn't initialize the cache inode value to NULL. This is causing to trip on BUG_ON in v9fs_get_cached_acl. Fix is to initialize acls to NULL and not to leave them in ACL_NOT_CACHED state. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-03-15 09:57:33 -05:00
Venkateswararao Jujjuri (JV)	c61fa0d6d9	[fs/9p] Plug potential acl leak In v9fs_get_acl() if __v9fs_get_acl() gets only one of the dacl/pacl we are not releasing it. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-03-15 09:57:33 -05:00
Boaz Harrosh	a49fb4c3d0	exofs: deprecate the commands pending counter One leftover from the days of IBM's original code, is an SB counter that counts in-flight asynchronous commands. And a piece of code that waits for the counter to reach zero at unmount. I guess it might have been needed then, cause of some reference missing or something. I'm not removing it yet but am putting a warning message if ever this counter triggers at unmount. If I'll never see it triggers or reported I'll remove the counter for good. (I had this print as a debug output for a long time and never had it trigger) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:52 +02:00
Boaz Harrosh	1cea312ad4	exofs: Write sbi->s_nextid as part of the Create command Before when creating a new inode, we'd set the sb->s_dirt flag, and sometime later the system would write out s_nextid as part of the sb_info. Also on inode sync we would force the sb sync as well. Define the s_nextid as a new partition attribute and set it every time we create a new object. At mount we read it from it's new place. We now never set sb->s_dirt anywhere in exofs. write_super is actually never called. The call to exofs_write_super from exofs_put_super is also removed because the VFS always calls ->sync_fs before calling ->put_super twice. To stay backward-and-forward compatible we also write the old s_nextid in the super_block object at unmount, and support zero length attribute on mount. This also fixes a BUG where in layouts when group_width was not a divisor of EXOFS_SUPER_ID (0x10000) the s_nextid was not read from the device it was written to. Because of the sliding window layout trick, and because the read was always done from the 0 device but the write was done via the raid engine that might slide the device view. Now we read and write through the raid engine. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:51 +02:00
Boaz Harrosh	9ed9648431	exofs: Add option to mount by osdname If /dev/osd* devices are shuffled because more devices where added, and/or login order has changed. It is hard to mount the FS you want. Add an option to mount by osdname. osdname is any osd-device's osdname as specified to the mkfs.exofs command when formatting the osd-devices. The new mount format is: OPT="osdname=$UUID0,pid=$PID,_netdev" mount -t exofs -o $OPT $DEV_OSD0 $MOUNTDIR if "osdname=" is specified in options above $DEV_OSD0 is ignored and can be empty. Also while at it: Removed some old unused Opt_* enums. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:51 +02:00
bharrosh@panasas.com	66cd6cad49	exofs: Override read-ahead to align on stripe_size * Set all inode->i_mapping->backing_dev_info to point to the per super-block sb->s_bdi. * Calculating a read_ahead that is: - preferable 2 stripes long (Future patch will add a mount option to override this) - Minimum 128K aligned up to stripe-size - Caped to maximum-IO-sizes round down to stripe_size. (Max sizes are governed by max bio-size that fits in a page times number-of-devices) CC: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:50 +02:00
Nick Piggin	97178b7b6c	exofs: simple fsync race fix It is incorrect to test inode dirty bits without participating in the inode writeback protocol. Inode writeback sets I_SYNC and clears I_DIRTY_?, then writes out the particular bits, then clears I_SYNC when it is done. BTW. it may not completely write all pages out, so I_DIRTY_PAGES would get set again. This is a standard pattern used throughout the kernel's writeback caches (I_SYNC ~= I_WRITEBACK, if that makes it clearer). And so it is not possible to determine an inode's dirty status just by checking I_DIRTY bits. Especially not for the purpose of data integrity syncs. Missing the check for these bits means that fsync can complete while writeback to the inode is underway. Inode writeback functions get this right, so call into them rather than try to shortcut things by testing dirty state improperly. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:50 +02:00
Boaz Harrosh	a8f1418f9e	exofs: Optimize read_4_write Don't attempt a read passed i_size, just zero the page and be done with it. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:02:49 +02:00
Boaz Harrosh	0a935519cc	exofs: Trivial: fix some indentation and debug prints I stumbled on some of these prints in log files so, might just submit the fixes. * All i_ino prints in exofs should be hex * All OSD_ERR prints should end with a "\n" Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-03-15 15:00:27 +02:00
Stephen Rothwell	8f68cd42d8	nfs: BKL is no longer needed, so remove the include Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-15 08:44:35 -04:00
Tobias Klauser	2c722c9a47	exofs: Remove redundant unlikely() IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2011-03-15 12:33:42 +02:00
Steven Whitehouse	7e32d02613	GFS2: Don't use _raw version of RCU dereference As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-03-15 08:58:17 +00:00
Al Viro	326be7b484	Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	65cfc67223	readlinkat(), fchownat() and fstatat() with empty relative pathnames For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	bcda76524c	Allow O_PATH for symlinks At that point we can't do almost nothing with them. They can be opened with O_PATH, we can manipulate such descriptors with dup(), etc. and we can see them in /proc//{fd,fdinfo}/. We can't (and won't be able to) follow /proc//fd/ symlinks for those; there's simply not enough information for pathname resolution to go on from such point - to resolve a symlink we need to know which directory does it live in. We will be able to do useful things with them after the next commit, though - readlinkat() and fchownat() will be possible to use with dfd being an O_PATH-opened symlink and empty relative pathname. Combined with open_by_handle() it'll give us a way to do realink-by-handle and lchown-by-handle without messing with more redundant syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	1abf0c718f	New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does NOT affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f2fa2ffc20	ext4: Copy fs UUID to superblock File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	03cb5f03dc	ext3: Copy fs UUID to superblock. File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	93f1c20bc8	vfs: Export file system uuid via /proc/<pid>/mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f17b604207	fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	aae8a97d3e	fs: Don't allow to create hardlink for deleted file Add inode->i_nlink == 0 check in VFS. Some of the file systems do this internally. A followup patch will remove those instance. This is needed to ensure that with link by handle we don't allow to create hardlink of an unlinked file. The check also prevent a race between unlink and link Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	becfd1f375	vfs: Add open by file handle support [AV: duplicate of open() guts removed; file_open_root() used instead] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	990d6c2d7a	vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:37 -04:00
J. Bruce Fields	0a5e5f122c	nfsd: fix compile error "fs/built-in.o: In function `supported_enctypes_show': nfsctl.c:(.text+0x7beb0): undefined reference to `gss_mech_get_by_name' nfsctl.c:(.text+0x7bebc): undefined reference to `gss_mech_put' " Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-14 20:57:44 -04:00
Al Viro	f52e0c1130	New AT_... flag: AT_EMPTY_PATH For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 19:12:20 -04:00
Linus Torvalds	5f40d42094	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSROOT should default to "proto=udp" nfs4: remove duplicated #include NFSv4: nfs4_state_mark_reclaim_nograce() should be static NFSv4: Fix the setlk error handler NFSv4.1: Fix the handling of the SEQUENCE status bits NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses NFSv4.1 reclaim complete must wait for completion NFSv4: remove duplicate clientid in struct nfs_client NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY sunrpc: Propagate errors from xs_bind() through xs_create_sock() (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid nfs: fix compilation warning nfs: add kmalloc return value check in decode_and_add_ds SUNRPC: Remove resource leak in svc_rdma_send_error() nfs: close NFSv4 COMMIT vs. CLOSE race SUNRPC: Close a race in __rpc_wait_for_completion_task()	2011-03-14 11:19:50 -07:00
Timo Warns	1eafbfeb7b	Fix corrupted OSF partition table parsing The kernel automatically evaluates partition tables of storage devices. The code for evaluating OSF partitions contains a bug that leaks data from kernel heap memory to userspace for certain corrupted OSF partitions. In more detail: for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) { iterates from 0 to d_npartitions - 1, where d_npartitions is read from the partition table without validation and partition is a pointer to an array of at most 8 d_partitions. Add the proper and obvious validation. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org [ Changed the patch trivially to not repeat the whole le16_to_cpu() thing, and to use an explicit constant for the magic value '8' ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-14 10:14:28 -07:00
Maxim	6c474f7bc1	GFS2: Adding missing unlock_page() gfs2_write_begin() calls grab_cache_page_write_begin() that returns locked page. Correspondent error-handling path lacks for unlock_page() call: > out: > if (error == 0) > return 0; > > page_cache_release(page); The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin() failed for some reason. Reported-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V	5fe0c23788	exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	c8b91accfa	clean statfs-like syscalls up New helpers: user_statfs() and fd_statfs(), taking userland pathname and descriptor resp. and filling struct kstatfs. Syscalls of statfs family (native, compat and foreign - osf and hpux on alpha and parisc resp.) switched to those. Removes some boilerplate code, simplifies cleanup on errors... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	73d049a40f	open-style analog of vfs_path_lookup() new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	5b6ca027d8	reduce vfs_path_lookup() to do_path_lookup() New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() and gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	5a18fff209	untangle do_lookup() That thing has devolved into rats nest of gotos; sane use of unlikely() gets rid of that horror and gives much more readable structure: * make a fast attempt to find a dentry; false negatives are OK. In RCU mode if everything went fine, we are done, otherwise just drop out of RCU. If we'd done (RCU) ->d_revalidate() and it had not refused outright (i.e. didn't give us -ECHILD), remember its result. * now we are not in RCU mode and hopefully have a dentry. If we do not, lock parent, do full d_lookup() and if that has not found anything, allocate and call ->lookup(). If we'd done that ->lookup(), remember that dentry is good and we don't need to revalidate it. * now we have a dentry. If it has ->d_revalidate() and we can't skip it, call it. * hopefully dentry is good; if not, either fail (in case of error) or try to invalidate it. If d_invalidate() has succeeded, drop it and retry everything as if original attempt had not found a dentry. * now we can finish it up - deal with mountpoint crossing and automount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	40b39136f0	path_openat: clean ELOOP handling a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	f374ed5fa8	do_last: kill a rudiment of old ->d_revalidate() workaround There used to be time when ->d_revalidate() couldn't return an error. So intents code had lookup_instantiate_filp() stash ERR_PTR(error) in nd->intent.open.filp and had it checked after lookup_hash(), to catch the otherwise silent failures. That had been introduced by commit `4af4c52f34`. These days ->d_revalidate() can and does propagate errors back to callers explicitly, so this check isn't needed anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	6c0d46c493	fold __open_namei_create() and open_will_truncate() into do_last() ... and clean up a bit more Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	ca344a894b	do_last: unify may_open() call and everyting after it We have a bunch of diverging codepaths in do_last(); some of them converge, but the case of having to create a new file duplicates large part of common tail of the rest and exits separately. Massage them so that they could be merged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	9b44f1b392	move may_open() from __open_name_create() to do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	0f9d1a10c3	expand finish_open() in its only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	5a202bcd75	sanitize pathname component hash calculation Lift it to lookup_one_len() and link_path_walk() resp. into the same place where we calculated default hash function of the same name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	6a96ba5441	kill __lookup_one_len() only one caller left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	fe2d35ff0d	switch non-create side of open() to use of do_last() Instead of path_lookupat() doing trailing symlink resolution, use the same scheme as on the O_CREAT side. Walk with LOOKUP_PARENT, then (in do_last()) look the final component up, then either open it or return error or, if it's a symlink, give the symlink back to path_openat() to be resolved there. The really messy complication here is RCU. We don't want to drop out of RCU mode before the final lookup, since we don't want to bounce parent directory ->d_count without a good reason. Result is _not_ pretty; later in the series we'll clean it up. For now we are roughly back where we'd been before the revert done by Nick's series - top-level logics of path_openat() is cleaned up, do_last() does actual opening, symlink resolution is done uniformly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	70e9b35711	get rid of nd->file Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	951361f954	get rid of the last LOOKUP_RCU dependencies in link_path_walk() New helper: terminate_walk(). An error has happened during pathname resolution and we either drop nd->path or terminate RCU, depending the mode we had been in. After that, nd is essentially empty. Switch link_path_walk() to using that for cleanup. Now the top-level logics in link_path_walk() is back to sanity. RCU dependencies are in the lower-level functions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	a7472baba2	make nameidata_dentry_drop_rcu_maybe() always leave RCU mode Now we have do_follow_link() guaranteed to leave without dangling RCU and the next step will get LOOKUP_RCU logics completely out of link_path_walk(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	ef7562d528	make handle_dots() leave RCU mode on error Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	4455ca6223	clear RCU on all failure exits from link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	9856fa1b28	pull handling of . and .. into inlined helper getting LOOKUP_RCU checks out of link_path_walk()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	7bc055d1d5	kill out_dput: in link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	13aab428a7	separate -ESTALE/-ECHILD retries in do_filp_open() from real work new helper: path_openat(). Does what do_filp_open() does, except that it tries only the walk mode (RCU/normal/force revalidation) it had been told to. Both create and non-create branches are using path_lookupat() now. Fixed the double audit_inode() in non-create branch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	47c805dc2d	switch do_filp_open() to struct open_flags take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	c3e380b0b3	Collect "operation mode" arguments of do_last() into a structure No point messing with passing shitloads of "operation mode" arguments to do_open() one by one, especially since they are not going to change during do_filp_open(). Collect them into a struct, fill it and pass to do_last() by reference. Make sure that lookup intent flags are correctly set and removed - we want them for do_last(), but they make no sense for __do_follow_link(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	f1afe9efc8	clean up the failure exits after __do_follow_link() in do_filp_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	36f3b4f690	pull security_inode_follow_link() into __do_follow_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	086e183a64	pull dropping RCU on success of link_path_walk() into path_lookupat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	16c2cd7179	untangle the "need_reval_dot" mess instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	fe479a580d	merge component type recognition no need to do it in three places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	e41f7d4ee5	merge path_init and path_init_rcu Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	ee0827cd6b	sanitize path_walk() mess New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	52094c8a06	take RCU-dependent stuff around exec_permission() into a new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Al Viro	c9c6cac0c2	kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Steven Whitehouse	c618e87a5f	GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>	2011-03-14 12:40:29 +00:00
Al Viro	c44ed965be	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, the compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix, so IMO it's OK for -final. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-13 16:29:07 -07:00
Al Viro	586ce098a2	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-13 19:21:26 -04:00
Linus Torvalds	0e5b88cd99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling	2011-03-13 16:00:49 -07:00
Chris Mason	36e39c40b3	Btrfs: break out of shrink_delalloc earlier Josef had changed shrink_delalloc to exit after three shrink attempts, which wasn't quite enough because new writers could race in and steal free space. But it also fixed deadlocks and stalls as we tried to recover delalloc reservations. The code was tweaked to loop 1024 times, and would reset the counter any time a small amount of progress was made. This was too drastic, and with a lot of writers we can end up stuck in shrink_delalloc forever. The shrink_delalloc loop is fairly complex because the caller is looping too, and the caller will go ahead and force a transaction commit to make sure we reclaim space. This reworks things to exit shrink_delalloc when we've forced some writeback and the delalloc reservations have gone down. This means the writeback has not just started but has also finished at least some of the metadata changes required to reclaim delalloc space. If we've got this wrong, we're returning ENOSPC too early, which is a big improvement over the current behavior of hanging the machine. Test 224 in xfstests hammers on this nicely, and with 1000 writers trying to fill a 1GB drive we get our first ENOSPC at 93% full. The other writers are able to continue until we get 100%. This is a worst case test for btrfs because the 1000 writers are doing small IO, and the small FS size means we don't have a lot of room for metadata chunks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-12 07:08:42 -05:00
Alex Elder	0c9ba97318	xfs: don't name variables "panic" The new xfs_alert_tag() used a variable named "panic", and that is to be avoided. Rename it. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-03-11 16:34:51 -06:00
Rob Landley	c5cb09b6f8	Cleanup: Factor out some cut-and-paste code. Factor out some cut-and-paste code in options parsing. Saves about 800 bytes on x86-64. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Rob Landley	c12bacec45	cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. Eliminate two mostly duplicate functions (nfs_parse_simple_hostname() and nfs_parse_protected_hostname()) and instead just make the calling function (nfs_parse_devname()) do everything. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Konstantin Khlebnikov	7ec10f26e1	NFS: account direct-io into task io accounting Account NFS direct-io reads and writes into Task I/O Accounting. Do it before complition to handle aio. NFS have unusual direct-io implementation, thus accounting in generic code does not work. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	b064eca2cf	NFSv4: Send unmapped uid/gids to the server when using auth_sys The new behaviour is enabled using the new module parameter 'nfs4_disable_idmapping'. Note that if the server rejects an unmapped uid or gid, then the client will automatically switch back to using the idmapper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	3ddeb7c5c6	NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr This will be required in order to switch uid/gid mapping back on if the admin has tried to disable it. Note that we also propagate NFS4ERR_BADNAME at the same time, in order to work around a Linux server bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	e4fd72a17d	NFSv4: cleanup idmapper functions to take an nfs_server argument ...instead of the nfs_client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	f0b851689a	NFSv4: Send unmapped uid/gids to the server if the idmapper fails Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	5cf36cfdc8	NFSv4: If the server sends us a numeric uid/gid then accept it Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Benny Halevy	75247affd7	NFSv4.1: reject zero layout with zeroed stripe unit Allowing stripe_unit==0 causes the client to crash later on when dividing by zero. Reported-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	36fe432d33	NFSv4.1: Clear lseg pointer in ->doio function Now that we have access to the pointer, clear it immediately after the put, instead of in caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	c76069bda0	NFSv4.1: rearrange ->doio args This will make it possible to clear the lseg pointer in the same function as it is put, instead of in the caller nfs_pageio_doio(). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	a69aef1496	NFSv4.1: pnfs filelayout driver write Allows the pnfs filelayout driver to write to the data servers. Note that COMMIT to data servers will be implemented in a future patch. To avoid improper behavior, for the moment any WRITE to a data server that would also require a COMMIT to the data server is sent NFS_FILE_SYNC. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	7ffd10640d	NFSv4.1: remove GETATTR from ds writes Any WRITE compound directed to a data server needs to have the GETATTR calls suppressed. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Andy Adamson	0382b74409	NFSv4.1: implement generic pnfs layer write switch Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	44b83799a9	NFSv4.1: trigger LAYOUTGET for writes Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	5053aa568d	NFSv4.1: Send lseg down into nfs_write_rpcsetup We grab the lseg sent in from the doio function and attach it to each struct nfs_write_data created. This is how the lseg will be sent to the layout driver. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	b029bc9b08	NFSv4.1: add callback to nfs4_write_done Add callback that pnfs layout driver can use to do its own handling of data server WRITE response. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	d138d5d17b	NFSv4.1: rearrange nfs_write_rpcsetup Reorder nfs_write_rpcsetup, preparing for a pnfs entry point. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	568e8c494d	NFSv4.1: turn off pNFS on ds connection failure If a data server is unavailable, go through MDS. Mark the deviceid containing the data server as a negative cache entry. Do not try to connect to any data server on a deviceid marked as a negative cache entry. Mark any layout that tries to use the marked deviceid as failed. Inodes with a layout marked as fails will not use the layout for I/O, and will not perform any more layoutgets. Inodes without a layout will still do layoutget, but the layout will get marked immediately. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Christoph Hellwig	ea8eecdd11	NFSv4.1 move deviceid cache to filelayout driver No need for generic cache with only one user. Keep a simple hash of deviceids in the filelayout driver. Signed-off-by: Christoph Hellwig <hch@infradead.org> Acked-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	cbdabc7f8b	NFSv4.1: filelayout async error handler Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	dc70d7b318	NFSv4.1: filelayout read Attempt a pNFS file layout read by setting up the nfs_read_data struct and calling nfs_initiate_read with the data server rpc client and the filelayout rpc call ops. Error handling is implemented in a subsequent patch. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Fred Isaman	cfe7f4120f	NFSv4.1: filelayout i/o helpers Prepare for filelayout_read_pagelist with helper functions that find the correct data server, filehandle, and offset. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Tigran Mkrtchyan <tigran@anahit.desy.de> Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	d83217c135	NFSv4.1: data server connection Introduce a data server set_client and init session following the nfs4_set_client and nfs4_init_session convention. Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state serializes access to creating an nfs_client struct with matching properties. Use the new nfs_get_client() that initializes new clients. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	64419a9b20	NFSv4.1: generic read Separate the rpc run portion of nfs_read_rpcsetup into a new function nfs_initiate_read that is called for normal NFS I/O. Add a pNFS read_pagelist function that is called instead of nfs_intitate_read for pNFS reads. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	bae724ef95	NFSv4.1: shift pnfs_update_layout locations Move the pnfs_update_layout call location to nfs_pageio_do_add_request(). Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach it to each nfs_read_data so it can be sent to the layout driver. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	94ad1c80e2	NFSv4.1: coelesce across layout stripes Add a pg_test layout driver hook which is used to avoid coelescing I/O across layout stripes. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	d684d2ae10	NFSv4.1: lseg refcounting Prepare put_lseg and get_lseg to be called from the pNFS I/O code. Pull common code from pnfs_lseg_locked to call from pnfs_lseg. Inline pnfs_lseg_locked into it's only caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	94de8b27d0	NFSv4.1: add MDS mount DS only check The DS only role cannot be used to mount. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d6fb79d433	NFSv4.1: new flag for lease time check Data servers cannot send nfs4_proc_get_lease_time. but still need to setup state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate if the lease time can be checked. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d3b4c9d767	NFSv4.1: new flag for state renewal check Data servers not sharing a session with the mount MDS always have an empty cl_superblocks list. Replace the cl_superblocks empty list check to see if it is time to shut down renewd with the NFS_CS_STOP_RENEW bit which is not set by such a data server. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	89d1ea6579	NFSv4.1: send zero stateid seqid on v4.1 i/o Data servers require a zero stateid seqid, and there is no advantage to not doing the same for all NFSv4.1 Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	45a52a0207	NFS move nfs_client initialization into nfs_get_client Now nfs_get_client returns an nfs_client ready to be used no matter if it was found or created. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	bf9c1387ca	NFSv4.1: put_layout_hdr can remove nfsi->layout Prevents an Oops triggered by CB_LAYOUTRECALL and LAYOUTGET race on a pnfs_layout_hdr first pnfs_layout_segment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Fred Isaman	136028967a	NFS: change nfs_writeback_done to return void The return values are not used by any callers. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	83762c56c1	NFS: remove pointless if statement in nfs_direct_write_result The code was doing nothing more in either branch of the if. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	f49f9baac8	pnfs: fix pnfs lock inversion of i_lock and cl_lock The pnfs code was using throughout the lock order i_lock, cl_lock. This conflicts with the nfs delegation code. Rework the pnfs code to avoid taking both locks simultaneously. Currently the code takes the double lock to add/remove the layout to a nfs_client list, while atomically checking that the list of lsegs is empty. To avoid this, we rely on existing serializations. When a layout is initialized with lseg count equal zero, LAYOUTGET's openstateid serialization is in effect, making it safe to assume it stays zero unless we change it. And once a layout's lseg count drops to zero, it is set as DESTROYED and so will stay at zero. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	9f52c2525e	pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag We do not need to clear the NFS_LAYOUT_BULK_RECALL, as setting it guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding io is finished. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	3851172244	pnfs: avoid incorrect use of layout stateid The code could violate the following from RFC5661, section 12.5.3: "Once a client has no more layouts on a file, the layout stateid is no longer valid and MUST NOT be used." This can occur when a layout already has a lseg, starts another non-everlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg is processed before we hit pnfs_layout_process(). Solve by setting, each time the client has no more lsegs for a file, a flag which blocks further use of the layout and triggers its removal. This also fixes a second bug which occurs in the same instance as above. If we actually use pnfs_layout_process, we add the new lseg to the layout, but the layout has been removed from the nfs_client list by the intervening CB_LAYOUTRECALL and will not be added back. Thus the newly acquired lseg will not be properly returned in the event of a subsequent CB_LAYOUTRECALL. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:39 -05:00
Chuck Lever	53d4737580	NFS: NFSROOT should default to "proto=udp" There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <bdowning@lavos.net> bisected to commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit `56463e50`. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. Tested-by: Brian Downing <bdowning@lavos.net> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> # 2.6.37 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:07 -05:00
Huang Weiyi	57df216bd8	nfs4: remove duplicated #include Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:37 -05:00
Trond Myklebust	f9feab1e18	NFSv4: nfs4_state_mark_reclaim_nograce() should be static There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Tracey Dent	bea9312839	jffs2: remove a trailing white space in commentaries Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2011-03-11 14:22:48 +00:00
Dave Chinner	d6a079e82e	GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 11:52:25 +00:00
Benjamin Marzinski	e4a7b7b0c9	GFS2: fix block allocation check for fallocate GFS2 fallocate wasn't properly checking if a blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:26:48 +00:00
Bob Peterson	fa1bbdea30	GFS2: Optimize glock multiple-dequeue code This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:24:54 +00:00
Artem Bityutskiy	2bcf002159	UBIFS: do not check data crc by default Change the default UBIFS behavior WRT data CRC checking. Currently, UBIFS checks data CRC when reading, which slows it down quite a bit, and this is the default option. However, it looks like in average user does not need this feature and would prefer faster read speed over extra reliability. And this seems to be de-facto standard that file-systems do not check data CRC every time they read from the media. Thus, make UBIFS default behavior so that it does not check data CRC. This corresponds to the no_chk_data_crc mount option. Those users who need extra protection can always enable it using the chk_data_crc option. Please, read more information about this feature here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Artem Bityutskiy	cce3f612fe	UBIFS: simplify UBIFS Kconfig menu Remove debug message level and debug checks Kconfig options as they proved to be useless anyway. We have sysfs interface which we can use for fine-grained debugging messages and checks selection, see Documentation/filesystems/ubifs.txt for mode details. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Artem Bityutskiy	6342aaebda	UBIFS: print max. index node size Improve debugging messages by printing the maximum index node size on mount. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Matthew L. Creech	d882962f6a	UBIFS: handle allocation failures in UBIFS write path Running kernel 2.6.37, my PPC-based device occasionally gets an order-2 allocation failure in UBIFS, which causes the root FS to become unwritable: kswapd0: page allocation failure. order:2, mode:0x4050 Call Trace: [c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable) [c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c [c787dd00] [c0061b98] __get_free_pages+0x20/0x50 [c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200 [c787dd50] [c00e82d4] do_writepage+0x94/0x198 [c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c [c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370 [c787de90] [c0068224] shrink_zone+0x2b4/0x2b8 [c787df00] [c0068854] kswapd+0x408/0x5d4 [c787dfb0] [c0037bcc] kthread+0x80/0x84 [c787dff0] [c000ef44] kernel_thread+0x4c/0x68 Similar problems were encountered last April by Tomasz Stanislawski: http://patchwork.ozlabs.org/patch/50965/ This patch implements Artem's suggested fix: fall back to a mutex-protected static buffer, allocated at mount time. I tested it by forcing execution down the failure path, and didn't see any ill effects. Artem: massaged the patch a little, improved it so that we'd not allocate the write reserve buffer when we are in R/O mode. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jovi Zhang	43b7c3f051	nfs: fix compilation warning this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY\|O_CREAT\|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
David Teigland	e43f055a95	dlm: use alloc_workqueue function Replaces deprecated create_singlethread_workqueue(). Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 13:22:34 -06:00
David Teigland	e3853a90e2	dlm: increase default hash table sizes Make all three hash tables a consistent size of 1024 rather than 1024, 512, 256. All three tables, for resources, locks, and lock dir entries, will generally be filled to the same order of magnitude. Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 13:08:22 -06:00
David Teigland	8304d6f24c	dlm: record full callback state Change how callbacks are recorded for locks. Previously, information about multiple callbacks was combined into a couple of variables that indicated what the end result should be. In some situations, we could not tell from this combined state what the exact sequence of callbacks were, and would end up either delivering the callbacks in the wrong order, or suppress redundant callbacks incorrectly. This new approach records all the data for each callback, leaving no uncertainty about what needs to be delivered. Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 10:40:00 -06:00
Miao Xie	7e6b6465e6	btrfs: fix not enough reserved space btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
Daniel J Blueman	b4966b7770	btrfs: fix dip leak The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
J. Bruce Fields	d891eedbc3	fs/dcache: allow d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 05:18:54 -05:00
Marco Stornelli	1ca551c6ca	Check for immutable/append flag in fallocate path In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 04:22:15 -05:00
Al Viro	9177ada99d	fat: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:49 -05:00
Al Viro	8ce84eeb5b	jfs: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:28 -05:00
Al Viro	4714e63731	ocfs2: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:07 -05:00
Al Viro	53fe924161	gfs2: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:48 -05:00
Al Viro	529c5f958f	fuse: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:31 -05:00
Al Viro	0eb980e317	ceph: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:05 -05:00
Al Viro	c78f4cc5e7	reiserfs xattr ->d_revalidate() shouldn't care about RCU ... it returns an error unconditionally Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:42:01 -05:00
Al Viro	ae50adcb0a	/proc/self is never going to be invalidated... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:41:53 -05:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	721a9602e6	block: kill off REQ_UNPLUG With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Jens Axboe	cf15900e12	aio: remove request submission batching This should be useless now that we have on-stack plugging. So lets just kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Shaohua Li	9f5b942546	fs: make aio plug Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Jens Axboe	2ed1a6bcf9	fs: make mpage read/write_pages() plug Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:26 +01:00
Jens Axboe	7eaceaccab	block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:07 +01:00
Linus Torvalds	3979491701	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: wrong index used in inner loop nfsd4: fix bad pointer on failure to find delegation NFSD: fix decode_cb_sequence4resok	2011-03-09 14:52:09 -08:00
Linus Torvalds	78833dd706	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: nd->inode is not set on the second attempt in path_walk() unfuck proc_sysctl ->d_compare() minimal fix for do_filp_open() race	2011-03-09 13:55:51 -08:00
Tejun Heo	69e02c59a7	block: Don't check events while open is in progress Not all block drivers clear events immediately after reporting. Some do so in ->revalidate_disk() or other steps during ->open(). There is a slim chance event poll may happen between the clearing event check from check_disk_change() and the actual clearing of the events which would result in spurious events. Block event checks while block device open is in progress. There is no need to kick explicit event check afterwards as events are always checked during open. -v2: The original patch could have called disk_unblock_events() with an already released or %NULL @disk causing oops. Fixed by making sure references are put after disk_unblock_events() is called. It also makes the error path of __blkdev_get() a bit simpler. This problem was reported by Jens. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:27 +01:00
Tejun Heo	6936217cc7	block: Don't check events on close unless it was blocked The block event mechanism currently always checks events when the device is being closed regardless of the open mode. The intention was to allow detection of EJECT_REQUEST when a device is closed whether disk event polling is enabled or not. This is unnecessary as, for devices of interest, events are checked from either userland or kernel and in the former case ->check_events() is performed on open of each poll attempt anyway. Furthermore, this unconditional event check on close makes the code susceptible to event loop if the block driver doesn't clear reported events correctly - an event triggers userland to open and close the device which in turn causes another event, rinse and repeat. Check events on close only if it was blocked by excl write open. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:27 +01:00
Tejun Heo	facc31ddc3	block: Don't implicitly trigger event check on disk_unblock_events() Currently, disk_unblock_events() implicitly kick event check if the block count reaches zero. This behavior is not described in the comment and hinders with future changes. Make the unblocker explicitly check events by calling disk_check_events() as necessary. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:27 +01:00
Christoph Hellwig	ecb6928fcf	xfs: factor agf counter updates into a helper Updating the AGF and transactions counters is duplicated between allocating and freeing extents. Factor the code into a common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-03-09 08:23:47 -06:00
Christoph Hellwig	86fa8af69d	xfs: clean up the xfs_alloc_compute_aligned calling convention Pass a xfs_alloc_arg structure to xfs_alloc_compute_aligned and derive the alignment and minlen paramters from it. This cleans up the existing callers, and we'll need even more information from the xfs_alloc_arg in subsequent patches. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-03-09 08:23:33 -06:00
Steven Whitehouse	0a33443b38	GFS2: Remove potential race in flock code This patch ensures that we always wait for glock demotion when dropping flocks on a file in order to prevent any race conditions associated with further flock calls or closing the file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 11:14:32 +00:00
Steven Whitehouse	fc0e38dae6	GFS2: Fix glock deallocation race This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 10:58:04 +00:00
Abhijith Das	662e3a551b	GFS2: quota allows exceeding hard limit Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 09:32:44 +00:00
Ryusuke Konishi	e3154e9748	nilfs2: get rid of nilfs_sb_info structure This directly uses sb->s_fs_info to keep a nilfs filesystem object and fully removes the intermediate nilfs_sb_info structure. With this change, the hierarchy of on-memory structures of nilfs will be simplified as follows: Before: super_block -> nilfs_sb_info -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) After: super_block -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) The reason why we didn't design so from the beginning is because the initial shape also differed from the above. The early hierachy was composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs object. On the kernel 2.6.37, it was changed to the current shape in order to unify super block instances into one per device, and this cleanup became applicable as the result. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:54:26 +09:00
Al Viro	b306419ae0	nd->inode is not set on the second attempt in path_walk() We leave it at whatever it had been pointing to after the first link_path_walk() had failed with -ESTALE. Things do not work well after that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-08 21:16:28 -05:00
Ryusuke Konishi	f7545144c2	nilfs2: use sb instance instead of nilfs_sb_info struct This replaces sbi uses with direct reference to sb instance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	d96bbfa28a	nilfs2: get rid of sc_sbi back pointer Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct from log writer object (nilfs_sc_info). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	3fd3fe5aea	nilfs2: move log writer onto nilfs object Log writer is held by the nilfs_sb_info structure. This moves it into nilfs object and replaces all uses of NILFS_SC() accessor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	9b1fc4e497	nilfs2: move next generation counter into nilfs object Moves s_next_generation counter and a spinlock protecting it to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	693dd32122	nilfs2: move s_inode_lock and s_dirty_files into nilfs object Moves s_inode_lock spinlock and s_dirty_files list to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
Ryusuke Konishi	574e6c3145	nilfs2: move parameters on nilfs_sb_info into nilfs object This moves four parameter variables on nilfs_sb_info s_resuid, s_resgid, s_interval and s_watermark to the nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
Ryusuke Konishi	3b2ce58b0f	nilfs2: move mount options to nilfs object This moves mount_opt local variable to nilfs object from nilfs_sb_info struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
roel	3ec07aa952	nfsd: wrong index used in inner loop Index i was already used in the outer loop Cc: stable@kernel.org Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-08 19:46:10 -05:00
J. Bruce Fields	0997b17360	nfsd4: fix struct file leak Make sure we properly reference count the struct files that a lock depends on, and release them when the lock stateid is released. This fixes a major leak of struct files when using locking over nfsv4. Cc: stable@kernel.org Reported-by: Rick Koshi <nfs-bug-report@more-right-rudder.com> Tested-by: Ivo Přikryl <prikryl@eurosat.cz> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-08 19:38:27 -05:00
J. Bruce Fields	529d7b2a7f	nfsd4: minor nfs4state.c reshuffling Minor cleanup in preparation for a bugfix--moving some code to avoid forward references, etc. No change in functionality. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-08 19:38:15 -05:00
Chris Mason	ea8efc74bd	Btrfs: make sure not to return overlapping extents to fiemap The btrfs fiemap code was incorrectly returning duplicate or overlapping extents in some cases. cp was blindly trusting this result and we would end up with a destination file that was bigger than the original because some bytes were copied twice. The fix here adjusts our offsets to make sure we're always moving forward in the fiemap results. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-08 11:58:09 -05:00
Artem Bityutskiy	2765df7da5	UBIFS: use max_write_size during recovery When recovering from unclean reboots UBIFS scans the journal and checks nodes. If a corrupted node is found, UBIFS tries to check if this is the last node in the LEB or not. This is is done by checking if there only 0xFF bytes starting from the next min. I/O unit. However, since now we write in c->max_write_size, we should actually check for 0xFFs starting from the next max. write unit. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	6c7f74f703	UBIFS: use max_write_size for write-buffers Switch write-buffers from 'c->min_io_size' to 'c->max_write_size' which presumably has to be more write speed-efficient. However, when write-buffer is synchronized, write only the the min. I/O units which contain the data, do not write whole write-buffer. This is more space-efficient. Additionally, this patch takes into account that the LEB might not start from the max. write unit-aligned address. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	3c89f396dc	UBIFS: introduce write-buffer size field Currently we assume write-buffer size is always min_io_size. But this is about to change and write-buffers may be of variable size. Namely, they will be of max_write_size at the beginning, but will get smaller when we are approaching the end of LEB. This is a preparation patch which introduces 'size' field in the write-buffer structure which carries the current write-buffer size. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	ca2ec61d15	UBI: incorporate LEB offset information Incorporate the LEB offset information into UBIFS. We'll use this information in one of the next patches to figure out what are the max. write size offsets relative to the PEB. So this patch is just a preparation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:48 +02:00
Artem Bityutskiy	3e8e2e0c8d	UBIFS: incorporate maximum write size Incorporate maximum write size into the UBIFS description data structure. This patch just introduces new 'c->max_write_size' and 'c->max_write_shift' fields as a preparation for the following patches. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:48 +02:00
Martin K. Petersen	df67714028	block: biovec_slab vs. CONFIG_BLK_DEV_INTEGRITY The block integrity subsystem no longer uses the bio_vec slabs so this code can safely be compiled in. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-08 08:28:01 +01:00
Al Viro	dfef6dcd35	unfuck proc_sysctl ->d_compare() a) struct inode is not going to be freed under ->d_compare(); however, the thing PROC_I(inode)->sysctl points to just might. Fortunately, it's enough to make freeing that sucker delayed, provided that we don't step on its ->unregistering, clear the pointer to it in PROC_I(inode) before dropping the reference and check if it's NULL in ->d_compare(). b) I'm not sure that we can walk into NULL inode here (we recheck dentry->seq between verifying that it's still hashed / fetching dentry->d_inode and passing it to ->d_compare() and there's no negative hashed dentries in /proc/sys/*), but if we can walk into that, we really should not have ->d_compare() return 0 on it! Said that, I really suspect that this check can be simply killed. Nick? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-08 02:22:27 -05:00
Ryusuke Konishi	be667377a8	nilfs2: record used amount of each checkpoint in checkpoint list This records the number of used blocks per checkpoint in each checkpoint entry of cpfile. Even though userland tools can get the block count via nilfs_get_cpinfo ioctl, it was not updated by the nilfs2 kernel code. This fixes the issue and makes it available for userland tools to calculate used amount per checkpoint. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp>	2011-03-08 14:58:31 +09:00
Ryusuke Konishi	ae191838b0	nilfs2: optimize rec_len functions This is a similar change to those in ext2/ext3 codebase (commit `40a063f669` and `a4ae309486`, respectively). The addition of 64k block capability in the rec_len_from_disk and rec_len_to_disk functions added a bit of math overhead which slows down file create workloads needlessly when the architecture cannot even support 64k blocks. This will cut the corner. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	4138ec2382	nilfs2: append blocksize info to warnings during loading super blocks At present, the same warning message can be output twice when nilfs detected a problem on super blocks: NILFS warning: broken superblock. using spare superblock. NILFS warning: broken superblock. using spare superblock. ... This is because these super blocks are reloaded with the block size written in a super block if it differs from the first block size, but this repetition looks somewhat confusing. So, we hint at what is going on by appending block size information to those messages. Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	828b1c50ae	nilfs2: add compat ioctl The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if application is 32 bit and kernel is 64 bit. This issue is avoidable by adding compat_ioctl method. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	cde98f0f84	nilfs2: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION Add support for the standard attributes set via chattr and read via lsattr. These attributes are already in the flags value in the nilfs2 inode, but currently we don't have any ioctl commands that expose them to the userland. Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting i_generation, which allows users to list the file's generation number with "lsattr -v". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	b253a3e4f2	nilfs2: tighten restrictions on inode flags Nilfs has few rectrictions on which flags may be set on which inodes like ext2/3/4 filesystems used to be. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. This introduces a flags masking function like those of extN and uses it during inode creation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	32f4aeb315	nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is set At present, nilfs marks S_NOATIME flag on all inodes. This restricts nilfs_set_inode_flags function so that it marks S_NOATIME only if a given inode has an FS_NOATIME_FL flag. Although nilfs does not support atime yet, touch_atime() still safely returns on IS_NOATIME check since MS_NOATIME is always set on sb. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	f0c9f242f9	nilfs2: use common file attribute macros Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL, NILFS_COMPR_FL, and so forth) with common inode flags, and removes the own flag declarations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	9954e7af14	nilfs2: add free entries count only if clear bit operation succeeded Three functions of the current persistent object allocator, nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and nilfs_palloc_freev functions unconditionally add a counter after doing clear bit operation on a bitmap block. If the clear bit operation overlapped, the counter will not add up. This fixes the issue by making the counter operations conditional. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:04 +09:00
Ryusuke Konishi	25b18d39cc	nilfs2: decrement inodes count only if raw inode was successfully deleted This fixes the issue that inodes count will not add up after removal of raw inodes fails. Hence, this prevents possible under flow of the inodes count. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:04 +09:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
James Morris	1cc26bada9	Merge branch 'master'; commit 'v2.6.38-rc7' into next	2011-03-08 10:55:06 +11:00
Mi Jinlong	5ece3cafbd	nfsd41: modify the members value of nfsd4_op_flags The members of nfsd4_op_flags, (ALLOWED_WITHOUT_FH \| ALLOWED_ON_ABSENT_FS) equals to ALLOWED_AS_FIRST_OP, maybe that's not what we want. OP_PUTROOTFH with op_flags = ALLOWED_WITHOUT_FH \| ALLOWED_ON_ABSENT_FS, can't appears as the first operation with out SEQUENCE ops. This patch modify the wrong value of ALLOWED_WITHOUT_FH etc which was introduced by `f9bb94c4`. Cc: stable@kernel.org Reviewed-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:10:33 -05:00
Kevin Coffman	b0b0c0a26e	nfsd: add proc file listing kernel's gss_krb5 enctypes Add a new proc file which lists the encryption types supported by the kernel's gss_krb5 code. Newer MIT Kerberos libraries support the assertion of acceptor subkeys. This enctype information allows user-land (svcgssd) to request that the Kerberos libraries limit the encryption types that it uses when generating the subkeys. Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:06:48 -05:00
Jesper Juhl	46d4cef9cf	NFSD, VFS: Remove dead code in nfsd_rename() Currently we have the following code in fs/nfsd/vfs.c::nfsd_rename() : ... host_err = nfsd_break_lease(odentry->d_inode); if (host_err) goto out_drop_write; if (ndentry->d_inode) { host_err = nfsd_break_lease(ndentry->d_inode); if (host_err) goto out_drop_write; } if (host_err) goto out_drop_write; ... 'host_err' is guaranteed to be 0 by the time we test 'ndentry->d_inode'. If 'host_err' becomes != 0 inside the 'if' statement, then we goto 'out_drop_write'. So, after the 'if' statement there is no way that 'host_err' can be anything but 0, so the test afterwards is just dead code. This patch removes the dead code. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:05:14 -05:00
Shan Wei	35079582e7	nfsd: kill unused macro definition These macros had never been used for several years. So, remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:05:09 -05:00
Namhyung Kim	f32cb53219	locks: use assign_type() Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:05:09 -05:00
J. Bruce Fields	32b007b4e1	nfsd4: fix bad pointer on failure to find delegation In case of a nonempty list, the return on error here is obviously bogus; it ends up being a pointer to the list head instead of to any valid delegation on the list. In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky, then renew_client may oops, and it may take an embarassingly long time to figure out why. Facepalm. BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200 ... Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 11:44:53 -05:00
Eric Sandeen	d7433142b6	ext3: Always set dx_node's fake_dirent explicitly. (crossport of `1f7bebb9e9` by Andreas Schlick <schlick@lavabit.com>) When ext3_dx_add_entry() has to split an index node, it has to ensure that name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck won't recognise it as an intermediate htree node and consider the htree to be corrupted. CC: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-03-07 17:20:58 +01:00
Chris Mason	31339acd07	Btrfs: deal with short returns from copy_from_user When copy_from_user is only able to copy some of the bytes we requested, we may end up creating a partially up to date page. To avoid garbage in the page, we need to treat a partial copy as a zero length copy. This makes the rest of the file_write code drop the page and retry the whole copy instead of marking the partially up to date page as dirty. Signed-off-by: Chris Mason <chris.mason@oracle.com> cc: stable@kernel.org	2011-03-07 11:10:24 -05:00
Chris Mason	b1bf862e9d	Btrfs: fix regressions in copy_from_user handling Commit `914ee295af` fixed deadlocks in btrfs_file_write where we would catch page faults on pages we had locked. But, there were a few problems: 1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy data when the amount to copy is more than 4K and the offset to start copying from is not page aligned. The result was btrfs_file_write looping forever retrying the iov_iter_copy_from_user_atomic We deal with this by changing btrfs_file_write to drop down to single page copies when iov_iter_copy_from_user_atomic starts returning failure. 2) The btrfs_file_write code was leaking delalloc reservations when iov_iter_copy_from_user_atomic returned zero. The looping above would result in the entire filesystem running out of delalloc reservations and constantly trying to flush things to disk. 3) btrfs_file_write will lock down page cache pages, make sure any writeback is finished, do the copy_from_user and then release them. Before the loop runs we check the first and last pages in the write to see if they are only being partially modified. If the start or end of the write isn't aligned, we make sure the corresponding pages are up to date so that we don't introduce garbage into the file. With the copy_from_user changes, we're allowing the VM to reclaim the pages after a partial update from copy_from_user, but we're not making sure the page cache page is up to date when we loop around to resume the write. We deal with this by pushing the up to date checks down into the page prep code. This fits better with how the rest of file_write works. Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> cc: stable@kernel.org	2011-03-07 10:42:27 -05:00
Dave Chinner	9130090b5f	xfs: kill support/debug.[ch] The remaining functionality in debug.[ch] is effectively just assert handling, conditional debug definitions and hex dumping. The hex dumping and assert function can be moved into the new printk module, while the rest can be moved into top-level header files. This allows fs/xfs/support/debug.[ch] to be completely removed from the codebase. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:09:35 +11:00
Dave Chinner	0b932cccbd	xfs: Convert remaining cmn_err() callers to new API Once converted, kill the remainder of the cmn_err() interface. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:08:35 +11:00
Dave Chinner	8221112b43	xfs: convert the quota debug prints to new API Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:07:35 +11:00
Dave Chinner	6d4a8ecb34	xfs: rename xfs_cmn_err_fsblock_zero() The "cmn_err" part of the function name is no longer relevant. Rename the function to xfs_alert_fsblock_zero() to match the new logging API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:06:35 +11:00
Dave Chinner	5348778699	xfs: convert xfs_fs_cmn_err to new error logging API Continue to clean up the error logging code by converting all the callers of xfs_fs_cmn_err() to the new API. Once done, remove the unused old API function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:05:35 +11:00
Dave Chinner	af34e09da4	xfs: kill xfs_fs_mount_cmn_err() macro The xfs_fs_mount_cmn_err() hides a simple check as to whether the mount path should output an error or not. Remove the macro and open code the check. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:04:35 +11:00
Dave Chinner	65333b4c3d	xfs: kill xfs_fs_repair_cmn_err() macro In certain cases of inode corruption, the xfs_fs_repair_cmn_err() macro is used to output an extra message in the corruption report. That extra message is "unmount and run xfs_repair", which really applies to any corruption report. Each case that this macro is called (except one) a following call to xfs_corruption_error() is made to optionally dump more information about the error. Hence, move the output of "run xfs_repair" to xfs_corruption_error() so that it is output on all corruption reports. Also, convert the callers of the repair macro that don't call xfs_corruption_error() to call it, hence provide consiѕtent error reporting for all cases where xfs_fs_repair_cmn_err() used to be called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:03:35 +11:00
Dave Chinner	6a19d9393a	xfs: convert xfs_cmn_err to xfs_alert_tag Continue the conversion of the old cmn_err interface be converting all the conditional panic tag errors to xfs_alert_tag() and then removing xfs_cmn_err(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:02:35 +11:00
Dave Chinner	a0fa2b679e	xfs: Convert xlog_warn to new logging interface Convert the xfs log operations to use the new error logging interfaces. This removes the xlog_{warn,panic} wrappers and makes almost all errors emit the device they belong to instead of just refering to "XFS". Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:01:35 +11:00
Dave Chinner	4f10700a2e	xfs: Convert linux-2.6/ files to new logging interface Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level> logging format that replaces the old Irix inherited cmn_err() interfaces. While there, also convert naked printk calls to use the relevant xfs logging function to standardise output format. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:00:35 +11:00
Al Viro	31be83aeae	omfs: make readdir stop when filldir says so filldir returning an error does not mean "skip this entry, try the next one"... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:24:12 -05:00
Al Viro	d932805b3d	omfs: merge unlink() and rmdir(), close leak in rename() In case of directory-overwriting rename(), omfs forgot to mark the victim doomed, so omfs_evict_inode() didn't free it. We could fix that by calling omfs_rmdir() for directory victims instead of doing omfs_unlink(), but it's easier to merge omfs_unlink() and omfs_rmdir() instead. Note that we have no hardlinks here. It also makes the checks in omfs_rename() go away - they fold into what omfs_remove() does when it runs into a directory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:24:01 -05:00
Al Viro	cdb26496db	omfs: stop playing silly buggers with omfs_unlink() in ->rename() Since omfs directories are hashes of inodes and name is part of inode, we have to remove inode from old directory before we can put it into new one / under new name. So instead of bump i_nlink call omfs_unlink, which does omfs_delete_entry() decrement i_nlink and mark parent dirty in case of success decrement i_nlink if omfs_unlink failed and hadn't done it itself let's just call omfs_delete_entry() and dirty the parent ourselves... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:23:39 -05:00
Al Viro	013e4f4a28	omfs: rename() needs to mark old_inode dirty after ctime update we do mark it dirty before, but it doesn't guarantee that we don't get preempted just before assignment to ->i_ctime, with inode getting written out before we get CPU back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:20:30 -05:00
Linus Torvalds	fb62c00a6d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: no .snap inside of snapped namespace libceph: fix msgr standby handling libceph: fix msgr keepalive flag libceph: fix msgr backoff libceph: retry after authorization failure libceph: fix handling of short returns from get_user_pages ceph: do not clear I_COMPLETE from d_release ceph: do not set I_COMPLETE Revert "ceph: keep reference to parent inode on ceph_dentry"	2011-03-05 10:43:22 -08:00
Mingming Cao	198868f35d	ext4: Use single thread to perform DIO unwritten convertion While running ext4 testing on multiple core, we found there are per cpu ext4-dio-unwritten threads processing conversion from unwritten extents to written for IOs completed from async direct IO patch. Per filesystem is enough, we don't need per cpu threads to work on conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2011-03-05 11:52:45 -05:00

... 4 5 6 7 8 ...

22146 Коммитов