Граф коммитов

7664 Коммитов

Автор SHA1 Сообщение Дата
Linus Torvalds 29bd17af7d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (31 commits)
  ocfs2: clean up bh null checks
  ocfs2: document access rules for blocked_lock_list
  configfs: file.c fix possible recursive locking
  configfs: dir.c fix possible recursive locking
  configfs: Remove EXPERIMENTAL
  ocfs2: bump version number
  ocfs2/dlm: Clear joining_node on hearbeat node down
  ocfs2: convert byte order of constant instead of variable
  ocfs2: Update default cluster timeouts
  ocfs2: printf fixes
  ocfs2: Use generic_file_llseek
  ocfs2: Safer read_inline_data()
  ocfs2: Silence false lockdep warnings
  [PATCH 2/2] ocfs2: cluster aware flock()
  [PATCH 1/2] ocfs2: add flock lock type
  ocfs2: Local alloc window size changeable via mount option
  ocfs2: Support commit= mount option
  ocfs2: Add missing permission checks
  [PATCH 2/2] ocfs2: Implement group add for online resize
  [PATCH 1/2] ocfs2: Add group extend for online resize
  ...
2008-01-25 17:11:13 -08:00
Mark Fasheh 2fe5c1d7eb ocfs2: clean up bh null checks
If we know a buffer_head is non-null, then brelse() is unnecessary and
put_bh() can be used instead. Also, an explicit check for NULL is
unnecessary when using brelse(). This patch only covers buffer_head_io.c and
resize.c, which have recently added code which exhibits this problem.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:48 -08:00
Mark Fasheh 7ec373cf33 ocfs2: document access rules for blocked_lock_list
ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some
usage restrictions which aren't immediately obvious to anyone reading the
code. It's a good idea to document this so that we avoid making costly
mistakes in the future.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:48 -08:00
Joonwoo Park 116ba5d5ea configfs: file.c fix possible recursive locking
configfs_register_subsystem() with default_groups triggers recursive locking.
it seems that mutex_lock_nested is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #145
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40c9a9e>] configfs_add_file+0x2e/0x70

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #145
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fda88>] mutex_lock_nested+0x98/0x2e0
 [<c40c9a9e>] configfs_add_file+0x2e/0x70
 [<c40c9b0c>] configfs_create_file+0x2c/0x40
 [<c40ca639>] configfs_attach_item+0x139/0x220
 [<c40ca734>] configfs_attach_group+0x14/0x140
 [<c40ca7e9>] configfs_attach_group+0xc9/0x140
 [<c40ca9f6>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:47 -08:00
Joonwoo Park ba611edfe4 configfs: dir.c fix possible recursive locking
configfs_register_subsystem() with default_groups triggers recursive locking.
it seems that mutex_lock_nested is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #141
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca76f>] configfs_attach_group+0x4f/0x190

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #141
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fdad8>] mutex_lock_nested+0x98/0x2e0
 [<c40ca76f>] configfs_attach_group+0x4f/0x190
 [<c40caa46>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:47 -08:00
Joel Becker 02ac0499c0 configfs: Remove EXPERIMENTAL
configfs has been alive and kicking for a while now.  It underpins some
non-EXPERIMENTAL subsystems, such as OCFS2's cluster stack.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:47 -08:00
Mark Fasheh 0e5ae03203 ocfs2: bump version number
Bump the printed version to 1.5.0. This helps us quickly identify which
version of Ocfs2 a bug filer is running.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Tao Ma 2d4b1cbb44 ocfs2/dlm: Clear joining_node on hearbeat node down
Currently the process of dlm join contains 2 steps: query join and assert join.
After query join, the joined node will set its joining_node. So if the joining
node happens to panic before the 2nd step, the joined node will fail to clear
its joining_node flag because that node isn't in the domain map. It at least
cause 2 problems.
1. All the new join request will fail. So no new node can mount the volume.
2. The joined node can't umount the volume since during the umount process it
   has to wait for the joining_node to be unknown. So the umount will be hanged.

The solution is to clear the joining_node before we check the domain map.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Marcin Slusarz 4092d49f70 ocfs2: convert byte order of constant instead of variable
Convert byte order of constant instead of variable it will be done at
compile time vs run time. Remove unused le32_and_cpu.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Sunil Mushran 17104683d2 ocfs2: Update default cluster timeouts
Lots of people are having trouble with the default timeouts, which are too
low. These new values are derived from an informal survey taken on
ocfs2-users, as well as data from bug reports. This should reduce the amount
of cluster disconnects and subsequent fencing seen during normal workloads.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara 634bf74d1e ocfs2: printf fixes
Explicitely convert loff_t to long long in printf. Just for sure...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara 32c3c0e2e5 ocfs2: Use generic_file_llseek
We should use generic_file_llseek() and not default_llseek() so that
s_maxbytes gets properly checked when seeking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara d2849fb294 ocfs2: Safer read_inline_data()
In ocfs2_read_inline_data() we should store file size in loff_t. Although
the file size should fit in 32 bits we cannot be sure in case filesystem is
corrupted.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:44 -08:00
Jan Kara 5fa0613ea5 ocfs2: Silence false lockdep warnings
Create separate lockdep lock classes for system file's i_mutexes. They are
used to guard allocations and similar things and thus rank differently
than i_mutex of a regular file or directory.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:44 -08:00
Mark Fasheh 53fc622b9e [PATCH 2/2] ocfs2: cluster aware flock()
Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
mount option, "localflocks" is added so that users can revert to old
functionality as need be.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Mark Fasheh cf8e06f1a8 [PATCH 1/2] ocfs2: add flock lock type
This adds a new dlmglue lock type which is intended to back flock()
requests.

Since these locks are driven from userspace, usage rules are much more
liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
make use of most dlmglue features - lock caching and lock level
optimizations in particular. Additionally, userspace is free to deadlock
itself, so we have to deal with that in the same way as the rest of the
kernel - by allowing a signal to abort a lock request.

In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
does it's own dlm coordination. We still use the same helper functions
though, so duplicated code is kept to a minimum.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Sunil Mushran 2fbe8d1ebe ocfs2: Local alloc window size changeable via mount option
Local alloc is a performance optimization in ocfs2 in which a node
takes a window of bits from the global bitmap and then uses that for
all small local allocations. This window size is fixed to 8MB currently.
This patch allows users to specify the window size in MB including
disabling it by passing in 0. If the number specified is too large,
the fs will use the default value of 8MB.

mount -o localalloc=X /dev/sdX /mntpoint

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Mark Fasheh d147b3d630 ocfs2: Support commit= mount option
Mostly taken from ext3. This allows the user to set the jbd commit interval,
in seconds. The default of 5 seconds stays the same, but now users can
easily increase the commit interval. Typically, this would be increased in
order to benefit performance at the expense of data-safety.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:42 -08:00
Mark Fasheh 0957f00796 ocfs2: Add missing permission checks
Check that an online resize is being driven by a user with permission to
change system resource limits.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:19 -08:00
Tao Ma 7909f2bf83 [PATCH 2/2] ocfs2: Implement group add for online resize
This patch adds the ability for a userspace program to request that a
properly formatted cluster group be added to the main allocation bitmap for
an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD.
On a high level, this is similar to ext3, but we use a different ioctl as
the structure which has to be passed through is different.

During an online resize, tunefs.ocfs2 will format any new cluster groups
which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on
each one. Kernel verifies that the core cluster group information is valid
and then does the work of linking it into the global allocation bitmap.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:04:24 -08:00
Tao Ma d659072f73 [PATCH 1/2] ocfs2: Add group extend for online resize
This patch adds the ability for a userspace program to request an extend of
last cluster group on an Ocfs2 file system. The request is made via ioctl,
OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is
obviously Ocfs2 specific.

tunefs.ocfs2 would call this for an online-resize operation if the last
cluster group isn't full.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:53:35 -08:00
Tao Ma e9d578a8f2 ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum.
This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there
is only 1 group, it will be equal to the total clusters in the volume. So
as for online resize, it should change for all the nodes in the cluster.
It isn't easy and there is no corresponding lock for it.

bitmap_cpg is only used in 2 areas:
1. Check whether the suballoc is too large for us to allocate from the global
   bitmap, so it is little used. And now the suballoc size is 2048, it rarely
   meet this situation and the check is almost useless.
2. Calculate which group a cluster belongs to. We use it during truncate to
   figure out which cluster group an extent belongs too. But we should be OK
   if we increase it though as the cluster group calculated shouldn't change
   and we only ever have a small bitmap_cpg on file systems with a single
   cluster group.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:54 -08:00
Mark Fasheh 1252c434e3 ocfs2: Documentation update
Remove 'readpages' from the list in ocfs2.txt. Instead of having two
identical lists, I just removed the list in the OCFS2 section of fs/Kconfig
and added a pointer to Documentation/filesystems/ocfs2.txt.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:42 -08:00
Mark Fasheh 628a24f5bd ocfs2: Readpages support
Add ->readpages support to Ocfs2. This is rather trivial - all it required
is a small update to ocfs2_get_block (for mapping full extents via b_size)
and an ocfs2_readpages() function which partially mirrors ocfs2_readpage().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:12 -08:00
Mark Fasheh e63aecb651 ocfs2: Rename ocfs2_meta_[un]lock
Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:46:01 -08:00
Mark Fasheh c934a92d05 ocfs2: Remove data locks
The meta lock now covers both meta data and data, so this just removes the
now-redundant data lock.

Combining locks saves us a round of lock mastery per inode and one less lock
to ping between nodes during read/write.

We don't lose much - since meta locks were always held before a data lock
(and at the same level) ordered writeout mode (the default) ensured that
flushing for the meta data lock also pushed out data anyways.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:57 -08:00
Mark Fasheh f1f540688e ocfs2: Add data downconvert worker to inode lock
In order to extend inode lock coverage to inode data, we use the same data
downconvert worker with only a small modification to only do work for
regular files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:54 -08:00
Mark Fasheh 34d024f843 ocfs2: Remove mount/unmount votes
The node maps that are set/unset by these votes are no longer relevant, thus
we can remove the mount and umount votes. Since those are the last two
remaining votes, we can also remove the entire vote infrastructure.

The vote thread has been renamed to the downconvert thread, and the small
amount of functionality related to managing it has been moved into
fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:34 -08:00
Mark Fasheh 6f7b056ea9 ocfs2: Remove fs dependency on ocfs2_heartbeat module
Now that the dlm exposes domain information to us, we don't need generic
node up / node down callbacks. And since the DLM is only telling us when a
node goes down unexpectedly, we no longer need to optimize away node down
callbacks via the umount map.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:36:40 -08:00
Mark Fasheh 6561168cb4 ocfs2_dlm: Call node eviction callbacks from heartbeat handler
With this, a dlm client can take advantage of the group protocol in the dlm
to get full notification whenever a node within the dlm domain leaves
unexpectedly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:36:40 -08:00
Linus Torvalds 0008bf5440 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (96 commits)
  sched: keep total / count stats in addition to the max for
  sched, futex: detach sched.h and futex.h
  sched: fix: don't take a mutex from interrupt context
  sched: print backtrace of running tasks too
  printk: use ktime_get()
  softlockup: fix signedness
  sched: latencytop support
  sched: fix goto retry in pick_next_task_rt()
  timers: don't #error on higher HZ values
  sched: monitor clock underflows in /proc/sched_debug
  sched: fix rq->clock warps on frequency changes
  sched: fix, always create kernel threads with normal priority
  debug: clean up kernel/profile.c
  sched: remove the !PREEMPT_BKL code
  sched: make PREEMPT_BKL the default
  debug: track and print last unloaded module in the oops trace
  debug: show being-loaded/being-unloaded indicator for modules
  sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
  sched: rt-group: reduce rescheduling
  hrtimer: unlock hrtimer_wakeup
  ...
2008-01-25 13:42:32 -08:00
Linus Torvalds 2d94dfc8c3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  mount options: fix jfs
  JFS: simplify types to get rid of sparse warning
  JFS: FIx one more plain integer as NULL pointer warning
  JFS: Remove defconfig ptr comparison to 0
  JFS: use DIV_ROUND_UP where appropriate
  Remove unnecessary kmalloc casts in the jfs filesystem
  JFS is missing a memory barrier
  JFS: Make sure special inode data is written after journal is flushed
  JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
2008-01-25 12:20:32 -08:00
Arjan van de Ven 9745512ce7 sched: latencytop support
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Paul E. McKenney e260be673a Preempt-RCU: implementation
This patch implements a new version of RCU which allows its read-side
critical sections to be preempted. It uses a set of counter pairs
to keep track of the read-side critical sections and flips them
when all tasks exit read-side critical section. The details
of this implementation can be found in this paper -

	http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

and the article-

	http://lwn.net/Articles/253651/

This patch was developed as a part of the -rt kernel development and
meant to provide better latencies when read-side critical sections of
RCU don't disable preemption.  As a consequence of keeping track of RCU
readers, the readers have a slight overhead (optimizations in the paper).
This implementation co-exists with the "classic" RCU implementations
and can be switched to at compiler.

Also includes RCU tracing summarized in debugfs.

[ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:24 +01:00
Linus Torvalds b47711bfbc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  selinux: make mls_compute_sid always polyinstantiate
  security/selinux: constify function pointer tables and fields
  security: add a secctx_to_secid() hook
  security: call security_file_permission from rw_verify_area
  security: remove security_sb_post_mountroot hook
  Security: remove security.h include from mm.h
  Security: remove security_file_mmap hook sparse-warnings (NULL as 0).
  Security: add get, set, and cloning of superblock security information
  security/selinux: Add missing "space"
2008-01-25 08:44:29 -08:00
Linus Torvalds e07dd2ad30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (56 commits)
  [GFS2] Allow journal recovery on read-only mount
  [GFS2] Lockup on error
  [GFS2] Fix page_mkwrite truncation race path
  [GFS2] Fix typo
  [GFS2] Fix write alloc required shortcut calculation
  [GFS2] gfs2_alloc_required performance
  [GFS2] Remove unneeded i_spin
  [GFS2] Reduce inode size by moving i_alloc out of line
  [GFS2] Fix assert in log code
  [GFS2] Fix problems relating to execution of files on GFS2
  [GFS2] Initialize extent_list earlier
  [GFS2] Allow page migration for writeback and ordered pages
  [GFS2] Remove unused variable
  [GFS2] Fix log block mapper
  [GFS2] Minor correction
  [GFS2] Eliminate the no longer needed sd_statfs_mutex
  [GFS2] Incremental patch to fix compiler warning
  [GFS2] Function meta_read optimization
  [GFS2] Only fetch the dinode once in block_map
  [GFS2] Reorganize function gfs2_glmutex_lock
  ...
2008-01-25 08:39:18 -08:00
Steve French 366781c196 [CIFS] DFS build fixes
Also includes a few minor changes suggested by Christoph

Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-25 10:12:41 +00:00
Abhijith Das 7bc5c414fe [GFS2] Allow journal recovery on read-only mount
This patch allows gfs2 to perform journal recovery even if it is mounted
read-only. Strictly speaking, a read-only mount should not be writing to
the filesystem, but we do this only to perform journal recovery. A
read-only mount will fail if we don't recover the dirty journal. Also,
when gfs2 is used as a root filesystem, it will be mounted read-only
before being mounted read-write during the boot sequence. A failed
read-only mount will panic the machine during bootup.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:21:22 +00:00
Bob Peterson 1b8177ec1e [GFS2] Lockup on error
I spotted this bug while I was digging around.  Looks like it could cause
a lockup in some rare error condition.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:21:04 +00:00
Steven Whitehouse b7fe2e391e [GFS2] Fix page_mkwrite truncation race path
There was a bug in the truncation/invalidation race path for
->page_mkwrite for gfs2. It ought to return 0 so that the effect is the
same as if the page was truncated at any of the other points at which
the page_lock is dropped. This will result in the restart of the whole
page fault path. If it was due to a real truncation (as opposed to an
invalidate because we let a glock go) then the ->fault path will pick
that up when it gets called again.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:20:15 +00:00
Bob Peterson 3e5cd0877e [GFS2] Fix typo
This patch fixes a minor typo.  Surprisingly, it still compiled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:51 +00:00
Steven Whitehouse 1af535727b [GFS2] Fix write alloc required shortcut calculation
The comparison was being made against the wrong quantity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:28 +00:00
Bob Peterson 0522053519 [GFS2] gfs2_alloc_required performance
This is a small I/O performance enhancement to gfs2.  (Actually, it is a rework of
an earlier version I got wrong).  The idea here is to check if the write extends
past the last block in the file.  If so, the function can save itself a lot of
time and trouble because it knows an allocate will be required.  Benchmarks like
iozone should see better performance.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:03 +00:00
Bob Peterson 598278bd48 [GFS2] Remove unneeded i_spin
This patch removes a vestigial variable "i_spin" from the gfs2_inode
structure.  This not only saves us memory (>300000 of these in memory
for the oom test) it also saves us time because we don't have to
spend time initializing it (i.e. slightly better performance).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:44 +00:00
Steven Whitehouse 6dbd822487 [GFS2] Reduce inode size by moving i_alloc out of line
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.

In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:25 +00:00
Steven Whitehouse ac39aadd04 [GFS2] Fix assert in log code
Although the values were all being calculated correctly, there was a
race in the assert due to the way it was using atomic variables. This
changes the value we assert on so that we get the same effect by testing
a different variable. This prevents the assert triggering when it shouldn't.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:03 +00:00
Steven Whitehouse 9656b2c14c [GFS2] Fix problems relating to execution of files on GFS2
This patch fixes a couple of problems which affected the execution of files
on GFS2. The first is that there was a corner case where inodes were not
always uptodate at the point at which permissions checks were being carried
out, this was resulting in refusal of execute permission, but only on the
first lookup, subsequent requests worked correctly. The second was a problem
relating to incorrect updating of file sizes which was introduced with the
write_begin/end code for GFS2 a little while back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2008-01-25 08:17:31 +00:00
Bob Peterson 0811a127cb [GFS2] Initialize extent_list earlier
Here is a patch for the latest upstream GFS2 code:
The journal extent map needs to be initialized sooner than it
currently is.  Otherwise failed mount attempts (e.g. not enough
journals, etc.) may panic trying to access the uninitialized list.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:17:04 +00:00
Steven Whitehouse e5d9dc278c [GFS2] Allow page migration for writeback and ordered pages
To improve performance on NUMA, we use the VM's standard page
migration for writeback and ordered pages. Probably we could
also do the same for journaled data, but that would need a
careful audit of the code, so will be the subject of a later
patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:16:41 +00:00
Steven Whitehouse 65a6290998 [GFS2] Remove unused variable
The go_drop_th function is never called or referenced.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:16:19 +00:00
Steven Whitehouse ff91cc9bb4 [GFS2] Fix log block mapper
A missing offset in the calculation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:58 +00:00
Bob Peterson fa3742fa85 [GFS2] Minor correction
This is a small correction to my previously posted patch1.
It just changes a divide to a shift.  It's faster and doesn't
introduce odd dependencies on 32-bit compiles.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:37 +00:00
Bob Peterson c3f60b6e3a [GFS2] Eliminate the no longer needed sd_statfs_mutex
This patch eliminates the unneeded sd_statfs_mutex mutex but preserves
the ordering as discussed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:16 +00:00
Bob Peterson b3513fca7e [GFS2] Incremental patch to fix compiler warning
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:53 +00:00
Bob Peterson 15c7cee799 [GFS2] Function meta_read optimization
This patch optimizes function gfs2_meta_read.  Basically, gfs2_meta_wait
was being called regardless of whether a disk read was requested.
This just pulls that wait into the if that triggers the read.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:33 +00:00
Bob Peterson b0d5fd3074 [GFS2] Only fetch the dinode once in block_map
Function gfs2_block_map was often looking up the disk inode twice.
This optimizes it so that only does it once.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:13 +00:00
Bob Peterson 398bbe6832 [GFS2] Reorganize function gfs2_glmutex_lock
This patch optimizes the function gfs2_glmutex_lock.
The basic theory is: Why bother initializing a holder, setting up
wait bits and then waiting on them, if you know the glock can be
yours.  So the holder stuff is placed inside the if checking if the
glock is locked.  This one needs careful scrutiny because changing
anything to do with locking should strike terror into one's heart.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:52 +00:00
Bob Peterson 5fdc2eeb5d [GFS2] Run through full bitmaps quicker in gfs2_bitfit
I eliminated the passing of an unused parameter into gfs2_bitfit called rgd.

This also changes the gfs2_bitfit code that searches for free (or used) blocks.
Before, the code was trying to check for bytes that indicated 4 blocks in
the undesired state.  The problem is, it was spending more time trying to
do this than it actually was saving.  This version only optimizes the case
where we're looking for free blocks, and it checks a machine word at a time.
So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit
machines, it will check 64-bits (32 blocks) at a time.  The compiler
optimizes that quite well and we save some time, especially when running
through full bitmaps (like the bitmaps allocated for the journals).

There's probably a more elegant or optimized way to do this, but I haven't
thought of it yet.  I'm open to suggestions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:31 +00:00
Bob Peterson 0d0868bde3 [GFS2] Get rid of useless "found" variable in quota.c
This just eliminates an unused variable from the quota code.
Not likely to be a time saver.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:01 +00:00
Bob Peterson da6dd40d59 [GFS2] Journal extent mapping
This patch saves a little time when gfs2 writes to the journals by
keeping a mapping between logical and physical blocks on disk.
That's better than constantly looking up indirect pointers in
buffers, when the journals are several levels of indirection
(which they typically are).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:11:46 +00:00
Bob Peterson e9e1ef2b6e [GFS2] Remove function gfs2_get_block
This patch is just a cleanup.  Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters.  By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is done for every block operation,
this streamlines the code and makes it a little bit more efficient.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:25 +00:00
David Teigland 2066b58b0a [GFS2] use pid for plock owner for nfs clients
The fl_owner is that of lockd when posix locks arrive from nfs
clients, so it can't be used to distinguish between lock holders.
Use fl_pid as owner instead; it's the pid of the process on the
nfs client.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:23 +00:00
Steven Whitehouse dbee2199c3 [GFS2] Remove unused variable
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:20 +00:00
Abhijith Das 292c8c14ca [GFS2] patch to check for recursive lock requests in gfs2_rename code path
A certain scenario in the rename code path triggers a kernel BUG()
because it accidentally does recursive locking The first lock is
requested to unlink an already existing inode (replacing a file) and the
second lock is requested when the destination directory needs to alloc
some space. It is rare that these two
events happen during the same rename call, and even more rare that these
two instances try to lock the same rgrp. It is, however, possible.
https://bugzilla.redhat.com/show_bug.cgi?id=404711

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:18 +00:00
Wendy Cheng c97bfe4351 [GFS2] Remove lock methods for lock_nolock protocol
GFS2 supports two modes of locking - lock_nolock for single node filesystem
and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from
file operation table for lock_nolock protocol. This would allow VFS to handle
posix lock and flock logics just like other in-tree filesystems without
duplication.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:15 +00:00
Fabio M. Di Nitto bcd405599f [GFS2] Remove unrequired code
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:13 +00:00
Fabio Massimo Di Nitto 6a69a23f7d [GFS2] Fix build warnings
Hi Steven,

Steven Whitehouse wrote:
> Hi,
>
> Now in the -nmw git tree. Thanks,
>
> Steve.
>
> On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote:

this patch introduces a bunch of build warnings by leaving around

struct inode *inode = &ip->i_inode;

The patch in attachment cleans them up. Please apply.

Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:11 +00:00
Ryan O'Hara 002ef1dc63 [GFS2] remove unnecessary permission checks
Remove read/write permission() checks from xattr operations.
VFS layer is already handling permission for xattrs via the
xattr_permission() call, so there is no need for gfs2 to
check permissions. Futhermore, using permission() for SELinux
xattrs ops is incorrect.

Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:08 +00:00
Fabio Massimo Di Nitto 1a2781cfa5 [GFS2] Fix runtime issue with UP kernels
The issue is indeed UP vs SMP and it is totally random.

spin_is_locked() is a bad assertion because there is no correct answer on UP.
on UP spin_is_locked() has to return either one value or another, always.

This means that in my setup I am lucky enough to trigger the issue and your you
are lucky enough not to.

the patch in attachment removes the bogus calls to BUG_ON and according to David
(in CC and thanks for the long explanation on the problem) we can rely upon
things like lockdep to find problem that might be trying to catch.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:06 +00:00
David Teigland 00c134756c [GFS2] tidy up error message
Print error with log_error() to be consistent with others.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:04 +00:00
Fabio Massimo Di Nitto 0b7580c786 [GFS2] Check for installation of mount helpers for DLM mounts
The patch is a fix to abort mount if the mount.gfs* and possible
umount.* are missing from /sbin.

While we do what we can to guarantee that they are installed properly in
userland (CVS HEAD), we want to make sure that mount still aborts properly.

The only sign of missing helpers is that lock_dlm will receive no mount options
at all. According to David the problem does not exist for lock_nolock as the
helpers are not required.

The patch has been tested for both gfs and gfs2 and it works as expected. The
lack of mount.gfs* will generate an error that is propagated to mount:

oot@node1:~# mount -t  gfs2 /dev/nbd2 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/nbd2,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

[ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
[ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly!
[ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata=

You might want to notice that it will also avoid mount to hang or fail silently
or with strange errors that will require the cluster to reboot/restart before
you can actually mount the filesystem again.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:01 +00:00
Steven Whitehouse e35b921185 [GFS2] Don't periodically update the jindex
We only care about the content of the jindex in two cases,
one is when we mount the fs and the other is when we need
to recover another journal. In both cases we have to update
the jindex anyway, so there is no point in updating it
periodically between times, so this removes it to simplify
gfs2_logd.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:59 +00:00
Steven Whitehouse ec69b18883 [GFS2] Move gfs2_logd into log.c
This means that we can mark gfs2_ail1_empty static and prepares
the way for further changes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:56 +00:00
Steven Whitehouse fd041f0b40 [GFS2] Use atomic_t for journal free blocks counter
This patch changes the counter which keeps track of the free
blocks in the journal to an atomic_t in preparation for the
following patch which will update the log reservation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:54 +00:00
Steven Whitehouse 2bcd610d2f [GFS2] Don't add glocks to the journal
The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:52 +00:00
David Teigland 8cbc434247 [GFS2] check kthread_should_stop when waiting
Use wait_event_interruptible() in the lock_dlm thread instead
of an open coded equivalent, and include a kthread_should_stop()
check in the wait test so we don't miss a kthread_stop().

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:49 +00:00
Bob Peterson c7227e4642 [GFS2] Given device ID rather than s_id in "id" sysfs file
This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device
id "major:minor" rather than the s_id.  That enables gfs2_tool to
match devices properly (by id, not name) when locating the tuning files.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:47 +00:00
Steven Whitehouse e589665eb9 [GFS2] Remove flags no longer required
The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.

Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:44 +00:00
Steven Whitehouse 3042a2ccd6 [GFS2] Reorder writeback for glock sync
Previously we were doing (write data, wait for data, write metadata, wait
for metadata). After this patch we so (write metadata, write data, wait for
data, wait for metadata) which should be more efficient.

Also I noticed that the drop_bh and xmote_bh functions were almost
identical. In fact the only difference was a single test, and that
test is such that in the drop_bh case, it would always evaluate to
the correct result. As such we can use the xmote_bh functions in
all the places where we were using the drop_bh function and remove
the drop_bh functions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:42 +00:00
Steven Whitehouse 52d4c74b08 [GFS2] Add sync_page to metadata address space operations
This set of address space operations was missing a sync_page
operation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:40 +00:00
Steven Whitehouse c2932e03db [GFS2] Remove "reclaim limit"
This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:37 +00:00
Steven Whitehouse 60b0d08779 [GFS2] Remove unused variables
These haven't been used for some time, remove them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:35 +00:00
Steven Whitehouse 47e83b5091 [GFS2] Use correct include file in ops_address.c
Something changed in the upstream kernel, and it needs this
one-liner to allow ops_address.c to build.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:32 +00:00
Steven Whitehouse c41d4f09f1 [GFS2] Don't hold page lock when starting transaction
This is an addendum to the new AOPs work which moves the point
at which we take the page lock so that we don't get it until
the last possible moment. This resolves a conflict between
starting transactions and the page lock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:30 +00:00
Steven Whitehouse b8e7cbb65b [GFS2] Add writepages for GFS2 jdata
This patch resolves a lock ordering issue where we had been getting
a transaction lock in the wrong order with respect to the page lock.
By using writepages rather than just writepage, it is then possible
to start a transaction before locking the page, and thus matching the
locking order elsewhere in the code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:28 +00:00
Steven Whitehouse 9ff8ec32e5 [GFS2] Split gfs2_writepage into three cases
This patch splits gfs2_writepage into separate functions for each of
the three cases: writeback, ordered and journalled. As a result
it becomes a lot easier to see what each one is doing. The common
code is moved into gfs2_writepage_common.

This fixes a performance bug where we were doing more work than
strictly required in the ordered write case.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:25 +00:00
Steven Whitehouse 5561093e2c [GFS2] Introduce gfs2_set_aops()
Just like ext3 we now have three sets of address space operations
to cover the cases of writeback, ordered and journalled data
writes. This means that the individual operations can now become
less complicated as we are able to remove some of the tests for
file data mode from the code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:23 +00:00
Steven Whitehouse bf36a71316 [GFS2] Add gfs2_is_writeback()
This adds a function "gfs2_is_writeback()" along the lines of the
existing "gfs2_is_jdata()" in order to clean up the code and make
the various tests for the inode mode more obvious. It also fixes
the PageChecked() logic where we were resetting the flag too early
in the case of an error path.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:21 +00:00
Steven Whitehouse e7e36f1435 [GFS2] Remove unused field in struct gfs2_inode
Removes a field that is not used.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:18 +00:00
Steven Whitehouse f91a0d3e24 [GFS2] Remove useless i_cache from inodes
The i_cache was designed to keep references to the indirect blocks
used during block mapping so that they didn't have to be looked
up continually. The idea failed because there are too many places
where the i_cache needs to be freed, and this has in the past been
the cause of many bugs.

In addition there was no performance benefit being gained since the
disk blocks in question were cached anyway. So this patch removes
it in order to simplify the code to prepare for other changes which
would otherwise have had to add further support for this feature.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:16 +00:00
Steven Whitehouse 3cc3f710ce [GFS2] Use ->page_mkwrite() for mmap()
This cleans up the mmap() code path for GFS2 by implementing the
page_mkwrite function for GFS2. We are thus able to use the
generic filemap_fault function for our ->fault() implementation.

This now means that shared writable mappings will be much more
efficiently shared across the cluster if there is a reasonable
proportion of read activity (the greater proportion, the better).

As a side effect, it also reduces the size of the code, removes
special cases from readpage and readpages, and makes the code
path easier to follow.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:13 +00:00
Steven Whitehouse 51ff87bdd9 [GFS2] Clean up internal read function
As requested by Christoph, this patch cleans up GFS2's internal
read function so that it no longer uses the do_generic_mapping_read
function. This function is obsolete and GFS2 is the last user of it.

As a side effect the internal read code gets smaller and easier
to read and gfs2_readpage is split into two. One function has the locking
and the other function has the rest of the logic.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
2008-01-25 08:07:11 +00:00
Wendy Cheng cc7e79b168 [GFS2] Handle multiple glock demote requests
Fix a race condition where multiple glock demote requests are sent to
a node back-to-back. This patch does a check inside handle_callback()
to see whether a demote request is in progress. If true, it sets a flag
to make sure run_queue() will loop again to handle the new request,
instead of erronously setting gl_demote_state to a different state.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:09 +00:00
Greg Kroah-Hartman 197b12d679 Kobject: convert fs/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman f9cb074bff Kobject: rename kobject_init_ng() to kobject_init()
Now that the old kobject_init() function is gone, rename
kobject_init_ng() to kobject_init() to clean up the namespace.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:38 -08:00
Kay Sievers edfaa7c365 Driver core: convert block from raw kobjects to core devices
This moves the block devices to /sys/class/block. It will create a
flat list of all block devices, with the disks and partitions in one
directory. For compatibility /sys/block is created and contains symlinks
to the disks.

  /sys/class/block
  |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
  |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
  |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
  |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
  |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
  |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
  |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
  `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

  /sys/block/
  |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:36 -08:00
Greg Kroah-Hartman a5815ddf26 Kobject: convert fs/char_dev.c to use kobject_init/add_ng()
This converts the code to use the new kobject functions, cleaning up the
logic in doing so.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:31 -08:00
Greg Kroah-Hartman 901195ed7f Kobject: change GFS2 to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:26 -08:00
Greg Kroah-Hartman 6e90aa972d kobject: convert ecryptfs to use kobject_create
Using a kset for this trivial directory is an overkill.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman 0ff21e4663 kobject: convert kernel_kset to be a kobject
kernel_kset does not need to be a kset, but a much simpler kobject now
that we have kobj_attributes.

We also rename kernel_kset to kernel_kobj to catch all users of this
symbol with a build error instead of an easy-to-ignore build warning.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman 830d3cfb16 kset: convert block_subsys to use kset_create
Dynamically create the kset instead of declaring it statically.  We also
rename block_subsys to block_kset to catch all users of this symbol
with a build error instead of an easy-to-ignore build warning.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:23 -08:00
Greg Kroah-Hartman c60b717879 kset: convert ocfs2 to use kset_create
Dynamically create the kset instead of declaring it statically.

Also use the new kobj_attribute which cleans up this file a _lot_.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:23 -08:00
Kay Sievers 000f2a4d8c Driver Core: kill subsys_attribute and default sysfs ops
Remove the no longer needed subsys_attributes, they are all converted to
the more sensical kobj_attributes.

There is no longer a magic fallback in sysfs attribute operations, all
kobjects which create simple attributes need explicitely a ktype
assigned, which tells the core what was intended here.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:22 -08:00
Greg Kroah-Hartman af6370ea92 ecryptfs: remove version_str file from sysfs
This file violates the one-value-per-file sysfs rule.

If you all want it added back, please do something like a per-feature
file to show what is present and what isn't.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:18 -08:00
Kay Sievers 386f275f5d Driver Core: switch all dynamic ksets to kobj_sysfs_ops
Switch all dynamically created ksets, that export simple attributes,
to kobj_attribute from subsys_attribute. Struct subsys_attribute will
be removed.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:18 -08:00
Greg Kroah-Hartman bd35b93d80 kset: convert kernel_subsys to use kset_create
Dynamically create the kset instead of declaring it statically.  We also
rename kernel_subsys to kernel_kset to catch all users of this symbol
with a build error instead of an easy-to-ignore build warning.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman d405936b32 kset: convert dlm to use kset_create
Dynamically create the kset instead of declaring it statically.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman 136a27507f kset: convert gfs2 dlm to use kset_create
Dynamically create the kset instead of declaring it statically.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman 9bec101a0c kset: convert gfs2 to use kset_create
Dynamically create the kset instead of declaring it statically.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman 00d2666623 kobject: convert main fs kobject to use kobject_create
This also renames fs_subsys to fs_kobj to catch all current users with a
build error instead of a build warning which can easily be missed.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman 917e865df7 kset: convert ecryptfs to use kset_create
Dynamically create the kset instead of declaring it statically.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman 3794491d0c kobject: convert configfs to use kobject_create
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman 191e186bd0 kobject: convert debugfs to use kobject_create
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman 5c89e17e9c kobject: convert fuse to use kobject_create
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman 4ff6abff83 kobject: get rid of kobject_add_dir
kobject_create_and_add is the same as kobject_add_dir, so drop
kobject_add_dir.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman 3514faca19 kobject: remove struct kobj_type from struct kset
We don't need a "default" ktype for a kset.  We should set this
explicitly every time for each kset.  This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Jiri Slaby d7b3788965 sysfs: remove SPIN_LOCK_UNLOCKED
SPIN_LOCK_UNLOCKED is deprecated, use DEFINE_SPINLOCK instead

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:08 -08:00
Kay Sievers 2f90a85180 sysfs: create optimal relative symlink targets
Instead of walking from the source down to the root of sysfs, and back
to the target, we stop at the first directory the source and the target
share.

This link:
  /devices/pci0000:00/0000:00:1d.7/usb1/1-0:1.0/ep_81

pointed to:
  ../../../../../devices/pci0000:00/0000:00:1d.0/usb2/2-0:1.0/usb_endpoint/usbdev2.1_ep81

now it just points to:
  usb_endpoint/usbdev1.1_ep81

Thanks to Denis Cheng for bringing this up, and sending the initial patch.

CC: Denis Cheng <crquan@gmail.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:08 -08:00
Greg Kroah-Hartman 30a468b1c1 ecryptfs: clean up attribute mess
It isn't that hard to add simple kset attributes, so don't go through
all the gyrations of creating your own object type and show and store
functions.  Just use the functions that are already present.  This makes
things much simpler.

Note, the version_str string violates the "one value per file" rule for
sysfs.  I suggest changing this now (individual files per type supported
is one suggested way.)


Cc: Michael A. Halcrow <mahalcro@us.ibm.com>
Cc: Michael C. Thompson <mcthomps@us.ibm.com>
Cc: Tyler Hicks <tyhicks@ou.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:08 -08:00
Kay Sievers 62ca879256 coda: convert struct class_device to struct device
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tony Jones <tonyj@suse.de>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:05 -08:00
Jean Delvare 9fd5b1c906 sysfs: Fix a copy-n-paste typo in comment
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:04 -08:00
Igor Mammedov 6d5ae0deb1 [CIFS] DFS support: provide shrinkable mounts
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-25 03:28:31 +00:00
James Morris c43e259cc7 security: call security_file_permission from rw_verify_area
All instances of rw_verify_area() are followed by a call to
security_file_permission(), so just call the latter from the former.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:52 +11:00
Miklos Szeredi 5c5e32ceeb mount options: fix jfs
Add iocharset= and errors= options to /proc/mounts for jfs
filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-24 16:13:21 -06:00
Linus Torvalds 4784b11c4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC]: Constify function pointer tables.
  [SPARC64]: Fix section error in sparcspkr
  [SPARC64]: Fix of section mismatch warnings.
2008-01-23 18:46:25 -08:00
James Bottomley d4acd722b7 [SCSI] sysfs: add filter function to groups
This patch allows the various users of attribute_groups to selectively
allow the appearance of group attributes.  The primary consumer of
this will be the transport classes in which we currently have
elaborate attribute selection algorithms to do this same thing.

Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-23 11:29:18 -06:00
James Bottomley 11f24fbdf5 [SCSI] sysfs: fix the sysfs_add_file_to_group interfaces
I can't see a reason why these shouldn't work on every group.  However,
they only seem to work on named groups.  This patch allows the group
functions to work on anonymous groups (those with NULL names).

Acked-by: Tejun Heo <htejun@gmail.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-23 11:29:17 -06:00
Jan Engelhardt 872e2be7c4 [SPARC]: Constify function pointer tables.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-22 18:29:20 -08:00
Johann Felix Soden 889c94a14e Fix file references in documentation and Kconfig
Fix typo in arch/powerpc/boot/flatdevtree_env.h.
There is no Documentation/networking/ixgbe.txt.

README.cycladesZ is now in Documentation/.
wavelan.p.h is now in drivers/net/wireless/.
HFS.txt is now Documentation/filesystems/hfs.txt.
OSS-files are now in sound/oss/.

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-22 10:43:36 -08:00
Steve French ed2b91701d [CIFS] Do not log path names in lookup errors
Andi Kleen noticed that we were logging access denied errors (which is
noisy in the dmesg log, and not needed to be logged) and that we were
logging path names on that an other errors (e.g. EIO) which we should
not be doing.

CC: Andi Kleen <ak@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-20 00:30:29 +00:00
Jonas Bonn f63dcda197 jbd: do not try lock_acquire after handle made invalid
This likely fixes the oops in __lock_acquire reported as:

http://www.kerneloops.org/raw.php?rawid=2753&msgid=
http://www.kerneloops.org/raw.php?rawid=2749&msgid=

In these reported oopses, start_this_handle is returning -EROFS.

Signed-off-by: Jonas Bonn <jonas.bonn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-17 15:38:59 -08:00
Eric Sandeen 46a39c1cd5 hfs: fix coverity-found null deref
Fix potential null deref introduced by commit
cf05946250
http://bugzilla.kernel.org/show_bug.cgi?id=9748

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-17 15:38:58 -08:00
Tejun Heo 456ef1553c sysfs: fix bugs in sysfs_rename/move_dir()
sysfs_rename/move_dir() have the following bugs.

 - On dentry lookup failure, kfree() is called on ERR_PTR() value.
 - sysfs_move_dir() has an extra dput() on success path.

Fix them.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-16 09:54:03 -08:00
Tejun Heo e49452c677 sysfs: make sysfs_lookup() return ERR_PTR(-ENOENT) on failed lookup
sysfs tries to keep dcache a strict subset of sysfs_dirent tree by
shooting down dentries when a node is removed, that is, no negative
dentry for sysfs.  However, the lookup function returned NULL and thus
created negative dentries when the target node didn't exist.

Make sysfs_lookup() return ERR_PTR(-ENOENT) on lookup failure.  This
fixes the NULL dereference bug in sysfs_get_dentry() discovered by
bluetooth rfcomm device moving around.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-16 09:54:03 -08:00
Linus Torvalds c23f72cae9 Revert "writeback: introduce writeback_control.more_io to indicate more io"
This reverts commit 2e6883bdf4, as
requested by Fengguang Wu.  It's not quite fully baked yet, and while
there are patches around to fix the problems it caused, they should get
more testing.  Says Fengguang: "I'll resend them both for -mm later on,
in a more complete patchset".

See

	http://bugzilla.kernel.org/show_bug.cgi?id=9738

for some of this discussion.

Requested-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-14 21:21:29 -08:00
Oleg Nesterov a98fdcef94 fix the "remove task_ppid_nr_ns" commit
Commit 84427eaef1 (remove task_ppid_nr_ns)
moved the task_tgid_nr_ns(task->real_parent) outside of lock_task_sighand().
This is wrong, ->real_parent could be freed/reused.

Both ->parent/real_parent point to nothing after __exit_signal() because
we remove the child from ->children list, and thus the child can't be
reparented when its parent exits.

rcu_read_lock() protects ->parent/real_parent, but _only_ if we know it was
valid before we take rcu lock.

Revert this part of the patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-14 13:23:00 -08:00
NeilBrown ba67a39efd knfsd: Allow NFSv2/3 WRITE calls to succeed when krb5i etc is used.
When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple
of 8 bytes.  This can make the request look slightly longer than it
really is.

As of

	f34b95689d "The NFSv2/NFSv3 server does not handle zero
		length WRITE request correctly",

the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't
the right length, so krb5i (for example) WRITE requests can get lost.

This patch relaxes the appropriate test and enhances the related comment.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-13 09:57:57 -08:00
Roland McGrath 84427eaef1 remove task_ppid_nr_ns
task_ppid_nr_ns is called in three places.  One of these should never
have called it.  In the other two, using it broke the existing
semantics.  This was presumably accidental.  If the function had not
been there, it would have been much more obvious to the eye that those
patches were changing the behavior.  We don't need this function.

In task_state, the pid of the ptracer is not the ppid of the ptracer.

In do_task_stat, ppid is the tgid of the real_parent, not its pid.
I also moved the call outside of lock_task_sighand, since it doesn't
need it.

In sys_getppid, ppid is the tgid of the real_parent, not its pid.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-13 09:56:43 -08:00
Linus Torvalds 974a9f0b47 Use access mode instead of open flags to determine needed permissions
Way back when (in commit 834f2a4a15, aka
"VFS: Allow the filesystem to return a full file pointer on open intent"
to be exact), Trond changed the open logic to keep track of the original
flags to a file open, in order to pass down the the intent of a dentry
lookup to the low-level filesystem.

However, when doing that reorganization, it changed the meaning of
namei_flags, and thus inadvertently changed the test of access mode for
directories (and RO filesystem) to use the wrong flag.  So fix those
test back to use access mode ("acc_mode") rather than the open flag
("flag").

Issue noticed by Bill Roman at Datalight.

Reported-and-tested-by: Bill Roman <bill.roman@datalight.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-12 14:47:58 -08:00
Linus Torvalds 460c54b779 Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] fix unaligned access in readdir
2008-01-11 11:49:31 -08:00
Christoph Hellwig aea6ad0ce5 [XFS] fix unaligned access in readdir
This patch should fix the issue seen on Alpha with unaligned accesses in
the new readdir code. By aligning each dirent to sizeof(u64) we'll avoid
unaligned accesses. To make doubly sure we're not hitting problems also
rearrange struct hack_dirent to avoid holes.

SGI-PV: 975411
SGI-Modid: xfs-linux-melb:xfs-kern:30302a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-01-11 18:05:04 +11:00
Igor Mammedov e6ab15827e [CIFS] DFS support patchset: Added mountdata
Also cifs_fs_type was made not static for ussage in dfs code.

Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-11 01:49:48 +00:00
Dave Kleikamp 967c9ec4ec JFS: simplify types to get rid of sparse warning
jfs_metapage.c was using uints and unsigned ints inconsistently when
regular ints suffice.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-10 16:04:25 -06:00
Trond Myklebust d0dc3701cb NFSv4: Give the lock stateid its own sequence queue
Sharing the open sequence queue causes a deadlock when we try to take
both a lock sequence id and and open sequence id.

This fixes the regression reported by Dimitri Puzin and Jeff Garzik: See

	http://bugzilla.kernel.org/show_bug.cgi?id=9712

for details.

Reported-and-tested-by: Dimitri Puzin <bugs@psycast.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-10 13:35:32 -08:00
Steve French 197c183f35 [CIFS] Forgot to add two new files from previous commit
Thanks to Igor for noticing this.
CC: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-10 17:10:23 +00:00
Steve French 6103335de8 [CIFS] DNS name resolution helper upcall for cifs
Adds additional option CIFS_DFS_UPCALL to fs/Kconfig for enabling
        DFS support.  Resolved IP address is saved as a string in the
	key payload.

	Igor has a series of related patches that will follow which finish up
	CIFS DFS support

Acked-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-09 16:21:36 +00:00
Eric Sandeen cf05946250 hfs: handle more on-disk corruptions without oopsing
hfs seems prone to bad things when it encounters on disk corruption.  Many
values are read from disk, and used as lengths to memcpy, as an example.
This patch fixes up several of these problematic cases.

o sanity check the on-disk maximum key lengths on mount
  (these are set to a defined value at mkfs time and shouldn't differ)
o check on-disk node keylens against the maximum key length for each tree
o fix hfs_btree_open so that going out via free_tree: doesn't wind
  up in hfs_releasepage, which wants to follow the very pointer
  we were trying to set up:
	HFS_SB(sb)->cat_tree = hfs_btree_open()
		...
		failure gets to hfs_releasepage and tries
		to follow HFS_SB(sb)->cat_tree

Tested with the fsfuzzer; it survives more than it used to.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08 16:10:36 -08:00
Michael Halcrow caeeeecfda eCryptfs: fix dentry handling on create error, unlink, and inode destroy
This patch corrects some erroneous dentry handling in eCryptfs.

If there is a problem creating the lower file, then there is nothing that
the persistent lower file can do to really help us.  This patch makes a
vfs_create() failure in the lower filesystem always lead to an
unconditional do_create failure in eCryptfs.

Under certain sequences of operations, the eCryptfs dentry can remain in
the dcache after an unlink.  This patch calls d_drop() on the eCryptfs
dentry to correct this.

eCryptfs has no business calling d_delete() directly on a lower
filesystem's dentry.  This patch removes the call to d_delete() on the
lower persistent file's dentry in ecryptfs_destroy_inode().

(Thanks to David Kleikamp, Eric Sandeen, and Jeff Moyer for helping
identify and resolve this issue)

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08 16:10:36 -08:00
OGAWA Hirofumi 9f966be899 fat: optimize fat_count_free_clusters()
On large partition, scanning the free clusters is very slow if users
doesn't use "usefree" option.

For optimizing it, this patch uses sb_breadahead() to read of FAT
sectors. On some user's 15GB partition, this patch improved it very
much (1min => 600ms).

The following is the result of 2GB partition on my machine.

without patch:
	root@devron (/)# time df -h > /dev/null

	real    0m1.202s
	user    0m0.000s
	sys     0m0.440s

with patch:
	root@devron (/)# time df -h > /dev/null

	real    0m0.378s
	user    0m0.012s
	sys     0m0.168s

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08 16:10:35 -08:00
Steve French f6d0998219 [CIFS] fix checkpatch warnings in fs/cifs/inode.c
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-08 23:18:22 +00:00
Roland McGrath 45626bb26a core dump: real_parent ppid
The pr_ppid field reported in core dumps should match what
getppid() would have returned to that process, regardless of
whether a debugger is attached.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-07 14:55:37 -08:00
Akos Maroy 3fee37c1e2 fix: using joysticks in 32 bit applications on 64 bit systems
unfortunately 32 bit apps don't see the joysticks on a 64 bit system.
this prevents one playing X-Plane (http://www.x-plane.com/) or other
32-bit games with joysticks.

this is a known issue, and already raised several times:

 http://readlist.com/lists/vger.kernel.org/linux-kernel/28/144411.html

 http://www.brettcsmith.org/wiki/wiki.cgi?action=browse&diff=1&id=OzyComputer/Joystick

unfortunately this is still not fixed in the mainline kernel.

it would be nice to have this fixed, so that people can play these games
without having to patch their kernel.

the following patch solves the problem on 2.6.22.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-06 10:26:06 -08:00
Dave Kleikamp da8a41d192 JFS: FIx one more plain integer as NULL pointer warning
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:12:16 -06:00
Joe Perches 09aaa749f6 JFS: Remove defconfig ptr comparison to 0
Remove sparse warning: Using plain integer as NULL pointer

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:12:10 -06:00
Shaun Zinck a7fe0ba7ee JFS: use DIV_ROUND_UP where appropriate
This replaces some macros and code, which do the same thing as DIV_ROUND_UP
defined in kernel.h, to use the DIV_ROUND_UP macro.

Signed-off-by: Shaun Zinck <shaun.zinck@gmail.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:11:59 -06:00
Jack Stone 1eb3a711d6 Remove unnecessary kmalloc casts in the jfs filesystem
Signed-off-by: Jack Stone <jack@hawkeye.stone.uk.eu.org>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:11:53 -06:00
Nick Piggin 54af6233d1 JFS is missing a memory barrier
JFS is missing a memory barrier needed to close the critical section before
clearing the lock bit. Use lock bitops for this.

unlock_page() has a second barrier after clearing the lock, which is
required because it checks whether the waitqueue is active without locks.
Such a barrier is not required here because the waitqueue spinlock is
always taken (something to think about if performance is an issue).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:11:44 -06:00
Dave Kleikamp 67e6682f18 JFS: Make sure special inode data is written after journal is flushed
This patch makes sure that data that we tried to flush before the journal
was completely written actually gets pushed to disk.

To avoid duplicating code, moved common code to write_special_inodes().

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2008-01-03 13:11:37 -06:00
Dave Kleikamp 29a424f283 JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
When JFS decides to drop a dirty metapage, it simply clears the META_dirty
bit and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.

When such no-write page goes to metapage_writepage(), the `relic'
PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
repeatedly trying to sync them.  This is done through
set_page_writeback(), so call it should be called in all cases.  If
no I/O is initiated, end_page_writeback() should be called immediately.

This is how __block_write_full_page() does things.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
2008-01-03 13:09:33 -06:00
Steve French 88e7d705c4 [CIFS] hold ses sem on tcp session reconnect during mount
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-01-03 17:37:09 +00:00
Trond Myklebust e6e21970ba NFSv4: Fix open_to_lock_owner sequenceid allocation...
NFSv4 file locking is currently completely broken since it doesn't respect
the OPEN sequencing when it is given an unconfirmed lock_owner and needs to
do an open_to_lock_owner. Worse: it breaks the sunrpc rules by doing a
GFP_KERNEL allocation inside an rpciod callback.

Fix is to preallocate the open seqid structure in nfs4_alloc_lockdata if we
see that the lock_owner is unconfirmed.
Then, in nfs4_lock_prepare() we wait for either the open_seqid, if
the lock_owner is still unconfirmed, or else fall back to waiting on the
standard lock_seqid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:17 -05:00
Trond Myklebust bb22629ee8 NFSv4: nfs4_open_confirm must not set the open_owner as confirmed on error
RFC3530 states that the open_owner is confirmed if and only if the client
sends an OPEN_CONFIRM request with the appropriate sequence id and stateid
within the lease period.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:17 -05:00
Trond Myklebust b274b48f3e NFSv4: Fix circular locking dependency in nfs4_kill_renewd
Erez Zadok reports:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.24-rc6-unionfs2 #80
-------------------------------------------------------
umount.nfs4/4017 is trying to acquire lock:
 (&(&clp->cl_renewd)->work){--..}, at: [<c0223e53>]
__cancel_work_timer+0x83/0x17f

but task is already holding lock:
 (&clp->cl_sem){----}, at: [<f8879897>] nfs4_kill_renewd+0x17/0x29 [nfs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&clp->cl_sem){----}:
       [<c0230699>] __lock_acquire+0x9cc/0xb95
       [<c0230c39>] lock_acquire+0x5f/0x78
       [<c0397cb8>] down_read+0x3a/0x4c
       [<f88798e6>] nfs4_renew_state+0x1c/0x1b8 [nfs]
       [<c0223821>] run_workqueue+0xd9/0x1ac
       [<c0224220>] worker_thread+0x7a/0x86
       [<c0226b49>] kthread+0x3b/0x62
       [<c02033a3>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (&(&clp->cl_renewd)->work){--..}:
       [<c0230589>] __lock_acquire+0x8bc/0xb95
       [<c0230c39>] lock_acquire+0x5f/0x78
       [<c0223e87>] __cancel_work_timer+0xb7/0x17f
       [<c0223f5a>] cancel_delayed_work_sync+0xb/0xd
       [<f887989e>] nfs4_kill_renewd+0x1e/0x29 [nfs]
       [<f885a8f6>] nfs_free_client+0x37/0x9e [nfs]
       [<f885ab20>] nfs_put_client+0x5d/0x62 [nfs]
       [<f885ab9a>] nfs_free_server+0x75/0xae [nfs]
       [<f8862672>] nfs4_kill_super+0x27/0x2b [nfs]
       [<c0258aab>] deactivate_super+0x3f/0x51
       [<c0269668>] mntput_no_expire+0x42/0x67
       [<c025d0e4>] path_release_on_umount+0x15/0x18
       [<c0269d30>] sys_umount+0x1a3/0x1cb
       [<c0269d71>] sys_oldumount+0x19/0x1b
       [<c02026ca>] sysenter_past_esp+0x5f/0xa5
       [<ffffffff>] 0xffffffff

Looking at the code, it would seem that taking the clp->cl_sem in
nfs4_kill_renewd is completely redundant, since we're already guaranteed to
have exclusive access to the nfs_client (we're shutting down).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:16 -05:00
Trond Myklebust e9cc6c234b NFS: Fix a possible Oops in fs/nfs/super.c
Sigh... commit 4584f520e1 (NFS: Fix NFS
mountpoint crossing...) had a slight flaw: server can be NULL if sget()
returned an existing superblock.

Fix the fix by dereferencing s->s_fs_info.

Thanks to Coverity/Adrian Bunk and Frank Filz for spotting the bug.
(See http://bugzilla.kernel.org/show_bug.cgi?id=9647)

Also add in the same namespace Oops fix for NFSv4 in both the mountpoint
crossing case, and the referral case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:11 -05:00
Al Viro 831830b5a2 restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace pid
Contents of /proc/*/maps is sensitive and may become sensitive after
open() (e.g.  if target originally shares our ->mm and later does exec
on suid-root binary).

Check at read() (actually, ->start() of iterator) time that mm_struct
we'd grabbed and locked is
 - still the ->mm of target
 - equal to reader's ->mm or the target is ptracable by reader.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02 13:13:27 -08:00
Linus Torvalds 158a962422 Unify /proc/slabinfo configuration
Both SLUB and SLAB really did almost exactly the same thing for
/proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.

This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
and SLAB, and shares all the setup code.  Maybe SLOB will want this some
day too.

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02 13:04:48 -08:00
Pekka Enberg 6b6adc22a0 slub: register slabinfo to procfs
We need to register slabinfo to procfs when CONFIG_SLUB is enabled to
make the file actually visible to user-space.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02 10:42:39 -08:00
Steve French 97837582bc [CIFS] Allow setting mode via cifs acl
Requires cifsacl mount flag to be on and CIFS_EXPERIMENTAL enabled

CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 07:47:21 +00:00
Jeff Layton 28c5a02a11 [CIFS] fix unicode string alignment in SPNEGO setup
Unicode strings need to be word aligned, but the code that handles that
is currently not taking the length of the SPNEGO blob into account. Fix
it to do so.

Signed-off-by: Jeff Layton <jlayton@tupile.poochiereds.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 04:56:21 +00:00
Steve French bb5a9a04d4 [CIFS] cifs_partialpagewrite() cleanup
rc cannot be -EBADF now and condition is always true

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 04:21:29 +00:00
Jeff Layton 1a67570c76 [CIFS] use krb5 session key from first SMB session after a NegProt
Currently, any new kerberos SMB session overwrites the server's session
key. The session key should only be set by the first SMB session set up
on the socket.

Signed-off-by: Jeff Layton <jlayton@tupile.poochiereds.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 04:03:02 +00:00
Jeff Layton 1d9a8852c3 [CIFS] redo existing session setup if needed in cifs_mount
When cifs_mount finds an existing SMB session that it can use for a new
mount, it does not check to see whether that session is in need of being
reconnected. An easy way to reproduce:

1) mount //server/share1
2) watch /proc/fs/cifs/DebugData for the share to go DISCONNECTED
3) mount //server/share2 with same creds as in step 1.

The second mount will fail because CIFSTCon returned -EAGAIN. If you do
an operation in share1 and then reattempt the mount it will work (since
the session is reestablished).

The following patch fixes this by having cifs_mount check the status
of the session when it picks an existing session and calling
cifs_setup_session on it again if it's in need of reconnection.

Thanks to Wojciech Pilorz for the initial bug report.

Signed-off-by: Jeff Layton <jlayton@tupile.poochiereds.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 01:37:11 +00:00
Jeff Layton 05b3de63da [CIFS] Only dump SPNEGO key if CONFIG_CIFS_DEBUG2 is set
The SPNEGO key data is not terribly interesting except in certain
debugging situations. Only dump it to the ring buffer if needed.

Signed-off-by: Jeff Layton <jlayton@tupile.poochiereds.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-31 00:51:45 +00:00
Steve French dae5dbdbd7 [CIFS] fix SetEA failure to some Samba versions
Thanks to Oleg Gvozdev for noticing the problem.

CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-12-30 23:49:57 +00:00
Eric Sandeen 16317ec2e5 ecryptfs: redo dget,mntget on dentry_open failure
Thanks to Jeff Moyer for pointing this out.

If the RDWR dentry_open() in ecryptfs_init_persistent_file fails,
it will do a dput/mntput.  Need to re-take references if we
retry as RDONLY.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mike Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-23 12:54:37 -08:00
Eric Sandeen c8161f64cc ecryptfs: fix unlocking in error paths
Thanks to Josef Bacik for finding these.

A couple of ecryptfs error paths don't properly unlock things they locked.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Josef Bacik <jbacik@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-23 12:54:37 -08:00
Jan Kara c525460e27 Don't send quota messages repeatedly when hardlimit reached
We should send quota message to netlink only once when hardlimit is
reached.  Otherwise user could easily make the system busy by trying to
exceed the hardlimit (and also the messages could be anoying if you cannot
stop writing just now).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-23 12:54:36 -08:00
Jan Kara 22dd483721 Fix computation of SKB size for quota messages
Fix computation of size of skb needed for quota message.  We should use
netlink provided functions and not just an ad-hoc number.  Also don't print
the return value from nla_put_foo() as it is always -1.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-23 12:54:36 -08:00
Eric Sandeen b88629060b ecryptfs: fix string overflow on long cipher names
Passing a cipher name > 32 chars on mount results in an overflow when the
cipher name is printed, because the last character in the struct
ecryptfs_key_tfm's cipher_name string was never zeroed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-23 12:54:36 -08:00
Linus Torvalds 2b5baad165 Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Initialise current offset in xfs_file_readdir correctly
  [XFS] Fix mknod regression
2007-12-20 17:02:22 -08:00
Lachlan McIlroy 4743e0ec12 [XFS] Initialise current offset in xfs_file_readdir correctly
After reading the directory contents into the temporary buffer, we grab
each dirent and pass it to filldir witht eh current offset of the dirent.
The current offset was not being set for the first dirent in the temporary
buffer, which coul dresult in bad offsets being set in the f_pos field
result in looping and duplicate entries being returned from readdir.

SGI-PV: 974905
SGI-Modid: xfs-linux-melb:xfs-kern:30282a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-21 11:40:05 +11:00
Christoph Hellwig bad60fdd14 [XFS] Fix mknod regression
This was broken by my '[XFS] simplify xfs_create/mknod/symlink prototype',
which assigned the re-shuffled ondisk dev_t back to the rdev variable in
xfs_vn_mknod. Because of that i_rdev is set to the ondisk dev_t instead of
the linux dev_t later down the function.

Fortunately the fix for it is trivial: we can just remove the assignment
because xfs_revalidate_inode has done the proper job before unlocking the
inode.

SGI-PV: 974873
SGI-Modid: xfs-linux-melb:xfs-kern:30273a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-21 11:39:58 +11:00
Ivan Kokshaysky 3c378158d4 mm: fix exit_mmap BUG() on a.out binary exit
The problem was introduced by commit "mm: variable length argument
support" (b6a2fea393)
as it didn't update fs/binfmt_aout.c like other binfmt's.

I noticed that on alpha when accidentally launched old OSF/1
Acrobat Reader binary. Obviously, other architectures are affected
as well.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ollie Wild <aaw@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-20 07:49:53 -08:00
Lachlan McIlroy 041388b54e [XFS] Put the correct offset in dirent d_off
The recent filldir regression fix was not putting the correct d_off in
each dirent. This was resulting in incorrect cookies being passed to dmapi
ioctls and the wrong offset appearing in the dirents. readdir was
unaffected as the filp->f_pos was being updated with the correct offset
and this was being written into the last dirent in each buffer. Fix the
XFS code to do the right thing.

SGI-PV: 973746
SGI-Modid: xfs-linux-melb:xfs-kern:30240a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-18 17:16:23 +11:00
Lachlan McIlroy c734c79bc3 [XFS] Don't wait for pending I/Os when purging blocks beyond eof.
On last close of a file we purge blocks beyond eof. The same code is used
when we truncate the file size down. In this case we need to wait for any
pending I/Os for dirty pages beyond the new eof. For the last close case
we are not changing the file size and therefore do not need to wait for
any I/Os to complete. This fixes a performance bottleneck where writes
into the page cache and cache flushes can become mutually exclusive.

SGI-PV: 964002
SGI-Modid: xfs-linux-melb:xfs-kern:30220a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
2007-12-18 17:16:17 +11:00
Jan Kara 087ee8d5be Fix compilation warning in dquot.c
Fix compilation warning about discarded const.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:17 -08:00
Eric Sandeen 7a3f595cc8 ecryptfs: fix fsx data corruption problems
ecryptfs in 2.6.24-rc3 wasn't surviving fsx for me at all, dying after 4
ops.  Generally, encountering problems with stale data and improperly
zeroed pages.  An extending truncate + write for example would expose stale
data.

With the changes below I got to a million ops and beyond with all mmap ops
disabled - mmap still needs work.  (A version of this patch on a RHEL5
kernel ran for over 110 million fsx ops)

I added a few comments as well, to the best of my understanding
as I read through the code.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:17 -08:00
Eric Sandeen 7c9e70efbf ecryptfs: set s_blocksize from lower fs in sb
eCryptfs wasn't setting s_blocksize in it's superblock; just pick it up
from the lower FS.  Having an s_blocksize of 0 made things like "filefrag"
which call FIGETBSZ unhappy.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mike Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:17 -08:00
Andries E. Brouwer b47b6f38e5 ext3, ext4: avoid divide by zero
As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when
mounting an ext3 filesystem.  If that number is zero, a crash follows.
Below a patch.

This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers.

Cc: <linux-ext4@vger.kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:16 -08:00
Uwe Kleine-König 9e2de407be fs/Kconfig: grammar fix
This was introduced in 4af8e944c22d8af92a7548354a9567250cc1a782

Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:16 -08:00
Eric Sandeen 459e216429 ecryptfs: initialize new auth_tokens before teardown
ecryptfs_destroy_mount_crypt_stat() checks whether each
auth_tok->global_auth_tok_key is nonzero and if so puts that key.  However,
in some early mount error paths nothing has initialized the pointer, and we
try to key_put() garbage.  Running the bad cipher tests in the testsuite
exposes this, and it's happy with the following change.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:15 -08:00
Linus Torvalds 2cc3a8f6ac Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  MAINTAINERS: update the NFS CLIENT entry
  NFS: Fix an Oops in NFS unmount
  Revert "NFS: Ensure we return zero if applications attempt to write zero bytes"
  SUNRPC xprtrdma: fix XDR tail buf marshalling for all ops
  NFSv2/v3: Fix a memory leak when using -onolock
  NFS: Fix NFS mountpoint crossing...
2007-12-17 13:36:17 -08:00
Mark Fasheh e8aed3450c ocfs2: Re-journal buffers after transaction extend
ocfs2_extend_trans() might call journal_restart() which will commit dirty
buffers and then restart the transaction. This means that any buffers which
still need changes should be passed to journal_access() again. Some paths
during extend weren't doing this right.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:23 -08:00
Mark Fasheh 0879c584ff ocfs2: Allow for debugging of transaction extends
The nastiest cases of transaction extends are also the rarest. We can expose
them more quickly at the expense of performance by going straight to the
journal_restart() in ocfs2_extend_trans(). Wrap things in OCFS2_DEBUG_FS so
that we only do this when "expensive debugging" is turned on.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:14 -08:00
Mark Fasheh 92295d8054 ocfs2: Don't panic when truncating an empty extent
This BUG_ON() was unintentionally left in after the sparse file support was
written.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:04 -08:00
Mark Fasheh a86370fbb6 ocfs2: fix exit-while-locked bug in ocfs2_queue_orphans()
We're holding the cluster lock when a failure might happen in
ocfs2_dir_foreach() so it needs to be released.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:49:43 -08:00
Trond Myklebust a10db50a4a NFS: Fix an Oops in NFS unmount
Ensure that the dummy 'root dentry' is invisible to d_find_alias(). If not,
then it may be spliced into the tree if a parent directory from the same
filesystem gets mounted at a later time.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-12 11:12:15 -05:00
Trond Myklebust a5576cfa5c Revert "NFS: Ensure we return zero if applications attempt to write zero bytes"
This reverts commit b9148c6b80.

On Wed, 12 Dec 2007 10:57:30 -0500, Chuck Lever wrote
> commit b9148c6b should be reverted.  It was recently forward-ported
> from some years-old patches, and is clearly not needed now.
>
> On Dec 11, 2007, at 5:21 PM, Adrian Bunk wrote:
>
>> This code became dead after commit
>> b9148c6b80
>> (which BTW doesn't seem to have changed any behaviour) and can
>> therefore
>> be removed.
>>
>> Spotted by the Coverity checker.
>>
>> Signed-off-by: Adrian Bunk <bunk@kernel.org>
>>
>> ---
>> --- linux-2.6/fs/nfs/direct.c.old     2007-12-02 21:54:53.000000000 +0100
>> +++ linux-2.6/fs/nfs/direct.c 2007-12-02 21:55:10.000000000 +0100
>> @@ -897,15 +897,12 @@ ssize_t nfs_file_direct_write(struct kio
>>       if (!count)
>>               goto out;       /* return 0 */
>>
>>       retval = -EINVAL;
>>       if ((ssize_t) count < 0)
>>               goto out;
>> -     retval = 0;
>> -     if (!count)
>> -             goto out;
>>
>>       retval = nfs_sync_mapping(mapping);
>>       if (retval)
>>               goto out;
>>
>>       retval = nfs_direct_write(iocb, iov, nr_segs, pos, count);
>>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-12 11:08:33 -05:00
Trond Myklebust 5cef338b30 NFSv2/v3: Fix a memory leak when using -onolock
Neil Brown said:
> Hi Trond,
> 
> We found that a machine which made moderately heavy use of
> 'automount' was leaking some nfs data structures - particularly the
> 4K allocated by rpc_alloc_iostats.
> It turns out that this only happens with filesystems with -onolock
> set.

> The problem is that if NFS_MOUNT_NONLM is set, nfs_start_lockd doesn't
> set server->destroy, so when the filesystem is unmounted, the
> ->client_acl is not shutdown, and so several resources are still
> held.  Multiple mount/umount cycles will slowly eat away memory
> several pages at a time.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: NeilBrown <neilb@suse.de>
2007-12-11 22:01:56 -05:00
Trond Myklebust 4584f520e1 NFS: Fix NFS mountpoint crossing...
The check that was added to nfs_xdev_get_sb() to work around broken
servers, works fine for NFSv2, but causes mountpoint crossing on NFSv3 to
always return ESTALE.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-11 19:01:45 -05:00
Eric W. Biederman 3790ee4bd8 proc: remove/Fix proc generic d_revalidate
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-10 19:43:55 -08:00
Linus Torvalds 41f81e88e0 Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC
  [XFS] Make xfsbufd threads freezable
  [XFS] revert to double-buffering readdir
  [XFS] Fix broken inode cluster setup.
  [XFS] Clear XBF_READ_AHEAD flag on I/O completion.
  [XFS] Fixed a few bugs in xfs_buf_associate_memory()
  [XFS] 971064 Various fixups for xfs_bulkstat().
  [XFS] Fix dbflush panic in xfs_qm_sync.
2007-12-10 10:18:27 -08:00
David Chinner cf10e82bdc [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC
The recent I_LOCK->I_SYNC changes mistakenly changed xfs_ichgtime to look
at I_SYNC instead of I_LOCK. This was incorrect and prevents newly created
inodes from moving to the dirty list. Change this to the correct check
which is for I_NEW, not I_LOCK or I_SYNC so that behaviour is correct.

SGI-PV: 974225
SGI-Modid: xfs-linux-melb:xfs-kern:30204a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:56 +11:00
Rafael J. Wysocki 978c7b2ff4 [XFS] Make xfsbufd threads freezable
Fix breakage caused by commit 8314418629
that did not introduce the necessary call to set_freezable() in
xfs/linux-2.6/xfs_buf.c .

SGI-PV: 974224
SGI-Modid: xfs-linux-melb:xfs-kern:30203a

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:36 +11:00
Christoph Hellwig e89bc612d6 [XFS] revert to double-buffering readdir
The current readdir implementation deadlocks on a btree buffers locks
because nfsd calls back into ->lookup from the filldir callback. The only
short-term fix for this is to revert to the old inefficient
double-buffering scheme.

SGI-PV: 973377
SGI-Modid: xfs-linux-melb:xfs-kern:30201a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:15 +11:00
David Chinner a7430847fc [XFS] Fix broken inode cluster setup.
The radix tree based inode caches did away with the inode cluster hashes,
replacing them with a bunch of masking and gang lookups on the radix tree.

This masking got broken when moving the code to per-ag radix trees and
indexing by agino # rather than straight inode number. The result is
clustered inode writeback does not cluster and things can go extremely
slowly when there are lots of inodes to write.

Fix it up by comparing the agino # of the inode we just looked up to the
index of the cluster we are looking for.

Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>

SGI-PV: 972915
SGI-Modid: xfs-linux-melb:xfs-kern:30033a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:46:59 +11:00
Lachlan McIlroy 77be55a5a1 [XFS] Clear XBF_READ_AHEAD flag on I/O completion.
SGI-PV: 972554
SGI-Modid: xfs-linux-melb:xfs-kern:30128a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2007-12-10 13:46:45 +11:00
Lachlan McIlroy d1afb678ce [XFS] Fixed a few bugs in xfs_buf_associate_memory()
- calculation of 'page_count' was incorrect as it did not
  consider the offset of 'mem' into the first page. The
  logic to bump 'page_count' didn't work if 'len' was <=
  PAGE_CACHE_SIZE (ie offset = 3k, len = 2k).
- setting b_buffer_length to 'len' is incorrect if 'offset'
  is > 0. Set it to the total length of the buffer.
- I suspect that passing a non-aligned address into
  mem_to_page() for the first page may have been causing
  issues - don't know but just tidy up that code anyway.

SGI-PV: 971596
SGI-Modid: xfs-linux-melb:xfs-kern:30143a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2007-12-10 13:46:20 +11:00
Lachlan McIlroy cd57e594ad [XFS] 971064 Various fixups for xfs_bulkstat().
- sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
- remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
  special case causes bulkstat to fail because the special case uses
  xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
  have different semantics.  xfs_bulkstat() will return the next inode
  after the one supplied while skipping internal inodes (ie quota inodes).
  xfs_bulkstate_single() will only lookup the inode supplied and return
  an error if it is an internal inode.
- in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied
  so in cases were we return without examining any inodes the scan wont
  restart back at zero.
- sanity check for valid *ubcountp values. Cannot sanity check for valid
  ubuffer here because some users of xfs_bulkstat() don't supply a buffer.
- checks against 'ubleft' (the space left in the user's buffer) should be
  against 'statstruct_size' which is the supplied minimum object size.
  The mixture of checks against statstruct_size and 0 was one of the
  reasons we were skipping inodes.
- if the formatter function returns BULKSTAT_RV_NOTHING and an error and
  the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT
  is for inodes that are no longer valid and we just skip them. EINVAL is
  returned if we try to lookup an internal inode so we skip them too. For
  a DMF scan if the inode and DMF attribute cannot fit into the space left
  in the user's buffer it would return ERANGE. We didn't handle this error
  and skipped the inode. We would continue to skip inodes until one fitted
  into the user's buffer or we completed the scan.
- put back the recalculation of agino (that got removed with the last fix)
  at the end of the while loop. This is because the code at the start of
  the loop expects agino to be the last inode examined if it is non-zero.
- if we found some inodes but then encountered an error, return success
  this time and the error next time. If the formatter aborted with ENOMEM
  we will now return this error but only if we couldn't read any inodes.
  Previously if we encountered ENOMEM without reading any inodes we
  returned a zero count and no error which falsely indicated the scan was
  complete.

SGI-PV: 973431
SGI-Modid: xfs-linux-melb:xfs-kern:30089a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
2007-12-10 13:44:11 +11:00
Donald Douwsma d757762bf2 [XFS] Fix dbflush panic in xfs_qm_sync.
The recent behaviour layer removal dropped the check for quotas that have
been requested at mount time but have subsequently been turned off. This
results in a panic when accessing m_quotainfo which has been freed.

This patch adds the check originally made by xfs_qm_syncall() to
xfs_qm_sync().

SGI-PV: 969769
SGI-Modid: xfs-linux-melb:xfs-kern:29908a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:40:10 +11:00
Matthew Wilcox 2dfe485a2c Remove commented-out code copied from NFS
This is a false positive when grepping ... change it to be what the NFS
code looks like now.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:40:31 -05:00
Matthew Wilcox 150030b78a NFS: Switch from intr mount option to TASK_KILLABLE
By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr'
mount option.  We have to use _killable everywhere instead of _interruptible
as we get rid of rpc_clnt_sigmask/sigunmask.

Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:40:25 -05:00
Liam R. Howlett da78451190 Use mutex_lock_killable in vfs_readdir
Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:39:54 -05:00
Matthew Wilcox 6d8982d9b8 proc/base.c: Use task_is_*
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:20:35 -05:00
Matthew Wilcox 1587e2b188 proc/array.c: Use TASK_REPORT
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:20:28 -05:00
Matthew Wilcox 4a6e9e2ce8 Use wake_up_locked() in eventpoll
Replace the uses of __wake_up_locked with wake_up_locked

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
2007-12-06 17:07:16 -05:00
Len Brown f7a5274d7d Pull suspend-2.6.24 into release branch 2007-12-06 16:26:52 -05:00
Al Viro 97bd7919e2 remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.

Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:20 -08:00
Al Viro 7e46aa5c8c regression: bfs endianness bug
BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
endian fields), not struct bfs_inode_info * (in-core stuff, with host-
endian ones).

It's a macro and fields with the right names are present in
bfs_inode_info, so it compiles, but on big-endian host it gives bogus
results.

Introduced in commit f433dc5634 ("Fixes to
the BFS filesystem driver").

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:20 -08:00
Al Viro 9b5e6857b3 regression: cifs endianness bug
access_flags_to_mode() gets on-the-wire data (little-endian) and treats
it as host-endian.

Introduced in commit e01b640013 ("[CIFS]
enable get mode from ACL when cifsacl mount option specified")

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:19 -08:00
Alexey Dobriyan 5a622f2d0f proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&de->count) == 0)
					if (atomic_dec_and_test(&de->count))
						if (de->deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de->deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de->deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [<c10ac4f0>] vsnprintf+0x2ad/0x49b
 [<c10ac779>] vscnprintf+0x14/0x1f
 [<c1018e6b>] vprintk+0xc5/0x2f9
 [<c10379f1>] handle_fasteoi_irq+0x0/0xab
 [<c1004f44>] do_IRQ+0x9f/0xb7
 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
 [<c100264e>] need_resched+0x1f/0x21
 [<c10190ba>] printk+0x1b/0x1f
 [<c107c8ad>] de_put+0x3d/0x50
 [<c107c8f8>] proc_delete_inode+0x38/0x41
 [<c107c8c0>] proc_delete_inode+0x0/0x41
 [<c1066298>] generic_delete_inode+0x5e/0xc6
 [<c1065aa9>] iput+0x60/0x62
 [<c1063c8e>] d_kill+0x2d/0x46
 [<c1063fa9>] dput+0xdc/0xe4
 [<c10571a1>] __fput+0xb0/0xcd
 [<c1054e49>] filp_close+0x48/0x4f
 [<c1055ee9>] sys_close+0x67/0xa5
 [<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:20 -08:00
Jan Kara d4beaf4ab5 jbd: Fix assertion failure in fs/jbd/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:20 -08:00
Evgeniy Dushistov 0c664f9742 ufs: fix nexstep dir block size
This patch fixes regression, introduced since 2.6.16.  NextStep variant of
UFS as OpenStep uses directory block size equals to 1024.  Without this
change, ufs_check_page fails in many cases.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Dave Bailey <dsbailey@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:18 -08:00
Jeff Moyer e00ba3dae0 aio: only account I/O wait time in read_events if there are active requests
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle).  The UML code sits in io_getevents waiting for
an event to be submitted and completed.

Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:18 -08:00
Rafael J. Wysocki e136e769d4 Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2)
Fix breakage caused by commit d5d8c5976d
"freezer: do not send signals to kernel threads" in
jffs2_garbage_collect_thread() that assumed it would be sent signals
by the freezer.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Pete MacKay <armlinux@architechnical.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-12-04 01:35:41 -05:00
Linus Torvalds 8002cedc1a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
2007-12-03 08:15:36 -08:00
Eric W. Biederman 2b1e300a9d [NETNS]: Fix /proc/net breakage
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify ->lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-12-02 00:33:17 +11:00
Heiko Carstens 81257def2a tty: add the new termios2 ioctls to the compatible list.
Make them depend on TCGETS2.  If that one is implemented the rest should be
there as well.

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:55 -08:00
Miklos Szeredi 08b633070a fuse: fix attribute caching after rename
Invalidate attributes on rename, since some filesystems may update
st_ctime.  Reported by Szabolcs Szakacsits

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
John Muir fbee36b92a fuse: fix uninitialized field in fuse_inode
I found problems accessing (executing) previously existing files, until
I did chmod on them (or setattr).

If the fi->attr_version is not initialized, then it could be
larger than fc->attr_version until a setattr is executed, and as a
result the inode attributes would never be set.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi d0186b25e6 fuse: fix FUSE_FILE_OPS sending
FUSE_FILE_OPS is meant to signal that the kernel will send the open file to to
the userspace filesystem for operations on open files, so that sillyrenaming
unlinked files becomes unnecessary.

However this needs VFS changes, which won't make it into 2.6.24.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi a6643094e7 fuse: pass open flags to read and write
Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.

To make it possible to correcly handle changing flags, send the
current value to userspace in each read and write.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi 7dca9fd39f fuse: cleanup: add fuse_get_attr_version()
Extract repeated code into helper function, as suggested by Akpm.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi bcb4be809d fuse: fix reading past EOF
Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.

So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.

Thanks to mpb for the report.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Tobias Poschwatta 6454d1f903 fix up ext2_fs.h for userspace after reservations backport
In commit a686cd898bd999fd026a51e90fb0a3410d258ddb:

 "Val's cross-port of the ext3 reservations code into ext2."

include/linux/ext2_fs.h got a new function whose return value is only
defined if __KERNEL__ is defined. Putting #ifdef __KERNEL__ around the
function seems to help, patch below.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:53 -08:00
Eric W. Biederman 19fd4bb2a0 proc: remove races from proc_id_readdir()
Oleg noticed that the call of task_pid_nr_ns() in proc_pid_readdir
is racy with respect to tasks exiting.

After a bit of examination it also appears that the call itself
is completely unnecessary.

So to fix the problem this patch modifies next_tgid() to return
both a tgid and the task struct in question.

A structure is introduced to return these values because it is
slightly cleaner and easier to optimize, and the resulting code
is a little shorter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Alexey Dobriyan c2319540cd proc: fix NULL ->i_fop oops
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file->f_op->readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
   could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
   readdir will return only "." and "..", because pde->next was cleared by
   remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
   fixed, just race window made smaller, because vfs_readdir() is run
   without sb_lock held and without file_list_lock held.  Effectively,
   ->i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
       00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
       00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
 [<c1061040>] filldir64+0x0/0xc5
 [<c1061295>] sys_getdents64+0x63/0xa5
 [<c10026ba>] sysenter_past_esp+0x5f/0x85
 =======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
      Unfortunately we have another copy of this junk in
      security/selinux/selinuxfs.c:sel_remove_entries() which would need the
      same treatment."

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Linus Torvalds cae2f9c46d Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  sysfs: fix off-by-one error in fill_read_buffer()
  kobject: two typo fixes
  UIO: add UIO documentation target to DocBook Makefile
  UIO: fix up the UIO documentation
  create /sys/.../power when CONFIG_PM is set
  allow LEGACY_PTYS to be set to 0
2007-11-28 15:59:50 -08:00
Miao Xie 8118a859dc sysfs: fix off-by-one error in fill_read_buffer()
I found that there is a off-by-one problem in the following code.

Version:	2.6.24-rc2
File:		fs/sysfs/file.c:118-122
Function:	fill_read_buffer
--------------------------------------------------------------------
	count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);

	sysfs_put_active_two(attr_sd);

	BUG_ON(count > (ssize_t)PAGE_SIZE);
--------------------------------------------------------------------

Because according to the specification of the sysfs and the implement of
the show methods, the show methods return the number of bytes which would
be generated for the given input, excluding the trailing null.So if the
return value of the show methods equals PAGE_SIZE - 1, the buffer is full
in fact.  And if the return value equals PAGE_SIZE, the resulting string
was already truncated,or buffer overflow occurred.

This patch fixes an off-by-one error in fill_read_buffer.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-28 13:53:53 -08:00
Ingo Molnar c46f739dd3 vfs: coredumping fix
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043

only allow coredumping to the same uid that the coredumping
task runs under.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-28 10:58:01 -08:00
Mark Fasheh b1967d0edd ocfs2: reverse inline-data truncate args
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set
i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate
sets i_size, and punching a hole ignores it.

This exposed a problem where punching a hole in an inline-data file wasn't
updating the page cache, so fix that too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh 0d8a4e0cd6 ocfs2: Fix comparison in ocfs2_size_fits_inline_data()
This was causing us to prematurely push out inline data by one byte.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh bccb9dad89 ocfs2: Remove bug statement in ocfs2_dentry_iput()
The existing bug statement didn't take into account unhashed dentries which
might not have a cluster lock on them. This could happen if a node exporting
the file system via NFS is rebooted, re-exported to nfs clients and then
unmounted. It's fine in this case to not have a dentry cluster lock.

Just remove the bug statement and replace it with an error print, which
does the proper checks. Though we want to know if something has happened
which might have prevented a cluster lock from being created, it's
definitely not necessary to panic the machine for this.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Jan Kara 5a58c3ef22 [PATCH] ocfs2: Remove expensive bitmap scanning
Enable expensive bitmap scanning only if DEBUG option is enabled.
The bitmap scanning quite loads the CPU and on my machine the write
throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
improves from 37 MB/s to 45.4 MB/s in local mode...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh a46043e08f ocfs2: log valid inode # on bad inode
If the inode block isn't valid then we don't want to print the value from
that, instead print the block number which was passed in (which should
always be correct). Also, turn this into a debug print for now - folks who
hit an actual problem always have other logs indicating what the source is.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh ef9f86ceb6 ocfs2: Filter -ENOSPC in mlog_errno()
It's almost never worth printing in that situation and we keep forgetting to
manually filter it out.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Joe Perches 2759236f84 [PATCH] fs/ocfs2: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Mark Fasheh e001e796e4 ocfs2: Reset journal parameters after s_mount_opt update
Right now we're just setting them from the existing parameters, not the
new ones that a remount specified.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Linus Torvalds 423eaf8f00 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  NFS: Clean up new multi-segment direct I/O changes
  NFS: Ensure we return zero if applications attempt to write zero bytes
  NFS: Support multiple segment iovecs in the NFS direct I/O path
  NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
  SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c
  SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
  NFS: fs/nfs/dir.c should #include "internal.h"
  NFS: make nfs_wb_page_priority() static
  NFS: mount failure causes bad page state
  SUNRPC: remove NFS/RDMA client's binary sysctls
  kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
  sunrpc: rpc_pipe_poll may miss available data in some cases
  sunrpc: return error if unsupported enctype or cksumtype is encountered
  sunrpc: gss_pipe_downcall(), don't assume all errors are transient
  NFS: Fix the ustat() regression
2007-11-26 19:42:59 -08:00
Linus Torvalds 0685ab4fb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: bump version of kernel/sched_debug.c
  sched: fix minimum granularity tunings
  sched: fix RLIMIT_CPU comment
  sched: fix kernel/acct.c comment
  sched: fix prev_stime calculation
  sched: don't forget to unlock uids_mutex on error paths
2007-11-26 19:42:08 -08:00
Chuck Lever 02fe494619 NFS: Clean up new multi-segment direct I/O changes
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:40 -05:00
Chuck Lever b9148c6b80 NFS: Ensure we return zero if applications attempt to write zero bytes
A zero byte count direct write request should be a successful no-op, not an
error.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:38 -05:00
Chuck Lever c216fd708e NFS: Support multiple segment iovecs in the NFS direct I/O path
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:36 -05:00
Chuck Lever 19f737879c NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
Add helpers that iterate over multi-segment iovecs.  These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:35 -05:00
Adrian Bunk 4c30d56edc NFS: fs/nfs/dir.c should #include "internal.h"
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:49 -05:00
Adrian Bunk 5334eb13d4 NFS: make nfs_wb_page_priority() static
nfs_wb_page_priority() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:48 -05:00
Russell King f16c960332 NFS: mount failure causes bad page state
While testing a kernel based upon ecd744eec3
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:

IP-Config: Complete:
      device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
     host=192.168.1.101, domain=, nis-domain=(none),
     bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...

This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data().  Ensure that the
parsed mount data is always initialised.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
     (Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:22 -05:00
Ingo Molnar 08e4570a4a sched: fix prev_stime calculation
Srivatsa Vaddagiri noticed occasionally incorrect CPU usage
values in top and tracked it down to stime going below 0 in
task_stime(). Negative values are possible there due to the
sampled nature of stime/utime.

Fix suggested by Balbir Singh.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
2007-11-26 21:21:49 +01:00
Steve French 2b83457bde [CIFS] Fix check after use error in ACL code
Spotted by the coverity scanner.

CC: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-25 10:01:00 +00:00
Steve French 058250a0d5 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-11-25 09:53:27 +00:00
Jeff Layton cea218054a [CIFS] Fix potential data corruption when writing out cached dirty pages
Fix RedHat bug 329431

The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.

Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.

The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.

CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.

This patch does the following:

1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.

2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.

3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.

4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.

Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.

Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-20 23:19:03 +00:00
Petr Tesarik 2a97468024 [CIFS] Fix spurious reconnect on 2nd peek from read of SMB length
When retrying kernel_recvmsg() because of a short read, check returned
length against the remaining length, not against total length. This
avoids unneeded session reconnects which would otherwise occur when
kernel_recvmsg() finally returns zero when asked to read zero bytes.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-20 02:24:08 +00:00
Neil Brown 4c1fe2f78a kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
Hi Trond,

I have discovered that the BUG_ON in nfs_follow_mountpoint:

	BUG_ON(IS_ROOT(dentry));

can be triggered by a misbehaving server.

What happens is the client does a lookup and discoveres that the named
directory has a different fsid, so it initiates a mount.
It then performs a GETATTR on the mounted directory and gets a
different fsid again (due to a bug in the NFS server).
This causes nfs_follow_mountpoint to be called on the newly mounted
root, which triggers the BUG_ON.

To duplicate this, have a directory which contains some mountpoints,
and export that directory with the "crossmnt" flag using nfs-utils
1.1.1 (or 1.1.0 I think)

The GETATTR on the root of the mounted filesystem will return the
information for the top exportpoint, while a lookup will return the
correct information.  This difference causes the NFS client to BUG.

I think the best way to fix this is to trap this possibility early, so
just before completing the mount in the NFS client, check that it isn't
going to use nfs_mountpoint_inode_operations.
As long as i_op will never change once set (is that true?), this
should be adequately safe.

The following patch shows a possible approach, and it works for me.
i.e. when the NFS server is misbehaving, I get ESTALE on those
mountpoints, while when the NFS server is working correctly, I get
correct behaviour on the client.

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:48 -05:00
Trond Myklebust b09b9417d0 NFS: Fix the ustat() regression
Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a
dummy inode. This breaks ustat(), which actually uses sb->s_root in a
vfstat() call.

Fix this by making the s_root a dummy alias to the directory inode that was
used when creating the superblock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:44 -05:00
Steve French f7a44eadd5 [CIFS] remove build warning
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-17 00:01:51 +00:00
Steve French 2442421b17 [CIFS] Have CIFS_SessSetup build correct SPNEGO SessionSetup request
Have CIFS_SessSetup call cifs_get_spnego_key when Kerberos is
negotiated. Use the info in the key payload to build a session
setup request packet. Also clean up how the request buffer in
the function is freed on error.

With appropriate user space helper (in samba/source/client). Kerberos
support (secure session establishment can be done now via Kerberos,
previously users would have to use NTLMv2 instead for more secure
session setup).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 23:37:35 +00:00
Steve French 8840dee9dc [CIFS] minor checkpatch cleanup
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 23:05:52 +00:00
Jeff Layton d6c2e4d02b [CIFS] have cifs_get_spnego_key get the hostname from TCP_Server_Info
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 22:23:17 +00:00
Jeff Layton c359cf3c61 [CIFS] add hostname field to TCP_Server_Info struct
...and populate it with the hostname portion of the UNC string.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 22:22:06 +00:00
Jeff Layton 70fe7dc055 [CIFS] clean up error handling in cifs_mount
Move all of the kfree's sprinkled in the middle of the function to the
end, and have the code set rc and just goto there on error. Also zero
out the password string before freeing it. Looks like this should also
fix a potential memory leak of the prepath string if an error occurs
near the end of the function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 22:21:07 +00:00
Steve French 68bf728a22 [CIFS] add ver= prefix to upcall format version
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Igor Mammedov <niallan@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16 18:32:52 +00:00
Jan Kara 7c06a8dc64 Fix 64KB blocksize in ext3 directories
With 64KB blocksize, a directory entry can have size 64KB which does not
fit into 16 bits we have for entry lenght.  So we store 0xffff instead and
convert value when read from / written to disk.  The patch also converts
some places to use ext3_next_entry() when we are changing them anyway.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:43 -08:00
Jeff Layton dbaf4c024a smbfs: fix debug builds
Fix some warnings with SMBFS_DEBUG_* builds.  This patch makes it so that
builds with -Werror don't fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:43 -08:00
Arjan van de Ven cb51f973bc mark sys_open/sys_read exports unused
sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers.  Since 2.0 or so this was deprecated behavior, but
several drivers still were using this.  Since a few years we have a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time....  however with commit c2b1239a9f the
last user is now gone.

This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table....  which then can be tampered with from
other threads for example.  For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.

The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:42 -08:00
Eric W. Biederman 9fcc2d15b1 proc: simplify and correct proc_flush_task
Currently we special case when we have only the initial pid namespace.
Unfortunately in doing so the copied case for the other namespaces was
broken so we don't properly flush the thread directories :(

So this patch removes the unnecessary special case (removing a usage of
proc_mnt) and corrects the flushing of the thread directories.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:42 -08:00
Adrian Bunk 8744969a81 fuse_file_alloc(): fix NULL dereferences
Fix obvious NULL dereferences spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:42 -08:00
Fengguang Wu c06a018fa5 reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file
This is not a new problem in 2.6.23-git17.  2.6.22/2.6.23 is buggy in the
same way.

Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands.  Only `umount' can do the trick.

The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
	 [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
	 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
	 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
	 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
	 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
	 [<ffffffff802a187c>] __fput+0xcc/0x1b0
	 [<ffffffff802a1ba6>] fput+0x16/0x20
	 [<ffffffff8029e676>] filp_close+0x56/0x90
	 [<ffffffff8029fe0d>] sys_close+0xad/0x110
	 [<ffffffff8020c41e>] system_call+0x7e/0x83

Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.

=== for the patient ===
Here are more detailed demonstrations of the problem.

1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
   and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___UD__Bd_      2
[T2] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___U___Bd_      2

2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# echo hi > /tmp/hi
[T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
[T2] hi
[T3] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /proc/4397# cd /proc/`pidof cp`
[T1] root /proc/4713# cat io
     rchar: 8396
     wchar: 3
     syscr: 20
     syscw: 1
     read_bytes: 0
     write_bytes: 20480
     cancelled_write_bytes: 4096
[T2] root /proc/4713# cat io
     rchar: 8399
     wchar: 6
     syscr: 21
     syscw: 2
     read_bytes: 0
     write_bytes: 24576
     cancelled_write_bytes: 4096

//Question: the 'write_bytes' is a bit more than expected ;-)

Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Reviewed-by: Chris Mason <chris.mason@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:41 -08:00
Dmitri Vorobiev f433dc5634 Fixes to the BFS filesystem driver
I found a few bugs in the BFS driver.  Detailed description of the bugs as
well as the steps to reproduce the errors are given in the kernel bugzilla.
 Please follow these links for more information:

http://bugzilla.kernel.org/show_bug.cgi?id=9363
http://bugzilla.kernel.org/show_bug.cgi?id=9364
http://bugzilla.kernel.org/show_bug.cgi?id=9365
http://bugzilla.kernel.org/show_bug.cgi?id=9366

This patch fixes the bugs described above.  Besides, the patch introduces
coding style changes to make the BFS driver conform to the requirements
specified for Linux kernel code.  Finally, I made a few cosmetic changes
such as removal of trivial debug output.

Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
in-core superblock structure.  These fields are initialized but never
actually used.

If you are wondering why I need BFS, here is the answer: I am using this
driver in the context of Linux kernel classes I am teaching in the Moscow
State University and in the International Institute of Information
Technology in Pune, India.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Cc: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:40 -08:00
Adam Litke 9a119c056d hugetlb: allow bulk updating in hugetlb_*_quota()
Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
allow bulk updating of the sbinfo->free_blocks counter.  This will be used by
the next patch in the series.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:40 -08:00
Adam Litke c79fb75e5a hugetlb: fix quota management for private mappings
The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
mappings when that support was added.  Currently, quota is debited at page
instantiation and credited at file truncation.  This approach works correctly
for shared pages but is incomplete for private pages.  In addition to
hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
this function does not respect quotas.

Private huge pages are treated very much like normal, anonymous pages.  They
are not "backed" by the hugetlbfs file and are not stored in the mapping's
radix tree.  This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.

This patch (based on a prototype provided by Ken Chen) moves quota crediting
for all pages into free_huge_page().  page->private is used to store a pointer
to the mapping to which this page belongs.  This is used to credit quota on
the appropriate hugetlbfs instance.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:40 -08:00
Eric W. Biederman e1a1c997af proc: fix proc_kill_inodes to kill dentries on all proc superblocks
It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks.  Handle
that now.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:38 -08:00
Jan Kara e47776a0a4 Forbid user to change file flags on quota files
Forbid user from changing file flags on quota files.  User has no bussiness
in playing with these flags when quota is on.  Furthermore there is a
remote possibility of deadlock due to a lock inversion between quota file's
i_mutex and transaction's start (i_mutex for quota file is locked only when
trasaction is started in quota operations) in ext3 and ext4.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: LIOU Payphone <lioupayphone@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:38 -08:00
Michael Halcrow 8a146a2b0d eCryptfs: cast page->index to loff_t instead of off_t
page->index should be cast to loff_t instead of off_t.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:36 -08:00
Steve French 133672efbc [CIFS] Fix buffer overflow if server sends corrupt response to small
request

In SendReceive() function in transport.c - it memcpy's
message payload into a buffer passed via out_buf param. The function
assumes that all buffers are of size (CIFSMaxBufSize +
MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller
(MAX_CIFS_SMALL_BUFFER_SIZE) buffers.  There are eight callers
(SMB worker functions) which are primarily affected by this change:

TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes,
Lock and PosixLock

CC: Dave Kleikamp <shaggy@austin.ibm.com>
CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-13 22:41:37 +00:00
Linus Torvalds 31083eba37 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
  [NETFILTER]: xt_time should not assume CONFIG_KTIME_SCALAR
  [NET]: Move unneeded data to initdata section.
  [NET]: Cleanup pernet operation without CONFIG_NET_NS
  [TEHUTI]: Fix incorrect usage of strncat in bdx_get_drvinfo()
  [MYRI_SBUS]: Prevent that myri_do_handshake lies about ticks.
  [NETFILTER]: bridge: fix double POSTROUTING hook invocation
  [NETFILTER]: Consolidate nf_sockopt and compat_nf_sockopt
  [NETFILTER]: nf_nat: fix memset error
  [INET]: Use list_head-s in inetpeer.c
  [IPVS]: Remove unused exports.
  [NET]: Unexport sysctl_{r,w}mem_max.
  [TG3]: Update version to 3.86
  [TG3]: MII => TP
  [TG3]: Add A1 revs
  [TG3]: Increase the PCI MRRS
  [TG3]: Prescaler fix
  [TG3]: Limit 5784 / 5764 to MAC LED mode
  [TG3]: Disable GPHY autopowerdown
  [TG3]: CPMU adjustments for loopback tests
  [TG3]: Fix nvram selftest failures
  ...
2007-11-13 09:04:48 -08:00
Linus Torvalds 0b832a4b93 Revert "ext2/ext3/ext4: add block bitmap validation"
This reverts commit 7c9e69faa2, fixing up
conflicts in fs/ext4/balloc.c manually.

The cost of doing the bitmap validation on each lookup - even when the
bitmap is cached - is absolutely prohibitive.  We could, and probably
should, do it only when adding the bitmap to the buffer cache.  However,
right now we are better off just reverting it.

Peter Zijlstra measured the cost of this extra validation as a 85%
decrease in cached iozone, and while I had a patch that took it down to
just 17% by not being _quite_ so stupid in the validation, it was still
a big slowdown that could have been avoided by just doing it right.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-13 08:09:11 -08:00
Denis V. Lunev 022cbae611 [NET]: Move unneeded data to initdata section.
This patch reverts Eric's commit 2b008b0a8e

It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
This is safe after list operations cleanup.

Signed-of-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-13 03:23:50 -08:00
Trond Myklebust 91cf45f02a [NET]: Add the helper kernel_sock_shutdown()
...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers.

Looking at the sock->op->shutdown() handlers, it looks as if all of them
take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the
RCV_SHUTDOWN/SEND_SHUTDOWN arguments.
Add a helper, and then define the SHUT_* enum to ensure that kernel users
of shutdown() don't get confused.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 18:10:39 -08:00
J. Bruce Fields 6fa02839bf nfsd4: recheck for secure ports in fh_verify
As with commit 7fc90ec93a ("knfsd: nfsd:
call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem")
this is a case where we need to redo a security check in fh_verify()
even though the filehandle already has an associated dentry--if the
filehandle was created by fh_compose() in an earlier operation of the
nfsv4 compound, then we may not have done these checks yet.

Without this fix it is possible, for example, to traverse from an export
without the secure ports requirement to one with it in a single
compound, and bypass the secure port check on the new export.

While we're here, fix up some minor style problems and change a printk()
to a dprintk(), to make it harder for random unprivileged users to spam
the logs.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-12 14:28:08 -08:00
J. Bruce Fields ac8587dcb5 knfsd: fix spurious EINVAL errors on first access of new filesystem
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval.  This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.

Thanks to Roland <devzero@web.de> for bug report and data collection.

Cc: Roland <devzero@web.de>
Acked-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-12 14:28:08 -08:00
Linus Torvalds 46015977e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
  [CIFS] fix oops on second mount to same server when null auth is used
  [CIFS] Fix stale mode after readdir when cifsacl specified
  [CIFS] add mode to acl conversion helper function
  [CIFS] Fix incorrect mode when ACL had deny access control entries
  [CIFS] Add uid to key description so krb can handle user mounts
  [CIFS] Fix walking out end of cifs dacl
  [CIFS] Add upcall files for cifs to use spnego/kerberos
  [CIFS] add OIDs for KRB5 and MSKRB5 to ASN1 parsing routines
  [CIFS] Register and unregister cifs_spnego_key_type on module init/exit
  [CIFS] implement upcalls for SPNEGO blob via keyctl API
  [CIFS] allow cifs_calc_signature2 to deal with a zero length iovec
  [CIFS] If no Access Control Entries, set mode perm bits to zero
  [CIFS] when mount helper missing fix slash wrong direction in share
  [CIFS] Don't request too much permission when reading an ACL
  [CIFS] enable get mode from ACL when cifsacl mount option specified
  [CIFS] ACL support part 8
  [CIFS] acl support part 7
  [CIFS] acl support part 6
  [CIFS] acl support part 6
  [CIFS] remove unused funtion compile warning when experimental off
  ...
2007-11-12 11:11:39 -08:00
Roland McGrath 00ec99da43 core dump: remain dumpable
The coredump code always calls set_dumpable(0) when it starts (even
if RLIMIT_CORE prevents any core from being dumped).  The effect of
this (via task_dumpable) is to make /proc/pid/* files owned by root
instead of the user, so the user can no longer examine his own
process--in a case where there was never any privileged data to
protect.  This affects e.g. auxv, environ, fd; in Fedora (execshield)
kernels, also maps.  In practice, you can only notice this when a
debugger has requested PTRACE_EVENT_EXIT tracing.

set_dumpable was only used in do_coredump for synchronization and not
intended for any security purpose.  (It doesn't secure anything that wasn't
already unsecured when a process dies by SIGTERM instead of SIGQUIT.)

This changes do_coredump to check the core_waiters count as the means of
synchronization, which is sufficient.  Now we leave the "dumpable" bits alone.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-12 10:32:29 -08:00
Jeff Layton 9b8f5f5737 [CIFS] fix oops on second mount to same server when null auth is used
When a share is mounted using no username, cifs_mount sets
volume_info.username as a NULL pointer, and the sesInfo userName as an
empty string. The volume_info.username is passed to a couple of other
functions to see if there is an existing unc or tcp connection that can
be used. These functions assume that the username will be a valid
string that can be passed to strncmp. If the pointer is NULL, then the
kernel will oops if there's an existing session to which the string
can be compared.

This patch changes cifs_mount to set volume_info.username to an empty
string in this situation, which prevents the oops and should make it
so that the comparison to other null auth sessions match.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-09 23:25:04 +00:00
Linus Torvalds 4c31c30302 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Add UNPLUG traces to all appropriate places
  block: fix requeue handling in blk_queue_invalidate_tags()
  mmc: Fix sg helper copy-and-paste error
  pktcdvd: fix BUG caused by sysfs module reference semantics change
  ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
  cfq_idle_class_timer: add paranoid checks for jiffies overflow
  cfq: fix IOPRIO_CLASS_IDLE delays
  cfq: fix IOPRIO_CLASS_IDLE accounting
2007-11-09 15:17:49 -08:00
Linus Torvalds e36aeee65d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: fix rename vs unlink race
  [PATCH] Fix possibly too long write in o2hb_setup_one_bio()
  ocfs2: fix write() performance regression
  ocfs2: Commit journal on sync writes
  ocfs2: Re-order iput in ocfs2_drop_dentry_lock
  ocfs2: Create locks at initially requested level
  [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}
  [2.6 patch] make ocfs2_find_entry_el() static
2007-11-09 15:11:58 -08:00
Steve French a6f8de3d9b [CIFS] Fix stale mode after readdir when cifsacl specified
When mounted with cifsacl mount option, readdir can not
instantiate the inode with the estimated mode based on the ACL
for each file since we have not queried for the ACL for
each of these files yet.  So set the refresh time to zero
for these inodes so that the next stat will cause the client
to go to the server for the ACL info so we can build the estimated
mode (this means we also will issue an extra QueryPathInfo if
the stat happens within 1 second, but this is trivial compared to
the time required to open/getacl/close for each).

ls -l is slower when cifsacl mount option is specified, but
displays correct mode information.

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-08 23:10:32 +00:00
Steve French ce06c9f025 [CIFS] add mode to acl conversion helper function
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-08 21:12:01 +00:00
Steve French 15b0395911 [CIFS] Fix incorrect mode when ACL had deny access control entries
When mounted with the cifsacl mount option, we were
treating any deny ACEs found like allow ACEs and it turns out for
SFU and SUA Windows set these type of access control entries often.
The order of ACEs is important too.  The canonical order that most
ACL tools and Windows explorer consruct ACLs with is to begin with
DENY entries then follow with ALLOW, otherwise an allow entry
could be encountered first, making the subsequent deny entry like "dead
code which would be superflous since Windows stops when a match is
made for the operation you are trying to perform for your user

We start with no permissions in the mode and build up as we find
permissions (ie allow ACEs).  This fixes deny ACEs so they affect
the mask used to set the subsequent allow ACEs.

Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
CC: Alexander Bokovoy <ab@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-08 17:57:40 +00:00
Igor Mammedov 9eae8a8903 [CIFS] Add uid to key description so krb can handle user mounts
Adds uid to key description fro supporting user mounts
and minor formating changes

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-08 16:13:31 +00:00
Jens Axboe 8ec680e4c3 ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
Normally io priorities follow the CPU nice, unless a specific scheduling
class has been set. Once that is set, there's no way to reset the
behaviour to 'none' so that it follows CPU nice again.

Currently passing in 0 as the ioprio class/value will return -1/EINVAL,
change that to allow resetting of a set scheduling class.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-07 13:54:07 +01:00