Brian Behlendorf writes:
The root cause of the NFS hang we were observing appears to be a rare
deadlock between the kernel provided usermodehelper API and the linux NFS
client. The deadlock can arise because both of these services use the
generic linux work queues. The usermodehelper API run the specified user
application in the context of the work queue. And NFS submits both cleanup
and reconnect work to the generic work queue for handling. Normally this
is fine but a deadlock can result in the following situation.
- NFS client is in a disconnected state
- [events/0] runs a usermodehelper app with an NFS dependent operation,
this triggers an NFS reconnect.
- NFS reconnect happens to be submitted to [events/0] work queue.
- Deadlock, the [events/0] work queue will never process the
reconnect because it is blocked on the previous NFS dependent
operation which will not complete.`
The solution is simply to run reconnect requests on rpciod.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of taking the mutex every time we just need to increment/decrement
rpciod_users, we can optmise by using atomic_inc_not_zero and
atomic_dec_and_test.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The kref now does most of what cl_count + cl_user used to do. The only
remaining role for cl_count is to tell us if we are in a 'shutdown'
phase. We can provide that information using a single bit field instead
of a full atomic counter.
Also rename rpc_destroy_client() to rpc_close_client(), which reflects
better what its role is these days.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace it with explicit calls to rpc_shutdown_client() or
rpc_destroy_client() (for the case of asynchronous calls).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Its use is at best racy, and there is only one user (lockd), which has
additional locking that makes the whole thing redundant.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also ensure that nfs_inode ncommit and npages are large enough to represent
all possible values for the number of pages.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also get rid of a redundant call to nfs_setattr_update_inode(). The call to
nfs3_proc_setattr() already takes care of that.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The Linux NFS4 client simply skips over the bitmask in an O_EXCL open
call and so it doesn't bother to reset any fields that may be holding
the verifier. This patch has us save the first two words of the bitmask
(which is all the current client has #defines for). The client then
later checks this bitmask and turns on the appropriate flags in the
sattr->ia_verify field for the following SETATTR call.
This patch only currently checks to see if the server used the atime
and mtime slots for the verifier (which is what the Linux server uses
for this). I'm not sure of what other fields the server could
reasonably use, but adding checks for others should be trivial.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most
pagecache pages are allocated using GFP_HIGHUSER, and there's no reason
not to do that in nfs_symlink() as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
We don't need to revalidate the fsid on the root directory. It suffices to
revalidate it on the current directory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_do_close() does not currently have any way to ensure that the user
won't attempt to unmount the partition while the asynchronous RPC call
is completing. This again may cause Oopses in nfs_update_inode().
Add a vfsmount argument to nfs4_close_state to ensure that the partition
remains mounted while we're closing the file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A number of race conditions may currently ensue if the user presses ^C
and then unmounts the partition while an asynchronous open() is in
progress.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since PG_uptodate may now end up getting set during the call to
nfs_wb_page(), we can avoid putting a read request on the wire in those
situations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The write may fail, so we should not mark the page as uptodate until we are
certain that the data has been accepted and written to disk by the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no need to fail the entire O_DIRECT read/write just because
get_user_pages() returned fewer pages than we requested.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add the termios2 structure ready for enabling on most platforms. One or
two like Sparc are plain weird so have been left alone. Most can use the
same structure as ktermios for termios2 (ie the newer ioctl uses the
structure matching the current kernel structure)
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
This makes code about 300 lines smaller:
The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:
2.6.22-rc1-mm1: 0x001794d5
with the previous version: 0x00179505
with this patch: 0x00179135
The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.
This patch:
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a hybrid version of the patch to add the LZO1X compression
algorithm to the kernel. Nitin and myself have merged the best parts of
the various patches to form this version which we're both happy with (and
are jointly signing off).
The performance of this version is equivalent to the original minilzo code
it was based on. Bytecode comparisons have also been made on ARM, i386 and
x86_64 with favourable results.
There are several users of LZO lined up including jffs2, crypto and reiser4
since its much faster than zlib.
Signed-off-by: Nitin Gupta <nitingupta910@gmail.com>
Signed-off-by: Richard Purdie <rpurdie@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: at91_mci: fix hanging and rework to match flowcharts
mmc: at91_mci typo
sdhci: Fix "Unexpected interrupt" handling
mmc: fix silly copy-and-paste error
mmc: move layer init and workqueue to core file
mmc: refactor host class handling
mmc: refactor bus operations
sdhci: add ene controller id
mmc: bounce requests for simple hosts
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (40 commits)
bonding/bond_main.c: make 2 functions static
ps3: gigabit ethernet driver for PS3, take3
[netdrvr] Fix dependencies for ax88796 ne2k clone driver
eHEA: Capability flag for DLPAR support
Remove sk98lin ethernet driver.
sunhme.c:quattro_pci_find() must be __devinit
bonding / ipv6: no addrconf for slaves separately from master
atl1: remove write-only var in tx handler
macmace: use "unsigned long flags;"
Cleanup usbnet_probe() return value handling
netxen: deinline and sparse fix
eeprom_93cx6: shorten pulse timing to match spec (bis)
phylib: Add Marvell 88E1112 phy id
phylib: cleanup marvell.c a bit
AX88796 network driver
IOC3: Switch to pci refcounting safe APIs
e100: Fix Tyan motherboard e100 not receiving IPMI commands
QE Ethernet driver writes to wrong register to mask interrupts
rrunner.c:rr_init() must be __devinit
tokenring/3c359.c:xl_init() must be __devinit
...
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (32 commits)
[libata] sata_mv: print out additional chip info during probe
[libata] Use ATA_UDMAx standard masks when filling driver's udma_mask info
[libata] AHCI: Add support for Marvell AHCI-like chips (initially 6145)
[libata] Clean up driver udma_mask initializers
libata: Support chips with 64K PRD quirk
Add a PCI ID for santa rosa's PATA controller.
sata_sil24: sil24_interrupt() micro-optimisation
Add irq_flags to struct pata_platform_info
sata_promise: cleanups
[libata] pata_ixp4xx: kill unused var
ata_piix: fix pio/mwdma programming
[libata] ahci: minor internal cleanups
[ATA] Add named constant for ATAPI command DEVICE RESET
[libata] sata_sx4, sata_via: minor documentation updates
[libata] ahci: minor internal cleanups
[libata] ahci: Factor out SATA port init into a separate function
[libata] pata_sil680: minor cleanups from benh
[libata] sata_sx4: named constant cleanup
[libata] pata_ixp4xx: convert to new EH
[libata] pdc_adma: Reorder initializers with a couple structs
...
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] vmlogrdr function annotation.
[S390] s390: rename CPU_IDLE to S390_CPU_IDLE
[S390] cio: Remove prototype for non-existing function cmf_reset().
[S390] zcrypt: fix request timeout handling
[S390] system call optimization.
[S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE
[S390] Remove volatile from atomic_t
[S390] Program check in diag 210 under 31 bit
[S390] Bogomips calculation for 64 bit.
[S390] smp: Merge smp_count_cpus() and smp_get_save_areas().
[S390] zcore: Fix __user annotation.
[S390] fixed cdl-format detection.
[S390] sclp: Test facility list before executing a service call.
[S390] sclp: introduce some new interfaces.
[S390] Fixed comment typo.
[S390] vmcp cleanup
* 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block:
pipe: add documentation and comments
pipe: change the ->pin() operation to ->confirm()
Remove remnants of sendfile()
xip sendfile removal
splice: completely document external interface with kerneldoc
sendfile: remove bad_sendfile() from bad_file_ops
shmem: convert to using splice instead of sendfile()
relay: use splice_to_pipe() instead of open-coding the pipe loop
pipe: allow passing around of ops private pointer
splice: divorce the splice structure/function definitions from the pipe header
splice: relay support
sendfile: convert nfsd to splice_direct_to_actor()
sendfile: convert nfs to using splice_read()
loop: convert to using splice_direct_to_actor() instead of sendfile()
splice: add void cookie to the actor data
sendfile: kill generic_file_sendfile()
sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
sys_sendfile: switch to using ->splice_read, if available
vmsplice: add vmsplice-to-user support
splice: abstract out actor data