The NFSv4 server was initializing the dp->dl_flock pointer by the
somewhat ridiculous method of a locks_copy_lock callback.
Now that setlease uses the passed-in lock instead of doing a copy,
dl_flock no longer gets set, resulting in the lock leaking on delegation
release, and later possible hangs (among other problems).
So, initialize dl_flock and get rid of the callback.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As suggested by Christoph Hellwig, this moves allocation
of new file locks out of generic_setlease into the
callers, nfs4_open_delegation and fcntl_setlease in order
to allow GFP_KERNEL allocations when lock_flocks has
become a spinlock.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: assume svc_delete_xprt() called only once
svcrpc: never clear XPT_BUSY on dead xprt
nfsd4: fix connection allocation in sequence()
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: move minorversion to client
nfsd4: delay session removal till free_client
nfsd4: separate callback change and callback probe
nfsd4: callback program number is per-session
nfsd4: track backchannel connections
nfsd4: confirm only on succesful create_session
nfsd4: make backchannel sequence number per-session
nfsd4: use client pointer to backchannel session
nfsd4: move callback setup into session init code
nfsd4: don't cache seq_misordered replies
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
SUNRPC: Use conventional switch statement when reclassifying sockets
sunrpc/xprtrdma: clean up workqueue usage
sunrpc: Turn list_for_each-s into the ..._entry-s
...
Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
We're doing an allocation under a spinlock, and ignoring the
possibility of allocation failure.
A better fix wouldn't require an unnecessary allocation in the common
case, but we'll leave that for later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The minorversion seems more a property of the client than the callback
channel.
Some time we should probably also enforce consistent minorversion usage
from the client; for now, this is just a cosmetic change.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Have unhash_client_locked() remove client and associated sessions from
global hashes, but delay further dismantling till free_client().
(After unhash_client_locked(), the only remaining references outside the
destroying thread are from any connections which have xpt_user callbacks
registered.)
This will simplify locking on session destruction.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Only one of the nfsd4_callback_probe callers actually cares about
changing the callback information.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The callback program is allowed to depend on the session which the
callback is going over.
No change in behavior yet, while we still only do callbacks over a
single session for the lifetime of the client.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We need to keep track of which connections are available for use with
the backchannel, which for the forechannel, and which for both.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Following rfc 5661, section 18.36.4: "If the session is not successfully
created, then no changes are made to any client records on the server."
We shouldn't be confirming or incrementing the sequence id in this case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently we don't deal well with a client that has multiple sessions
associated with it (even simultaneously, or serially over the lifetime
of the client).
In particular, we don't attempt to keep the backchannel running after
the original session diseappears.
We will fix that soon.
Once we do that, we need the slot sequence number to be per-session;
otherwise, for example, we cannot correctly handle a case like this:
- All session 1 connections are lost.
- The client creates session 2. We use it for the backchannel
(since it's the only working choice).
- The client gives us a new connection to use with session 1.
- The client destroys session 2.
At this point our only choice is to go back to using session 1. When we
do so we must use the sequence number that is next for session 1. We
therefore need to maintain multiple sequence number streams.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of copying the sessionid, use the new cl_cb_session pointer,
which indicates which session we're using for the backchannel.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The backchannel should be associated with a session, it isn't really
global to the client.
We do, however, want a pointer global to the client which tracks which
session we're currently using for client-based callbacks.
This is a first step in that direction; for now, just reshuffling of
code with no significant change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it and ceph might sleep, but we can flip the definition
to a private spinlock as soon as that's done.
All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.
Based on an earlier patch to use a spinlock from Matthew
Wilcox, who has attempted this a few times before, the
earliest patch from over 10 years ago turned it into
a semaphore, which ended up being slower than the BKL
and was subsequently reverted.
Someone should do some serious performance testing when
this becomes a spinlock, since this has caused problems
before. Using a spinlock should be at least as good
as the BKL in theory, but who knows...
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Commit 78155ed75f "nfsd4: distinguish
expired from stale stateids" attempted to distinguish expired and stale
stateid's using time information that may not have been completely
reliable, so I reverted it.
That was throwing out the baby with the bathwater; we still do want to
return expired, but let's do that using the simpler approach of just
assuming any stateid is expired if it looks like it was given out by the
current server instance, but we can't find it any more.
This may help clients that are recovering from network partitions.
Reported-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
As long as we're not implementing any session security, we should just
automatically add any new connections that come along to the list of
sessions associated with the session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The spec requires us in various places to keep track of the connections
associated with each session.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Changes:
- make sure session memory reservation is released on failure
path.
- use min_t()/min() for more compact code in several places.
- break alloc_init_session into smaller pieces.
- miscellaneous other cleanup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of creating the new rpc client from a regular server thread,
set a flag, kick off a null call, and allow the null call to do the work
of setting up the client on the callback workqueue.
Use a spinlock to ensure the callback work gets a consistent view of the
callback parameters.
This allows, for example, changing the callback from contexts where
sleeping is not allowed. I hope it will also keep the locking simple as
we add more session and trunking features, by serializing most of the
callback-specific work.
This also closes a small race where the the new cb_ident could be used
with an old connection (or vice-versa).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This will eventually allow us, for example, to kick off null callback
from contexts where we can't sleep.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that we have both nfsd4_callback and nfsd4_cb_conn structures, I get
confused if variables of both types are always named cb....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If we already had a RW open for a file, and get a readonly open, we were
piggybacking on the existing RW open. That's inconsistent with the
downgrade logic which blows away the RW open assuming you'll still have
a readonly open.
Also, make sure there is a readonly or writeonly open available for
locking, again to prevent bad behavior in downgrade cases when any RW
open may be lost.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's OK for this function to return without setting filp--we do it in
the special-stateid case.
And there's a legitimate case where we can hit this, since we do permit
reads on write-only stateid's.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd4: fix file open accounting for RDWR opens
nfsd: don't allow setting maxblksize after svc created
nfsd: initialize nfsd versions before creating svc
net: sunrpc: removed duplicated #include
nfsd41: Fix a crash when a callback is retried
nfsd: fix startup/shutdown order bug
nfsd: minor nfsd read api cleanup
gcc-4.6: nfsd: fix initialized but not read warnings
nfsd4: share file descriptors between stateid's
nfsd4: fix openmode checking on IO using lock stateid
nfsd4: miscellaneous process_open2 cleanup
nfsd4: don't pretend to support write delegations
nfsd: bypass readahead cache when have struct file
nfsd: minor nfsd_svc() cleanup
nfsd: move more into nfsd_startup()
nfsd: just keep single lockd reference for nfsd
nfsd: clean up nfsd_create_serv error handling
nfsd: fix error handling in __write_ports_addxprt
nfsd: fix error handling when starting nfsd with rpcbind down
nfsd4: fix v4 state shutdown error paths
...
Commit f9d7562fdb "nfsd4: share file
descriptors between stateid's" didn't correctly account for O_RDWR opens.
Symptoms include leaked files, resulting in failures to unmount and/or
warnings about orphaned inodes on reboot.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.
Also I finished Al's 1e41568d73 ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit, it moved the IMA
code, but left the old path initializer in there.
The rest is just dead code removed I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would be still good.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The vfs doesn't really allow us to "upgrade" a file descriptor from
read-only to read-write, and our attempt to do so in nfs4_upgrade_open
is ugly and incomplete.
Move to a different scheme where we keep multiple opens, shared between
open stateid's, in the nfs4_file struct. Each file will be opened at
most 3 times (for read, write, and read-write), and those opens will be
shared between all clients and openers. On upgrade we will do another
open if necessary instead of attempting to upgrade an existing open.
We keep count of the number of readers and writers so we know when to
close the shared files.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It is legal to perform a write using the lock stateid that was
originally associated with a read lock, or with a file that was
originally opened for read, but has since been upgraded.
So, when checking the openmode, check the mode associated with the
open stateid from which the lock was derived.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The delegation code mostly pretends to support either read or write
delegations. However, correct support for write delegations would
require, for example, breaking of delegations (and/or implementation of
cb_getattr) on stat. Currently all that stops us from handing out
delegations is a subtle reference-counting issue.
Avoid confusion by adding an earlier check that explicitly refuses write
delegations.
For now, though, I'm not going so far as to rip out existing
half-support for write delegations, in case we get around to using that
soon.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If someone tries to shut down the laundry_wq while it isn't up it'll
cause an oops.
This can happen because write_ports can create a nfsd_svc before we
really start the nfs server, and we may fail before the server is ever
started.
Also make sure state is shutdown on error paths in nfsd_svc().
Use a common global nfsd_up flag instead of nfs4_init, and create common
helper functions for nfsd start/shutdown, as there will be other work
that we want done only when we the number of nfsd threads transitions
between zero and nonzero.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the server is out of memory is better for clients to back off and
retry than to just error out.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reportedly causes a lockdep warning on nfsd shutdown. That looks
like a false positive to me, but there's no reason why this needs the
state lock anyway.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSv4.1 adds additional flags to the share_access argument of the open
call. These flags need to be masked out in some of the existing code,
but current code does that inconsistently.
Tested-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit 78155ed75f.
We're depending here on the boot time that we use to generate the
stateid being monotonic, but get_seconds() is not necessarily.
We still depend at least on boot_time being different every time, but
that is a safer bet.
We have a few reports of errors that might be explained by this problem,
though we haven't been able to confirm any of them.
But the minor gain of distinguishing expired from stale errors seems not
worth the risk.
Conflicts:
fs/nfsd/nfs4state.c
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The alloc_init_file() first adds a file to the hash and then
initializes its fi_inode, fi_id and fi_had_conflict.
The uninitialized fi_inode could thus be erroneously checked by
the find_file(), so move the hash insertion lower.
The client_mutex should prevent this race in practice; however, we
eventually hope to make less use of the client_mutex, so the ordering
here is an accident waiting to happen.
I didn't find whether the same can be true for two other fields,
but the common sense tells me it's better to initialize an object
before putting it into a global hash table :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This is a mandatory operation. Also, here (not in open) is where we
should be committing the reboot recovery information.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
nfsd4_set_callback_client must be called under the state lock to atomically
set or unset the callback client and shutting down the previous one.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.
Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Mark the client as expired under the client_lock so it won't be renewed
when an nfsv4.1 session is done, after it was explicitly expired
during processing of the compound.
Do not renew a client mark as expired (in particular, it is not
on the lru list anymore)
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently just initialize the cl_refcount to 1
and decrement in expire_client(), conditionally freeing the
client when the refcount reaches 0.
To be used later by nfsv4.1 compounds to keep the client from
timing out while in use.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Separate out unhashing of the client and session.
To be used later by the laundromat.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
To be used later on to hold a reference count on the client while in use by a
nfsv4.1 compound.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
and grab the client lock once for all the client's sessions.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In preparation to share the lock's scope to both client
and session hash tables.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It's legal to send a DESTROY_SESSION outside any session (as the only
operation in a compound), in which case cstate->session will be NULL;
check for that case.
While we're at it, move these checks into a separate helper function.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In the replay case, the
renew_client(session->se_client);
happens after we've droppped the sessionid_lock, and without holding a
reference on the session; so there's nothing preventing the session
being freed before we get here.
Thanks to Benny Halevy for catching a bug in an earlier version of this
patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Benny Halevy <bhalevy@panasas.com>
Enforce the rules about compound op ordering.
Motivated by implementing RECLAIM_COMPLETE, for which the client is
implicit in the current session, so it is important to ensure a
succesful SEQUENCE proceeds the RECLAIM_COMPLETE.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The rfc allows a client to change the callback parameters, but we didn't
previously implement it.
Teach the callbacks to rerun themselves (by placing themselves on a
workqueue) when they recognize that their rpc task has been killed and
that the callback connection has changed.
Then we can change the callback connection by setting up a new rpc
client, modifying the nfs4 client to point at it, waiting for any work
in progress to complete, and then shutting down the old client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the shutdown sequence guarantees callbacks are shut down before
the client is destroyed, we no longer have a use for cl_count.
We'll probably reinstate a reference count on the client some day, but
it will be held by users other than callbacks.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The NFSv4 server's fl_break callback can sleep (dropping the BKL), in
order to allocate a new rpc task to send a recall to the client.
As far as I can tell this doesn't cause any races in the current code,
but the analysis is difficult. Also, the sleep here may complicate the
move away from the BKL.
So, just schedule some work to do the job for us instead. The work will
later also prove useful for restarting a call after the callback
information is changed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Once we've expired the client, there's no further purpose to the
callbacks; go ahead and shut down the callback client rather than
waiting for the last reference to go.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
The original code here assumed we'd allow the user to change the lease
any time, but only allow the change to take effect on restart. Since
then we modified the code to allow setting the lease on when the server
is down. Update the rest of the code to reflect that fact, clarify
variable names, and add document.
Also, the code insisted that the grace period always be the longer of
the old and new lease periods, but that's overly conservative--as long
as it lasts at least the old lease period, old clients should still know
to recover in time.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of accessing the lease time directly, some users call
nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as
nfs4_lease_time(). Neither layer of indirection serves any purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd4: fix minor memory leak
svcrpc: treat uid's as unsigned
nfsd: ensure sockets are closed on error
Revert "sunrpc: move the close processing after do recvfrom method"
Revert "sunrpc: fix peername failed on closed listener"
sunrpc: remove unnecessary svc_xprt_put
NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
xfs_export_operations.commit_metadata
commit_metadata export operation replacing nfsd_sync_dir
lockd: don't clear sm_monitored on nsm_reboot_lookup
lockd: release reference to nsm_handle in nlm_host_rebooted
nfsd: Use vfs_fsync_range() in nfsd_commit
NFSD: Create PF_INET6 listener in write_ports
SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
NFSD: Support AF_INET6 in svc_addsock() function
SUNRPC: Use rpc_pton() in ip_map_parse()
nfsd: 4.1 has an rfc number
nfsd41: Create the recovery entry for the NFSv4.1 client
nfsd: use vfs_fsync for non-directories
...
We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.org> [2.6.33.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.
This patch should improve the compilation speed of the nfsd module.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
None of this stuff is used outside nfsd, so move it out of the common
linux include directory.
Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We do the same calculation in a couple places; use a helper function,
and add a little documentation, in the hopes of preventing bugs like
that fixed in the last patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Unbalanced calculations on creation and destruction of sessions could
cause our estimate of cache memory used to become negative, sometimes
resulting in spurious SERVERFAULT returns to client CREATE_SESSION
requests.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
ca_maxresponsesize and ca_maxrequest size include the RPC header.
sv_max_mesg is sv_max_payolad plus a page for overhead and is used in
svc_init_buffer to allocate server buffer space for both the request and reply.
Note that this means we can service an RPC compound that requires
ca_maxrequestsize (MAXWRITE) or ca_max_responsesize (MAXREAD) but that we do
not support an RPC compound that requires both ca_maxrequestsize and
ca_maxresponsesize.
Signed-off-by: Andy Adamson <andros@netapp.com>
[bfields@citi.umich.edu: more documentation updates]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
nfsd4: nfsv4 clients should cross mountpoints
nfsd: revise 4.1 status documentation
sunrpc/cache: avoid variable over-loading in cache_defer_req
sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
nfsd: return success for non-NFS4 nfs4_state_start
nfsd41: Refactor create_client()
nfsd41: modify nfsd4.1 backchannel to use new xprt class
nfsd41: Backchannel: Implement cb_recall over NFSv4.1
nfsd41: Backchannel: cb_sequence callback
nfsd41: Backchannel: Setup sequence information
nfsd41: Backchannel: Server backchannel RPC wait queue
nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
nfsd41: Backchannel: callback infrastructure
nfsd4: use common rpc_cred for all callbacks
nfsd4: allow nfs4 state startup to fail
SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
nfsd4: fix null dereference creating nfsv4 callback client
nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
...
Move common initialization of 'struct nfs4_client' inside create_client().
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: Remember the auth flavor to use for callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Follows the model used by the NFS client. Setup the RPC prepare and done
function pointers so that we can populate the sequence information if
minorversion == 1. rpc_run_task() is then invoked directly just like
existing NFS client operations do.
nfsd4_cb_prepare() determines if the sequence information needs to be setup.
If the slot is in use, it adds itself to the wait queue.
nfsd4_cb_done() wakes anyone sleeping on the callback channel wait queue
after our RPC reply has been received. It also sets the task message
result pointer to NULL to clearly indicate we're done using it.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define and initialize cl_cb_seq_nr here]
[pulled out unused defintion of nfsd4_cb_done]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
RPC callback requests will wait on this wait queue if the backchannel
is out of slots.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Keep the xprt used for create_session in cl_cb_xprt.
Mark cl_callback.cb_minorversion = 1 and remember
the client provided cl_callback.cb_prog rpc program number.
Use it to probe the callback path.
Use the client's network address to initialize as the
callback's address as expected by the xprt creation
routines.
Define xdr sizes and code nfs4_cb_compound header to be able
to send a null callback rpc.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[get callback minorversion from fore channel's]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pulled definition for cl_cb_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: set up backchannel's cb_addr]
[moved rpc_create_args init to "nfsd: modify nfsd4.1 backchannel to use new xprt class"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Callbacks are always made using the machine's identity, so we can use a
single auth_generic credential shared among callbacks to all clients and
let the rpc code take care of the rest.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.
Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.
Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.
Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.
The iov_len calculation in nfs4svc_encode_compoundres is now always
correct. Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.
The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().
Move useful nfsd4_cache_entry fields into nfsd4_slot.
Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
nfserr_resource is not a legal error for NFSv4.1. Replace it with
nfserr_serverfault for EXCHANGE_ID and CREATE_SESSION processing.
We will also need to map nfserr_resource to other errors in routines shared
by NFSv4.0 and NFSv4.1
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This fixes a bug in the sequence operation reply.
The sequence operation returns the highest slotid it will accept in the future
in sr_highest_slotid, and the highest slotid it prefers the client to use.
Since we do not re-negotiate the session slot table yet, these should both
always be set to the session ca_maxrequests.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
a forechannel drc request size, clients can tailor a session to usage.
For example, an I/O session (READ/WRITE only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
ca_maxresponses to service a large in-flight data window.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Compounds consisting of only a sequence operation don't need any
additional caching beyond the sequence information we store in the slot
entry. Fix nfsd4_is_solo_sequence to identify this case correctly.
The additional check for a failed sequence in nfsd4_store_cache_entry()
is redundant, since the nfsd4_is_solo_sequence call lower down catches
this case.
The final ce_cachethis set in nfsd4_sequence is also redundant.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Until we work out the state locking so we can use a spin lock to protect
the cl_lru, we need to take the state_lock to renew the client.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Do not renew state on error]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Simplify exit code]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When a SETCLIENTID call comes in, one of the args given is the svc_rqst.
This struct contains an rq_addr field which holds the address that sent
the call. If this is an IPv6 address, then we can use the sin6_scope_id
field in this address to populate the sin6_scope_id field in the
callback address.
AFAICT, the rq_addr.sin6_scope_id is non-zero if and only if the client
mounted the server's link-local address.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The framework to add this is all in place. Now, add the code to allow
support for establishing a callback channel on an IPv6 socket.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
...rather than as a separate address and port fields. This will be
necessary for implementing callbacks over IPv6. Also, convert
gen_callback to use the standard rpcuaddr2sockaddr routine rather than
its own private one.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>