WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Randy Dunlap	4111d4fde6	sunrpc/rpc_pipe: fix kernel-doc notation Fix kernel-doc notation (& warnings) in sunrpc/rpc_pipe.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-09-23 14:36:38 -04:00
Neil Brown	61d0a8e6a8	NFS/RPC: fix problems with reestablish_timeout and related code. [[resending with correct cc: - "vfs.kernel.org" just isn't right!]] xprt->reestablish_timeout is used to cause TCP connection attempts to back off if the connection fails so as not to hammer the network, but to still allow immediate connections when there is no reason to believe there is a problem. It is not used for the first connection (when transport->sock is NULL) but only on reconnects. It is currently set: a/ to 0 when xs_tcp_state_change finds a state of TCP_FIN_WAIT1 on the assumption that the client has closed the connection so the reconnect should be immediate when needed. b/ to at least XS_TCP_INIT_REEST_TO when xs_tcp_state_change detects TCP_CLOSING or TCP_CLOSE_WAIT on the assumption that the server closed the connection so a small delay at least is required. c/ as above when xs_tcp_state_change detects TCP_SYN_SENT, so that it is never 0 while a connection has been attempted, else the doubling will produce 0 and there will be no backoff. d/ to double is value (up to a limit) when delaying a connection, thus providing exponential backoff and e/ to XS_TCP_INIT_REEST_TO in xs_setup_tcp as simple initialisation. So you can see it is highly dependant on xs_tcp_state_change being called as expected. However experimental evidence shows that xs_tcp_state_change does not see all state changes. ("rpcdebug -m rpc trans" can help show what actually happens). Results show: TCP_ESTABLISHED is reported when a connection is made. TCP_SYN_SENT is never reported, so rule 'c' above is never effective. When the server closes the connection, TCP_CLOSE_WAIT and TCP_LAST_ACK might be reported, and TCP_CLOSE is always reported. This rule 'b' above will sometimes be effective, but not reliably. When the client closes the connection, it used to result in TCP_FIN_WAIT1, TCP_FIN_WAIT2, TCP_CLOSE. However since commit `f75e674` (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect) we don't see any events on client-close. I think this is because xs_restore_old_callbacks is called to disconnect xs_tcp_state_change before the socket is closed. In any case, rule 'a' no longer applies. So all that is left are rule d, which successfully doubles the timeout which is never rest, and rule e which initialises the timeout. Even if the rules worked as expected, there would be a problem because a successful connection does not reset the timeout, so a sequence of events where the server closes the connection (e.g. during failover testing) will cause longer and longer timeouts with no good reason. This patch: - sets reestablish_timeout to 0 in xs_close thus effecting rule 'a' - sets it to 0 in xs_tcp_data_ready to ensure that a successful connection resets the timeout - sets it to at least XS_TCP_INIT_REEST_TO after it is doubled, thus effecting rule c I have not reimplemented rule b and the new version of rule c seems sufficient. I suspect other code in xs_tcp_data_ready needs to be revised as well. For example I don't think connect_cookie is being incremented as often as it should be. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-09-23 14:36:37 -04:00
Linus Torvalds	a87e84b5cd	Merge branch 'for-2.6.32' of git://linux-nfs.org/~bfields/linux * 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits) nfsd4: nfsv4 clients should cross mountpoints nfsd: revise 4.1 status documentation sunrpc/cache: avoid variable over-loading in cache_defer_req sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req nfsd: return success for non-NFS4 nfs4_state_start nfsd41: Refactor create_client() nfsd41: modify nfsd4.1 backchannel to use new xprt class nfsd41: Backchannel: Implement cb_recall over NFSv4.1 nfsd41: Backchannel: cb_sequence callback nfsd41: Backchannel: Setup sequence information nfsd41: Backchannel: Server backchannel RPC wait queue nfsd41: Backchannel: Add sequence arguments to callback RPC arguments nfsd41: Backchannel: callback infrastructure nfsd4: use common rpc_cred for all callbacks nfsd4: allow nfs4 state startup to fail SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous nfsd4: fix null dereference creating nfsv4 callback client nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked. ...	2009-09-22 07:54:33 -07:00
Alexey Dobriyan	b87221de6a	const: mark remaining super_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
NeilBrown	cd68c374ea	sunrpc/cache: avoid variable over-loading in cache_defer_req In cache_defer_req, 'dreq' is used for two significantly different values that happen to be of the same type. This is both confusing, and makes it hard to extend the range of one of the values as we will in the next patch. So introduce 'discard' to take one of the values. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-18 17:01:12 -04:00
NeilBrown	67e7328f15	sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req Using list_del_init is generally safer than list_del, and it will allow us, in a subsequent patch, to see if an entry has already been processed or not. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-18 11:47:49 -04:00
Trond Myklebust	5d351754fc	SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous Otherwise, the upcall is going to be synchronous, which may not be what the caller wants... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:33 -04:00
Alexandros Batsakis	f300baba5a	nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel [sunrpc: change idle timeout value for the backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-13 15:46:15 -04:00
NeilBrown	908329f2c0	sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked. The extra call to cache_revisit_request in cache_fresh_unlocked is not needed, as should have been fairly clear at the time of commit `4013edea9a` If there are requests to be revisited, then we can be sure that CACHE_PENDING is set, so the second call is sufficient. So remove the first call. Then remove the 'new' parameter, then remove the return value for cache_fresh_locked which is only used to provide the value for 'new'. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-11 17:08:54 -04:00
NeilBrown	9e4c6379a6	sunrpc/cache: change cache_defer_req to return -ve error, not boolean. As "cache_defer_req" does not sound like a predicate, having it return a boolean value can be confusing. It is more consistent to return 0 for success and negative for error. Exactly what error code to return is not important as we don't differentiate between reasons why the request wasn't deferred, we only care about whether it was deferred or not. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-11 17:03:27 -04:00
Rahul Iyer	4cfc7e6019	nfsd41: sunrpc: Added rpc server-side backchannel handling When the call direction is a reply, copy the xid and call direction into the req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns rpc_garbage. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [get rid of CONFIG_NFSD_V4_1] [sunrpc: refactoring of svc_tcp_recvfrom] [nfsd41: sunrpc: create common send routine for the fore and the back channels] [nfsd41: sunrpc: Use free_page() to free server backchannel pages] [nfsd41: sunrpc: Document server backchannel locking] [nfsd41: sunrpc: remove bc_connect_worker()] [nfsd41: sunrpc: Define xprt_server_backchannel()[ [nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions] [nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()] [nfsd41: sunrpc: Don't auto close the server backchannel connection] [nfsd41: sunrpc: Remove unused functions] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: change bc_sock to bc_xprt] [nfsd41: sunrpc: move struct rpc_buffer def into a common header file] [nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex] [removed cosmetic changes] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: add new xprt class for nfsv4.1 backchannel] [sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [reverted more cosmetic leftovers] [got rid of xprt_server_backchannel] [separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Cc: Trond Myklebust <trond.myklebust@netapp.com> [sunrpc: change idle timeout value for the backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Trond Myklebust <trond.myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-11 15:04:16 -04:00
Trond Myklebust	ab3bbaa8b2	Merge branch 'nfs-for-2.6.32'	2009-09-11 14:59:37 -04:00
Benny Halevy	6951867b99	nfsd41: sunrpc: move struct rpc_buffer def into sunrpc.h Move struct rpc_buffer's definition into a sunrpc.h, a common, internal header file, in preparation for supporting the nfsv4.1 backchannel. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: sunrpc: #include <linux/net.h> from sunrpc.h] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-10 12:09:06 -04:00
Trond Myklebust	2574cc9f4f	SUNRPC: Fix rpc_task_force_reencode This patch fixes the bug that was reported in http://bugzilla.kernel.org/show_bug.cgi?id=14053 If we're in the case where we need to force a reencode and then resend of the RPC request, due to xprt_transmit failing with a networking error, then we _must_ retransmit the entire request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-28 19:35:56 -10:00
Wei Yongjun	b0401d7253	sunrpc: move the close processing after do recvfrom method sunrpc: "Move close processing to a single place" (`d7979ae4a0`) moved the close processing before the recvfrom method. This may cause the close processing never to execute. So this patch moves it to the right place. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-27 17:18:38 -04:00
Wei Yongjun	eac81736e6	sunrpc: reply AUTH_BADCRED to RPCSEC_GSS with unknown service When an RPC message is received with RPCSEC_GSS with an unknown service (not RPC_GSS_SVC_NONE, RPC_GSS_SVC_INTEGRITY, or RPC_GSS_SVC_PRIVACY), svcauth_gss_accept() returns AUTH_BADCRED, but svcauth_gss_release() subsequently drops the response entirely, discarding the error. Fix that so the AUTH_BADCRED error is returned to the client. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-25 17:39:43 -04:00
Ryusei Yamaguchi	ed2d8aed52	knfsd: Replace lock_kernel with a mutex in nfsd pool stats. lock_kernel() in knfsd was replaced with a mutex. The later commit `03cf6c9f49` ("knfsd: add file to export stats about nfsd pools") did not follow that change. This patch fixes the issue. Also move the get and put of nfsd_serv to the open and close methods (instead of start and stop methods) to allow atomic check and increment of reference count in the open method (where we can still return an error). Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Greg Banks <gnb@fmeh.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-25 12:39:37 -04:00
Alexandros Batsakis	8f55f3c0a0	nfsd41: sunrpc: svc_tcp_recv_record() Factor functionality out of svc_tcp_recvfrom() to simplify routine Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-24 18:13:10 -04:00
J. Bruce Fields	e9dc122166	Merge branch 'nfs-for-2.6.32' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 into for-2.6.32-incoming Conflicts: net/sunrpc/cache.c	2009-08-21 11:27:29 -04:00
Trond Myklebust	405d8f8b1d	SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc... We can oops if rpc_pipefs isn't properly initialised before we start to set up objects that depend upon it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-21 08:17:56 -04:00
Trond Myklebust	f7e86ab92f	SUNRPC: cache must take a reference to the cache detail's module on open() Otherwise we Oops if the module containing the cache detail is removed before all cache readers have closed the file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 18:22:16 -04:00
Trond Myklebust	e571cbf1a4	NFS: Add a dns resolver for use with NFSv4 referrals and migration The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client from one server to another in order to support filesystem migration and replication. For full protocol support, we need to add the ability to convert a DNS host name into an IP address that we can feed to the RPC client. We'll reuse the sunrpc cache, now that it has been converted to work with rpc_pipefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 18:22:15 -04:00
Trond Myklebust	96c61cbd0f	SUNRPC: Fix a typo in cache_pipefs_files We want the channel to be a regular file, so that we don't need to supply rpc_pipe_ops. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 18:22:15 -04:00
Trond Myklebust	6a396f67d2	Merge branch 'nfsv4_xdr_cleanups-for-2.6.32' into nfs-for-2.6.32 Conflicts: fs/nfs/nfs4xdr.c	2009-08-19 18:21:52 -04:00
Benny Halevy	98866b5abe	sunrpc: ntoh -> be*_to_cpu ntohl is already defined as be32_to_cpu. be64_to_cpu has architecture specific optimized implementations. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:12:52 -04:00
Benny Halevy	9f162d2a81	sunrpc: hton -> cpu_to_be* htonl is already defined as cpu_to_be32. cpu_to_be64 has architecture specific optimized implementations. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:12:38 -04:00
Trond Myklebust	f884dcaead	Merge branch 'sunrpc_cache-for-2.6.32' into nfs-for-2.6.32	2009-08-10 17:45:58 -04:00
Trond Myklebust	976a6f921c	Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32	2009-08-10 17:45:50 -04:00
Trond Myklebust	8854e82d9a	SUNRPC: Add an rpc_pipefs front end for the sunrpc cache code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:30 -04:00
Trond Myklebust	173912a6ad	SUNRPC: Move procfs-specific stuff out of the generic sunrpc cache code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:29 -04:00
Trond Myklebust	bc74b4f5e6	SUNRPC: Allow the cache_detail to specify alternative upcall mechanisms For events that are rare, such as referral DNS lookups, it makes limited sense to have a daemon constantly listening for upcalls on a channel. An alternative in those cases might simply be to run the app that fills the cache using call_usermodehelper_exec() and friends. The following patch allows the cache_detail to specify alternative upcall mechanisms for these particular cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:29 -04:00
Trond Myklebust	da77005f0d	SUNRPC: Remove the global temporary write buffer in net/sunrpc/cache.c While we do want to protect against multiple concurrent readers and writers on each upcall/downcall pipe, we don't want to limit concurrent reading and writing to separate caches. This patch therefore replaces the static buffer 'write_buf', which can only be used by one writer at a time, with use of the page cache as the temporary buffer for downcalls. We still fall back to using the the old global buffer if the downcall is larger than PAGE_CACHE_SIZE, since this is apparently needed by the SPKM security context initialisation. It then replaces the use of the global 'queue_io_mutex' with the inode->i_mutex in cache_read() and cache_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:28 -04:00
Trond Myklebust	5b7a1b9f92	SUNRPC: Ensure we initialise the cache_detail before creating procfs files Also ensure that we destroy those files before we destroy the cache_detail. Otherwise, user processes might attempt to write into uninitialised caches. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:27 -04:00
Trond Myklebust	2da8ca26c6	NFSD: Clean up the idmapper warning... What part of 'internal use' is so hard to understand? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:26 -04:00
Trond Myklebust	e57aed77ad	SUNRPC: One more clean up for rpc_create_client_dir() In order to allow rpc_pipefs to create directories with different types of subtrees, it is useful to allow the caller to customise the subtree filling process. In order to do so, we separate out the parts which are specific to making an RPC client directory, and put them in a separate helper, then we convert the process of filling the directory contents into a callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:26 -04:00
Trond Myklebust	23ac658170	SUNRPC: clean up rpc_setup_pipedir() There is still a little wart or two there: Since we've already got a vfsmount, we might as well pass that in to rpc_create_client_dir. Another point is that if we open code __rpc_lookup_path() here, then we can avoid looking up the entire parent directory path over and over again: it doesn't change. Also get rid of rpc_clnt->cl_pathname, since it has no users... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:25 -04:00
Trond Myklebust	7d217caca5	SUNRPC: Replace rpc_client->cl_dentry and cl_mnt, with a cl_path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:24 -04:00
Trond Myklebust	7d59d1e865	SUNRPC: Clean up rpc_create_client_dir() Factor out the code that does lookups from the code that actually creates the directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:23 -04:00
Trond Myklebust	458adb8ba9	SUNRPC: Rename rpc_mkdir to rpc_create_client_dir() This reflects the fact that rpc_mkdir() as it stands today, can only create a RPC client type directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:22 -04:00
Trond Myklebust	bb1567491e	SUNRPC: rpc_pipefs cleanup Move the files[] array closer to rpc_fill_super() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:21 -04:00
Trond Myklebust	ac6fecee31	SUNRPC: Clean up rpc_populate/depopulate Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:20 -04:00
Trond Myklebust	cfeaa4a3ca	SUNRPC: Clean up rpc_lookup_create Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:20 -04:00
Trond Myklebust	810d90bc2a	SUNRPC: Clean up rpc_unlink() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:18 -04:00
Trond Myklebust	7589806e96	SUNRPC: Clean up file creation code in rpc_pipefs Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:17 -04:00
Trond Myklebust	b5bb61da2e	SUNRPC: Clean up rpc_pipefs lookup code... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:17 -04:00
Trond Myklebust	7364af6a2d	SUNRPC: Allow rpc_pipefs_ops to have null values for upcall and downcall Also ensure that we use the umode_t type when appropriate... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:16 -04:00
Trond Myklebust	b693ba4a33	SUNRPC: Constify rpc_pipe_ops... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:15 -04:00
Chuck Lever	c05988cdb0	SUNRPC: Add documenting comments in net/sunrpc/timer.c Clean up: provide documenting comments for the functions in net/sunrpc/timer.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:47 -04:00
Chuck Lever	9dc3b095b7	SUNRPC: Update xprt address strings after an rpcbind completes After a bind completes, update the transport instance's address strings so debugging messages display the current port the transport is connected to. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:46 -04:00
Chuck Lever	c740eff84b	SUNRPC: Kill RPC_DISPLAY_ALL At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL. Currently there are no uses of RPC_DISPLAY_ALL outside the transport modules themselves, so we can safely get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:46 -04:00
Chuck Lever	fbfffbd5e7	SUNRPC: Rename sock_xprt.addr as sock_xprt.srcaddr Clean up: Give the "addr" and "port" field less ambiguous names. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:46 -04:00
Chuck Lever	f8b761eff1	SUNRPC: Eliminate PROC macro from rpcb_clnt Clean up: Replace PROC macro with open coded C99 structure initializers to improve readability. The rpcbind v4 GETVERSADDR procedure is never sent by the current implementation, so it is not copied to the new structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:44 -04:00
Chuck Lever	0e47f0d665	SUNRPC: Clean up: Remove unused XDR decoder functions from rpcb_clnt.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:43 -04:00
Chuck Lever	c0c077df00	SUNRPC: Introduce new xdr_stream-based decoders to rpcb_clnt.c Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new implementation also checks that the incoming port number is reasonable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:43 -04:00
Chuck Lever	7ed0ff983c	SUNRPC: Introduce xdr_stream-based decoders for RPCB_UNSET Replace the open-coded decode logic for rpcbind UNSET results with an xdr_stream-based implementation, similar to what NFSv4 uses, to protect against buffer overflows. The new function is unused for the moment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:42 -04:00
Chuck Lever	0d36c4f757	SUNRPC: Clean up: Remove unused XDR encoder functions from rpcb_clnt.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:41 -04:00
Chuck Lever	6f2c2db7a4	SUNRPC: Introduce new xdr_stream-based encoders to rpcb_clnt.c Replace the open-coded encode logic for rpcbind arguments with an xdr_stream-based implementation, similar to what NFSv4 uses, to better protect against buffer overflows. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:40 -04:00
Chuck Lever	c877b849d3	SUNRPC: Use rpc_ntop() for constructing transport address strings Clean up: In addition to using the new generic rpc_ntop() and rpc_get_port() functions, have the RPC client compute the presentation address buffer sizes dynamically using kstrdup(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:36 -04:00
Chuck Lever	ba809130bc	SUNRPC: Remove duplicate universal address generation RPC universal address generation is currently done in several places: rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the redundant cases that convert a socket address to a universal address. The nfs4proc.c case takes a pre-formatted presentation address string, not a socket address, so we'll leave that one. Because the new uaddr constructor uses the recently introduced rpc_ntop(), it now supports proper "::" shorthanding for IPv6 addresses. This allows the kernel to register properly formed universal addresses with the local rpcbind service, in _all_ cases. The kernel can now also send properly formed universal addresses in RPCB_GETADDR requests, and support link-local properly when encoding and decoding IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:35 -04:00
Chuck Lever	a02d692611	SUNRPC: Provide functions for managing universal addresses Introduce a set of functions in the kernel's RPC implementation for converting between a socket address and either a standard presentation address string or an RPC universal address. The universal address functions will be used to encode and decode RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are part of a previous promise to deliver shared functions that can be used by upper-layer protocols to display and manipulate IP addresses. The kernel's current address printf formatters were designed specifically for kernel to user-space APIs that require a particular string format for socket addresses, thus are somewhat limited for the purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does not support short-handing or scope IDs. Also, these printf formatters are unique per address family, so a separate formatter string is required for printing AF_INET and AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:34 -04:00
Chuck Lever	0b10bf5e14	SUNRPC: Move XDR data type size macros Clean up: To make subsequent patches cleaner, move the XDR data type size macros to the top of the file (similar to nfs4xdr.c) first. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:33 -04:00
Trond Myklebust	cbf1107126	SUNRPC: convert some sysctls into module parameters Parameters like the minimum reserved port, and the number of slot entries should really be module parameters rather than sysctls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
NeilBrown	560ab42ef9	sunrpc: fix memory leak in unix_gid cache. When we look up an entry in the uid->gidlist cache, we take a reference to the content but don't drop the reference to the cache entry. So it never gets freed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-04 16:52:44 -04:00
NeilBrown	989a19b9b1	sunrpc/cache: recheck cache validity after cache_defer_req If cache_defer_req did not leave the request on a queue, then it could possibly have waited long enough that the cache became valid. So check the status after the call. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-04 16:21:44 -04:00
NeilBrown	5c4d263903	sunrpc/cache: make sure deferred requests eventually get revisited. While deferred requests normally get revisited quite quickly, it is possible for a request to remain in the deferral queue when the cache item is discarded. We can easily make sure that doesn't happen by calling cache_revisit_request just before the final 'put'. Also there is a small chance that a race would cause one thread to defer a request against a cache item while another thread is failing to queue an upcall for that item. So when the upcall fails, make sure to revisit all deferred requests. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-04 11:03:01 -04:00
NeilBrown	f866a8194f	sunrpc/cache: rename queue_loose to cache_dequeue 'loose' was a mis-spelling of 'lose', and even that wasn't a good word choice. So give this function a more useful name. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-04 09:14:30 -04:00
Chuck Lever	7702ce40bc	SUNRPC: handle IPv6 PKTINFO when extracting destination address PKTINFO is needed to scrape the caller's IP address off the socket so RPC datagram replies are routed correctly. Fill in missing pieces in the kernel RPC server's UDP receive path to request IPv6 PKTINFO and correctly parse the IPv6 cmsg header. Without this patch, kernel RPC services drop all incoming requests on UDP on IPv6. Related commit: `7a37f5787e` Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-14 17:39:46 -04:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Wei Yongjun	846d8e7cc8	svcrdma: fix error handling of rdma_alloc_frmr() ib_alloc_fast_reg_mr() and ib_alloc_fast_reg_page_list() returns ERR_PTR() and not NULL. Compile tested only. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-03 10:14:59 -04:00
Jesper Dangaard Brouer	75de874f5c	sunrpc: Use rcu_barrier() on unload. The sunrpc module uses rcu_call() thus it should use rcu_barrier() on module unload. Have not verified that the possibility for new call_rcu() callbacks has been disabled. As a hint for checking, the functions calling call_rcu() (unx_destroy_cred and generic_destroy_cred) are registered as crdestroy function pointer in struct rpc_credops. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-26 13:51:34 -07:00
Linus Torvalds	7e0338c0de	Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...	2009-06-22 12:55:50 -07:00
Ricardo Labiaga	e9f0298558	nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh() xprt_alloc_bc_request() is always called in soft interrupt context. Grab the spin_lock instead of the bottom half spin_lock. Softirqs do not preempt other softirqs running on the same processor, so there is no need to disable bottom halves. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-20 14:55:39 -04:00
Trond Myklebust	47fcb03fef	SUNRPC: Fix the TCP server's send buffer accounting Currently, the sunrpc server is refusing to allow us to process new RPC calls if the TCP send buffer is 2/3 full, even if we do actually have enough free space to guarantee that we can send another request. The following patch fixes svc_tcp_has_wspace() so that we only stop processing requests if we know that the socket buffer cannot possibly fit another reply. It also fixes the tcp write_space() callback so that we only clear the SOCK_NOSPACE flag when the TCP send buffer is less than 2/3 full. This should ensure that the send window will grow as per the standard TCP socket code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 19:58:51 -07:00
Trond Myklebust	1f84603c09	Merge branch 'devel-for-2.6.31' into for-2.6.31 Conflicts: fs/nfs/client.c fs/nfs/super.c	2009-06-18 18:13:44 -07:00
Trond Myklebust	301933a0ac	Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31	2009-06-17 17:59:58 -07:00
Ricardo Labiaga	dd2b63d049	nfs41: Rename rq_received to rq_reply_bytes_recvd The 'rq_received' member of 'struct rpc_rqst' is used to track when we have received a reply to our request. With v4.1, the backchannel can now accept callback requests over the existing connection. Rename this field to make it clear that it is only used for tracking reply bytes and not all bytes received on the connection. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:40 -07:00
Rahul Iyer	343952fa5a	nfs41: Get the rpc_xprt * from the rpc_rqst instead of the rpc_clnt. Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies can both use the same code path. A client needs the rpc_xprt in order to reply to a callback. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:34 -07:00
Benny Halevy	8f97524235	nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:31 -07:00
Andy Adamson	9c9f3f5fa6	nfs41: sunrpc: add a struct svc_xprt pointer to struct svc_serv for backchannel use This svc_xprt is passed on to the callback service thread to be later used to processes incoming svc_rqst's Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:31 -07:00
Benny Halevy	7652e5a09b	nfs41: sunrpc: provide functions to create and destroy a svc_xprt for backchannel use For nfs41 callbacks we need an svc_xprt to process requests coming up the backchannel socket as rpc_rqst's that are transformed into svc_rqst's that need a rq_xprt to be processed. The svc_{udp,tcp}_create methods are too heavy for this job as svc_create_socket creates an actual socket to listen on while for nfs41 we're "reusing" the fore channel's socket. Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:30 -07:00
Ricardo Labiaga	4d6bbb6233	nfs41: Backchannel bc_svc_process() Implement the NFSv4.1 backchannel service. Invokes the common callback processing logic svc_process_common() to authenticate the call and dispatch the appropriate NFSv4.1 XDR decoder and operation procedure. It then invokes bc_send() to send the reply over the same connection. bc_send() is implemented in a separate patch. At this time there is no slot validation or reply cache handling. [nfs41: Preallocate rpc_rqst receive buffer for handling callbacks] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Move bc_svc_process() declaration to correct patch] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:29 -07:00
Ricardo Labiaga	1cad7ea6fe	nfs41: Refactor svc_process() net/sunrpc/svc.c:svc_process() is used by the NFSv4 callback service to process RPC requests arriving over connections initiated by the server. NFSv4.1 supports callbacks over the backchannel on connections initiated by the client. This patch refactors svc_process() so that common code can also be used by the backchannel. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:28 -07:00
Ricardo Labiaga	0d90ba1cd4	nfs41: Backchannel callback service helper routines Executes the backchannel task on the RPC state machine using the existing open connection previously established by the client. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> nfs41: Add bc_svc.o to sunrpc Makefile. [nfs41: bc_send() does not need to be exported outside RPC module] [nfs41: xprt_free_bc_request() need not be exported outside RPC module] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Update copyright] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:28 -07:00
Ricardo Labiaga	55ae1aabfb	nfs41: Add backchannel processing support to RPC state machine Adds rpc_run_bc_task() which is called by the NFS callback service to process backchannel requests. It performs similar work to rpc_run_task() though "schedules" the backchannel task to be executed starting at the call_trasmit state in the RPC state machine. It also introduces some miscellaneous updates to the argument validation, call_transmit, and transport cleanup functions to take into account that there are now forechannel and backchannel tasks. Backchannel requests do not carry an RPC message structure, since the payload has already been XDR encoded using the existing NFSv4 callback mechanism. Introduce a new transmit state for the client to reply on to backchannel requests. This new state simply reserves the transport and issues the reply. In case of a connection related error, disconnects the transport and drops the reply. It requires the forechannel to re-establish the connection and the server to retransmit the request, as stated in NFSv4.1 section 2.9.2 "Client and Server Transport Behavior". Note: There is no need to loop attempting to reserve the transport. If EAGAIN is returned by xprt_prepare_transmit(), return with tk_status == 0, setting tk_action to call_bc_transmit. rpc_execute() will invoke it again after the task is taken off the sleep queue. [nfs41: rpc_run_bc_task() need not be exported outside RPC module] [nfs41: New call_bc_transmit RPC state] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Backchannel: No need to loop in call_bc_transmit()] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [rpc_count_iostats incorrectly exits early] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Convert rpc_reply_expected() to inline function] [Remove unnecessary BUG_ON()] [Rename variable] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:24 -07:00
Trond Myklebust	88b5ed73bc	SUNRPC: Fix a missing "break" option in xs_tcp_setup_socket() In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want to get rid of the existing socket and retry immediately, just as the comment says. Currently we end up sleeping for a minute, due to the missing "break" statement. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:22:57 -07:00
Ricardo Labiaga	44b98efdd0	nfs41: New xs_tcp_read_data() Handles RPC replies and backchannel callbacks. Traditionally the NFS client has expected only RPC replies on its open connections. With NFSv4.1, callbacks can arrive over an existing open connection. This patch refactors the old xs_tcp_read_request() into an RPC reply handler: xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(), and a common routine to read the data off the transport: xs_tcp_read_common(). The new xs_tcp_read_callback() queues callback requests onto a queue where the callback service (a separate thread) is listening for the processing. This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com) and Benny Halevy (bhalevy@panasas.com). xs_tcp_read_callback() drops the connection when the number of expected callbacks is exceeded. Use xprt_force_disconnect(), ensuring tasks on the pending queue are awaken on disconnect. [nfs41: Keep track of RPC call/reply direction with a flag] [nfs41: Preallocate rpc_rqst receive buffer for handling callbacks] Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Moves embedded #ifdefs into #ifdef function blocks] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 13:06:16 -07:00
Ricardo Labiaga	fb7a0b9add	nfs41: New backchannel helper routines This patch introduces support to setup the callback xprt on the client side. It allocates/ destroys the preallocated memory structures used to process backchannel requests. At setup time, xprt_setup_backchannel() is invoked to allocate one or more rpc_rqst structures and substructures. This ensures that they are available when an RPC callback arrives. The rpc_rqst structures are maintained in a linked list attached to the rpc_xprt structure. We keep track of the number of allocations so that they can be correctly removed when the channel is destroyed. When an RPC callback arrives, xprt_alloc_bc_request() is invoked to obtain a preallocated rpc_rqst structure. An rpc_xprt structure is returned, and its RPC_BC_PREALLOC_IN_USE bit is set in rpc_xprt->bc_flags. The structure is removed from the the list since it is now in use, and it will be later added back when its user is done with it. After the RPC callback replies, the rpc_rqst structure is returned by invoking xprt_free_bc_request(). This clears the RPC_BC_PREALLOC_IN_USE bit and adds it back to the list, allowing it to be reused by a subsequent RPC callback request. To be consistent with the reception of RPC messages, the backchannel requests should be placed into the 'struct rpc_rqst' rq_rcv_buf, which is then in turn copied to the 'struct rpc_rqst' rq_private_buf. [nfs41: Preallocate rpc_rqst receive buffer for handling callbacks] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Update copyright notice and explain page allocation] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 13:06:14 -07:00
Ricardo Labiaga	f9acac1a47	nfs41: Initialize new rpc_xprt callback related fields Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 13:06:14 -07:00
Ricardo Labiaga	f4a2e418bf	nfs41: Process the RPC call direction Reading and storing the RPC direction is a three step process. 1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it in the XDR buffer since the 'struct rpc_rqst' is not yet available. 2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state. This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR. For example, we may be reading a continuation packet to a large reply. Therefore, we can't simply obtain the 'struct rpc_rqst' during the TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA. This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to read the RPC direction. It then uses TCP_RCV_COPY_CALLDIR to indicate the RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated. 3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper functions. xs_tcp_read_common() then saves the RPC direction in the XDR buffer if TCP_RCV_COPY_CALLDIR is set. This will happen when we're reading the data immediately after the direction was read. xs_tcp_read_common() then clears this flag. [was nfs41: Skip past the RPC call direction] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: sunrpc: Add RPC direction back into the XDR buffer] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: sunrpc: Don't skip past the RPC call direction] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:43:46 -07:00
Ricardo Labiaga	18dca02aeb	nfs41: Add ability to read RPC call direction on TCP stream. NFSv4.1 callbacks can arrive over an existing connection. This patch adds the logic to read the RPC call direction (call or reply). It does this by updating the state machine to look for the call direction invoking xs_tcp_read_calldir(...) after reading the XID. [nfs41: Keep track of RPC call/reply direction with a flag] As per 11/14/08 review of RFC 53/85. Add a new flag to track whether the incoming message is an RPC call or an RPC reply. TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel. It is cleared if the message is an RPC request sent on the back channel. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:43:45 -07:00
Andy Adamson	aae2006e9b	nfs41: sunrpc: Export the call prepare state for session reset Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:07 -07:00
Christoph Lameter	62bc62a873	page allocator: use a pre-calculated value instead of num_online_nodes() in fast paths num_online_nodes() is called in a number of places but most often by the page allocator when deciding whether the zonelist needs to be filtered based on cpusets or the zonelist cache. This is actually a heavy function and touches a number of cache lines. This patch stores the number of online nodes at boot time and updates the value when nodes get onlined and offlined. The value is then used in a number of important paths in place of num_online_nodes(). [rientjes@google.com: do not override definition of node_set_online() with macro] Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:35 -07:00
Christian Engelmayer	59fb30660b	sunrpc: potential memory leak in function rdma_read_xdr In case the check on ch_count fails the cleanup path is skipped and the previously allocated memory 'rpl_map', 'chl_map' is not freed. Reported by Coverity. Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 19:34:32 -07:00
Anton Blanchard	6aad89c837	sunrpc: align cache_clean work's timer Align cache_clean work. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 18:14:58 -07:00
J. Bruce Fields	7eef4091a6	Merge commit 'v2.6.30' into for-2.6.31	2009-06-15 18:08:07 -07:00
David S. Miller	9cbc1cb8cd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c	2009-06-15 03:02:23 -07:00
Jesper Dangaard Brouer	bf12691d84	sunrpc/auth_gss: Call rcu_barrier() on module unload. As the module uses rcu_call() we should make sure that all rcu callback has been completed before removing the code. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-10 01:11:27 -07:00
Eric Dumazet	adf30907d6	net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry skb_dst(const struct sk_buff skb) void skb_dst_set(struct sk_buff skb, struct dst_entry dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:04 -07:00
Linus Torvalds	c8bce3d3bd	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: svcrdma: dma unmap the correct length for the RPCRDMA header page. nfsd: Revert "svcrpc: take advantage of tcp autotuning" nfsd: fix hung up of nfs client while sync write data to nfs server	2009-05-29 08:49:09 -07:00
Steve Wise	98779be861	svcrdma: dma unmap the correct length for the RPCRDMA header page. The svcrdma module was incorrectly unmapping the RPCRDMA header page. On IBM pserver systems this causes a resource leak that results in running out of bus address space (10 cthon iterations will reproduce it). The code was mapping the full page but only unmapping the actual header length. The fix is to only map the header length. I also cleaned up the use of ib_dma_map_page() calls since the unmap logic always uses ib_dma_unmap_single(). I made these symmetrical. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-05-27 18:57:24 -04:00
J. Bruce Fields	7f4218354f	nfsd: Revert "svcrpc: take advantage of tcp autotuning" This reverts commit `47a14ef1af` "svcrpc: take advantage of tcp autotuning", which uncovered some further problems in the server rpc code, causing significant performance regressions in common cases. We will likely reinstate this patch after releasing 2.6.30 and applying some work on the underlying fixes to the problem (developed by Trond). Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: Olga Kornievskaia <aglo@citi.umich.edu> Cc: Jim Rees <rees@umich.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-05-27 18:51:06 -04:00
Vu Pham	68743082b5	XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices mlx4/connectX FRMR requires local write enable together with remote rdma write enable. This fixes NFS/RDMA operation over the ConnectX Infiniband HCA in the default memreg mode. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-05-26 14:51:00 -04:00
Linus Torvalds	2ea3f86848	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: nfsd: silence lockdep warning lockd: fix list corruption on lockd restart nfsd4: check for negative dentry before use in nfsv4 readdir nfsd41: slots are freed with session svcrdma: clean up error paths. svcrdma: Fix dma map direction for rdma read targets	2009-05-12 17:11:56 -07:00
Steve Wise	21515e46bc	svcrdma: clean up error paths. These fixes resolved crashes due to resource leak BUG_ON checks. The resource leaks were detected by introducing asynchronous transport errors. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-05-03 14:19:10 -04:00
Trond Myklebust	f75e6745aa	SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect See http://bugzilla.kernel.org/show_bug.cgi?id=13034 If the port gets into a TIME_WAIT state, then we cannot reconnect without binding to a new port. Tested-by: Petr Vandrovec <petr@vandrovec.name> Tested-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-02 16:35:08 -07:00
Chuck Lever	017cb47f46	SUNRPC: Clean up one_sock_name() Clean up svc_one_sock_name() by setting up automatic variables for frequently used expressions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:29 -04:00
Chuck Lever	58de2f8658	SUNRPC: Support PF_INET6 in one_sock_name() Add an arm to the switch statement in svc_one_sock_name() so it can construct the name of PF_INET6 sockets properly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aime Le Rouzic <aime.le-rouzic@bull.net> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:29 -04:00
Chuck Lever	e7942b9f25	SUNRPC: Switch one_sock_name() to use snprintf() Use snprintf() in one_sock_name() to prevent overflowing the output buffer. If the name doesn't fit in the buffer, the buffer is filled in with an empty string, and -ENAMETOOLONG is returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:29 -04:00
Chuck Lever	8435d34dbb	SUNRPC: pass buffer size to svc_sock_names() Adjust the synopsis of svc_sock_names() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:28 -04:00
Chuck Lever	bfba9ab4c6	SUNRPC: pass buffer size to svc_addsock() Adjust the synopsis of svc_addsock() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:28 -04:00
Chuck Lever	335c54bdc4	NFSD: Prevent a buffer overflow in svc_xprt_names() The svc_xprt_names() function can overflow its buffer if it's so near the end of the passed in buffer that the "name too long" string still doesn't fit. Of course, it could never tell if it was near the end of the passed in buffer, since its only caller passes in zero as the buffer length. Let's make this API a little safer. Change svc_xprt_names() so it always checks for a buffer overflow, and change its only caller to pass in the correct buffer length. If svc_xprt_names() does overflow its buffer, it now fails with an ENAMETOOLONG errno, instead of trying to write a message at the end of the buffer. I don't like this much, but I can't figure out a clean way that's always safe to return some of the names, and an indication that the buffer was not long enough. The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is "File name too long". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:28 -04:00
Chuck Lever	abc5c44d62	SUNRPC: Fix error return value of svc_addr_len() The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't recognize the address family of the passed-in socket address. However, the return type of this function is size_t, which means -EAFNOSUPPORT is turned into a very large positive value in this case. The check in svc_udp_recvfrom() to see if the return value is less than zero therefore won't work at all. Additionally, handle_connect_req() passes this value directly to memset(). This could cause memset() to clobber a large chunk of memory if svc_addr_len() has returned an error. Currently the address family of these addresses, however, is known to be supported long before handle_connect_req() is called, so this isn't a real risk. Change the error return value of svc_addr_len() to zero, which fits in the range of size_t, and is safer to pass to memset() directly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:25 -04:00
H Hartley Sweeten	dcf1a3573e	net/sunrpc/svc_xprt.c: fix sparse warnings Fix the following sparse warnings in net/sunrpc/svc_xprt.c. warning: symbol 'svc_recv' was not declared. Should it be static? warning: symbol 'svc_drop' was not declared. Should it be static? warning: symbol 'svc_send' was not declared. Should it be static? warning: symbol 'svc_close_all' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:00:02 -04:00
Steve Wise	d0687be7c7	svcrdma: Fix dma map direction for rdma read targets The nfs server rdma transport was mapping rdma read target pages for TO_DEVICE instead of FROM_DEVICE. This causes data corruption on non cache-coherent systems if frmrs are used. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-25 18:11:14 -04:00
Linus Torvalds	a63856252d	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits) nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc nfsd41: Documentation/filesystems/nfs41-server.txt nfsd41: CREATE_EXCLUSIVE4_1 nfsd41: SUPPATTR_EXCLCREAT attribute nfsd41: support for 3-word long attribute bitmask nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify nfsd41: pass writable attrs mask to nfsd4_decode_fattr nfsd41: provide support for minor version 1 at rpc level nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap nfsd41: access_valid nfsd41: clientid handling nfsd41: check encode size for sessions maxresponse cached nfsd41: stateid handling nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op nfsd41: destroy_session operation nfsd41: non-page DRC for solo sequence responses nfsd41: Add a create session replay cache nfsd41: create_session operation ...	2009-04-06 13:25:56 -07:00
Linus Torvalds	90975ef712	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...	2009-04-05 10:33:07 -07:00
Andy Adamson	2f425878b6	nfsd: don't use the deferral service, return NFS4ERR_DELAY On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:12 -07:00
Linus Torvalds	811158b147	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits) trivial: Update my email address trivial: NULL noise: drivers/mtd/tests/mtd_*test.c trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h trivial: Fix misspelling of "Celsius". trivial: remove unused variable 'path' in alloc_file() trivial: fix a pdlfush -> pdflush typo in comment trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL trivial: wusb: Storage class should be before const qualifier trivial: drivers/char/bsr.c: Storage class should be before const qualifier trivial: h8300: Storage class should be before const qualifier trivial: fix where cgroup documentation is not correctly referred to trivial: Give the right path in Documentation example trivial: MTD: remove EOL from MODULE_DESCRIPTION trivial: Fix typo in bio_split()'s documentation trivial: PWM: fix of #endif comment trivial: fix typos/grammar errors in Kconfig texts trivial: Fix misspelling of firmware trivial: cgroups: documentation typo and spelling corrections trivial: Update contact info for Jochen Hein trivial: fix typo "resgister" -> "register" ...	2009-04-03 15:24:35 -07:00
Trond Myklebust	cc85906110	Merge branch 'devel' into for-linus	2009-04-01 13:28:15 -04:00
Trond Myklebust	c69da774b2	SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port Also ensure that we use the protocol family instead of the address family when calling sock_create_kern(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-04-01 13:24:29 -04:00
Rusty Russell	558f6ab910	Merge branch 'cpumask-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).	2009-03-31 13:33:50 +10:30
Linus Torvalds	d17abcd541	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: oprofile: Thou shalt not call __exit functions from __init functions cpumask: remove the now-obsoleted pcibus_to_cpumask(): generic cpumask: remove cpumask_t from core cpumask: convert rcutorture.c cpumask: use new cpumask_ functions in core code. cpumask: remove references to struct irqaction's mask field. cpumask: use mm_cpumask() wrapper: kernel/fork.c cpumask: use set_cpu_active in init/main.c cpumask: remove node_to_first_cpu cpumask: fix seq_bitmap_*() functions. cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL	2009-03-30 18:00:26 -07:00
Ingo Molnar	65fb0d23fc	Merge branch 'linus' into cpumask-for-linus Conflicts: arch/x86/kernel/cpu/common.c	2009-03-30 23:53:32 +02:00
Alexey Dobriyan	99b7623380	proc 2/2: remove struct proc_dir_entry::owner Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy as correctly noted at bug #12454. Someone can lookup entry with NULL ->owner, thus not pinning enything, and release it later resulting in module refcount underflow. We can keep ->owner and supply it at registration time like ->proc_fops and ->data. But this leaves ->owner as easy-manipulative field (just one C assignment) and somebody will forget to unpin previous/pin current module when switching ->owner. ->proc_fops is declared as "const" which should give some thoughts. ->read_proc/->write_proc were just fixed to not require ->owner for protection. rmmod'ed directories will be empty and return "." and ".." -- no harm. And directories with tricky enough readdir and lookup shouldn't be modular. We definitely don't want such modular code. Removing ->owner will also make PDE smaller. So, let's nuke it. Kudos to Jeff Layton for reminding about this, let's say, oversight. http://bugzilla.kernel.org/show_bug.cgi?id=12454 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:14:44 +04:00
Matt LaPlante	692105b8ac	trivial: fix typos/grammar errors in Kconfig texts Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:01 +02:00
Rusty Russell	aa85ea5b89	cpumask: use new cpumask_ functions in core code. Impact: cleanup Time to clean up remaining laggards using the old cpu_ functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Trond.Myklebust@netapp.com	2009-03-30 22:05:16 +10:30
Chuck Lever	9355982830	SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 We just augmented the kernel's RPC service registration code so that it automatically adjusts to what is supported in user space. Thus we no longer need the kernel configuration option to enable registering RPC services with v4 -- it's all done automatically. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 16:00:17 -04:00
Chuck Lever	363f724cdd	SUNRPC: rpcb_register() should handle errors silently Move error reporting for RPC registration to rpcb_register's caller. This way the caller can choose to recover silently from certain errors, but report errors it does not recognize. Error reporting for kernel RPC service registration is now handled in one place. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:59:48 -04:00
Chuck Lever	cadc0fa534	SUNRPC: Simplify kernel RPC service registration The kernel registers RPC services with the local portmapper with an rpcbind SET upcall to the local portmapper. Traditionally, this used rpcbind v2 (PMAP), but registering RPC services that support IPv6 requires rpcbind v3 or v4. Since we now want separate PF_INET and PF_INET6 listeners for each kernel RPC service, svc_register() will do only one of those registrations at a time. For PF_INET, it tries an rpcb v4 SET upcall first; if that fails, it does a legacy portmap SET. This makes it entirely backwards compatible with legacy user space, but allows a proper v4 SET to be used if rpcbind is available. For PF_INET6, it does an rpcb v4 SET upcall. If that fails, it fails the registration, and thus the transport creation. This let's the kernel detect if user space is able to support IPv6 RPC services, and thus whether it should maintain a PF_INET6 listener for each service at all. This provides complete backwards compatibilty with legacy user space that only supports rpcbind v2. The only down-side is that registering a new kernel RPC service may take an extra exchange with the local portmapper on legacy systems, but this is an infrequent operation and is done over UDP (no lingering sockets in TIMEWAIT), so it shouldn't be consequential. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:58:37 -04:00
Chuck Lever	d5a8620f7c	SUNRPC: Simplify svc_unregister() Our initial implementation of svc_unregister() assumed that PMAP_UNSET cleared all rpcbind registrations for a [program, version] tuple. However, we now have evidence that PMAP_UNSET clears only "inet" entries, and not "inet6" entries, in the rpcbind database. For backwards compatibility with the legacy portmapper, the svc_unregister() function also must work if user space doesn't support rpcbind version 4 at all. Thus we'll send an rpcbind v4 UNSET, and if that fails, we'll send a PMAP_UNSET. This simplifies the code in svc_unregister() and provides better backwards compatibility with legacy user space that does not support rpcbind version 4. We can get rid of the conditional compilation in here as well. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:58:07 -04:00
Chuck Lever	1673d0de40	SUNRPC: Allow callers to pass rpcb_v4_register a NULL address The user space TI-RPC library uses an empty string for the universal address when unregistering all target addresses for [program, version]. The kernel's rpcb client should behave the same way. Here, we are switching between several registration methods based on the protocol family of the incoming address. Rename the other rpcbind v4 registration functions to make it clear that they, as well, are switched on protocol family. In /etc/netconfig, this is either "inet" or "inet6". NB: The loopback protocol families are not supported in the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:57:00 -04:00
Chuck Lever	126e4bc3b3	SUNRPC: rpcbind actually interprets r_owner string RFC 1833 has little to say about the contents of r_owner; it only specifies that it is a string, and states that it is used to control who can UNSET an entry. Our port of rpcbind (from Sun) assumes this string contains a numeric UID value, not alphabetical or symbolic characters, but checks this value only for AF_LOCAL RPCB_SET or RPCB_UNSET requests. In all other cases, rpcbind ignores the contents of the r_owner string. The reference user space implementation of rpcb_set(3) uses a numeric UID for all SET/UNSET requests (even via the network) and an empty string for all other requests. We emulate that behavior here to maintain bug-for-bug compatibility. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:56:04 -04:00
Chuck Lever	3aba45536f	SUNRPC: Clean up address type casts in rpcb_v4_register() Clean up: Simplify rpcb_v4_register() and its helpers by moving the details of sockaddr type casting to rpcb_v4_register()'s helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:55:52 -04:00
Chuck Lever	ba5c35e0c7	SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers The RPC client returns -EPROTONOSUPPORT if there is a protocol version mismatch (ie the remote RPC server doesn't support the RPC protocol version sent by the client). Helpers for the svc_register() function return -EPROTONOSUPPORT if they don't recognize the passed-in IPPROTO_ value. These are two entirely different failure modes. Have the helpers return -ENOPROTOOPT instead of -EPROTONOSUPPORT. This will allow callers to determine more precisely what the underlying problem is, and decide to report or recover appropriately. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:55:40 -04:00
Chuck Lever	fc28decdc9	SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services The kernel uses an IPv6 loopback address when registering its AF_INET6 RPC services so that it can tell whether the local portmapper is actually IPv6-enabled. Since the legacy portmapper doesn't listen on IPv6, however, this causes a long timeout on older systems if the kernel happens to try creating and registering an AF_INET6 RPC service. Originally I wanted to use a connected transport (either TCP or connected UDP) so that the upcall would fail immediately if the portmapper wasn't listening on IPv6, but we never agreed on what transport to use. In the end, it's of little consequence to the kernel whether the local portmapper is listening on IPv6. It's only important whether the portmapper supports rpcbind v4. And the kernel can't tell that at all if it is sending requests via IPv6 -- the portmapper will just ignore them. So, send both rpcbind v2 and v4 SET/UNSET requests via IPv4 loopback to maintain better backwards compatibility between new kernels and legacy user space, and prevent multi-second hangs in some cases when the kernel attempts to register RPC services. This patch is part of a series that addresses http://bugzilla.kernel.org/show_bug.cgi?id=12256 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:55:28 -04:00
Chuck Lever	7d21c0f984	SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets We are about to convert to using separate RPC listener sockets for PF_INET and PF_INET6. This echoes the way IPv6 is handled in user space by TI-RPC, and eliminates the need for ULPs to worry about mapped IPv4 AF_INET6 addresses when doing address comparisons. Start by setting the IPV6ONLY flag on PF_INET6 RPC listener sockets. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:55:18 -04:00
Chuck Lever	49a9072f29	SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() Since an RPC service listener's protocol family is specified now via svc_create_xprt(), it no longer needs to be passed to svc_create() or svc_create_pooled(). Remove that argument from the synopsis of those functions, and remove the sv_family field from the svc_serv struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:48 -04:00
Chuck Lever	9652ada3fb	SUNRPC: Change svc_create_xprt() to take a @family argument The sv_family field is going away. Pass a protocol family argument to svc_create_xprt() instead of extracting the family from the passed-in svc_serv struct. Again, as this is a listener socket and not an address, we make this new argument an "int" protocol family, instead of an "sa_family_t." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:36 -04:00
Chuck Lever	baf01caf09	SUNRPC: svc_setup_socket() gets protocol family from socket Since the sv_family field is going away, modify svc_setup_socket() to extract the protocol family from the passed-in socket instead of from the passed-in svc_serv struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:23 -04:00
Chuck Lever	4b62e58ccc	SUNRPC: Pass a family argument to svc_register() The sv_family field is going away. Instead of using sv_family, have the svc_register() function take a protocol family argument. Since this argument represents a protocol family, and not an address family, this argument takes an int, as this is what is passed to sock_create_kern(). Also make sure svc_register's helpers are checking for PF_FOO instead of AF_FOO. The value of [AP]F_FOO are equivalent; this is simply a symbolic change to reflect the semantics of the value stored in that variable. sock_create_kern() should return EPFNOSUPPORT if the passed-in protocol family isn't supported, but it uses EAFNOSUPPORT for this case. We will stick with that tradition here, as svc_register() is called by the RPC server in the same path as sock_create_kern(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:12 -04:00
Chuck Lever	156e62094a	SUNRPC: Clean up svc_find_xprt() calling sequence Clean up: add documentating comment and use appropriate data types for svc_find_xprt()'s arguments. This also eliminates a mixed sign comparison: @port was an int, while the return value of svc_xprt_local_port() is an unsigned short. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:53:57 -04:00
Chuck Lever	776bd5c7a2	SUNRPC: Don't flag empty RPCB_GETADDR reply as bogus In 2007, commit `e65fe3976f` added additional sanity checking to rpcb_decode_getaddr() to make sure we were getting a reply that was long enough to be an actual universal address. If the uaddr string isn't long enough, the XDR decoder returns EIO. However, an empty string is a valid RPCB_GETADDR response if the requested service isn't registered. Moreover, "::.n.m" is also a valid RPCB_GETADDR response for IPv6 addresses that is shorter than rpcb_decode_getaddr()'s lower limit of 11. So this sanity check introduced a regression for rpcbind requests against IPv6 remotes. So revert the lower bound check added by commit `e65fe3976f`, and add an explicit check for an empty uaddr string, similar to libtirpc's rpcb_getaddr(3). Pointed-out-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:52:08 -04:00
Linus Torvalds	3ae5080f4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...	2009-03-27 16:23:12 -07:00
ideawu	abd91ee979	sunrpc/svc.c: Remove unused line 'rqstp->rq_server = serv;' in svc_process There is no need to set rqstp->rq_server to serv, while serv is initialized as rqstp->rq_server at previous line. And between these two lines, there is no change to rqstp->rq_server. Signed-off-by: ideawu <ideawu@163.com> Reviewed-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-27 19:15:21 -04:00
Al Viro	3ba13d179e	constify dentry_operations: rest Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
David S. Miller	08abe18af1	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: drivers/net/wimax/i2400m/usb-notif.c	2009-03-26 15:23:24 -07:00
Tom Talpey	2e3c230bc7	SVCRDMA: fix recent printk format warnings. printk formats in prior commit were reversed/incorrect. Compiled without warning on x86 and x86_64, but detected on ppc. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:37 -04:00
Trond Myklebust	55420c24a0	SUNRPC: Ensure we close the socket on EPIPE errors too... As long as one task is holding the socket lock, then calls to xprt_force_disconnect(xprt) will not succeed in shutting down the socket. In particular, this would mean that a server initiated shutdown will not succeed until the lock is relinquished. In order to avoid the deadlock, we should ensure that xs_tcp_send_request() closes the socket on EPIPE errors too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:36 -04:00
Trond Myklebust	b61d59fffd	SUNRPC: xs_tcp_connect_worker{4,6}: merge common code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:35 -04:00
Trond Myklebust	25fe6142a5	SUNRPC: Add a sysctl to control the duration of the socket linger timeout Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:34 -04:00
Trond Myklebust	7d1e8255cf	SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets This fixes a regression against FreeBSD servers as reported by Tomas Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers don't ever react to the client closing the socket, and so commit `e06799f958` (SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket) causes the setup to hang forever whenever the client attempts to close and then reconnect. We break the deadlock by adding a 'linger2' style timeout to the socket, after which, the client will abort the connection using a TCP 'RST'. The default timeout is set to 15 seconds. A subsequent patch will put it under user control by means of a systctl. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:34 -04:00
Olga Kornievskaia	47a14ef1af	svcrpc: take advantage of tcp autotuning Allow the NFSv4 server to make use of TCP autotuning behaviour, which was previously disabled by setting the sk_userlocks variable. Set the receive buffers to be big enough to receive the whole RPC request, and set this for the listening socket, not the accept socket. Remove the code that readjusts the receive/send buffer sizes for the accepted socket. Previously this code was used to influence the TCP window management behaviour, which is no longer needed when autotuning is enabled. This can improve IO bandwidth on networks with high bandwidth-delay products, where a large tcp window is required. It also simplifies performance tuning, since getting adequate tcp buffers previously required increasing the number of nfsd threads. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Cc: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:46:59 -04:00
Greg Banks	03cf6c9f49	knfsd: add file to export stats about nfsd pools Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void )1) merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_()" by Harshula Jayasuriya. merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:42 -04:00
Greg Banks	59a252ff8c	knfsd: avoid overloading the CPU scheduler with enormous load averages Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374 Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:41 -04:00
Rusty Russell	a70f730282	cpumask: replace node_to_cpumask with cpumask_of_node. Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:46 +10:30
Trond Myklebust	5e3771ce2d	SUNRPC: Ensure that xs_nospace return values are propagated If xs_nospace() finds that the socket has disconnected, it attempts to return ENOTCONN, however that value is then squashed by the callers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:38:01 -04:00
Trond Myklebust	8a2cec295f	SUNRPC: Delay, then retry on connection errors. Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that we should delay, then retry on certain connection errors. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:38:01 -04:00
Trond Myklebust	2a4919919a	SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending While we should definitely return socket errors to the task that is currently trying to send data, there is no need to propagate the same error to all the other tasks on xprt->pending. Doing so actually slows down recovery, since it causes more than one tasks to attempt socket recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:38:00 -04:00
Trond Myklebust	482f32e65d	SUNRPC: Handle socket errors correctly Ensure that we pick up and handle socket errors as they occur. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:38:00 -04:00
Trond Myklebust	c8485e4d63	SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit() If we get an ECONNREFUSED error, we currently go to sleep on the 'xprt->sending' wait queue. The problem is that no timeout is set there, and there is nothing else that will wake the task up later. We should deal with ECONNREFUSED in call_status, given that is where we also deal with -EHOSTDOWN, and friends. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:59 -04:00
Trond Myklebust	40d2549db5	SUNRPC: Don't disconnect if a connection is still in progress. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:58 -04:00
Trond Myklebust	670f945731	SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN... ...so that we can distinguish between when we need to shutdown and when we don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(), since xprt_connect() makes the same test. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:58 -04:00
Trond Myklebust	15f081ca8d	SUNRPC: Avoid an unnecessary task reschedule on ENOTCONN If the socket is unconnected, and xprt_transmit() returns ENOTCONN, we currently give up the lock on the transport channel. Doing so means that the lock automatically gets assigned to the next task in the xprt->sending queue, and so that task needs to be woken up to do the actual connect. The following patch aims to avoid that unnecessary task switch. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:57 -04:00
Tom Talpey	441e3e2429	SUNRPC: dynamically load RPC transport modules on-demand Provide an api to attempt to load any necessary kernel RPC client transport module automatically. By convention, the desired module name is "xprt"+"transport name". For example, when NFS mounting with "-o proto=rdma", attempt to load the "xprtrdma" module. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:56 -04:00
Tom Talpey	b38ab40ad5	XPRTRDMA: correct an rpc/rdma inline send marshaling error Certain client rpc's which contain both lengthy page-contained metadata and a non-empty xdr_tail buffer require careful handling to avoid overlapped memory copying. Rearranging of existing rpcrdma marshaling code avoids it; this fixes an NFSv4 symlink creation error detected with connectathon basic/test8 to multiple servers. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:55 -04:00
Tom Talpey	b1e1e15877	SVCRDMA: remove faulty assertions in rpc/rdma chunk validation. Certain client-provided RPCRDMA chunk alignments result in an additional scatter/gather entry, which triggered nfs/rdma server assertions incorrectly. OpenSolaris nfs/rdma client connectathon testing was blocked by these in the special/locking section. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:37:55 -04:00
Chuck Lever	fe315e76fc	SUNRPC: Avoid spurious wake-up during UDP connect processing To clear out old state, the UDP connect workers unconditionally invoke xs_close() before proceeding with a new connect. Nowadays this causes a spurious wake-up of the task waiting for the connect to complete. This is a little racey, but usually harmless. The waiting task immediately retries the connect via a call_bind/call_connect sequence, which usually finds the transport already in the connected state because the connect worker has finished in the background. To avoid a spurious wake-up, factor the xs_close() logic that resets the underlying socket into a helper, and have the UDP connect workers call that helper instead of xs_close(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:10:21 -04:00
Trond Myklebust	01d37c428a	SUNRPC: xprt_connect() don't abort the task if the transport isn't bound If the transport isn't bound, then we should just return ENOTCONN, letting call_connect_status() and/or call_status() deal with retrying. Currently, we appear to abort all pending tasks with an EIO error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:09:39 -04:00
Trond Myklebust	fba91afbec	SUNRPC: Fix an Oops due to socket not set up yet... We can Oops in both xs_udp_send_request() and xs_tcp_send_request() if the call to xs_sendpages() returns an error due to the socket not yet being set up. Deal with that situation by returning a new error: ENOTSOCK, so that we know to avoid dereferencing transport->sock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-11 14:06:41 -04:00
Trond Myklebust	eb9b55ab4d	SUNRPC: Tighten up the task locking rules in __rpc_execute() We should probably not be testing any flags after we've cleared the RPC_TASK_RUNNING flag, since rpc_make_runnable() is then free to assign the rpc_task to another workqueue, which may then destroy it. We can fix any races with rpc_make_runnable() by ensuring that we only clear the RPC_TASK_RUNNING flag while holding the rpc_wait_queue->lock that the task is supposed to be sleeping on (and then checking whether or not the task really is sleeping). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-10 20:33:16 -04:00
Ilpo Järvinen	1f0fa15432	net/sunrpc/xprtsock.c: some common code found $ diff-funcs xs_udp_write_space net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c xs_tcp_write_space --- net/sunrpc/xprtsock.c:xs_udp_write_space() +++ net/sunrpc/xprtsock.c:xs_tcp_write_space() @@ -1,4 +1,4 @@ - * xs_udp_write_space - callback invoked when socket buffer space + * xs_tcp_write_space - callback invoked when socket buffer space * becomes available * @sk: socket whose state has changed * @@ -7,12 +7,12 @@ * progress, otherwise we'll waste resources thrashing kernel_sendmsg * with a bunch of small requests. / -static void xs_udp_write_space(struct sock sk) +static void xs_tcp_write_space(struct sock sk) { read_lock(&sk->sk_callback_lock); - / from net/core/sock.c:sock_def_write_space / - if (sock_writeable(sk)) { + / from net/core/stream.c:sk_stream_write_space / + if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) { struct socket sock; struct rpc_xprt *xprt; $ codiff net/sunrpc/xprtsock.o net/sunrpc/xprtsock.o.new net/sunrpc/xprtsock.c: xs_tcp_write_space \| -163 xs_udp_write_space \| -163 2 functions changed, 326 bytes removed net/sunrpc/xprtsock.c: xs_write_space \| +179 1 function changed, 179 bytes added net/sunrpc/xprtsock.o.new: 3 functions changed, 179 bytes added, 326 bytes removed, diff: -147 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-06 23:48:33 -08:00
Randy Dunlap	55128bc23e	sunrpc: fix rdma dependencies Fix sunrpc/rdma build dependencies. Survives 12 build combinations of INET, IPV6, SUNRPC, INFINIBAND, and INFINIBAND_ADDR_TRANS. ERROR: "rdma_destroy_id" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_connect" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_destroy_qp" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_create_id" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_create_qp" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_resolve_route" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_disconnect" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_resolve_addr" [net/sunrpc/xprtrdma/xprtrdma.ko] undefined! ERROR: "rdma_accept" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_destroy_id" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_listen" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_create_id" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_create_qp" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_bind_addr" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! ERROR: "rdma_disconnect" [net/sunrpc/xprtrdma/svcrdma.ko] undefined! Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-03 15:20:13 -08:00
J. Bruce Fields	ce0cf6622c	nfs: note that CONFIG_SUNRPC_XPRT_RDMA turns on server side support too We forgot to update this when adding server-side support. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-27 17:26:58 -05:00
Alexey Dobriyan	9098c24f35	fs/Kconfig: move sunrpc out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:00 +03:00
Trond Myklebust	24c3767e41	SUNRPC: The sunrpc server code should not be used by out-of-tree modules Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-07 17:18:42 -05:00
Tom Tucker	22945e4a1c	svc: Clean up deferred requests on transport destruction A race between svc_revisit and svc_delete_xprt can result in deferred requests holding references on a transport that can never be recovered because dead transports are not enqueued for subsequent processing. Check for XPT_DEAD in revisit to clean up completing deferrals on a dead transport and sweep a transport's deferred queue to do the same for queued but unprocessed deferrals. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-07 17:08:46 -05:00
Tom Tucker	2779e3ae39	svc: Move kfree of deferral record to common code The rqstp structure has a pointer to a svc_deferred_req record that is allocated when requests are deferred. This record is common to all transports and can be freed in common code. Move the kfree of the rq_deferred to the common svc_xprt_release function. This also fixes a memory leak in the RDMA transport which does not kfree the dr structure in it's version of the xpo_release_rqst callback. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-07 15:40:45 -05:00
Trond Myklebust	69b6ba3712	SUNRPC: Ensure the server closes sockets in a timely fashion We want to ensure that connected sockets close down the connection when we set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the stuff that is keeping a reference to the socket. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:57 -05:00
Jeff Layton	c9233eb7b0	sunrpc: add sv_maxconn field to svc_serv (try #3 ) svc_check_conn_limits() attempts to prevent denial of service attacks by having the service close old connections once it reaches a threshold. This threshold is based on the number of threads in the service: (serv->sv_nrthreads + 3) * 20 Once we reach this, we drop the oldest connections and a printk pops to warn the admin that they should increase the number of threads. Increasing the number of threads isn't an option however for services like lockd. We don't want to eliminate this check entirely for such services but we need some way to increase this limit. This patch adds a sv_maxconn field to the svc_serv struct. When it's set to 0, we use the current method to calculate the max number of connections. RPC services can then set this on an as-needed basis. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:47 -05:00
Al Viro	56ff5efad9	zero i_uid/i_gid on inode allocation ... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Trond Myklebust	08cc36cbd1	Merge branch 'devel' into next	2008-12-30 16:51:43 -05:00
Linus Torvalds	0191b625ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits) net: Allow dependancies of FDDI & Tokenring to be modular. igb: Fix build warning when DCA is disabled. net: Fix warning fallout from recent NAPI interface changes. gro: Fix potential use after free sfc: If AN is enabled, always read speed/duplex from the AN advertising bits sfc: When disabling the NIC, close the device rather than unregistering it sfc: SFT9001: Add cable diagnostics sfc: Add support for multiple PHY self-tests sfc: Merge top-level functions for self-tests sfc: Clean up PHY mode management in loopback self-test sfc: Fix unreliable link detection in some loopback modes sfc: Generate unique names for per-NIC workqueues 802.3ad: use standard ethhdr instead of ad_header 802.3ad: generalize out mac address initializer 802.3ad: initialize ports LACPDU from const initializer 802.3ad: remove typedef around ad_system 802.3ad: turn ports is_individual into a bool 802.3ad: turn ports is_enabled into a bool 802.3ad: make ntt bool ixgbe: Fix set_ringparam in ixgbe to use the same memory pools. ... Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).	2008-12-28 12:49:40 -08:00
Olga Kornievskaia	2efef7080f	rpc: add service field to new upcall This patch extends the new upcall with a "service" field that currently can have 2 values: "" or "nfs". These values specify matching rules for principals in the keytab file. The "" means that gssd is allowed to use "root", "nfs", or "host" keytab entries while the other option requires "nfs". Restricting gssd to use the "nfs" principal is needed for when the server performs a callback to the client. The server in this case has to authenticate itself as an "nfs" principal. We also need "service" field to distiguish between two client-side cases both currently using a uid of 0: the case of regular file access by the root user, and the case of state-management calls (such as setclientid) which should use a keytab for authentication. (And the upcall should fail if an appropriate principal can't be found.) Signed-off: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:19:56 -05:00
Olga Kornievskaia	8b1c7bf5b6	rpc: add target field to new upcall This patch extends the new upcall by adding a "target" field communicating who we want to authenticate to (equivalently, the service principal that we want to acquire a ticket for). Signed-off: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:19:26 -05:00
Olga Kornievskaia	61054b14d5	nfsd: support callbacks with gss flavors This patch adds server-side support for callbacks other than AUTH_SYS. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:19:00 -05:00
Olga Kornievskaia	945b34a772	rpc: allow gss callbacks to client This patch adds client-side support to allow for callbacks other than AUTH_SYS. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:18:34 -05:00
Olga Kornievskaia	608207e888	rpc: pass target name down to rpc level on callbacks The rpc client needs to know the principal that the setclientid was done as, so it can tell gssd who to authenticate to. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:17:40 -05:00
Olga Kornievskaia	68e76ad0ba	nfsd: pass client principal name in rsc downcall Two principals are involved in krb5 authentication: the target, who we authenticate to (normally the name of the server, like nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we authenticate as (normally a user, like bfields@UMICH.EDU) In the case of NFSv4 callbacks, the target of the callback should be the source of the client's setclientid call, and the source should be the nfs server's own principal. Therefore we allow svcgssd to pass down the name of the principal that just authenticated, so that on setclientid we can store that principal name with the new client, to be used later on callbacks. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:17:15 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	34769fc488	rpc: implement new upcall Implement the new upcall. We decide which version of the upcall gssd will use (new or old), by creating both pipes (the new one named "gssd", the old one named after the mechanism (e.g., "krb5")), and then waiting to see which version gssd actually opens. We don't permit pipes of the two different types to be opened at once. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:16:37 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	5b7ddd4a7b	rpc: store pointer to pipe inode in gss upcall message Keep a pointer to the inode that the message is queued on in the struct gss_upcall_msg. This will be convenient, especially after we have a choice of two pipes that an upcall could be queued on. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:15:44 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	79a3f20b64	rpc: use count of pipe openers to wait for first open Introduce a global variable pipe_version which will eventually be used to keep track of which version of the upcall gssd is using. For now, though, it only keeps track of whether any pipe is open or not; it is negative if not, zero if one is opened. We use this to wait for the first gssd to open a pipe. (Minor digression: note this waits only for the very first open of any pipe, not for the first open of a pipe for a given auth; thus we still need the RPC_PIPE_WAIT_FOR_OPEN behavior to wait for gssd to open new pipes that pop up on subsequent mounts.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:10:52 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	cf81939d6f	rpc: track number of users of the gss upcall pipe Keep a count of the number of pipes open plus the number of messages on a pipe. This count isn't used yet. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:10:19 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	e712804ae4	rpc: call release_pipe only on last close I can't see any reason we need to call this until either the kernel or the last gssd closes the pipe. Also, this allows to guarantee that open_pipe and release_pipe are called strictly in pairs; open_pipe on gssd's first open, release_pipe on gssd's last close (or on the close of the kernel side of the pipe, if that comes first). That will make it very easy for the gss code to keep track of which pipes gssd is using. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:09:47 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	c381060869	rpc: add an rpc_pipe_open method We want to transition to a new gssd upcall which is text-based and more easily extensible. To simplify upgrades, as well as testing and debugging, it will help if we can upgrade gssd (to a version which understands the new upcall) without having to choose at boot (or module-load) time whether we want the new or the old upcall. We will do this by providing two different pipes: one named, as currently, after the mechanism (normally "krb5"), and supporting the old upcall. One named "gssd" and supporting the new upcall version. We allow gssd to indicate which version it supports by its choice of which pipe to open. As we have no interest in supporting simultaneous use of both versions, we'll forbid opening both pipes at the same time. So, add a new pipe_open callback to the rpc_pipefs api, which the gss code can use to track which pipes have been open, and to refuse opens of incompatible pipes. We only need this to be called on the first open of a given pipe. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:08:32 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	db75b3d6b5	rpc: minor gss_alloc_msg cleanup I want to add a little more code here, so it'll be convenient to have this flatter. Also, I'll want to add another error condition, so it'll be more convenient to return -ENOMEM than NULL in the error case. The only caller is already converting NULL to -ENOMEM anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:07:13 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	b03568c322	rpc: factor out warning code from gss_pipe_destroy_msg We'll want to call this from elsewhere soon. And this is a bit nicer anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:55 -05:00
$\"J. Bruce Fields\$ \"J. Bruce Fields\	99db356368	rpc: remove unnecessary assignment We're just about to kfree() gss_auth, so there's no point to setting any of its fields. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:33 -05:00
Jeff Layton	6dcd3926b2	sunrpc: fix code that makes auth_gss send destroy_cred message (try #2 ) There's a bit of a chicken and egg problem when it comes to destroying auth_gss credentials. When we destroy the last instance of a GSSAPI RPC credential, we should send a NULL RPC call with a GSS procedure of RPCSEC_GSS_DESTROY to hint to the server that it can destroy those creds. This isn't happening because we're setting clearing the uptodate bit on the credentials and then setting the operations to the gss_nullops. When we go to do the RPC call, we try to refresh the creds. That fails with -EACCES and the call fails. Fix this by not clearing the UPTODATE bit for the credentials and adding a new crdestroy op for gss_nullops that just tears down the cred without trying to destroy the context. The only difference between this patch and the first one is the removal of some minor formatting deltas. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:57 -05:00
Peter Staubach	64672d55d9	optimize attribute timeouts for "noac" and "actimeo=0" Hi. I've been looking at a bugzilla which describes a problem where a customer was advised to use either the "noac" or "actimeo=0" mount options to solve a consistency problem that they were seeing in the file attributes. It turned out that this solution did not work reliably for them because sometimes, the local attribute cache was believed to be valid and not timed out. (With an attribute cache timeout of 0, the cache should always appear to be timed out.) In looking at this situation, it appears to me that the problem is that the attribute cache timeout code has an off-by-one error in it. It is assuming that the cache is valid in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The cache should be considered valid only in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo). With this change, the options, "noac" and "actimeo=0", work as originally expected. This problem was previously addressed by special casing the attrtimeo == 0 case. However, since the problem is only an off- by-one error, the cleaner solution is address the off-by-one error and thus, not require the special case. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:56 -05:00
Trond Myklebust	7bd8826915	SUNRPC: rpcsec_gss modules should not be used by out-of-tree code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:32 -05:00

... 2 3 4 5 6 ...

1278 Коммитов