Граф коммитов

1353 Коммитов

Автор SHA1 Сообщение Дата
Chuck Lever a4ae308143 SUNRPC: Move client-side disconnect injection
Disconnect injection stress-tests the ability for both client and
server implementations to behave resiliently in the face of network
instability.

Convert the existing client-side disconnect injection infrastructure
to use the kernel's generic error injection facility. The generic
facility has a richer set of injection criteria.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20 13:50:32 -04:00
Chuck Lever b4ab2fea7c SUNRPC: Add RPC_AUTH_TLS protocol numbers
Shared by client and server. See:

https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtml

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:53 -04:00
Chuck Lever 5c11720767 SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
Some paths through svc_process() leave rqst->rq_procinfo set to
NULL, which triggers a crash if tracing happens to be enabled.

Fixes: 89ff87494c ("SUNRPC: Display RPC procedure names instead of proc numbers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:53 -04:00
Chuck Lever 07a92d009f svcrdma: Convert rdma->sc_rw_ctxts to llist
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts
to an llist.

The goal is to reduce the average overhead of Send completions,
because a transport's completion handlers are single-threaded on
one CPU core. This change reduces CPU utilization of each Send
completion by 2-3% on my server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17 11:47:53 -04:00
Chuck Lever b6c2bfea09 svcrdma: Relieve contention on sc_send_lock.
/proc/lock_stat indicates the the sc_send_lock is heavily
contended when the server is under load from a single client.

To address this, convert the send_ctxt free list to an llist.
Returning an item to the send_ctxt cache is now waitless, which
reduces the instruction path length in the single-threaded Send
handler (svc_rdma_wc_send).

The goal is to enable the ib_comp_wq worker to handle a higher
RPC/RDMA Send completion rate given the same CPU resources. This
change reduces CPU utilization of Send completion by 2-3% on my
server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17 11:47:53 -04:00
Chuck Lever 6c8c84f525 svcrdma: Fewer calls to wake_up() in Send completion handler
Because wake_up() takes an IRQ-safe lock, it can be expensive,
especially to call inside of a single-threaded completion handler.
What's more, the Send wait queue almost never has waiters, so
most of the time, this is an expensive no-op.

As always, the goal is to reduce the average overhead of each
completion, because a transport's completion handlers are single-
threaded on one CPU core. This change reduces CPU utilization of
the Send completion thread by 2-3% on my server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17 11:47:53 -04:00
Chuck Lever 2f0f88f42f SUNRPC: Add svc_rqst_replace_page() API
Replacing a page in rq_pages[] requires a get_page(), which is a
bus-locked operation, and a put_page(), which can be even more
costly.

To reduce the cost of replacing a page in rq_pages[], batch the
put_page() operations by collecting "freed" pages in a pagevec,
and then release those pages when the pagevec is full. This
pagevec is also emptied when each RPC completes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:52 -04:00
Chuck Lever 9082e1d914 SUNRPC: Eliminate the RQ_AUTHERR flag
Now that there is an alternate method for returning an auth_stat
value, replace the RQ_AUTHERR flag with use of that new method.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever 438623a06b SUNRPC: Add svc_rqst::rq_auth_stat
I'd like to take commit 4532608d71 ("SUNRPC: Clean up generic
dispatcher code") even further by using only private local SVC
dispatchers for all kernel RPC services. This change would enable
the removal of the logic that switches between
svc_generic_dispatch() and a service's private dispatcher, and
simplify the invocation of the service's pc_release method
so that humans can visually verify that it is always invoked
properly.

All that will come later.

First, let's provide a better way to return authentication errors
from SVC dispatcher functions. Instead of overloading the dispatch
method's *statp argument, add a field to struct svc_rqst that can
hold an error value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Trond Myklebust c2dc3e5fad SUNRPC: Fix potential memory corruption
We really should not call rpc_wake_up_queued_task_set_status() with
xprt->snd_task as an argument unless we are certain that is actually an
rpc_task.

Fixes: 0445f92c5d ("SUNRPC: Fix disconnection races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:05 -04:00
Trond Myklebust 71d3d0ebc8 SUNRPC: Convert rpc_client refcount to use refcount_t
There are now tools in the refcount library that allow us to convert the
client shutdown code.

Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Linus Torvalds 96890bc2ea NFS client updates for Linux 5.14
Highlights include:
 
 Stable fixes:
 - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues
 
 Bugfixes
 - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()
 - SUNRPC: prevent port reuse on transports which don't request it.
 - NFSv3: Fix memory leak in posix_acl_create()
 - NFS: Various fixes to attribute revalidation timeouts
 - NFSv4: Fix handling of non-atomic change attribute updates
 - NFSv4: If a server is down, don't cause mounts to other servers to
   hang as well
 - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
 - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts
   filesystem flag
  - NFS: Ensure nfs_readpage returns promptly when an internal error occurs
  - NFS: Fix fscache read from NFS after cache error
  - pNFS: Various bugfixes around the LAYOUTGET operation
 
 Features
 - Multiple patches to add support for fcntl() leases over NFSv4.
 - A sysfs interface to display more information about the various
   transport connections used by the RPC client
 - A sysfs interface to allow a suitably privileged user to offline a
   transport that may no longer point to a valid server
 - A sysfs interface to allow a suitably privileged user to change the
   server IP address used by the RPC client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmDnPrkACgkQZwvnipYK
 APKGAw/9EjdGoic6VpShQyb5uxaRoDd4uwWgFLBOfPhIWC7qNMtkj49wHIOEUm7e
 YfdF5RlCdmshaoMxjY84wjl8NTMwHbPahgooDd4+UsZUs2qxZ8dBsr0itfbFsJv8
 BpaCYKQt6XGQngGrWfC7SiCETnMej2YsmjDfHvhD58TxnRfPWexHUvx9xi9uGRCS
 sIWRA2QMNs7LwdShkkRotagodRLhu/zo4g0lon5lI8D/SRg6o8RoO4YP6oKH1FN4
 OyVzy1aWZGocgwCMUtNeuigJSRyDa+bJTfJ2c27uw5g18s0XWZ3j2DxD5I+HCEuE
 B4rhg+ujtPIifYLHf2Aj3nlxdBePZ5L67a2MOOUo+wSD+nPmNMZF1eIT/3Jsg/HA
 Z8gqcBiTIkBfVGJxWWbrbHfxPXQiK1IRGQx9acyhLCN9M6Kv5bbkn4R4dnronvJR
 g6O968fgC5uvl60CXdc8NCpWtSitXB/nH8pn7MbJ8JBGq7QIYNkS0d4E8ePhYwxk
 sRYJt21O+ryjodfQDHaUxodzCKGcpRoknpirMmgoAp4zdkva4ltViNsQvHa7jFh8
 HIuhU6Aia1xVYpUMDEXf2WMXCT9yLa2TyMDuS5KDfb69wBkQJWeKNkebf+1k03wQ
 saEmdoP4aEEujimkA7rqyOlI8XhsudKvBd3HXg+w9+xIt4yoie0=
 =NaOI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Multiple patches to add support for fcntl() leases over NFSv4.

   - A sysfs interface to display more information about the various
     transport connections used by the RPC client

   - A sysfs interface to allow a suitably privileged user to offline a
     transport that may no longer point to a valid server

   - A sysfs interface to allow a suitably privileged user to change the
     server IP address used by the RPC client

  Stable fixes:

   - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues

  Bugfixes:

   - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()

   - SUNRPC: prevent port reuse on transports which don't request it.

   - NFSv3: Fix memory leak in posix_acl_create()

   - NFS: Various fixes to attribute revalidation timeouts

   - NFSv4: Fix handling of non-atomic change attribute updates

   - NFSv4: If a server is down, don't cause mounts to other servers to
     hang as well

   - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT

   - NFS: Fix mount failures due to incorrect setting of the
     has_sec_mnt_opts filesystem flag

   - NFS: Ensure nfs_readpage returns promptly when an internal error
     occurs

   - NFS: Fix fscache read from NFS after cache error

   - pNFS: Various bugfixes around the LAYOUTGET operation"

* tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
  NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
  NFSv4/pnfs: Clean up layout get on open
  NFSv4/pnfs: Fix layoutget behaviour after invalidation
  NFSv4/pnfs: Fix the layout barrier update
  NFS: Fix fscache read from NFS after cache error
  NFS: Ensure nfs_readpage returns promptly when internal error occurs
  sunrpc: remove an offlined xprt using sysfs
  sunrpc: provide showing transport's state info in the sysfs directory
  sunrpc: display xprt's queuelen of assigned tasks via sysfs
  sunrpc: provide multipath info in the sysfs directory
  NFSv4.1 identify and mark RPC tasks that can move between transports
  sunrpc: provide transport info in the sysfs directory
  SUNRPC: take a xprt offline using sysfs
  sunrpc: add dst_attr attributes to the sysfs xprt directory
  SUNRPC for TCP display xprt's source port in sysfs xprt_info
  SUNRPC query transport's source port
  SUNRPC display xprt's main value in sysfs's xprt_info
  SUNRPC mark the first transport
  sunrpc: add add sysfs directory per xprt under each xprt_switch
  ...
2021-07-09 09:43:57 -07:00
Olga Kornievskaia 6f081693e7 sunrpc: remove an offlined xprt using sysfs
Once a transport has been put offline, this transport can be also
removed from the list of transports. Any tasks that have been stuck
on this transport would find the next available active transport
and be re-tried. This transport would be removed from the xprt_switch
list and freed.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 85e39feead NFSv4.1 identify and mark RPC tasks that can move between transports
In preparation for when we can re-try a task on a different transport,
identify and mark such RPC tasks as moveable. Only 4.1+ operarations can
be re-tried on a different transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 5b7eb78486 SUNRPC: take a xprt offline using sysfs
Using sysfs's xprt_state attribute, mark a particular transport offline.
It will not be picked during the round-robin selection. It's not allowed
to take the main (1st created transport associated with the rpc_client)
offline. Also bring a transport back online via sysfs by writing "online"
and that would allow for this transport to be picked during the round-
robin selection.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia a8482488a7 SUNRPC query transport's source port
Provide ability to query transport's source port.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia e091853ebd SUNRPC mark the first transport
When an RPC client gets created it's first transport is special
and should be marked a main transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 587bc7255d sunrpc: add dst_attr attributes to the sysfs xprt directory
Allow to query and set the destination's address of a transport.
Setting of the destination address is allowed only for TCP or RDMA
based connections.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia d408ebe04a sunrpc: add add sysfs directory per xprt under each xprt_switch
Add individual transport directories under each transport switch
group. For instance, for each nconnect=X connections there will be
a transport directory. Naming conventions also identifies transport
type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia baea99445d sunrpc: add xprt_switch direcotry to sunrpc's sysfs
Add xprt_switch directory to the sysfs and create individual
xprt_swith subdirectories for multipath transport group.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia d3abc73987 sunrpc: keep track of the xprt_class in rpc_xprt structure
We need to keep track of the type for a given transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia 5b9268727f sunrpc: add IDs to multipath
This is used to uniquely identify sunrpc multipath objects in /sys.

Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia 572caba402 sunrpc: add xprt id
This adds a unique identifier for a sunrpc transport in sysfs, which is
similarly managed to the unique IDs of clients.

Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia c5a382ebdb sunrpc: Create per-rpc_clnt sysfs kobjects
These will eventually have files placed under them for sysfs operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Andy Shevchenko 4c52729377 kernel.h: split out kstrtox() and simple_strtox() to a separate header
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out kstrtox() and
simple_strtox() helpers.

At the same time convert users in header and lib folders to use new
header.  Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.

[andy.shevchenko@gmail.com: fix documentation references]
  Link: https://lkml.kernel.org/r/20210615220003.377901-1-andy.shevchenko@gmail.com

Link: https://lkml.kernel.org/r/20210611185815.44103-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Francis Laniel <laniel_francis@privacyrequired.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kars Mulder <kerneldev@karsmulder.nl>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:05 -07:00
Trond Myklebust e86be3a04b SUNRPC: More fixes for backlog congestion
Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.

Fixes: e877a88d1f ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26 06:36:13 -04:00
Linus Torvalds a647034fe2 NFS client updates for Linux 5.13
Highlights include:
 
 Stable fixes:
 - Add validation of the UDP retrans parameter to prevent shift out-of-bounds
 - Don't discard pNFS layout segments that are marked for return
 
 Bugfixes:
 - Fix a NULL dereference crash in xprt_complete_bc_request() when the
   NFSv4.1 server misbehaves.
 - Fix the handling of NFS READDIR cookie verifiers
 - Sundry fixes to ensure attribute revalidation works correctly when the
   server does not return post-op attributes.
 - nfs4_bitmask_adjust() must not change the server global bitmasks
 - Fix major timeout handling in the RPC code.
 - NFSv4.2 fallocate() fixes.
 - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling
 - Copy offload attribute revalidation fixes
 - Fix an incorrect filehandle size check in the pNFS flexfiles driver
 - Fix several RDMA transport setup/teardown races
 - Fix several RDMA queue wrapping issues
 - Fix a misplaced memory read barrier in sunrpc's call_decode()
 
 Features:
 - Micro optimisation of the TCP transmission queue using TCP_CORK
 - statx() performance improvements by further splitting up the tracking
   of invalid cached file metadata.
 - Support the NFSv4.2 "change_attr_type" attribute and use it to
   optimise handling of change attribute updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCVLooACgkQZwvnipYK
 APJB5BAAtIJyhx40ooMBzcucDmXd1qovlKsb8ZlvnSI6c7wvHhFPNk9z4zwThnjL
 FpVYzJzK6XzAQY/PtgbrPwnSUmW925ngPWYR/hiYe+OGPBnYV+tXP8izCyEkNgMg
 45goDOxojGWl7AGTuAJiKcDSdH9PyIrbvt28iwcNSGjslasGSbAoL/836l4OIGr1
 Ymxs/NDML11dPco8GIKLGtHd8leFGleDx089VeNsgud8MdaFErp16O5Iz8DdzRKd
 W1l2zDMb05j8eDZIfy3w3FyrLkDXA+KgLSADiC8TcpxoadPaQJMeCvoIq8oqVndn
 bZBoxduXdLgf54Aec0WnNKFAOyc7pGvZoSNmFouT7EGV73g+g1LQ+ZbEE1bb8fCQ
 XHqCVaBt2+47NiTUgdxjXlZRfcn8fYKx0tVxfG3mQVMXUAWfsjmMyQMNgijDRJI2
 8Wz3lZMRGMILbR9j4QpP1biVy/2zGNWG/TB5ZZyZMSY4uT+aOpzlqdknb4UsRaSp
 f7MfmB7xEWpS4DJr9RIBrJ/hIdnMu1mNInxDPFo5Kl5HNp4TaPm2dPir2ZD2wMZI
 daURTX7giUhpE15ZebQDBqWD+mTR0bVDqLLeo131JRmMfMEHugNrr49xe+NkBu/R
 QWnFzgkGdQsOeiKRRwEUuhsi74JspqfwzdZzHqcRM5WuXVvBLcA=
 =h01b
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Add validation of the UDP retrans parameter to prevent shift
     out-of-bounds

   - Don't discard pNFS layout segments that are marked for return

  Bugfixes:

   - Fix a NULL dereference crash in xprt_complete_bc_request() when the
     NFSv4.1 server misbehaves.

   - Fix the handling of NFS READDIR cookie verifiers

   - Sundry fixes to ensure attribute revalidation works correctly when
     the server does not return post-op attributes.

   - nfs4_bitmask_adjust() must not change the server global bitmasks

   - Fix major timeout handling in the RPC code.

   - NFSv4.2 fallocate() fixes.

   - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling

   - Copy offload attribute revalidation fixes

   - Fix an incorrect filehandle size check in the pNFS flexfiles driver

   - Fix several RDMA transport setup/teardown races

   - Fix several RDMA queue wrapping issues

   - Fix a misplaced memory read barrier in sunrpc's call_decode()

  Features:

   - Micro optimisation of the TCP transmission queue using TCP_CORK

   - statx() performance improvements by further splitting up the
     tracking of invalid cached file metadata.

   - Support the NFSv4.2 'change_attr_type' attribute and use it to
     optimise handling of change attribute updates"

* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
  xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
  sunrpc: Fix misplaced barrier in call_decode
  NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
  xprtrdma: Move fr_mr field to struct rpcrdma_mr
  xprtrdma: Move the Work Request union to struct rpcrdma_mr
  xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
  xprtrdma: Move cqe to struct rpcrdma_mr
  xprtrdma: Move fr_cid to struct rpcrdma_mr
  xprtrdma: Remove the RPC/RDMA QP event handler
  xprtrdma: Don't display r_xprt memory addresses in tracepoints
  xprtrdma: Add an rpcrdma_mr_completion_class
  xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
  xprtrdma: Avoid Send Queue wrapping
  xprtrdma: Do not wake RPC consumer on a failed LocalInv
  xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
  xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
  xprtrdma: Rename frwr_release_mr()
  xprtrdma: rpcrdma_mr_pop() already does list_del_init()
  xprtrdma: Delete rpcrdma_recv_buffer_put()
  xprtrdma: Fix cwnd update ordering
  ...
2021-05-07 11:23:41 -07:00
Trond Myklebust d737e5d418 SUNRPC: Set TCP_CORK until the transmit queue is empty
When we have multiple RPC requests queued up, it makes sense to set the
TCP_CORK option while the transmit queue is non-empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-05 09:04:20 -04:00
Chuck Lever 5533c4f4b9 svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever 9af723be86 svcrdma: Remove sc_read_complete_q
Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever 7dcfbd86ad SUNRPC: Export svc_xprt_received()
Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever 579900670a svcrdma: Remove unused sc_pages field
Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever 2a1e4f21d8 svcrdma: Normalize Send page handling
Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever e844d307d4 svcrdma: Add a "deferred close" helper
Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever c558d47596 svcrdma: Maintain a Receive water mark
Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever ded04a587f NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:55 -04:00
Chuck Lever cc9bcdad77 NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:53 -04:00
Chuck Lever bddfdbcddb NFSD: Extract the svcxdr_init_encode() helper
NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:51 -04:00
Chuck Lever bade4be69a svcrdma: Revert "svcrdma: Reduce Receive doorbell rate"
I tested commit 43042b90ca ("svcrdma: Reduce Receive doorbell
rate") with mlx4 (IB) and software iWARP and didn't find any
issues. However, I recently got my hardware iWARP setup back on
line (FastLinQ) and it's crashing hard on this commit (confirmed
via bisect).

The failure mode is complex.
 - After a connection is established, the first Receive completes
   normally.
 - But the second and third Receives have garbage in their Receive
   buffers. The server responds with ERR_VERS as a result.
 - When the client tears down the connection to retry, a couple
   of posted Receives flush twice, and that corrupts the recv_ctxt
   free list.
 - __svc_rdma_free then faults or loops infinitely while destroying
   the xprt's recv_ctxts.

Since 43042b90ca ("svcrdma: Reduce Receive doorbell rate") does
not fix a bug but is a scalability enhancement, it's safe and
appropriate to revert it while working on a replacement.

Fixes: 43042b90ca ("svcrdma: Reduce Receive doorbell rate")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-11 15:26:07 -05:00
Linus Torvalds 7c70f3a748 Optimization:
- Cork the socket while there are queued replies
 
 Fixes:
 
 - DRC shutdown ordering
 - svc_rdma_accept() lockdep splat
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAsA80ACgkQM2qzM29m
 f5erXA/+MrR3ZtwK2eaTITu13TzzTrMURbp/n0wCCW/Ls1YMb6bn9ggtBwu2W5Cn
 Vb0RO9OLcmoI6CjqPh0CTUvvZspMYOAX4W1jQecKt2ml075APdlqUcv9YWPUQqVJ
 qTg8HxDymvHvY3I3FcBxhzofmGzF8AOmQZJw9uI5Wt/ivBfqGWcAGlxyRmB3mdsm
 cJRK0Sy7QMn2LefMcpMEeSbPA049/NZNRp6fcXnpPQFer42thoosYsNhTlAJfCXC
 C5S0z3/T6rpuJucV9la/WkpUA0YhWbPEHWNdAB5tzSqmoEo4LpzJzjv7uyQU4oue
 QlmChIz9qasgTI/BnCkBIzPD99S4UQcXjX0BnNinkQ77e6+b/vdAR+T+NLHJdkAf
 +7Xz6T9aZNaz2R49CjYl6/kG0rlNkjUzyURRYs/9zEBhogMPH/N4T7Z2M+ljCkeb
 tc3OaFDXZ2rfr7EKBGsfnEKINM1gpYipzILkr8GSHUMZLzOB/64upKySaJVjCGXj
 7Sf1w+vJUWwYc+FqFvbaR4ybr01VIfdsecpn1TtY870zG1JzimzAHVZk1/xC9+CX
 J+lVOXbjawDl1Et3V3fWq6Y7mhAWves/NKPcbSug9sFc4qRHEmPbAq/RRtlsjQcn
 foMr5R8qd8OwEamVypZ2nIFxq4q3b742AS8lZhaK+DyZKq3oLac=
 =+R4U
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
 "Here are a few additional NFSD commits for the merge window:

 Optimization:
   - Cork the socket while there are queued replies

  Fixes:
   - DRC shutdown ordering
   - svc_rdma_accept() lockdep splat"

* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Further clean up svc_tcp_sendmsg()
  SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
  SUNRPC: Use TCP_CORK to optimise send performance on the server
  svcrdma: Hold private mutex while invoking rdma_accept()
  nfsd: register pernet ops last, unregister first
2021-02-22 13:29:55 -08:00
Linus Torvalds 99f1a5872b Highlights:
- Update NFSv2 and NFSv3 XDR decoding functions
 - Further improve support for re-exporting NFS mounts
 - Convert NFSD stats to per-CPU counters
 - Add batch Receive posting to the server's RPC/RDMA transport
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAYVsAACgkQM2qzM29m
 f5f1Lg/+IBC7Bhnnc8jNr4nv4IntCwwKdx2VzSzQszbN/kkhLZK89u36nZyqp0RB
 Vg3olyS5DseEisMMx0rI0KkHBz7pz+kXVdOGvve8fHBZvewnJ/FpxNZPChG4aMDc
 mfjHLvDHO0/GoUqSftrBrjSEJ2jHoNdDcmvzgdAlugTuLOjGX3HhmKa3ZYVTNgFn
 kDmFMaEHjS3pb3LqNDHNIYYpNnvtIukxHUh9weDvr+AH8Rmt/WVfjDc26xBS0FQu
 jDJUk9AP06VYgZx0dLKp4In8GJYwz9DNjNrWm91+RyJml9AWrFswdBHHcfi0W/Yy
 GipkBZGYE6ZblyMlITZCB4etyHQsq7qLuqicTlcXjL/Fdkd7xlT8DwFlZ8LjpyCU
 LeHTI2cGzRSJ/JjL2hvhPvT3gR5hln/qk17jSP7V4S6psZAqAEvw/Xa/+MDJhB/b
 vnzltFPvEgZc59Q/SJLbaWZLHy1q0enbrOBLMZDmUlk911/tgAuflHJM60N8o732
 vkfy05pvZlrV0cFY546pQd7zTKZcAOYPVHHoP25wPa2ibKBu6eQ6kZEi5zu+tVK3
 CkvqIhePFspBMQ6GOPKixTiFV4KFoO1HBtk+JEeMkiHXHk1xATCWbg1m7wkaagsq
 NNS/qFkLRnftGYpFViBaxTFBGxiBOSbsTIS/zfj5L7JOpW4FRD4=
 =02xw
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

 - Update NFSv2 and NFSv3 XDR decoding functions

 - Further improve support for re-exporting NFS mounts

 - Convert NFSD stats to per-CPU counters

 - Add batch Receive posting to the server's RPC/RDMA transport

* tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (65 commits)
  nfsd: skip some unnecessary stats in the v4 case
  nfs: use change attribute for NFS re-exports
  NFSv4_2: SSC helper should use its own config.
  nfsd: cstate->session->se_client -> cstate->clp
  nfsd: simplify nfsd4_check_open_reclaim
  nfsd: remove unused set_client argument
  nfsd: find_cpntf_state cleanup
  nfsd: refactor set_client
  nfsd: rename lookup_clientid->set_client
  nfsd: simplify nfsd_renew
  nfsd: simplify process_lock
  nfsd4: simplify process_lookup1
  SUNRPC: Correct a comment
  svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()
  svcrdma: Reduce Receive doorbell rate
  svcrdma: Deprecate stat variables that are no longer used
  svcrdma: Restore read and write stats
  svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
  svcrdma: Convert rdma_stat_recv to a per-CPU counter
  svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up()
  ...
2021-02-21 10:22:20 -08:00
Trond Myklebust e0a912e8dd SUNRPC: Use TCP_CORK to optimise send performance on the server
Use a counter to keep track of how many requests are queued behind the
xprt->xpt_mutex, and keep TCP_CORK set until the queue is empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/20210213202532.23146-1-trondmy@kernel.org/T/#u
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-02-16 12:32:31 -05:00
Dave Wysochanski ba6dfce47c SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c.  Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-25 15:59:12 -05:00
Chuck Lever 43042b90ca svcrdma: Reduce Receive doorbell rate
This is similar to commit e340c2d6ef ("xprtrdma: Reduce the
doorbell rate (Receive)") which added Receive batching to the
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever c6226ff9a6 svcrdma: Deprecate stat variables that are no longer used
Clean up. We are not permitted to remove old proc files. Instead,
convert these variables to stubs that are only ever allowed to
display a value of zero.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 1e7e557316 svcrdma: Restore read and write stats
Now that we have an efficient mechanism to update these two stats,
let's start maintaining them again.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 22df5a2246 svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
Avoid the overhead of a memory bus lock cycle for counting a value
that is hardly every used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever df971cd853 svcrdma: Convert rdma_stat_recv to a per-CPU counter
Receives are frequent events. Avoid the overhead of a memory bus
lock cycle for counting a value that is hardly every used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 81d2174743 SUNRPC: Move definition of XDR_UNIT
Clean up: The unit of XDR alignment is defined by RFC 4506,
not as part of the RPC message header. Thus it belongs in
include/linux/sunrpc/xdr.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Chuck Lever 2289e87b59 SUNRPC: Make trace_svc_process() display the RPC procedure symbolically
The next few patches will employ these strings to help make server-
side trace logs more human-readable. A similar technique is already
in use in kernel RPC client code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Linus Torvalds 74f602dc96 NFS client updates for Linux 5.11
Highlights include:
 
 Features:
 - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
   support.
 - A series of patches to improve readdir performance, particularly with
   large directories.
 - Basic support for using NFS/RDMA with the pNFS files and flexfiles
   drivers.
 - Micro-optimisations for RDMA.
 - RDMA tracing improvements.
 
 Bugfixes:
 - Fix a long standing bug with xs_read_xdr_buf() when receiving partial
   pages (Dan Aloni).
 - Various fixes for getxattr and listxattr, when used over non-TCP
   transports.
 - Fixes for containerised NFS from Sargun Dhillon.
 - switch nfsiod to be an UNBOUND workqueue (Neil Brown).
 - READDIR should not ask for security label information if there is no
   LSM policy. (Olga Kornievskaia)
 - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay).
 - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code.
 - A couple of fixes for pnfs/flexfiles read failover
 
 Cleanups:
 - Various cleanups for the SUNRPC xdr code in conjunction with the
   READ_PLUS fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl/aiaIACgkQZwvnipYK
 APIOihAAvONscxrFSaGRh2ICNv9I/zXW/A5+R3qnkESPVLTqTPJVphoN7FlINAr1
 B74pg6n4T4viycbvsogU2+kHrlJZO7B8lTkJL7ynm9Wgyw8+2Ga4QEn1bsAoqmuY
 b91p/+LfOLKrYeeojoH31PC73uOYYG1WHXJhjq0l9b5CTgThWpj6O3gDaFEbFvmz
 A7V3yqSp04sV70YxUhwelBHZ5BXdiXIKsPnIwvXXHuY7IcamrE4EA3wGCwtxkBnu
 4dwbOtRXURNSev0r3n6FsH4wZl+/nvp9UpnGdPtVv94F1zm2JKLwkhoJejS/vpjq
 eyKc7ZXBQ0uHbTWI2Yj1YjA61VIUO0R0EDuyTAnRKDeaarID42n5kMG7J8cIglZR
 jQfyx99xm0eSrdwxC09tcRL/lBzYcOfc6pJo5P9BtaFtRvbp9iFIHuFKlrXbULd4
 WrZzDMhiKVYGSTcTpfQyVoK2rCvn6W1Ida4iYeI0gkJ1v9X90UhbtJOyggn/bxyL
 DV/Qy40+l48n7CZfPU2eDv4WXqjKGRibpDoWMBLwUH20dDEX6kKYv3BfApFYGqyO
 /GTPAFUZarCy8BENvzZv/Jb9mt5pDQM5p9ZXpdUOhydLMMA+pauaT/Gr+pAHPIPx
 MPj546Gh2cEaT883xvRrJmQTG0nw/WscPNcHaJcgL5oYltmuwck=
 =IKWG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
     support

   - A series of patches to improve readdir performance, particularly
     with large directories

   - Basic support for using NFS/RDMA with the pNFS files and flexfiles
     drivers

   - Micro-optimisations for RDMA

   - RDMA tracing improvements

  Bugfixes:

   - Fix a long standing bug with xs_read_xdr_buf() when receiving
     partial pages (Dan Aloni)

   - Various fixes for getxattr and listxattr, when used over non-TCP
     transports

   - Fixes for containerised NFS from Sargun Dhillon

   - switch nfsiod to be an UNBOUND workqueue (Neil Brown)

   - READDIR should not ask for security label information if there is
     no LSM policy (Olga Kornievskaia)

   - Avoid using interval-based rebinding with TCP in lockd (Calum
     Mackay)

   - A series of RPC and NFS layer fixes to support the NFSv4.2
     READ_PLUS code

   - A couple of fixes for pnfs/flexfiles read failover

  Cleanups:

   - Various cleanups for the SUNRPC xdr code in conjunction with the
     READ_PLUS fixes"

* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
  NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
  pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
  NFSv4/pnfs: Add tracing for the deviceid cache
  fs/lockd: convert comma to semicolon
  NFSv4.2: fix error return on memory allocation failure
  NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
  NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
  NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
  NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
  NFSv4.2: decode_read_plus_hole() needs to check the extent offset
  NFSv4.2: decode_read_plus_data() must skip padding after data segment
  NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
  SUNRPC: When expanding the buffer, we may need grow the sparse pages
  SUNRPC: Cleanup - constify a number of xdr_buf helpers
  SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
  SUNRPC: _copy_to/from_pages() now check for zero length
  SUNRPC: Cleanup xdr_shrink_bufhead()
  SUNRPC: Fix xdr_expand_hole()
  SUNRPC: Fixes for xdr_align_data()
  SUNRPC: _shift_data_left/right_pages should check the shift length
  ...
2020-12-17 12:15:03 -08:00
Trond Myklebust f8d0e60f10 SUNRPC: Cleanup - constify a number of xdr_buf helpers
There are a number of xdr helpers for struct xdr_buf that do not change
the structure itself. Mark those as taking const pointers for
documentation purposes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust c4f2f591f0 SUNRPC: Fix xdr_expand_hole()
We do want to try to grow the buffer if possible, but if that attempt
fails, we still want to move the data and truncate the XDR message.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust 9a20f6f4e6 SUNRPC: Fixes for xdr_align_data()
The main use case right now for xdr_align_data() is to shift the page
data to the left, and in practice shrink the total XDR data buffer.
This patch ensures that we fix up the accounting for the buffer length
as we shift that data around.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust c87b056e58 SUNRPC: Remove unused function xprt_load_transport()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:53 -05:00
Trond Myklebust 1fc5f13186 SUNRPC: Add a helper to return the transport identifier given a netid
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:52 -05:00
Trond Myklebust d5aa6b22e2 SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
According to RFC5666, the correct netid for an IPv6 addressed RDMA
transport is "rdma6", which we've supported as a mount option since
Linux-4.7. The problem is when we try to load the module "xprtrdma6",
that will fail, since there is no modulealias of that name.

Fixes: 181342c5eb ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:52 -05:00
Chuck Lever 8918cc0d2b NFSD: Add helper for decoding locker4
Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:38 -05:00
Chuck Lever cbd9abb370 NFSD: Replace READ* macros in nfsd4_decode_commit()
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:36 -05:00
Chuck Lever c1346a1216 NFSD: Replace the internals of the READ_BUF() macro
Convert the READ_BUF macro in nfs4xdr.c from open code to instead
use the new xdr_stream-style decoders already in use by the encode
side (and by the in-kernel NFS client implementation). Once this
conversion is done, each individual NFSv4 argument decoder can be
independently cleaned up to replace these macros with C code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:36 -05:00
Chuck Lever 5191955d6f SUNRPC: Prepare for xdr_stream-style decoding on the server-side
A "permanent" struct xdr_stream is allocated in struct svc_rqst so
that it is usable by all server-side decoders. A per-rqst scratch
buffer is also allocated to handle decoding XDR data items that
cross page boundaries.

To demonstrate how it will be used, add the first call site for the
new svcxdr_init_decode() API.

As an additional part of the overall conversion, add symbolic
constants for successful and failed XDR operations. Returning "0" is
overloaded. Sometimes it means something failed, but sometimes it
means success. To make it more clear when XDR decoding functions
succeed or fail, introduce symbolic constants.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:35 -05:00
Chuck Lever 0ae4c3e8a6 SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()
Clean up: De-duplicate some frequently-used code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:35 -05:00
Chuck Lever d96962e6d0 svcrdma: Use the new parsed chunk list when pulling Read chunks
As a pre-requisite for handling multiple Read chunks in each Read
list, convert svc_rdma_recv_read_chunk() to use the new parsed Read
chunk list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever 7954c8503b svcrdma: Remove chunk list pointers
Clean up: These pointers are no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever 41bc163ffe svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunk
Refactor svc_rdma_send_reply_chunk() so that it Sends only the parts
of rq_res that do not contain a result payload.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever 2371bcc056 svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg()
Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only
the parts of rq_res that do not contain a result payload.

This change has been tested to confirm that it does not cause a
regression in the no Write chunk and single Write chunk cases.
Multiple Write chunks have not been tested.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever 9d0b09d5ef svcrdma: Support multiple write chunks when pulling up
When counting the number of SGEs needed to construct a Send request,
do not count result payloads. And, when copying the Reply message
into the pull-up buffer, result payloads are not to be copied to the
Send buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever 7a1cbfa180 svcrdma: Use parsed chunk lists to construct RDMA Writes
Refactor: Instead of re-parsing the ingress RPC Call transport
header when constructing RDMA Writes, use the new parsed chunk lists
for the Write list and Reply chunk, which are version-agnostic and
already XDR-decoded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever 58b2e0fefa svcrdma: Use parsed chunk lists to detect reverse direction replies
Refactor: Don't duplicate header decoding smarts here. Instead, use
the new parsed chunk lists.

Note that the XID sanity test is also removed. The XID is already
looked up by the cb handler, and is rejected if it's not recognized.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever 78147ca8b4 svcrdma: Add a "parsed chunk list" data structure
This simple data structure binds the location of each data payload
inside of an RPC message to the chunk that will be used to push it
to or pull it from the client.

There are several benefits to this small additional overhead:

 * It enables support for more than one chunk in incoming Read and
   Write lists.

 * It translates the version-specific on-the-wire format into a
   generic in-memory structure, enabling support for multiple
   versions of the RPC/RDMA transport protocol.

 * It enables the server to re-organize a chunk list if it needs to
   adjust where Read chunk data lands in server memory without
   altering the contents of the XDR-encoded Receive buffer.

Construction of these lists is done while sanity checking each
incoming RPC/RDMA header. Subsequent patches will make use of the
generated data structures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever f6ad77590a svcrdma: Post RDMA Writes while XDR encoding replies
The only RPC/RDMA ordering requirement between RDMA Writes and RDMA
Sends is that the responder must post the Writes on the Send queue
before posting the Send that conveys the RPC Reply for that Write
payload.

The Linux NFS server implementation now has a transport method that
can post result Payload Writes earlier than svc_rdma_sendto:

   ->xpo_result_payload()

This gets RDMA Writes going earlier so they are more likely to be
complete at the remote end before the Send completes.

Some care must be taken with pulled-up Replies. We don't want to
push the Write chunk and then send the same payload data via Send.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever 03493bca08 SUNRPC: Rename svc_encode_read_payload()
Clean up: "result payload" is a less confusing name for these
payloads. "READ payload" reflects only the NFS usage.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:21 -05:00
Chuck Lever 5a7e702670 SUNRPC: Adjust synopsis of xdr_buf_subsegment()
Clean up: This enables xdr_buf_subsegment()'s callers to pass in a
const pointer to that buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:21 -05:00
Linus Torvalds 24717cfbbb The one new feature this time, from Anna Schumaker, is READ_PLUS, which
has the same arguments as READ but allows the server to return an array
 of data and hole extents.
 
 Otherwise it's a lot of cleanup and bugfixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl+Q5vsVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUAP/RlALnXbaoWi8YCcEcc9U1LoQKbD
 CJpDR+FqCOyGwRuzWung/5pvkOO50fGEeAroos+2rF/NgRkQq8EFr9AuBhNOYUFE
 IZhWEOfu/r2ukXyBmcu21HGcWLwPnyJehvjuzTQW2wOHlBi/sdoL5Ap1sVlwVLj5
 EZ5kqJLD+ioG2sufW99Spi55l1Cy+3Y0IhLSWl4ZAE6s8hmFSYAJZFsOeI0Afx57
 USPTDRaeqjyEULkb+f8IhD0eRApOUo4evDn9dwQx+of7HPa1CiygctTKYwA3hnlc
 gXp2KpVA1REaiYVgOPwYlnqBmJ2K9X0wCRzcWy2razqEcVAX/2j7QCe9M2mn4DC8
 xZ2q4SxgXu9yf0qfUSVnDxWmP6ipqq7OmsG0JXTFseGKBdpjJY1qHhyqanVAGvEg
 I+xHnnWfGwNCftwyA3mt3RfSFPsbLlSBIMZxvN4kn8aVlqszGITOQvTdQcLYA6kT
 xWllBf4XKVXMqF0PzerxPDmfzBfhx6b1VPWOIVcu7VLBg3IXoEB2G5xG8MUJiSch
 OUTCt41LUQkerQlnzaZYqwmFdSBfXJefmcE/x/vps4VtQ/fPHX1jQyD7iTu3HfSP
 bRlkKHvNVeTodlBDe/HTPiTA99MShhBJyvtV5wfzIqwjc1cNreed+ePppxn8mxJi
 SmQ2uZk/MpUl7/V0
 =rcOj
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "The one new feature this time, from Anna Schumaker, is READ_PLUS,
  which has the same arguments as READ but allows the server to return
  an array of data and hole extents.

  Otherwise it's a lot of cleanup and bugfixes"

* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
  NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
  SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
  sunrpc: raise kernel RPC channel buffer size
  svcrdma: fix bounce buffers for unaligned offsets and multiple pages
  nfsd: remove unneeded break
  net/sunrpc: Fix return value for sysctl sunrpc.transports
  NFSD: Encode a full READ_PLUS reply
  NFSD: Return both a hole and a data segment
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Add READ_PLUS data support
  NFSD: Hoist status code encoding into XDR encoder functions
  NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
  NFSD: Remove the RETURN_STATUS() macro
  NFSD: Call NFSv2 encoders on error returns
  NFSD: Fix .pc_release method for NFSv2
  NFSD: Remove vestigial typedefs
  NFSD: Refactor nfsd_dispatch() error paths
  NFSD: Clean up nfsd_dispatch() variables
  NFSD: Clean up stale comments in nfsd_dispatch()
  NFSD: Clean up switch statement in nfsd_dispatch()
  ...
2020-10-22 09:44:27 -07:00
Linus Torvalds 59f0e7eb2f NFS Client Updates for Linux 5.10
- Stable Fixes:
   - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+
   - Fix nfs_path in case of a rename retry
   - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag
 
 - New features and improvements:
   - Replace dprintk() calls with tracepoints
   - Make cache consistency bitmap dynamic
   - Added support for the NFS v4.2 READ_PLUS operation
   - Improvements to net namespace uniquifier
 
 - Other bugfixes and cleanups
   - Remove redundant clnt pointer
   - Don't update timeout values on connection resets
   - Remove redundant tracepoints
   - Various cleanups to comments
   - Fix oops when trying to use copy_file_range with v4.0 source server
   - Improvements to flexfiles mirrors
   - Add missing "local_lock=posix" mount option
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl+PJv0ACgkQ18tUv7Cl
 QOtB+hAAza4BwQSAZs0QbszPcoNV0f0qND99hWcJ2ad5KCr8S5eNYaMqB8G8lbS+
 Ahuv6xRy69l2vjHvINXoSaSiXClYNAlAr5myiW0DqMLDpVV8VEMAhqgFJK97VpqQ
 AsbsdFwC4xAcQ+6nJ+IfK9f8AOYhvARkOP3171qkq0jX3Eq4j1IiQARn+JrGFZFL
 waC4UtVhCnx5z5pg7jyu7Ar4N0Xky72+Fm1EvRog4gndX2JbNNSkwavD33mWZ8iN
 3q6DRq/AtEk9F4ttPwVeALvpNCBqjoGcNw5FCBil7x4V+xIbDI2FBhusKas3GzXm
 W2tyUuJMTGpA6ZjkyYDcg0jC8vKUiUpMWgT3oZoQ/6g7P4vPzB/XB/RQcAInMRjS
 /MhWZ3YNQmApnVCIZSwOa9+oNUc69dM0+vqmesLvvb/aNjzRnGwiJNiuWGyrwoGX
 3SzpPUcJixa3+eKLi9uEHojKB+SuPIrB/HW4UYh/ZpOMyilVqUnX8RYbaK3qQ/gK
 Un++LjTmao9cXyjIDCiHJB630W01YSWXq1nwuSYkLuXCuALmmtgA4z8pm+4K+w+G
 SctETKVcD0qi3Yx5mFdU5A1dHXMB7nzYNaiRWYLvLO5qAR33nUPI4lV8rog2TUPe
 2iW4n0j2YxnTlwR3QMPYJRndP7KeR5XvOoNuh3XpKRVbmbc7/A4=
 =6RM5
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable Fixes:
   - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+
   - Fix nfs_path in case of a rename retry
   - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag

  New features and improvements:
   - Replace dprintk() calls with tracepoints
   - Make cache consistency bitmap dynamic
   - Added support for the NFS v4.2 READ_PLUS operation
   - Improvements to net namespace uniquifier

  Other bugfixes and cleanups:
   - Remove redundant clnt pointer
   - Don't update timeout values on connection resets
   - Remove redundant tracepoints
   - Various cleanups to comments
   - Fix oops when trying to use copy_file_range with v4.0 source server
   - Improvements to flexfiles mirrors
   - Add missing 'local_lock=posix' mount option"

* tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits)
  NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag
  NFSv4: Fix up RCU annotations for struct nfs_netns_client
  NFS: Only reference user namespace from nfs4idmap struct instead of cred
  nfs: add missing "posix" local_lock constant table definition
  NFSv4: Use the net namespace uniquifier if it is set
  NFSv4: Clean up initialisation of uniquified client id strings
  NFS: Decode a full READ_PLUS reply
  SUNRPC: Add an xdr_align_data() function
  NFS: Add READ_PLUS hole segment decoding
  SUNRPC: Add the ability to expand holes in data pages
  SUNRPC: Split out _shift_data_right_tail()
  SUNRPC: Split out xdr_realign_pages() from xdr_align_pages()
  NFS: Add READ_PLUS data segment support
  NFS: Use xdr_page_pos() in NFSv4 decode_getacl()
  SUNRPC: Implement a xdr_page_pos() function
  SUNRPC: Split out a function for setting current page
  NFS: fix nfs_path in case of a rename retry
  fs: nfs: return per memcg count for xattr shrinkers
  NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE
  nfs: remove incorrect fallthrough label
  ...
2020-10-20 13:26:30 -07:00
Anna Schumaker e6ac0accb2 SUNRPC: Add an xdr_align_data() function
For now, this function simply aligns the data at the beginning of the
pages. This can eventually be expanded to shift data to the correct
offsets when we're ready.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:40 -04:00
Anna Schumaker 84ce182ab8 SUNRPC: Add the ability to expand holes in data pages
This patch adds the ability to "read a hole" into a set of XDR data
pages by taking the following steps:

1) Shift all data after the current xdr->p to the right, possibly into
   the tail,
2) Zero the specified range, and
3) Update xdr->p to point beyond the hole.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker cf1f08cac3 SUNRPC: Implement a xdr_page_pos() function
I'll need this for READ_PLUS to help figure out the offset where page
data is stored at, but it might also be useful for other things.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker 403217f304 SUNRPC/NFSD: Implement xdr_reserve_space_vec()
Reserving space for a large READ payload requires special handling when
reserving space in the xdr buffer pages. One problem we can have is use
of the scratch buffer, which is used to get a pointer to a contiguous
region of data up to PAGE_SIZE. When using the scratch buffer, calls to
xdr_commit_encode() shift the data to it's proper alignment in the xdr
buffer. If we've reserved several pages in a vector, then this could
potentially invalidate earlier pointers and result in incorrect READ
data being sent to the client.

I get around this by looking at the amount of space left in the current
page, and never reserve more than that for each entry in the read
vector. This lets us place data directly where it needs to go in the
buffer pages.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-25 18:01:27 -04:00
Alexander A. Klimov 0bdd4cea12 Replace HTTP links with HTTPS ones: NFS, SUNRPC, and LOCKD clients
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:10 -04:00
Randy Dunlap 1138ce1cf6 sunrpc: fix duplicated word in <linux/sunrpc/cache.h>
Change "time time" to "time expiry_time" to match the field name.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:10 -04:00
Ard Biesheuvel e33d2a7b30 SUNRPC: remove RC4-HMAC-MD5 support from KerberosV
The RC4-HMAC-MD5 KerberosV algorithm is based on RFC 4757 [0], which
was specifically issued for interoperability with Windows 2000, but was
never intended to receive the same level of support. The RFC says

  The IETF Kerberos community supports publishing this specification as
  an informational document in order to describe this widely
  implemented technology.  However, while these encryption types
  provide the operations necessary to implement the base Kerberos
  specification [RFC4120], they do not provide all the required
  operations in the Kerberos cryptography framework [RFC3961].  As a
  result, it is not generally possible to implement potential
  extensions to Kerberos using these encryption types.  The Kerberos
  encryption type negotiation mechanism [RFC4537] provides one approach
  for using such extensions even when a Kerberos infrastructure uses
  long-term RC4 keys.  Because this specification does not implement
  operations required by RFC 3961 and because of security concerns with
  the use of RC4 and MD4 discussed in Section 8, this specification is
  not appropriate for publication on the standards track.

  The RC4-HMAC encryption types are used to ease upgrade of existing
  Windows NT environments, provide strong cryptography (128-bit key
  lengths), and provide exportable (meet United States government
  export restriction requirements) encryption.  This document describes
  the implementation of those encryption types.

Furthermore, this RFC was re-classified as 'historic' by RFC 8429 [1] in
2018, stating that 'none of the encryption types it specifies should be
used'

Note that other outdated algorithms are left in place (some of which are
guarded by CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES), so this should only
adversely affect interoperability with Windows NT/2000 systems that have
not received any updates since 2008 (but are connected to a network
nonetheless)

[0] https://tools.ietf.org/html/rfc4757
[1] https://tools.ietf.org/html/rfc8429

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11 14:39:15 +10:00
Linus Torvalds 37711e5e23 NFS client updates for Linux 5.9
Highlights include:
 
 Stable fixes:
 - pNFS: Don't return layout segments that are being used for I/O
 - pNFS: Don't move layout segments off the active list when being used for I/O
 
 Features:
 - NFS: Add support for user xattrs through the NFSv4.2 protocol
 - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
 - NFSv4.0 allow nconnect for v4.0
 
 Bugfixes and cleanups:
 - nfs: ensure correct writeback errors are returned on close()
 - nfs: nfs_file_write() should check for writeback errors
 - nfs: Fix getxattr kernel panic and memory overflow
 - NFS: Fix the pNFS/flexfiles mirrored read failover code
 - SUNRPC: dont update timeout value on connection reset
 - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
 - sunrpc: destroy rpc_inode_cachep after unregister_filesystem
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl83CYEACgkQZwvnipYK
 APLx4g/9EQNTG5VkUToElRHs4b3vFxbmh9Odnk/JwPHxY5GQ8/AyGqwWHBgMZc7/
 2AV/C83pk7pJsDNsKbVAaFCT1cwjmItHM63vKJYGBYbE8LhkZ/1sYkdtPBYwHoVl
 7CWfpVY/4NjYw5GJfrVA5Y0m7lrQInRtIMzfaENw2tpw+/cKUpadxgEJltzFNvpa
 Ploinr1ZRBl1tvfeHNRP5ZMPk2AfgGWtQKQ/b2UWUk5tXALoQm2Eu+/oku39uqhy
 hW5tCbU2BzR91gg5JwF9n7VowkCHXfe7lMzDBTVfwZOELOmoyys/1wKv550FWcWi
 yymljWiPGZOnXGT1vfKptPESQjdtElMfanvEZ0BzS+yNR0HZGnIupaxGlYlG9ZGU
 2sXHQPp3mk2Q+L1IgbTSCnSju0YlZo32JQpYCZiROjIXnPWPQ50YNhr8GL18M1FW
 hTeShg2avWH+59GB6moEBmsuvui7Dy1jkimblToLEoGJ4kbvEl72FYSqTCkAXXbB
 rVUzhmJFgfk/EOS4d+QKJoBqNzw3aw79wyT7PLkoCYBqPZBHQexlmI+ktMbgUEdw
 c/fM7l5/Vcb9weIHKzul2Jbk5q6bFME/xPnIkr3v/oKIFlLFwQ04BX6R/42AMJHw
 V5Q9Wp81Vy6RXQMn8P0ZMeY0WQC/rhpijOEVMUC+Ni+spz44AdM=
 =k4pE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - pNFS: Don't return layout segments that are being used for I/O
   - pNFS: Don't move layout segments off the active list when being used for I/O

  Features:
   - NFS: Add support for user xattrs through the NFSv4.2 protocol
   - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
   - NFSv4.0 allow nconnect for v4.0

  Bugfixes and cleanups:
   - nfs: ensure correct writeback errors are returned on close()
   - nfs: nfs_file_write() should check for writeback errors
   - nfs: Fix getxattr kernel panic and memory overflow
   - NFS: Fix the pNFS/flexfiles mirrored read failover code
   - SUNRPC: dont update timeout value on connection reset
   - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
   - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"

* tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
  NFS: Fix flexfiles read failover
  fs: nfs: delete repeated words in comments
  rpc_pipefs: convert comma to semicolon
  nfs: Fix getxattr kernel panic and memory overflow
  NFS: Don't return layout segments that are in use
  NFS: Don't move layouts to plh_return_segs list while in use
  NFS: Add layout segment info to pnfs read/write/commit tracepoints
  NFS: Add tracepoints for layouterror and layoutstats.
  NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
  SUNRPC dont update timeout value on connection reset
  nfs: nfs_file_write() should check for writeback errors
  nfs: ensure correct writeback errors are returned on close()
  NFSv4.2: xattr cache: get rid of cache discard work queue
  NFS: remove redundant initialization of variable result
  NFSv4.0 allow nconnect for v4.0
  freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
  sunrpc: destroy rpc_inode_cachep after unregister_filesystem
  NFSv4.2: add client side xattr caching.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add the extended attribute proc functions.
  ...
2020-08-15 08:26:55 -07:00
Olga Kornievskaia 7de62bc09f SUNRPC dont update timeout value on connection reset
Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. There is no distinction between whether
or not the previous timer had expired before the re-sent happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RST-s the connection (prior to the timeout) (eg., connection
is immediately reset)
3. Client re-sends a v3 operation but the timeout is now 120sec.

As a result, an application sees 2mins pause before a retry in case
server again does not reply.

Instead, this patch proposes to keep track off when the minor timeout
should happen and if it didn't, then don't update the new timeout.
Value is updated based on the previous value to make timeouts
predictable.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-04 23:17:11 -04:00
Chuck Lever 17f70f8dd5 svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
First, refactor: Dereference the svc_rdma_send_ctxt inside
svc_rdma_send() instead of at every call site.

Then, it can be passed into trace_svcrdma_post_send() to get the
proper completion ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever 3ac56c2fb1 svcrdma: Introduce Send completion IDs
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used
to match an incoming Send completion to a transport and to a
previous ib_post_send().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever 9b3bcf8c5c svcrdma: Introduce Receive completion IDs
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used
to match an incoming Receive completion to a transport and to a
previous ib_post_recv().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever f7bd657b55 svcrdma: Introduce infrastructure to support completion IDs
The goal is to replace CQE kernel memory addresses in completion-
related tracepoints.

Each completion ID matches an incoming Send or Receive completion
to a Completion Queue and to a previous ib_post_*(). The ID can
then be displayed in an error message or recorded in a trace
record.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever 379c3bc6b4 svcrdma: Add common XDR encoders for RDMA and Read segments
Clean up: De-duplicate some code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever f60a08697d svcrdma: Add common XDR decoders for RDMA and Read segments
Clean up: De-duplicate some code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever 07e9a6325a SUNRPC: Add helpers for decoding list discriminators symbolically
Use these helpers in a few spots to demonstrate their use.

The remaining open-coded discriminator checks in rpcrdma will be
addressed in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever 0b8dc1b699 svcrdma: Remove declarations for functions long removed
Pavane pour une infante défunte.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever c65b326b1e svcrdma: Make svc_rdma_send_error_msg() a global function
Prepare for svc_rdma_send_error_msg() to be invoked from another
source file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Linus Torvalds a539568299 NFS Client Updates for Linux 5.8
New features and improvements:
 - Sunrpc receive buffer sizes only change when establishing a GSS credentials
 - Add more sunrpc tracepoints
 - Improve on tracepoints to capture internal NFS I/O errors
 
 Other bugfixes and cleanups:
 - Move a dprintk() to after a call to nfs_alloc_fattr()
 - Fix off-by-one issues in rpc_ntop6
 - Fix a few coccicheck warnings
 - Use the correct SPDX license identifiers
 - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
 - Replace zero-length array with flexible array
 - Remove duplicate headers
 - Set invalid blocks after NFSv4 writes to update space_used attribute
 - Fix direct WRITE throughput regression
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl7ibyIACgkQ18tUv7Cl
 QOsOHBAA1A1stYld0gOhKZtMqxRJi3fnJ5mgroLGtyVQe8uAjpD8Ib1oRleC4MJq
 ifpYPozIhMZQCvDiGTAKJ8629OYiXGrN8D5nV6Y2tEGpu5wYv98MyZlU9Y8rVzCP
 5vsIMUp5XH8y2wYO8k7fDPPxWNH9Ax89wz5OI16mZxgY/LDm4ojZq+pGbYnWZa4w
 oK6Efa66z7yQkPV8oIWuvLe1zZYWGAPibBEwJbrvUWyfygB3owI36sc6nuiEQM+4
 hD3h5UtVn8BnudUqvLLa21rnQROMFpgYf4Q/2A1UaNfyRAPoPXMztECBSEYXO0L4
 saiMc5o/yTTBCC0ZjV1F+xuGQzMgSQ83KOdbr+a+upvBeFpBynJxccdvMTDEam+q
 rl7Ypdc42CsTZ1aVWG/AoIk6GENzR0tXqNR6BcDjYG/yRWvnt/RIZlp6G67IbtRH
 b9we+3MbI/lTBoCFGahkkBYO3elTNwilxH3pWcRi8ehNn0GPjlLqHePR17Tmq1tL
 QycDlm7QB1m5xNsOOLaBoB4SyguPV0SBprZJ4yYU1B3KC3bGurZVK3+TSLXQrO9V
 12RLDt4AOGr0TlctBIhNbkGp8xHY6Dg7HgbdjdrVq8Y9YCfg0C37789BnZA5nVxF
 4L101lsTI0puymh+MwmhiyOvCldn30f+MjuWJSm17Id+eRIxYj4=
 =a84h
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New features and improvements:
   - Sunrpc receive buffer sizes only change when establishing a GSS credentials
   - Add more sunrpc tracepoints
   - Improve on tracepoints to capture internal NFS I/O errors

  Other bugfixes and cleanups:
   - Move a dprintk() to after a call to nfs_alloc_fattr()
   - Fix off-by-one issues in rpc_ntop6
   - Fix a few coccicheck warnings
   - Use the correct SPDX license identifiers
   - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
   - Replace zero-length array with flexible array
   - Remove duplicate headers
   - Set invalid blocks after NFSv4 writes to update space_used attribute
   - Fix direct WRITE throughput regression"

* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
  NFS: Fix direct WRITE throughput regression
  SUNRPC: rpc_xprt lifetime events should record xprt->state
  xprtrdma: Make xprt_rdma_slot_table_entries static
  nfs: set invalid blocks after NFSv4 writes
  NFS: remove redundant initialization of variable result
  sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
  NFS: Add a tracepoint in nfs_set_pgio_error()
  NFS: Trace short NFS READs
  NFS: nfs_xdr_status should record the procedure name
  SUNRPC: Set SOFTCONN when destroying GSS contexts
  SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
  SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
  SUNRPC: trace RPC client lifetime events
  SUNRPC: Trace transport lifetime events
  SUNRPC: Split the xdr_buf event class
  SUNRPC: Add tracepoint to rpc_call_rpcerror()
  SUNRPC: Update the RPC_SHOW_SOCKET() macro
  SUNRPC: Update the rpc_show_task_flags() macro
  SUNRPC: Trace GSS context lifetimes
  SUNRPC: receive buffer size estimation values almost never change
  ...
2020-06-11 12:22:41 -07:00
Chuck Lever 53bc19f17f SUNRPC: receive buffer size estimation values almost never change
Avoid unnecessary cache sloshing by placing the buffer size
estimation update logic behind an atomic bit flag.

The size of GSS information included in each wrapped Reply does
not change during the lifetime of a GSS context. Therefore, the
au_rslack and au_ralign fields need to be updated only once after
establishing a fresh GSS credential.

Thus a slack size update must occur after a cred is created,
duplicated, renewed, or expires. I'm not sure I have this exactly
right. A trace point is introduced to track updates to these
variables to enable troubleshooting the problem if I missed a spot.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:47 -04:00
NeilBrown 24c5efe41c sunrpc: clean up properly in gss_mech_unregister()
gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each
flavour, but gss_mech_unregister() does not call auth_domain_put().
This is unbalanced and makes it impossible to reload the module.

Change svcauth_gss_register_pseudoflavor() to return the registered
auth_domain, and save it for later release.

Cc: stable@vger.kernel.org (v2.6.12+)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-05-28 18:15:00 -04:00
J. Bruce Fields 6670ee2ef2 Merge branch 'nfsd-5.8' of git://linux-nfs.org/~cel/cel-2.6 into for-5.8-incoming
Highlights of this series:
* Remove serialization of sending RPC/RDMA Replies
* Convert the TCP socket send path to use xdr_buf::bvecs (pre-requisite for
RPC-on-TLS)
* Fix svcrdma backchannel sendto return code
* Convert a number of dprintk call sites to use tracepoints
* Fix the "suggest braces around empty body in an 'else' statement" warning
2020-05-21 10:58:15 -04:00
Chuck Lever ca07eda33e SUNRPC: Refactor svc_recvfrom()
This function is not currently "generic" so remove the documenting
comment and rename it appropriately. Its internals are converted to
use bio_vecs for reading from the transport socket.

In existing typical sunrpc uses of bio_vecs, the bio_vec array is
allocated dynamically. Here, instead, an array of bio_vecs is added
to svc_rqst. The lifetime of this array can be greater than one call
to xpo_recvfrom():

- Multiple calls to xpo_recvfrom() might be needed to read an RPC
  message completely.

- At some later point, rq_arg.bvecs will point to this array and it
  will carry the received message into svc_process().

I also expect that a future optimization will remove either the
rq_vec or rq_pages array in favor of rq_bvec, thus conserving the
size of struct svc_rqst.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-20 17:30:12 -04:00
Chuck Lever 02648908d1 SUNRPC: Rename svc_sock::sk_reclen
Clean up. I find the name of the svc_sock::sk_reclen field
confusing, so I've changed it to better reflect its function. This
field is not read directly to get the record length. Rather, it is
a buffer containing a record marker that needs to be decoded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:23 -04:00
Chuck Lever 08e3c9f181 svcrdma: Remove the SVCRDMA_DEBUG macro
Clean up: Commit d21b05f101 ("rdma: SVCRMDA Header File")
introduced the SVCRDMA_DEBUG macro, but it doesn't seem to have been
used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:22 -04:00