IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit
without freeing the list of invalidated layout segments, leading
to a reference leak.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the client holds no more writeable open state, and does not hold a
write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE.
We do this only for writes, since some layout drivers may require you to
also hold a read layout if you are doing a R/W workload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Add a callback to allow the flexfiles layout driver to initialise the
layout private payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cleanup to allow layout drivers to attach private data to layoutreturn,
and manage the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that the layout state bits are synced when we cache a layout
segment for layoutreturn using an appropriate call to
pnfs_set_plh_return_info.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of
whether or not there are layout segments pending.
Furthermore, we should ensure that we leave the plh_return_segs list
empty.
This patch fixes a memory leak of the layout segments on plh_return_segs.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When the layout state is invalidated, then so is the layout segment
state, and hence we do need to clean up the state bits.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Amend the pnfs return on close helper functions to enable sending the
layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between
CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the
client and the server to agree on whether or not there is an outstanding
layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The layoutreturn call will take care of invalidating the layout segments
once the call is successful.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
There is no change to the value of NFS_LAYOUT_RETURN, so we should
not be waking up the RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fix a potential race with CB_LAYOUTRECALL in which the server recalls the
remaining layout segments while our LAYOUTRETURN is still in transit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We may want to process and transmit layout stat information for the
layout segments that are being returned, so we should defer freeing
them until after the layoutreturn has completed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Both pnfs.c and the flexfiles code have their own versions of the
range intersection testing, and the "end_offset" helper.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.
Fixes: 500d701f33 ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If there is an I/O error, we should not call LAYOUTGET until the
LAYOUTRETURN that reports the error is complete.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
If the server sends us a completely new stateid, and the client thinks
it already holds a layout, then force a retry of the LAYOUTGET after
invalidating the existing layout in order to avoid corruption due to
races.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We must ensure that we don't schedule a layoutreturn if the layout stateid
has been marked as invalid.
Fixes: 2a59a04116 ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
If we no longer hold any layout segments, we're normally expected to
consider the layout stateid to be invalid. However we cannot assume this
if we're about to, or in the process of sending a layoutreturn.
Fixes: 334a8f3711 ("pNFS: Don't forget the layout stateid if...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor
while holding a reference to a layout segment, as that can deadlock
pnfs_update_layout().
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
We used to check for a valid layout type id before verifying pNFS flags
as an indicator for if we are using pNFS. This changed in 3132e49ece
with the introduction of multiple layout types, since now we are passing
an array of ids instead of just one. Since then, users have been seeing
a KERN_ERR printk show up whenever mounting NFS v4 without pNFS. This
patch restores the original behavior of exiting set_pnfs_layoutdriver()
early if we aren't using pNFS.
Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo
structure")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fix the code so that we always mark the atime as invalid in nfs4_read_done().
Currently, the expectation appears to be that the pNFS drivers should always
do this, with the result that most of them don't.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Currently, the layout driver selection code always chooses the first one
from the list. That's not really ideal however, as the server can send
the list of layout types in any order that it likes. It's up to the
client to select the best one for its needs.
This patch adds an ordered list of preferred driver types and has the
selection code sort the list of available layout drivers according to it.
Any unrecognized layout type is sorted to the end of the list.
For now, the order of preference is hardcoded, but it should be possible
to make this configurable in the future.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Current NFSv4.1/pNFS client assumes that MDS supports only one layout
type. While it's true for most existing servers, nevertheless, this can
be change in the near future.
For now, this patch just plumbs in the ability to track a list of
layouts in the fsinfo structure. The existing behavior of the client
is preserved, by having it just select the first entry in the list.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If there are outstanding LAYOUTGET rpc calls, then we want to ensure that
we keep the layout stateid around so we that don't inadvertently pick up
an old/misordered sequence id.
The race is as follows:
Client Server
====== ======
LAYOUTGET(seqid)
LAYOUTGET(seqid)
return LAYOUTGET(seqid+1)
return LAYOUTGET(seqid+2)
process LAYOUTGET(seqid+2)
forget layout
process LAYOUTGET(seqid+1)
If it forgets the layout stateid before processing seqid+1, then
the client will not check the layout->plh_barrier, and so will set
the stateid with seqid+1.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the layout was marked as invalid, we want to ensure to initialise
the layout header fields correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.
Client Server
====== ======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
LAYOUTGET return seqid=X+1
LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
Forget layout stateid
Process LAYOUTGET
Set seqid=X+1
The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.
When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...
Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUGET will already
wait if it sees an outstanding LAYOUTRETURN).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that the client conforms to the normative behaviour described in
RFC5661 Section 12.7.2: "If a client believes its lease has expired,
it MUST NOT send I/O to the storage device until it has validated its
lease."
So ensure that we wait for the lease to be validated before using
the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.20+
...instead of splitting the initialisation over init_lseg() and
pnfs_layout_process().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.
Do this by calling pnfs_mark_layout_stateid_invalid().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGET to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.
The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Instead of giving up altogether and falling back to doing I/O
through the MDS, which may make the situation worse, wait for
2 lease periods for the callback to resolve itself, and then
try destroying the existing layout.
Only if this was an attempt at getting a first layout, do we
give up altogether, as the server is clearly crazy.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
They are not the same error, and need to be handled differently.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>