Pull pnfs/ore fixes from Boaz Harrosh:
"These are catastrophic fixes to the pnfs objects-layout that were just
discovered. They are also destined for @stable.
I have found these and worked on them at around RC1 time but
unfortunately went to the hospital for kidney stones and had a very
slow recovery. I refrained from sending them as is, before proper
testing, and surly I have found a bug just yesterday.
So now they are all well tested, and have my sign-off. Other then
fixing the problem at hand, and assuming there are no bugs at the new
code, there is low risk to any surrounding code. And in anyway they
affect only these paths that are now broken. That is RAID5 in pnfs
objects-layout code. It does also affect exofs (which was not broken)
but I have tested exofs and it is lower priority then objects-layout
because no one is using exofs, but objects-layout has lots of users."
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
pnfs-obj: don't leak objio_state if ore_write/read fails
ore: Unlock r4w pages in exact reverse order of locking
ore: Remove support of partial IO request (NFS crash)
ore: Fix NFS crash by supporting any unaligned RAID IO
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.
Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.
Fix both birds, by returning a global zero_page, if offset is beyond
i_size.
TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.
[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
... yet. Right now, init_nfs() is calling this function if an error is
encountered when loading the nfs module. An __exit function can't be
called from one declared as __init.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
These functions are only needed by NFS v4, so they can be moved into a
v4 specific file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This allows me to move the v4 mounting and unmounting functions out of
the generic client and into a file that is only compiled when CONFIG_NFS_V4
is enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
v2 and v3 shared a function for this, but v4 implemented something only
slightly different. Might as well share code whenever possible...
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
These functions are specific to NFS v4 and can be moved to nfs4client.c
to keep them out of the generic client.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
And split these functions out of the generic client into a v4 specific
file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch moves the NFS v4 file functions into a new file that is only
compiled when CONFIG_NFS_V4 is enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
And split them out of the generic client into their own file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I want to initialize all of NFS v4 in a single function that will
eventually be used as the v4 module init function.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS v4 file inode operations are already already in nfs4proc.c, so
this patch just needs to move the directory operations to the same file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch moves the NFS v3 file and directory inode functions into
files that are only compiled whet CONFIG_NFS_V3 is enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch moves the NFS v2 file and directory inode functions into
files that are only compiled whet CONFIG_NFS_V2 is enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For NFSv4 minor version 0, currently the cl_id_uniquifier allows the
Linux client to generate a unique nfs_client_id4 string whenever a
server replies with NFS4ERR_CLID_INUSE.
This implementation seems to be based on a flawed reading of RFC
3530. NFS4ERR_CLID_INUSE actually means that the client has presented
this nfs_client_id4 string with a different principal at some time in
the past, and that lease is still in use on the server.
For a Linux client this might be rather difficult to achieve: the
authentication flavor is named right in the nfs_client_id4.id
string. If we change flavors, we change strings automatically.
So, practically speaking, NFS4ERR_CLID_INUSE means there is some other
client using our string. There is not much that can be done to
recover automatically. Let's make it a permanent error.
Remove the recovery logic in nfs4_proc_setclientid(), and remove the
cl_id_uniquifier field from the nfs_client data structure. And,
remove the authentication flavor from the nfs_client_id4 string.
Keeping the authentication flavor in the nfs_client_id4.id string
means that we could have a separate lease for each authentication
flavor used by mounts on the client. But we want just one lease for
all the mounts on this client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 state recovery is not always successful. Failure is signalled
by setting the nfs_client.cl_cons_state to a negative (errno) value,
then waking waiters.
Currently this can happen only during mount processing. I'm about to
add an explicit case where state recovery failure during normal
operation should force all NFS requests waiting on that state recovery
to exit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors. This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.
Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register(). And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).
Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.
A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any. I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.
This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Squelch compiler warnings:
fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
acl data", Dec 7, 2011.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
As a finishing touch, add appropriate documenting comments and some
debugging printk's.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Instead of open-coded flag manipulation, use test_bit() and
clear_bit() just like all other accessors of the state->flag field.
This also eliminates several unnecessary implicit integer type
conversions.
To make it absolutely clear what is going on, a number of comments
are introduced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The "state->flags & flags" test in nfs41_check_expired_stateid()
allows the state manager to squelch a TEST_STATEID operation when
it is known for sure that a state ID is no longer valid. If the
lease was purged, for example, the client already knows that state
ID is now defunct.
But open recovery is still needed for that inode.
To force a call to nfs4_open_expired(), change the default return
value for nfs41_check_expired_stateid() to force open recovery, and
the default return value for nfs41_check_locks() to force lock
recovery, if the requested flags are clear. Fix suggested by Bryan
Schumaker.
Also, the presence of a delegation state ID must not prevent normal
open recovery. The delegation state ID must be cleared if it was
revoked, but once cleared I don't think it's presence or absence has
any bearing on whether open recovery is still needed. So the logic
is adjusted to ignore the TEST_STATEID result for the delegation
state ID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The result of a TEST_STATEID operation can indicate a few different
things:
o If NFS_OK is returned, then the client can continue using the
state ID under test, and skip recovery.
o RFC 5661 says that if the state ID was revoked, then the client
must perform an explicit FREE_STATEID before trying to re-open.
o If the server doesn't recognize the state ID at all, then no
FREE_STATEID is needed, and the client can immediately continue
with open recovery.
Let's err on the side of caution: if the server clearly tells us the
state ID is unknown, we skip the FREE_STATEID. For any other error,
we issue a FREE_STATEID. Sometimes that FREE_STATEID will be
unnecessary, but leaving unused state IDs on the server needlessly
ties up resources.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The TEST_STATEID and FREE_STATEID operations can return
-NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION.
nfs41_{test,free}_stateid() should not pass these errors to
nfs4_handle_exception() during state recovery, since that will
recursively kick off state recovery again, resulting in a deadlock.
In particular, when the TEST_STATEID operation returns NFS4_OK,
res.status can contain one of these errors. _nfs41_test_stateid()
replaces NFS4_OK with the value in res.status, which is then returned
to callers.
But res.status is not passed through nfs4_stat_to_errno(), and thus is
a positive NFS4ERR value. Currently callers are only interested in
!NFS4_OK, and nfs4_handle_exception() ignores positive values.
Thus the res.status values are currently ignored by
nfs4_handle_exception() and won't cause the deadlock above. Thanks to
this missing negative, it is only when these operations fail (which
is very rare) that a deadlock can occur.
Bryan agrees the original intent was to return res.status as a
negative NFS4ERR value to callers of nfs41_test_stateid().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
mark_matching_lsegs_invalid() resets the mds_threshold counters and can
dereference the layout hdr on an initial empty plh_segs list. It returns 0 both
in the case of an initial empty list and in a non-emtpy list that was cleared
by calls to mark_lseg_invalid.
Don't send a LAYOUTRETURN if the list was initially empty.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the file layout driver is fencing a DS, _pnfs_return_layout can be
called mulitple times per inode due to in-flight i/o referencing lsegs on it's
plh_segs list.
Remember that LAYOUTRETURN has been called, and do not call it again.
Allow LAYOUTRETURNs after a subsequent LAYOUTGET.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
First mark the deviceid invalid to prevent any future use. Then fence all
files involved in I/O to a DS with a connection error by sending a
LAYOUTRETURN.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called. They could also be passed to the
compare function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just pass struct file *. Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int. Next: saner prototypes for parts in
namei.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change of calling conventions:
old new
NULL 1
file 0
ERR_PTR(-ve) -ve
Caller *knows* that struct file *; no need to return it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.
[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.
It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail. This makes little sense now that this
function isn't used for determining whether to actually open the file or not.
The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't pass nfs_open_context() to ->create(). Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents. That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.
Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain. So allocate a context here.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation
and use the generic nfs_lookup for other lookups.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.
The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix 2 bugs in nfs_direct_write_reschedule:
- The request needs to be removed from the 'reqs' list before it can
be added to 'failed'.
- Fix an infinite loop if the 'failed' list is non-empty.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This gives pnfs a chance to do a layout commit inside the v4 code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
pNFS needs to select a write function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing writes.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
pNFS needs to select a read function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing reads.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This gives NFS v4 a way to set up callbacks and sessions without v2 or
v3 having to do them as well.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS v4 needs a way to shut down callbacks and sessions, but v2 and v3
don't.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Delegations are a v4 feature, so push return_delegation out of the
generic client by creating a new rpc_op and renaming the old function to
be in the nfs v4 "namespace"
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Delegations are a v4 feature, so push them out of the generic code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
v2 and v3 don't need to worry about doing a pnfs layoutcommit.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I can use this function to return delegations and unset the pnfs layout
driver rather than continuing to do these things in the generic client.
With this change, we no longer need an nfs4_kill_super().
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The generic client doesn't need to know about pnfs layout drivers, so
this should be done in the v4 code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert the pNFS file layout to use the same system as the
object and block layout.
Remove unnecessary dependencies on NFS_FS
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We prepare for the largest possible GETDEVICEINFO response, which
can not be greater than the negotiated session maximum response size.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The 'committed' field is not needed once we have put the struct nfs_page
on the right list.
Also correct the type of the verifier: it is not an array of __be32, but
simply an 8 byte long opaque array.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The verifier returned by the GETDEVICELIST operation is not a write
verifier, but a nfs4_verifier.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Handling a slot recall situation should always takes precedence over
state recovery to allow the server to manage its resources.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the xdr_stream position counter as the basis for the calculation
instead of assuming that we can calculate an offset to the start
of the iovec.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
xdr_read_pages will already do all of the buffer overflow checks that are
currently being open-coded in the various callers. This patch simplifies
the existing code by replacing the open coded checks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The actual size of the directory is unknown to the client, so it is
always requesting the maximum number it can handle. If the server
is replying with fewer entries than was requested, then that will
usually reflect the fact that we've hit the end of the directory.
Flagging it as an error is therefore incorrect.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It was initially coded under the assumption that there would only be one
request at a time, so use a lock to enforce this requirement..
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In nfs_direct_write_reschedule(), the requests from nfs_scan_commit_list
have a refcount of 2, whereas the operations in
nfs_direct_write_completion_ops expect them to have a refcount of 1.
This patch adds a call to release the extra references.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
The call to try_module_get() dereferences ld_type outside the
spin locks, which means that it may be pointing to garbage if
a module unload was in progress.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client. The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.
The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.
The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.
Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Highlights include:
- Fix a couple of mount regressions due to the recent cleanups.
- Fix an Oops in the open recovery code
- Fix an rpc_pipefs upcall hang that results from some of the
net namespace work from 3.4.x (stable kernel candidate).
- Fix a couple of write and o_direct regressions that were found
at last weeks Bakeathon testing event in Ann Arbor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
Ydste3pQJkS7PiZu1sWA
=unbu
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Fix a couple of mount regressions due to the recent cleanups.
- Fix an Oops in the open recovery code
- Fix an rpc_pipefs upcall hang that results from some of the net
namespace work from 3.4.x (stable kernel candidate).
- Fix a couple of write and o_direct regressions that were found at
last weeks Bakeathon testing event in Ann Arbor."
* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: add an endian notation for sparse
NFSv4.1: integer overflow in decode_cb_sequence_args()
rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
NFSv4 do not send an empty SETATTR compound
NFSv2: EOF incorrectly set on short read
NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
NFS: fix directio refcount bug on commit
NFSv4: Fix unnecessary delegation returns in nfs4_do_open
NFSv4.1: Convert another trivial printk into a dprintk
NFS4: Fix open bug when pnfs module blacklisted
NFS: Remove incorrect BUG_ON in nfs_found_client
NFS: Map minor mismatch error to protocol not support error.
NFS: Fix a commit bug
NFS4: Set parsed mount data version to 4
NFSv4.1: Ensure we clear session state flags after a session creation
NFSv4.1: Convert a trivial printk into a dprintk
NFSv4: Fix up decode_attr_mdsthreshold
NFSv4: Fix an Oops in the open recovery code
NFSv4.1: Fix a request leak on the back channel
In case of destroying mount namespace on child reaper exit, nsproxy is zeroed
to the point already. So, dereferencing of it is invalid.
This patch hard-code "init_net" for all network namespace references for NFS
callback services. This will be fixed with proper NFS callback
containerization.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is supposed to be a __be32 value. Sparse complains a lot:
fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30: expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30: got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16: expected restricted __be32
fs/nfs/callback_xdr.c:716:16: got unsigned int [unsigned] status
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This seems like it could overflow on 32 bits. Use kmalloc_array() which
has overflow protection built in.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit 536e43d12b ATTR_OPEN check can result in
an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to
request from the server.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In cases where the server returns fewer bytes then those requested, we
can incorrectly set the eof flag for the file. Fixing this allows the
request to be retried with updated offset and count arguments.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Older versions of nfs utils don't always pass a "vers=" mount option for
NFS. This chould lead to attempts at using NFS v0 due to a zeroed out
nfs_parsed_mount_data struct. I solve this by setting the default NFS
version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been
taken care of by a similar patch).
Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This reverts a hunk from commit 0427708657
"NFS: Clean up - Simplify reference counting in fs/nfs/direct.c"
The cleanups in that patch affect the write path, but by the time
processing hits commit the removed reference has been added back by
nfs_scan_commit_list(). Without this reversion, any page that is
sent to commit holds on to an unbalanced reference that is never
freed. The immediate effect is an imbalance over the wire between
OPENs and CLOSEs.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While nfs4_do_open() expects the fmode argument to be restricted to
combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open()
and nfs4_proc_create will pass the nfs_open_context->mode,
which contains the full fmode_t.
This patch ensures that nfs4_do_open strips the other fmode_t bits,
fixing a problem in which the nfs4_do_open call would result in an
unnecessary delegation return.
Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
It is perfectly valid for nfs_get_client() to return a nfs_client that
is in the process of setting up the NFSv4.1 session.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sservers that only have NFSv4.1 support the
NFS4ERR_MINOR_VERS_MISMATCH error is return on
v4.0 mounts. Mapping that error to EPROTONOSUPPORT
will cause the mount to back off to v3 instead of
failing.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The new commit code fails to copy the verifier into the wb_verf field
of _all_ the nfs_page structures; it only copies it into the first entry.
The consequence is that most requests end up failing to match in
nfs_commit_release.
Fix is to copy the verifier into the req->wb_verf field in
nfs_write_completion.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
This patch only affects mounting through "-t nfs4" since it doesn't set
up an nfs version to use in the mount data. The nfs client was trying
to mount using NFS v0, causing either a BUG() or a protocol not
supported message.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both nfs4_reset_session and nfs41_init_clientid need to clear all the
session related state flags on success.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no need to bug the user about the server returning an error
on destroy_session. The error will be handled by the state manager,
without any need for further input from anyone else.
So convert that printk into a debugging dprintk.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix an incorrect use of 'likely()'. The FATTR4_WORD2_MDSTHRESHOLD
bit is only expected in NFSv4.1 OPEN calls, and so is actually
rather _unlikely_.
decode_attr_mdsthreshold needs to clear FATTR4_WORD2_MDSTHRESHOLD
from the attribute bitmap after it has decoded the data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
The open recovery code does not need to request a new value for the
mdsthreshold, and so does not allocate a struct nfs4_threshold.
The problem is that encode_getfattr_open() will still request an
mdsthreshold, and so we end up Oopsing in decode_attr_mdsthreshold.
This patch fixes encode_getfattr_open so that it doesn't request an
mdsthreshold when the caller isn't asking for one. It also fixes
decode_attr_mdsthreshold so that it errors if the server returns
an mdsthreshold that we didn't ask for (instead of Oopsing).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not. It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:
# mount --bind /mnt/nfs /mnt/nfs-clone
# echo something > /mnt/nfs/tmp/bar
# echo x > /tmp/file
# mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
# cat /mnt/nfs/tmp/bar
cat: /mnt/nfs/tmp/bar: Not a directory
Which should, by any sane filesystem, result in "something" being printed.
So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>