In update_sit_info, we use div_u64 to handle 'u64 divide u64' case, but
div_u64 can only handle 32-bits divisor, so our divisor with u64 type
passed to div_u64 will overflow, result in the wrong calculation when
show debug info of f2fs as below:
BDF: 464, avg. vblocks: 23509
(BDF should never exceed 100)
So change to use div64_u64 to handle this case correctly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a new helper __try_update_largest_extent for cleanup.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This fixes error handling for calls to various functions in the
function recover_inline_data to check if these particular functions
either return a error code or the boolean value false to signal their
caller they have failed internally and if this arises return false
to signal failure immediately to the caller of recover_inline_data
as we cannot continue after failures to calling either the function
truncate_inline_inode or truncate_blocks.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc88 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.
But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.
This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
__drop_largest_extent, which makes __is_extent_same after always
return false, and largest extent unchanged. Now we update it properly.
2. when extent is split and the latter part remains in tree, next_en
should be the latter part instead of next extent of original extent.
It will cause merge failure if there is in-place update, although
there is not, I think this fix will still makes codes less ambiguous.
This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch avoids to produce new checkpoint blocks before the previous meta
pages were written completely.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.
kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
(__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
(blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
(crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
(crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
(async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
(f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
(f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
(f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
(vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
(SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:
/* check status of mapping */
if (unmapped) {
/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */
if (create) {
/* write path, handle dio write case here */
alloc_and_map;
} else {
/*
* handle read cases from all call paths:
* 1. generic read;
* 2. dio read;
* 3. fiemap;
* 4. bmap
*/
}
}
/* map buffer_header */
Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:
pgoff_t index = 0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000
So we should cast index with 64-bits type to avoid this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.
In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.
In this patch, we reset searching position to beginning of global extent
tree for fixing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As comment says, we don't need to call f2fs_lock_op in write_inode to prevent
from producing dirty node pages all the time.
That happens only when there is not enough free sections and we can avoid that
by calling balance_fs in prior to that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This number is referenced by checkpoint under node_write lock.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Highlights include:
Bugfixes:
- Fix a use-after-free bug in the RPC/RDMA client
- Fix a write performance regression
- Fix up page writeback accounting
- Don't try to reclaim unused state owners
- Fix a NFSv4 nograce recovery hang
- reset states to use open_stateid when returning delegation voluntarily
- Fix a tracepoint NULL-pointer dereference
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWFIgBAAoJEGcL54qWCgDy3qwQAJrMvwiO0shZe9+PsUZcDIhw
1CnDmWYafJmpNGK+YEZatI+tdR9pwSYXdfiCGj/Ijfvl1PXUgyVAmNARAB9oFUza
DVvZjqJ6aiFzeawGC8f2IfwY4XcAy4+BIZOiwp2JafepRnoSgZl24olKbO4cQ7UD
i5IaDrYYvAxefsUoRogEF19H1y8zC1yUA2aDKrriV6A9rEZSbaZLRfS8BHppXBjY
w0OP74neD4rnn/rL0YDEdsjiI17W7QwoMk05yzOJH3wQt/Y4Ll/lwLO4y3URpIGF
wzHzMIeggGPPEM9e1JixPc3Y9F9kCHW8YjGJ3xxY2C6q8vt7dzpaVhh10AxycZtZ
gcbepjMhoL7gJqu5DQ/0S86Sb5jNaL0KlUDsEnqtOfe3/UiyTJ/f57TMfdscm+wI
pdyFFtxUHcFueO1a2XuEOuSIUFzFuwIQ2aiHlbu90ev04dd7dqzU0PffhRlzu3tJ
8+ZHQMbSmotUmhxlpI+VA4rG0JUsaLY09chH5r0NvsXm0LR+z3vX7Q6oONN7IBDv
5hULj4ecB69smBv+FjQyVUAu0LiahINAGu0p0wEjTdBwFMic5qpVVfhTs8qrkGRZ
M8RYrANtVhY17fJf5WF7Wyt58icAWRKDHslGdzUav+2VFBfNK1ZeG+QhYYqDNF5k
SkJsG4iCIN9JazwqfqJI
=aoNS
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Bugfixes:
- Fix a use-after-free bug in the RPC/RDMA client
- Fix a write performance regression
- Fix up page writeback accounting
- Don't try to reclaim unused state owners
- Fix a NFSv4 nograce recovery hang
- reset states to use open_stateid when returning delegation
voluntarily
- Fix a tracepoint NULL-pointer dereference"
* tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a tracepoint NULL-pointer dereference
nfs4: reset states to use open_stateid when returning delegation voluntarily
NFSv4: Fix a nograce recovery hang
NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
NFSv4: Don't try to reclaim unused state owners
NFS: Fix a write performance regression
NFS: Fix up page writeback accounting
xprtrdma: disconnect and flush cqs before freeing buffers
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file. Fix this by checking if state is NULL
before trying to use the inode pointer.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Pull CIFS fixes from Steve French:
"Two fixes for problems pointed out by automated tools.
Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Update cifs version number
[SMB3] Do not fall back to SMBWriteX in set_file_size error cases
[SMB3] Missing null tcon check
When the client goes to return a delegation, it should always update any
nfs4_state currently set up to use that delegation stateid to instead
use the open stateid. It already does do this in some cases,
particularly in the state recovery code, but not currently when the
delegation is voluntarily returned (e.g. in advance of a RENAME). This
causes the client to try to continue using the delegation stateid after
the DELEGRETURN, e.g. in LAYOUTGET.
Set the nfs4_state back to using the open stateid in
nfs4_open_delegation_recall, just before clearing the
NFS_DELEGATED_STATE bit.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Since commit 5cae02f427 an OPEN_CONFIRM should
have a privileged sequence in the recovery case to allow nograce recovery to
proceed for NFSv4.0.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We need to warn against broken NFSv4.1 servers that try to hand out
delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Currently, we don't test if the state owner is in use before we try to
recover it. The problem is that if the refcount is zero, then the
state owner will be waiting on the lru list for garbage collection.
The expectation in that case is that if you bump the refcount, then
you must also remove the state owner from the lru list. Otherwise
the call to nfs4_put_state_owner will corrupt that list by trying
to add our state owner a second time.
Avoid the whole problem by just skipping state owners that hold no
state.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If all other conditions in nfs_can_extend_write() are met, and there
are no locks, then we should be able to assume close-to-open semantics
and the ability to extend our write to cover the whole page.
With this patch, the xfstests generic/074 test completes in 242s instead
of >1400s on my test rig.
Fixes: bd61e0a9c8 ("locks: convert posix locks to file_lock_context")
Cc: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Currently, we are crediting all the calls to nfs_writepages_callback()
(i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from
being inconsistent with the behaviour of the equivalent readpage/readpages
accounting, this also means that we cannot distinguish between bulk writes
and single page writebacks (which confuses the 'nfsiostat -p' tool).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The error paths in set_file_size for cifs and smb3 are incorrect.
In the unlikely event that a server did not support set file info
of the file size, the code incorrectly falls back to trying SMBWriteX
(note that only the original core SMB Write, used for example by DOS,
can set the file size this way - this actually does not work for the more
recent SMBWriteX). The idea was since the old DOS SMB Write could set
the file size if you write zero bytes at that offset then use that if
server rejects the normal set file info call.
Fortunately the SMBWriteX will never be sent on the wire (except when
file size is zero) since the length and offset fields were reversed
in the two places in this function that call SMBWriteX causing
the fall back path to return an error. It is also important to never call
an SMB request from an SMB2/sMB3 session (which theoretically would
be possible, and can cause a brief session drop, although the client
recovers) so this should be fixed. In practice this path does not happen
with modern servers but the error fall back to SMBWriteX is clearly wrong.
Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
Pointed out by PaX/grsecurity team
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: PaX Team <pageexec@freemail.hu>
CC: Emese Revfy <re.emese@gmail.com>
CC: Brad Spengler <spender@grsecurity.net>
CC: Stable <stable@vger.kernel.org>
Commit 46c043ede4 ("mm: take i_mmap_lock in unmap_mapping_range() for
DAX") moved some code in __dax_pmd_fault() that was responsible for
zeroing newly allocated PMD pages. The new location didn't properly set
up 'kaddr', so when run this code resulted in a NULL pointer BUG.
Fix this by getting the correct 'kaddr' via bdev_direct_access().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and UBIFS.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJWCmxzAAoJEEtJtSqsAOnWvfUP/R4NXpQmTJvmKfPaHJxuKMO3
uzEZET8qoc54OVN/GvvPFPRsZhZ5C6a1apWiCg77/WuDm9HHHEYrJVMYcOwqkPU1
5eqXSYdsvS7MjuSJS1fW4zIG+/HYaTXGJ/3bdP0vogtjzaKIBksKBmMTRNOAL8b8
2R6htwkVTMJdOUq6/xQuxG7FzT5m6wPEqUENfqGB3livbiqvU7OTud8I6yvcfD1M
tN02BuUduFgBR/4TwMQSbLzWH0T+XG74t79J5s7sBJwe5/dEeTUXV0HfcPEuG/9+
8TBDeoaxz+m9bvQYROPSRlkAIkh9TPsxTeKTdBDN67/CB2y5P06rz+Kta7ygNSTD
Dn/fZ0I2JhQOtz2EiXvK9N36aHbZAltUFpFp0KNf8GUUM9vNMDY3sjeGQidAwxMc
/qVtu+Syk5+HMz8hQCWpdIbqk3ahZsOvTADwedMn+vxxri6IaQqcnBWmIRy7rffq
prYxJx0VTVbLua5WXCOJILQCGEELqsnUKlnCm6LtznBUpff0Wmj6KsXmmXLs/X7X
NoztNx9FfhHQkWIIx92vu2cbC76LvsCXSuAfwC7k3KyW1hA9uWkc39Hs7yO5UcBp
lQZwsIZTe7qSuVt8lVC5omTeIgQiSc/Gte3WFEtNXNo2uq1VJa717NH6qwNOPayy
/L6on4YEUleHKrvJFjcd
=j/qn
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS fixes from Richard Weinberger:
"This contains three bug fixes for both UBI and UBIFS"
* tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifs:
UBI: return ENOSPC if no enough space available
UBI: Validate data_size
UBIFS: Kill unneeded locking in ubifs_init_security
Pull CIFS fixes from Steve French:
"Four fixes from testing at the recent SMB3 Plugfest including two
important authentication ones (one fixes authentication problems to
some popular servers when clock times differ more than two hours
between systems, the other fixes Kerberos authentication for SMB3)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
fix encryption error checks on mount
[SMB3] Fix sec=krb5 on smb3 mounts
cifs: use server timestamp for ntlmv2 authentication
disabling oplocks/leases via module parm enable_oplocks broken for SMB3
Pointed out by Dan Carpenter via smatch code analysis tool
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Pull btrfs fixes from Chris Mason:
"This is an assorted set I've been queuing up:
Jeff Mahoney tracked down a tricky one where we ended up starting IO
on the wrong mapping for special files in btrfs_evict_inode. A few
people reported this one on the list.
Filipe found (and provided a test for) a difficult bug in reading
compressed extents, and Josef fixed up some quota record keeping with
snapshot deletion. Chandan killed off an accounting bug during DIO
that lead to WARN_ONs as we freed inodes"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: keep dropped roots in cache until transaction commit
Btrfs: Direct I/O: Fix space accounting
btrfs: skip waiting on ordered range for special files
Btrfs: fix read corruption of compressed and shared extents
Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock
Btrfs: don't initialize a space info as full to prevent ENOSPC
Highlights include:
Stable patches:
- fix v4.2 SEEK on files over 2 gigs
- Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
- Fix recovery of recalled read delegations
Bugfixes:
- Fix a case where NFSv4 fails to send CLOSE after a server reboot
- Fix sunrpc to wait for connections to complete before retrying
- Fix sunrpc races between transport connect/disconnect and shutdown
- Fix an infinite loop when layoutget fail with BAD_STATEID
- nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
- Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
- Fix layoutreturn/close ordering issues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWBWKdAAoJEGcL54qWCgDy2rUP/iIWUQSpUPfKKw7xquQUQe4j
ci4nFxpJ/zhKj1u7x3wrkxZAXcEooYo+ZJ7ayzROKcfQL/sUSWGbSLdr3mqrQynv
b0SDmnJK9V+CdBQrA+Jp5UGQxcumpMxsAfqVznT0qkf/wDp44DCVgDz5Aj8cRbWU
6xPfMgVLEnXiId9IgKqg3sJ2NmvMZXuI9sHM6hp6OzRmQDjTcx+LgRz7tnQHgaEk
zGz8R6eDm3OA0wfApqZwJ6JY793HsDdy30W9L0Yi2PVGXfzwoEB8AqgLVwSDIY1B
5hG5zn3tg9PSz9vhJ7M2h4AgFHdB3w3XGdJUafwqZEeqEIagw1iFCWlMyo/lE2dG
G7oob9Jiiwxjc3RDWn2wGaafymrrWZwl2nYzC4O3UvJ3hVJ0mEl1iJagK1m8LzfN
fmnP7tTyPuoOXkzDogZ0YI3FrngO6430PoR2hUPkS1yce/a+IV0HQEmXbSDSwN80
1d9zyC9TnPj6rFjZeaGxGK17BpkC0oIQCPq4OSJB4396wzAwMqoJjJVVWWeAK4UC
PxzoXqAAaBFguSsDbuBMcXgiuUw/7DIZ/pdzsWSiCFgocgF5ZdJdieCNtGk0nbLM
37R7HCauF93JDrkpUMKPnLXScb2IbEh31pFtKzptJYKwMxEiScXXiP3NE9hfX65i
2zLkl2aBvd154RvVKNbp
=GdeV
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable patches:
- fix v4.2 SEEK on files over 2 gigs
- Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
- Fix recovery of recalled read delegations
Bugfixes:
- Fix a case where NFSv4 fails to send CLOSE after a server reboot
- Fix sunrpc to wait for connections to complete before retrying
- Fix sunrpc races between transport connect/disconnect and shutdown
- Fix an infinite loop when layoutget fail with BAD_STATEID
- nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
- Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
- Fix layoutreturn/close ordering issues"
* tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS41: make close wait for layoutreturn
NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
NFSv4: Recovery of recalled read delegations is broken
NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
NFS: Do cleanup before resetting pageio read/write to mds
SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
SUNRPC: Lock the transport layer on shutdown
nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
SUNRPC: Ensure that we wait for connections to complete before retrying
SUNRPC: drop null test before destroy functions
nfs: fix v4.2 SEEK on files over 2 gigs
SUNRPC: Fix races between socket connection and destroy code
nfs: fix pg_test page count calculation
Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
Kerberos, which is very important for security, was only enabled for
CIFS not SMB2/SMB3 mounts (e.g. vers=3.0)
Patch based on the information detailed in
http://thread.gmane.org/gmane.linux.kernel.cifs/10081/focus=10307
to enable Kerberized SMB2/SMB3
a) SMB2_negotiate: enable/use decode_negTokenInit in SMB2_negotiate
b) SMB2_sess_setup: handle Kerberos sectype and replicate Kerberos
SMB1 processing done in sess_auth_kerberos
Signed-off-by: Noel Power <noel.power@suse.com>
Signed-off-by: Jim McDonough <jmcd@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
If we send a layoutreturn asynchronously before close, the close
might reach server first and layoutreturn would fail with BADSTATEID
because there is nothing keeping the layout stateid alive.
Also do not pretend sending layoutreturn if we are not.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The order of the following three spinlocks should be:
dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
But dlm_dispatch_assert_master() is called while holding
dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
dlm_grab() which will take dlm_domain_lock.
Once another thread (for example, dlm_query_join_handler) has already
taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
happens.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 51360155ec and adapts
fs/userfaultfd.c to use the old version of that function.
It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we've full control of what we
insert in the two waitqueue heads of the blocked userfaults. No
exclusive waitqueue risks to be inserted into those two waitqueue heads
so we can as well stick to "nr == 1" of the old code and we can rely
purely on the fact no waitqueue inserted in one of the two waitqueue
heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
10.10.5) share fails in case the clocks differ more than +/-2h:
digest-service: digest-request: od failed with 2 proto=ntlmv2
digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
Fix this by (re-)using the given server timestamp for the
ntlmv2 authentication (as Windows 7 does).
A related problem was also reported earlier by Namjae Jaen (see below):
Windows machine has extended security feature which refuse to allow
authentication when there is time difference between server time and
client time when ntlmv2 negotiation is used. This problem is prevalent
in embedded enviornment where system time is set to default 1970.
Modern servers send the server timestamp in the TargetInfo Av_Pair
structure in the challenge message [see MS-NLMP 2.2.2.1]
In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
use the server provided timestamp if present OR current time if it is
not
Reported-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
leases (oplocks) were always requested for SMB2/SMB3 even when oplocks
disabled in the cifs.ko module.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Chandrika Srinivasan <chandrika.srinivasan@citrix.com>
CC: Stable <stable@vger.kernel.org>
When dropping a snapshot we need to account for the qgroup changes. If we drop
the snapshot in all one go then the backref code will fail to find blocks from
the snapshot we dropped since it won't be able to find the root in the fs root
cache. This can lead to us failing to find refs from other roots that pointed
at blocks in the now deleted root. To handle this we need to not remove the fs
roots from the cache until after we process the qgroup operations. Do this by
adding dropped roots to a list on the transaction, and letting the transaction
remove the roots at the same time it drops the commit roots. This will keep all
of the backref searching code in sync properly, and fixes a problem Mark was
seeing with snapshot delete and qgroups. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
The following call trace is seen when generic/095 test is executed,
WARNING: CPU: 3 PID: 2769 at /home/chandan/code/repos/linux/fs/btrfs/inode.c:8967 btrfs_destroy_inode+0x284/0x2a0()
Modules linked in:
CPU: 3 PID: 2769 Comm: umount Not tainted 4.2.0-rc5+ #31
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20150306_163512-brownie 04/01/2014
ffffffff81c08150 ffff8802ec9cbce8 ffffffff81984058 ffff8802ffd8feb0
0000000000000000 ffff8802ec9cbd28 ffffffff81050385 ffff8802ec9cbd38
ffff8802d12f8588 ffff8802d12f8588 ffff8802f15ab000 ffff8800bb96c0b0
Call Trace:
[<ffffffff81984058>] dump_stack+0x45/0x57
[<ffffffff81050385>] warn_slowpath_common+0x85/0xc0
[<ffffffff81050465>] warn_slowpath_null+0x15/0x20
[<ffffffff81340294>] btrfs_destroy_inode+0x284/0x2a0
[<ffffffff8117ce07>] destroy_inode+0x37/0x60
[<ffffffff8117cf39>] evict+0x109/0x170
[<ffffffff8117cfd5>] dispose_list+0x35/0x50
[<ffffffff8117dd3a>] evict_inodes+0xaa/0x100
[<ffffffff81165667>] generic_shutdown_super+0x47/0xf0
[<ffffffff81165951>] kill_anon_super+0x11/0x20
[<ffffffff81302093>] btrfs_kill_super+0x13/0x110
[<ffffffff81165c99>] deactivate_locked_super+0x39/0x70
[<ffffffff811660cf>] deactivate_super+0x5f/0x70
[<ffffffff81180e1e>] cleanup_mnt+0x3e/0x90
[<ffffffff81180ebd>] __cleanup_mnt+0xd/0x10
[<ffffffff81069c06>] task_work_run+0x96/0xb0
[<ffffffff81003a3d>] do_notify_resume+0x3d/0x50
[<ffffffff8198cbc2>] int_signal+0x12/0x17
This means that the inode had non-zero "outstanding extents" during
eviction. This occurs because, during direct I/O a task which successfully
used up its reserved data space would set BTRFS_INODE_DIO_READY bit and does
not clear the bit after finishing the DIO write. A future DIO write could
actually fail and the unused reserve space won't be freed because of the
previously set BTRFS_INODE_DIO_READY bit.
Clearing the BTRFS_INODE_DIO_READY bit in btrfs_direct_IO() caused the
following issue,
|-----------------------------------+-------------------------------------|
| Task A | Task B |
|-----------------------------------+-------------------------------------|
| Start direct i/o write on inode X.| |
| reserve space | |
| Allocate ordered extent | |
| release reserved space | |
| Set BTRFS_INODE_DIO_READY bit. | |
| | splice() |
| | Transfer data from pipe buffer to |
| | destination file. |
| | - kmap(pipe buffer page) |
| | - Start direct i/o write on |
| | inode X. |
| | - reserve space |
| | - dio_refill_pages() |
| | - sdio->blocks_available == 0 |
| | - Since a kernel address is |
| | being passed instead of a |
| | user space address, |
| | iov_iter_get_pages() returns |
| | -EFAULT. |
| | - Since BTRFS_INODE_DIO_READY is |
| | set, we don't release reserved |
| | space. |
| | - Clear BTRFS_INODE_DIO_READY bit.|
| -EIOCBQUEUED is returned. | |
|-----------------------------------+-------------------------------------|
Hence this commit introduces "struct btrfs_dio_data" to track the usage of
reserved data space. The remaining unused "reserve space" can now be freed
reliably.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
If the current open or layout stateid doesn't match the stateid used
in the layoutget RPC call, then don't try to recover it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When a read delegation is being recalled, and we're reclaiming the
cached opens, we need to make sure that we only reclaim read-only
modes.
A previous attempt to do this, relied on retrieving the delegation
type from the nfs4_opendata structure. Unfortunately, as Kinglong
pointed out, this field can only be set when performing reboot recovery.
Furthermore, if we call nfs4_open_recover(), then we end up clobbering
the state->flags for all modes that we're not recovering...
The fix is to have the delegation recall code pass this information
to the recovery call, and then refactor the recovery code so that
nfs4_open_delegation_recall() does not need to call nfs4_open_recover().
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 39f897fdbd ("NFSv4: When returning a delegation, don't...")
Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>