Граф коммитов

57255 Коммитов

Автор SHA1 Сообщение Дата
Linus Torvalds 076a3f5537 fuse fixes for 5.0-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXFls+wAKCRDh3BK/laaZ
 PC+hAQDRkyJeAmMzpHwvv/IASqpJgc6HrSzH0p201lDyARcKIAD+MWxZHYP4ltAn
 WVTLIvYT1xsoqGG3plfZ/d1iNbAWcwU=
 =cL/O
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "A fix for a CUSE regression introduced in v4.20, as well as fixes for
  a couple of old bugs"

* tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: decrement NR_WRITEBACK_TEMP on the right page
  fuse: call pipe_buf_release() under pipe lock
  cuse: fix ioctl
  fuse: handle zero sized retrieve correctly
2019-02-07 07:52:08 +00:00
Trond Myklebust e3fdc89ca4 nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.

Fixes: 42ec3d4c02 ("vfs: make remap_file_range functions take and...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06 15:32:05 -05:00
Tetsuo Handa 43636c804d fs: ratelimit __find_get_block_slow() failure message.
When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.873324][T15342] b_state=0x00000029, b_size=512
  [  399.878403][T15342] device loop0 blocksize: 4096
  [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.890400][T15342] b_state=0x00000029, b_size=512
  [  399.895595][T15342] device loop0 blocksize: 4096
  [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.907471][T15342] b_state=0x00000029, b_size=512
  [  399.912506][T15342] device loop0 blocksize: 4096

This patch reduces frequency to up to once per a second, in addition to
concatenating three lines into one.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-06 12:58:56 -07:00
Mike Marshall ec51f8ee1e aio: initialize kiocb private in case any filesystems expect it.
A recent optimization had left private uninitialized.

Fixes: 2bc4ca9bb6 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-06 08:04:22 -07:00
Darrick J. Wong add46b3b02 xfs: set buffer ops when repair probes for btree type
In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates.  However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops.  Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.

Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-03 14:03:59 -08:00
Brian Foster 465fa17f4a xfs: end sync buffer I/O properly on shutdown error
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03 14:03:06 -08:00
Brian Foster aa6ee4ab69 xfs: eof trim writeback mapping as soon as it is cached
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc.  in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d128e ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03 14:02:49 -08:00
Linus Torvalds 312b3a93dd for-5.0-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxWtJgACgkQxWXV+ddt
 WDsDow//ZpnyDwQWvSIfF2UUQOPlcBjbHKuBA7rU0wdybW635QYGR0mqnI+1VnMj
 7ssUkeN6N0a2gQzrUG4Y+zpdzOWv2xQ4jKZ9GMOp9gwyzEFyPkcFXOnmM8UfYtVu
 e3fK65e8BZHmTeu0kGKah4Dt1g0t4fUmhsKR4Pfp5YNJC+zuuGTwUW1K/ZQHXJ+3
 kTHc7WP1lsF7wgaZ+Gl+Kvp8fVrHVdygMVTdRBW8QaBgPLa/KExvK62jW+NmCYhj
 7OIkWdew7e8IXc3Ie5IbOomHAv7IgqqgiO9VO9+n0EpyV4UobUgxrgBKJ+0yc1Ya
 eidbKhMslwUE50y00JVm+vw0gwQHkR+hZDn/mRB6xiIeI8tu/yQIJZ6AhYJXoByR
 cu8+SNO5Z5dOZ1f146ZH8lnkr24tuSnkDUhbRDR5pdb4tAHHej2ALzhbfbwbPEpF
 IverYLw5fOMGeRU/mBsjkVadpSZ4S0HVNU85ERdhLtVLK1PSaY2UkUaA+Ii5y7au
 EYDjaGMflmJ8cAVqgtgedEff2n8OKDnzRZlz4IPLI73MVSITGZkM7PmYmYsLm2Bs
 mDPnmyqR8kzcd1RRtSeZTvqOpAIZG+QUOmD2jiKrchmp54Sz0V/HJWRs3aQybD6Q
 ph0yAcbkvgp/ewe5IFgaI0kcyH7zYdL6GtiI2WUE3/8DObgrgsA=
 =E2PP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - regression fix: transaction commit can run away due to delayed ref
   waiting heuristic, this is not necessary now because of the proper
   reservation mechanism introduced in 5.0

 - regression fix: potential crash due to use-before-check of an ERR_PTR
   return value

 - fix for transaction abort during transaction commit that needs to
   properly clean up pending block groups

 - fix deadlock during b-tree node/leaf splitting, when this happens on
   some of the fundamental trees, we must prevent new tree block
   allocation to re-enter indirectly via the block group flushing path

 - potential memory leak after errors during mount

* tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: On error always free subvol_name in btrfs_mount
  btrfs: clean up pending block groups when transaction commit aborts
  btrfs: fix potential oops in device_list_add
  btrfs: don't end the transaction for delayed refs in throttle
  Btrfs: fix deadlock when allocating tree block during leaf/node split
2019-02-03 08:48:33 -08:00
Linus Torvalds b9de6efed2 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "24 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  autofs: fix error return in autofs_fill_super()
  autofs: drop dentry reference only when it is never used
  fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
  mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  psi: clarify the Kconfig text for the default-disable option
  mm, memory_hotplug: __offline_pages fix wrong locking
  mm: hwpoison: use do_send_sig_info() instead of force_sig()
  kasan: mark file common so ftrace doesn't trace it
  init/Kconfig: fix grammar by moving a closing parenthesis
  lib/test_kmod.c: potential double free in error handling
  mm, oom: fix use-after-free in oom_kill_process
  mm/hotplug: invalid PFNs from pfn_to_online_page()
  mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
  psi: fix aggregation idle shut-off
  mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
  mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
  oom, oom_reaper: do not enqueue same task twice
  mm: migrate: make buffer_migrate_page_norefs() actually succeed
  kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
  x86_64: increase stack size for KASAN_EXTRA
  ...
2019-02-02 09:32:58 -08:00
Linus Torvalds 33640d718c SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a particularly important one for queryxattr (see xfstests 70 and 117)
-----BEGIN PGP SIGNATURE-----
 
 iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxUf8kACgkQiiy9cAdy
 T1Fy8Qv4rlgVRBPcSJpvZKhFjMd5KskVIDnP0tOc5od+Mg+547eAg7UKKnJVDgKS
 OUtiP8uC/UErWvQ214hoXF1sMwoG+rDdTRdIOVDLD2a2gIIJzaIPADhYt4w5kiAX
 nO9NwPwk03KTzNUQmpdFPofbLK4csmJUR4kBNE+XhZTulmw9BkplGXnNkQGjsEdf
 nY6YdfqofE//DmR/KB1BylD0lxDtnfuB5zoELPmM6X4iWtD9W6pKVg23huEv+Dmh
 80LdYt77vI4RKEzOAmsYEpROdAmlCksVQC1nlh5EHOuMN/9TCdpdWRahJWWhbbKr
 /b8TuD+0QUUKEQdGvwM8DM/OCjnwoJphzI8CK1VzRpM4XkZxpQTzng2zVFPNzs4Y
 wwKCsxxTO7ePGg6hE/DTnRn9XJtKbR1nQsIjHcw6rE0ydl+P6YRZDUEehqyRU+jd
 Z09XCPS1+zeUPE6PRE9wlrIt7QdxOynufhaWRKf18b2b4cfKbNGcd+rMMZPagUQe
 x2CMkEU=
 =qX/z
 -----END PGP SIGNATURE-----

Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
 "SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
  particularly important one for queryxattr (see xfstests 70 and 117)"

* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  CIFS: fix use-after-free of the lease keys
  CIFS: Do not consider -ENODATA as stat failure for reads
  CIFS: Do not count -ENODATA as failure for query directory
  CIFS: Fix trace command logging for SMB2 reads and writes
  CIFS: Fix possible oops and memory leaks in async IO
  cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
  cifs: fix computation for MAX_SMB2_HDR_SIZE
2019-02-01 16:53:01 -08:00
Ian Kent f585b283e3 autofs: fix error return in autofs_fill_super()
In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.

Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Pan Bian 63ce5f552b autofs: drop dentry reference only when it is never used
autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry.  However, dentry is read via autofs_dentry_ino(dentry) after
that.  This may result in a use-free-bug.  The patch drops the reference
count of dentry only when it is never used.

Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Jan Kara c27d82f52f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Alexey Dobriyan 1fde6f21d9 proc: fix /proc/net/* after setns(2)
/proc entries under /proc/net/* can't be cached into dcache because
setns(2) can change current net namespace.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: avoid vim miscolorization]
[adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time]
  Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2
Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2
Fixes: 1da4d377f9 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mateusz Stępień <mateusz.stepien@netrounds.com>
Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:22 -08:00
Linus Torvalds 9ace868a17 Changes since last update:
- fix page migration when using iomap for pagecache management
 - fix a use-after-free bug in the directio code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxPK1IACgkQ+H93GTRK
 tOvzeA/+I4bWVmovfV+EGFzHSV6zRsy17v6c4ncMrdia41rhmvxl+sAJgj2+uFCb
 J37cCMpPrmAOz+JGIW7PbCt8uzmwaXOfB9p9N58wM+hSSxtlN+wZFsIaoepOUTDK
 t3e2L7QQxQjN9HXZU0RNUi/zgS3poDfzap7cZ71spBxX5hVd1zVQa0q/o5OXr7OI
 sBlZLsIOhmS8WU2TmfkwzUVi+/FR4dCgyP8eDAGho/KbwvO9sfWzLNxf0U/ORWfA
 JG+2LX42eKKjX6wo39zW1mAXwOBhnLnCqOOgKrDy8XRXPARiNAzLNn0AwhxJSAqD
 z3qE298Oag6gZo0lNJjDXIko5D9y2koWjQe7z4fJ2JYpV5mSyq8/F4XDNO2FKIap
 07p0OBGa1yfwQYmS5TrhJvYwvsHqTNs122jpowqeD3o0Xh64y2TZLEMdhhIsRjga
 +S9OSVQ15JDf/QeI5LPGK6Oc6B3JnRYgrYf7g7DYu4eqEsJ3V3pqtbXzjxGkzUjx
 5xf85ujuRUQoKCPZQ00ewmsfZMfOcaYqfhosx6LvR6ZPvPH3Ex3nLHks5ZV/eXfR
 Auusq6XMiHb5ljukfVCa0WStntUl5gMaRha5QJy1Vg5Zd1ikvTkB5CkJFlODJ8hS
 GaEIa58Gf75dkHOvm4bFEOoy2EcE3I+qdwR+0xgg+6gDlNeZZNQ=
 =KYgc
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:
 "A couple of iomap fixes to eliminate some memory corruption and hang
  problems that were reported:

   - fix page migration when using iomap for pagecache management

   - fix a use-after-free bug in the directio code"

* tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: fix a use after free in iomap_dio_rw
  iomap: get/put the page in iomap_page_create/release()
2019-02-01 10:30:18 -08:00
Theodore Ts'o 8fdd60f2ae Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
This reverts commit ad211f3e94.

As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case.  The
original point of the commit was avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().

The real fix to this problem was discussed here:

https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org

The proposed patch was to fix a syzbot complaint, but the problem can
also demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discused in the e-mail thread, but none have
landed in the kernel as of this writing.  Anyway, commit
ad211f3e94 is absolutely the wrong way to suppress the lockdep, so
revert it.

Fixes: ad211f3e94 ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported: Jan Kara <jack@suse.cz>
2019-01-31 23:41:11 -05:00
Andreas Gruenbacher e74c98ca2d gfs2: Revert "Fix loop in gfs2_rbm_find"
This reverts commit 2d29f6b96d.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-31 11:45:11 -08:00
Linus Torvalds 937108b093 NFS client fixes for Linux 5.0
Stable bugfix:
 - Fix up return value on fatal errors in nfs_page_async_flush()
 
 Other bugfix:
 - Fix NULL pointer dereference of dev_name
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxTOEsACgkQ18tUv7Cl
 QOuS2A//U2J1xz2N8R/k9I4puMXss+DpUAfryNRrDul0qL4tsr7UhHzHezJVl17X
 coPGA/YD+voybyT+eYeACCHUhDMNN8gj2KoCMlE1ueWAbiCOxrS4NgFM2djO3lka
 dlfqgSbVS1Z7+KtEEiFGq/HiF6y0WxanMBHnfhllNbXBDE6W0/+EPdgjX7fZF3FF
 AS6QQmruXL/b1/hJasfTsF3wcHs3y+Y23RP85j4F8aYrcWLOyPUhhuzv/o6Zoh37
 fqltMxueWy+2qpn8dBE+9ILuKnUxnIsIwpF4YFhI7XrQlqMIWYMrShiqSDqYeVUP
 3qdX8LtRR2VsNCTDR9HamVtCkbi9DkJRXQA/fChVPiLA+P0W2Q2uiKsNKEijuZdl
 9fvl9aIL/+glczHrZeJTKellFSEocaZ/L5gVmpM6Fk8zyFitP0+nkO40g/qou+A0
 O77A+EK9v4XPe8z87kwrZhphT12QZK2oIPMAZDnjitktbuObip0Wva4w92KnIqK0
 QPIN081oxNF7BnWEUESCTeqXl670lV83Xek1eVHSCTnFOI68riP1YoUQlIhujV/R
 82J+y6HJYtLDj87NuJrAAXtUrtzAPDr39TJr3V2aH0kdpPajUAhkC3gLix13ORyM
 cmP3K1M3U5f3HAElrywqQrGcxaYKN/Hpfb2427vEnbxieTKElVo=
 =ZOXa
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "This addresses two bugs, one in the error code handling of
  nfs_page_async_flush() and one to fix a potential NULL pointer
  dereference in nfs_parse_devname().

  Stable bugfix:
   - Fix up return value on fatal errors in nfs_page_async_flush()

  Other bugfix:
   - Fix NULL pointer dereference of dev_name"

* tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix up return value on fatal errors in nfs_page_async_flush()
  nfs: Fix NULL pointer dereference of dev_name
2019-01-31 10:13:05 -08:00
Steve French b9b9378b49 cifs: update internal module version number
To 2.17

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-31 07:05:06 -06:00
Aurelien Aptel d339adc12a CIFS: fix use-after-free of the lease keys
The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.

Simple reproducer (requires KASAN enabled) on a cifs mount:

echo foo > foo ; tail -f foo & rm foo

Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d49c ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2019-01-31 07:03:20 -06:00
Waiman Long af0c9af1b3 fs/dcache: Track & report number of negative dentries
The current dentry number tracking code doesn't distinguish between
positive & negative dentries.  It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file.  The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.

The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.

Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension.  They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero.  IOW, it will be safe to use one of the dummy array entry for
negative dentry count.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30 11:02:11 -08:00
Waiman Long 1dbd449c99 fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused.  This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().

To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.

Fixes: 4e717f5c10 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30 11:02:11 -08:00
Eric W. Biederman 532b618bdf btrfs: On error always free subvol_name in btrfs_mount
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol.  Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.

Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:47 +01:00
David Sterba c7cc64a985 btrfs: clean up pending block groups when transaction commit aborts
The fstests generic/475 stresses transaction aborts and can reveal
space accounting or use-after-free bugs regarding block goups.

In this case the pending block groups that remain linked to the
structures after transaction commit aborts in the middle.

The corrupted slabs lead to failures in following tests, eg. generic/476

  [ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [ 8172.755799] #PF error: [normal kernel read fault]
  [ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0
  [ 8172.759000] Oops: 0000 [#1] PREEMPT SMP
  [ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G        W         5.0.0-rc2-default #408
  [ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90
  [ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287
  [ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000
  [ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0
  [ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000
  [ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78
  [ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200
  [ 8172.778886] FS:  0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000
  [ 8172.780545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0
  [ 8172.783547] Call Trace:
  [ 8172.784112]  shrink_inactive_list+0x194/0x410
  [ 8172.784747]  shrink_node_memcg.constprop.85+0x3a5/0x6a0
  [ 8172.785472]  shrink_node+0x62/0x1e0
  [ 8172.786011]  balance_pgdat+0x216/0x460
  [ 8172.786577]  kswapd+0xe3/0x4a0
  [ 8172.787085]  ? finish_wait+0x80/0x80
  [ 8172.787795]  ? balance_pgdat+0x460/0x460
  [ 8172.788799]  kthread+0x116/0x130
  [ 8172.789640]  ? kthread_create_on_node+0x60/0x60
  [ 8172.790323]  ret_from_fork+0x24/0x30
  [ 8172.794253] CR2: 0000000000000058

or accounting errors at umount time:

  [ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G        W         5.0.0-rc2-default #408
  [ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206
  [ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000
  [ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800
  [ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000
  [ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108
  [ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100
  [ 8159.562852] FS:  00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000
  [ 8159.564839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0
  [ 8159.567898] Call Trace:
  [ 8159.568597]  close_ctree+0x17f/0x350 [btrfs]
  [ 8159.569628]  generic_shutdown_super+0x64/0x100
  [ 8159.570808]  kill_anon_super+0x14/0x30
  [ 8159.571857]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8159.573063]  deactivate_locked_super+0x29/0x60
  [ 8159.574234]  cleanup_mnt+0x3b/0x70
  [ 8159.575176]  task_work_run+0x98/0xc0
  [ 8159.576177]  exit_to_usermode_loop+0x83/0x90
  [ 8159.577315]  do_syscall_64+0x15b/0x180
  [ 8159.578339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This fix is based on 2 Josef's patches that used sideefects of
btrfs_create_pending_block_groups, this fix introduces the helper that
does what we need.

CC: stable@vger.kernel.org # 4.4+
CC: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:47 +01:00
Al Viro 92900e5160 btrfs: fix potential oops in device_list_add
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.

Fixes: d1a6300282 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:40 +01:00
Greg Kroah-Hartman 37ea7b630a debugfs: debugfs_lookup() should return NULL if not found
Lots of callers of debugfs_lookup() were just checking NULL to see if
the file/directory was found or not.  By changing this in ff9fb72bc0
("debugfs: return error values, not NULL") we caused some subsystems to
easily crash.

Fixes: ff9fb72bc0 ("debugfs: return error values, not NULL")
Reported-by: syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30 12:39:49 +01:00
Pavel Shilovsky 082aaa8700 CIFS: Do not consider -ENODATA as stat failure for reads
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-29 17:27:16 -06:00
Pavel Shilovsky 8e6e72aece CIFS: Do not count -ENODATA as failure for query directory
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-29 17:24:53 -06:00
Pavel Shilovsky 7d42e72fe8 CIFS: Fix trace command logging for SMB2 reads and writes
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.

Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-29 17:19:56 -06:00
Pavel Shilovsky 9bda8723da CIFS: Fix possible oops and memory leaks in async IO
Allocation of a page array for non-cached IO was separated from
allocation of rdata and wdata structures and this introduced memory
leaks and a possible null pointer dereference. This patch fixes
these problems.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-29 17:19:47 -06:00
Ronnie Sahlberg c4627e66f7 cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
minus the various headers and blobs that will be part of the reply.

or else we might trigger a session reconnect.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-01-29 16:17:25 -06:00
Ronnie Sahlberg 58d15ed120 cifs: fix computation for MAX_SMB2_HDR_SIZE
The size of the fixed part of the create response is 88 bytes not 56.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-01-29 16:15:08 -06:00
Trond Myklebust 8fc75bed96 NFS: Fix up return value on fatal errors in nfs_page_async_flush()
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().

Fixes: c373fff7bd ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-29 16:33:24 -05:00
Greg Kroah-Hartman ff9fb72bc0 debugfs: return error values, not NULL
When an error happens, debugfs should return an error pointer value, not
NULL.  This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).

So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.

Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-29 21:28:35 +01:00
Yao Liu 80ff001724 nfs: Fix NULL pointer dereference of dev_name
There is a NULL pointer dereference of dev_name in nfs_parse_devname()

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
  ...
  Call Trace:
   ? ida_alloc_range+0x34b/0x3d0
   ? nfs_clone_super+0x80/0x80 [nfs]
   ? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
   mount_fs+0x52/0x170
   ? __init_waitqueue_head+0x3b/0x50
   vfs_kern_mount+0x6b/0x170
   do_mount+0x216/0xdc0
   ksys_mount+0x83/0xd0
   __x64_sys_mount+0x25/0x30
   do_syscall_64+0x65/0x220
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a NULL check on dev_name

Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-28 12:08:30 -05:00
Josef Bacik 302167c50b btrfs: don't end the transaction for delayed refs in throttle
Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space.  This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.

This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now.  Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure.  With
this patch we no longer take 5 minutes to balance a metadata only fs.

Qu has submitted a fstest to catch slow balance or excessive transaction
commits. Steps to reproduce:

* create subvolume
* create many (eg. 16000) inlined files, of size 2KiB
* iteratively snapshot and touch several files to trigger metadata
  updates
* start balance -m

Reported-by: Qu Wenruo <wqu@suse.com>
Fixes: 64403612b7 ("btrfs: rework btrfs_check_space_for_delayed_refs")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add tags and steps to reproduce ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-28 15:41:11 +01:00
Filipe Manana a627947076 Btrfs: fix deadlock when allocating tree block during leaf/node split
When splitting a leaf or node from one of the trees that are modified when
flushing pending block groups (extent, chunk, device and free space trees),
we need to allocate a new tree block, which in turn can result in the need
to allocate a new block group. After allocating the new block group we may
need to flush new block groups that were previously allocated during the
course of the current transaction, which is what may cause a deadlock due
to attempts to write lock twice the same leaf or node, as when splitting
a leaf or node we are holding a write lock on it and its parent node.

The same type of deadlock can also happen when increasing the tree's
height, since we are holding a lock on the existing root while allocating
the tree block to use as the new root node.

An example trace when the deadlock happens during the leaf split path is:

  [27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G        W         4.19.16 #1
  [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
  [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
  (...)
  [27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
  [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
  [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
  [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
  [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
  [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
  [27175.303601] FS:  0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
  [27175.304468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
  [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [27175.307940] Call Trace:
  [27175.308802]  btrfs_search_slot+0x779/0x9a0 [btrfs]
  [27175.309669]  ? update_space_info+0xba/0xe0 [btrfs]
  [27175.310534]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
  [27175.311397]  btrfs_insert_item+0x60/0xd0 [btrfs]
  [27175.312253]  btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
  [27175.313116]  do_chunk_alloc+0x25f/0x300 [btrfs]
  [27175.313984]  find_free_extent+0x706/0x10d0 [btrfs]
  [27175.314855]  btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
  [27175.315707]  btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
  [27175.316548]  split_leaf+0x130/0x610 [btrfs]
  [27175.317390]  btrfs_search_slot+0x94d/0x9a0 [btrfs]
  [27175.318235]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
  [27175.319087]  alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
  [27175.319938]  __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
  [27175.320792]  btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
  [27175.321643]  delayed_ref_async_start+0x81/0x90 [btrfs]
  [27175.322491]  normal_work_helper+0xd0/0x320 [btrfs]
  [27175.323328]  ? move_linked_works+0x6e/0xa0
  [27175.324160]  process_one_work+0x191/0x370
  [27175.324976]  worker_thread+0x4f/0x3b0
  [27175.325763]  kthread+0xf8/0x130
  [27175.326531]  ? rescuer_thread+0x320/0x320
  [27175.327284]  ? kthread_create_worker_on_cpu+0x50/0x50
  [27175.328027]  ret_from_fork+0x35/0x40
  [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---

Fix this by preventing the flushing of new blocks groups when splitting a
leaf/node and when inserting a new root node for one of the trees modified
by the flushing operation, similar to what is done when COWing a node/leaf
from on of these trees.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
Reported-by: Eli V <eliventer@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-28 15:04:58 +01:00
Christoph Hellwig 4ea899ead2 iomap: fix a use after free in iomap_dio_rw
Introduce a local wait_for_completion variable to avoid an access to the
potentially freed dio struture after dropping the last reference count.

Also use the chance to document the completion behavior to make the
refcounting clear to the reader of the code.

Fixes: ff6a9292e6 ("iomap: implement direct I/O")
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-01-27 08:47:42 -08:00
Piotr Jaroszynski 8e47a45732 iomap: get/put the page in iomap_page_create/release()
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1.  This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.

Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().

It causes the move_pages() syscall to misbehave on memory mapped files
from xfs.  It does not not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall.  Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though.

Fixes: 82cb14175e ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
[hch: actually get/put the page iomap_migrate_page() to make it work
      properly]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-01-27 08:46:45 -08:00
Linus Torvalds 7c2614bf7a a set of small smb3 fixes, some fixing various crediting issues discovered during xfstest runs, five for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxLV8IACgkQiiy9cAdy
 T1HRIgv/fieDI6HBXAoK9mXzVLJXCZ1IxVZTygC3Xn/bCaxHerS+QSe88w0r1QTZ
 KLCoT7pBBy06uTowX4/cAF0IYTStsYWngBsH3+HL4EEchkZAVe02wAJ8QOCXs9YU
 UjoMvNLmcyrzwNt8EduNM2jUUpOL8EHiZOQKUvLt1Vdo3xBtyB3Nkttji92YLOQb
 Cxs0RJdKNaalMphOQwAtbdbZwlL79KGSA0gMm9W2VD744YKMjQ3uGXaTuoe6+w19
 voYKVTSelgyZg51yzF20x1Pl3OV/8L6tWgN1BIcKZbfGBshJhhcD93Pef7bmhxNS
 Y2yF3SPr/zcdgEkwigVlNs1uBCBBeU04/yQr+YvJwNcjsRTcTKOMuquxq4gXM1P8
 7f8gA1u0n1H70YKK29sd3sl7Iu8rHVacwlxjfoAcCjMe0VQvOJtR+V/zwP2uWr8T
 IjX05unBe/B1v+NXqNdkOpE6ZaiJtt2ny5NDZhfeubOouBLsey7/g/gSx48ICrq+
 U20kKsb6
 =CCH2
 -----END PGP SIGNATURE-----

Merge tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
 "A set of small smb3 fixes, some fixing various crediting issues
  discovered during xfstest runs, five for stable"

* tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData
  smb3: add credits we receive from oplock/break PDUs
  CIFS: Fix mounts if the client is low on credits
  CIFS: Do not assume one credit for async responses
  CIFS: Fix credit calculations in compound mid callback
  CIFS: Fix credit calculation for encrypted reads with errors
  CIFS: Fix credits calculations for reads with errors
  CIFS: Do not reconnect TCP session in add_credits()
  smb3: Cleanup license mess
  CIFS: Fix possible hang during async MTU reads and writes
  cifs: fix memory leak of an allocated cifs_ntsd structure
2019-01-26 15:38:22 -08:00
Linus Torvalds 6b8f915916 for-linus-20190125
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxLdgsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoVGD/4sGYQqfiXogQIJYbPH2RRPrJuLIIITjiAv
 lPXX1wx/tvz/ktwKJE2OiWTES0JjdH1HlC+7/0L/fLb8DXiKBmFuUHlwhureoL9Y
 o//BIQKuaje35kyHITTy2UAJOOqnNtJTaAP2AfkL+eOcj/V/G5rJIfLGs9QtuAR7
 sRJ+uhg1EbW/+uO0bULDmG6WUjxFu8mqcw3i6g0VVLVOnXB2EKcZTl3KPrdAXrUp
 XtmouERga6OfAUSJyZmTSV136mL+opRB2WFFVeIzjQfLmyItDGbSX/YPS8oJ2pow
 v7630F+CMrd4aKpqqtnAhfWpGqd0Xw7cYfZ9MKTJmZPmGzf9a1fQFpmgZosD4Dh3
 7MrhboU4TUt9PdXESA7CmE7LkTp99ghfj5/ysKrSV5h3HsH2RbLxJk91Rx3vmAWD
 u1xWRYL+GYLH6ZwOLvM1iqBrrLN3mUyrx98SaMgoXuqNzmQmgz9LPeA0Gt09FJbo
 uj+ebg4dRwuThjni4xQhl3zL2RQy7nlTDFDdKOz/XoiYk2NUVksss+sxGjNarHj0
 b5pCD4HOp57OreGExaOARpBRah5HSNdQtBRsIOnbyEq6f/e1LsIY23Z9nNF0deGO
 sZzgsbnsn+zg8bC6T/Gk4UY6XdCcgaS3SL04SVKAE3lO6A4C/Awo8DgD9Bl1zpC1
 HQlNkl5fBg==
 =iucY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190125' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A collection of fixes for this release. This contains:

   - Silence sparse rightfully complaining about non-static wbt
     functions (Bart)

   - Fixes for the zoned comments/ioctl documentation (Damien)

   - direct-io fix that's been lingering for a while (Ernesto)

   - cgroup writeback fix (Tejun)

   - Set of NVMe patches for nvme-rdma/tcp (Sagi, Hannes, Raju)

   - Block recursion tracking fix (Ming)

   - Fix debugfs command flag naming for a few flags (Jianchao)"

* tag 'for-linus-20190125' of git://git.kernel.dk/linux-block:
  block: Fix comment typo
  uapi: fix ioctl documentation
  blk-wbt: Declare local functions static
  blk-mq: fix the cmd_flag_name array
  nvme-multipath: drop optimization for static ANA group IDs
  nvmet-rdma: fix null dereference under heavy load
  nvme-rdma: rework queue maps handling
  nvme-tcp: fix timeout handler
  nvme-rdma: fix timeout handler
  writeback: synchronize sync(2) against cgroup writeback membership switches
  block: cover another queue enter recursion via BIO_QUEUE_ENTERED
  direct-io: allow direct writes to empty inodes
2019-01-26 12:42:41 -08:00
Greg Kroah-Hartman d88c93f090 debugfs: fix debugfs_rename parameter checking
debugfs_rename() needs to check that the dentries passed into it really
are valid, as sometimes they are not (i.e. if the return value of
another debugfs call is passed into this one.)  So fix this up by
properly checking if the two parent directories are errors (they are
allowed to be NULL), and if the dentry to rename is not NULL or an
error.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-25 12:56:32 +01:00
Ronnie Sahlberg a5f1a81f70 cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData
Was helpful in debug for some recent problems.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:06 -06:00
Ronnie Sahlberg 2e5700bdde smb3: add credits we receive from oplock/break PDUs
Otherwise we gradually leak credits leading to potential
hung session.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:06 -06:00
Pavel Shilovsky 6a9cbdd1ce CIFS: Fix mounts if the client is low on credits
If the server doesn't grant us at least 3 credits during the mount
we won't be able to complete it because query path info operation
requires 3 credits. Use the cached file handle if possible to allow
the mount to succeed.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:06 -06:00
Pavel Shilovsky 0fd1d37b05 CIFS: Do not assume one credit for async responses
If we don't receive a response we can't assume that the server
granted one credit. Assume zero credits in such cases.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:06 -06:00
Pavel Shilovsky 3d3003fce8 CIFS: Fix credit calculations in compound mid callback
The current code doesn't do proper accounting for credits
in SMB1 case: it adds one credit per response only if we get
a complete response while it needs to return it unconditionally.
Fix this and also include malformed responses for SMB2+ into
accounting for credits because such responses have Credit
Granted field, thus nothing prevents to get a proper credit
value from them.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:06 -06:00
Pavel Shilovsky ec678eae74 CIFS: Fix credit calculation for encrypted reads with errors
We do need to account for credits received in error responses
to read requests on encrypted sessions.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:05 -06:00
Pavel Shilovsky 8004c78c68 CIFS: Fix credits calculations for reads with errors
Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as
normal received response and process it appropriately.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:52:05 -06:00
Pavel Shilovsky ef68e83184 CIFS: Do not reconnect TCP session in add_credits()
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:

mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().

Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 14:50:57 -06:00
Linus Torvalds c04e2a780c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlxJ6LUACgkQnJ2qBz9k
 QNn75Qf/U60RPju54Kdyepo4iMFoQXS/VJg87bpRetUdN9vKU3CbnyYAx43ftPOk
 GH+UU2u+24DX2AWbzy4yx2CJP1YUUF1oa2QkSW2P9WWn1NYU3goWha1Il8bc96Kq
 TQU/ypl83RoaFtw4JwpL27ot39UH5WWafYiackqzOYIuAdpgei2aaZFNvVeb9Ije
 9udt7ygcTn94DVdYAFI6NeH5KAA+kzpIyWWRCWlmESsNYnCUQntSLm3tQG8sTHZt
 sXk2Kp1r7XFl/zkg9iFSOhSIIyVizSC4cHrE0iJ/0kLgoJjfjbOa1jZ5Eeq8yTc3
 UsXI5ICNS3RuZWwSWkh759JnhQ1R/A==
 =cB63
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull inotify fix from Jan Kara:
 "Fix a file refcount leak in an inotify error path"

* tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify: Fix fd refcount leak in inotify_add_watch().
2019-01-25 06:03:24 +13:00
Thomas Gleixner b0b2cac7e2 smb3: Cleanup license mess
Precise and non-ambiguous license information is important. The recently
added aegis header file has a SPDX license identifier, which is nice, but
at the same time it has a contradictionary license boiler plate text.

  SPDX-License-Identifier: GPL-2.0

versus

  *   This program is free software;  you can redistribute it and/or modify
  *   it under the terms of the GNU General Public License as published by
  *   the Free Software Foundation; either version 2 of the License, or
  *   (at your option) any later version.

Oh well.

Assuming that the SPDX identifier is correct and according to x86/hyper-v
contributions from Microsoft GPL V2 only is the usual license.

Remove the boiler plate as it is wrong and even if correct it is redundant.

Fixes: eccb4422cf ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 09:37:33 -06:00
Pavel Shilovsky acc58d0bab CIFS: Fix possible hang during async MTU reads and writes
When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.

Was able to reproduce this when server was configured
to give out fewer credits than usual.

The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-24 09:37:33 -06:00
Colin Ian King 73aaf920cc cifs: fix memory leak of an allocated cifs_ntsd structure
The call to SMB2_queary_acl can allocate memory to pntsd and also
return a failure via a call to SMB2_query_acl (and then query_info).
This occurs when query_info allocates the structure and then in
query_info the call to smb2_validate_and_copy_iov fails. Currently the
failure just returns without kfree'ing pntsd hence causing a memory
leak.

Currently, *data is allocated if it's not already pointing to a buffer,
so it needs to be kfree'd only if was allocated in query_info, so the
fix adds an allocated flag to track this.  Also set *dlen to zero on
an error just to be safe since *data is kfree'd.

Also set errno to -ENOMEM if the allocation of *data fails.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Dan Carpener <dan.carpenter@oracle.com>
2019-01-24 09:37:33 -06:00
Tejun Heo 7fc5854f8c writeback: synchronize sync(2) against cgroup writeback membership switches
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22 14:39:38 -07:00
Ernesto A. Fernández 8b9433eb4d direct-io: allow direct writes to empty inodes
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently
not allowed to create blocks for an empty inode.  This confusion comes
from trying to bit shift a negative number, so check the size of the
inode first.

The problem is most visible for hfsplus, because the fallback to
buffered I/O doesn't happen and the write fails with EIO.  This is in
part the fault of the module, because it gives a wrong return value on
->get_block(); that will be fixed in a separate patch.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22 08:26:44 -07:00
Thomas Gleixner 74827ee295 ceph: quota: cleanup license mess
Precise and non-ambiguous license information is important. The recently
added quota.c file has a SPDX license identifier, which is nice, but
at the same time it has a contradictionary license boiler plate text.

  SPDX-License-Identifier: GPL-2.0

versus

  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version 2
  * of the License, or (at your option) any later version.

Oh well.

As the other ceph related files are licensed under the GPL v2 only, it's
assumed that the SPDX id is correct and the boiler plate was randomly
copied into that patch.

Remove the boiler plate as it is wrong and even if correct it is redundant.

Fixes: fb18a57568 ("ceph: quota: add initial infrastructure to support cephfs quotas")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Henriques <lhenriques@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Acked-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-21 14:53:23 +01:00
Yan, Zheng d95e674c01 ceph: clear inode pointer when snap realm gets dropped by its inode
snap realm and corresponding inode have pointers to each other.
The two pointer should get clear at the same time. Otherwise,
snap realm's pointer may reference freed inode.

Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-21 14:52:41 +01:00
Linus Torvalds 1e556ba3b6 Fixes for pstore/ram
- Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan)
 - Avoid allocation and leak of platform data
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlxE/QYWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsVXD/94pJsNHogOle3XnZJst/1cV4+R
 KTyQ6QQ1184lI5FPGJliV3veCuKnTY4Q7e5vCl2mzEvds+VRxu2yUm+LFxKVj04W
 yiRNv8Yn3+3eKxNufqLa9ySbsSZyIpoCKQ/20nbC5cNM3r+Gq12GUnDBixtGdl9V
 fB316emIaKR70Qvi9KlfwDTcjh+SWBR2u8L0YsmTaoJkjtaCJhekN3hfya6Wnjf5
 1sj1fCDzteda/Ld0gXoImHPvt0UcHPJDp1+ZhY0ja64QIuoWwDxsisp4X4nS8iFD
 oak9BWRDhRwSzPnyCu8BI3Hc4ayxcBYR/vmBS12LTkIJVdR1CaqfeDYvZexyX04n
 TYZnmPeZ+2R3wzA9aCLjUDEAYNi403sLqIvR99jlrOoiMo+5Bf05TvBG8lND2dQg
 A13854vg9ssxfiuEmvxJTg8SLtcFiqEtrzZhqaLVfuf4cqaN3o0Ed+A0gjZadwwD
 TPxiMh4YEZuw+qszZwyzi7uECKOvJIJeLCL8CApPvihCNFy2vKF5wj58ZizXXmoT
 Orq6DJ1ITx4DTkYIgtiTC9HKDuvNDZ/ncdb5QoqlpoTe01r8GQOzffZ+tD3xRTNc
 8+yAdK0/bTjLVNbFX/Q4dwagh0I8EQSrdw8CmmQs0lsCvfyYTfHz8uPeBQ7rrzeN
 2r1H7a/OKpL/duTkEw==
 =m6sX
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fixes from Kees Cook:

 - Fix console ramoops to show the previous boot logs (Sai Prakash
   Ranjan)

 - Avoid allocation and leak of platform data

* tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Avoid allocation and leak of platform data
  pstore/ram: Fix console ramoops to show the previous boot logs
2019-01-21 13:12:03 +13:00
Kees Cook 5631e8576a pstore/ram: Avoid allocation and leak of platform data
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.

Reported-by: Yue Hu <huyue2@yulong.com>
Fixes: 35da60941e ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-20 14:44:52 -08:00
Linus Torvalds 1be969f468 for-5.0-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxElHsACgkQxWXV+ddt
 WDsF8Q/6A+l/Ku8D+xSmw+JGXGNroX1V62sxVYWqIgrkYNg6iSMjicqF3aN1bCbR
 LqvOQ5skerrKteNYIPbTTOD5Xp37ccinSeEWEF0ktFkeU1G6yJo7aRsnQlO1sduk
 2PKUOA1/PdeTBiOj1bej/1ybhtIW+d0MaoPtnUCMC8DD/ihfmU332+KC8VmRUYGZ
 4kT1DvKfBOSVz1UTl6OJdWo76crvjz0eGVnH1YG7DoFbpVzAbphHE7+aC/WkHQme
 X5Ux2NtYVMe/0IGAC7kJnj4jCZ1weAdlmvzzagjzGtWdhWgTxntiJ/FVJs9nO8Dm
 G/pVtD8RVjgMkPOWcfw5fdderrBJqjVGgl4VDrDLqjO9OTGNFJs+HgcPRi6Oli28
 sA+HG+U34YzSfKY0L9eAmpkNxMjWywBuXTQIAlMhHNZCL0vZ2K5EYa3dTp9OswAW
 IIcOh/LfZxiomvMvUqQWcRCy5y/b+cYjOjbHwkrw+ewd3IWXVLG8YLMyZI3vnHKu
 /f1xn6KCap9a1cS4LwyK6gzstEugn0MYmnmD/Jx8I1BJFBt55Q31ES6tPHgTmh/d
 QjveRjMkxNCql4h5Hq0+LiXoSoocBmsO0wrs2QrWSx4PBnJsvjySnWr/8GfAOj79
 BhnuQFxbr/BkNyBzvKrjoI+zZnrVm0cBU59lP6PzN75+kQTaIfs=
 =hGlM
 -----END PGP SIGNATURE-----

Merge tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A handful of fixes (some of them in testing for a long time):

   - fix some test failures regarding cleanup after transaction abort

   - revert of a patch that could cause a deadlock

   - delayed iput fixes, that can help in ENOSPC situation when there's
     low space and a lot data to write"

* tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: wakeup cleaner thread when adding delayed iput
  btrfs: run delayed iputs before committing
  btrfs: wait on ordered extents on abort cleanup
  btrfs: handle delayed ref head accounting cleanup in abort
  Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
2019-01-21 07:35:26 +13:00
Linus Torvalds b0efca46b5 NFS client fixes for Linux 5.0
Stable bugfixes:
 - Fix TCP receive code on archs with flush_dcache_page()
 
 Other bugfixes:
 - Fix error code in rpcrdma_buffer_create()
 - Fix a double free in rpcrdma_send_ctxs_create()
 - Fix kernel BUG at kernel/cred.c:825
 - Fix unnecessary retry in nfs42_proc_copy_file_range()
 - Ensure rq_bytes_sent is reset before request transmission
 - Ensure we respect the RPCSEC_GSS sequence number limit
 - Address Kerberos performance/behavior regression
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxCH1kACgkQ18tUv7Cl
 QOs4rBAAymqyhUzNgap1TX/KezFxqii7CVMVabrA5eGN+ZXbSVAZkwy7BMZWwVIp
 tEvD7lxWtF11x7bQDw7Xz+ruBCjLdD0RQIFnlBpVKqsRy9oSRA4PsgSbuIFaw+gX
 Bun4Z0xmOCPF7knRv6gQonArEZfHeokIIN8AtSBtWVByaOrnZwgDkNTIub8akpUl
 FQlzgq7lTydVzNcju2ImBeubU7KgFEu0F2Zub5z/iR+F2Mx/bAju8Q4YeVlPyD8U
 QJoIBlXAvgK8LK4bZCh40zPeEt0TMWXnW7o0JHgVQ0g6VbT+hp17I7fz91xEazye
 qbjpIJIjv5daEv0REM8t5ZCZB3tEatVjb4EQWXp0gJYb0l5E3I/O+7MO44n4uMYx
 s3UTxzM6NjwCtlgmn4tYUj+vEIExQHUUnwOl02e5iEa7bqNNY75ehAhj5Rh7iQBH
 H4b+OVuqc608q87rNePdK1LRyh0/u1cDI1kDAQoIP2omlb5hJQGk0Nuz9G2BodIj
 rP0x7nV+ykOXZtr6TR+RvaksL1W39PzVKYA0aL+e2gbcv4YO+Oq1phvNKwRWPM4a
 g08r/kvifS5h6/Jq8Wmn83f1vAOX7Sf23RtEoj+t9hc4S4JbsV2iYK3PY3eWbSYE
 Oz0Vt4gvBBJ+0rHJ10BsQ7686OQkyMKpIlvmx6O5mWVlthovbJM=
 =6Nzz
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "These are mostly fixes for SUNRPC bugs, with a single v4.2
  copy_file_range() fix mixed in.

  Stable bugfixes:
   - Fix TCP receive code on archs with flush_dcache_page()

  Other bugfixes:
   - Fix error code in rpcrdma_buffer_create()
   - Fix a double free in rpcrdma_send_ctxs_create()
   - Fix kernel BUG at kernel/cred.c:825
   - Fix unnecessary retry in nfs42_proc_copy_file_range()
   - Ensure rq_bytes_sent is reset before request transmission
   - Ensure we respect the RPCSEC_GSS sequence number limit
   - Address Kerberos performance/behavior regression"

* tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Address Kerberos performance/behavior regression
  SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit
  SUNRPC: Ensure rq_bytes_sent is reset before request transmission
  NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
  sunrpc: kernel BUG at kernel/cred.c:825!
  SUNRPC: Fix TCP receive code on archs with flush_dcache_page()
  xprtrdma: Double free in rpcrdma_sendctxs_create()
  xprtrdma: Fix error code in rpcrdma_buffer_create()
2019-01-20 09:27:38 +12:00
Linus Torvalds 0facb89245 for-linus-20190118
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxCLxcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmBID/0SpsLsnoeki37Wei0HzBZeRdFGIXFfg1XB
 +s66nJUL+igHIIwH+EO7Ct+vw8jfCWZfasI+ThGtkgRg88bqJO3HEbE1MQY8KlTh
 Dm9LRHH9Jp4OYbbag2RXammaD4XKi980fIyVTp9R0keszQroSUTXXS40fksszTtB
 K4LEBiAxGJgin60W9RgbbK93n/pw17U5Wu2WID6ZfVxZYVzpXl0iIQXUz1zmnpy5
 3aP7ORqJGnH7WQiV57LMpq5/QFBoKrfUsjtxpnJ+i2694sLYX8NCqnH82SUliSU+
 NN3Vxic1ozmZEADFKg4dfmgVqqyJaoVCRvsA5i5YjqG9m65G/S7/BmsaWbq5Ixf2
 N1KQr/36vkYGYWqtb2rbYRVf9XfbxWKp1wuSaX3N1Vh1/Ee180qD68+wpa/0mtkh
 2dXV6CQP9De4BXH6mxj3Yr7LI2a2r7KLYhULZCAvLMvDtsBrPyHAmRRgXxaLlKL8
 QMFYYO/1wjlzpGnn4IxP8sDMH6N2shtZHAmC2n9TM2gmk5tA0OmajJJ4uxNbtyhw
 +ZlDco3aw/x0v2onjlbTiPij2cReRgwN+tVaEvn0dj3uxaisCru60iYdnr9FX4nL
 g3NxNUJbUhl6YMwvNMI0GKfz1IILzjPgJIhDqSHvYhtCA1w0DMVXs3Qv4HXoVh3m
 ffdby6/Yqw==
 =5yJp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - block size setting fixes for loop/nbd (Jan Kara)

 - md bio_alloc_mddev() cleanup (Marcos)

 - Ensure we don't lose the REQ_INTEGRITY flag (Ming)

 - Two NVMe fixes by way of Christoph:
    - Fix NVMe IRQ calculation (Ming)
    - Uninitialized variable in nvmet-tcp (Sagi)

 - BFQ comment fix (Paolo)

 - License cleanup for recently added blk-mq-debugfs-zoned (Thomas)

* tag 'for-linus-20190118' of git://git.kernel.dk/linux-block:
  block: Cleanup license notice
  nvme-pci: fix nvme_setup_irqs()
  nvmet-tcp: fix uninitialized variable access
  block: don't lose track of REQ_INTEGRITY flag
  blockdev: Fix livelocks on loop device
  nbd: Use set_blocksize() to set device blocksize
  md: Make bio_alloc_mddev use bio_alloc_bioset
  block, bfq: fix comments on __bfq_deactivate_entity
2019-01-20 09:12:50 +12:00
Josef Bacik fd340d0f68 btrfs: wakeup cleaner thread when adding delayed iput
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path.  Delaying iputs
means we are potentially delaying the eviction of an inode and it's
respective space.  The cleaner thread only gets woken up every 30
seconds, or when we require space.  If there are a lot of inodes that
need to be deleted we could induce a serious amount of latency while we
wait for these inodes to be evicted.  So instead wakeup the cleaner if
it's not already awake to process any new delayed iputs we add to the
list.  If we suddenly need space we will less likely be backed up
behind a bunch of inodes that are waiting to be deleted, and we could
possibly free space before we need to get into the flushing logic which
will save us some latency.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:27:23 +01:00
Josef Bacik 3ec9a4c81c btrfs: run delayed iputs before committing
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:27:21 +01:00
Josef Bacik 74d5d229b1 btrfs: wait on ordered extents on abort cleanup
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening.  Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.

generic/475 can produce this warning:

 [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G        W 5.0.0-rc1-default+ #394
 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
 [ 8531.202707] FS:  00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
 [ 8531.204851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
 [ 8531.207516] Call Trace:
 [ 8531.208175]  btrfs_free_fs_roots+0xdb/0x170 [btrfs]
 [ 8531.210209]  ? wait_for_completion+0x5b/0x190
 [ 8531.211303]  close_ctree+0x157/0x350 [btrfs]
 [ 8531.212412]  generic_shutdown_super+0x64/0x100
 [ 8531.213485]  kill_anon_super+0x14/0x30
 [ 8531.214430]  btrfs_kill_super+0x12/0xa0 [btrfs]
 [ 8531.215539]  deactivate_locked_super+0x29/0x60
 [ 8531.216633]  cleanup_mnt+0x3b/0x70
 [ 8531.217497]  task_work_run+0x98/0xc0
 [ 8531.218397]  exit_to_usermode_loop+0x83/0x90
 [ 8531.219324]  do_syscall_64+0x15b/0x180
 [ 8531.220192]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 [ 8531.221286] RIP: 0033:0x7fc68e5e4d07
 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000

Leaving a tree in the rb-tree:

3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855         iput(root->ino_cache_inode);
3856         WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));

CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add stacktrace ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:24:19 +01:00
Josef Bacik 31890da0bf btrfs: handle delayed ref head accounting cleanup in abort
We weren't doing any of the accounting cleanup when we aborted
transactions.  Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.

The test generic/475 on a 2G VM can trigger the problems eg.:

  [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
  [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
  [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
  [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
  [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206
  [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000
  [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400
  [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000
  [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108
  [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100
  [ 8502.170296] FS:  00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000
  [ 8502.172439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0
  [ 8502.175094] Call Trace:
  [ 8502.175759]  close_ctree+0x17f/0x350 [btrfs]
  [ 8502.176721]  generic_shutdown_super+0x64/0x100
  [ 8502.177702]  kill_anon_super+0x14/0x30
  [ 8502.178607]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8502.179602]  deactivate_locked_super+0x29/0x60
  [ 8502.180595]  cleanup_mnt+0x3b/0x70
  [ 8502.181406]  task_work_run+0x98/0xc0
  [ 8502.182255]  exit_to_usermode_loop+0x83/0x90
  [ 8502.183113]  do_syscall_64+0x15b/0x180
  [ 8502.183919]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Corresponding to

  release_global_block_rsv() {
  ...
  WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);

CC: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add log dump ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:10:04 +01:00
David Sterba 77b7aad195 Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
This reverts commit e73e81b6d0.

This patch causes a few problems:

- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
  more work from btrfs_btree_balance_dirty_nodelay could end up in the
  same workque, effectively deadlocking

12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Transaction commit will wait on the freespace cache:

838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

And then writepages ends up waiting on transaction commit:

9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.

The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.

Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:09:55 +01:00
Linus Torvalds a3a80255d5 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXECcyPu3V2unywtrAQIpGw//ctcGHg2sfUEra17pvlKEbpZe9bMeJxp6
 UY2YR5gPpiYMmvNe8hLl3I68b43h03jGOx+KmqowqX7anNq3o2nMy0pDbuGmKtuS
 5NmIOECAml8k2uPSpacAF2s6TsxB2lTDYwZdyeuRmZ4scOTujNby33RlijGIxX4s
 87WJFRuCacm9I1KkiKKn4PWoYGjDdsZ7rsDyEeBmQ/MiKOSLG4QP5XuNr4X9zFMX
 r8uF3N8h/NzJWefEirc2DPFfiWLJqkyclq9tgsTB1Z3l+x5u/MHnIg3rZpZhH0uC
 GhGWjlGYqTxOwzYCaOOsNIDRF4rAGPi3lzuJXjONnhvbOO7DCGJ+Mo2obxkAqLL6
 PrtFuQvgXIOl/k8y6AdckuPPu/OMHT1hyY0PQXmTGhHAAfPP3RHoPl7owEjmAbRg
 hvRkFDSIKZ4Kr1nKP4vwaJYEKtxUQrkOwZKmN6ve31ZeJsyrH12MsCgWvp7oQwRJ
 fVbk8DWVRtYzy4RaO3Xr+0WfD+03dDi6KKCPPiC2gtNKOO+1Kco/EtIqPi6SXg0m
 ee/mFOkRsmEh++iNxS58qLxH37On6GSYOElSIMN0NJDNA2TzLrvidXlMSOTD1hj4
 n6gL38E3br/CIimKPYm87qi6yC59CAtrJCulYiPOoMc20eaEUP4DIXN8yjgjLNQp
 Hj5M9GRTwXk=
 =cUrU
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Here's a set of fixes for AFS:

   - Use struct_size() for kzalloc() size calculation.

   - When calling YFS.CreateFile rather than AFS.CreateFile, it is
     possible to create a file with a file lock already held. The
     default value indicating no lock required is actually -1, not 0.

   - Fix an oops in inode/vnode validation if the target inode doesn't
     have a server interest assigned (ie. a server that will notify us
     of changes by third parties).

   - Fix refcounting of keys in file locking.

   - Fix a race in refcounting asynchronous operations in the event of
     an error during request transmission. The provision of a dedicated
     function to get an extra ref on a call is split into a separate
     commit"

* tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix race in async call refcounting
  afs: Provide a function to get a ref on a call
  afs: Fix key refcounting in file locking code
  afs: Don't set vnode->cb_s_break in afs_validate()
  afs: Set correct lock type for the yfs CreateFile
  afs: Use struct_size() in kzalloc()
2019-01-18 06:27:24 +12:00
Sai Prakash Ranjan 6a4c9ab13f pstore/ram: Fix console ramoops to show the previous boot logs
commit b05c950698 ("pstore/ram: Simplify ramoops_get_next_prz()
arguments") changed update assignment in getting next persistent ram zone
by adding a check for record type. But the check always returns true since
the record type is assigned 0. And this breaks console ramoops by showing
current console log instead of previous log on warm reset and hard reset
(actually hard reset should not be showing any logs).

Fix this by having persistent ram zone type check instead of record type
check. Tested this on SDM845 MTP and dragonboard 410c.

Reproducing this issue is simple as below:

1. Trigger hard reset and mount pstore. Will see console-ramoops
   record in the mounted location which is the current log.

2. Trigger warm reset and mount pstore. Will see the current
   console-ramoops record instead of previous record.

Fixes: b05c950698 ("pstore/ram: Simplify ramoops_get_next_prz() arguments")
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[kees: dropped local variable usage]
Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-17 09:14:06 -08:00
David Howells 34fa47612b afs: Fix race in async call refcounting
There's a race between afs_make_call() and afs_wake_up_async_call() in the
case that an error is returned from rxrpc_kernel_send_data() after it has
queued the final packet.

afs_make_call() will try and clean up the mess, but the call state may have
been moved on thereby causing afs_process_async_call() to also try and to
delete the call.

Fix this by:

 (1) Getting an extra ref for an asynchronous call for the call itself to
     hold.  This makes sure the call doesn't evaporate on us accidentally
     and will allow the call to be retained by the caller in a future
     patch.  The ref is released on leaving afs_make_call() or
     afs_wait_for_call_to_complete().

 (2) In the event of an error from rxrpc_kernel_send_data():

     (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
     	 call has been aborted and ended.  This prevents
     	 afs_deliver_to_call() from doing anything with any notifications
     	 it gets.

     (b) Explicitly end the call immediately to prevent further callbacks.

     (c) Cancel any queued async_work and wait for the work if it's
     	 executing.  This allows us to be sure the race won't recur when we
     	 change the state.  We put the work queue's ref on the call if we
     	 managed to cancel it.

     (d) Put the call's ref that we got in (1).  This belongs to us as long
     	 as the call is in state AFS_CALL_CL_REQUESTING.

Fixes: 341f741f04 ("afs: Refcount the afs_call struct")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
David Howells 7a75b0079a afs: Provide a function to get a ref on a call
Provide a function to get a reference on an afs_call struct.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
David Howells 59d49076ae afs: Fix key refcounting in file locking code
Fix the refcounting of the authentication keys in the file locking code.
The vnode->lock_key member points to a key on which it expects to be
holding a ref, but it isn't always given an extra ref, however.

Fixes: 0fafdc9f88 ("afs: Fix file locking")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
Marc Dionne 4882a27cec afs: Don't set vnode->cb_s_break in afs_validate()
A cb_interest record is not necessarily attached to the vnode on entry to
afs_validate(), which can cause an oops when we try to bring the vnode's
cb_s_break up to date in the default case (ie. no current callback promise
and the vnode has not been deleted).

Fix this by simply removing the line, as vnode->cb_s_break will be set when
needed by afs_register_server_cb_interest() when we next get a callback
promise from RPC call.

The oops looks something like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    ...
    RIP: 0010:afs_validate+0x66/0x250 [kafs]
    ...
    Call Trace:
     afs_d_revalidate+0x8d/0x340 [kafs]
     ? __d_lookup+0x61/0x150
     lookup_dcache+0x44/0x70
     ? lookup_dcache+0x44/0x70
     __lookup_hash+0x24/0xa0
     do_unlinkat+0x11d/0x2c0
     __x64_sys_unlink+0x23/0x30
     do_syscall_64+0x4d/0xf0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: ae3b7361dc ("afs: Fix validation/callback interaction")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:15:52 +00:00
Miklos Szeredi a2ebba8241 fuse: decrement NR_WRITEBACK_TEMP on the right page
NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
the page cache page.

Fixes: 8b284dc472 ("fuse: writepages: handle same page rewrites")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16 10:27:59 +01:00
Jann Horn 9509941e9c fuse: call pipe_buf_release() under pipe lock
Some of the pipe_buf_release() handlers seem to assume that the pipe is
locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
without taking any extra locks. From a glance through the callers of
pipe_buf_release(), it looks like FUSE is the only one that calls
pipe_buf_release() without having the pipe locked.

This bug should only lead to a memory leak, nothing terrible.

Fixes: dd3bb14f44 ("fuse: support splice() writing to fuse device")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16 10:27:59 +01:00
Miklos Szeredi 8a3177db59 cuse: fix ioctl
cuse_process_init_reply() doesn't initialize fc->max_pages and thus all
cuse bases ioctls fail with ENOMEM.

Reported-by: Andreas Steinmetz <ast@domdv.de>
Fixes: 5da784cce4 ("fuse: add max_pages to init_out")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16 10:27:59 +01:00
Miklos Szeredi 97e1532ef8 fuse: handle zero sized retrieve correctly
Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.

Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Fixes: b2430d7567 ("fuse: add per-page descriptor <offset, length> to fuse_req")
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16 10:27:59 +01:00
Olga Kornievskaia 45ac486ecf NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
Currently nfs42_proc_copy_file_range() can not return EAGAIN.

Fixes: e4648aa4f9 ("NFS recover from destination server reboot for copies")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-15 11:24:49 -05:00
Jan Kara 04906b2f54 blockdev: Fix livelocks on loop device
bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
 init_page_buffers+0x3e2/0x530 fs/buffer.c:904
 grow_dev_page fs/buffer.c:947 [inline]
 grow_buffers fs/buffer.c:1009 [inline]
 __getblk_slow fs/buffer.c:1036 [inline]
 __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
 __bread_gfp+0x2d/0x310 fs/buffer.c:1347
 sb_bread include/linux/buffer_head.h:307 [inline]
 fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
 fat_ent_read_block fs/fat/fatent.c:441 [inline]
 fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
 __fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-15 07:30:56 -07:00
Linus Torvalds 6b529fb0a3 for-5.0-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlw7Y68ACgkQxWXV+ddt
 WDs+Jw/9Hn71GLb1YNpNUlzJSQHUzbvFtXbtvEVyYeuuLkmjTG4FT2bkGqYhAeC9
 2jvaqPaxI0VJ7vNulAaMVZayFwNEQrF9p+z+vzV/Ty2lu0Ep/7Uuwp9KL8X49req
 5hEtb5p9Dbj9hXqa8XNOfF2GPDRTQ6D4eknYiNM7MY+aTkvMEtpuZg9kfwXrZ2au
 JeFBzZo9SA5nxqrXZg1XlRYttDnOf44h6YJmEFOZOJuAouKcd5I7C8BshCRKDuWo
 iJBjYTBTFqjgUgCrn10UM92T19hKufHiDE3WWQ/7zykqtThpKgBFR1opAUcNBJBf
 HvOOsYnZcGbxIKOjFpucSpButTmjFcnI83f/7dmZYXUsyIzP/xH51uLiz/CLJqWj
 JsRgtgtJ7l5s1M8GkFG6B9Tp89KxHnVKqNC5HyX+4AuFWiIJ+CWWSddGqNSfAIHe
 o2ceQRMumvwbVKgfd0AfPcZ9v4sRM/DfwxWXgEQmCSNJikSuUjP9b3D51ttnXQ8c
 q6lz7L6nRKK4mgBnfJCmpus3IArdvNwJ7CF8C1RejfRwL824RzH+yHjWdg36nVXB
 oaBYKJf8GRRn+3In1W6npr65NxCQERIZV4M89EgST/RK7tW5u0Fotd1KJCmo+ayO
 cSRbp16On9G6gBV+qrBs8X65KIWALZpdTycwyFplCU581DXdtnU=
 =aCB2
 -----END PGP SIGNATURE-----

Merge tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - two regression fixes in clone/dedupe ioctls, the generic check
   callback needs to lock extents properly and wait for io to avoid
   problems with writeback and relocation

 - fix deadlock when using free space tree due to block group creation

 - a recently added check refuses a valid fileystem with seeding device,
   make that work again with a quickfix, proper solution needs more
   intrusive changes

* tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Use real device structure to verify dev extent
  Btrfs: fix deadlock when using free space tree due to block group creation
  Btrfs: fix race between reflink/dedupe and relocation
  Btrfs: fix race between cloning range ending at eof and writeback
2019-01-14 05:55:51 +12:00
Linus Torvalds 72d657dd21 Driver core fixes for 5.0-rc2
Here is one small sysfs change, and a documentation update for 5.0-rc2
 
 The sysfs change moves from using BUG_ON to WARN_ON, as discussed in an
 email thread on lkml while trying to track down another driver bug.
 sysfs should not be crashing and preventing people from seeing where
 they went wrong.  Now it properly recovers and warns the developer.
 
 The documentation update removes the use of BUS_ATTR() as the kernel is
 moving away from this to use the specific BUS_ATTR_RW() and friends
 instead.  There are pending patches in all of the different subsystems
 to remove the last users of this macro, but for now, don't advertise it
 should be used anymore to keep new ones from being introduced.
 
 Both have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXDsRFA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynbgACfTY/rAZpWTgMdPDfoOmF+s/XHQXsAoJKZ+v+f
 Tpkcw76Wo1ESpPLuT1u1
 =W0D+
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here is one small sysfs change, and a documentation update for 5.0-rc2

  The sysfs change moves from using BUG_ON to WARN_ON, as discussed in
  an email thread on lkml while trying to track down another driver bug.
  sysfs should not be crashing and preventing people from seeing where
  they went wrong. Now it properly recovers and warns the developer.

  The documentation update removes the use of BUS_ATTR() as the kernel
  is moving away from this to use the specific BUS_ATTR_RW() and friends
  instead. There are pending patches in all of the different subsystems
  to remove the last users of this macro, but for now, don't advertise
  it should be used anymore to keep new ones from being introduced.

  Both have been in linux-next with no reported issues"

* tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: driver core: remove use of BUS_ATTR
  sysfs: convert BUG_ON to WARN_ON
2019-01-14 05:51:08 +12:00
Linus Torvalds 0f9d140a56 a set of cifs/smb3 fixes, 4 for stable, most from Pavel. His patches fix an important set of crediting (flow control) problems, and also two problems in cifs_writepages, ddressing some large i/o and also compounding issues
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlw6VMAACgkQiiy9cAdy
 T1GA4AwAiLxAK9gYkv7oCi4xkr+GeDr9/J0R4JZOglXrJQfbEDAgejpTqIVOeiTW
 n2fDEe4JCj3Dk9VnhGIAcV3Lyp5pcXuiqKwBJsxlJIIM81rDa98K8NgpJqjFDt1P
 QEsGHijO7/5rlefnMPUy0YrFdoGr9ZjDEYDx2RdTkuRAX2rqLI02ytqyB6AHMjSM
 zx5Ck1JdltrZa9vN1k2QoUW92FLrHWHDQM6ooRTiTv1n5v3CtVCFNFEUD9AWCdKy
 APFXva3cWPAh6Dvsh6Qtryn2WdSfDdy69iiqfkBrDaaLsJC0oCogK9TsQOL66Szu
 onpXQROqxvOA0UoyJJgRaDhRjGzRLglChpitS9ps4ZqziXRMpKBsn9SzaZiJ3Jna
 /u5Pv5CFr4JkXRnvLAx2h6coZ+FyfVmSgxtoCU+tdn8G0t2AXl7swUw2e2UX7k4y
 FR20/w0hA9pDVSlrY1Z+Yg7TCZLS5677+6xFo+b+rnF6VW8XZQc/2F9MYlxyGAj5
 30R/Q0ck
 =410H
 -----END PGP SIGNATURE-----

Merge tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "A set of cifs/smb3 fixes, 4 for stable, most from Pavel. His patches
  fix an important set of crediting (flow control) problems, and also
  two problems in cifs_writepages, ddressing some large i/o and also
  compounding issues"

* tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  CIFS: Fix error paths in writeback code
  CIFS: Move credit processing to mid callbacks for SMB3
  CIFS: Fix credits calculation for cancelled requests
  cifs: Fix potential OOB access of lock element array
  cifs: Limit memory used by lock request calls to a page
  cifs: move large array from stack to heap
  CIFS: Do not hide EINTR after sending network packets
  CIFS: Fix credit computation for compounded requests
  CIFS: Do not set credits to 1 if the server didn't grant anything
  CIFS: Fix adjustment of credits for MTU requests
  cifs: Fix a tiny potential memory leak
  cifs: Fix a debug message
2019-01-14 05:43:40 +12:00
Linus Torvalds f87092c433 A patch to allow setting abort_on_full and a fix for an old "rbd unmap"
edge case, marked for stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAlw4yfETHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi9loB/4iTBs/c5vI8oXSi+n4IAAG/LunP8Ff
 HfxMr7Mpbpp0cLdzmJWO63Wejii95w7IkdepJTF5VlpH6dwJDUL6jSAyAXPO1a2G
 phZHmnGYjFBKLGPftCZmGQCJeYeS9JJyRgTC4IZECVkQRiPRGXn8cAbEqMYOy6Am
 mXmy0u4Me3RJTpcOCqaPPugontvtsZdfqvwVE2dGqXTj9Kh1hfum5ddrvf1CXugQ
 MpXzW3mJCMMfcXlZOCVuQInKqPT4HRhTpzJxdECiJTxBXDQUd6e0ekOXmKW6jOYk
 ZpNVnXHGee0uvzwQC0/VMWkRF6d+pOsYFQkDlX79h04rqnoBkUNc2Doe
 =dJev
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A patch to allow setting abort_on_full and a fix for an old "rbd
  unmap" edge case, marked for stable"

* tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client:
  rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
  ceph: use vmf_error() in ceph_filemap_fault()
  libceph: allow setting abort_on_full for rbd
2019-01-11 12:17:30 -08:00
Steve French 48d2ba6257 cifs: update internal module version number
To 2.16

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:40 -06:00
Pavel Shilovsky 9a66396f18 CIFS: Fix error paths in writeback code
This patch aims to address writeback code problems related to error
paths. In particular it respects EINTR and related error codes and
stores and returns the first error occurred during writeback.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:40 -06:00
Pavel Shilovsky ee258d7915 CIFS: Move credit processing to mid callbacks for SMB3
Currently we account for credits in the thread initiating a request
and waiting for a response. The demultiplex thread receives the response,
wakes up the thread and the latter collects credits from the response
buffer and add them to the server structure on the client. This approach
is not accurate, because it may race with reconnect events in the
demultiplex thread which resets the number of credits.

Fix this by moving credit processing to new mid callbacks that collect
credits granted by the server from the response in the demultiplex thread.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:40 -06:00
Pavel Shilovsky 8a26f0f781 CIFS: Fix credits calculation for cancelled requests
If a request is cancelled, we can't assume that the server returns
1 credit back. Instead we need to wait for a response and process
the number of credits granted by the server.

Create a separate mid callback for cancelled request, parse the number
of credits in a response buffer and add them to the client's credits.
If the didn't get a response (no response buffer available) assume
0 credits granted. The latter most probably happens together with
session reconnect, so the client's credits are adjusted anyway.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:40 -06:00
Ross Lagerwall b9a74cde94 cifs: Fix potential OOB access of lock element array
If maxBuf is small but non-zero, it could result in a zero sized lock
element array which we would then try and access OOB.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-11 07:14:40 -06:00
Ross Lagerwall 92a8109e4d cifs: Limit memory used by lock request calls to a page
The code tries to allocate a contiguous buffer with a size supplied by
the server (maxBuf). This could fail if memory is fragmented since it
results in high order allocations for commonly used server
implementations. It is also wasteful since there are probably
few locks in the usual case. Limit the buffer to be no larger than a
page to avoid memory allocation failures due to fragmentation.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:40 -06:00
Aurelien Aptel 15bc77f94e cifs: move large array from stack to heap
This addresses some compile warnings that you can
see depending on configuration settings.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:14:39 -06:00
Pavel Shilovsky ee13919c2e CIFS: Do not hide EINTR after sending network packets
Currently we hide EINTR code returned from sock_sendmsg()
and return 0 instead. This makes a caller think that we
successfully completed the network operation which is not
true. Fix this by properly returning EINTR to callers.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-11 07:13:05 -06:00
Pavel Shilovsky 8544f4aa9d CIFS: Fix credit computation for compounded requests
In SMB3 protocol every part of the compound chain consumes credits
individually, so we need to call wait_for_free_credits() for each
of the PDUs in the chain. If an operation is interrupted, we must
ensure we return all credits taken from the server structure back.

Without this patch server can sometimes disconnect the session
due to credit mismatches, especially when first operation(s)
are large writes.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-10 14:32:38 -06:00
Pavel Shilovsky 33fa5c8b8a CIFS: Do not set credits to 1 if the server didn't grant anything
Currently we reset the number of total credits granted by the server
to 1 if the server didn't grant us anything int the response. This
violates the SMB3 protocol - we need to trust the server and use
the credit values from the response. Fix this by removing the
corresponding code.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-10 14:32:36 -06:00
Pavel Shilovsky b983f7e923 CIFS: Fix adjustment of credits for MTU requests
Currently for MTU requests we allocate maximum possible credits
in advance and then adjust them according to the request size.
While we were adjusting the number of credits belonging to the
server, we were skipping adjustment of credits belonging to the
request. This patch fixes it by setting request credits to
CreditCharge field value of SMB2 packet header.

Also ask 1 credit more for async read and write operations to
increase parallelism and match the behavior of other operations.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-10 14:32:32 -06:00
Dan Carpenter c715f89c4d cifs: Fix a tiny potential memory leak
The most recent "it" allocation is leaked on this error path.  I
believe that small allocations always succeed in current kernels so
this doesn't really affect run time.

Fixes: 54be1f6c1c ("cifs: Add DFS cache routines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-10 14:32:30 -06:00
Dan Carpenter 8428817dc4 cifs: Fix a debug message
This debug message was never shown because it was checking for NULL
returns but extract_hostname() returns error pointers.

Fixes: 93d5cb517d ("cifs: Add support for failover in cifs_reconnect()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2019-01-10 14:32:27 -06:00
Marc Dionne 5edc22cc1d afs: Set correct lock type for the yfs CreateFile
A lock type of 0 is "LockRead", which makes the fileserver record an
unintentional read lock on the new file.  This will cause problems
later on if the file is the subject of locking operations.

The correct default value should be -1 ("LockNone").

Fix the operation marshalling code to set the value and provide an enum to
symbolise the values whilst we're at it.

Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10 17:12:05 +00:00
Gustavo A. R. Silva c2b8bd49d3 afs: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10 17:12:05 +00:00
Qu Wenruo 1b3922a8bc btrfs: Use real device structure to verify dev extent
[BUG]
Linux v5.0-rc1 will fail fstests/btrfs/163 with the following kernel
message:

  BTRFS error (device dm-6): dev extent devid 1 physical offset 13631488 len 8388608 is beyond device boundary 0
  BTRFS error (device dm-6): failed to verify dev extents against chunks: -117
  BTRFS error (device dm-6): open_ctree failed

[CAUSE]
Commit cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent
mapping check") introduced strict check on dev extents.

We use btrfs_find_device() with dev uuid and fs uuid set to NULL, and
only dependent on @devid to find the real device.

For seed devices, we call clone_fs_devices() in open_seed_devices() to
allow us search seed devices directly.

However clone_fs_devices() just populates devices with devid and dev
uuid, without populating other essential members, like disk_total_bytes.

This makes any device returned by btrfs_find_device(fs_info, devid,
NULL, NULL) is just a dummy, with 0 disk_total_bytes, and any dev
extents on the seed device will not pass the device boundary check.

[FIX]
This patch will try to verify the device returned by btrfs_find_device()
and if it's a dummy then re-search in seed devices.

Fixes: cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent mapping check")
CC: stable@vger.kernel.org # 4.19+
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-10 17:13:00 +01:00
Filipe Manana a6d8654d88 Btrfs: fix deadlock when using free space tree due to block group creation
When modifying the free space tree we can end up COWing one of its extent
buffers which in turn might result in allocating a new chunk, which in
turn can result in flushing (finish creation) of pending block groups. If
that happens we can deadlock because creating a pending block group needs
to update the free space tree, and if any of the updates tries to modify
the same extent buffer that we are COWing, we end up in a deadlock since
we try to write lock twice the same extent buffer.

So fix this by skipping pending block group creation if we are COWing an
extent buffer from the free space tree. This is a case missed by commit
5ce555578e ("Btrfs: fix deadlock when writing out free space caches").

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202173
Fixes: 5ce555578e ("Btrfs: fix deadlock when writing out free space caches")
CC: stable@vger.kernel.org # 4.18+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-09 14:52:36 +01:00
Filipe Manana d8b5524242 Btrfs: fix race between reflink/dedupe and relocation
The recent rework that makes btrfs' remap_file_range operation use the
generic helper generic_remap_file_range_prep() introduced a race between
relocation and reflinking (for both cloning and deduplication) the file
extents between the source and destination inodes.

This happens because we no longer lock the source range anymore, and we do
not lock it anymore because we wait for direct IO writes and writeback to
complete early on the code path right after locking the inodes, which
guarantees no other file operations interfere with the reflinking. However
there is one exception which is relocation, since it replaces the byte
number of file extents items in the fs tree after locking the range the
file extent items represent. This is a problem because after finding each
file extent to clone in the fs tree, the reflink process copies the file
extent item into a local buffer, releases the search path, inserts new
file extent items in the destination range and then increments the
reference count for the extent mentioned in the file extent item that it
previously copied to the buffer. If right after copying the file extent
item into the buffer and releasing the path the relocation process
updates the file extent item to point to the new extent, the reflink
process ends up creating a delayed reference to increment the reference
count of the old extent, for which the relocation process already created
a delayed reference to drop it. This results in failure to run delayed
references because we will attempt to increment the count of a reference
that was already dropped. This is illustrated by the following diagram:

        CPU 1                                       CPU 2

                                        relocation is running

  btrfs_clone_files()

    btrfs_clone()
      --> finds extent item
          in source range
          point to extent
          at bytenr X
      --> copies it into a
          local buffer
      --> releases path

                                        replace_file_extents()
                                          --> successfully locks the
                                              range represented by
                                              the file extent item
                                          --> replaces disk_bytenr
                                              field in the file
                                              extent item with some
                                              other value Y
                                          --> creates delayed reference
                                              to increment reference
                                              count for extent at
                                              bytenr Y
                                          --> creates delayed reference
                                              to drop the extent at
                                              bytenr X

      --> starts transaction
      --> creates delayed
          reference to
          increment extent
          at bytenr X

                    <delayed references are run, due to a transaction
                     commit for example, and the transaction is aborted
                     with -EIO because we attempt to increment reference
                     count for the extent at bytenr X after we freed it>

When this race is hit the running transaction ends up getting aborted with
an -EIO error and a trace like the following is produced:

[ 4382.553858] WARNING: CPU: 2 PID: 3648 at fs/btrfs/extent-tree.c:1552 lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556293] CPU: 2 PID: 3648 Comm: btrfs Tainted: G        W         4.20.0-rc6-btrfs-next-41 #1
[ 4382.556294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[ 4382.556308] RIP: 0010:lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556310] RSP: 0018:ffffac784408f738 EFLAGS: 00010202
[ 4382.556311] RAX: 0000000000000001 RBX: ffff8980673c3a48 RCX: 0000000000000001
[ 4382.556312] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
[ 4382.556312] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 4382.556313] R10: 0000000000000001 R11: ffff897f40000000 R12: 0000000000001000
[ 4382.556313] R13: 00000000c224f000 R14: ffff89805de9bd40 R15: ffff8980453f4548
[ 4382.556315] FS:  00007f5e759178c0(0000) GS:ffff89807b300000(0000) knlGS:0000000000000000
[ 4382.563130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4382.563562] CR2: 00007f2e9789fcbc CR3: 0000000120512001 CR4: 00000000003606e0
[ 4382.564005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4382.564451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4382.564887] Call Trace:
[ 4382.565343]  insert_inline_extent_backref+0x55/0xe0 [btrfs]
[ 4382.565796]  __btrfs_inc_extent_ref.isra.60+0x88/0x260 [btrfs]
[ 4382.566249]  ? __btrfs_run_delayed_refs+0x93/0x1650 [btrfs]
[ 4382.566702]  __btrfs_run_delayed_refs+0xa22/0x1650 [btrfs]
[ 4382.567162]  btrfs_run_delayed_refs+0x7e/0x1d0 [btrfs]
[ 4382.567623]  btrfs_commit_transaction+0x50/0x9c0 [btrfs]
[ 4382.568112]  ? _raw_spin_unlock+0x24/0x30
[ 4382.568557]  ? block_rsv_release_bytes+0x14e/0x410 [btrfs]
[ 4382.569006]  create_subvol+0x3c8/0x830 [btrfs]
[ 4382.569461]  ? btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.569906]  btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.570383]  ? rcu_sync_lockdep_assert+0xe/0x60
[ 4382.570822]  ? __sb_start_write+0xd4/0x1c0
[ 4382.571262]  ? mnt_want_write_file+0x24/0x50
[ 4382.571712]  btrfs_ioctl_snap_create_transid+0x117/0x1a0 [btrfs]
[ 4382.572155]  ? _copy_from_user+0x66/0x90
[ 4382.572602]  btrfs_ioctl_snap_create+0x66/0x80 [btrfs]
[ 4382.573052]  btrfs_ioctl+0x7c1/0x30e0 [btrfs]
[ 4382.573502]  ? mem_cgroup_commit_charge+0x8b/0x570
[ 4382.573946]  ? do_raw_spin_unlock+0x49/0xc0
[ 4382.574379]  ? _raw_spin_unlock+0x24/0x30
[ 4382.574803]  ? __handle_mm_fault+0xf29/0x12d0
[ 4382.575215]  ? do_vfs_ioctl+0xa2/0x6f0
[ 4382.575622]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[ 4382.576020]  do_vfs_ioctl+0xa2/0x6f0
[ 4382.576405]  ksys_ioctl+0x70/0x80
[ 4382.576776]  __x64_sys_ioctl+0x16/0x20
[ 4382.577137]  do_syscall_64+0x60/0x1b0
[ 4382.577488]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[ 4382.578837] RSP: 002b:00007ffe04bf64c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 4382.579174] RAX: ffffffffffffffda RBX: 00005564136f3050 RCX: 00007f5e74724dd7
[ 4382.579505] RDX: 00007ffe04bf64d0 RSI: 000000005000940e RDI: 0000000000000003
[ 4382.579848] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000044
[ 4382.580164] R10: 0000000000000541 R11: 0000000000000202 R12: 00005564136f3010
[ 4382.580477] R13: 0000000000000003 R14: 00005564136f3035 R15: 00005564136f3050
[ 4382.580792] irq event stamp: 0
[ 4382.581106] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[ 4382.581441] hardirqs last disabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.581772] softirqs last  enabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.582095] softirqs last disabled at (0): [<0000000000000000>]           (null)
[ 4382.582413] ---[ end trace d3c188e3e9367382 ]---
[ 4382.623855] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2981: errno=-5 IO failure
[ 4382.624295] BTRFS info (device sdc): forced readonly

Fix this by locking the source range before searching for the file extent
items in the fs tree, since the relocation process will try to lock the
range a file extent item represents before updating it with the new extent
location.

Fixes: 34a28e3d77 ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-09 14:52:25 +01:00
Filipe Manana f7fa1107f3 Btrfs: fix race between cloning range ending at eof and writeback
The recent rework that makes btrfs' remap_file_range operation use the
generic helper generic_remap_file_range_prep() introduced a race between
writeback and cloning a range that covers the eof extent of the source
file into a destination offset that is greater then the same file's size.

This happens because we now wait for writeback to complete before doing
the truncation of the eof block, while previously we did the truncation
and then waited for writeback to complete. This leads to a race between
writeback of the truncated block and cloning the file extents in the
source range, because we copy each file extent item we find in the fs
root into a buffer, then release the path and then increment the reference
count for the extent referred in that file extent item we copied, which
can no longer exist if writeback of the truncated eof block completes
after we copied the file extent item into the buffer and before we
incremented the reference count. This is illustrated by the following
diagram:

        CPU 1                                       CPU 2

  btrfs_clone_files()
    btrfs_cont_expand()
      btrfs_truncate_block()
         --> zeroes part of the
             page containg eof,
             marking it for
            delalloc

    btrfs_clone()
      --> finds extent item
          covering eof,
          points to extent
          at bytenr X
      --> copies it into a
          local buffer
      --> releases path

                                        writeback starts

                                        btrfs_finish_ordered_io()
                                          insert_reserved_file_extent()
                                            __btrfs_drop_extents()
                                              --> creates delayed
                                                  reference to drop
                                                  the extent at
                                                  bytenr X

      --> starts transaction
      --> creates delayed
          reference to
          increment extent
          at bytenr X

                    <delayed references are run, due to a transaction
                     commit for example, and the transaction is aborted
                     with -EIO because we attempt to increment reference
                     count for the extent at bytenr X after we freed it>

When this race is hit the running transaction ends up getting aborted with
an -EIO error and a trace like the following is produced:

[ 4382.553858] WARNING: CPU: 2 PID: 3648 at fs/btrfs/extent-tree.c:1552 lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556293] CPU: 2 PID: 3648 Comm: btrfs Tainted: G        W         4.20.0-rc6-btrfs-next-41 #1
[ 4382.556294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[ 4382.556308] RIP: 0010:lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
(...)
[ 4382.556310] RSP: 0018:ffffac784408f738 EFLAGS: 00010202
[ 4382.556311] RAX: 0000000000000001 RBX: ffff8980673c3a48 RCX: 0000000000000001
[ 4382.556312] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
[ 4382.556312] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 4382.556313] R10: 0000000000000001 R11: ffff897f40000000 R12: 0000000000001000
[ 4382.556313] R13: 00000000c224f000 R14: ffff89805de9bd40 R15: ffff8980453f4548
[ 4382.556315] FS:  00007f5e759178c0(0000) GS:ffff89807b300000(0000) knlGS:0000000000000000
[ 4382.563130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4382.563562] CR2: 00007f2e9789fcbc CR3: 0000000120512001 CR4: 00000000003606e0
[ 4382.564005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4382.564451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4382.564887] Call Trace:
[ 4382.565343]  insert_inline_extent_backref+0x55/0xe0 [btrfs]
[ 4382.565796]  __btrfs_inc_extent_ref.isra.60+0x88/0x260 [btrfs]
[ 4382.566249]  ? __btrfs_run_delayed_refs+0x93/0x1650 [btrfs]
[ 4382.566702]  __btrfs_run_delayed_refs+0xa22/0x1650 [btrfs]
[ 4382.567162]  btrfs_run_delayed_refs+0x7e/0x1d0 [btrfs]
[ 4382.567623]  btrfs_commit_transaction+0x50/0x9c0 [btrfs]
[ 4382.568112]  ? _raw_spin_unlock+0x24/0x30
[ 4382.568557]  ? block_rsv_release_bytes+0x14e/0x410 [btrfs]
[ 4382.569006]  create_subvol+0x3c8/0x830 [btrfs]
[ 4382.569461]  ? btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.569906]  btrfs_mksubvol+0x317/0x600 [btrfs]
[ 4382.570383]  ? rcu_sync_lockdep_assert+0xe/0x60
[ 4382.570822]  ? __sb_start_write+0xd4/0x1c0
[ 4382.571262]  ? mnt_want_write_file+0x24/0x50
[ 4382.571712]  btrfs_ioctl_snap_create_transid+0x117/0x1a0 [btrfs]
[ 4382.572155]  ? _copy_from_user+0x66/0x90
[ 4382.572602]  btrfs_ioctl_snap_create+0x66/0x80 [btrfs]
[ 4382.573052]  btrfs_ioctl+0x7c1/0x30e0 [btrfs]
[ 4382.573502]  ? mem_cgroup_commit_charge+0x8b/0x570
[ 4382.573946]  ? do_raw_spin_unlock+0x49/0xc0
[ 4382.574379]  ? _raw_spin_unlock+0x24/0x30
[ 4382.574803]  ? __handle_mm_fault+0xf29/0x12d0
[ 4382.575215]  ? do_vfs_ioctl+0xa2/0x6f0
[ 4382.575622]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[ 4382.576020]  do_vfs_ioctl+0xa2/0x6f0
[ 4382.576405]  ksys_ioctl+0x70/0x80
[ 4382.576776]  __x64_sys_ioctl+0x16/0x20
[ 4382.577137]  do_syscall_64+0x60/0x1b0
[ 4382.577488]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[ 4382.578837] RSP: 002b:00007ffe04bf64c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 4382.579174] RAX: ffffffffffffffda RBX: 00005564136f3050 RCX: 00007f5e74724dd7
[ 4382.579505] RDX: 00007ffe04bf64d0 RSI: 000000005000940e RDI: 0000000000000003
[ 4382.579848] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000044
[ 4382.580164] R10: 0000000000000541 R11: 0000000000000202 R12: 00005564136f3010
[ 4382.580477] R13: 0000000000000003 R14: 00005564136f3035 R15: 00005564136f3050
[ 4382.580792] irq event stamp: 0
[ 4382.581106] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[ 4382.581441] hardirqs last disabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.581772] softirqs last  enabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
[ 4382.582095] softirqs last disabled at (0): [<0000000000000000>]           (null)
[ 4382.582413] ---[ end trace d3c188e3e9367382 ]---
[ 4382.623855] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2981: errno=-5 IO failure
[ 4382.624295] BTRFS info (device sdc): forced readonly

Fix this by waiting for writeback to complete after truncating the eof
block.

Fixes: 34a28e3d77 ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-09 14:52:22 +01:00
Mike Kravetz e7c5809779 hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race"
This reverts c86aa7bbfd

The reverted commit caused ABBA deadlocks when file migration raced with
file eviction for specific hugetlbfs files.  This was discovered with a
modified version of the LTP move_pages12 test.

The purpose of the reverted patch was to close a long existing race
between hugetlbfs file truncation and page faults.  After more analysis
of the patch and impacted code, it was determined that i_mmap_rwsem can
not be used for all required synchronization.  Therefore, revert this
patch while working an another approach to the underlying issue.

Link: http://lkml.kernel.org/r/20190103235452.29335-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Souptick Joarder c64a2b0516 ceph: use vmf_error() in ceph_filemap_fault()
This code is converted to use vmf_error().

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-07 22:48:48 +01:00
Dongsheng Yang 02b2f549d5 libceph: allow setting abort_on_full for rbd
Introduce a new option abort_on_full, default to false. Then
we can get -ENOSPC when the pool is full, or reaches quota.

[ Don't show abort_on_full in /proc/mounts. ]

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-07 22:47:48 +01:00
Greg Kroah-Hartman de96e9fea7 sysfs: convert BUG_ON to WARN_ON
It's rude to crash the system just because the developer did something
wrong, as it prevents them from usually even seeing what went wrong.

So convert the few BUG_ON() calls that have snuck into the sysfs code
over the years to WARN_ON() to make it more "friendly".  All of these
are able to be recovered from, so it makes no sense to crash.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-07 08:53:32 +01:00
Linus Torvalds baa6707381 Add Adiantum support for fscrypt
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlwyBbEACgkQ8vlZVpUN
 gaNrawgAhYWrPwsEFM17dziRWRm8Ub9QgQUK6JRt+vE5KCRRVdXgJSLVH4esW9rJ
 X+QQ0diT8ZMKjdbsyz0cVmwP7nqQ5EKzjxts6J8vtbWDB6+nvaDLNdicJgUOprcT
 jIi8/45XKmyGUVO9Au6Wdda/zZi4dQBkXd+zUFGWYQRYL0LgmboWHKlaWueu7Qha
 xVtavYPSKUSMH8+r1F+HU6P41+1IBiuK4tCwfKfAqJ367Ushzk9xVKHNGrGDAQNi
 BTbn4NOOFaYvmVudJbQjD3tHtuQu2JsxlclB5KAtLBm1r3+vb3fMGsNyPBUmNp6Y
 YE/xKhACP4kYlk9xCG7vWcWGyTu90g==
 =HR7f
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Add Adiantum support for fscrypt"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: add Adiantum support
2019-01-06 12:21:11 -08:00
Linus Torvalds 215240462a Fix a number of ext4 bugs.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlwyAi0ACgkQ8vlZVpUN
 gaMtEgf9GyIJ5UnbR3J5+tpOAjx2GFJmTgpinWcfqVBrWwQDTigxiLm5sRIz5ToY
 hoWvCIxWm9LgrdS0unYOUNzRyzSZisNAtceowbPErlV5NPS9zcftVt4pRYZ6hIZK
 L3wKQ/PdVxIaekP9SXvFx5tfnHSB6CTGOJu1YMlF8ERm4tXUXHIwHzwUwrYqPYN0
 5i0uWxbx7qMlEzTf/9sEMYrmdHjsrPlXe0kIP0sMyd7hJl28l0QTNQ2s126fRNLK
 wkVMduacGuFGLwqbh7O1QrayWtcni7PKgTW9MfTsjLbg/EWx77auZBTSLfEO+ZKq
 2gxxCbM0sID5sgVaw6ku8QJkfiU2fw==
 =aQSK
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Ted Ts'o:
 "Fix a number of ext4 bugs"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix special inode number checks in __ext4_iget()
  ext4: track writeback errors using the generic tracking infrastructure
  ext4: use ext4_write_inode() when fsyncing w/o a journal
  ext4: avoid kernel warning when writing the superblock to a dead device
  ext4: fix a potential fiemap/page fault deadlock w/ inline_data
  ext4: make sure enough credits are reserved for dioread_nolock writes
2019-01-06 12:19:23 -08:00
Eric Biggers 8094c3ceb2 fscrypt: add Adiantum support
Add support for the Adiantum encryption mode to fscrypt.  Adiantum is a
tweakable, length-preserving encryption mode with security provably
reducible to that of XChaCha12 and AES-256, subject to a security bound.
It's also a true wide-block mode, unlike XTS.  See the paper
"Adiantum: length-preserving encryption for entry-level processors"
(https://eprint.iacr.org/2018/720.pdf) for more details.  Also see
commit 059c2a4d8e ("crypto: adiantum - add Adiantum support").

On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and
the NH hash function.  These algorithms are fast even on processors
without dedicated crypto instructions.  Adiantum makes it feasible to
enable storage encryption on low-end mobile devices that lack AES
instructions; currently such devices are unencrypted.  On ARM Cortex-A7,
on 4096-byte messages Adiantum encryption is about 4 times faster than
AES-256-XTS encryption; decryption is about 5 times faster.

In fscrypt, Adiantum is suitable for encrypting both file contents and
names.  With filenames, it fixes a known weakness: when two filenames in
a directory share a common prefix of >= 16 bytes, with CTS-CBC their
encrypted filenames share a common prefix too, leaking information.
Adiantum does not have this problem.

Since Adiantum also accepts long tweaks (IVs), it's also safe to use the
master key directly for Adiantum encryption rather than deriving
per-file keys, provided that the per-file nonce is included in the IVs
and the master key isn't used for any other encryption mode.  This
configuration saves memory and improves performance.  A new fscrypt
policy flag is added to allow users to opt-in to this configuration.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-01-06 08:36:21 -05:00
Linus Torvalds 7e928df80d three fixes, one for stable, one adds the (most secure) SMB3.1.1 dialect to default list requested
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlwwD7EACgkQiiy9cAdy
 T1HVLwwAkjsxPoGSZD1S+corFexh50UZvEQXbEeuIa70h7nVFxGuONOB3TiOebWm
 xz//s6twlduOv+l82riV41W4iEcLT36If7lEl7JX1wVycY6MIm0KwzUqJKnDw60p
 zxNEkk6sQ+1wZ8fj6jonDsdtctGoGhNoWkPybyaYkersQFRBFSkPKnpvWF9bXRz0
 uqioONLpE2rajUJHv5gLE3dswTBvE2SGvCQfy0ANp3ZBVY5JGjyWlUQqC0riiemY
 P541VbJlXmcLeU5COg7Wz45BRlHHYxJVAKc74owS823kMUfHW2Zyae4gWmOhKT0S
 86CEYoZR+7nyegggNJr/sIC2ISftz1cr0q9VWeqXJSesNjGJFAbbqdh7qBf1sUMp
 SseUPwlhhe6LnfYt5scwj49V/BZQqPcxThBNNXfFQayCWSNUuIPSBQ+euGeGaVJf
 j1hhUq28DhbuGcz1464JnI06MCEpSyMAsKHNToK+EZLBH6lu6UcgcA2NHyF8mp/L
 IPyJhjTe
 =SZtN
 -----END PGP SIGNATURE-----

Merge tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
 "Three fixes, one for stable, one adds the (most secure) SMB3.1.1
  dialect to default list requested"

* tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: add smb3.1.1 to default dialect list
  cifs: fix confusing warning message on reconnect
  smb3: fix large reads on encrypted connections
2019-01-05 14:05:06 -08:00
Linus Torvalds acda9efa8c Changes since last update:
- Remove a couple of unnecessary local variables.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlwnwYcACgkQ+H93GTRK
 tOu3NQ/9Gc4+xy2zZMAv1O6E3O1NctQgi8LxvzXcpgAcjopYoFXdCfBDrA2hBFG7
 bNjzcfC4iw66tzq7Fv7jC00st3R9QOqLrjdpoeSjf07vC4kqG8pOyHTnaQCHhP6j
 gqDthDZQb5WhsAT0ytzWxzR0orA2dhCn4UDL8LpYMfVljkexqHUUXHqMCNMr6bxK
 q6ICsBiqbiR/xD/GHgzGlsAm8jC9oajm9dSuRP7e0pPGtRz8SrykotI0SdG8w/+d
 ujT9Q0PDGGXTSN58o+RDH16XTRVTTQCY8JY0X/sF4n6nH0wjUwqja50NipOzjGXD
 nA+e0GEPeFYFo2mrmkQDqvPg2AkCHoEwtw0Pt+Kkqiu0/a3etHpCYEDsJfvW6CAT
 Wrz1dAb4mCIAUmK0VkL0aIBiS6IDAwuWQ/CMIJA2AkVs0qZAzMTURLKEa36+qAEr
 gjERnoQst1s/mrrTCfdC+1k+yBI/RrpEhoyV1I9S48SkiXHYlCcub/oLAyc7Pb5L
 aeLrsRz/n6kGZJBM3tvW5kJglDHBcRjqrV+JinVt0zOOQHGZem65qPTxfdmwUFXo
 46ZsEoJxw7eq4V/hXKAzmnmlRFlTbwcoouhimZUqjjiIQRmZ7GDQdXDYwLoT5pTv
 wHZKmynAuVIBzkLcoyKdGPZ4znRftPxbI+NI9CqCzwuS+VHfBIc=
 =gXVU
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixlets from Darrick Wong:
 "Remove a couple of unnecessary local variables"

* tag 'xfs-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: xfs_fsops: drop useless LIST_HEAD
  xfs: xfs_buf: drop useless LIST_HEAD
2019-01-05 14:00:56 -08:00
Linus Torvalds c7eaf342ec A fairly quiet round: a couple of messenger performance improvements
from myself and a few cap handling fixes from Zheng.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAlwuI7ATHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzizcvB/9GqpAzR+Yy1iIQGNeijPSeuXsrlcQF
 WErfaG8tUwZY3vqv3+OSZBwuMgq6wAyCo3wJmh0GCZoy02WLJbPB/G8AiHtoZUAh
 wAWfL8feZkzx3L7JV0OrPG0GGYkhKu5PebM4rq3cXvlL0OiTKPs8bmbTvh0mSv3z
 gH1odW0j2mAb1/3tqm9M5+7XhrGSnmSfA028NeKx6I4nE0ONd9BEcHZDoRBBQeNf
 tgyxH4IJuuQ+x4/FKIn6+hBbMYiVrTBlz4wQHrJvvzDUeCkWu+E8JZ4utxxNdfmS
 uGsPDRqi4LSMwt1q0HLHhkCP0lg5yf9NByGoy+VH5/gS8ma6be9+IbfX
 =puaN
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A fairly quiet round: a couple of messenger performance improvements
  from myself and a few cap handling fixes from Zheng"

* tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client:
  ceph: don't encode inode pathes into reconnect message
  ceph: update wanted caps after resuming stale session
  ceph: skip updating 'wanted' caps if caps are already issued
  ceph: don't request excl caps when mount is readonly
  ceph: don't update importing cap's mseq when handing cap export
  libceph: switch more to bool in ceph_tcp_sendmsg()
  libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage()
  libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage()
  libceph: drop last_piece logic from write_partial_message_data()
  ceph: remove redundant assignment
  ceph: cleanup splice_dentry()
2019-01-05 13:58:08 -08:00
Linus Torvalds 505b050fdf Merge branch 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount API prep from Al Viro:
 "Mount API prereqs.

  Mostly that's LSM mount options cleanups. There are several minor
  fixes in there, but nothing earth-shattering (leaks on failure exits,
  mostly)"

* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
  mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
  smack: rewrite smack_sb_eat_lsm_opts()
  smack: get rid of match_token()
  smack: take the guts of smack_parse_opts_str() into a new helper
  LSM: new method: ->sb_add_mnt_opt()
  selinux: rewrite selinux_sb_eat_lsm_opts()
  selinux: regularize Opt_... names a bit
  selinux: switch away from match_token()
  selinux: new helper - selinux_add_opt()
  LSM: bury struct security_mnt_opts
  smack: switch to private smack_mnt_opts
  selinux: switch to private struct selinux_mnt_opts
  LSM: hide struct security_mnt_opts from any generic code
  selinux: kill selinux_sb_get_mnt_opts()
  LSM: turn sb_eat_lsm_opts() into a method
  nfs_remount(): don't leak, don't ignore LSM options quietly
  btrfs: sanitize security_mnt_opts use
  selinux; don't open-code a loop in sb_finish_set_opts()
  LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
  new helper: security_sb_eat_lsm_opts()
  ...
2019-01-05 13:25:58 -08:00
Linus Torvalds 9b286efeb5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull trivial vfs updates from Al Viro:
 "A few cleanups + Neil's namespace_unlock() optimization"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  exec: make prepare_bprm_creds static
  genheaders: %-<width>s had been there since v6; %-*s - since v7
  VFS: use synchronize_rcu_expedited() in namespace_unlock()
  iov_iter: reduce code duplication
2019-01-05 13:18:59 -08:00
Linus Torvalds a65981109f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - procfs updates

 - various misc bits

 - lib/ updates

 - epoll updates

 - autofs

 - fatfs

 - a few more MM bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
  mm/page_io.c: fix polled swap page in
  checkpatch: add Co-developed-by to signature tags
  docs: fix Co-Developed-by docs
  drivers/base/platform.c: kmemleak ignore a known leak
  fs: don't open code lru_to_page()
  fs/: remove caller signal_pending branch predictions
  mm/: remove caller signal_pending branch predictions
  arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
  kernel/sched/: remove caller signal_pending branch predictions
  kernel/locking/mutex.c: remove caller signal_pending branch predictions
  mm: select HAVE_MOVE_PMD on x86 for faster mremap
  mm: speed up mremap by 20x on large regions
  mm: treewide: remove unused address argument from pte_alloc functions
  initramfs: cleanup incomplete rootfs
  scripts/gdb: fix lx-version string output
  kernel/kcov.c: mark write_comp_data() as notrace
  kernel/sysctl: add panic_print into sysctl
  panic: add options to print system info when panic happens
  bfs: extra sanity checking and static inode bitmap
  exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
  ...
2019-01-05 09:16:18 -08:00
Nikolay Borisov f86196ea87 fs: don't open code lru_to_page()
Multiple filesystems open code lru_to_page().  Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.

Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com>		[ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Davidlohr Bueso 08d405c8b8 fs/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.

[akpm@linux-foundation.org: fix fs/buffer.c]
Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Tigran Aivazian d187715589 bfs: extra sanity checking and static inode bitmap
Strengthen validation of BFS superblock against corruption.  Make
in-core inode bitmap static part of superblock info structure.  Print a
warning when mounting a BFS filesystem created with "-N 512" option as
only 510 files can be created in the root directory.  Make the kernel
messages more uniform.  Update the 'prefix' passed to bfs_dump_imap() to
match the current naming of operations.  White space and comments
cleanup.

Link: http://lkml.kernel.org/r/CAK+_RLkFZMduoQF36wZFd3zLi-6ZutWKsydjeHFNdtRvZZEb4w@mail.gmail.com
Signed-off-by: Tigran Aivazian <aivazian.tigran@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Oleg Nesterov 655c16a8ce exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the
"extra" size for argv/envp pointers every time, this is a bit ugly and
even not strictly correct: acct_arg_size() must not account this size.

Remove all the rlimit code in get_arg_page().  Instead, add bprm->argmin
calculated once at the start of __do_execve_file() and change
copy_strings to check bprm->p >= bprm->argmin.

The patch adds the new helper, prepare_arg_pages() which initializes
bprm->argc/envc and bprm->argmin.

[oleg@redhat.com: fix !CONFIG_MMU version of get_arg_page()]
  Link: http://lkml.kernel.org/r/20181126122307.GA1660@redhat.com
[akpm@linux-foundation.org: use max_t]
Link: http://lkml.kernel.org/r/20181112160910.GA28440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Oleg Nesterov 8099b047ec exec: load_script: don't blindly truncate shebang string
load_script() simply truncates bprm->buf and this is very wrong if the
length of shebang string exceeds BINPRM_BUF_SIZE-2.  This can silently
truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
happens to be the valid executable path.

Change load_script() to return ENOEXEC if it can't find '\n' or zero in
bprm->buf.  Note that '\0' can come from either
prepare_binprm()->memset() or from kernel_read(), we do not care.

Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Carmeli Tamir 306790f75a fat: new inline functions to determine the FAT variant (32, 16 or 12)
This patch introduces 3 new inline functions - is_fat12, is_fat16 and
is_fat32, and replaces every occurrence in the code in which the FS
variant (whether this is FAT12, FAT16 or FAT32) was previously checked
using msdos_sb_info->fat_bits.

Link: http://lkml.kernel.org/r/1544990640-11604-4-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Carmeli Tamir d19dc01618 fat: move MAX_FAT to fat.h and change it to inline function
MAX_FAT is useless in msdos_fs.h, since it uses the MSDOS_SB function
that is defined in fat.h.  So really, this macro can be only called from
code that already includes fat.h.

Hence, this patch moves it to fat.h, right after MSDOS_SB is defined.  I
also changed it to an inline function in order to save the double call
to MSDOS_SB.  This was suggested by joe@perches.com in the previous
version.

This patch is required for the next in the series, in which the variant
(whether this is FAT12, FAT16 or FAT32) checks are replaced with new
macros.

Link: http://lkml.kernel.org/r/1544990640-11604-3-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Carmeli Tamir b553337a57 fat: remove FAT_FIRST_ENT macro
The comment edited in this patch was the only reference to the
FAT_FIRST_ENT macro, which is not used anymore.  Moreover, the commented
line of code does not compile with the current code.

Since the FAT_FIRST_ENT macro checks the FAT variant in a way that the
patch series changes, I removed it, and instead wrote a clear
explanation of what was checked.

I verified that the changed comment is correct according to Microsoft
FAT spec, search for "BPB_Media" in the following references:

1. Microsoft FAT specification 2005
(http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf).
Search for 'volume label'.
2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification
(https://staff.washington.edu/dittrich/misc/fatgen103.pdf).
Search for 'volume label'.

Link: http://lkml.kernel.org/r/1544990640-11604-2-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Ernesto A. Fernández f93ca1ed9b hfsplus: return file attributes on statx
The immutable, append-only and no-dump attributes can only be retrieved
with an ioctl; implement the ->getattr() method to return them on statx.
Do not return the inode birthtime yet, because the issue of how best to
handle the post-2038 timestamps is still under discussion.

This patch is needed to pass xfstests generic/424.

Link: http://lkml.kernel.org/r/20181014163558.sxorxlzjqccq2lpw@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Ian Kent f5162216b7 autofs: add strictexpire mount option
Commit 092a53452b ("autofs: take more care to not update last_used on
path walk") helped to (partially) resolve a problem where automounts
were not expiring due to aggressive accesses from user space.

This patch was later reverted because, for very large environments, it
meant more mount requests from clients and when there are a lot of
clients this caused a fairly significant increase in server load.

But there is a need for both types of expire check, depending on use
case, so add a mount option to allow for strict update of last use of
autofs dentrys (which just means not updating the last use on path walk
access).

Link: http://lkml.kernel.org/r/154296973880.9889.14085372741514507967.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Ian Kent 9d8719a42e autofs: change catatonic setting to a bit flag
Change the superblock info.  catatonic setting to be part of a flags bit
field.

Link: http://lkml.kernel.org/r/154296973142.9889.17275721668508589639.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Ian Kent 9bf964c9ce autofs: simplify parse_options() function call
The parse_options() function uses a long list of parameters, most of
which are present in the super block info structure already.

The mount parameters set in parse_options() options don't require
cleanup so using the super block info struct directly is simpler.

Link: http://lkml.kernel.org/r/154296972423.9889.9368859245676473329.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Ian Kent 55f0d8205d autofs: improve ioctl sbi checks
Al Viro made some suggestions to improve the implementation of commit
0633da48f0 ("fix autofs_sbi() does not check super block type").

The check is unnecessary in all cases except for ioctl usage so placing
the check in the super block accessor function adds a small overhead to
the common case where it isn't needed.

So it's sufficient to do this in the ioctl code only.

Also the check in the ioctl code is needlessly complex.

[akpm@linux-foundation.org: declare autofs_fs_type in .h, not .c]
Link: http://lkml.kernel.org/r/154296970987.9889.1597442413573683096.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 86c051793b fs/epoll: deal with wait_queue only once
There is no reason why we rearm the waitiqueue upon every fetch_events
retry (for when events are found yet send_events() fails).  If nothing
else, this saves four lock operations per retry, and furthermore reduces
the scope of the lock even further.

[akpm@linux-foundation.org: restore code to original position, fix and reflow comment]
Link: http://lkml.kernel.org/r/20181114182532.27981-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 35cff1a6e0 fs/epoll: rename check_events label to send_events
It is currently called check_events because it, well, did exactly that.
However, since the lockless ep_events_available() call, the label no
longer checks, but just sends the events.  Rename as such.

Link: http://lkml.kernel.org/r/20181114182532.27981-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso abc610e01c fs/epoll: avoid barrier after an epoll_wait(2) timeout
Upon timeout, we can just exit out of the loop, without the cost of the
changing the task's state with an smp_store_mb call.  Just exit out of
the loop and be done - setting the task state afterwards will be, of
course, redundant.

[dave@stgolabs.net: forgotten fixlets]
  Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso c5a282e963 fs/epoll: reduce the scope of wq lock in epoll_wait()
This patch aims at reducing ep wq.lock hold times in epoll_wait(2).  For
the blocking case, there is no need to constantly take and drop the
spinlock, which is only needed to manipulate the waitqueue.

The call to ep_events_available() is now lockless, and only exposed to
benign races.  Here, if false positive (returns available events and
does not see another thread deleting an epi from the list) we call into
send_events and then the list's state is correctly seen.  Otoh, if a
false negative and we don't see a list_add_tail(), for example, from irq
callback, then it is rechecked again before blocking, which will see the
correct state.

In order for more accuracy to see concurrent list_del_init(), use the
list_empty_careful() variant -- of course, this won't be safe against
insertions from wakeup.

For the overflow list we obviously need to prevent load/store tearing as
we don't want to see partial values while the ready list is disabled.

[dave@stgolabs.net: forgotten fixlets]
  Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-6-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 21877e1a5b fs/epoll: robustify ep->mtx held checks
Insted of just commenting how important it is, lets make it more robust
and add a lockdep_assert_held() call.

Link: http://lkml.kernel.org/r/20181108051006.18751-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 76699a67f3 fs/epoll: drop ovflist branch prediction
The ep->ovflist is a secondary ready-list to temporarily store events
that might occur when doing sproc without holding the ep->wq.lock.  This
accounts for every time we check for ready events and also send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial time.

As such, the unlikely() check to see if the pointer is being used, seems
both misleading and sub-optimal.  In fact, we go to an awful lot of
trouble to sync both lists, and populating the ovflist is far from an
uncommon scenario.

For example, profiling a concurrent epoll_wait(2) benchmark, with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33%
incorrect rate was seen; and when incrementally increasing the number of
epoll instances (which is used, for example for multiple queuing load
balancing models), up to a 90% incorrect rate was seen.

Similarly, by deleting the prediction, 3% throughput boost was seen
across incremental threads.

Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 4e0982a005 fs/epoll: simplify ep_send_events_proc() ready-list loop
The current logic is a bit convoluted.  Lets simplify this with a
standard list_for_each_entry_safe() loop instead and just break out
after maxevents is reached.

While at it, remove an unnecessary indentation level in the loop when
there are in fact ready events.

Link: http://lkml.kernel.org/r/20181108051006.18751-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Davidlohr Bueso 74bdc12985 fs/epoll: remove max_nests argument from ep_call_nested()
Patch series "epoll: some miscellaneous optimizations".

The following are some incremental optimizations on some of the epoll
core.  Each patch has the details, but together, the series is seen to
shave off measurable cycles on a number of systems and workloads.

For example, on a 40-core IB, a pipetest as well as parallel
epoll_wait() benchmark show around a 20-30% increase in raw operations
per second when the box is fully occupied (incremental thread counts),
and up to 15% performance improvement with lower counts.

Passes ltp epoll related testcases.

This patch(of 6):

All callers pass the EP_MAX_NESTS constant already, so lets simplify
this a tad and get rid of the redundant parameter for nested eventpolls.

Link: http://lkml.kernel.org/r/20181108051006.18751-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Alexey Dobriyan afe922c2da fs/proc/base.c: slightly faster /proc/*/limits
Header of /proc/*/limits is a fixed string, so print it directly without
formatting specifiers.

Link: http://lkml.kernel.org/r/20181203164242.GB6904@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Alexey Dobriyan 230f72e9f6 fs/proc/inode.c: delete unnecessary variable in proc_alloc_inode()
Link: http://lkml.kernel.org/r/20181203164015.GA6904@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Eric Biggers 81966d8349 fs/proc/util.c: include fs/proc/internal.h for name_to_int()
name_to_int() is defined in fs/proc/util.c and declared in
fs/proc/internal.h, but the declaration isn't included at the point of
the definition.  Include the header to enforce that the definition
matches the declaration.

This addresses a gcc warning when -Wmissing-prototypes is enabled.

Link: http://lkml.kernel.org/r/20181115001833.49371-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Benjamin Gordon 8da0b4f692 fs/proc/base.c: use ns_capable instead of capable for timerslack_ns
Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
in its effective capability set, but the current check looks in the root
namespace instead of the process' user namespace.  Since a process is
allowed to do other activities controlled by CAP_SYS_NICE inside a
namespace, it should also be able to adjust timerslack_ns.

Link: http://lkml.kernel.org/r/20181030180012.232896-1-bmgordon@google.com
Signed-off-by: Benjamin Gordon <bmgordon@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Colin Cross <ccross@android.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Al Viro e4f2283cc6 Merge branches 'misc.misc' and 'work.iov_iter' into for-linus 2019-01-04 14:02:59 -05:00
Linus Torvalds 96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Linus Torvalds 135143b2ca File locking bugfix for v4.21
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcLfFMAAoJEAAOaEEZVoIVjFAP/33l7xDEbrYRMUuzjcGQEHW0
 aOblkkt+Mo8yuSEgsy0yOcfi0NXuPMdVSEp5nqgwI9YsSULhpDEai/ogP86kb2f7
 xdo9H43cPMZS+pDx/avRyICjBYGZE4qwpSEs0d2zSHNT0dCX5Asuughb0tRtfOwo
 TIUw+8xz3lrUf0+4XAgFHOZ1/1vvC/hlXmHL61e9ZBijafb7t0qbIbIkbmnQp4vN
 0njRpcRck1i1o6FhsVcC8mm+gL/Fu4uXbK3CtkInzMom6M3ecm2FU/Es3yL5qzdV
 UMBwU+G+RLAZnkZ3dGUv89gwQIDaZk43g/lQJLTGv46eY8ghx4kBeT78Iaooav/o
 Ml0ormpc71/iMHRQ8iMXMkhSdx8uKhp/TL+M8nx1UM3E/VloQj8SNMJi2jhnQA5h
 mkNlZGSIERgr8XZF5uRHQa/XFQb5FUAMlfa4wE4sTC4urkAOwlZDQ+qENFEuikkP
 EjNquVjCtLp8BW58WNXyaIv1gc/6A0ZIo2v/TXxcXM6hiH+U4GW72FeDXSDkPmqN
 1rGgMI6blKpE4JmHSJjaHJP2Lo8TsyX6k64z3cV0fPigmAAAQQC+Mhtl8GkL670k
 L4sRxXh0HRR2H7HLrW/nbnbL4A3F8U7a1QmpXu4Be/FetUSbRTx3FOiiS6eyIXWu
 dPoocH9cfRcGXQ+gYg8Z
 =MagU
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking bugfix from Jeff Layton:
 "This is a one-line fix for a bug that syzbot turned up in the new
  patches to mitigate the thundering herd when a lock is released"

* tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: fix error in locks_move_blocks()
2019-01-03 14:33:46 -08:00
Steve French d5c7076b77 smb3: add smb3.1.1 to default dialect list
SMB3.1.1 dialect has additional security (among other) features
and should be requested when mounting to modern servers so it
can be used if the server supports it.

Add SMB3.1.1 to the default list of dialects requested.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-03 14:45:58 -06:00
Steve French 55a7f00655 cifs: fix confusing warning message on reconnect
When DFS is not used on the mount we should not be mentioning
DFS in the warning message on reconnect (it could be confusing).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-02 23:03:56 -06:00
Paul Aurich 6d2f84eee0 smb3: fix large reads on encrypted connections
When passing a large read to receive_encrypted_read(), ensure that the
demultiplex_thread knows that a MID was processed.  Without this, those
operations never complete.

This is a similar issue/fix to lease break handling:
commit 7af929d6d0
("smb3: fix lease break problem introduced by compounding")

CC: Stable <stable@vger.kernel.org> # 4.19+
Fixes: b24df3e30c ("cifs: update receive_encrypted_standard to handle compounded responses")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Tested-by: Yves-Alexis Perez <corsac@corsac.net>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-01-02 23:03:56 -06:00
NeilBrown bf77ae4c98 locks: fix error in locks_move_blocks()
After moving all requests from
   fl->fl_blocked_requests
to
   new->fl_blocked_requests

it is nonsensical to do anything to all the remaining elements, there
aren't any.  This should do something to all the requests that have been
moved. For simplicity, it does it to all requests in the target list.

Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests
is "obviously correct" as it preserves the invariant of the linkage
among requests.

Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
Fixes: 5946c4319e ("fs/locks: allow a lock request to block other requests.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-01-02 20:14:50 -05:00
Linus Torvalds e6b9257280 NFS client updates for Linux 4.21
Note that there is a conflict with the rdma tree in this pull request, since
 we delete a file that has been changed in the rdma tree.  Hopefully that's
 easy enough to resolve!
 
 We also were unable to track down a maintainer for Neil Brown's changes to
 the generic cred code that are prerequisites to his RPC cred cleanup patches.
 We've been asking around for several months without any response, so
 hopefully it's okay to include those patches in this pull request.
 
 Stable bugfixes:
 - xprtrdma: Yet another double DMA-unmap # v4.20
 
 Features:
 - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
 - Per-xprt rdma receive workqueues
 - Drop support for FMR memory registration
 - Make port= mount option optional for RDMA mounts
 
 Other bugfixes and cleanups:
 - Remove unused nfs4_xdev_fs_type declaration
 - Fix comments for behavior that has changed
 - Remove generic RPC credentials by switching to 'struct cred'
 - Fix crossing mountpoints with different auth flavors
 - Various xprtrdma fixes from testing and auditing the close code
 - Fixes for disconnect issues when using xprtrdma with krb5
 - Clean up and improve xprtrdma trace points
 - Fix NFS v4.2 async copy reboot recovery
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlwtO50ACgkQ18tUv7Cl
 QOtZWQ//e5Hhp2TnQZ6U+99YKedjwBHP6psH3GKSEdeHSNdlSpZ5ckgHxvMb9TBa
 6t4ecgv5P/uYLIePQ0u2ubUFc9+TlyGi7Iacx13/YhK7kihGHDPnZhfl0QbYixV7
 rwa9bFcKmOrXs8ld+Hw3P2UL22G1gMf/LHDhPNshbW7LFZmcshKz+mKTk70kwkq9
 v7tFC59p6GwV8Sr2YI2NXn2fOWsUS00sQfgj2jceJYJ8PsNa+wHYF4wPj2IY5NsE
 D5Oq2kLPbytBhCllOHgopNZaf4qb5BfqhVETyc1O+kDF3BZKUhQ1PoDi2FPinaHM
 5/d8hS+5fr3eMBsQrPWQLXYjWQFUXnkQQJvU3Bo52AIgomsk/8uBq3FvH7XmFcBd
 C8sgnuUAkAS8feMes8GCS50BTxclnGuYGdyFJyCRXoG9Kn9rMrw9EKitky6EVq0v
 NmXhW79jK84a3yDXVlAIpZ8Y9BU/HQ3GviGX8lQEdZU9YiYRzDIHvpMFwzMgqaBi
 XvLbr8PlLOm8GZokThS8QYT/G2Wu6IwfUq/AufVjVD4+HiL3duKKfWSGAvcm6aAa
 GoRF6UG+OmjWlzKojtRc1dI+sy22Fzh+DW+Mx6tuf/b/66wkmYnW7eKcV4rt6Tm5
 /JEhvTMo9q7elL/4FgCoMCcdoc5eXqQyXRXrQiOU7YHLzn2aWU0=
 =DvVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - xprtrdma: Yet another double DMA-unmap # v4.20

  Features:
   - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
   - Per-xprt rdma receive workqueues
   - Drop support for FMR memory registration
   - Make port= mount option optional for RDMA mounts

  Other bugfixes and cleanups:
   - Remove unused nfs4_xdev_fs_type declaration
   - Fix comments for behavior that has changed
   - Remove generic RPC credentials by switching to 'struct cred'
   - Fix crossing mountpoints with different auth flavors
   - Various xprtrdma fixes from testing and auditing the close code
   - Fixes for disconnect issues when using xprtrdma with krb5
   - Clean up and improve xprtrdma trace points
   - Fix NFS v4.2 async copy reboot recovery"

* tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
  sunrpc: convert to DEFINE_SHOW_ATTRIBUTE
  sunrpc: Add xprt after nfs4_test_session_trunk()
  sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS
  sunrpc: handle ENOMEM in rpcb_getport_async
  NFS: remove unnecessary test for IS_ERR(cred)
  xprtrdma: Prevent leak of rpcrdma_rep objects
  NFSv4.2 fix async copy reboot recovery
  xprtrdma: Don't leak freed MRs
  xprtrdma: Add documenting comment for rpcrdma_buffer_destroy
  xprtrdma: Replace outdated comment for rpcrdma_ep_post
  xprtrdma: Update comments in frwr_op_send
  SUNRPC: Fix some kernel doc complaints
  SUNRPC: Simplify defining common RPC trace events
  NFS: Fix NFSv4 symbolic trace point output
  xprtrdma: Trace mapping, alloc, and dereg failures
  xprtrdma: Add trace points for calls to transport switch methods
  xprtrdma: Relocate the xprtrdma_mr_map trace points
  xprtrdma: Clean up of xprtrdma chunk trace points
  xprtrdma: Remove unused fields from rpcrdma_ia
  xprtrdma: Cull dprintk() call sites
  ...
2019-01-02 16:35:23 -08:00
Linus Torvalds e45428a436 Thanks to Vasily Averin for fixing a use-after-free in the containerized
NFSv4.2 client, and cleaning up some convoluted backchannel server code
 in the process.  Otherwise, miscellaneous smaller bugfixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcLR3nAAoJECebzXlCjuG+oyAQALrPSTH9Qg2AwP2eGm+AevUj
 u/VFmimImIO9dYuT02t4w42w4qMIQ0/Y7R0UjT3DxG5Oixy/zA+ZaNXCCEKwSMIX
 abGF4YalUISbDc6n0Z8J14/T33wDGslhy3IQ9Jz5aBCDCocbWlzXvFlmrowbb3ak
 vtB0Fc3Xo6Z/Pu2GzNzlqR+f69IAmwQGJrRrAEp3JUWSIBKiSWBXTujDuVBqJNYj
 ySLzbzyAc7qJfI76K635XziULR2ueM3y5JbPX7kTZ0l3OJ6Yc0PtOj16sIv5o0XK
 DBYPrtvw3ZbxQE/bXqtJV9Zn6MG5ODGxKszG1zT1J3dzotc9l/LgmcAY8xVSaO+H
 QNMdU9QuwmyUG20A9rMoo/XfUb5KZBHzH7HIYOmkfBidcaygwIInIKoIDtzimm4X
 OlYq3TL/3QDY6rgTCZv6n2KEnwiIDpc5+TvFhXRWclMOJMcMSHJfKFvqERSv9V3o
 90qrCebPA0K8Dnc0HMxcBXZ+0TqZ2QeXp/wfIjibCXqMwlg+BZhmbeA0ngZ7x7qf
 2F33E9bfVJjL+VI5FcVYQf43bOTWZgD6ZKGk4T7keYl0CPH+9P70bfhl4KKy9dqc
 GwYooy/y5FPb2CvJn/EETeILRJ9OyIHUrw7HBkpz9N8n9z+V6Qbp9yW7LKgaMphW
 1T+GpHZhQjwuBPuJhDK0
 =dRLp
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Thanks to Vasily Averin for fixing a use-after-free in the
  containerized NFSv4.2 client, and cleaning up some convoluted
  backchannel server code in the process.

  Otherwise, miscellaneous smaller bugfixes and cleanup"

* tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits)
  nfs: fixed broken compilation in nfs_callback_up_net()
  nfs: minor typo in nfs4_callback_up_net()
  sunrpc: fix debug message in svc_create_xprt()
  sunrpc: make visible processing error in bc_svc_process()
  sunrpc: remove unused xpo_prep_reply_hdr callback
  sunrpc: remove svc_rdma_bc_class
  sunrpc: remove svc_tcp_bc_class
  sunrpc: remove unused bc_up operation from rpc_xprt_ops
  sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
  sunrpc: use-after-free in svc_process_common()
  sunrpc: use SVC_NET() in svcauth_gss_* functions
  nfsd: drop useless LIST_HEAD
  lockd: Show pid of lockd for remote locks
  NFSD remove OP_CACHEME from 4.2 op_flags
  nfsd: Return EPERM, not EACCES, in some SETATTR cases
  sunrpc: fix cache_head leak due to queued request
  nfsd: clean up indentation, increase indentation in switch statement
  svcrdma: Optimize the logic that selects the R_key to invalidate
  nfsd: fix a warning in __cld_pipe_upcall()
  nfsd4: fix crash on writing v4_end_grace before nfsd startup
  ...
2019-01-02 16:21:50 -08:00